Machine Learning  

AlexNet

AlexNet is a landmark neural network: it was the first to show convincingly, on the large-scale ImageNet (ILSVRC-2012) benchmark, that a CNN (Convolutional Neural Network) can beat the other image classification methods of its time by a wide margin. You may think of it as the ancestor of most of the CNN-based neural networks you see nowadays, so it is worth understanding this network in detail.

Highlights of Architecture

  • Number of Parameters : 60 Million
  • Number of Neurons : 650,000
  • Number of Convolutional Layers : 5 (the first, second and fifth are followed by max-pooling layers)
    • First Layer : 224 x 224 x 3 image input, 96 kernels of size 11 x 11 x 3 with a stride of 4
    • Second Layer : 256 kernels of size 5 x 5 x 48 (depth 48 rather than 96 because the network is split across two GPUs and each kernel sees only the 48 maps on its own GPU)
    • Third Layer : 384 kernels of size 3 x 3 x 256
    • Fourth Layer : 384 kernels of size 3 x 3 x 192
    • Fifth Layer : 256 kernels of size 3 x 3 x 192
  • Number of Fully Connected Layers : 3
    • The first two have 4096 neurons each.
    • The final layer feeds a 1000-way softmax.
  • Use of non-saturating neurons (ReLU)
  • Use of Dropout
  • Input Size : 224 x 224 x 3 patches cropped from 256 x 256 images
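
As a sanity check on the "60 Million" figure, the parameter count can be tallied from the layer shapes above. This is a sketch: it counts one bias per kernel or neuron, and it assumes the first fully connected layer receives the 6 x 6 x 256 = 9216 values left after max-pooling the fifth layer's 13 x 13 maps, a size that is not in the list above.

```python
# Conv kernel shapes from the highlights list: (kernels, height, width, depth).
# Depths of 48 and 192 come from the two-GPU split described in Ref [1].
CONV_LAYERS = [
    (96, 11, 11, 3),
    (256, 5, 5, 48),
    (384, 3, 3, 256),
    (384, 3, 3, 192),
    (256, 3, 3, 192),
]

# Fully connected layers as (inputs, outputs). The 6 * 6 * 256 input of the
# first one is an assumption: the pooled output size of the fifth conv layer.
FC_LAYERS = [
    (6 * 6 * 256, 4096),
    (4096, 4096),
    (4096, 1000),  # final layer, fed into the 1000-way softmax
]


def conv_params(kernels, height, width, depth):
    # weights of every kernel + one bias per kernel
    return kernels * height * width * depth + kernels


def fc_params(n_in, n_out):
    # full weight matrix + one bias per output neuron
    return n_in * n_out + n_out


conv_total = sum(conv_params(*layer) for layer in CONV_LAYERS)
fc_total = sum(fc_params(*layer) for layer in FC_LAYERS)
total = conv_total + fc_total

if __name__ == "__main__":
    print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {total:,}")
    print(f"share in fully connected layers: {fc_total / total:.1%}")
```

This comes out at about 61 million, consistent with the figure above, and roughly 96% of those parameters sit in the three fully connected layers, which is why overfitting there is the main concern.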

Why ReLU ?

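Why use ReLU (a non-saturating function) rather than tanh (a saturating function)? Because the authors observed that networks with ReLUs train several times faster than equivalent networks with tanh units: Figure 1 of Ref [1] shows a four-layer CNN with ReLUs reaching a 25% training error rate on CIFAR-10 six times faster than the same network with tanh neurons.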

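ReLUs also do not require input normalization to prevent them from saturating: as long as some training examples produce a positive input to a ReLU, learning will happen in that neuron.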
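
To see what "saturating" means, the sketch below compares the derivative of tanh with that of ReLU. It only illustrates saturation; it does not reproduce the training-speed experiment of Ref [1], and the function names are made up for this example.

```python
import math


def relu(x):
    """Rectified linear unit f(x) = max(0, x); it never saturates for x > 0."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for x > 0, else 0 (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh, 1 - tanh(x)^2. It shrinks toward 0 as |x| grows,
    which is what 'saturating' means."""
    return 1.0 - math.tanh(x) ** 2


if __name__ == "__main__":
    for x in [0.5, 2.0, 5.0, 10.0]:
        print(f"x = {x:4.1f}   tanh'(x) = {tanh_grad(x):.2e}   relu'(x) = {relu_grad(x):.0f}")
```

At x = 5 the tanh gradient is already below 0.001, so a tanh unit driven that far barely learns, whereas a ReLU passes the gradient through unchanged for any positive input.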

Why Dropout ?

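To reduce overfitting in the fully connected layers, which hold most of the network's parameters. Ref [1] applies dropout to the first two fully connected layers.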
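
A minimal sketch of dropout as Ref [1] describes it, written over plain Python lists so it runs without any framework: during training each hidden neuron's output is zeroed with probability 0.5, and at test time all neurons are kept but their outputs are multiplied by 0.5. The function names are hypothetical.

```python
import random


def dropout_train(activations, p=0.5, rng=random):
    """Training pass: zero each neuron's output with probability p.

    Returns (outputs, mask). In backpropagation the same mask would multiply
    the incoming gradients, so no gradient flows back through a dropped neuron."""
    mask = [0.0 if rng.random() < p else 1.0 for _ in activations]
    outputs = [a * m for a, m in zip(activations, mask)]
    return outputs, mask


def dropout_test(activations, p=0.5):
    """Test pass: keep every neuron but scale outputs by (1 - p), i.e. by 0.5
    for p = 0.5 as in Ref [1], so expected activations match training."""
    return [a * (1.0 - p) for a in activations]


if __name__ == "__main__":
    hidden = [0.8, 1.5, 0.0, 2.3, 0.4, 1.1]  # made-up hidden-layer outputs
    out, mask = dropout_train(hidden, rng=random.Random(42))
    print("train:", out)
    print("test: ", dropout_test(hidden))
```

Most modern frameworks implement the equivalent "inverted dropout" instead, which scales the kept outputs by 1 / (1 - p) during training so that nothing needs to change at test time.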

How are the training images processed to fit the input dimension ?

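The ImageNet images come in varying resolutions, but the network needs a fixed input size. So the authors rescaled each image so that its shorter side is 256 pixels long and then cropped out the central 256 x 256 patch from the rescaled image. Apart from subtracting the mean activity over the training set from each pixel, they did no other preprocessing: the network is trained on the (centered) raw RGB values of the pixels.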
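
The rescale-then-center-crop step can be written out as two small helper functions. This is a sketch that works only with image sizes in pixels; the function names are hypothetical, and rounding the longer side to the nearest pixel is an assumption here, since Ref [1] does not say how it was rounded.

```python
from typing import Tuple


def rescale_size(width: int, height: int, short_side: int = 256) -> Tuple[int, int]:
    """Size after rescaling so the shorter side equals short_side,
    keeping the aspect ratio (longer side rounded to the nearest pixel)."""
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width: int, height: int, size: int = 256) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of the central size x size patch."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


if __name__ == "__main__":
    w, h = rescale_size(500, 375)
    print((w, h), center_crop_box(w, h))
```

For a 500 x 375 image this gives a 341 x 256 rescaled image and the crop box (42, 0, 298, 256). With Pillow, these values could be passed to `Image.resize((w, h))` and `Image.crop(box)`; the mean subtraction would then be applied to the cropped pixels.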

Why CNN rather than standard feedforward Network ?

In theory, a standard feedforward network can approximate essentially any classification function if it has enough neurons. In practice, however, we don't know how many neurons are "enough" for a given application, and even if we did, a network of that size might not be trainable with reasonable computing power.

The paper (Ref [1]) says as follows :

    Compared to standard feedforward neural networks with similarly-sized layers, CNNs have much fewer connections and parameters and so they are easier to train, while their theoretically-best performance is likely to be only slightly worse.
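
To get a feel for the "much fewer parameters" claim, the sketch below compares the first convolutional layer of AlexNet with a hypothetical fully connected layer of 4096 neurons attached directly to the 224 x 224 x 3 input. The dense layer is an illustrative assumption, not something from Ref [1], and the two layers do not compute the same outputs; the point is only the order of magnitude.

```python
# Input image size used by AlexNet's first layer.
INPUT_H, INPUT_W, INPUT_C = 224, 224, 3

# First conv layer: 96 kernels of 11 x 11 x 3, plus one bias per kernel.
# The same kernels slide over every position, so the count does not depend
# on the image size (weight sharing).
conv1_params = 96 * 11 * 11 * INPUT_C + 96

# Hypothetical dense layer: each of 4096 neurons sees every input value.
DENSE_NEURONS = 4096
dense_params = INPUT_H * INPUT_W * INPUT_C * DENSE_NEURONS + DENSE_NEURONS

if __name__ == "__main__":
    print(f"conv1 parameters: {conv1_params:,}")
    print(f"dense parameters: {dense_params:,}")
    print(f"ratio: {dense_params / conv1_params:,.0f}x")
```

The convolutional layer needs 34,944 parameters, whereas the dense layer needs more than 616 million, roughly ten times as many as the whole of AlexNet.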

Fighting against Overfitting

As you may know, overfitting is one of the most common problems in neural-network-based machine learning. Ref [1] uses two main techniques to fight it, summarized below.

    Data Augmentation : Transform each training image so that it becomes slightly different from the original, but not so different that it no longer belongs to its labeled category, and add the transformed images to the training data. This artificially enlarges the training set. Ref [1] uses two forms of data augmentation:

    • Image translations and horizontal reflections. The authors extract random 224 x 224 patches (and their horizontal reflections) from the 256 x 256 images and train the network on these patches, which according to Ref [1] increases the size of the training set by a factor of 2048.
    • Altering the intensities of the RGB channels. The authors perform PCA on the set of RGB pixel values throughout the training set and, to each training image, add multiples of the principal components found, with magnitudes proportional to the corresponding eigenvalue times a random variable drawn from a Gaussian with mean 0 and standard deviation 0.1. A new random value is drawn each time an image is used.

    Dropout : This technique "removes" each hidden neuron with a certain probability (0.5 in Ref [1]). The neurons are not physically removed; instead, their output is set to zero. According to Ref [1], neurons that are "dropped out" in this way do not contribute to the forward pass and do not participate in backpropagation. So every time an input is presented, the network samples a different architecture, but all these architectures share weights. At test time, all neurons are used but their outputs are multiplied by 0.5.
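
The two data augmentation schemes described above can be sketched in plain Python. This is a sketch under several assumptions: images are H x W x 3 nested lists of floats, the eigenvectors and eigenvalues of the 3 x 3 RGB covariance are precomputed over the training set (for example with `numpy.linalg.eigh`), and all function names are hypothetical.

```python
import random

# Images are assumed to be H x W x 3 nested lists of floats (row, column, RGB).


def random_crop_and_flip(image, crop=224, rng=random):
    """Translation + reflection: take a random crop x crop patch and mirror
    it horizontally with probability 0.5."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop)  # randint includes both end points
    left = rng.randint(0, w - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # reverse columns = horizontal flip
    return patch


def pca_color_shift(eigvecs, eigvals, sigma=0.1, rng=random):
    """RGB offset sum_i alpha_i * lambda_i * p_i with alpha_i ~ N(0, sigma).

    eigvecs[i] is the i-th principal component p_i (a length-3 list) and
    eigvals[i] its eigenvalue lambda_i."""
    alphas = [rng.gauss(0.0, sigma) for _ in eigvals]  # drawn once per image
    return [
        sum(eigvecs[i][c] * alphas[i] * eigvals[i] for i in range(len(eigvals)))
        for c in range(3)
    ]


def apply_color_shift(image, shift):
    """Add the same RGB offset to every pixel of the image."""
    return [[[px[c] + shift[c] for c in range(3)] for px in row] for row in image]


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[[float(r), float(c), 0.0] for c in range(256)] for r in range(256)]
    patch = random_crop_and_flip(img, rng=rng)
    # Made-up eigen-decomposition (identity axes), for illustration only.
    vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    vals = [1.0, 0.5, 0.25]
    shifted = apply_color_shift(patch, pca_color_shift(vecs, vals, rng=rng))
    print(len(shifted), len(shifted[0]))
```

In a real pipeline these transformations would be applied on the fly each time an image is fed to the network, so every epoch sees slightly different versions of the same images.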

Reference

[1] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS 2012.