Image Classification - Convolutional Neural Network using Keras


Aim:
The main purpose of this project is to classify images using a Convolutional Neural Network (CNN). For this, the cat vs dog dataset will be used, which has 8000 training images (4000 of each class) and 2000 test images.

Two methods will be used and compared:
a. Normal feed-forward CNN
b. Data (image) augmentation technique

To learn about how a convolutional neural network works, Click here


Data Augmentation explanation:
This method works very well when the dataset is small, i.e. when there are very few images to train our neural network on, just like our cat vs dog dataset, which has only 4000 images each of dogs and cats.
In order to increase our training samples we could scrape the internet for more images, but this is tedious and expensive.
Enter the data augmentation technique, which does the work for us: it increases our training samples in very little time.

Data/image augmentation is the process of taking the images that are already in our training dataset and manipulating them to create many altered versions of the same images.

Advantages:
1. It provides a larger set of training samples for the neural network to train on.
2. It also exposes our classifier to a variety of color and lighting variations of the images, which makes the classifier more robust.

Techniques involved:
Scaling, translation, rotation by 90 degrees, rotation by finer angles, flipping, adding noise, changing lighting conditions, distortions, and so on.
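As an example, here is a minimal sketch that draws several altered versions of a single image with Keras's ImageDataGenerator; the parameter values and the file path "cat.jpg" are illustrative, not taken from the project code:

```python
from tensorflow.keras.preprocessing.image import (
    ImageDataGenerator, img_to_array, load_img)

# Illustrative augmentation settings covering several of the techniques above.
datagen = ImageDataGenerator(
    rotation_range=40,            # rotation at finer angles
    width_shift_range=0.2,        # translation
    height_shift_range=0.2,
    zoom_range=0.2,               # scaling
    horizontal_flip=True,         # flipping
    brightness_range=(0.7, 1.3),  # lighting conditions
)

img = img_to_array(load_img("cat.jpg", target_size=(64, 64)))
batch = img.reshape((1,) + img.shape)  # the generator expects a batch axis

# Draw five altered versions of the same image.
flow = datagen.flow(batch, batch_size=1)
augmented = [next(flow)[0] for _ in range(5)]
```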

Note:
Images collected from the internet will be of varying sizes, so before applying data augmentation techniques these images have to be resized to a fixed size.

Let's apply our methods and compare them:

A. Normal feed-forward CNN
In this method, data augmentation techniques will not be used. We will simply preprocess the available data and feed it to our neural network.

Dataset:
All the images of cats and dogs are in the same folder, and each image's filename contains its label (either cat or dog); no extra CSV file with the labels is given.

1. So first I will write an ad-hoc function to extract the labels of these images and store them in a dataframe, in the same order as the images. These will be our target labels.
2. Then load the images, resize them to a fixed size, and map them to their respective target labels.
3. Feed these samples to our neural network. (Steps 1 and 2 are sketched in the code after the note below.)

Note: This is an expensive process, as it takes a lot of time to extract the target labels, load and resize the images, and map them together accordingly.
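A minimal sketch of steps 1 and 2, assuming the images sit in a single folder and the filenames look like "cat.0.jpg" / "dog.0.jpg"; the folder name, image size, and label encoding below are illustrative choices:

```python
import os
import numpy as np
import pandas as pd
from PIL import Image

TRAIN_DIR = "train"  # assumed folder holding all cat/dog images
IMG_SIZE = 64

filenames = sorted(os.listdir(TRAIN_DIR))

# Step 1: extract each label from its filename, in the same order as the images.
labels = pd.DataFrame({
    "filename": filenames,
    "label": [1 if name.startswith("dog") else 0 for name in filenames],
})

# Step 2: load every image, resize it to a fixed size, and stack into one array.
images = np.stack([
    np.asarray(Image.open(os.path.join(TRAIN_DIR, name))
                    .convert("RGB")
                    .resize((IMG_SIZE, IMG_SIZE)))
    for name in filenames
]).astype("float32") / 255.0  # scale pixel values to [0, 1]
```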

Model 1:

Network Architecture of the model: 
Images:     64x64 rescaled RGB images
CNN part:   1 convolutional layer followed by 2x2 max pooling
            1 hidden layer with 128 neurons
            1 output layer of size 2 with softmax activation
Epochs:     20
Batch size: 32

Training samples: 6400, Validation samples: 1600; Split Ratio: 80/20
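A minimal Keras sketch of this architecture. The filter count (32) and the 3x3 kernel size are assumptions, as the post does not state them; the integer labels from the previous step are one-hot encoded to match the 2-unit softmax output.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation="relu"),   # hidden layer
    Dense(2, activation="softmax"),  # one output unit per class
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# 80/20 split of the 8000 preprocessed training samples.
model.fit(images, to_categorical(labels["label"], num_classes=2),
          epochs=20, batch_size=32, validation_split=0.2)
```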
This is a serious case of underfitting, where the training accuracy is lower (and the loss higher) than on the validation set. Typically, the validation accuracy should not exceed the training accuracy, and even when it does, the gap should not be very large.

But in this case, the model is generalizing far better on the unseen test data, with a perfect score. This may be because the training data the model trained on is much harder to classify, while the test data is very simple for the model.
The small size of the dataset (few training samples) can also be a reason.

Although we are getting perfect accuracy on this test/validation set, which is our aim, this model cannot be trusted, because:
1. The training score is very low, which is not good.
2. If we apply a different set of totally unseen test images, the model may perform worse.

What can be done?
1. Increase the training set size.
2. Increase the number of convolutional layers and hidden layers.
3. Tune the model with different hyperparameters.

To view the code for the normal feed-forward CNN, Click here

NOTE: 
This method worked very well for the MNIST dataset, which has 49,000 grayscale training samples and 21,000 test samples.
Validation accuracy -> above 99%
To follow that project, Click here


B. Data/Image Augmentation method
To perform this task, we will use the ImageDataGenerator class from Keras (tensorflow.keras.preprocessing.image).

NOTE: we do not need to resize the images to a fixed size beforehand, as this class takes care of it on its own.

Dataset:
In the previous method, we saw that the training images had to be in the same folder. In this case, the images have to be in subfolders, with each subfolder containing the images of a single class.
Suppose there are 5 different classes of images; then there should be 5 subfolders.
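A minimal sketch of this setup. The directory names "dataset/training_set" and "dataset/test_set" (each with one subfolder per class) and the augmentation parameters are illustrative assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,    # scale pixel values to [0, 1]
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)  # no augmentation on test data

# flow_from_directory infers one class per subfolder and resizes
# every image to target_size on the fly.
train_generator = train_datagen.flow_from_directory(
    "dataset/training_set", target_size=(64, 64),
    batch_size=32, class_mode="categorical")
validation_generator = test_datagen.flow_from_directory(
    "dataset/test_set", target_size=(64, 64),
    batch_size=32, class_mode="categorical")
```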

Model 1:

Network Architecture of the model: 
Images:     64x64 rescaled RGB images
CNN part:   1 convolutional layer followed by 2x2 max pooling
            1 hidden layer with 128 neurons
            1 output layer of size 2 with softmax activation
Epochs:     20
Batch size: 32
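A sketch of training a model with this architecture (built as in part A) on the generators defined above; steps_per_epoch assumes 8000 training and 2000 validation images with batch size 32:

```python
history = model.fit(
    train_generator,
    steps_per_epoch=8000 // 32,
    epochs=20,
    validation_data=validation_generator,
    validation_steps=2000 // 32,
)
```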
This model seems to be doing fine, although it is not a perfect classifier. The validation accuracy nearly touched 70% with a single convolutional layer. Let's try out another model with an extra convolutional layer, increasing the complexity.

Model 2:

Network Architecture of the model: 
Images:     64x64 rescaled RGB images
CNN part:   2 convolutional layers, each followed by 2x2 max pooling
            1 hidden layer with 128 neurons
            1 output layer of size 2 with softmax activation
Epochs:     20
Batch size: 32
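A sketch of Model 2, which simply adds a second convolution + pooling block; the filter counts are again assumed:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model2 = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(32, (3, 3), activation="relu"),  # extra convolution block
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(2, activation="softmax"),
])
model2.compile(optimizer="adam", loss="categorical_crossentropy",
               metrics=["accuracy"])

# history2 = model2.fit(train_generator, steps_per_epoch=8000 // 32, epochs=20,
#                       validation_data=validation_generator,
#                       validation_steps=2000 // 32)
```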
Yes, increasing the complexity of the model increased the performance, with validation accuracy touching 80%. Although there is evidence of overfitting towards the end, this can be addressed with regularization or further tuning.

What can be done?
1. Increase the number of convolutional layers and hidden layers.
2. Increase the number of epochs.
3. Tune the model with different hyperparameters.

To view the code for the data augmentation technique, Click here


Conclusion:
We compared two methods for image classification: a normal feed-forward CNN and the data (image) augmentation technique. With a small dataset such as this, it is very hard to train a model that performs well, as can be seen from the normal feed-forward CNN, which performed poorly. To deal with this problem, we used the data augmentation technique, which gave us a well-performing model that can be trusted.
