Deep Learning - Natural Images Predictive Modelling

NATURAL IMAGES CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORKS

Introduction

Objective: The objective of this exercise is to train a convolutional neural network (CNN) to classify any given image from the Natural Images dataset into one of eight categories with high accuracy. The exercise explores how to improve a CNN's efficiency by ensuring that the data fed into it is well prepared and encoded, and how to improve its performance through hyperparameter tuning.

Executive Summary: This report details the approach taken to achieve a validation accuracy of 95.58% with a convolutional neural network on the Natural Images dataset. It describes the dataset, explains how the data was encoded, and breaks down the model's features, i.e., the layers and options used. It also gives an overview of the hyperparameter tuning results and a summary of the results achieved by the model, with visual representations such as training, test and validation plots and confusion matrix plots for some of the hyperparameter settings tried.

About the Dataset

The Natural Images dataset contains 6,899 images in 8 categories, namely airplane, car, cat, dog, flower, fruit, motorbike and person, compiled from various sources. This dataset was selected because it covers a broad spectrum of image categories, both animate and inanimate. It was created as a benchmark dataset for the work on Effects of Degradations on Deep Neural Network Architectures.

Airplane: This category includes coloured, landscape-oriented images of commercial jets, fighter jets, other commercial airplanes, etc., in take-off, in-flight and parked states. There are 727 images in this category, with varying dimensions. Images for this category were obtained from http://host.robots.ox.ac.uk/pascal/VOC.

Car: This category includes coloured images of automobiles of different brands and models. There are 968 images in this category, all with the same dimensions of 100 x 100. The images were obtained from https://ai.stanford.edu/~jkrause/cars/car_dataset.html.

Cat: This category includes coloured images of different cat breeds, some showing more than one cat in the frame. Most images show the whole cat, while some appear to have been cropped. There is a lot of background noise in the images. There are 885 images in this category, with varying dimensions. The images were obtained from https://www.kaggle.com/c/dogs-vs-cats.

Dog: This category includes coloured images of dogs across a diverse range of breeds, heights and colours. The images were taken from different angles, with some showing the entire dog and others showing only its face or upper body. A few images show more than one dog. There is also a lot of background noise in images of this category. There are 702 images in this category, with varying dimensions. The images were also obtained from https://www.kaggle.com/c/dogs-vs-cats.

Flower: This category includes coloured images of different flower species such as daffodils, lilies and roses. A handful of images show other objects, such as animals or insects interacting with the flowers, but most images contain minimal background or other noise. There are 843 images in this category, with varying dimensions. Images in this category were obtained from http://www.image-net.org/.

Fruit: This category includes images of different types of fruit, such as apples, oranges and strawberries. None of the images in this category contain background noise, as it appears the backgrounds were already removed by the photographer. Most images show round fruits, although the category retains a good balance of fruit types. All images in this category have the same dimensions (100 x 100). There are 1,000 images in this category. Images in this category were obtained from https://www.kaggle.com/moltean/fruits.

Motorbike: This category includes coloured, landscape-oriented images of motorbikes, most of which look very similar apart from their colour. A few images show parts of people posing with the motorbikes, while others contain some background noise. There are 788 images in this category, with varying dimensions. The images for this category were obtained from http://host.robots.ox.ac.uk/pascal/VOC.

Person: This category includes coloured images of people. There are more images of men than women, and more images of white or fair-skinned people than of any other race or skin colour. There is also some repetition, as four or five images may show the same person. Some images have other objects in focus with the person in the background, which can introduce noise for the algorithm. Most images show the heads and necks of the people in them, while some show other body parts. There are 986 images in this category, all with the same dimensions (256 x 256). Images in this category were obtained from http://www.briancbecker.com/blog/research/pubfig83-lfw-dataset.

(Natural Images, n.d.)

Distribution of categories in the Natural Images Dataset
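The per-category counts can be cross-checked against the stated total of 6,899 images. The minimal Python sketch below takes the seven non-dog counts from the category descriptions and derives the dog count as the remainder of the total (which comes to 702), then computes each category's share. The variable and function names are illustrative only.

```python
# Image counts per category as stated in the category descriptions.
# The dog count is deliberately left out and derived from the total below,
# so the stated figures can be cross-checked.
TOTAL_IMAGES = 6899
STATED_COUNTS = {
    "airplane": 727,
    "car": 968,
    "cat": 885,
    "flower": 843,
    "fruit": 1000,
    "motorbike": 788,
    "person": 986,
}


def category_distribution(stated, total):
    """Return {category: (count, share)} with the dog count filled in."""
    counts = dict(stated)
    counts["dog"] = total - sum(stated.values())  # remainder of the total
    return {name: (n, n / total) for name, n in sorted(counts.items())}


if __name__ == "__main__":
    for name, (n, share) in category_distribution(STATED_COUNTS, TOTAL_IMAGES).items():
        print(f"{name:<10}{n:>6}{share:>8.1%}")
```

On these figures the shares range from about 10.2% (dog) to 14.5% (fruit), so the classes are only mildly imbalanced.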

Data Encoding

I used both MATLAB Online and the MATLAB desktop application for this exercise. MATLAB Online offers access to a wide range of toolboxes and is efficient for visualization, data augmentation and deep learning. However, I encountered some issues with the parallel pool there, so I continued my work in the desktop application, where I used two toolboxes: the Parallel Computing Toolbox and the Deep Learning Toolbox.

To import the data, I downloaded the Natural Images dataset from Kaggle, unzipped it in File Explorer and uploaded its contents to MATLAB Drive. In a new MATLAB Online notebook, I loaded the data into a datastore using the imageDatastore function and then split it into training and validation sets using an 80/20 split. I performed the train/validation split before applying the augmentedImageDatastore function because the splitEachLabel function does not work on augmented datastores.
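splitEachLabel splits each label separately, so the 80/20 ratio holds within every category rather than only across the dataset as a whole. The idea can be sketched in Python as below. This is not MATLAB's implementation: the function name `split_each_label` is hypothetical, and flooring the per-label training count is an assumption (MATLAB's exact rounding rule may differ).

```python
import random
from collections import defaultdict


def split_each_label(files, labels, train_fraction=0.8, seed=0):
    """Hypothetical stratified split: shuffle each label's files and send the
    first floor(train_fraction * n) of them to the training set."""
    by_label = defaultdict(list)
    for f, lab in zip(files, labels):
        by_label[lab].append(f)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val = [], []
    for lab in sorted(by_label):
        group = by_label[lab][:]
        rng.shuffle(group)
        n_train = int(train_fraction * len(group))  # floor for positive values
        train += [(f, lab) for f in group[:n_train]]
        val += [(f, lab) for f in group[n_train:]]
    return train, val


if __name__ == "__main__":
    # Hypothetical file names: 10 cats and 5 dogs.
    files = [f"cat_{i}.jpg" for i in range(10)] + [f"dog_{i}.jpg" for i in range(5)]
    labels = ["cat"] * 10 + ["dog"] * 5
    train, val = split_each_label(files, labels)
    print(len(train), len(val))  # 12 3
```

Splitting per label keeps the validation set's category mix close to the training set's, which matters when categories differ in size.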

Due to the varying dimensions noticed during my assessment of the dataset, I resized all images to 250 x 250 pixels but kept all three colour channels, as every image in the dataset is in colour. I also set the DispatchInBackground parameter to true, intending to reduce the effect of the background noise noticed in some categories. This option does not actually remove image backgrounds: it reads and preprocesses mini-batches on parallel pool workers in the background during training, which requires the Parallel Computing Toolbox. Although this setting increased my model's runtime, the model performed significantly better with it.
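augmentedImageDatastore resizes each image to its output size as the image is read. The resizing step itself can be illustrated with a minimal nearest-neighbour resize in plain Python. Nearest-neighbour interpolation is an assumption made for simplicity and is not necessarily the method MATLAB uses, and `resize_nearest` is a hypothetical name. Only height and width change; the three colour channels are copied unchanged, matching the [250 250 3] target.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C image stored as nested lists.

    Output pixel (r, c) copies input pixel (floor(r * H / out_h),
    floor(c * W / out_w)); channel values such as RGB are copied unchanged,
    so a colour image stays a colour image.
    """
    in_h, in_w = len(image), len(image[0])
    out = []
    for r in range(out_h):
        src_r = (r * in_h) // out_h  # scale the output row back to an input row
        row = []
        for c in range(out_w):
            src_c = (c * in_w) // out_w  # same scaling for columns
            row.append(list(image[src_r][src_c]))  # copy every colour channel
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical 100 x 100 RGB image filled with a single colour.
    img = [[[200, 30, 30] for _ in range(100)] for _ in range(100)]
    resized = resize_nearest(img, 250, 250)
    print(len(resized), len(resized[0]), len(resized[0][0]))  # 250 250 3
```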

As a result, the final augmentedImageDatastore options were an output size of [250 250 3] and DispatchInBackground set to true. These options were applied to both the training and validation datastores to keep the images fed into the model consistent across both stages.
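With DispatchInBackground set to true, MATLAB reads and preprocesses upcoming mini-batches on parallel pool workers while the network trains on the current one. The producer/consumer idea behind this can be sketched in Python with a background thread and a bounded queue. This is only an analogy, not MATLAB's implementation, and `prepare_batch` is a hypothetical stand-in for reading and resizing images.

```python
import queue
import threading


def prepare_batch(batch_index):
    """Hypothetical stand-in for reading, resizing and augmenting one mini-batch."""
    return [batch_index * 10 + i for i in range(4)]


def background_batches(num_batches, prefetch=2):
    """Yield mini-batches prepared by a background thread.

    The worker fills a bounded queue ahead of the consumer, so batch k+1 can
    be prepared while batch k is being used for training.
    """
    q = queue.Queue(maxsize=prefetch)
    sentinel = object()

    def worker():
        for b in range(num_batches):
            q.put(prepare_batch(b))  # blocks while the queue is full
        q.put(sentinel)  # signal that no batches remain

    threading.Thread(target=worker, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            return
        yield batch


if __name__ == "__main__":
    for batch in background_batches(3):
        print(batch)  # stand-in for one training step
```

A thread is enough to overlap I/O-bound loading in Python; CPU-heavy preprocessing would more typically use separate processes, which is closer to how MATLAB uses parallel pool workers. Starting those workers also adds overhead, so background dispatch does not always shorten runtime.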