Image Classification with CNN (Python)
Have you ever wondered how certain technologies can recognize your facial features or how autonomous vehicles detect objects on the road?
In a world where AI and machine learning have become essential for streamlining both professional and personal projects, having some knowledge of image classification can be incredibly useful and surprisingly easy to apply.
In this article, we will use Convolutional Neural Networks (CNNs) to classify cats and dogs.
Requirements
For this, we are going to use Jupyter Notebook (https://jupyter.org/install) with the following libraries:
import os
from PIL import Image
import cv2
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
Dataset
We are going to use a training dataset divided into 2 classes, comprising approximately 2000 dog images and 2000 cat images.
The validation dataset, containing around 400 images of dogs and cats, will be split off automatically from the original dataset; it gives us a realistic accuracy value for our model. The original dataset contains over 24,000 images, some of which are corrupted. However, we do not require that many images, and there are a few steps we need to take to create the training dataset:
path = 'C:/<PATH>/PetImages/Dog/'  # repeat every step below for the Cat folder too
if os.path.exists(path + 'thumbs.db'):
    os.remove(path + 'thumbs.db')  # Windows thumbnail cache, not an image

# keep only the first 2000 images in the folder
for name in os.listdir(path):
    stem = name.split('.')[0]
    if stem.isdigit() and int(stem) > 2000:
        os.remove(os.path.join(path, name))

# remove corrupt files (the directory is listed again, after the deletions above)
for filename in os.listdir(path):
    try:
        img = Image.open(path + filename)  # open the image file
        img.verify()                       # verify that it is, in fact, an image
    except (IOError, SyntaxError):
        print('Bad file:', filename)       # report and delete corrupt files
        os.remove(path + filename)
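The trimming step above can be sketched in isolation. The following is a minimal, self-contained version (the helper name trim_numbered_images is my own, not from the original code), demonstrated on a temporary folder instead of PetImages:

```python
import os
import tempfile

def trim_numbered_images(folder, keep=2000):
    """Remove files whose numeric prefix (e.g. '2500.jpg') exceeds `keep`."""
    removed = []
    for name in os.listdir(folder):
        stem = name.split('.')[0]
        if stem.isdigit() and int(stem) > keep:
            os.remove(os.path.join(folder, name))
            removed.append(name)
    return removed

# Demo: a temporary folder with files 1.jpg .. 5.jpg, keeping the first 3
demo = tempfile.mkdtemp()
for i in range(1, 6):
    open(os.path.join(demo, f"{i}.jpg"), "w").close()
trim_numbered_images(demo, keep=3)
print(sorted(os.listdir(demo)))  # ['1.jpg', '2.jpg', '3.jpg']
```

The `isdigit()` guard also protects against files like thumbs.db that have no numeric prefix.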
GPU optimization
If you have a GPU and want to unlock the full potential of your machine when training the neural network, follow the GPU setup tutorial in the official TensorFlow documentation.
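To confirm that TensorFlow actually sees your GPU after setup, a quick check (an empty list means training will run on the CPU):

```python
import tensorflow as tf

# Lists the GPUs TensorFlow can use; empty list -> CPU-only training
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
```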
Data pre-processing
Using the TensorFlow library, we will randomly organize the images into tensors based on the batch size, resize each image to 256x256 pixels, and assign them their respective classes. Subsequently, the validation dataset will be created by applying a 0.1 split.
Assign the directory path to the "PetImages" directory, which contains two folders: "Dog" and "Cat". The labels (or classes) will be 'inferred', meaning they are determined automatically from the subfolders containing the images (Dog and Cat).
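For reference, this is the directory layout that labels='inferred' expects — one subfolder per class:

```
PetImages/
├── Cat/
│   ├── 0.jpg
│   └── ...
└── Dog/
    ├── 0.jpg
    └── ...
```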
img_height = 256
img_width = 256
batch_size = 32
dir = 'C:/.../PetImages'

train_ds = tf.keras.utils.image_dataset_from_directory(
    dir,
    validation_split=0.1,
    labels='inferred',
    label_mode='int',
    color_mode='rgb',
    subset="training",
    interpolation='bilinear',
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

val_ds = tf.keras.utils.image_dataset_from_directory(
    dir,
    validation_split=0.1,
    labels='inferred',
    label_mode='int',
    color_mode='rgb',
    subset="validation",
    interpolation='bilinear',
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
# Scale pixel values from [0, 255] to [0, 1] (the layer must actually be applied)
rescale = tf.keras.layers.Rescaling(scale=1./255, offset=0.0)
train_ds = train_ds.map(lambda x, y: (rescale(x), y))
val_ds = val_ds.map(lambda x, y: (rescale(x), y))
We can visualize a few of our images in the notebook:
class_names = ['Cat', 'Dog']
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow((images[i].numpy() * 255).astype("uint8"))  # undo the rescaling for display
        plt.title(class_names[labels[i]])
        plt.axis("off")
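The Rescaling step above is just an element-wise division; the same transform in plain NumPy, for three sample pixel values:

```python
import numpy as np

pixels = np.array([0, 128, 255], dtype=np.float32)  # raw 8-bit pixel values
scaled = pixels / 255.0  # what Rescaling(scale=1./255, offset=0.0) computes
print(scaled)  # all values now lie in [0, 1]
```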
CNN Model
This is the part where we define the layers of our CNN model with the help of the Keras API. We are going to use the following network, but feel free to try your own version or a well-known architecture such as LeNet or AlexNet. This is not the most optimized and efficient network, so experimenting with different layers may yield better results.
model = keras.Sequential([
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=(256, 256, 3), padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.BatchNormalization(),
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu", padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.BatchNormalization(),
    layers.Conv2D(64, kernel_size=(3, 3), activation="relu", padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.BatchNormalization(),
    layers.Conv2D(128, kernel_size=(3, 3), activation="relu", padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.BatchNormalization(),
    layers.Conv2D(128, kernel_size=(3, 3), activation="relu", padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(512),
    layers.Dropout(0.2),
    layers.Dense(2, activation="softmax"),  # softmax yields one probability per class
])
model.summary()
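You can sanity-check the shapes reported by model.summary() by hand: every Conv2D uses 'same' padding (spatial size preserved), so only the five MaxPooling2D layers shrink the 256x256 input, halving it each time:

```python
# Five pooling layers halve 256 five times: 256 -> 128 -> 64 -> 32 -> 16 -> 8
size = 256
for _ in range(5):
    size //= 2
flatten_units = size * size * 128  # the last conv block has 128 channels
print(size, flatten_units)  # 8 8192
```

So the Flatten layer hands 8192 values to the Dense(512) layer.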
Training and Results
epochs = 25
model.compile(loss='sparse_categorical_crossentropy', optimizer='SGD', metrics=["accuracy"])
hist = model.fit(train_ds, validation_data=val_ds, epochs=epochs)
For training, we will complete 25 'epochs', i.e. full passes through the training data. Because the labels are integers (label_mode='int') and the model outputs one score per class, we use the 'sparse categorical crossentropy' loss, along with a 'Stochastic Gradient Descent' (SGD) optimizer at its default learning rate of 0.01. These hyperparameters may not be optimal, so feel free to explore other options.
After running the code and waiting a few minutes, you should achieve results similar to these: a final training accuracy of 99.95% and a validation accuracy of 83.92%.
We will focus on the validation accuracy, which is the most realistic way to evaluate our model. We can think of it as follows: out of 100 unseen images, the algorithm correctly guesses the class of roughly 84. This is an excellent score!
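Validation accuracy is simply the fraction of correct predictions. Computed by hand with NumPy, on a made-up set of eight labels for illustration (0 = Cat, 1 = Dog):

```python
import numpy as np

# Hypothetical true and predicted classes for 8 validation images
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1])
accuracy = np.mean(y_true == y_pred)  # 6 of 8 match
print(f"{accuracy:.2%}")  # 75.00%
```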
Prediction for a single image
Now, you can provide an image of your own cat or dog to the model, and it will predict the class instantly. Simply assign the image path to the variable 'image_path' and execute the code:
image_path = dir + "/Cat/2.jpg"
image_color = cv2.resize(cv2.imread(image_path, cv2.IMREAD_COLOR), (256, 256), interpolation=cv2.INTER_CUBIC)
image_color = cv2.cvtColor(image_color, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; the model was trained on RGB
image = tf.expand_dims(image_color.astype("float32") / 255, 0)  # scale and add a batch dimension
pred = np.argmax(model.predict(image))
class_names = ['Cat', 'Dog']
plt.figure(figsize=(5, 2.5))
plt.imshow(image_color)
print("The model predicts: " + class_names[pred] + ".")
The full code is available on my GitHub. Thank you for your time!