Transfer Learning: Garbage Classification Using MobileNetV2

Building deep learning models can be very computationally expensive and time-consuming, especially when training from scratch. Transfer learning allows us to leverage pre-trained models on large datasets and fine-tune them for our specific tasks. In this project, I demonstrate how transfer learning can be used to classify garbage into different categories, using the MobileNetV2 model.


Before we start with the project... what is Transfer Learning?

Transfer learning is a machine learning technique where knowledge acquired from one task is reused to improve performance on another task. By using a model like MobileNetV2, trained on a large, general dataset, we can fine-tune it for a similar, specific task. This approach saves time, computational resources, and data while still achieving high accuracy.



If we use a classification model trained on images of dogs and cats, we can transfer its knowledge to a new model that's trained on images of rabbits and deer. The pre-trained model has learned to recognize basic features (like edges, textures, and shapes) that are common across various objects. This is where the power of transfer learning comes in, especially when the new dataset (rabbits and deer) is smaller or lacks enough data to train a deep model from scratch.

Freezing Layers

The early layers of a neural network learn general features such as edges and textures that are useful across many different tasks. These layers are typically frozen during transfer learning. Freezing means we prevent the weights in these layers from being updated during training. The idea behind freezing these layers is that they have already learned generic, fundamental features (like edges, corners, and simple patterns) that apply to almost any image!
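To make "freezing" concrete, here is a tiny NumPy illustration (a toy linear model, not MobileNetV2): the frozen layer receives no gradient updates during training, so its weights are identical before and after.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer linear model: pred = W2 @ (W1 @ x).
# W1 plays the role of a frozen "pretrained" layer; W2 is the trainable head.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(1, 4))
W1_before, W2_before = W1.copy(), W2.copy()

x = rng.normal(size=(3, 8))   # 8 samples, 3 features
y = rng.normal(size=(1, 8))

lr = 0.01
for _ in range(100):
    h = W1 @ x                                    # frozen features
    pred = W2 @ h
    W2 -= lr * ((pred - y) @ h.T) / x.shape[1]    # only the head is updated
    # W1 is never touched: that is exactly what "frozen" means

print(np.array_equal(W1, W1_before))   # True: frozen weights are unchanged
```

In Keras, the same effect is achieved declaratively (e.g. `trainable=False`), but the mechanics are the same: frozen parameters are simply excluded from the optimizer's updates.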

Fine-Tuning Layers

The deeper layers, which capture more abstract and complex features, are more specific to the task at hand (for example, recognizing the difference between a dog and a cat). These layers are where the model learns what makes each class distinct, such as the shape of the ears or the color patterns. Fine-tuning involves unfreezing some of these deeper layers and allowing them to adjust during training. The deeper layers will adapt to the new task (recognizing rabbits and deer), making the model more specialized to the new dataset.

The Final Layer

The final layer of the pre-trained model is specific to the original task (predicting dog or cat labels). When transferring the model to a new task, these final layers are replaced with a new set of layers that output predictions for the new classes (rabbits and deer). This allows the model to make predictions in the context of the new dataset.
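A shape-level sketch of swapping the head, with NumPy standing in for the network (the weights here are random placeholders, not a trained model): the pretrained body still produces its 1280-dimensional MobileNetV2 feature vector, and only the new 5-way softmax head is attached on top.

```python
import numpy as np

rng = np.random.default_rng(42)

# Output of the frozen feature extractor: MobileNetV2's 1280-d feature vector.
features = rng.normal(size=(1, 1280))

# The original 2-class head (dog/cat) is discarded; a fresh 5-class head is
# randomly initialised and would then be trained on the new dataset.
W_new = rng.normal(size=(1280, 5)) * 0.01
b_new = np.zeros(5)

logits = features @ W_new + b_new
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the 5 new classes

print(probs.shape)              # (1, 5)
print(round(probs.sum(), 6))    # 1.0
```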


Okay enough yapping, let's do some coding!

import numpy as np
import cv2

import PIL.Image as Image
import os

import gradio as gr

import tensorflow as tf
import tensorflow_hub as hub
import tf_keras

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import EarlyStopping        

First, we import the necessary libraries.


📂 The Dataset

I worked with a garbage classification dataset from Kaggle. It contains 2527 images categorized as glass, metal, paper, plastic, and trash, which were used for training and testing.

!pip install kaggle

!mkdir ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

!kaggle datasets download -d asdasdasasdas/garbage-classification
!unzip garbage-classification.zip        

This code installs the Kaggle API, sets up the credentials, downloads the Garbage Classification dataset from Kaggle, and then unzips the dataset in the current directory for use in the project.


feature_extractor_model = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
pretrained_model_without_top_layer = hub.KerasLayer(
    feature_extractor_model, input_shape=(224, 224, 3), trainable=False)        

This code snippet loads a pre-trained MobileNetV2 model as a feature extractor from TensorFlow Hub. It uses the KerasLayer to integrate the model into a Keras-based neural network. The model is set to non-trainable (trainable=False), meaning it will not be fine-tuned during training, and its weights will remain fixed. The input shape of the model is set to (224, 224, 3), which is the typical input size for MobileNetV2.


import pathlib
data_dir = pathlib.Path('path of the dataset')
data_dir        

I set up a path to the dataset directory using the pathlib library.


garbage_images_dict = {
    'glass': list(data_dir.glob('glass/*')),
    'metal': list(data_dir.glob('metal/*')),
    'paper': list(data_dir.glob('paper/*')),
    'plastic': list(data_dir.glob('plastic/*')),
    'trash': list(data_dir.glob('trash/*')),
}

garbage_labels_dict = {
    'glass': 0,
    'metal': 1,
    'paper': 2,
    'plastic': 3,
    'trash': 4,
}        

I then created two dictionaries to organize the garbage classification dataset:

  • garbage_images_dict: Maps each type of garbage to a list of image file paths from the dataset directory. It uses the glob() method to gather all image files in the respective subdirectories for each category.
  • garbage_labels_dict: Maps each garbage type to a numerical label. These labels will be used as target values for training the model as it requires numerical values.


X, y = [], []

for garbage_name, images in garbage_images_dict.items():
    for image in images:
        img = cv2.imread(str(image))
        resized_img = cv2.resize(img,(224,224))
        X.append(resized_img)
        y.append(garbage_labels_dict[garbage_name])        

This code snippet prepares the input features (X) and target labels (y) for training. It initializes empty lists for X and y to store the images and their corresponding labels. Then, it loops through each garbage category in garbage_images_dict, reads each image using cv2.imread(), resizes the image to (224, 224) using cv2.resize(), and appends the resized image to X. The corresponding label, fetched from garbage_labels_dict, is appended to y. This forms the dataset for model training.


X = np.array(X)
y = np.array(y)        

Convert the lists X and y into NumPy arrays for efficient computation during model training.


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

X_train_scaled = X_train / 255
X_test_scaled = X_test / 255        

Split the dataset into training and testing sets using train_test_split from sklearn.model_selection. The dataset is divided into X_train, X_test for the images and y_train, y_test for the corresponding labels, with a fixed random_state for reproducibility. After splitting, the pixel values of the images are scaled to the range [0, 1] by dividing X_train and X_test by 255. This scaling step is important to normalize the data before feeding it into a neural network.
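Under the hood, train_test_split just shuffles and slices. Here is a NumPy-only sketch of the same idea (using sklearn's default test_size of 0.25, with small dummy "images" instead of the real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(100, 8, 8, 3)).astype(np.float64)  # dummy images
y = rng.integers(0, 5, size=100)                                  # dummy labels

# Shuffle indices with a fixed seed, then cut off the last 25% as the test set.
idx = rng.permutation(len(X))
cut = int(len(X) * 0.75)
X_train, X_test = X[idx[:cut]], X[idx[cut:]]
y_train, y_test = y[idx[:cut]], y[idx[cut:]]

# Scale pixel values from [0, 255] down to [0, 1], as in the article.
X_train_scaled = X_train / 255
X_test_scaled = X_test / 255

print(len(X_train), len(X_test))       # 75 25
print(X_train_scaled.max() <= 1.0)     # True
```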


model = tf_keras.Sequential([
  pretrained_model_without_top_layer,
  tf_keras.layers.Dense(5, activation='softmax')
])

model.summary()        

This code defines a simple neural network model that consists of two layers: the pre-trained MobileNetV2 model as a feature extractor (with the top layer removed) and a fully connected dense layer with 5 units, for the 5 garbage categories, using the softmax activation for multi-class classification. The model.summary() function outputs a summary of the model, including the number of parameters in each layer. The pre-trained model has 2,257,984 non-trainable parameters, and the dense layer has 6,405 trainable parameters.
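As a quick sanity check on the summary, the head's trainable parameter count follows directly from the size of MobileNetV2's feature vector (1280 units) and the 5 output classes:

```python
# Dense layer parameters = weights + biases:
# one weight per (input unit, output unit) pair, plus one bias per output unit.
feature_dim = 1280   # MobileNetV2 feature-vector size
num_classes = 5

dense_params = feature_dim * num_classes + num_classes
print(dense_params)  # 6405, matching the trainable parameters in model.summary()
```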


model.compile(
  optimizer="adam",
  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
  metrics=['accuracy'])

model.fit(
    X_train_scaled, y_train,
    epochs=15,
    validation_split=0.1
)        

I then compiled and trained the model using the Adam optimizer and Sparse Categorical Crossentropy as the loss function. The training was performed for 15 epochs, with 10% of the training data set aside for validation to monitor the model's performance.
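To make the loss function concrete: "sparse" means the labels are plain integers (0–4) rather than one-hot vectors, and the per-sample loss is the negative log of the predicted probability of the true class. A hand-computed NumPy example (the probability vectors are made up for illustration):

```python
import numpy as np

probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],   # confident and correct
    [0.10, 0.20, 0.10, 0.50, 0.10],   # less confident
])
y_true = np.array([0, 3])             # integer labels, as in y_train

# Pick each sample's probability for its true class, then average -log(p).
loss = -np.log(probs[np.arange(len(y_true)), y_true]).mean()
print(round(loss, 4))   # 0.5249
```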



The model achieved a loss of 0.51 and an accuracy of 82.67%, a solid result for this use case. To further highlight the advantages of transfer learning, let's build a CNN from scratch, train it on the same Garbage Classification dataset, and compare its performance with this model.

model2 = Sequential()
model2.add(layers.Conv2D(32, (3,3), activation = 'relu', input_shape=(224,224,3)))
model2.add(layers.MaxPooling2D(2,2))
model2.add(layers.Conv2D(32, (3,3), activation = 'relu'))
model2.add(layers.MaxPooling2D(2,2))
model2.add(layers.Conv2D(32, (3,3), activation = 'relu'))
model2.add(layers.Flatten())
model2.add(layers.Dense(32, activation = 'relu'))
model2.add(layers.Dense(5, activation = 'softmax'))        

The model consists of three convolutional layers with ReLU activation, two max-pooling layers for downsampling, a flatten layer to convert feature maps into a vector, and two dense layers: one for feature learning and another with softmax activation for classifying into five categories.
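Assuming Keras defaults (stride 1 and 'valid' padding for Conv2D, 2x2 max-pooling), we can trace the feature-map sizes by hand to see where this model's parameters end up, almost all of them in the first dense layer:

```python
# Feature-map side length after each layer of the scratch CNN:
size = 224
size = size - 2      # Conv2D 3x3, valid padding -> 222
size = size // 2     # MaxPooling2D 2x2         -> 111
size = size - 2      # Conv2D 3x3               -> 109
size = size // 2     # MaxPooling2D 2x2         -> 54
size = size - 2      # Conv2D 3x3               -> 52

flat = size * size * 32          # Flatten: 52 * 52 * 32 = 86528 values
dense_params = flat * 32 + 32    # first Dense layer: 2,768,928 parameters
print(size, flat, dense_params)  # 52 86528 2768928
```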



I compiled it the same way as the transfer learning model, trained it for 15 epochs, and got an accuracy of just 51.7% with a loss of 2.98. That is far below the 82.67% achieved with transfer learning and shows how poorly a model trained from scratch performs on a small dataset. The high loss and low accuracy indicate that the model is underfitting: without pre-trained features and with insufficient data, it struggles to learn meaningful patterns.

And even if the model did not underfit and was able to achieve a similar accuracy, it would have taken significantly more time and a lot more epochs to reach that point, making it inefficient compared to the transfer learning approach.


Model User Interface

# Garbage labels
garbage_labels = ['glass', 'metal', 'paper', 'plastic', 'trash']

def classify_image(image_path):
    img = cv2.imread(image_path)
    if img is None:
        return "Error: Unable to read the image."

    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_resized = cv2.resize(img, (224, 224)) / 255.0 
    img_array = np.expand_dims(img_resized, axis=0)  

    # Predict the class
    predictions = model.predict(img_array)
    predicted_label = garbage_labels[np.argmax(predictions)]

    return predicted_label
        

The function classify_image takes an image path as input, reads the image, and performs preprocessing by converting the image from BGR to RGB format, resizing it to 224x224 pixels, and scaling the pixel values to the range [0, 1]. It then expands the image array to match the model's input shape and uses the model to predict the class. The predicted class label is determined by selecting the label corresponding to the highest prediction probability and returned as the result.
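The same preprocessing can be sketched without file I/O or OpenCV, using a dummy 224x224 image and a stand-in probability vector in place of model.predict (both are made up for illustration):

```python
import numpy as np

garbage_labels = ['glass', 'metal', 'paper', 'plastic', 'trash']

# Dummy image generated at 224x224 directly, so no resize step is needed here.
img = np.random.default_rng(1).integers(0, 256, size=(224, 224, 3))
img_scaled = img / 255.0                      # pixel values in [0, 1]
img_array = np.expand_dims(img_scaled, 0)     # add batch dim -> (1, 224, 224, 3)

# Stand-in for model.predict(img_array): any (1, 5) probability row works.
predictions = np.array([[0.10, 0.05, 0.60, 0.20, 0.05]])
predicted_label = garbage_labels[np.argmax(predictions)]

print(img_array.shape, predicted_label)   # (1, 224, 224, 3) paper
```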

import gradio as gr

def gradio_classify(image):
    img = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
    temp_path = 'temp.jpg'
    cv2.imwrite(temp_path, img)  
    return classify_image(temp_path)


ui = gr.Interface(
    fn=gradio_classify,
    inputs=gr.Image(type="pil"),
    outputs="label",
    title="Garbage Classification")

ui.launch(share=True)        

I created a simple interface using Gradio, allowing users to interact with the model through a web page for classifying garbage images. The gradio_classify function converts the input image from PIL to OpenCV format, saves it temporarily, and then uses the classify_image function to predict the garbage category. The Gradio interface takes an image as input and outputs the predicted label.



Conclusion

This project highlights the benefits of transfer learning: by leveraging pre-trained models like MobileNetV2, we can achieve strong results with less data and reduced computational cost. While building a custom CNN from scratch is a valuable learning experience, transfer learning is often the more efficient approach for real-world applications, especially with limited resources and small datasets.

