Understanding Transfer Learning as a Methodology for Efficient Training of CNN Models
Image classification. Image credit: Gluon-cv.mxnet.io


Abstract

Training a ConvNet model on a large data set can take hours, and its performance depends heavily on how the training is carried out. To address this scenario, a technique called Transfer Learning began to be implemented in ConvNet architectures, reusing pre-trained models in new models to improve both training time and accuracy. Transfer learning is a paradigm applicable to any neural network architecture, but in the computer vision field in particular it has produced interesting results. This paper explores the performance of a transfer learning implementation, training and evaluating a DenseNet-121 architecture on the CIFAR-10 data set. The transfer learning technique implemented is fine-tuning: the entire DenseNet-121 backbone is frozen, and the architecture is modified with batch normalization, the Adam optimization algorithm, dropout, and learning rate adjustments.

Keywords: Transfer Learning, DenseNet-121, Batch Normalization, Fine-Tuning, Accuracy, Loss


Introduction

Convolutional neural networks (ConvNets) are a category of neural networks within machine learning and artificial intelligence, and they are highly effective architectures for image recognition and classification. A neural network in computer science is an analogy of how the human brain functions, and the ConvNet architecture in particular resembles the visual cortex's recognition and classification of what the human eye sees. To replicate these architectures in the machine learning field, large sets of images are used to obtain a good response from the algorithm. Training on large data sets can take hours, and performance may be affected by how the training is applied.

These machine learning architectures usually require abundant data in a stationary environment. In real-world environments, however, users interact with the preferences (weights, parameters, biases), making the architecture a dynamic model (Widmer et al. 1996). In response to this scenario, a technique called Transfer Learning began to be implemented, and it is now frequently used in ConvNet architectures to improve training time and accuracy by reusing pre-trained models in new models (Gulli et al. 2017).

Transfer learning is a paradigm in which the knowledge acquired by a model pre-trained on one task is reused so that its solution carries over to a related task in a new model. Transfer learning was initially discussed and classified into three different settings: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning (Pan et al. 2009). Over the years, however, the technique has become more intuitive and more effective.

In this article we explore the behavior of the transfer learning technique in a DenseNet-121 architecture trained on the CIFAR-10 data set. This architecture maintains a simple connectivity pattern seeking maximum information flow between layers in the network by connecting all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature maps to all subsequent layers (Huang et al. 2016).


Figure 1. DenseNet representation. Image credit: Huang et al. 2016
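The dense connectivity pattern shown in Figure 1 can be sketched with a few Keras layers. This is a minimal illustration, not the actual DenseNet-121 code: the layer count, growth rate, and input shape below are arbitrary choices made only to show how each layer receives the concatenation of all preceding feature maps.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=3, growth_rate=12):
    """Each layer sees the concatenation of all preceding feature maps."""
    for _ in range(num_layers):
        y = layers.Conv2D(growth_rate, 3, padding="same", activation="relu")(x)
        x = layers.Concatenate()([x, y])  # pass every feature map forward
    return x

inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = dense_block(inputs)
model = tf.keras.Model(inputs, outputs)
# Channels grow by growth_rate per layer: 16 + 3 * 12 = 52
```

Because every layer's output is carried forward by concatenation, gradients have short paths back to early layers, which is part of why DenseNets train well with relatively few parameters.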

Currently there are different methods to implement transfer learning in a neural network, such as data augmentation, fine-tuning, freezing, multitask learning, one-shot learning, domain adaptation, domain confusion, and zero-shot learning. There is no single ideal technique; each one changes the behavior of the resulting model. This article covers a proposed implementation using fine-tuning and freezing techniques on a pre-trained DenseNet-121 architecture, aiming to understand the time response and the accuracy and loss on a validation set of 10,000 samples from the CIFAR-10 data set.

Techniques

Transfer Learning — Fine-tuning

Transfer learning addresses the need for lifelong machine-learning methods that retain and reuse previously learned knowledge (Pan et al. 2009). Figure 2 presents a comparison between the traditional learning process and one applying transfer learning.


Figure 2. Transfer Learning representation. left(Traditional Machine Learning), right(Transfer Learning). Image credit: (Pan, et al. 2009)

Building on this idea, fine-tuning complements the transfer learning method: some or all of the layers of the pre-trained model are frozen so that training can be initialized in the top layers with new, randomly initialized classifiers, while the frozen layers retain their parameters and continue to support feature extraction.

The transfer learning technique, implemented with fine-tuning, adjusts the more abstract representations of the model being reused, so there is reusability and efficiency in predictions made with the newly trained parameters.

Experiments

The experiment consisted of putting the ConvNet through three separate training processes: first, a regular training of the entire model; second, the same training with optimization and regularization techniques added; and third, transfer learning, freezing all the layers in the model and fine-tuning the last layers, evaluated on the validation split of the data set.


Figure 3. DenseNet-121: blue (frozen layers), green (fine-tuning). Image credit: Author

Dataset

To observe the behavior of transfer learning, the CIFAR-10 data set was chosen. It contains 60,000 32x32 color images in 10 classes, with 6,000 images per class; there are 50,000 training images and 10,000 test images. This data set was trained on a DenseNet-121 architecture (Huang et al. 2016).

Training

All inputs were preprocessed and reshaped to a size of (155, 155). The optimizer was Adam, with a batch size of 128 and 32 epochs. The learning rate was between 0 and 1, with an early stopping threshold of 1e-05. Both architectures also had two batch normalization layers before the softmax layer in the classification head, following dense layers with output dimensionality of 256 and 128 respectively, and a dropout of 0.2 after the last batch normalization and activation. The base ConvNet model had a total of 6,964,106 trainable parameters; the base model with optimization techniques in the classifier layers had 7,252,746 trainable parameters; and the model with the fine-tuning implementation had 4,262,026 trainable parameters.
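For readers unfamiliar with the batch normalization used in the classifier layers: it normalizes each feature over the batch and then applies a learned scale and shift. A minimal NumPy sketch of the forward pass (the batch and feature sizes below simply mirror the 128-sample batches and 256-unit layer described above):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(128, 256))  # batch of 128, 256 features
out = batch_norm(x, gamma=np.ones(256), beta=np.zeros(256))
# With gamma = 1 and beta = 0, each feature ends up with ~zero mean, ~unit variance
```

Keeping the per-feature activations normalized this way stabilizes training and acts as a mild regularizer, which is consistent with the reduced overfitting reported in the Results below.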


Figure 4. DenseNet 121 Base model


Figure 5. DenseNet 121 base model with dropout and batch normalization added in the classifier layers

For the transfer learning training, the workflow was to take the model up to the classification layers and freeze those layers, to avoid destroying the information they contain during subsequent training rounds, and then add classification layers that are unfrozen and trainable. These last layers learn to turn the old features into predictions on the new data set.


Figure 6. DenseNet 121 Fine-Tuning Implemented
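This freeze-then-retrain workflow can be sketched in Keras. This is a hedged sketch, not the original experiment code: the head sizes (256 and 128 units, dropout 0.2, softmax over 10 classes) and the (155, 155) input follow the Training section above, but details such as the global pooling layer and activation placement are assumptions. `weights=None` is used here only so the sketch builds offline; in practice `weights="imagenet"` supplies the transferred knowledge.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Pre-trained backbone, truncated before its own classification layers.
base = tf.keras.applications.DenseNet121(
    weights=None, include_top=False, input_shape=(155, 155, 3))
base.trainable = False  # freeze: preserve the learned feature extractors

# New, trainable classification head on top of the frozen features.
model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(128),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.2),
    layers.Dense(10, activation="softmax"),  # 10 CIFAR-10 classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

During `model.fit` only the head's parameters are updated; the frozen backbone can later be unfrozen (with a low learning rate) to continue fine-tuning deeper layers.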

Results

All models performed similarly. The small differences observed are attributable to the size of the data set, which carries dense information; the more trainable samples a data set has, the larger the differences that will appear.

Accuracy. The base model with optimization applied had an accuracy of 95.35%, performing considerably better than the transfer learning model, which reached 92.72%, interestingly with a steady trend in the learning process as Figure 8 shows. The base model behaved differently: its accuracy increased over the epochs until epoch 19, where it achieved its highest result. Finally, the base model without optimization reached 86.59%, presumably because no optimization technique was applied.


Figure 7. DenseNet 121 Base Model, validation accuracy


Figure 8. DenseNet 121 Transfer Learning model, validation accuracy


Figure 9. DenseNet 121 Base Model (no optimization applied), validation accuracy

Time. The training time of each of the three models was an important feature to track. The fine-tuning model took 44.10 min, the base model with optimizer took 139.5 min, and the standard model took 13.8 min to be fully trained. The fine-tuning technique is thus 68.38% faster than the optimized base model and 68.70% slower than the basic training model without optimization techniques. However, when weighing accuracy against time, the transfer learning model performs much better.
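The speed-up percentages follow directly from the wall-clock times, as a quick arithmetic check shows:

```python
# Wall-clock training times (minutes) from the experiment
fine_tuning, optimized, basic = 44.10, 139.5, 13.8

faster_than_optimized = (1 - fine_tuning / optimized) * 100
slower_than_basic = (1 - basic / fine_tuning) * 100
# ≈ 68.4% faster than the optimized base model,
# ≈ 68.7% slower than the basic unoptimized model
```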

Loss and Overfitting. The DenseNet model proved resistant to overfitting in all three experiments. As for the loss, the base model's validation loss held at a rate of 65%. Compared with this, the fine-tuning implementation's loss was 48.2% better, with a validation loss of 33.67%. Nevertheless, the DenseNet model without transfer learning but optimized with batch normalization had a loss of 16.29%, making it the best model trained.


Figure 10. DenseNet 121 Base Model (optimization applied), validation loss


Figure 11. DenseNet 121 Transfer Learning model, validation loss


Figure 12. DenseNet 121 Base Model (no optimization applied), validation loss

Discussion

As mentioned at the beginning of this paper, we wanted to understand the performance of the DenseNet 121 ConvNet architecture under traditional learning versus optimized learning versus transfer learning. ConvNet architectures in the computer vision field require excellent performance to solve applicable simulations that evaluate information captured in images. This raises a problem of performance and time that transfer learning can address without sacrificing the accuracy or loss of the model. As we have seen, training a base ConvNet model requires a series of steps to avoid overfitting, underfitting, or poor accuracy. Batch normalization proved to be a great option to improve a ConvNet, yielding a model with 95% accuracy on validation data. When transfer learning is applied, there is a minor reduction in performance, but the execution time indeed creates value for taking a transfer learning approach.

Conclusion

We found that transfer learning with fine-tuning is a great option for ConvNets applied to tasks whose target labels differ from the source task. Because the experiment used CIFAR-10, it was also concluded that the transfer learning model would require more data to improve its performance. It was also evident, comparing the base model and the enhanced model, that DenseNet tends to improve its accuracy with a larger number of trainable parameters; to avoid overfitting the data, batch normalization is an excellent tool to implement in this model. ConvNets are widely used in simulations, and the volume of data to be trained demands not only good accuracy but good time performance. Therefore, when facing a simulation problem where the object source differs from the simulation to be performed, the resulting training complexity is something a fine-tuning approach can handle: freezing some or all layers of a ConvNet model, then training a new head so that the learned parameters carry over into the new model, syncing similar parameters and yielding an accurate simulation.


