Understanding Transfer Learning as a Methodology for Efficient Training of CNN Models
Abstract
Training a ConvNet model on a large dataset can take hours, and its performance can be affected by how the training is carried out. To address this scenario, a technique called transfer learning began to be applied in ConvNet architectures to improve training time and accuracy by carrying pre-trained models over to new models. Transfer learning is a paradigm applicable to any neural network architecture, but in the computer vision field the technique has shown particularly interesting results. This paper uses the CIFAR-10 dataset to explore the performance of a transfer learning implementation, training, and evaluation on a DenseNet-121 architecture. The transfer learning technique implemented is fine-tuning: the entire DenseNet-121 base is frozen, and the architecture is modified using batch normalization, the Adam optimization algorithm, dropout, and learning rate tuning.
Keywords: Transfer Learning, DenseNet-121, Batch Normalization, Fine-tuning, Accuracy, Loss
Introduction
Convolutional neural networks (ConvNets) are a category of neural networks within machine learning and artificial intelligence, and they are highly effective architectures for image recognition and classification. Neural networks in computer science are an analogy of the functional patterns of the human brain, and the ConvNet architecture in particular resembles the visual cortex, which handles image recognition and classification for the human eye. To replicate these capabilities, ConvNet architectures are trained on large sets of images in order to obtain a good response from the algorithm. Training on large datasets can take hours, and performance can be affected by how the training is carried out.
These machine learning architectures usually require abundant data under a stationary environment. But in real-world environments, user preferences and data distributions drift over time, effectively turning the model's weights, parameters, and biases into a dynamic system (Widmer and Kubat 1996). In response to this scenario, a technique called transfer learning started to be implemented, and it is now frequently used in ConvNet architectures to improve training time and accuracy by carrying pre-trained models over to new models (Gulli and Pal 2017).
Transfer learning is a paradigm in which the knowledge acquired by a pre-trained model for one task is carried over to a new model for a related task. Transfer learning was initially discussed and classified into three settings: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning (Pan and Yang 2009). Over the years, however, transfer learning techniques have become more intuitive to apply and more effective.
In this article we explore the behavior of the transfer learning technique in a DenseNet-121 architecture trained on the CIFAR-10 dataset. This architecture maintains a simple connectivity pattern that seeks maximum information flow between layers by connecting all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature maps to all subsequent layers (Huang et al. 2016).
Figure 1. DenseNet representation. Image credit: Huang et al. 2016
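To make this connectivity pattern concrete, the following minimal Keras sketch shows how each new layer receives the concatenation of all preceding feature maps; the layer count and growth rate here are illustrative, not the exact DenseNet-121 configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    """Each layer sees the concatenated feature maps of all preceding layers."""
    features = [x]
    for _ in range(num_layers):
        # Concatenate everything produced so far (matching feature-map sizes).
        inp = layers.Concatenate()(features) if len(features) > 1 else features[0]
        y = layers.BatchNormalization()(inp)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)  # BN-ReLU-Conv composite
        features.append(y)
    return layers.Concatenate()(features)

inputs = tf.keras.Input(shape=(32, 32, 3))
outputs = dense_block(inputs)
model = tf.keras.Model(inputs, outputs)
```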
There are currently several methods for implementing transfer learning in a neural network, such as data augmentation, fine-tuning, freezing, multitask learning, one-shot learning, domain adaptation, domain confusion, and zero-shot learning. No single technique is ideal in every case, but each one changes the behavior of the model. This article covers a proposed implementation using fine-tuning and freezing techniques on a pre-trained DenseNet-121 architecture, aiming to understand the time response and the accuracy and loss performance on a validation set of 10,000 samples from the CIFAR-10 dataset.
Techniques
Transfer Learning — Fine-tuning
Transfer learning addresses the need for lifelong machine-learning methods that retain and reuse previously learned knowledge (Pan and Yang 2009). Figure 2 presents a comparison between the traditional learning process and one applying transfer learning.
Figure 2. Transfer learning representation. Left: traditional machine learning; right: transfer learning. Image credit: Pan and Yang 2009
Based on this idea, fine-tuning complements the transfer learning method: all or some of the layers of the pre-trained model are frozen, so that training can begin in the top layers with new, randomly initialized classifiers, while the frozen layers still contain the learned parameters and continue to support feature extraction.
Transfer learning with fine-tuning adjusts the more abstract representations of the model being reused, providing reusability and efficiency in the predictions made with the newly trained parameters.
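As a minimal sketch of this idea in Keras, the early layers holding generic features are frozen while the more abstract top layers remain trainable; the cut-off point of ten layers below is purely illustrative, not the configuration used in this paper.

```python
import tensorflow as tf

model = tf.keras.applications.DenseNet121(weights="imagenet", include_top=False)
for layer in model.layers[:-10]:
    layer.trainable = False   # early layers: generic features, kept fixed
for layer in model.layers[-10:]:
    layer.trainable = True    # abstract top layers: adjusted to the new task
```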
Experiments
The experiment consisted of putting the ConvNet through three separate training processes: first, regular training over the entire model; second, training with optimization and regularization techniques added; and third, transfer learning, freezing all the layers in the model and fine-tuning the last layers, evaluated over the validation data of the dataset.
Figure 3. DenseNet 121. Blue: frozen layers; green: fine-tuning. Image credit: author
Dataset
To observe the behavior of transfer learning, the CIFAR-10 dataset was chosen. It contains 60,000 32x32 color images in 10 classes, with 6,000 images per class: 50,000 training images and 10,000 test images. This complex dataset was trained on a DenseNet-121 architecture (Huang et al. 2016).
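A sketch of how the dataset can be loaded and fed to the network is shown below. The (155, 155) resize matches the input shape reported in the Training section; performing it inside a tf.data pipeline is an implementation choice assumed here, not taken from the original code, and it avoids duplicating the full dataset in memory at the larger size.

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
print(x_train.shape, x_test.shape)  # (50000, 32, 32, 3) (10000, 32, 32, 3)

def prepare(image, label):
    # Resize to the network's input shape and apply DenseNet preprocessing.
    image = tf.image.resize(tf.cast(image, tf.float32), (155, 155))
    return tf.keras.applications.densenet.preprocess_input(image), label

train_ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
            .map(prepare, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(128)   # batch size used in the experiments
            .prefetch(tf.data.AUTOTUNE))
```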
Training
For all models, the inputs were preprocessed and reshaped to a size of (155, 155). The optimizer for training was Adam, with a batch size of 128 and 32 epochs. The learning rate was between 0 and 1, with early stopping at a minimum delta of 1e-05. Both architectures also had two batch-normalized layers before the softmax layer in the classification head, with dense layers of output dimensionality 256 and 128 respectively, plus dropout at 0.2 on the last activated layer after the last batch normalization (a sketch of this head follows Figure 5). The base ConvNet model had a total of 6,964,106 trainable parameters; the base model with optimization techniques in the classifier layers had 7,252,746 trainable parameters; and the ConvNet model with the fine-tuning implementation had 4,262,026 trainable parameters.
Figure 4. DenseNet 121 Base model
Figure 5. DenseNet 121 Base model adding optimization, dropout, and batch normalization in the classifier layers
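The following is a minimal sketch of the optimized classifier head described above, assuming the reported hyperparameters (dense layers of 256 and 128 units each followed by batch normalization, dropout of 0.2 after the last activation, Adam, and early stopping with a minimum delta of 1e-05). The global average pooling layer and the exact layer ordering are assumptions, not taken from the original code.

```python
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.DenseNet121(weights="imagenet", include_top=False,
                                         input_shape=(155, 155, 3))
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256)(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
x = layers.Dense(128)(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
x = layers.Dropout(0.2)(x)                       # dropout on the last activated layer
outputs = layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(base.input, outputs)      # base stays trainable in this variant
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=1e-05)
# model.fit(train_ds, validation_data=val_ds, epochs=32, callbacks=[stop])
```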
For the transfer learning training, the workflow was to take the model up to the classification layers and freeze them, to avoid destroying any of the information they contain during future training rounds, and then add classification layers that are unfrozen and trainable. These last layers learn to turn the old features into predictions on the new dataset (see the sketch after Figure 6).
Figure 6. DenseNet 121 Fine-Tuning Implemented
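The freezing step itself can be sketched as follows. The classifier head is abbreviated here relative to the previous sketch, and the reported 4,262,026 trainable parameters suggest the authors' exact frozen/trainable split may differ from this full freeze.

```python
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.DenseNet121(weights="imagenet", include_top=False,
                                         input_shape=(155, 155, 3))
base.trainable = False   # freeze: pre-trained features are preserved, and the
                         # batch-norm layers run in inference mode, which avoids
                         # destroying their learned statistics during training

# New, randomly initialized classification layers: the only trainable part.
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(base.input, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # trainable parameters now belong only to the new head
```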
Results
The optimized base model and the transfer learning model performed similarly. The small differences observed are due to the size of the dataset, since each sample carries dense information. The more training samples a dataset has, the more pronounced these differences will become.
Accuracy. The base model with optimization applied reached an accuracy of 95.35%, performing considerably better than the transfer learning model, which reached 92.72% while, interestingly, showing a steady trend in the learning process, as Figure 8 presents. The base model behaved differently: its accuracy increased over the epochs until epoch 19, where it achieved its highest result. Finally, the base model without optimization reached 86.59%, presumably because no optimization technique was applied.
Figure 7. DenseNet 121 Base Model validation accuracy
Figure 8. DenseNet 121 Transfer Learning model validation accuracy
Figure 9. DenseNet 121 Base Model (no optimization applied) validation accuracy
Time. The training time of each of the three models was an important feature to track. The fine-tuning model took 44.10 min, the base model with the optimizer took 139.5 min, and the standard model took 13.8 min to be fully trained. The fine-tuning technique is therefore 68.38% faster than the optimized base model and 68.70% slower than the basic training model without optimization techniques. However, when weighing accuracy against time, the transfer learning model performs much better.
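The relative figures can be checked directly from the reported times:

```python
# Reported training times in minutes.
fine_tuning, optimized_base, standard = 44.10, 139.5, 13.8

print((optimized_base - fine_tuning) / optimized_base)  # ~0.6838 -> 68.38% faster
print((fine_tuning - standard) / fine_tuning)           # ~0.6871 -> 68.70% slower
```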
Loss and Overfitting. The DenseNet model proved resistant to overfitting in all three experiments. As for the loss, the validation loss of the base model sustained a rate of 65%. Compared with that, the fine-tuned model's loss was 48.2% better, with a validation loss of 33.67%. Nevertheless, the DenseNet model without transfer learning but optimized with batch normalization had a loss of 16.29%, making it the best model trained.
Figure 10. DenseNet 121 Base Model (optimization applied) validation loss
Figure 11. DenseNet 121 Model (transfer learning applied) validation loss
Figure 12. DenseNet 121 Base Model (no optimization applied) validation loss
Discussion
As mentioned at the beginning of this paper, we wanted to understand the performance of the DenseNet 121 ConvNet architecture under traditional learning versus optimized learning versus transfer learning. ConvNet architectures in the computer vision field require excellent performance to solve applicable simulations that evaluate information captured in images. This creates a performance and time problem that transfer learning can alleviate without sacrificing the accuracy or loss of the model. As we have seen, training a base ConvNet model requires a series of steps to avoid overfitting, underfitting, or poor accuracy. Batch normalization proved to be a great option for improving a ConvNet, yielding a trained model with 95% accuracy on the validation data. When transfer learning is applied, there is a minor reduction in performance, but the execution time indeed creates value for taking a transfer learning approach.
Conclusion
We found that transfer learning with fine-tuning is a great option for ConvNets used in settings where the target task has different labels. Because the experiment used CIFAR-10, it was also concluded that the transfer learning model requires more data to improve its performance. It was likewise evident, comparing the base model and the enhanced model, that DenseNet tends to improve its accuracy with a larger number of trainable parameters; to limit the accompanying risk of overfitting, batch normalization is an excellent tool to implement in this model. ConvNets are widely used in simulations, and the volume of data that must be trained demands not only good accuracy but also good time performance. Therefore, a simulation problem where the object source differs from the simulation to be performed creates a training-complexity challenge that a fine-tuning approach can handle: freezing some, if not all, layers of a ConvNet model, later unfreezing them, and carrying these trainable parameters into the new model's training, thus syncing similar parameters and obtaining an accurate simulation.
References
- François Chollet. Deep Learning with Python. 2017.
- Gerhard Widmer and Miroslav Kubat. Learning in the Presence of Concept Drift and Hidden Contexts. Machine Learning, 69–101, 1996.
- Antonio Gulli and Sujit Pal. Deep Learning with Keras: Implement Neural Networks with Keras on Theano and TensorFlow, 204, 2017.
- Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely Connected Convolutional Networks, 2016.
- Sinno Jialin Pan and Qiang Yang. A Survey on Transfer Learning, 2009.
- Evergreen Technologies. Dog Breed Image Classification Using Transfer Learning, 2020.
- Sebastian Ruder. Transfer Learning — Machine Learning’s Next Frontier, 2017.
- Dipanjan Sarkar. A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning, 2018.