Deep Learning Architecture Optimization Using Genetic Algorithms - II

Welcome to the second part of this series on applications of genetic algorithms. In this article, we will demonstrate how genetic algorithms can optimize the architecture of deep learning models, a task often described as a trial-and-error process. Data scientists often struggle to determine the number of hidden layers, and the number of nodes in each layer, that will produce the optimal output. You may try to experiment with all possible combinations within a given scope, but the number of runs quickly goes out of control as the search space grows huge. Genetic algorithms come in very handy in such situations. The main challenges we need to navigate are forming the right chromosome and the right evaluation function. We will take a very simple dataset and solve a classification problem, but the approach extends to very complex networks with many hidden layers. So, let's get started!

As usual, the ideas expressed here are purely as learnt and understood by me and have nothing to do with my organization.

Also, if you do not know about genetic algorithms, here is a good place to start.

https://en.wikipedia.org/wiki/Genetic_algorithm

DEAP (Distributed Evolutionary Algorithms in Python) is a robust library, and here is a good place to start learning:

https://deap.readthedocs.io/en/master/

As usual, import the required libraries, then load and prepare your data.

"""
Created on Mon Nov 23 13:35:25 2020


@author: chakraborty
"""
import pandas as pd
import numpy as np
import random
import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.compose import ColumnTransformer
from deap import algorithms, base, tools, creator


Complete your preprocessing and get the data ready for the actual fun to begin!

#import your data and complete preprocessing
...

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1, 2])], remainder='passthrough')


...
...

# at this point you should have: X_train, X_test, y_train, y_test
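For concreteness, here is a minimal preprocessing sketch. It assumes a tabular CSV with the binary target in the last column and categorical features at column indices 1 and 2; the file name 'data.csv' and these column choices are assumptions, so adapt them to your data.

# Minimal preprocessing sketch -- file name and column indices are
# assumptions; adapt them to your dataset
dataset = pd.read_csv('data.csv')
X = dataset.iloc[:, :-1].values   # all columns except the target
y = dataset.iloc[:, -1].values    # binary target in the last column

# one-hot encode the categorical columns, pass the rest through
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1, 2])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# scale the features so the network trains stably
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)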

Let's say we want to represent three hidden layers of 20 nodes each; this parameter's value would be [20, 20, 20]. Before we implement our genetic algorithm-based optimizer for the hidden-layer configuration, we need to define a chromosome that can be translated into this pattern. So the length of the list and the values of its elements are the two aspects we need to take care of. However, to control the actual number of hidden layers in the network, we will allow some of these values to be zero or negative when defining the bounds. Such a value implies that no more layers will be added to the network.

Let us first create a mechanism for defining the chromosomes. Note how I am defining the upper and lower bounds: I am assuming there will be at most three hidden layers, and the negative lower bounds allow a gene to fall below one, which means that layer is simply not added. Also, I am defining the parameters as floats.

l_bound = [1, -1, -1]
u_bound = [6, 10, 8]

PARAM = len(l_bound)

# maximize classification accuracy (single-objective, weight +1)
creator.create('FitnessMax', base.Fitness, weights=(1.0,))
creator.create('Individual', list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
# one float generator per gene, each within its own bounds
for i in range(PARAM):
    toolbox.register("size_" + str(i), random.uniform, l_bound[i], u_bound[i])
layers = tuple(getattr(toolbox, "size_" + str(i)) for i in range(PARAM))

toolbox.register("individual", tools.initCycle, creator.Individual, layers, n=1)
toolbox.register('population', tools.initRepeat, list, toolbox.individual)
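
As a quick optional sanity check, you can sample an individual to see the chromosome shape; the seed and the printed values below are just illustrative:

# optional sanity check: each individual is a list of three floats
random.seed(42)   # seed chosen arbitrarily, for reproducibility
print(toolbox.individual())   # e.g. something like [4.8, 0.2, -0.6]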

Now, we will use TensorFlow to define and create our evaluation function. Note how I am passing the number of nodes using the values from the individual chromosome, and how a gene below one is treated as "no more layers".

def evaluate(individual):
    ann = tf.keras.models.Sequential()
    # decode the chromosome: round each gene to an integer node count;
    # a value below 1 means no more hidden layers are added
    for size in individual:
        nodes = int(round(size))
        if nodes < 1:
            break
        ann.add(tf.keras.layers.Dense(units=nodes, activation='relu'))
    ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
    ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    ann.fit(X_train, y_train, batch_size=32, epochs=10, verbose=0)
    y_pred = ann.predict(X_test) > 0.5
    # DEAP expects the fitness as a tuple
    return accuracy_score(y_test, y_pred),
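
Before launching the full GA, it can be worth smoke-testing the evaluation function on a hand-written chromosome; the gene values here are arbitrary:

# quick smoke test with an arbitrary chromosome: two hidden layers
# (4 and 6 nodes), third gene below 1 so no third layer is added
print(evaluate(creator.Individual([4.0, 6.0, -0.5])))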
         

Let's finish registering the genetic operators:

CROWDING_FACTOR = 10.0

# bounded crossover and mutation keep every gene within [l_bound, u_bound]
toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=l_bound, up=u_bound, eta=CROWDING_FACTOR)
toolbox.register("mutate", tools.mutPolynomialBounded, low=l_bound, up=u_bound, eta=CROWDING_FACTOR, indpb=1.0/3.0)
toolbox.register("select", tools.selTournament, tournsize=2)
toolbox.register('evaluate', evaluate)

stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("Max", np.max)

Just to mention, the crowding factor (eta) controls the degree of the crossover: a high eta produces children closely resembling their parents, while a small eta produces children that differ from them much more.
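
As a standalone illustration (not part of the actual run), here is how the registered crossover behaves on two hand-made parents; the parent gene values are arbitrary:

# illustration only: cxSimulatedBinaryBounded modifies the parents in
# place and keeps every gene within the registered bounds
p1 = creator.Individual([2.0, 5.0, 3.0])
p2 = creator.Individual([5.0, 1.0, 7.0])
c1, c2 = toolbox.mate(p1, p2)
print(c1, c2)   # with a high eta, the children stay close to the parents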

That's it. Run the code, but be very careful in choosing the number of epochs along with the population and generation sizes: high values will take a very long time to run. On this specific dataset, the accuracy reached 100% from 79% within the first five runs.

def main():
    pop = toolbox.population(n=10)
    hof = tools.HallOfFame(1)
    pop, log = algorithms.eaSimple(
        pop, toolbox, cxpb=0.7, mutpb=0.7, ngen=10,
        stats=stats, halloffame=hof
    )
    return hof, pop, log


hof, pop, log = main()
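
Once the run finishes, the best chromosome in the hall of fame can be decoded back into a layer configuration using the same rule as the evaluation function:

# decode the best chromosome: round genes to node counts, stop at the
# first gene below 1
best = hof[0]
hidden_layers = []
for size in best:
    nodes = int(round(size))
    if nodes < 1:
        break
    hidden_layers.append(nodes)
print("Best hidden layer sizes:", hidden_layers)
print("Best accuracy:", best.fitness.values[0])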

You will be amazed to see this method at work when your datasets are large and complex. Feel free to experiment, and let me know your thoughts and questions in the comments. Next time, I will show the application of genetic algorithms in reinforcement learning. Till then, goodbye!

