Hyper-parameter Optimisation Using Genetic Algorithms: Classification Task

Author: Conor Rothwell

Keywords: Python, Scikit-learn, Classification, Hyper-parameter Optimisation, Genetic Algorithms, DEAP

Classification problems are among the most common problems in machine learning. They involve taking a set of predictor features and using them to produce a binary (or multi-class) prediction for a categorical target variable. A simple example is taking information such as age, income, and marital status, and using it to predict whether someone should be approved for the bank loan they have applied for.

Python's scikit-learn library offers many different classification algorithms, including decision trees, logistic regression, and support vector machines. Each of these methods has parameters associated with it which can alter the performance of the algorithm on a classification task. For example, in decision trees, the depth of the tree can be controlled.

When trying to build the optimal classification model, these parameters need to be tuned. The search space of potential parameter combinations for these classification methods is effectively infinite. In such an optimisation problem, exact methods such as exhaustive search become impractical, and heuristic methods become more appropriate. Genetic algorithms are one such heuristic approach. This article briefly outlines how genetic algorithms work before demonstrating how they can be used to optimise the performance of classification methods.

Genetic Algorithms

Genetic algorithms (GAs) are one of a group of evolutionary algorithms that take inspiration from biology and the natural world. A GA uses Darwin's notion of evolution and survival of the fittest to find the optimal solution to a given problem. GAs allow the exploration of massive search spaces in a time-efficient manner, making them an ideal approach for problems like parameter optimisation. The steps by which a GA evolves a solution are outlined below.

Generate Initial Population

The basic component of a genetic algorithm is the chromosome, which consists of a series of genes. The chromosome represents one candidate solution in an abstract way. The first step of the GA is to randomly generate an initial population of chromosomes. How a chromosome is generated is defined by the user. For example, in the toy problem used in this section, chromosomes of six genes may be generated where each gene takes an integer value between 0 and 5.
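As a minimal sketch of this step (assuming, as in the toy example, six integer genes each between 0 and 5; the population size of four is purely illustrative):

import random

GENES = 6      # six genes per chromosome, as in the toy example
POP_SIZE = 4   # illustrative population size

def random_chromosome():
    # each gene takes a random integer value between 0 and 5
    return [random.randint(0, 5) for _ in range(GENES)]

population = [random_chromosome() for _ in range(POP_SIZE)]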

Evaluate Chromosome Fitness

Each chromosome is assigned a fitness score by a fitness function, and this score reflects how good a solution it is. In the toy example, the objective is to maximise the value of the genes, in which case the fitness function simply takes the sum of all genes, and the chromosome with the largest sum is the fittest.
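Continuing the sketch, the fitness function for this toy objective simply sums the genes:

def fitness(chromosome):
    # the toy objective: maximise the sum of all gene values
    return sum(chromosome)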

Selection

Once each chromosome has been evaluated and assigned a fitness score, selection begins. This process involves identifying the fittest chromosomes so that they can be used to generate the next generation. Chromosomes that are not selected are discarded. There are many methods to select the fittest chromosomes; roulette wheel and tournament selection are two common techniques. Suppose two of the fitter chromosomes have been selected for evolution.
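A minimal sketch of tournament selection (the tournament size of two is an illustrative choice):

def tournament_select(population, tournament_size=2):
    # draw contestants at random and return the fittest of them
    contestants = random.sample(population, tournament_size)
    return max(contestants, key=fitness)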

Crossover and Mutation

Once the chromosomes have been selected, they need to be 'evolved'. Evolution involves two techniques: crossover and mutation. Crossover is the equivalent of two parents having a child: each chromosome contributes a certain number of genes to the new individual.

In a GA, the hope is that combining two fit individuals will produce an even fitter solution.

Again, there are many ways to perform chromosome crossover. The simplest is single-point crossover, in which a cut point is chosen and the two parents swap the genes on one side of it. Other methods include k-point crossover (k ≥ 1) and uniform crossover.
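A sketch of single-point crossover on the toy chromosomes:

def single_point_crossover(parent1, parent2):
    # choose a cut point and swap the gene tails of the two parents
    point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2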

Mutation is equivalent to mutation in genetics and involves randomly altering the value of a gene in a given chromosome, for example changing the value of the first gene of an offspring from 0 to 4. The value it is changed to is randomly generated according to the rules set out during the generation of the initial population. Mutation typically occurs with a very low probability and is used to reduce the chance of getting stuck in local optima.
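A sketch of mutation for the toy chromosomes (the mutation probability of 0.05 is illustrative):

def mutate_toy(chromosome, probability=0.05):
    # occasionally replace one randomly chosen gene with a new value
    # drawn from the same 0-5 range used at initialisation
    if random.random() < probability:
        gene = random.randrange(len(chromosome))
        chromosome[gene] = random.randint(0, 5)
    return chromosome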

These steps of evaluation, selection, crossover, and mutation are then repeated until some termination criterion is met. This may be a maximum number of generations of evolution, or the achievement of a certain fitness value. In our example, we know that the maximum attainable value is 30 (a 5 in each of the six genes), so it would be reasonable to stop the GA once a chromosome with this value is found, as in the sketch below. The example used here is deliberately simplistic; the more complicated problem of parameter optimisation for a classification task is explored below.
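Putting the toy pieces together, a minimal evolution loop that stops at the known optimum of 30 or after a fixed number of generations (both the generation cap and the structure of the loop are illustrative):

for generation in range(100):
    # termination criterion: the known maximum fitness has been reached
    if max(fitness(c) for c in population) == 30:
        break
    # selection: pick parents via tournaments
    parents = [tournament_select(population) for _ in range(POP_SIZE)]
    # crossover and mutation: produce the next generation
    next_generation = []
    for i in range(0, POP_SIZE, 2):
        child1, child2 = single_point_crossover(parents[i], parents[i + 1])
        next_generation += [mutate_toy(child1), mutate_toy(child2)]
    population = next_generation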

Classification

The data for this problem is taken from the "Telco Customer Churn" dataset on Kaggle. The task is to predict whether a customer will leave the company ("churn"). The area under the receiver operating characteristic curve (AUROC) is used as the evaluation metric.
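The data preparation itself is not shown in the article. The sketch below is one plausible way to produce the X_train, X_test, y_train, and y_test objects used by the evaluation function later on; the file name matches the Kaggle download, but the preprocessing choices are assumptions rather than the author's exact steps:

import pandas as pd
from sklearn.model_selection import train_test_split

# assumed file name of the Kaggle "Telco Customer Churn" download
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')

# TotalCharges is read as text because of blank entries; coerce to numeric
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce').fillna(0)

# binary target, and one-hot encoding of the categorical predictors
y = (df['Churn'] == 'Yes').astype(int)
X = pd.get_dummies(df.drop(columns=['customerID', 'Churn']))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=100, stratify=y)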

As a baseline, three methods (logistic regression, a decision tree, and a support vector machine) were fitted using their default parameter values in scikit-learn and their ROC curves compared. Logistic regression was the best by some margin, while the decision tree performed quite poorly.
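A sketch of how such a baseline comparison might be run (probability estimates are enabled on the SVM so AUROC can be computed, and max_iter is raised on logistic regression to avoid convergence warnings; these tweaks are mine, not the article's):

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

baselines = {'Logistic Regression': LogisticRegression(max_iter=1000),
             'Decision Tree': DecisionTreeClassifier(random_state=100),
             'SVM': SVC(probability=True)}

for name, model in baselines.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    print(f'{name}: AUROC = {roc_auc_score(y_test, scores):.4f}')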

Parameter Tuning

In this example, the DEAP library is used to implement the GA. The decision tree parameters being tuned, and the range of values each can take, are set out in the "Possible parameter values" block in the code below. For more information on decision trees, see the scikit-learn documentation.

A chromosome here is simply a list of parameter values, with each item in the list referring to a specific parameter.

Implementation

Note: for those unfamiliar with DEAP, I recommend reading the examples and explanations in the DEAP documentation.

Import all relevant libraries. It may be necessary to install the DEAP library separately (pip install deap).

import random

import numpy as np  # needed for the statistics registered later

from deap import base
from deap import creator
from deap import tools
from deap import algorithms

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

First, define how individual chromosomes and populations are created. DEAP's creator is used here to define how fitness will be evaluated (maximising the fitness function value) and to define the individual chromosome (a list).

Our individuals are not homogeneous (they are not simply a list of floats or integers), so we need to build custom individuals. The tools.initCycle function allows the creation of such custom individuals. First, set out the possible values for the various genes in the chromosome (e.g. class weight can take the value "balanced" or None). toolbox.register then defines how to generate each gene and may take arguments; for example, the criterion gene is created by randomly choosing from the criterions list (random.choice with criterions as its argument). For floats between 0 and 1, random.random can be registered without any arguments.

Calling toolbox.individual() then returns a complete individual of the form [criterion, splitter, integer, float, ..., class weight], with a fitness attribute that maximises the objective. Populations are much like individuals: instead of being initialised with attributes, they are filled with individuals. The population initialised here is a bag of a fixed number of individuals in no particular order.

creator.create("FitnessMax", base.Fitness, weights=(1.0,)) # Maximise the fitness function value
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()

# Possible parameter values
criterions = ['gini', 'entropy']
splits = ['best', 'random']
lower_depth, upper_depth = 2, 100 
lower_leaf_nodes, upper_leaf_nodes = 2, 100
class_weight = ['balanced', None]
lower_samples_leaf, upper_samples_leaf  = 0, 0.5


# Define how each gene will be generated (e.g. criterion is a random choice from the criterions list)
toolbox.register("attr_crit", random.choice, criterions)
toolbox.register("attr_splitter", random.choice, splits)
toolbox.register("attr_depth", random.randint, lower_depth, upper_depth)
toolbox.register("attr_flt", random.random)
toolbox.register("attr_leaf_nodes", random.randint, lower_leaf_nodes, upper_leaf_nodes)
toolbox.register("attr_weight", random.choice, class_weight)
toolbox.register("attr_samples_leaf", random.uniform, lower_samples_leaf, upper_samples_leaf)

# This is the order in which genes will be combined to create a chromosome:
# [criterion, splitter, max_depth, min_samples_split, min_samples_leaf,
#  min_weight_fraction_leaf, max_features, max_leaf_nodes,
#  min_impurity_decrease, class_weight]
# (attr_samples_leaf is reused for min_weight_fraction_leaf, which must
#  also lie between 0 and 0.5)
N_CYCLES = 1
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.attr_crit, toolbox.attr_splitter, toolbox.attr_depth,
                  toolbox.attr_flt, toolbox.attr_samples_leaf, toolbox.attr_samples_leaf,
                  toolbox.attr_flt, toolbox.attr_leaf_nodes,
                  toolbox.attr_flt, toolbox.attr_weight), n=N_CYCLES)

toolbox.register("population", tools.initRepeat, list, toolbox.individual)

DEAP has a number of built-in functions for selection, crossover, and mutation; however, the heterogeneous nature of the chromosomes in this problem means that custom mutation and evaluation functions must be defined. The standard one-point crossover and tournament selection functions can be used. One-point crossover is implemented as discussed above. Tournament selection involves selecting a (user-defined) number of chromosomes and running a tournament between them. The winner of each tournament is the fittest chromosome (highest fitness score), and this chromosome proceeds to crossover.

When called, this custom mutation function randomly selects one of the genes in the chromosome passed to it and mutates it. How the mutation happens depends on the gene. For genes with two possible values, such as criterion (gini vs entropy), the value is simply swapped to the other option (gini > entropy; entropy > gini). For genes with more options, a new value is randomly generated in the same way as during initialisation. The modified individual is then returned.

def mutate(individual):
    '''Randomly select one gene in the chromosome and generate a new
       value for it according to the rules used at initialisation.
    '''

    gene = random.randint(0, 9)  # select which of the ten genes (parameters) to mutate
    if gene == 0:    # criterion: swap between the two options
        if individual[0] == 'gini':
            individual[0] = 'entropy'
        else:
            individual[0] = 'gini'

    elif gene == 1:  # splitter: swap between the two options
        if individual[1] == 'best':
            individual[1] = 'random'
        else:
            individual[1] = 'best'

    elif gene == 2:  # max_depth
        individual[2] = random.randint(lower_depth, upper_depth)

    elif gene in [4, 5]:  # min_samples_leaf, min_weight_fraction_leaf
        individual[gene] = random.uniform(lower_samples_leaf, upper_samples_leaf)

    elif gene in [3, 6, 8]:  # min_samples_split, max_features, min_impurity_decrease
        individual[gene] = random.random()

    elif gene == 7:  # max_leaf_nodes
        individual[7] = random.randint(lower_leaf_nodes, upper_leaf_nodes)

    elif gene == 9:  # class_weight: swap between the two options
        if individual[9] == 'balanced':
            individual[9] = None
        else:
            individual[9] = 'balanced'

    return individual,

The evaluation function is almost always user defined. In this case we want to optimise the performance of a decision tree classifier, so it makes sense to build a decision tree model using the parameters in the chromosome being evaluated and calculate the AUROC of that model. This AUROC score is the fitness score for the individual chromosome and is used for selection later in the process.

def evaluate(individual):
    '''
    Build and test a model based on the parameters in an individual and
    return the AUROC value. X_train, X_test, y_train, and y_test are
    assumed to be defined already (see the data preparation sketch above).
    '''
    # extract the values of the parameters from the individual chromosome
    crit = individual[0]
    split = individual[1]
    depth = individual[2]
    samples_split = individual[3]
    samples_leaf = individual[4]
    weight_fraction_leaf = individual[5]
    features = individual[6]
    leaf_nodes = individual[7]
    impurity_decrease = individual[8]
    class_weight = individual[9]

    model = DecisionTreeClassifier(criterion=crit,
                                   splitter=split,
                                   max_depth=depth,
                                   min_samples_split=samples_split,
                                   min_samples_leaf=samples_leaf,
                                   min_weight_fraction_leaf=weight_fraction_leaf,
                                   max_features=features,
                                   random_state=100,
                                   max_leaf_nodes=leaf_nodes,
                                   min_impurity_decrease=impurity_decrease,
                                   class_weight=class_weight).fit(X_train, y_train)
    # score the held-out test set and compute the AUROC
    probs = model.predict_proba(X_test)
    preds = probs[:, 1]
    roc_auc = roc_auc_score(y_test, preds)

    return roc_auc,


Once all the evolution functions have been chosen, they must be registered in the toolbox. Here we have two custom functions (mutate, evaluate) and two generic DEAP functions (mate, select). In this example, the tournament size is set to two. Note that larger tournament sizes increase the probability of eliminating weaker chromosomes from the population.

toolbox.register("mate", tools.cxOnePoint)
toolbox.register("mutate",mutate)
toolbox.register("select", tools.selTournament, tournsize=2)
toolbox.register("evaluate", evaluate)

Finally, combine these functions to create the GA. The size of the population, crossover probability, mutation probability, and number of generations must be defined. Larger populations allow greater exploration of the search space at the expense of computation time. The probabilities of crossover and mutation are typically large and small, respectively. A larger number of generations will typically produce better results; however, this improvement often follows the law of diminishing returns, and the user must decide how much time to commit for increasingly small gains in performance. Note that the number of generations and population size needed to produce acceptably strong results is problem dependent: the larger the search space, the more generations and the larger the population needed. The HallOfFame object stores the best individual chromosome ever seen across the generations, and the Statistics object records information about the evolution of fitness across generations.

population_size = 1000
crossover_probability = 0.7
mutation_probability = 0.01
number_of_generations = 20

pop = toolbox.population(n=population_size)
hof = tools.HallOfFame(1)
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)
pop, log = algorithms.eaSimple(pop, toolbox, cxpb=crossover_probability,
                               mutpb=mutation_probability, ngen=number_of_generations,
                               stats=stats, halloffame=hof, verbose=True)

best_parameters = hof[0] # save the optimal set of parameters

Each generation contains a population of 1000 individual chromosomes. The minimum, maximum, and average fitness scores (AUROC) of these individuals can be tracked across generations (the plotting code is included in the appendix). As this is a maximisation problem, we are primarily interested in the maximum fitness scores; however, it is useful to note that the average score of individuals improves throughout evolution, converging towards the maximum scores.

The highest AUROC score achieved was 0.8348, an improvement of 0.1624 over the AUROC of the decision tree classifier with the default scikit-learn parameters. The optimal parameters were as follows:

Optimal Parameters

Criterion: entropy
Splitter: best
Max depth: 14
Min samples split: 0.05236412600702878
Min samples leaf: 0.015005650596154774
Min weight fraction leaf: 0.04848049897043877
Max features: 0.9338489607619295
Max leaf nodes: 93
Min impurity decrease: 0.00023570613112355865
Class weight: None
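The winning chromosome is just a list in the gene order defined earlier, so it can be unpacked into a final classifier. A minimal sketch (the variable names are mine):

(crit, split, depth, samples_split, samples_leaf, weight_fraction_leaf,
 features, leaf_nodes, impurity_decrease, weight) = best_parameters

final_model = DecisionTreeClassifier(
    criterion=crit, splitter=split, max_depth=depth,
    min_samples_split=samples_split, min_samples_leaf=samples_leaf,
    min_weight_fraction_leaf=weight_fraction_leaf, max_features=features,
    max_leaf_nodes=leaf_nodes, min_impurity_decrease=impurity_decrease,
    class_weight=weight, random_state=100).fit(X_train, y_train)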


Genetic algorithms are a quick and easy way to optimise the performance of classification algorithms through parameter tuning. They can provide serious improvements in performance, and once you get the hang of designing custom functions, the process is fast. For more information and examples, see the DEAP documentation. I hope you enjoyed the post!

Appendix: Complete Code

import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from deap import base
from deap import creator
from deap import tools
from deap import algorithms

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# X_train, X_test, y_train, y_test are assumed to be defined
# (see the data preparation sketch in the article)


def mutate(individual):
    '''Randomly select one gene in the chromosome and generate a new
       value for it according to the rules used at initialisation.
    '''

    gene = random.randint(0, 9)  # select which of the ten genes (parameters) to mutate
    if gene == 0:    # criterion: swap between the two options
        if individual[0] == 'gini':
            individual[0] = 'entropy'
        else:
            individual[0] = 'gini'

    elif gene == 1:  # splitter: swap between the two options
        if individual[1] == 'best':
            individual[1] = 'random'
        else:
            individual[1] = 'best'

    elif gene == 2:  # max_depth
        individual[2] = random.randint(lower_depth, upper_depth)

    elif gene in [4, 5]:  # min_samples_leaf, min_weight_fraction_leaf
        individual[gene] = random.uniform(lower_samples_leaf, upper_samples_leaf)

    elif gene in [3, 6, 8]:  # min_samples_split, max_features, min_impurity_decrease
        individual[gene] = random.random()

    elif gene == 7:  # max_leaf_nodes
        individual[7] = random.randint(lower_leaf_nodes, upper_leaf_nodes)

    elif gene == 9:  # class_weight: swap between the two options
        if individual[9] == 'balanced':
            individual[9] = None
        else:
            individual[9] = 'balanced'

    return individual,


def evaluate(individual):
    '''
    build and test a model based on the parameters in an individual and return
    the AUROC value
    '''
    # extract the values of the parameters from the individual chromosome
    crit = individual[0]
    split = individual[1]
    depth = individual[2]
    samples_split = individual[3]
    samples_leaf = individual[4]
    weight_fraction_leaf = individual[5]
    features = individual[6]
    leaf_nodes = individual[7]
    impurity_decrease = individual[8]
    class_weight = individual[9]
    
    # build the model
    model = DecisionTreeClassifier(criterion=crit, 
                                   splitter=split, 
                                   max_depth=depth, 
                                   min_samples_split=samples_split, 
                                   min_samples_leaf=samples_leaf, 
                                   min_weight_fraction_leaf=weight_fraction_leaf, 
                                   max_features=features, 
                                   random_state=100, 
                                   max_leaf_nodes=leaf_nodes, 
                                   min_impurity_decrease=impurity_decrease, 
                                   class_weight=class_weight).fit(X_train, y_train)
    probs = model.predict_proba(X_test)
    preds = probs[:, 1]
    roc_auc = roc_auc_score(y_test, preds)
    
    return roc_auc,

    


creator.create("FitnessMax", base.Fitness, weights=(1.0,)) # Maximise the fitness function value
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()

# Possible parameter values
criterions = ['gini', 'entropy']
splits = ['best', 'random']
lower_depth, upper_depth = 2, 100 
lower_leaf_nodes, upper_leaf_nodes = 2, 100
class_weight = ['balanced', None]
lower_samples_leaf, upper_samples_leaf  = 0, 0.5


N_CYCLES = 1

toolbox.register("attr_crit", random.choice, criterions)
toolbox.register("attr_splitter", random.choice, splits)
toolbox.register("attr_depth", random.randint, lower_depth, upper_depth)
toolbox.register("attr_flt", random.random)
toolbox.register("attr_leaf_nodes", random.randint, lower_leaf_nodes, upper_leaf_nodes)
toolbox.register("attr_weight", random.choice, class_weight)
toolbox.register("attr_samples_leaf", random.uniform, lower_samples_leaf, upper_samples_leaf)


toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.attr_crit, toolbox.attr_splitter, toolbox.attr_depth,
                 toolbox.attr_flt, toolbox.attr_samples_leaf, toolbox.attr_samples_leaf, 
                 toolbox.attr_flt, toolbox.attr_leaf_nodes,
                 toolbox.attr_flt, toolbox.attr_weight), n=N_CYCLES)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)


toolbox.register("mate", tools.cxOnePoint)
toolbox.register("mutate",mutate)
toolbox.register("select", tools.selTournament, tournsize=2)
toolbox.register("evaluate", evaluate)

population_size = 1000
crossover_probability = 0.7
mutation_probability = 0.01
number_of_generations = 20

pop = toolbox.population(n=population_size)
hof = tools.HallOfFame(1)
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)
pop, log = algorithms.eaSimple(pop, toolbox, cxpb=crossover_probability,
                               mutpb=mutation_probability, ngen=number_of_generations,
                               stats=stats, halloffame=hof, verbose=True)

best_parameters = hof[0] # save the optimal set of parameters





gen = log.select("gen")
max_ = log.select("max")
avg = log.select("avg")
min_ = log.select("min")

evolution = pd.DataFrame({'Generation': gen,
                          'Max AUROC': max_,
                          'Average': avg,
                          'Min AUROC': min_})

plt.title('Parameter Optimisation')
plt.plot(evolution['Generation'], evolution['Min AUROC'], color='C1', label='Min')
plt.plot(evolution['Generation'], evolution['Average'], color='C2', label='Average')
plt.plot(evolution['Generation'], evolution['Max AUROC'], color='C3', label='Max')

plt.legend(loc='lower right')
plt.ylabel('AUROC')
plt.xlabel('Generation')
plt.xticks([0, 5, 10, 15, 20])
plt.show()

Comments

The iterations keep running; once they stop, the run starts again but never reaches a final result. I am applying this to the CIFAR-10 dataset using MLPClassifier and have not reached a conclusion. I trained it for hours but got nothing useful. Any help would be a great support. I also get the following warning on each iteration: "ConvergenceWarning: Stochastic Optimizer: Maximum iterations (2) reached and the optimization hasn't converged yet."

Hello sir, I want to ask again: how can I add output in the form of a metrics summary or classification report to the evaluate() function? The expected output is the ROC curve and a classification report (recall, precision, F1-score, and accuracy).


Hello sir, I'm looking for references and source code for optimising Support Vector Machine (SVM) parameters using genetic algorithms. After reading your article on using genetic algorithms for parameter optimisation in decision trees, could you briefly explain how genetic algorithms would be used for parameter optimisation in SVMs? For example, some key SVM parameters with possible values are C = [0.1, 1, 10, 100], gamma = [0.0001, 0.001, 0.1, 1], and kernel = ['rbf', 'poly', 'sigmoid']. Thank you for your concern.


Hello, sir. Incredible article, very useful. But could you share how you plotted these results? I optimised a neural network using your code as a reference and would like to see the results in graph form. Thanks in advance.


Very interesting article and topic itself. Thank you, Conor!

