Hyperparameter Optimization

My quest to obtain the most efficient model using various hyperparameters led me to learn about hyperparameter optimization. This article is a summary of what I learned and how we can use these algorithms to find the best parameters for computationally expensive objective functions.

What are hyperparameters?

Hyperparameters are parameters used to control the learning process. They are adjustable settings that need to be tuned to get a model with optimal performance. For example, in the Random Forest algorithm the number of estimators (n_estimators) and the maximum depth (max_depth) are hyperparameters that are tuned to get the best performance.
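To see why these settings matter, here is a minimal sketch (assumes scikit-learn is installed; the synthetic dataset and the two specific settings are illustrative, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, just for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Same algorithm, two different hyperparameter settings.
shallow = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0)
deeper = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)

shallow.fit(X_train, y_train)
deeper.fit(X_train, y_train)

# Validation accuracy typically differs between the two settings,
# which is exactly what hyperparameter tuning exploits.
print("shallow:", shallow.score(X_val, y_val))
print("deeper: ", deeper.score(X_val, y_val))
```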

Hyperparameter optimization is the process of finding the combination of hyperparameter values that achieves maximum performance on the data in a reasonable amount of time. It can be represented in simple equation form as:

x* = arg min f(x), x ∈ X

Here f(x) represents an objective score to minimize, such as RMSE or error rate, evaluated on the validation set; x* is the set of hyperparameters that yields the lowest value of the score, and x can take on any value in the domain X.

Some of the strategies for hyperparameter optimization are:

  1. Manual – Manually try parameter values and evaluate which model performs best.
  2. Grid search – The system performs an exhaustive search for the best parameters over a grid of hyperparameter values. This takes a lot of time and can be computationally expensive.
  3. Random search – The system randomly samples hyperparameter values and keeps the best model found. The disadvantage of this technique is that it might miss the best combination.

Grid and random search are completely un-informed by past evaluations, and as a result, often spend a significant amount of time evaluating “bad” hyperparameters.
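To make the two uninformed strategies concrete, here is a minimal pure-Python sketch; the quadratic objective is a hypothetical stand-in for an expensive model-training run:

```python
import itertools
import random

# Hypothetical stand-in for an expensive objective: in practice this
# would train a model and return a validation error.
def objective(x, y):
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

# Grid search: exhaustively evaluate every point on a fixed grid.
grid = [i / 10 for i in range(11)]
grid_best = min(itertools.product(grid, grid), key=lambda p: objective(*p))

# Random search: sample the same number of points uniformly at random.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(121)]
random_best = min(points, key=lambda p: objective(*p))

print("grid best:  ", grid_best, objective(*grid_best))
print("random best:", random_best, objective(*random_best))
```

Note that neither strategy uses the results of earlier evaluations to decide where to look next, which is the gap Bayesian methods fill.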

  4. Bayesian model-based optimization – Build a probability model of the objective function and use it to select the most promising hyperparameters to evaluate against the true objective function.

At a high level, Bayesian optimization methods are efficient because they choose the next hyperparameters in an informed manner. The basic idea is to spend a little more time selecting the next hyperparameters in order to make fewer calls to the objective function. In practice, the time spent selecting the next hyperparameters is inconsequential compared to the time spent in the objective function. By evaluating hyperparameters that appear more promising from past results, Bayesian methods can find better model settings than random search in fewer iterations.

Sequential Model Based Optimization

Sequential model-based optimization (SMBO) methods are a formalization of Bayesian optimization. "Sequential" refers to running trials one after another, each time trying better hyperparameters by applying Bayesian reasoning and updating a probability model (the surrogate).

There are four aspects of model-based hyperparameter optimization:

1.      Domain: The domain is the search space of hyperparameters over which the algorithm searches for the optimum of the objective.

2.      Objective Function: The objective function takes in hyperparameters and outputs a single real-valued score that we want to minimize (or maximize).

While the objective function looks simple, it is very expensive to compute! If the objective function could be quickly calculated, then we could try every single possible hyperparameter combination (like in grid search). If we are using a simple model, a small hyperparameter grid, and a small dataset, then this might be the best way to go. However, in cases where the objective function may take hours or even days to evaluate, we want to limit calls to it.

The entire concept of Bayesian model-based optimization is to reduce the number of times the objective function needs to be run by choosing only the most promising set of hyperparameters to evaluate based on previous calls to the evaluation function. The next set of hyperparameters is selected based on a model of the objective function called a surrogate.

3.      Surrogate Function (Probability Model): The surrogate function, also called the response surface, is a probability representation of the objective function built from previous evaluations. The surrogate takes several different forms, including Gaussian processes, random forest regression, and the Tree-structured Parzen Estimator (TPE). TPE applies Bayes' rule to propose the best possible hyperparameters using the history of previous evaluations.

4.      Selection Function: The selection function is the criterion by which the next set of hyperparameters is chosen from the surrogate function. The most common choice of criterion is Expected Improvement.
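To make the selection criterion concrete, here is a sketch of Expected Improvement for minimization under a Gaussian surrogate posterior; the mean and standard deviation values fed in below are illustrative, not produced by any real surrogate:

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Expected Improvement at a candidate point, for minimization,
    given the surrogate's predicted mean `mu` and standard deviation
    `sigma`, and the best objective value observed so far `f_best`."""
    if sigma == 0.0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (f_best - mu) * cdf + sigma * pdf

# A point predicted to beat the incumbent (mu below f_best) scores higher
# than one predicted to be worse, at the same uncertainty.
print(expected_improvement(mu=0.2, sigma=0.1, f_best=0.5))
print(expected_improvement(mu=0.6, sigma=0.1, f_best=0.5))
```

The selection step then simply proposes the candidate with the highest Expected Improvement for the next (expensive) objective evaluation.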

Implementation:

Hyperopt is a Python library that uses Bayesian optimization for parameter tuning and allows you to get the best parameters for a given model.

Features of Hyperopt

i.        Objective function: The function to minimize; it receives hyperparameters as input from the search space and returns the loss.

ii.      fmin function: The fmin function is the optimization function that iterates over different sets of hyperparameters and minimizes the objective function. It takes the following parameters:

a.      Objective function to minimize

b.     Defined search space

c.      Search algorithm to use, such as random search or TPE (Tree-structured Parzen Estimator)

d.     Maximum number of evaluations

e.     The Trials object (optional)

iii.     Search space: There are different functions to specify ranges of input parameters; these define stochastic search spaces. The most common are:

a.      hp.choice(label, options): Used for categorical parameters; returns one of the options, which should be a list or tuple. For example hp.choice('criterion', ['gini', 'entropy'])

b.     hp.normal(label, mu, sigma): Returns a real value drawn from a normal distribution with mean mu and standard deviation sigma

c.      hp.uniform(label, low, high): Returns a value uniformly distributed between low and high

d.     hp.randint(label, upper): Returns a random integer in the range [0, upper)

iv.    Trials object: The Trials object keeps a record of all hyperparameters, losses, and other information, which can be accessed after running the optimization.

Detailed code to understand this can be checked here.

References:

  • https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f
  • INSAID class material

