Bayesian Optimization
My colleague Wayne Thompson wrote a series of blog posts about machine learning best practices, and the fifth post in the series is Autotune models to avoid local minimum breakdowns. In that post, Wayne mentions two autotune methods, grid search and Bayesian optimization. According to another colleague's paper, Automated Hyperparameter Tuning for Effective Machine Learning, Bayesian optimization is currently popular for hyperparameter optimization. Recently I read a post on GitHub that walks through the Bayesian optimization procedure with a great demo, and I wondered whether I could implement that demo with SAS/IML. In this article, I will show you how Bayesian optimization works through this simple demo.
Revisit Bayesian Optimization
Bayesian optimization builds a surrogate model of the black-box function that maps hyperparameters to the objective, based on the observations collected so far, and uses an acquisition function to select the next hyperparameter setting to evaluate.
1) Surrogate model
The function between hyperparameters and the objective is often a black box, and Gaussian Process regression models are popular surrogates because of their capability to fit black-box functions with limited observations. For more details about Gaussian Process regression, please refer to the book Gaussian Processes for Machine Learning.
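The article's implementation is in SAS/IML, but the surrogate's key computation, the GP posterior at candidate points, can be sketched in a few lines of Python with NumPy. The kernel choice and function names here are illustrative, not the macro names used later:

```python
import numpy as np

def rbf_kernel(A, B, length=1.0, sigma_f=1.0):
    # Squared-exponential (RBF) kernel between two sets of 1-D points
    sqdist = (A[:, None] - B[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * sqdist / length**2)

def gp_posterior(X_s, X_train, Y_train, length=1.0, sigma_f=1.0, sigma_y=1e-8):
    # Posterior mean and covariance of a GP at test points X_s,
    # given (possibly noisy) training observations (X_train, Y_train)
    K = rbf_kernel(X_train, X_train, length, sigma_f) \
        + sigma_y**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_s, length, sigma_f)
    K_ss = rbf_kernel(X_s, X_s, length, sigma_f)
    Kinv_Ks = np.linalg.solve(K, K_s)     # K^{-1} K_s without an explicit inverse
    mu_s = Kinv_Ks.T @ Y_train
    cov_s = K_ss - K_s.T @ Kinv_Ks
    return mu_s, cov_s
```

When sigma_y is near zero, the posterior mean interpolates the observations and the posterior variance collapses at observed points, while growing back toward the prior variance far from the data; that is exactly the uncertainty structure the acquisition function exploits.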
2) Acquisition function and Location Propose function
The acquisition function evaluates points in the search space, and the location propose function finds the point that is likely to lead to the largest improvement and is therefore worth trying next. The acquisition function trades off exploitation against exploration: exploitation favors points with high predicted objective values, while exploration favors points with high predictive variance. There are several acquisition functions; like the original post, I use expected improvement.
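Expected improvement at a single candidate point has a closed form under the GP posterior. A minimal Python sketch for maximization (the function name and the xi exploration parameter are mine, not the post's):

```python
import math

def expected_improvement(mu, sigma, y_best, xi=0.01):
    # Closed-form EI for maximization: E[max(f(x) - y_best - xi, 0)]
    # under a Gaussian posterior N(mu, sigma^2) at the candidate point.
    if sigma < 1e-9:
        return 0.0                      # degenerate posterior: treat as no gain
    imp = mu - y_best - xi              # predicted improvement over the incumbent
    z = imp / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf      # exploitation term + exploration term
```

The two terms make the trade-off explicit: a high posterior mean inflates the first (exploitation) term, while a high posterior standard deviation inflates the second (exploration) term.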
3) Sequential optimization procedure
Bayesian optimization is a sequential optimization procedure, and it proceeds as follows.
For t=1, 2, ... do
- Find the next sampling point by optimizing the acquisition function over the Gaussian Process (GP).
- Obtain a possibly noisy sample from the objective function.
- Add the sample to previous samples and update the GP.
end for
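The loop above can be sketched end to end in Python with NumPy, using a grid search over candidates in place of a continuous optimizer for the acquisition function. All names here are illustrative; the SAS/IML version follows below:

```python
import math
import numpy as np

def f(x):
    # Same target function as the SAS/IML demo below (noise added separately)
    return -np.sin(3 * x) - x**2 + 0.7 * x

def gp_posterior(Xs, X, Y, noise=0.2):
    # GP posterior mean/std at grid points Xs; RBF kernel with unit scales
    k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :])**2)
    K = k(X, X) + noise**2 * np.eye(len(X))
    Ks = k(X, Xs)
    Kinv_Ks = np.linalg.solve(K, Ks)
    mu = Kinv_Ks.T @ Y
    var = 1.0 - np.sum(Ks * Kinv_Ks, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, y_best, xi=0.01):
    z = (mu - y_best - xi) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - y_best - xi) * cdf + sigma * pdf

rng = np.random.default_rng(123)
noise = 0.2
Xgrid = np.arange(-1.0, 2.0, 0.01)           # search space
X_train = np.array([-0.9, 1.1])              # initial samples
Y_train = f(X_train) + noise * rng.standard_normal(2)

for t in range(10):
    # 1) update the GP with existing samples
    mu, sigma = gp_posterior(Xgrid, X_train, Y_train, noise)
    # 2) propose the next point by maximizing the acquisition function
    x_next = Xgrid[np.argmax(expected_improvement(mu, sigma, Y_train.max()))]
    # 3) obtain a noisy sample and add it to the training set
    y_next = f(x_next) + noise * rng.standard_normal()
    X_train = np.append(X_train, x_next)
    Y_train = np.append(Y_train, y_next)
```

This sketch fixes the kernel hyperparameters for brevity; the SAS/IML implementation below instead re-fits them by maximizing the marginal likelihood at each iteration.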
Implementation with SAS/IML
1) Gaussian Process Regression
The macro gpRegression fits a Gaussian Process regression model by maximizing the marginal likelihood. For details, please read Chapter 5 of the book Gaussian Processes for Machine Learning.
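The quantity being maximized can be written down directly. As a Python sketch of the log marginal likelihood for an RBF-kernel GP (the parameterization of theta is an assumption of mine; the gpRegression macro may organize its parameters differently):

```python
import numpy as np

def log_marginal_likelihood(theta, X, Y):
    # Log marginal likelihood of a GP with an RBF kernel;
    # theta = (length-scale, signal std sigma_f, noise std sigma_n).
    length, sigma_f, sigma_n = theta
    sqdist = (X[:, None] - X[None, :]) ** 2
    K = sigma_f**2 * np.exp(-0.5 * sqdist / length**2) \
        + sigma_n**2 * np.eye(len(X))
    # Cholesky-based evaluation: numerically stabler than a direct inverse,
    # and log det K falls out as twice the sum of log-diagonal entries of L
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return (-0.5 * Y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(X) * np.log(2.0 * np.pi))
```

Fitting the model then means handing the negative of this function to a numerical optimizer over theta; in SAS/IML that role is played by the built-in nonlinear optimization routines.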
2) Bayesian Optimization
The macro bayesianOptimization includes two functions: the acquisition function and the location propose function.
3) Visualization
The macro plotAll visualizes the step-by-step optimization procedure.
The three macros above and the following demo code are available in the appendix.
Demo
proc iml;
%gpRegression;
%bayesianOptimization;
* Target function;
start f(X, noise);
   return (-sin(3*X) - X##2 + 0.7*X + noise * randfun(nrow(X), "Normal"));
finish;
* Search space of sampling points;
X = T(do(-1, 1.99, 0.01));
call randseed(123);
Y = f(X, 0);
* Initialize samples;
X_train ={-0.9, 1.1};
noise = 0.2;
Y_train = f(X_train, noise);
* Initial parameters of GP regression model;
gprParms = {1 1 0};
* Max iterations for sequential Bayesian Optimization;
n_iter = 15;
do i=1 to n_iter;
   * Update the Gaussian process with existing samples;
   gprParms = gprFit(gprParms);
   * Obtain the next sampling point from the acquisition function;
   proposeResults = proposeLocation(X, X_train, Y_train, gprParms);
   X_next = proposeResults$"max_x";
   if X_next=. then leave;
   * Obtain the next noisy sample from the objective function;
   Y_next = f(X_next, noise);
   * Add the sample to previous samples;
   X_train = X_train//X_next;
   Y_train = Y_train//Y_next;
   * Save all proposed sampling points into a matrix;
   allProposed = allProposed//(j(1, 1, i)||X_next);
end;
* Save all proposed sampling points into a SAS dataset;
create allProposed from allProposed [colname={"Iteration" "X"}];
append from allProposed;
close allProposed;
run;
quit;
Visualize the step-by-step Bayesian Optimization procedure
Figure-1 Target function and initial two samples
Figure-2 Proposed location and acquisition function plot of the first two iterations.
To save space, I will skip the other iterations and jump to the last two.
Figure-3 Proposed location and acquisition function plot of the last two iterations.
Figure-4 Distance between consecutive proposed points by iteration.
The code for the three macros is posted on GitHub.
Some of the IML code is version specific, I presume? For example, mu = predictResults$"mu_s" doesn't work in 9.4 and must be written as mu = ListGetItem(predictResults, "mu_s"); likewise, leave doesn't work in some do-loops in 9.4 and needs a do until ... instead.
I modified the SAS code to create an animated GIF demonstrating the step-by-step optimization; please check GitHub for the latest SAS code.
I do not have IML. Would it be harder to implement the algorithm without IML?