Bayesian Optimization

My colleague Wayne Thompson wrote a series of blog posts about machine learning best practices; the fifth post in the series is Autotune models to avoid local minimum breakdowns. In that post, Wayne mentioned two autotune methods, grid search and Bayesian optimization. According to another colleague's paper, Automated Hyperparameter Tuning for Effective Machine Learning, Bayesian optimization is currently a popular method for hyperparameter optimization. Recently I read a post on GitHub that demonstrated the Bayesian optimization procedure through a great demo, and I wondered whether I could implement that demo with SAS/IML. In this article, I will show you how Bayesian optimization works through this simple demo.

Revisiting Bayesian Optimization

Bayesian optimization builds a surrogate model of the black-box function that maps hyperparameters to the objective value, based on the observations collected so far, and then uses an acquisition function to select the next hyperparameter setting to evaluate.

1)     Surrogate model

The function that maps hyperparameters to the objective value is typically a black box, and Gaussian Process regression models are popular surrogates because of their ability to fit black-box functions from a limited number of observations. For more details about Gaussian Process regression, please refer to the book Gaussian Processes for Machine Learning.
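
To make this concrete, below is a minimal sketch of Gaussian Process posterior prediction with an RBF kernel in SAS/IML. This is my own illustration, not the gpRegression macro from the appendix; the module names rbfKernel and gpPosterior and the parameterization (length scale l, signal standard deviation sigma_f, noise standard deviation sigma_y) are assumptions.

proc iml;
   /* RBF kernel matrix between column vectors X1 (n x 1) and X2 (m x 1):
      K[i,j] = sigma_f##2 * exp( -0.5*(X1[i]-X2[j])##2 / l##2 )          */
   start rbfKernel(X1, X2, l, sigma_f);
      n = nrow(X1);  m = nrow(X2);
      d2 = ( X1*j(1,m,1) - j(n,1,1)*T(X2) )##2;   /* squared distances  */
      return( sigma_f##2 # exp(-0.5 # d2 / l##2) );
   finish;

   /* Posterior mean and pointwise variance of the GP at the points X_s,
      given noisy training samples (X_train, Y_train)                    */
   start gpPosterior(X_s, X_train, Y_train, l, sigma_f, sigma_y);
      K    = rbfKernel(X_train, X_train, l, sigma_f)
             + sigma_y##2 # I(nrow(X_train));
      K_s  = rbfKernel(X_train, X_s, l, sigma_f);
      Kinv = inv(K);
      mu_s  = T(K_s) * Kinv * Y_train;                  /* posterior mean */
      var_s = vecdiag( rbfKernel(X_s, X_s, l, sigma_f)
                       - T(K_s) * Kinv * K_s );         /* posterior var  */
      return( mu_s || var_s );
   finish;
quit;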

2)     Acquisition function and location propose function

The acquisition function evaluates points in the search space, and the location propose function finds the point that is most likely to yield the maximum improvement and is therefore worth trying next. The acquisition function trades off exploitation against exploration: exploitation favors points with high predicted objective values, while exploration favors points with high predictive variance. There are several acquisition functions; like the original post, I used expected improvement (EI), sketched below.
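
Here is a minimal sketch of expected improvement, assuming mu and sigma are the GP posterior mean and standard deviation at the candidate points and mu_best is the best posterior mean at the points sampled so far. The module name and the exploration parameter xi are my own choices; the module is meant to run in the same PROC IML session as the earlier sketch.

   /* Expected improvement: EI = imp*CDF(Z) + sigma*PDF(Z),
      where imp = mu - mu_best - xi, Z = imp/sigma, and EI = 0 where
      sigma = 0 (no posterior uncertainty)                             */
   start expectedImprovement(mu, sigma, mu_best, xi);
      imp = mu - mu_best - xi;
      Z   = imp / (sigma <> 1e-12);           /* guard division by 0  */
      ei  = imp # cdf("Normal", Z) + sigma # pdf("Normal", Z);
      return( choose(sigma > 0, ei, 0) );
   finish;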

3)     Sequential optimization procedure

Bayesian optimization is a sequential optimization procedure that works as follows; the demo later in this article implements exactly this loop.

For t=1, 2, ... do

  • Find the next sampling point by optimizing the acquisition function over the Gaussian Process (GP).
  • Obtain a possibly noisy sample from the objective function.
  • Add the sample to previous samples and update the GP.

end for

Implementation with SAS/IML

1)     Gaussian Process Regression

The macro gpRegression fits a Gaussian Process regression model by maximizing the marginal likelihood. For more details, please read Chapter 5 of Gaussian Processes for Machine Learning.
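
As an illustration only (the actual macro may differ), here is a sketch of the negative log marginal likelihood that such a fit minimizes. It reuses the hypothetical rbfKernel module from the earlier sketch and assumes the parameter ordering gprParms = {l sigma_f sigma_y}, matching the three initial values used in the demo below.

   /* Negative log marginal likelihood of the GP hyperparameters
      parms = {l sigma_f sigma_y}; X_train and Y_train are assumed to
      hold the samples collected so far                                 */
   start negLogMarginalLik(parms) global(X_train, Y_train);
      n = nrow(X_train);
      K = rbfKernel(X_train, X_train, parms[1], parms[2])
          + parms[3]##2 # I(n);
      return( 0.5 * T(Y_train) * inv(K) * Y_train
              + 0.5 * log(det(K))
              + 0.5 * n * log(2*constant("pi")) );
   finish;

   /* The hyperparameters could then be tuned with one of IML's
      nonlinear optimizers, e.g. quasi-Newton (bounds omitted):        */
   parms0 = {1 1 0.1};
   optn   = {0 0};                      /* 0 = minimize, no printing   */
   call nlpqn(rc, gprParms, "negLogMarginalLik", parms0, optn);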

2)     Bayesian Optimization

The macro bayesianOptimization includes two functions: the acquisition function and the location propose function.
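
A sketch of what the location propose step could look like, reusing the hypothetical gpPosterior and expectedImprovement modules from the earlier sketches. The name proposeNext is my own; the appendix's proposeLocation function (which returns a list with the key "max_x") may be implemented differently.

   /* Score every candidate in the search grid X with expected
      improvement and return the candidate with the largest score      */
   start proposeNext(X, X_train, Y_train, l, sigma_f, sigma_y, xi);
      post   = gpPosterior(X, X_train, Y_train, l, sigma_f, sigma_y);
      mu     = post[,1];
      sigma  = sqrt(post[,2] <> 0);          /* floor tiny negatives   */
      postTr = gpPosterior(X_train, X_train, Y_train, l, sigma_f, sigma_y);
      ei     = expectedImprovement(mu, sigma, max(postTr[,1]), xi);
      return( X[ ei[<:>] ] );                /* candidate with max EI  */
   finish;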

3)     Visualization

The macro plotAll visualizes the optimization procedure step by step.
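
For reference, a single plotting step might look like the sketch below, which is not the plotAll macro: write the grid, the target function, and the GP posterior mean to a data set from IML, then overlay them with PROC SGPLOT. The data set name gpFit is hypothetical, and mu here is only a stand-in for the fitted posterior mean.

proc iml;
   X  = T( do(-1, 1.99, 0.01) );          /* search grid from the demo */
   Y  = -sin(3*X) - X##2 + 0.7*X;         /* noise-free target         */
   mu = Y;                                /* placeholder for GP mean   */
   create gpFit var {"X" "Y" "mu"};  append;  close gpFit;
quit;

proc sgplot data=gpFit;
   series x=X y=Y  / legendlabel="Target function";
   series x=X y=mu / legendlabel="GP posterior mean"
                     lineattrs=(pattern=dash);
run;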

The three macros above and the demo code below are available in the appendix.

Demo

proc iml;
   %gpRegression;
   %bayesianOptimization;  
 
   * Target function;
   start f(X, noise=0);
       return (-sin(3*X) - X##2 + 0.7*X + noise * randfun(nrow(X), "Normal"));
   finish;   
 
   * Search space of sampling points;
   X = T(do(-1, 1.99, 0.01));
   call randseed(123);
   Y = f(X, 0);   
 
   * Initialize samples;
   X_train ={-0.9, 1.1};
   noise = 0.2;
   Y_train = f(X_train, noise);  
   
   * Initial parameters of GP regression model;
   gprParms = {1 1 0};
   
   * Max iterations for sequential Bayesian Optimization;
   n_iter = 15;
   do i=1 to n_iter;
      * Update Gaussian process with existing samples;
      gprParms = gprFit(gprParms);
   
      * Obtain next sampling point by optimizing the acquisition function;
      proposeResults = proposeLocation(X, X_train, Y_train, gprParms);
      X_next = proposeResults$"max_x";
      if X_next=. then leave;
    
      * Obtain next noisy sample from the objective function;
      Y_next = f(X_next, noise);
    
      * Add sample to previous samples;
      X_train = X_train//X_next;
      Y_train = Y_train//Y_next;  
      
      * Save all proposed sampling points into a matrix;
      allProposed = allProposed//(j(1, 1, i)||X_next);
   end;
   
   * Save all proposed sampling points into a SAS dataset;
   create allProposed from allProposed [colname={"Iteration" "X"}];
   append from allProposed;
   close allProposed;
run;
quit;

Visualize the step-by-step Bayesian Optimization procedure


Figure-1 Target function and initial two samples


Figure-2 Proposed location and acquisition function plot of the first two iterations.

To save space, I will skip the other iterations and jump to the last two.


Figure-3 Proposed location and acquisition function plot of the last two iterations.


Figure-4 Distance between consecutive proposed points by iteration.
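
Figure-4 is a convergence check: as the proposals cluster around the optimum, the distance between consecutive proposed points shrinks toward zero. A minimal sketch of how those distances might be computed from the allProposed data set created in the demo (the output data set name proposalDist is my own):

proc iml;
   use allProposed;  read all var {"X"} into xp;  close allProposed;
   n    = nrow(xp);
   dist = abs( xp[2:n] - xp[1:n-1] );     /* gaps between consecutive
                                             proposals                 */
   iter = T( 2:n );
   create proposalDist var {"iter" "dist"};  append;  close proposalDist;
quit;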

The code for all three macros is posted on GitHub.

Comments

Some IML code is version specific, I presume? For example, mu = predictResults$"mu_s" doesn't work in 9.4 and becomes mu = ListGetItem(predictResults, "mu_s"); likewise, the leave statement in some do-loops doesn't work in 9.4 and becomes do until ...

I modified the SAS code to create an animated GIF demonstrating the step-by-step optimization; please check GitHub for the latest SAS code.

I do not have IML. Would it be harder to implement the algorithms without IML?
