Bayesian Optimization
My colleague Wayne Thompson wrote a series of blog posts about machine learning best practices, and the fifth post in the series is Autotune models to avoid local minimum breakdowns. In that post, Wayne mentions two autotune methods, grid search and Bayesian optimization. According to another colleague's paper, Automated Hyperparameter Tuning for Effective Machine Learning, Bayesian optimization is currently popular for hyperparameter optimization. Recently I read a post on GitHub that walks through the Bayesian optimization procedure with a great demo, and I wondered whether I could implement that demo with SAS/IML. In this article, I will show you how Bayesian optimization works through this simple demo.
Revisit Bayesian Optimization
Bayesian optimization builds a surrogate model of the black-box function that maps hyperparameters to the objective, based on the observations collected so far, and uses an acquisition function to select the next hyperparameter setting to evaluate.
1) Surrogate model
The function between hyperparameters and the objective is often a black box, and Gaussian Process regression models are popular surrogates because of their capability to fit black-box functions with limited observations. For more details about Gaussian Process regression, please refer to the book Gaussian Processes for Machine Learning.
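The article's implementation is in SAS/IML, but the surrogate's key computation, the GP posterior at candidate points, can be sketched in a few lines of Python with NumPy. The kernel choice and function names here are illustrative, not the macro names used later:

```python
import numpy as np

def rbf_kernel(A, B, length=1.0, sigma_f=1.0):
    # Squared-exponential (RBF) kernel between two sets of 1-D points
    sqdist = (A[:, None] - B[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * sqdist / length**2)

def gp_posterior(X_s, X_train, Y_train, length=1.0, sigma_f=1.0, sigma_y=1e-8):
    # Posterior mean and covariance of a GP at test points X_s,
    # given (possibly noisy) training observations (X_train, Y_train)
    K = rbf_kernel(X_train, X_train, length, sigma_f) \
        + sigma_y**2 * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_s, length, sigma_f)
    K_ss = rbf_kernel(X_s, X_s, length, sigma_f)
    Kinv_Ks = np.linalg.solve(K, K_s)     # K^{-1} K_s without an explicit inverse
    mu_s = Kinv_Ks.T @ Y_train
    cov_s = K_ss - K_s.T @ Kinv_Ks
    return mu_s, cov_s
```

When sigma_y is near zero, the posterior mean interpolates the observations and the posterior variance collapses at observed points, while growing back toward the prior variance far from the data; that is exactly the uncertainty structure the acquisition function exploits.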
2) Acquisition function and Location Propose function
The acquisition function evaluates points in the search space, and the location propose function finds the point that is likely to lead to the largest improvement and is therefore worth trying next. The acquisition function trades off exploitation against exploration: exploitation favors points with high predicted objective values, while exploration favors points with high predictive variance. There are several acquisition functions; like the original post, I use expected improvement.
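Expected improvement at a single candidate point has a closed form under the GP posterior. A minimal Python sketch for maximization (the function name and the xi exploration parameter are mine, not the post's):

```python
import math

def expected_improvement(mu, sigma, y_best, xi=0.01):
    # Closed-form EI for maximization: E[max(f(x) - y_best - xi, 0)]
    # under a Gaussian posterior N(mu, sigma^2) at the candidate point.
    if sigma < 1e-9:
        return 0.0                      # degenerate posterior: treat as no gain
    imp = mu - y_best - xi              # predicted improvement over the incumbent
    z = imp / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf      # exploitation term + exploration term
```

The two terms make the trade-off explicit: a high posterior mean inflates the first (exploitation) term, while a high posterior standard deviation inflates the second (exploration) term.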
3) Sequential optimization procedure
Bayesian optimization is a sequential optimization procedure, and it proceeds as follows.
For t=1, 2, ... do
- Find the next sampling point by optimizing the acquisition function over the Gaussian Process (GP).
- Obtain a possibly noisy sample from the objective function.
- Add the sample to previous samples and update the GP.
end for
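The loop above can be sketched end to end in Python with NumPy, using a grid search over candidates in place of a continuous optimizer for the acquisition function. All names here are illustrative; the SAS/IML version follows below:

```python
import math
import numpy as np

def f(x):
    # Same target function as the SAS/IML demo below (noise added separately)
    return -np.sin(3 * x) - x**2 + 0.7 * x

def gp_posterior(Xs, X, Y, noise=0.2):
    # GP posterior mean/std at grid points Xs; RBF kernel with unit scales
    k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :])**2)
    K = k(X, X) + noise**2 * np.eye(len(X))
    Ks = k(X, Xs)
    Kinv_Ks = np.linalg.solve(K, Ks)
    mu = Kinv_Ks.T @ Y
    var = 1.0 - np.sum(Ks * Kinv_Ks, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, y_best, xi=0.01):
    z = (mu - y_best - xi) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - y_best - xi) * cdf + sigma * pdf

rng = np.random.default_rng(123)
noise = 0.2
Xgrid = np.arange(-1.0, 2.0, 0.01)           # search space
X_train = np.array([-0.9, 1.1])              # initial samples
Y_train = f(X_train) + noise * rng.standard_normal(2)

for t in range(10):
    # 1) update the GP with existing samples
    mu, sigma = gp_posterior(Xgrid, X_train, Y_train, noise)
    # 2) propose the next point by maximizing the acquisition function
    x_next = Xgrid[np.argmax(expected_improvement(mu, sigma, Y_train.max()))]
    # 3) obtain a noisy sample and add it to the training set
    y_next = f(x_next) + noise * rng.standard_normal()
    X_train = np.append(X_train, x_next)
    Y_train = np.append(Y_train, y_next)
```

This sketch fixes the kernel hyperparameters for brevity; the SAS/IML implementation below instead re-fits them by maximizing the marginal likelihood at each iteration.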
Implementation with SAS/IML
1) Gaussian Process Regression
The macro gpRegression fits a Gaussian Process regression model by maximizing the marginal likelihood. For details, please read Chapter 5 of the book Gaussian Processes for Machine Learning.
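The quantity being maximized can be written down directly. As a Python sketch of the log marginal likelihood for an RBF-kernel GP (the parameterization of theta is an assumption of mine; the gpRegression macro may organize its parameters differently):

```python
import numpy as np

def log_marginal_likelihood(theta, X, Y):
    # Log marginal likelihood of a GP with an RBF kernel;
    # theta = (length-scale, signal std sigma_f, noise std sigma_n).
    length, sigma_f, sigma_n = theta
    sqdist = (X[:, None] - X[None, :]) ** 2
    K = sigma_f**2 * np.exp(-0.5 * sqdist / length**2) \
        + sigma_n**2 * np.eye(len(X))
    # Cholesky-based evaluation: numerically stabler than a direct inverse,
    # and log det K falls out as twice the sum of log-diagonal entries of L
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return (-0.5 * Y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(X) * np.log(2.0 * np.pi))
```

Fitting the model then means handing the negative of this function to a numerical optimizer over theta; in SAS/IML that role is played by the built-in nonlinear optimization routines.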
2) Bayesian Optimization
The macro bayesianOptimization includes two functions: the acquisition function and the location propose function.
3) Visualization
The macro plotAll visualizes the step-by-step optimization procedure.
The three macros above and the following demo code are available in the appendix.
Demo
proc iml;
%gpRegression;
%bayesianOptimization;
* Target function;
start f(X, noise);
   return (-sin(3*X) - X##2 + 0.7*X + noise * randfun(nrow(X), "Normal"));
finish;
* Search space of sampling points;
X = T(do(-1, 1.99, 0.01));
call randseed(123);
Y = f(X, 0);
* Initialize samples;
X_train ={-0.9, 1.1};
noise = 0.2;
Y_train = f(X_train, noise);
* Initial parameters of GP regression model;
gprParms = {1 1 0};
* Max iterations for sequential Bayesian Optimization;
n_iter = 15;
do i=1 to n_iter;
   * Update the Gaussian process with existing samples;
   gprParms = gprFit(gprParms);
   * Obtain the next sampling point from the acquisition function;
   proposeResults = proposeLocation(X, X_train, Y_train, gprParms);
   X_next = proposeResults$"max_x";
   if X_next=. then leave;
   * Obtain the next noisy sample from the objective function;
   Y_next = f(X_next, noise);
   * Add the sample to previous samples;
   X_train = X_train//X_next;
   Y_train = Y_train//Y_next;
   * Save all proposed sampling points into a matrix;
   allProposed = allProposed//(j(1, 1, i)||X_next);
end;
* Save all proposed sampling points into a SAS dataset;
create allProposed from allProposed [colname={"Iteration" "X"}];
append from allProposed;
close allProposed;
run;
quit;
Visualize the step-by-step Bayesian Optimization procedure
Figure-1 Target function and initial two samples
Figure-2 Proposed location and acquisition function plot of the first two iterations.
To save space, I will skip the other iterations and jump to the last two.
Figure-3 Proposed location and acquisition function plot of the last two iterations.
Figure-4 Distance between consecutive proposed points by iteration.
The code for the three macros is posted on GitHub.
Some of the IML code is version specific, I presume? For example, mu = predictResults$"mu_s" doesn't work in 9.4 and must be written as mu = ListGetItem(predictResults, "mu_s"); likewise, leave doesn't work in some do-loops in 9.4 and needs a do until ... instead.
I modified the SAS code to create an animated GIF demonstrating the step-by-step optimization; please check GitHub for the latest SAS code.
I do not have IML. Would it be harder to implement the algorithm without IML?