How to Optimize a Model with Optuna
Optuna is an automatic hyperparameter optimization framework designed for machine learning.
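Before touching our dataset, here is a minimal sketch of Optuna's define-by-run API: the objective function samples values through a trial object, and the study searches for the optimum. The function and bounds here are purely illustrative, not part of our pipeline:

import optuna

# Toy objective: minimize (x - 2)^2 over x in [-10, 10].
def toy_objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 2) ** 2

study = optuna.create_study(direction='minimize')
study.optimize(toy_objective, n_trials=20)
print(study.best_params)  # x should land close to 2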
Now let's implement the optimization of a model on a real dataset.
We are using kidney stone data: a binary classification task that predicts whether a patient has a kidney stone or not. The data used in this analysis can be found on Kaggle.
First, import the required libraries:
import pandas as pd
import numpy as np
import optuna
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.inspection import permutation_importance
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split, RepeatedStratifiedKFold
Next, load the data and create a hold-out split (we will also use cross-validation later):
train_df = pd.read_csv('train.csv')
X = train_df.drop(['target'], axis=1)
y = train_df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=43)
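Because we will use stratified cross-validation and ROC AUC below, it is worth a quick look at how balanced the classes are (assuming the target column is named 'target', as above):

# Quick sanity check: fraction of each class in the target.
print(y.value_counts(normalize=True))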
Now, visualize the feature correlations with a heatmap.
corr = train_df.corr()
mask = np.triu(corr)  # mask the redundant upper triangle
sns.heatmap(corr, mask=mask, annot=True, fmt='.3f')
plt.show()
Now, let's draw a pair plot to see the distribution of the data and spot any trends by class.
sns.pairplot(data=train_df, hue='target', corner=True,
             plot_kws={'s': 80, 'edgecolor': 'white', 'linewidth': 2.5},
             palette='viridis')
Now that we have some idea of the data from the plots above, let's optimize the hyperparameters.
optuna.logging.set_verbosity(optuna.logging.INFO)

def objective(trial):
    params = {
        'verbosity': 0,
        'n_estimators': trial.suggest_int('n_estimators', 50, 1500),
        'learning_rate': trial.suggest_float('learning_rate', 1e-7, 1e-1),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.1, 1.0),
        'alpha': trial.suggest_float('alpha', 1e-5, 1e2),
        'lambda': trial.suggest_float('lambda', 1e-5, 1e2),
        'objective': 'binary:logistic',
        'eval_metric': 'auc',
        'booster': trial.suggest_categorical('booster', ['dart', 'gbtree', 'gblinear']),
        'min_child_weight': trial.suggest_int('min_child_weight', 0, 5),
        'tree_method': 'gpu_hist'  # requires a GPU; use 'hist' on CPU
    }
    # Score each candidate with repeated stratified k-fold CV.
    kf = RepeatedStratifiedKFold(n_splits=10, n_repeats=2, random_state=42)
    scores = []
    for train_idx, test_idx in kf.split(X, y):
        X_train_fold, X_val_fold = X.iloc[train_idx], X.iloc[test_idx]
        y_train_fold, y_val_fold = y.iloc[train_idx], y.iloc[test_idx]
        xgb_model = XGBClassifier(**params)
        xgb_model.fit(X_train_fold, y_train_fold)
        # Use predicted probabilities, not hard labels, for ROC AUC.
        y_pred = xgb_model.predict_proba(X_val_fold)[:, 1]
        scores.append(roc_auc_score(y_val_fold, y_pred))
    return np.mean(scores)

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=10, n_jobs=-1)
study.best_params
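Optuna also ships built-in (Plotly-based) visualizations for inspecting the search; a quick sketch, assuming Plotly is installed:

# Trial values over time, and the relative importance of each parameter.
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()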
From the above run, Optuna returned the following optimized hyperparameters:
xgb_optuna_params = {'n_estimators': 521,
                     'learning_rate': 0.048058528487311035,
                     'max_depth': 19,
                     'colsample_bytree': 0.45072121994519687,
                     'alpha': 25.06956546276981,
                     'lambda': 12.722971177461535,
                     'booster': 'gbtree',
                     'min_child_weight': 4}
Let's compare the output of the default and the optimized parameters.
Default Parameters
kf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)
default_param_scores = []
for train_idx, val_idx in kf.split(X, y):
    X_train, y_train = X.iloc[train_idx], y.iloc[train_idx]
    X_val, y_val = X.iloc[val_idx], y.iloc[val_idx]
    model = XGBClassifier().fit(X_train, y_train)
    y_pred = model.predict_proba(X_val)
    score = roc_auc_score(y_val, y_pred[:, 1])
    default_param_scores.append(score)

print(np.array(default_param_scores).mean())  # 0.7515
Optimized Parameters
kf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)
optimize_param_scores = []
for train_idx, val_idx in kf.split(X, y):
    X_train, y_train = X.iloc[train_idx], y.iloc[train_idx]
    X_val, y_val = X.iloc[val_idx], y.iloc[val_idx]
    model = XGBClassifier(**xgb_optuna_params).fit(X_train, y_train)
    y_pred = model.predict_proba(X_val)
    score = roc_auc_score(y_val, y_pred[:, 1])
    optimize_param_scores.append(score)

print(np.array(optimize_param_scores).mean())  # 0.7793
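Since both runs use the same splitter seed, the fold-level scores are paired, so we can also compare their spread directly; a small sketch with matplotlib:

# Compare the per-fold AUC distributions side by side.
plt.boxplot([default_param_scores, optimize_param_scores])
plt.xticks([1, 2], ['default', 'optuna'])
plt.ylabel('ROC AUC per fold')
plt.show()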
We can clearly see the difference after only a few trials, and we can optimize the parameters further by giving Optuna more trials and more time.
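We imported permutation_importance at the top but haven't used it yet; as a closing sketch (re-using the same 80/20 split, with variable names of my choosing), here is one way to see which features the tuned model leans on:

# Refit the tuned model on a hold-out split and rank the features.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=43)
final_model = XGBClassifier(**xgb_optuna_params).fit(X_tr, y_tr)
result = permutation_importance(final_model, X_te, y_te,
                                scoring='roc_auc', n_repeats=10,
                                random_state=42)
for idx in result.importances_mean.argsort()[::-1]:
    print(f'{X.columns[idx]}: {result.importances_mean[idx]:.4f}')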
The full code will be available on GitHub.
Feel free to reach me at jivaniutsav007@gmail.com with any questions, concerns, or suggestions!
I hope you will find it insightful!