Grid Search vs. Randomized Search: A Didactic Exploration with Practical Example
In supervised machine learning, the performance of models such as decision trees, support vector machines, or neural networks is highly influenced by hyperparameters — configuration settings that are not learned from data but defined prior to training. Finding the optimal set of hyperparameters is crucial to achieving better accuracy and generalization. Two widely used methods for hyperparameter tuning are Grid Search and Randomized Search.
This article introduces these two techniques with a didactic perspective, explains their differences, and demonstrates their application using a well-known dataset from the scikit-learn library.
What is Hyperparameter Tuning?
Hyperparameter tuning refers to the process of choosing a set of optimal hyperparameters for a learning algorithm. Examples of hyperparameters include:
- The regularization strength C and kernel coefficient gamma of a support vector machine
- The maximum depth of a decision tree
- The learning rate and number of layers of a neural network
An incorrect combination can lead to underfitting or overfitting. Hence, tuning these values is essential for model performance.
Grid Search: Exhaustive and Structured
Grid Search systematically works through multiple combinations of parameter values, cross-validating as it goes to determine which combination gives the best performance.
Pros:
- Exhaustive: every combination in the grid is evaluated, so the best configuration within the grid is guaranteed to be found
- Deterministic and easy to reproduce
Cons:
- The number of model fits multiplies with each added hyperparameter, making large grids very expensive
- Continuous parameters must be discretized in advance, so good values between grid points are never tried
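To make the exhaustive enumeration concrete, scikit-learn's ParameterGrid (the helper GridSearchCV uses internally) can list every combination a grid implies. A minimal sketch:

```python
from sklearn.model_selection import ParameterGrid

# A 3 x 2 grid: every pairing of C and gamma will be tried
grid = ParameterGrid({'C': [0.1, 1, 10], 'gamma': [0.01, 0.001]})

combos = list(grid)
print(len(combos))  # 3 * 2 = 6 combinations
for combo in combos:
    print(combo)
```

Each of these six dictionaries becomes one candidate model, fitted once per cross-validation fold.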
Randomized Search: Efficient and Scalable
Randomized Search samples a given number of candidates from a parameter space with a specified distribution. Instead of trying all combinations, it picks combinations randomly.
Pros:
- The computational budget is fixed by n_iter, independent of the size of the parameter space
- Continuous parameters can be sampled from distributions rather than a fixed list of values
- Often finds a near-optimal configuration in a fraction of the time
Cons:
- No guarantee of finding the best combination, even within the sampled space
- Results depend on the random seed unless one is fixed
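The sampling step can be previewed with scikit-learn's ParameterSampler, which draws candidate combinations from lists or from scipy distributions; loguniform gives the log-scale sampling commonly used for parameters like C and gamma. A short sketch:

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

param_dist = {
    'C': loguniform(1e-2, 1e2),      # continuous, sampled on a log scale
    'gamma': loguniform(1e-4, 1e-1),
    'kernel': ['rbf'],               # lists are sampled uniformly
}

# Draw 5 random candidate combinations (reproducible via random_state)
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=42))
for s in samples:
    print(s)
```

Only these five candidates would be fitted, no matter how fine the underlying distributions are.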
Visual Comparison of Grid Search vs Randomized Search
The diagram below illustrates how Grid Search (left) and Randomized Search (right) explore the hyperparameter space: Grid Search evaluates points on a regular lattice, while Randomized Search scatters its samples across the space.
Computational Complexity
An essential consideration when choosing between Grid Search and Randomized Search is their computational complexity, especially as the hyperparameter space grows.
Grid Search Complexity
Grid Search performs an exhaustive search over all combinations of hyperparameters. If we denote:
- p: the number of hyperparameters
- n_i: the number of values tested for the i-th hyperparameter
- k: the number of cross-validation folds
Then the total number of model fits is:
total fits = k × n_1 × n_2 × … × n_p
This means the number of fits multiplies with every additional hyperparameter, i.e., it grows exponentially in the number of parameters when each takes a similar number of values. Even moderately sized grids can make Grid Search impractical.
Randomized Search Complexity
Randomized Search randomly samples n_iter combinations of hyperparameters, independently of the number of values defined per parameter. The total number of model fits is:
total fits = k × n_iter
Thus, the runtime grows linearly with the number of iterations specified, regardless of the size of the hyperparameter space. This makes it much more scalable.
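Using the grid sizes from the practical example later in this article (3 values of C, 2 of gamma, 1 kernel, 5-fold CV, and n_iter=20 for the randomized variant), both fit counts reduce to simple arithmetic:

```python
# Grid Search: fits = folds * product of values per hyperparameter
n_values = [3, 2, 1]   # C, gamma, kernel
k_folds = 5
grid_fits = k_folds
for n in n_values:
    grid_fits *= n
print(grid_fits)       # 5 * 3 * 2 * 1 = 30 fits

# Randomized Search: fits = folds * n_iter, independent of grid size
n_iter = 20
random_fits = k_folds * n_iter
print(random_fits)     # 5 * 20 = 100 fits
```

These numbers match the "30 fits" and "100 fits" reported by scikit-learn's verbose output in the example below.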
Summary Comparison:

Method            | Total model fits     | Scaling behavior
Grid Search       | k × n_1 × … × n_p    | Exponential in the number of parameters
Randomized Search | k × n_iter           | Linear in the number of iterations
Practical Example with scikit-learn
Let us now compare the two methods using the Wine dataset and a Support Vector Classifier (SVC).
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
import numpy as np
import time
# Load dataset
wine = load_wine()
X, y = wine.data, wine.target
# Stratified split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
# Define model
svc = SVC()
# Grid search parameters
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': [0.01, 0.001],
    'kernel': ['rbf']
}
# Randomized search parameter distributions
param_dist = {
    'C': np.logspace(-2, 2, 20),
    'gamma': np.logspace(-4, -1, 20),
    'kernel': ['rbf']
}
# Grid Search
start_grid = time.time()
grid_search = GridSearchCV(svc, param_grid, cv=5, verbose=1, n_jobs=-1)
grid_search.fit(X_train, y_train)
end_grid = time.time()
print("Best parameters from Grid Search:", grid_search.best_params_)
print(f"Grid Search Time: {end_grid - start_grid:.4f} seconds")
# Randomized Search
start_random = time.time()
random_search = RandomizedSearchCV(
    svc, param_distributions=param_dist, n_iter=20,
    cv=5, verbose=1, random_state=42, n_jobs=-1
)
random_search.fit(X_train, y_train)
end_random = time.time()
print("Best parameters from Randomized Search:", random_search.best_params_)
print(f"Randomized Search Time: {end_random - start_random:.4f} seconds")
# Predictions and Evaluation
y_pred_grid = grid_search.predict(X_test)
y_pred_random = random_search.predict(X_test)
print("\nClassification Report - Grid Search")
print(classification_report(y_test, y_pred_grid))
print("Accuracy (Grid Search):", accuracy_score(y_test, y_pred_grid))
print("\nClassification Report - Randomized Search")
print(classification_report(y_test, y_pred_random))
print("Accuracy (Randomized Search):", accuracy_score(y_test, y_pred_random))
Results:
Fitting 5 folds for each of 6 candidates, totalling 30 fits
Best parameters from Grid Search: {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
Grid Search Time: 10.3307 seconds
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best parameters from Randomized Search: {'kernel': 'rbf', 'gamma': np.float64(0.0001438449888287663), 'C': np.float64(61.584821106602604)}
Randomized Search Time: 1.2150 seconds
Randomized Search completed in 1.2150 seconds, approximately one-tenth the time required by Grid Search.
Classification Report - Grid Search
precision recall f1-score support
0 0.94 0.89 0.91 18
1 0.82 0.67 0.74 21
2 0.60 0.80 0.69 15
accuracy 0.78 54
macro avg 0.79 0.79 0.78 54
weighted avg 0.80 0.78 0.78 54
Accuracy (Grid Search): 0.7777777777777778
Classification Report - Randomized Search
precision recall f1-score support
0 0.87 0.72 0.79 18
1 0.75 0.86 0.80 21
2 0.80 0.80 0.80 15
accuracy 0.80 54
macro avg 0.81 0.79 0.80 54
weighted avg 0.80 0.80 0.80 54
Accuracy (Randomized Search): 0.7962962962962963
Even though Grid Search guarantees that it checks every combination, this comes at the cost of computational time. On the other hand, Randomized Search can often find a good enough solution much faster by sampling fewer combinations, especially when some parameters have less influence than others.
In our example, both methods produced similar performance, but Randomized Search completed much faster. This suggests that Randomized Search is particularly useful when you have limited computational resources or a very large parameter space.
Lessons Learned
Grid Search and Randomized Search are powerful tools for hyperparameter optimization. While Grid Search is more exhaustive and guarantees the best solution within the grid, Randomized Search offers a faster alternative, especially for high-dimensional spaces.
Choosing between the two depends on the size of the search space, available computation time, and the criticality of finding the absolute best model.
In practice, a good approach is to begin with Randomized Search to narrow down the region of interest and then fine-tune with Grid Search if needed.
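That coarse-to-fine strategy can be sketched in a few lines. The two-stage structure below, with a broad randomized pass followed by a small grid multiplied around the stage-1 winner, is one illustrative way to do it (the variable names coarse and fine, the n_iter budget, and the 0.5x/1x/2x refinement factors are arbitrary choices, not a fixed recipe):

```python
from scipy.stats import loguniform
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Stage 1: broad randomized search over wide log-uniform ranges
coarse = RandomizedSearchCV(
    SVC(kernel='rbf'),
    param_distributions={'C': loguniform(1e-2, 1e2),
                         'gamma': loguniform(1e-4, 1e-1)},
    n_iter=15, cv=5, random_state=42, n_jobs=-1,
)
coarse.fit(X, y)
best_C = coarse.best_params_['C']
best_gamma = coarse.best_params_['gamma']

# Stage 2: small grid centered (multiplicatively) on the stage-1 winner
fine = GridSearchCV(
    SVC(kernel='rbf'),
    param_grid={'C': [best_C * f for f in (0.5, 1.0, 2.0)],
                'gamma': [best_gamma * f for f in (0.5, 1.0, 2.0)]},
    cv=5, n_jobs=-1,
)
fine.fit(X, y)
print("Refined parameters:", fine.best_params_)
print("Refined CV score:", fine.best_score_)
```

Because the stage-2 grid includes the stage-1 winner itself (factor 1.0) and the folds are identical, the refined cross-validation score can only match or improve on the coarse one.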
Did you like the article? Like and share it with your network! Help spread the knowledge about Machine Learning!
#MachineLearning #HyperparameterTuning #GridSearch #RandomizedSearch #ScikitLearn #SVM #ModelOptimization #AI #DataScience #MLWorkflow #ComputationalEfficiency #MLTips #PythonMachineLearning #MLModelSelection