Grid Search vs. Randomized Search: A Didactic Exploration with Practical Example

In supervised machine learning, the performance of models such as decision trees, support vector machines, or neural networks is highly influenced by hyperparameters — configuration settings that are not learned from data but defined prior to training. Finding the optimal set of hyperparameters is crucial to achieving better accuracy and generalization. Two widely used methods for hyperparameter tuning are Grid Search and Randomized Search.

This article introduces these two techniques with a didactic perspective, explains their differences, and demonstrates their application using a well-known dataset from the scikit-learn library.


What is Hyperparameter Tuning?

Hyperparameter tuning refers to the process of choosing a set of optimal hyperparameters for a learning algorithm. Examples of hyperparameters include:

  • The number of trees in a random forest
  • The regularization parameter C in a support vector machine
  • The learning rate in a neural network

An incorrect combination can lead to underfitting or overfitting. Hence, tuning these values is essential for model performance.
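To make the distinction concrete, here is a minimal sketch (illustrative only, using the same SVC and Wine dataset that appear later in this article): hyperparameters such as C and gamma are chosen when the estimator is constructed, while fit learns the model's internal parameters from the data.

from sklearn.datasets import load_wine
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# C and gamma are hyperparameters: set before training, never learned.
model = SVC(C=1.0, gamma=0.01, kernel='rbf')

# Fitting learns internal parameters (e.g., the support vectors) from
# the data; it does not modify C or gamma.
model.fit(X, y)
print(model.get_params()['C'])  # still 1.0 after fitting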


Grid Search: Exhaustive and Structured

Grid Search systematically works through multiple combinations of parameter values, cross-validating as it goes to determine which combination gives the best performance.

Pros:

  • Exhaustive search guarantees the best model within the grid.
  • Easy to implement and interpret.

Cons:

  • Computationally expensive.
  • Not efficient when the parameter space is large or when some parameters do not significantly affect the outcome.
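Conceptually, Grid Search is just an exhaustive loop over the Cartesian product of the candidate values. The sketch below is a simplified stand-in for what GridSearchCV automates (the parameter values are illustrative), and it makes the pros and cons above tangible:

from itertools import product

from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.001]}

best_score, best_params = -1.0, None
# product() enumerates all 3 x 2 = 6 combinations: exhaustive by design.
for C, gamma in product(param_grid['C'], param_grid['gamma']):
    score = cross_val_score(SVC(C=C, gamma=gamma, kernel='rbf'), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, {'C': C, 'gamma': gamma}

print(best_params, best_score)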


Randomized Search: Efficient and Scalable

Randomized Search samples a given number of candidates from a parameter space with a specified distribution. Instead of trying all combinations, it picks combinations randomly.

Pros:

  • Much faster for large parameter spaces.
  • Can find a very good solution with fewer iterations.

Cons:

  • May miss the optimal combination.
  • Slightly less interpretable.
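The randomized counterpart replaces the nested enumeration with a fixed budget of random draws. Again, a simplified stand-in for RandomizedSearchCV, with illustrative log-uniform sampling ranges:

import numpy as np

from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(42)

best_score, best_params = -1.0, None
# Evaluate a fixed budget of 10 random draws instead of the full grid.
for _ in range(10):
    params = {'C': 10 ** rng.uniform(-2, 2), 'gamma': 10 ** rng.uniform(-4, -1)}
    score = cross_val_score(SVC(kernel='rbf', **params), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)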


Visual Comparison of Grid Search vs Randomized Search

The diagram below illustrates how Grid Search (left) and Randomized Search (right) explore the hyperparameter space.

  • In Grid Search, points are evaluated in a fixed, evenly spaced grid, resulting in a structured but potentially inefficient search — especially if optimal values lie between grid points.
  • In Randomized Search, hyperparameters are sampled randomly, enabling coverage of a broader and more diverse portion of the space with fewer evaluations. This increases the chances of quickly locating high-performing regions, especially when only a few hyperparameters are influential.


[Figure: Grid Search vs. Randomized Search exploring hyperparameter combinations to find the global minimum.]

An essential consideration when choosing between Grid Search and Randomized Search is their computational complexity, especially as the hyperparameter space grows.

Grid Search Complexity

Grid Search performs an exhaustive search over all combinations of hyperparameters. If we denote:

  • n₁, n₂, ..., nₖ as the number of values tested for each of the k hyperparameters,
  • and v as the number of cross-validation folds,

Then the total number of model fits is:

Total fits = v × n₁ × n₂ × … × nₖ

This means the number of fits grows multiplicatively with the values tested per parameter and exponentially with the number of parameters. Even moderately sized grids can make Grid Search impractical.

Randomized Search Complexity

Randomized Search randomly samples n_iter combinations of hyperparameters, independently of the number of parameters or the values tested per parameter. The complexity is:

Total fits = v × n_iter

Thus, the runtime grows linearly with the number of iterations specified, regardless of the size of the hyperparameter space. This makes it much more scalable.
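A quick back-of-the-envelope comparison makes the difference tangible (the sizes below are assumptions chosen purely for illustration):

# Assumed sizes, for illustration: 4 hyperparameters, 10 values each, 5 folds.
n_values = [10, 10, 10, 10]  # n1, n2, n3, n4
v = 5                        # cross-validation folds

grid_fits = v
for n in n_values:
    grid_fits *= n
print(grid_fits)   # 5 * 10**4 = 50,000 fits

n_iter = 20
print(v * n_iter)  # 100 fits, regardless of how fine the grid is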

Summary Comparison:

Method               Model fits               Growth as the search space expands
Grid Search          v × n₁ × n₂ × … × nₖ     Exponential in the number of parameters
Randomized Search    v × n_iter               Linear in n_iter

Practical Example with scikit-learn

Let us now compare the two methods using the Wine dataset and a Support Vector Classifier (SVC).

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
import numpy as np
import time

# Load dataset
wine = load_wine()
X, y = wine.data, wine.target

# Stratified split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Define model
svc = SVC()

# Grid search parameters
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': [0.01, 0.001],
    'kernel': ['rbf']
}

# Randomized search parameter distributions (log-spaced ranges)
param_dist = {
    'C': np.logspace(-2, 2, 20),
    'gamma': np.logspace(-4, -1, 20),
    'kernel': ['rbf']
}

# Grid Search
start_grid = time.time()
grid_search = GridSearchCV(svc, param_grid, cv=5, verbose=1, n_jobs=-1)
grid_search.fit(X_train, y_train)
end_grid = time.time()
print("Best parameters from Grid Search:", grid_search.best_params_)
print(f"Grid Search Time: {end_grid - start_grid:.4f} seconds")

# Randomized Search
start_random = time.time()
random_search = RandomizedSearchCV(
    svc, param_distributions=param_dist, n_iter=20,
    cv=5, verbose=1, random_state=42, n_jobs=-1
)
random_search.fit(X_train, y_train)
end_random = time.time()
print("Best parameters from Randomized Search:", random_search.best_params_)
print(f"Randomized Search Time: {end_random - start_random:.4f} seconds")

# Predictions and Evaluation
y_pred_grid = grid_search.predict(X_test)
y_pred_random = random_search.predict(X_test)

print("\nClassification Report - Grid Search")
print(classification_report(y_test, y_pred_grid))
print("Accuracy (Grid Search):", accuracy_score(y_test, y_pred_grid))

print("\nClassification Report - Randomized Search")
print(classification_report(y_test, y_pred_random))
print("Accuracy (Randomized Search):", accuracy_score(y_test, y_pred_random))        

Results:

Fitting 5 folds for each of 6 candidates, totalling 30 fits
Best parameters from Grid Search: {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}

Grid Search Time: 10.3307 seconds

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best parameters from Randomized Search: {'kernel': 'rbf', 'gamma': np.float64(0.0001438449888287663), 'C': np.float64(61.584821106602604)}

Randomized Search Time: 1.2150 seconds         

Randomized Search completed in 1.2150 seconds, roughly one-eighth of the time required by Grid Search, even though it performed 100 fits versus Grid Search's 30.

Classification Report - Grid Search
              precision    recall  f1-score   support

           0       0.94      0.89      0.91        18
           1       0.82      0.67      0.74        21
           2       0.60      0.80      0.69        15

    accuracy                           0.78        54
   macro avg       0.79      0.79      0.78        54
weighted avg       0.80      0.78      0.78        54

Accuracy (Grid Search): 0.7777777777777778        
Classification Report - Randomized Search
              precision    recall  f1-score   support

           0       0.87      0.72      0.79        18
           1       0.75      0.86      0.80        21
           2       0.80      0.80      0.80        15

    accuracy                           0.80        54
   macro avg       0.81      0.79      0.80        54
weighted avg       0.80      0.80      0.80        54

Accuracy (Randomized Search): 0.7962962962962963        

Grid Search guarantees that every combination in the grid is checked, but that guarantee comes at the cost of computation time. Randomized Search, on the other hand, often finds a good solution much faster by sampling fewer combinations, especially when some parameters have little influence on the outcome.

In our example, both methods produced similar performance, but Randomized Search completed much faster. This suggests that Randomized Search is particularly useful when you have limited computational resources or a very large parameter space.


Lessons Learned

Grid Search and Randomized Search are powerful tools for hyperparameter optimization. While Grid Search is more exhaustive and guarantees the best solution within the grid, Randomized Search offers a faster alternative, especially for high-dimensional spaces.

Choosing between the two depends on the size of the search space, available computation time, and the criticality of finding the absolute best model.

In practice, a good approach is to begin with Randomized Search to narrow down the region of interest and then fine-tune with Grid Search if needed.
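A hedged sketch of that two-stage workflow, reusing the SVC setup from the example above (the ranges and the zoom factor of 3 are illustrative assumptions, not tuned values):

import numpy as np

from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Stage 1: coarse Randomized Search over a wide, log-spaced space.
wide = {'C': np.logspace(-2, 2, 50), 'gamma': np.logspace(-5, 0, 50)}
coarse = RandomizedSearchCV(SVC(kernel='rbf'), wide, n_iter=20, cv=5,
                            random_state=42).fit(X, y)

# Stage 2: fine Grid Search in a tight window around the coarse optimum.
C0, g0 = coarse.best_params_['C'], coarse.best_params_['gamma']
narrow = {'C': np.linspace(C0 / 3, C0 * 3, 5),
          'gamma': np.linspace(g0 / 3, g0 * 3, 5)}
fine = GridSearchCV(SVC(kernel='rbf'), narrow, cv=5).fit(X, y)

print(fine.best_params_, fine.best_score_)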


Test Your Intuition

  • Have you ever used Grid Search or Randomized Search in one of your machine learning projects? What were the trade-offs you experienced?
  • In your opinion, is it worth spending more computation time to guarantee the best hyperparameters?
  • How would you adapt these techniques when working with limited computational resources or real-time constraints?
  • Do you think smarter search algorithms (like Bayesian Optimization) are worth the added complexity compared to Randomized Search?
  • Which tool do you prefer in practice—and why?



Did you like the article? Like and share it with your network! Help spread the knowledge about Machine Learning!

#MachineLearning #HyperparameterTuning #GridSearch #RandomizedSearch #ScikitLearn #SVM #ModelOptimization #AI #DataScience #MLWorkflow #ComputationalEfficiency #MLTips #PythonMachineLearning #MLModelSelection

