Bayesian A/B Testing: Simulating Customer CTR
This project simulates and compares several algorithms for solving the explore-exploit dilemma, a classic reinforcement learning problem. The simulation is framed in the context of online advertising, where a decision engine must choose which ad to display to a user to maximize total clicks over time.
The core problem is that we have several “bandits” (in this case, advertisements), each with a different, unknown probability of yielding a reward (a click). The challenge is to balance exploration (showing ads we know little about in order to learn their CTRs) against exploitation (showing the ad that currently looks best in order to maximize clicks).
This simulation runs for 1,000,000 user visits and compares the performance of three different strategies.
The simulation implements and compares the following three algorithms:
1. Random Selection: pure exploration; every ad is shown with equal probability and nothing is learned.
2. Epsilon-Greedy: with probability ε the engine explores by choosing a random ad; otherwise it exploits the ad with the highest estimated CTR so far.
3. Thompson Sampling: This is a Bayesian approach. Each ad’s CTR is modeled with a Beta distribution; the engine draws a sample from each ad’s posterior and shows the ad with the highest draw.
We implement the simulation in Python, with the following key components:
The Advertisement: Each advertisement is given two attributes, a name and a true CTR. (These true CTRs are what the estimated CTRs produced by the simulation should ultimately converge to.)
ADS_TRUE_CTR = {
    "ad_1": 0.05,  # 5% click-through rate
    "ad_2": 0.10,  # 10% click-through rate
    "ad_3": 0.11,  # 11% click-through rate
}
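The later snippets also reference a few globals (AD_NAMES, NUM_VISITS, EPSILON) that are never defined in the excerpts shown here. A minimal sketch of those definitions follows; the EPSILON value of 0.10 is an assumption, as the article does not state the value actually used:

```python
import numpy as np

# True click-through rates, as defined above.
ADS_TRUE_CTR = {"ad_1": 0.05, "ad_2": 0.10, "ad_3": 0.11}

# Globals used by the later snippets.
AD_NAMES = list(ADS_TRUE_CTR.keys())
NUM_VISITS = 1_000_000   # total simulated user visits
EPSILON = 0.10           # exploration rate; assumed value, not stated in the article
```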
The Decision Engine: Next we create the decision engine class. It contains the functions implementing the three ad-selection algorithms: random selection, epsilon-greedy with parameter ε, and Thompson Sampling with a Beta conjugate prior, which fits because the outcome is binary (click or no click).
class AdDecisionEngine:
    """
    A class to manage the state and decisions for our ad selection algorithms.
    """
    def __init__(self, ad_names):
        self.ad_names = ad_names
        self.num_ads = len(ad_names)
        # --- Epsilon-Greedy State ---
        self.ad_impressions = {name: 0 for name in ad_names}
        self.ad_clicks = {name: 0 for name in ad_names}
        # --- Thompson Sampling State ---
        # Beta distribution parameters (alpha, beta). Start with (1, 1), which is a uniform distribution.
        self.beta_params = {name: (1, 1) for name in ad_names}

    def choose_ad_random(self):
        """Pure exploration: randomly select an ad."""
        return np.random.choice(self.ad_names)

    def choose_ad_epsilon_greedy(self):
        """
        With probability epsilon, explore (choose randomly).
        With probability 1-epsilon, exploit (choose the best ad so far).
        """
        if np.random.random() < EPSILON:
            # Explore
            return np.random.choice(self.ad_names)
        else:
            # Exploit: calculate the current estimated CTR for each ad
            estimated_ctrs = {
                name: (self.ad_clicks[name] / self.ad_impressions[name]) if self.ad_impressions[name] > 0 else 0
                for name in self.ad_names
            }
            # Return the ad with the highest estimated CTR
            return max(estimated_ctrs, key=estimated_ctrs.get)

    def choose_ad_thompson(self):
        """
        Thompson Sampling: Sample from each ad's beta distribution
        and choose the one with the highest sample.
        """
        samples = {
            name: np.random.beta(self.beta_params[name][0], self.beta_params[name][1])
            for name in self.ad_names
        }
        return max(samples, key=samples.get)

    def update_epsilon_greedy(self, ad_name, was_clicked):
        """Update counts for Epsilon-Greedy."""
        self.ad_impressions[ad_name] += 1
        if was_clicked:
            self.ad_clicks[ad_name] += 1

    def update_thompson(self, ad_name, was_clicked):
        """Update beta distribution parameters for Thompson Sampling."""
        alpha, beta_val = self.beta_params[ad_name]
        if was_clicked:
            self.beta_params[ad_name] = (alpha + 1, beta_val)  # Click is a success
        else:
            self.beta_params[ad_name] = (alpha, beta_val + 1)  # No click is a failure
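To make the Bayesian update concrete, here is a small worked sketch of how one ad's posterior evolves: starting from the uniform prior Beta(1, 1), c clicks out of n impressions yield a Beta(1 + c, 1 + n - c) posterior. The click and impression counts below are hypothetical.

```python
# Worked example of the Thompson Sampling posterior update for a single ad.
# Prior: Beta(1, 1) (uniform). Each click increments alpha; each non-click
# increments beta.
alpha, beta_val = 1, 1
clicks, impressions = 11, 100      # hypothetical observed data

alpha += clicks                    # successes
beta_val += impressions - clicks   # failures

posterior_mean = alpha / (alpha + beta_val)
print(posterior_mean)              # 12 / 102, about 0.1176
```

As the counts grow, the Beta distribution concentrates around the observed CTR, so Thompson Sampling naturally shifts from exploring to exploiting.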
Simulating a customer click: a CTR of x is simulated by drawing a value u from the uniform distribution U[0, 1] and registering a click whenever u < x.
def simulate_client_click(ad_name):
    """
    --- 3. Client-Side Simulation ---
    Simulates whether a client clicks on the shown ad based on its true CTR.
    Returns True for a click (reward=1), False otherwise (reward=0).
    """
    ctr = ADS_TRUE_CTR[ad_name]
    return np.random.random() < ctr
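As a sanity check, the empirical click rate produced by this mechanism should converge to the configured CTR over many draws. A small self-contained sketch (the visit count of 100,000 is arbitrary):

```python
import numpy as np

ADS_TRUE_CTR = {"ad_2": 0.10}   # just the ad we are checking

def simulate_client_click(ad_name):
    """Returns True with probability equal to the ad's true CTR."""
    return np.random.random() < ADS_TRUE_CTR[ad_name]

n_visits = 100_000
n_clicks = sum(simulate_client_click("ad_2") for _ in range(n_visits))
print(f"Empirical CTR: {n_clicks / n_visits:.4f}")   # close to 0.10
```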
Running the simulation: a separate simulation is run for each algorithm. The run_simulation() function drives any one of the algorithms through the full sequence of visits.
def run_simulation(decision_function, update_function, engine, num_visits):
    """
    Runs the main simulation loop for a given algorithm.
    Note: the engine's state is mutated through update_function; the engine
    argument is kept for symmetry across algorithms.
    """
    total_rewards = 0
    history = []
    for _ in range(num_visits):
        # --- 2. Decision Engine Selects Ad ---
        chosen_ad = decision_function()
        # --- 3. Client Side Simulates Click ---
        was_clicked = simulate_client_click(chosen_ad)
        # --- 4. Reward is Processed ---
        reward = 1 if was_clicked else 0
        total_rewards += reward
        # Update the engine's state
        update_function(chosen_ad, was_clicked)
        history.append(total_rewards)
    return total_rewards, history
# --- Algorithm 1: Random Selection ---
random_engine = AdDecisionEngine(AD_NAMES)

# For random, the "update" function does nothing as it doesn't learn.
total_rewards_random, history_random = run_simulation(
    random_engine.choose_ad_random,
    lambda name, click: None,  # No learning/update needed
    random_engine,
    NUM_VISITS
)

print(f"\nTotal Rewards (Random Selection): {total_rewards_random}")
print(f"Overall CTR (Random Selection): {total_rewards_random / NUM_VISITS:.4f}")
Output (from a short run of 100 visits):
----------------------
Total Rewards (Random Selection): 7
Overall CTR (Random Selection): 0.0700
We can visualize the learning by plotting cumulative rewards against the number of visits. Typically, the results will show Thompson Sampling converging quickly to the best ad (ad_3), epsilon-greedy performing well but paying a constant exploration cost, and random selection trailing both.
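That ordering can be checked with a compact, self-contained re-run of the three strategies. The reduced visit count, fixed seed, and EPSILON value below are choices made for this sketch, not values from the article:

```python
import numpy as np

# Compact re-implementation of the three strategies for a quick comparison.
ADS_TRUE_CTR = {"ad_1": 0.05, "ad_2": 0.10, "ad_3": 0.11}
AD_NAMES = list(ADS_TRUE_CTR)
NUM_VISITS = 100_000   # reduced from 1,000,000 to keep the sketch fast
EPSILON = 0.10         # assumed exploration rate

rng = np.random.default_rng(42)

def run(strategy):
    """Run one strategy for NUM_VISITS and return its overall CTR."""
    clicks = {a: 0 for a in AD_NAMES}
    shows = {a: 0 for a in AD_NAMES}
    alpha = {a: 1 for a in AD_NAMES}   # Beta posterior parameters
    beta = {a: 1 for a in AD_NAMES}
    total = 0
    for _ in range(NUM_VISITS):
        if strategy == "random":
            ad = rng.choice(AD_NAMES)
        elif strategy == "epsilon":
            if rng.random() < EPSILON:
                ad = rng.choice(AD_NAMES)          # explore
            else:                                  # exploit best estimate so far
                ad = max(AD_NAMES,
                         key=lambda a: clicks[a] / shows[a] if shows[a] else 0)
        else:                                      # thompson
            ad = max(AD_NAMES, key=lambda a: rng.beta(alpha[a], beta[a]))
        clicked = rng.random() < ADS_TRUE_CTR[ad]
        shows[ad] += 1
        clicks[ad] += clicked
        alpha[ad] += clicked                       # success count
        beta[ad] += 1 - clicked                    # failure count
        total += clicked
    return total / NUM_VISITS

results = {s: run(s) for s in ("random", "epsilon", "thompson")}
for name, ctr in results.items():
    print(f"{name:>8}: overall CTR = {ctr:.4f}")
```

Random selection hovers near the average of the three true CTRs (about 0.087), while both learning strategies climb toward the best ad's 0.11.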
Find the complete notebook here: https://github.com/souravoo7/Marketing_Analytics/blob/main/testing_simulation.ipynb