Bayesian A/B Testing: Simulating Customer CTR
This project simulates and compares several algorithms for solving the explore-exploit dilemma, a classic reinforcement learning problem. The simulation is framed in the context of online advertising, where a decision engine must choose which ad to display to a user to maximize total clicks over time.
The core problem is that we have several “bandits” (in this case, advertisements), each with a different, unknown probability of yielding a reward (a click). The challenge is to balance exploration (showing ads we know little about in order to learn their CTRs) against exploitation (showing the ad that currently looks best in order to maximize clicks).
This simulation runs for 1,000,000 user visits and compares the performance of three different strategies.
The simulation implements and compares the following three algorithms:
1. Random Selection: pure exploration; every ad is shown with equal probability and nothing is learned.
2. Epsilon-Greedy: with probability ε the engine explores by choosing a random ad; otherwise it exploits the ad with the highest estimated CTR so far.
3. Thompson Sampling: This is a Bayesian approach. Each ad’s CTR is modeled with a Beta distribution; the engine draws a sample from each ad’s posterior and shows the ad with the highest draw.
We implement the simulation in Python, with the following key components:
The Advertisement: Each advertisement is given two attributes, a name and a true CTR. (These true CTRs are what the estimated CTRs produced by the simulation should ultimately converge to.)
ADS_TRUE_CTR = {
    "ad_1": 0.05,  # 5% click-through rate
    "ad_2": 0.10,  # 10% click-through rate
    "ad_3": 0.11,  # 11% click-through rate
}
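The later snippets also reference a few globals (AD_NAMES, NUM_VISITS, EPSILON) that are never defined in the excerpts shown here. A minimal sketch of those definitions follows; the EPSILON value of 0.10 is an assumption, as the article does not state the value actually used:

```python
import numpy as np

# True click-through rates, as defined above.
ADS_TRUE_CTR = {"ad_1": 0.05, "ad_2": 0.10, "ad_3": 0.11}

# Globals used by the later snippets.
AD_NAMES = list(ADS_TRUE_CTR.keys())
NUM_VISITS = 1_000_000   # total simulated user visits
EPSILON = 0.10           # exploration rate; assumed value, not stated in the article
```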
The Decision Engine: Next we create the decision engine class. It contains the functions implementing the three ad-selection algorithms: random selection, epsilon-greedy with parameter ε, and Thompson Sampling with a Beta conjugate prior, which fits because the outcome is binary (click or no click).
class AdDecisionEngine:
    """
    A class to manage the state and decisions for our ad selection algorithms.
    """
    def __init__(self, ad_names):
        self.ad_names = ad_names
        self.num_ads = len(ad_names)
        # --- Epsilon-Greedy State ---
        self.ad_impressions = {name: 0 for name in ad_names}
        self.ad_clicks = {name: 0 for name in ad_names}
        # --- Thompson Sampling State ---
        # Beta distribution parameters (alpha, beta). Start with (1, 1), which is a uniform distribution.
        self.beta_params = {name: (1, 1) for name in ad_names}

    def choose_ad_random(self):
        """Pure exploration: randomly select an ad."""
        return np.random.choice(self.ad_names)

    def choose_ad_epsilon_greedy(self):
        """
        With probability epsilon, explore (choose randomly).
        With probability 1-epsilon, exploit (choose the best ad so far).
        """
        if np.random.random() < EPSILON:
            # Explore
            return np.random.choice(self.ad_names)
        else:
            # Exploit: calculate the current estimated CTR for each ad
            estimated_ctrs = {
                name: (self.ad_clicks[name] / self.ad_impressions[name]) if self.ad_impressions[name] > 0 else 0
                for name in self.ad_names
            }
            # Return the ad with the highest estimated CTR
            return max(estimated_ctrs, key=estimated_ctrs.get)

    def choose_ad_thompson(self):
        """
        Thompson Sampling: Sample from each ad's beta distribution
        and choose the one with the highest sample.
        """
        samples = {
            name: np.random.beta(self.beta_params[name][0], self.beta_params[name][1])
            for name in self.ad_names
        }
        return max(samples, key=samples.get)

    def update_epsilon_greedy(self, ad_name, was_clicked):
        """Update counts for Epsilon-Greedy."""
        self.ad_impressions[ad_name] += 1
        if was_clicked:
            self.ad_clicks[ad_name] += 1

    def update_thompson(self, ad_name, was_clicked):
        """Update beta distribution parameters for Thompson Sampling."""
        alpha, beta_val = self.beta_params[ad_name]
        if was_clicked:
            self.beta_params[ad_name] = (alpha + 1, beta_val)  # Click is a success
        else:
            self.beta_params[ad_name] = (alpha, beta_val + 1)  # No click is a failure
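To make the Bayesian update concrete, here is a small worked sketch of how one ad's posterior evolves: starting from the uniform prior Beta(1, 1), c clicks out of n impressions yield a Beta(1 + c, 1 + n - c) posterior. The click and impression counts below are hypothetical.

```python
# Worked example of the Thompson Sampling posterior update for a single ad.
# Prior: Beta(1, 1) (uniform). Each click increments alpha; each non-click
# increments beta.
alpha, beta_val = 1, 1
clicks, impressions = 11, 100      # hypothetical observed data

alpha += clicks                    # successes
beta_val += impressions - clicks   # failures

posterior_mean = alpha / (alpha + beta_val)
print(posterior_mean)              # 12 / 102, about 0.1176
```

As the counts grow, the Beta distribution concentrates around the observed CTR, so Thompson Sampling naturally shifts from exploring to exploiting.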
Simulating a customer click: a CTR of x is simulated by drawing a value u from the uniform distribution U[0, 1] and registering a click whenever u < x.
def simulate_client_click(ad_name):
    """
    --- 3. Client-Side Simulation ---
    Simulates whether a client clicks on the shown ad based on its true CTR.
    Returns True for a click (reward=1), False otherwise (reward=0).
    """
    ctr = ADS_TRUE_CTR[ad_name]
    return np.random.random() < ctr
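As a sanity check, the empirical click rate produced by this mechanism should converge to the configured CTR over many draws. A small self-contained sketch (the visit count of 100,000 is arbitrary):

```python
import numpy as np

ADS_TRUE_CTR = {"ad_2": 0.10}   # just the ad we are checking

def simulate_client_click(ad_name):
    """Returns True with probability equal to the ad's true CTR."""
    return np.random.random() < ADS_TRUE_CTR[ad_name]

n_visits = 100_000
n_clicks = sum(simulate_client_click("ad_2") for _ in range(n_visits))
print(f"Empirical CTR: {n_clicks / n_visits:.4f}")   # close to 0.10
```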
Running the simulation: a separate simulation is run for each algorithm. The run_simulation() function drives any one of the algorithms through the full sequence of visits.
def run_simulation(decision_function, update_function, engine, num_visits):
    """
    Runs the main simulation loop for a given algorithm.
    Note: the engine's state is mutated through update_function; the engine
    argument is kept for symmetry across algorithms.
    """
    total_rewards = 0
    history = []
    for _ in range(num_visits):
        # --- 2. Decision Engine Selects Ad ---
        chosen_ad = decision_function()
        # --- 3. Client Side Simulates Click ---
        was_clicked = simulate_client_click(chosen_ad)
        # --- 4. Reward is Processed ---
        reward = 1 if was_clicked else 0
        total_rewards += reward
        # Update the engine's state
        update_function(chosen_ad, was_clicked)
        history.append(total_rewards)
    return total_rewards, history
# --- Algorithm 1: Random Selection ---
random_engine = AdDecisionEngine(AD_NAMES)

# For random, the "update" function does nothing as it doesn't learn.
total_rewards_random, history_random = run_simulation(
    random_engine.choose_ad_random,
    lambda name, click: None,  # No learning/update needed
    random_engine,
    NUM_VISITS
)

print(f"\nTotal Rewards (Random Selection): {total_rewards_random}")
print(f"Overall CTR (Random Selection): {total_rewards_random / NUM_VISITS:.4f}")
Output (from a short run of 100 visits):
----------------------
Total Rewards (Random Selection): 7
Overall CTR (Random Selection): 0.0700
We can visualize the learning by plotting cumulative rewards against the number of visits. Typically, the results will show Thompson Sampling converging quickly to the best ad (ad_3), epsilon-greedy performing well but paying a constant exploration cost, and random selection trailing both.
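That ordering can be checked with a compact, self-contained re-run of the three strategies. The reduced visit count, fixed seed, and EPSILON value below are choices made for this sketch, not values from the article:

```python
import numpy as np

# Compact re-implementation of the three strategies for a quick comparison.
ADS_TRUE_CTR = {"ad_1": 0.05, "ad_2": 0.10, "ad_3": 0.11}
AD_NAMES = list(ADS_TRUE_CTR)
NUM_VISITS = 100_000   # reduced from 1,000,000 to keep the sketch fast
EPSILON = 0.10         # assumed exploration rate

rng = np.random.default_rng(42)

def run(strategy):
    """Run one strategy for NUM_VISITS and return its overall CTR."""
    clicks = {a: 0 for a in AD_NAMES}
    shows = {a: 0 for a in AD_NAMES}
    alpha = {a: 1 for a in AD_NAMES}   # Beta posterior parameters
    beta = {a: 1 for a in AD_NAMES}
    total = 0
    for _ in range(NUM_VISITS):
        if strategy == "random":
            ad = rng.choice(AD_NAMES)
        elif strategy == "epsilon":
            if rng.random() < EPSILON:
                ad = rng.choice(AD_NAMES)          # explore
            else:                                  # exploit best estimate so far
                ad = max(AD_NAMES,
                         key=lambda a: clicks[a] / shows[a] if shows[a] else 0)
        else:                                      # thompson
            ad = max(AD_NAMES, key=lambda a: rng.beta(alpha[a], beta[a]))
        clicked = rng.random() < ADS_TRUE_CTR[ad]
        shows[ad] += 1
        clicks[ad] += clicked
        alpha[ad] += clicked                       # success count
        beta[ad] += 1 - clicked                    # failure count
        total += clicked
    return total / NUM_VISITS

results = {s: run(s) for s in ("random", "epsilon", "thompson")}
for name, ctr in results.items():
    print(f"{name:>8}: overall CTR = {ctr:.4f}")
```

Random selection hovers near the average of the three true CTRs (about 0.087), while both learning strategies climb toward the best ad's 0.11.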
Find the complete notebook here: https://github.com/souravoo7/Marketing_Analytics/blob/main/testing_simulation.ipynb