Easily learn 6 basic concepts for testing
Welcome to our 10th article on Optimization tips. Based on CXL learnings, this time we will focus on professor Ben Labay's lesson on testing.
One of the main steps of any optimization strategy is testing. Testing is needed to validate the hypotheses we propose, so we do need to master it if we want to nail optimization.
In this article, we will schematically go through the most important concepts in testing, so you can fully understand how to do it and get better outcomes from it. As you'll see, they are mainly the statistical fundamentals of testing.
1. SAMPLING: populations, parameters, and statistics
The first concepts we will learn are the ones related to sampling. Sampling is key in testing, given that it's usually impossible to test the whole population we want to base the study on.
Population: all the potential users or things we want to measure – the entire pool being measured.
Parameter: what we want to compare or measure (for example, if we were measuring cups of coffee, the temperature of the coffee would be the parameter).
There are two kinds of parameters: a. the true population parameter, and b. the sample parameter. Since we can't know the true population parameter, we focus on the sample parameter.
True parameters: the population mean and standard deviation, represented by mu (mean) and sigma (standard deviation)
Sample parameters: the sample mean (X-bar) and the sample standard deviation (s)
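To make this concrete, here is a minimal sketch in Python of the difference between the true population parameters and the sample statistics we actually get to observe. The coffee-temperature numbers are made up for illustration:

```python
# A minimal sketch: true population parameters (mu, sigma) vs.
# the sample statistics (X-bar, s) we actually get to observe.
import random
import statistics

random.seed(42)

# Hypothetical population: temperatures of 100,000 cups of coffee.
population = [random.gauss(70, 5) for _ in range(100_000)]
mu = statistics.mean(population)       # true mean (normally unknowable)
sigma = statistics.pstdev(population)  # true standard deviation

# In practice we only see a sample, so we estimate the parameters.
sample = random.sample(population, 200)
x_bar = statistics.mean(sample)        # sample mean (X-bar)
s = statistics.stdev(sample)           # sample standard deviation (s)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"X-bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample statistics land close to the true parameters but never exactly on them – that gap is the sampling error the rest of these concepts deal with.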
2. MEAN AND VARIANCE
The second pair of concepts we are going to learn are the mean and the variance, both of them really basic and important as well:
Mean: the average – the most common measure of central tendency, the typical value or midpoint of the data.
Variance: describes the shape of the data – how spread out it is. To calculate it, you take the average squared distance from the mean across all the points; taking the square root of the variance gives you the standard deviation.
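Here is a quick sketch of computing these step by step, on a handful of made-up data points, rather than via a library call:

```python
# A minimal sketch of the mean, variance, and standard deviation.
data = [68.1, 70.4, 71.2, 69.8, 72.5, 70.0]

# Mean: the typical value or midpoint.
mean = sum(data) / len(data)

# Variance: the average squared distance of each point from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: the square root of the variance,
# back in the same units as the data itself.
std_dev = variance ** 0.5

print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {std_dev:.2f}")
```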
3. CONFIDENCE INTERVALS
Now we are going to learn what confidence intervals are and what they should look like when it comes to A/B testing:
Confidence interval: a range of values defined in such a way that there's a specific probability that the value of the parameter lies within it. A confidence interval needs: the mean, the sample size, the variability or shape of the data, and the confidence level [how confident we want to be that our estimate of the parameter falls within that interval].
For A/B testing, the confidence interval represents the amount of error allowed in the test. The true conversion rate can't be measured directly, so we use the confidence interval to quantify our sampling error.
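As an illustration, here is a minimal sketch of a 95% confidence interval around an observed conversion rate, using the standard normal approximation. The visitor and conversion counts are hypothetical:

```python
# A minimal sketch of a 95% confidence interval for a conversion rate,
# using the normal approximation. Numbers are made up for illustration.
from statistics import NormalDist

conversions, visitors = 520, 10_000   # hypothetical A/B test data
p_hat = conversions / visitors        # observed conversion rate

confidence = 0.95
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 for 95%

# The standard error captures variability; a bigger sample shrinks it.
se = (p_hat * (1 - p_hat) / visitors) ** 0.5
low, high = p_hat - z * se, p_hat + z * se

print(f"conversion rate = {p_hat:.2%}, 95% CI = [{low:.2%}, {high:.2%}]")
```

Notice how all four ingredients from the definition show up: the mean (the observed rate), the sample size, the variability (the standard error), and the confidence level (the z value).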
4. STATISTICAL SIGNIFICANCE AND THE P-VALUE
The fourth pair of concepts we are going to learn are statistical significance and the p-value.
Statistical significance helps us quantify whether a result is likely due to chance. When a finding is significant, you can feel confident that the effect is real.
The p-value is the probability of obtaining the difference you saw from a sample if there really isn't a difference for all the users. The conventional, arbitrary threshold for declaring statistical significance is a p-value of less than 0.05. This means that, in this case, there would be less than a 5% chance of a false positive.
The p-value does not tell us the probability that B > A. It also does not tell us the probability that we will make a mistake in selecting B over A.
Confidence = 1 - p-value.
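To see the p-value in action, here is a minimal sketch of a two-proportion z-test, the kind of calculation commonly behind A/B testing significance calculators. All the counts are hypothetical:

```python
# A minimal sketch of a two-proportion z-test for an A/B test,
# with made-up numbers. It returns the two-sided p-value.
from statistics import NormalDist

conv_a, n_a = 500, 10_000   # control:   5.00% conversion rate
conv_b, n_b = 565, 10_000   # variation: 5.65% conversion rate

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null

se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
z = (p_b - p_a) / se

# Two-sided p-value: probability of seeing a difference at least this
# large if there is truly no difference between A and B.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p-value = {p_value:.4f}, significant: {p_value < 0.05}")
```

With these made-up numbers the p-value comes out just under 0.05, so the result would clear the conventional significance bar.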
5. STATISTICAL POWER
In fifth place, we are going to see what the statistical power of a test is, which ingredients we need to keep in mind in order to cook it, and what it should look like when applied to an A/B test.
The power of any test of significance is defined as the probability that it will reject a false null hypothesis. (The null hypothesis says the two variations in a test are not related – that there is no real difference between them. The null hypothesis is false when a difference actually exists, so failing to reject it means the tester does not detect a real effect.)
Statistical power is determined by: 1. the size of the effect you want to detect, and 2. the size of the sample used. The greater the effect one tries to detect, the easier it is to detect.
The standard bar for the statistical power of an A/B test is 80% – this means there is a 20% probability of making a type II error [failing to reject a false null hypothesis, i.e. missing a real effect]. An overpowered A/B test is one that has much more than a sufficient sample size. To understand it in a simple way, an overpowered A/B test is one that, due to its very large sample size, has a higher statistical power. In an overpowered A/B test the probability of a type II error decreases, but the sample is so large that even tiny, practically meaningless differences can show up as statistically significant.
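Here is a minimal sketch of how power grows with sample size for a two-proportion test, using the same normal approximation as above and made-up baseline rates:

```python
# A minimal sketch of how statistical power grows with sample size,
# for a two-sided two-proportion test at alpha = 0.05.
from statistics import NormalDist

def power(p_a, p_b, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = ((p_a * (1 - p_a) + p_b * (1 - p_b)) / n_per_group) ** 0.5
    z_effect = abs(p_b - p_a) / se
    return 1 - NormalDist().cdf(z_alpha - z_effect)

# Detecting a lift from 5% to 6%: power rises as the sample grows.
for n in (2_000, 8_000, 16_000, 32_000):
    print(f"n = {n:>6} per group -> power = {power(0.05, 0.06, n):.0%}")
```

Note how power climbs quickly at first and then flattens: doubling an already large sample buys very little extra power, which is exactly the overpowered-test territory.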
6. SAMPLE SIZE AND HOW TO CALCULATE IT
One of the most common questions is what the sample size of a test should be. For A/B testing, it depends on how large a difference you want to be able to detect. You also have to keep in mind the confidence level – which should be between 90% and 95% – as well as the power – which should be around 80% – and the variability of the data. After that, you pick a reasonable baseline for the conversion rate of the control – around 5% – and then you just vary the difference between A and B to see what sample size you'd need to be able to detect a difference that is statistically significant.
So there are three main ingredients for the calculator, assuming that you hold the power and confidence levels fixed (a minimal calculator sketch follows the list):
1. The control group's expected conversion rate.
2. The minimum relative change in conversions you want to be able to detect – the lift.
3. The confidence level – how confident you want to be.
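Putting the three ingredients together, here is a minimal sample-size calculator sketch using the standard normal-approximation formula and the defaults mentioned above (95% confidence, 80% power). The function name and the numbers are our own illustration, not a specific tool's API:

```python
# A minimal sample-size calculator sketch built from the three
# ingredients above. Defaults: 95% confidence, 80% power.
from math import ceil
from statistics import NormalDist

def sample_size(base_rate, lift, confidence=0.95, power=0.80):
    """Visitors needed per variation to detect a relative lift."""
    p_a = base_rate                  # control's expected conversion rate
    p_b = base_rate * (1 + lift)     # variation under the hoped-for lift
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    n = variance * (z_alpha + z_beta) ** 2 / (p_b - p_a) ** 2
    return ceil(n)

# A 5% control conversion rate and a 20% relative lift to detect:
print(sample_size(base_rate=0.05, lift=0.20))  # roughly 8,200 per group
```

Try shrinking the lift to 10% and watch the required sample size roughly quadruple – that's why the minimum detectable effect is the ingredient that matters most.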
Hope these insights into testing were helpful! Feel free to share the article and comment with any thoughts you may have.
Best and see you next week with more optimization tips!