USING ONE-WAY ANOVA FOR EMAIL LIST SEGMENTATION

A one-way ANOVA tests the null hypothesis that three or more groups share the same population mean (a t-test is generally used when comparing two groups). To use this method, we take a random sample from each group, ideally of equal size. Then we compare the variance between the group means with the variance within the groups.

For this post, we are going to test whether the city an email recipient resides in (the independent variable) affects how many emails they open from our email campaigns (the dependent variable). We will do this by comparing the City column to the Number of Marketing Emails Opened column.

In a broader, more business-relevant sense, we are asking whether some cities are more receptive to our email campaigns than others. If so, perhaps we should segment email campaigns based on the city that the email recipient lives in.

Our fictitious data of email recipients can be found in my original post on my blog: http://daranjjohnson.com/2016/12/19/using-one-way-anova-email-list-segmentation/

Calculate Sum of Squares Total

The first thing we have to do is measure how far each email recipient’s number of emails opened is from the average emails opened for all email recipients. We are not interested in whether a given recipient’s number is less than or greater than the average, just in its distance from the overall average. So we subtract the overall average from each recipient’s number, square each result to eliminate negative numbers, and add the squares together. This gives us the Sum of Squares Total, which in this case is 713.62.
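To make the step concrete, here is a minimal Python sketch of the Sum of Squares Total. The three cities and their open counts are made up for illustration (they are not the data behind the 713.62 figure), and the names `opens_by_city`, `grand_mean`, and `ss_total` are my own.

```python
# Hypothetical data: number of emails opened per recipient, keyed by city.
# These values are invented for illustration and are not the post's data set.
opens_by_city = {
    "Austin": [2, 4, 3, 5],
    "Boston": [6, 8, 7, 7],
    "Denver": [1, 2, 2, 3],
}

# Flatten every recipient into one list, ignoring which city they belong to.
all_opens = [x for opens in opens_by_city.values() for x in opens]

# Overall (grand) mean of emails opened across all recipients.
grand_mean = sum(all_opens) / len(all_opens)

# Square each recipient's distance from the grand mean, then add them up.
ss_total = sum((x - grand_mean) ** 2 for x in all_opens)

print(f"Grand mean: {grand_mean:.2f}")
print(f"Sum of Squares Total: {ss_total:.2f}")
```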

Calculate Sum of Squares Between

Next we have to do the same for each city. So, calculate the mean (average) Email Opens for each city. Then take the mean for each city and subtract the total mean. Next, square each result (to eliminate negative numbers). Then multiply each result by the number of email recipients in that city, and add the results for all cities together. This will give us the Sum of Squares Between. If I’ve done this correctly, then the Sum of Squares Between should be 544.19.
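Here is the same step as a short Python sketch. Again, the data is hypothetical (it will not reproduce 544.19), and the variable names are only for illustration.

```python
# Hypothetical data: number of emails opened per recipient, keyed by city.
opens_by_city = {
    "Austin": [2, 4, 3, 5],
    "Boston": [6, 8, 7, 7],
    "Denver": [1, 2, 2, 3],
}

all_opens = [x for opens in opens_by_city.values() for x in opens]
grand_mean = sum(all_opens) / len(all_opens)

ss_between = 0.0
for city, opens in opens_by_city.items():
    city_mean = sum(opens) / len(opens)
    # Squared distance of the city mean from the grand mean,
    # weighted by how many recipients live in that city.
    ss_between += len(opens) * (city_mean - grand_mean) ** 2

print(f"Sum of Squares Between: {ss_between:.2f}")
```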

Calculate Sum of Squares Within or Sum of Squares Error

Now we need to calculate how far each value within each group is from the group mean. We do this the same way we got the Sum of Squares Total, but this time we use the mean of the email recipients in each city, not the mean of all recipients. Because the Sum of Squares Total equals the Sum of Squares Between plus the Sum of Squares Within, this can also be calculated by subtracting the Sum of Squares Between from the Sum of Squares Total. I calculated this both ways and came up with 169.43, which gives me more confidence in both of the calculations above.
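The cross-check described above can be sketched in Python as follows. The data is hypothetical, so the result will not be 169.43. The point is only that the direct calculation and the subtraction agree.

```python
# Hypothetical data: number of emails opened per recipient, keyed by city.
opens_by_city = {
    "Austin": [2, 4, 3, 5],
    "Boston": [6, 8, 7, 7],
    "Denver": [1, 2, 2, 3],
}

all_opens = [x for opens in opens_by_city.values() for x in opens]
grand_mean = sum(all_opens) / len(all_opens)

ss_total = sum((x - grand_mean) ** 2 for x in all_opens)

ss_between = 0.0
ss_within_direct = 0.0
for opens in opens_by_city.values():
    city_mean = sum(opens) / len(opens)
    ss_between += len(opens) * (city_mean - grand_mean) ** 2
    # Way 1: squared distance of each recipient from their own city's mean.
    ss_within_direct += sum((x - city_mean) ** 2 for x in opens)

# Way 2: whatever is left of the total after removing the between-city part.
ss_within_by_subtraction = ss_total - ss_between

print(f"Direct:         {ss_within_direct:.2f}")
print(f"By subtraction: {ss_within_by_subtraction:.2f}")
```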

Variance Estimates of Sum of Squares Between and Sum of Squares Within

To calculate the variance estimate of the Sum of Squares Between, we divide the Sum of Squares Between by the number of groups minus one. This calculation gives us 272.10.

To calculate the variance estimate of the Sum of Squares Within, we divide the Sum of Squares Within by the number of email recipients minus the number of cities. This comes out to 56.48.
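As a rough sketch, the two divisions look like this in Python. The sums of squares and the counts below are hypothetical (three cities, twelve recipients) and are not the figures from this post.

```python
# Hypothetical inputs, chosen only to illustrate the two divisions.
ss_between = 52.67      # Sum of Squares Between
ss_within = 9.0         # Sum of Squares Within
num_cities = 3          # number of groups
num_recipients = 12     # total number of email recipients

# Variance estimate between groups: divide by (groups - 1).
ms_between = ss_between / (num_cities - 1)

# Variance estimate within groups: divide by (recipients - groups).
ms_within = ss_within / (num_recipients - num_cities)

print(f"Variance estimate between: {ms_between:.3f}")
print(f"Variance estimate within:  {ms_within:.3f}")
```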

Calculating the F-ratio

The final step is to calculate the F-ratio. We divide the variance estimate of the Sum of Squares Between by the variance estimate of the Sum of Squares Within. This number will be used to decide whether the average number of emails opened differs between cities. The larger the F-ratio, the more the variation between cities exceeds the variation within cities. Our F-ratio is calculated to be 4.82.
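Using the two variance estimates quoted above, the division is a one-liner in Python (the variable names are mine):

```python
# Variance estimates as quoted in this post.
ms_between = 272.10
ms_within = 56.48

# The F-ratio is the between-group variance estimate divided by
# the within-group variance estimate.
f_ratio = ms_between / ms_within

print(f"F-ratio: {f_ratio:.2f}")  # prints "F-ratio: 4.82"
```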

Use a table of the critical values for the F distribution to find the relevant critical F-value. If the F-ratio from the last step is greater than the value from the table, the difference between cities is significant at the table's level of significance (e.g. p = 0.05). Otherwise, the difference is not significant at that level.

If you look up the critical F-value at p = 0.05, it is 2.83. That is far lower than our F-ratio of 4.82. So, in this case, we conclude that the city an email recipient resides in is probably related to how many of our emails they open.

Calculation Summary

To summarize the process:

  1. Calculate the Sum of Squares Total across all email recipients in the whole sample.
  2. Calculate the Sum of Squares Between for the cities.
  3. Calculate the Sum of Squares Within for the email recipients within each city.
  4. Divide #2 above by the number of groups minus one.
  5. Divide #3 above by the number of email recipients minus the number of groups.
  6. Divide #4 above by #5 above. This is the F-ratio.
  7. Look up the critical value in an F-values table. If the F-ratio is greater than the value from the table, then we can conclude that the city an email recipient resides in has an effect on how many emails they open.
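The seven steps above can be sketched end to end in Python. This is a minimal illustration under a few assumptions: the data is hypothetical, the function names are my own, and for step 7 I have swapped the table lookup for a p-value formula that is exact only when there are exactly three groups (so the first divisor in step 4 is 2). For any other number of groups you would still need a table or a statistics library.

```python
def one_way_anova(groups):
    """Return (f_ratio, df_between, df_within) for a dict of group -> values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)

    # Step 1: Sum of Squares Total.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Step 2: Sum of Squares Between.
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    # Step 3: Sum of Squares Within (by subtraction).
    ss_within = ss_total - ss_between

    # Steps 4 and 5: variance estimates.
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    # Step 6: the F-ratio.
    return ms_between / ms_within, df_between, df_within


def p_value_three_groups(f_ratio, df_within):
    """Upper-tail probability of the F distribution when df_between is 2.

    Closed form valid only for exactly three groups; not a general solution.
    """
    return (df_within / (df_within + 2 * f_ratio)) ** (df_within / 2)


# Hypothetical data: number of emails opened per recipient, keyed by city.
opens_by_city = {
    "Austin": [2, 4, 3, 5],
    "Boston": [6, 8, 7, 7],
    "Denver": [1, 2, 2, 3],
}

f_ratio, df_between, df_within = one_way_anova(opens_by_city)
p_value = p_value_three_groups(f_ratio, df_within)

# Step 7 (adapted): compare the p-value to the chosen significance level.
significant = p_value < 0.05
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}, significant: {significant}")
```

Comparing the p-value to 0.05 is equivalent to comparing the F-ratio to the critical value in a p = 0.05 table for the same degrees of freedom.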

Conclusion

Now it’s your turn. We want to know whether Education is also an independent variable that affects how many of our emails a recipient opens (our dependent variable).

Later, we can look at City and Education together: whether the two combined affect how many of our emails a recipient opens, whether they interfere with each other, and to what extent that affects the outcome.

Last Thought

Of course, this feels very laborious, and there are many tools that will do all these calculations for you. But I think it’s important to understand what is happening under the hood, since this is just one model for this type of calculation.
