Correlation vs. Causation

He Hao

Published Mar 3, 2021

In data analysis, we often observe that two variables are related: when one changes, so does the other. This relationship might lead us to assume that a change in one variable causes the change in the other. For example, shortly after we see people on the street carrying umbrellas, it often starts raining. Should we conclude that umbrellas cause the rain? No: correlation does not necessarily imply causation. This issue of Data Science Bytes clarifies the difference between correlation and causation and explains why A/B testing is important for making a causal claim.

Correlation

Correlation is a statistical measure of the relationship between two numerical variables, whether or not that relationship is causal. Pearson’s correlation coefficient is a number between -1 and +1 that measures the direction and strength of the linear relationship between the two variables. The sign of the coefficient indicates whether the relationship is positive or negative, and its absolute value indicates the strength of the relationship. A coefficient close to +1 or -1 indicates a strong linear relationship, and a coefficient close to 0 indicates a weak linear relationship or none.

For example, we would expect the age and height of a sample of teenagers to have a correlation coefficient that is close to +1, which means that, in general, the older a teenager is, the taller he or she is.
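As a rough illustration of how the coefficient behaves, the sketch below computes Pearson's correlation by hand on made-up data. The `pearson` helper, the synthetic ages and heights, and the growth rate of 5 cm per year are all assumptions chosen for the example, not real measurements.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)

# Hypothetical teenagers: ages 13 to 19, height rising with age plus noise.
ages = [rng.uniform(13, 19) for _ in range(500)]
heights = [90 + 5 * age + rng.gauss(0, 4) for age in ages]

# A variable generated independently of age, for comparison.
unrelated = [rng.gauss(0, 1) for _ in ages]

r_height = pearson(ages, heights)
r_unrelated = pearson(ages, unrelated)

print(f"age vs. height:    r = {r_height:.2f}")
print(f"age vs. unrelated: r = {r_unrelated:.2f}")
```

With these assumed parameters the first coefficient should come out strongly positive and the second close to 0. How close to +1 the first one gets depends entirely on how much noise is added to the heights.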

Causation

Causation refers to the relationship of cause and effect. It is the influence by which one event (a cause) contributes to the production of another event (an effect). Causation explicitly applies to cases where action A causes outcome B. Correlation, on the other hand, is simply a relationship: A is related to B, but one does not necessarily cause the other. Causation and correlation can exist at the same time. However, correlation does not imply causation, and causation does not imply (linear) correlation either. For example, if X ~ N(0,1) and Y = X^2, then Y is completely determined by X, yet their Pearson correlation is 0.
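The X ~ N(0,1), Y = X^2 example can be checked numerically. For a standard normal X, Cov(X, X^2) = E[X^3] - E[X]E[X^2] = 0, so the population correlation is exactly 0. The sketch below is a simulation under that setup; the `pearson` helper and the sample size are choices made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)

# Y is fully determined by X, so the dependence is as strong as it can be.
xs = [rng.gauss(0, 1) for _ in range(20000)]
ys = [x * x for x in xs]

# The linear correlation between X and Y is near zero...
r_linear = pearson(xs, ys)
# ...while |X| and Y, which move together monotonically, correlate strongly.
r_abs = pearson([abs(x) for x in xs], ys)

print(f"corr(X, X^2)   = {r_linear:.3f}")
print(f"corr(|X|, X^2) = {r_abs:.3f}")
```

The near-zero coefficient reflects the symmetry of the parabola around 0, not an absence of dependence, which is why a scatter plot is worth checking alongside the coefficient.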

As Figure 2 illustrates, the sales of ice cream and the cases of sunburn are strongly correlated, but one does not cause the other. Instead, the weather is the common cause of both.

Figure 2. Examples of correlation and causation.
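The ice cream and sunburn pattern can be reproduced with a small simulation in which temperature drives both variables and neither affects the other. All numbers below (temperature range, slopes, noise levels) are invented for illustration. The `residuals` helper removes the linear effect of temperature, which is one simple way to check whether a correlation survives once the common cause is accounted for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """Residuals of a simple least-squares line fitted to ys as a function of xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(2)

# Hypothetical daily data: temperature is the only shared cause.
temps = [rng.uniform(15, 35) for _ in range(2000)]
ice_cream = [10 * t + rng.gauss(0, 20) for t in temps]
sunburn = [2 * t + rng.gauss(0, 4) for t in temps]

# Raw correlation between the two effects of temperature.
r_raw = pearson(ice_cream, sunburn)

# Correlation after removing the linear effect of temperature from both.
r_partial = pearson(residuals(ice_cream, temps), residuals(sunburn, temps))

print(f"raw correlation:            {r_raw:.2f}")
print(f"controlling for temperature: {r_partial:.2f}")
```

In this setup the raw correlation is high while the temperature-adjusted one is close to 0. Adjusting like this only works when the confounder is known and measured, which is rarely guaranteed in observational data.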

Causation and A/B testing

When developing a new product or feature, we hope it will improve certain business metrics such as adoption, retention, and churn. But how do we make a causal claim, namely that a change in the metrics is caused by the new feature or product? In theory, to establish that A caused B, we need to satisfy the following three conditions:

Relationship. First, we need to observe a relationship between A and B, such as a strong correlation. Although correlation does not necessarily imply causation, we do need some kind of relationship to make a causal claim.
Time order. The order of events must be right: to claim that A caused B, we have to make sure A happened before B.
Ruling out other explanations. Most importantly, we need to make sure there is no other explanation for the relationship we observed between A and B.

The first and second conditions are easy to satisfy. But how do we rule out all other explanations? The answer is A/B testing. In an A/B test, or randomized controlled experiment, visitors are randomly assigned to control and treatment groups. Because the assignment is random, the two groups differ on average only in the treatment they receive, up to random variation, so alternative explanations for a difference in the metric are ruled out.
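To see why randomization matters, the sketch below compares two ways of deciding who gets a hypothetical feature. The outcome model, the `engagement` confounder, and the assumed true effect of 1.0 are all invented for the simulation. In the observational case, more engaged users tend to adopt the feature on their own; in the A/B test, a coin flip decides.

```python
import random

rng = random.Random(3)
TRUE_EFFECT = 1.0  # assumed lift of the hypothetical feature on the metric


def outcome(engagement, treated):
    """Metric value: depends on engagement (a confounder) and on the feature."""
    return 5.0 + 2.0 * engagement + TRUE_EFFECT * treated + rng.gauss(0, 1)


def mean(values):
    return sum(values) / len(values)


def difference_in_means(assign, n_users=20000):
    """Simulate users; assign(engagement) returns True for the treatment group."""
    treatment, control = [], []
    for _ in range(n_users):
        engagement = rng.gauss(0, 1)
        treated = assign(engagement)
        group = treatment if treated else control
        group.append(outcome(engagement, treated))
    return mean(treatment) - mean(control)


# Observational data: engaged users are more likely to pick up the feature.
self_selected_diff = difference_in_means(lambda e: e + rng.gauss(0, 1) > 0)

# A/B test: assignment is a coin flip, independent of engagement.
randomized_diff = difference_in_means(lambda e: rng.random() < 0.5)

print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"self-selected difference: {self_selected_diff:.2f}")
print(f"randomized difference:    {randomized_diff:.2f}")
```

Under these assumptions the self-selected comparison overstates the effect, because it mixes the feature's lift with the engagement gap between adopters and non-adopters, while the randomized comparison lands near the true effect. A real A/B test would also report a confidence interval or significance test for the difference, which this sketch leaves out.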

To make scientific causal claims about your new features, run A/B tests on them.
