Correlation with Tableau
What Are Correlations?
Correlations are associations between variables. The first question to answer in understanding a correlation is therefore "What are variables?"
Variables are things we measure that can differ from one observation to the next, such as height, weight, behavior, fat intake, lifespan, grade-point average, and income. With these variables we can easily assign a number to represent the value of the variable. Perhaps less obviously, we can also treat sex (gender), country of origin, and political preference as variables, even though there is no obvious number to assign to each category.
In general, a variable is a measure of something that can take on more than one value.
Let us take an example to get started with correlation.
The local ice cream shop keeps track of how much ice cream it sells versus the temperature on that day. Here are the figures for the last 12 days:
And here is the same data as a scatter plot:-
We can easily see that warmer weather and higher sales go together; the relationship is strong. In fact, the correlation is 0.9575.
Note that a correlation can take any value from -1 to 1:
- 1 is a perfect positive correlation
- 0 is no correlation
- -1 is a perfect negative correlation
Positive Correlation Examples
- The more time you spend running on a treadmill, the more calories you will burn
- The longer your hair grows, the more shampoo you will need
- When an employee works more hours, their paycheck increases proportionately. (I still have a question about this; I disagree with it.)
Negative Correlation Examples
- The older a man gets, the less hair he has
- The more absences a student has, the lower their grades
- If it is darker outside, more light is needed inside
Well, here is the formula:
Don't worry, Nana.
Trust me, we will make it simple!!!
You probably won't have to calculate it like that; it is simply a routine set of calculations:
- Step 1: Find the mean of x, and the mean of y
- Step 2: Subtract the mean of x from every x value (call them "a"), do the same for y (call them "b")
- Step 3: Calculate a × b, a² and b² for every value
- Step 4: Sum up a × b, sum up a² and sum up b²
- Step 5: Divide the sum of a × b by the square root of [(sum of a²) × (sum of b²)]
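The five steps above can be sketched in a few lines of Python. This is only an illustration: the function name `correlation_by_steps` and the five data points are made up for the example and are not the ice cream shop's figures.

```python
import math

def correlation_by_steps(xs, ys):
    """Pearson correlation computed by following Steps 1 to 5."""
    # Step 1: find the mean of x and the mean of y
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Step 2: subtract the means to get the "a" and "b" values
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]
    # Steps 3 and 4: compute a*b, a^2, b^2 for every value and sum them up
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    sum_a2 = sum(ai ** 2 for ai in a)
    sum_b2 = sum(bi ** 2 for bi in b)
    # Step 5: divide sum(a*b) by the square root of sum(a^2) * sum(b^2)
    return sum_ab / math.sqrt(sum_a2 * sum_b2)

# Hypothetical data, chosen only to keep the arithmetic easy to follow
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_by_steps(xs, ys)
print(round(r, 4))  # 0.7746
```

With these numbers the sum of a × b is 6, the sum of a² is 10 and the sum of b² is 6, so the result is 6 / √60.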
Download file for the solution :- Correlation in Excel
Here, each star marks the step number. I think the image is self-explanatory :P
Grandmother Excel is always there to help us: the CORREL(Array1,Array2) function in Excel can be used to take all the pain out of the long calculation in one go.
Now it's time for a twist in the story. In the first example we took 12 days of data. Now we will take data for 24 days.
Before calculating the correlation, let's look at a scatter plot of the above data points.
The correlation calculation only works well for relationships that follow a straight line.
Thus, without any calculation, we can say that the correlation value is now zero, i.e. "No Correlation"!
Let's check by calculation:
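The same effect can be reproduced with a tiny Python sketch. The points below are made up (they lie on a symmetric curve, y = x²) and are not the shop's 24-day figures; `pearson_r` is just a helper name chosen for this example. Even though y is completely determined by x, the straight-line correlation comes out as zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviations around the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    return sum_ab / math.sqrt(sum(ai ** 2 for ai in a) * sum(bi ** 2 for bi in b))

# Hypothetical points on a symmetric curve: y rises on both sides of x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0
```

The positive and negative products of the deviations cancel out exactly, which is why a scatter plot is worth a look before trusting the number.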
There is also another formula for finding the correlation:
Just sum up x, y, x², y² and xy. Yes, we are going to see this formula with Tableau. Till then, stay tuned with me.
"Correlation Is Not Causation" means that a correlation does not show that one thing causes the other (there could be other reasons the data has a good correlation).
Example 1 :- Chocolate Consumption vs Nobel Prizes
Look at this graph:
It shows that the countries whose people eat a lot of chocolate also have the most Nobel prize winners. So chocolate consumption is correlated with Nobel prizes.
Now what would happen if the government of, say, India began distributing chocolate to all of its people? Would India then win more Nobel prizes? Probably not. This is because eating chocolate does not cause Nobel prizes.
Here are the thoughts of PM Modi Sir on correlation:
It is so easy to find correlations - they're everywhere. Proving that it is actually a causation is much harder. In the present case, the correlation is probably because countries with high chocolate consumption levels are richer and because of this their people are better educated on average resulting in more Nobel prizes. The correlation is established in this way through several other factors.
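A rough Python sketch of this "third factor" idea follows. All numbers are invented for illustration: a hypothetical `wealth` score drives both chocolate consumption and Nobel prizes, and neither of the two appears in the other's formula, yet they end up strongly correlated.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviations around the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    return sum_ab / math.sqrt(sum(ai ** 2 for ai in a) * sum(bi ** 2 for bi in b))

# Hypothetical wealth score for six imaginary countries
wealth = [1, 2, 3, 4, 5, 6]

# Small made-up wobbles so the data is not perfectly tidy
wobble_choc = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1]
wobble_nobel = [0.2, -0.1, -0.2, 0.1, 0.2, -0.2]

# Both variables depend only on wealth, never on each other
chocolate = [2 * w + e for w, e in zip(wealth, wobble_choc)]
nobel = [3 * w + e for w, e in zip(wealth, wobble_nobel)]

r_choc_nobel = pearson_r(chocolate, nobel)
print(round(r_choc_nobel, 3))
```

The correlation is close to 1 even though handing out more chocolate in this toy model would change nothing about the prizes.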
To say chocolate causes Nobel prize winners is the same as saying:
"The amount of chocolate a country's inhabitants eat determines the number of Nobel Laureates".
Example 2 :- Ice Cream vs T-Shirts
In this example we will compare ice cream sales with the number of T-shirts sold by a big store each day. (Logically, there is no relation between ice cream and T-shirts.)
The correlation between ice cream sales and T-shirts sold is high. Here the relationship follows a straight line.
But does this mean that T-shirts make people want ice cream?
As promised above, we will compute the second formula in Tableau.
Here we will take the same data, in which the local ice cream shop keeps track of how much ice cream it sells versus the temperature on that day. Here is the data :- Correlation in Excel
To compute the formula, we have to create the following calculated fields:
N = SIZE()
Sigma XY = WINDOW_SUM(SUM([sales]) * SUM([Tempt in Celsius]))
Sigma X = WINDOW_SUM(SUM([sales]))
Sigma Y = WINDOW_SUM(SUM([Tempt in Celsius]))
Sigma X^2 = WINDOW_SUM(SUM([sales])^2)
Sigma Y^2 = WINDOW_SUM(SUM([Tempt in Celsius])^2)
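How these pieces combine can be sketched in Python. The sketch assumes the standard sum-based form of Pearson's correlation, r = (N·ΣXY − ΣX·ΣY) / √[(N·ΣX² − (ΣX)²) × (N·ΣY² − (ΣY)²)], in which N multiplies the sums from outside. The variable names simply mirror the calculated fields above, and the five data points are hypothetical, not the shop's figures.

```python
import math

def correlation_from_sums(xs, ys):
    """Pearson correlation built only from N and five running sums."""
    n = len(xs)                                       # N
    sigma_x = sum(xs)                                 # Sigma X
    sigma_y = sum(ys)                                 # Sigma Y
    sigma_xy = sum(x * y for x, y in zip(xs, ys))     # Sigma XY
    sigma_x2 = sum(x ** 2 for x in xs)                # Sigma X^2
    sigma_y2 = sum(y ** 2 for y in ys)                # Sigma Y^2

    numerator = n * sigma_xy - sigma_x * sigma_y
    denominator = math.sqrt(
        (n * sigma_x2 - sigma_x ** 2) * (n * sigma_y2 - sigma_y ** 2)
    )
    return numerator / denominator

# Hypothetical figures: X is sales, Y is temperature, as in the fields above
sales = [200, 400, 500, 400, 500]
temps = [10, 15, 20, 25, 30]

r_sums = correlation_from_sums(sales, temps)
print(round(r_sums, 4))  # 0.7746
```

This form needs only one pass over the data, which is why it maps neatly onto window sums.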
I have added Correlation in Title and Tooltip.
A correlation matrix is used to investigate the dependence between multiple variables at the same time.
Correlation matrices offer a good way of visualizing similarities between members in your dataset. The data points used in calculating pair-wise correlations could be of many different kinds. For example, you could create a correlation matrix between different commodities, using the price of each commodity over a period of time to calculate the correlations.
#MarketBasketAnalysis would be one good example of a correlation matrix: in it, we compare each item against every other. Thanks to Tharashasank Davuluru, from whom I came to know about this.
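As a minimal sketch of the commodity idea, the Python below builds a correlation matrix as a dictionary of dictionaries. The commodity names and prices are invented for illustration, and `pearson_r` and `correlation_matrix` are helper names chosen for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviations around the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]
    sum_ab = sum(ai * bi for ai, bi in zip(a, b))
    return sum_ab / math.sqrt(sum(ai ** 2 for ai in a) * sum(bi ** 2 for bi in b))

def correlation_matrix(series):
    """Correlate every named series with every other (and with itself)."""
    return {
        row: {col: pearson_r(series[row], series[col]) for col in series}
        for row in series
    }

# Hypothetical prices over five periods
prices = {
    "gold": [10, 11, 12, 13, 14],
    "oil": [20, 22, 21, 24, 25],
    "wheat": [9, 8, 7, 6, 5],
}

matrix = correlation_matrix(prices)
for name, row in matrix.items():
    print(name, {col: round(value, 2) for col, value in row.items()})
```

The diagonal is always 1 (each series against itself) and the matrix is symmetric, so only one triangle carries new information.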
Now let's jump on Correlation Matrix in Tableau
Data Source :-
https://vincentarelbundock.github.io/Rdatasets/datasets.html and then search for mtcars
The data set has 11 variables (listed below) recorded for 32 car models, which translates into an 11 × 11 correlation matrix. Notice that each line of mtcars represents one model of car. Each column is then one attribute (variable) of that car, such as the miles per gallon (or fuel efficiency), the number of cylinders, the displacement (or volume) of the car's engine in cubic inches, whether the car has an automatic or manual transmission, and so on.
Variables in Data Set:-
- mpg = Miles/(US) gallon
- cyl = Number of cylinders
- disp = Displacement (cu.in.)
- hp = Gross horsepower
- drat = Rear axle ratio
- wt = Weight (lb/1000)
- qsec = ¼ mile time
- vs = Engine shape (0 = V-shaped, 1 = straight)
- am = Transmission (0 = automatic, 1 = manual)
- gear = Number of forward gears
- carb = Number of carburetors
Output 1:-
Output 2:-
To explore more on correlation matrix :-
1) http://drawingwithnumbers.artisart.org/tag/correlation-matrix/ (source to download tableau workbook)
Summary:-
- The word Correlation is made of Co- (meaning "together"), and Relation.
- To measure the strength of the relationship between two variables, it is best to use a correlation.
- Correlation can only be between -1 and +1.
- The closer the correlation is to +1 or -1, the stronger the relationship.
- In order to make good predictions between two variables, a strong correlation is necessary.
- Before calculating the correlation, draw a scatter plot and look at it. You may see a relationship that the calculation does not.
- Correlation is not causation
I hope this explanation of correlation and the examples are helpful. If you have any good examples, please feel free to mention them in the comments section. As usual, all comments and corrections are welcome!
Thanks
Hey Sumeet, this is a beautiful article. I just loved the way you wrote it. The images, humor and topic research are all damn good. You always help us understand the topic in more detail. Hats off to your effort. I would also request you to post a link to this article in the Tableau community blog section. Everyone on the community can create their own blog and document the things they have written. I am asking you to do this because the community has the largest user base. It was a nice read and I am sure that every one of us has learned the topic well. More than the topic, I was interested in your perspective. This is also pushing me to write articles and share knowledge. I totally appreciate your efforts. Keep writing.
wonderful, beautifully explained the magic of tableau!
It's such a good read, beautifully explained. I would like to ask one question: in the Tableau formula 'CORREL', the first line shows "Size", which is nothing but 'N', inside the window_sum, but as per the mathematical formula it should be outside the sigma. Please correct me if my understanding is wrong.