Simpson's paradox and Data

Data for statistical analysis can be tricky, especially when it is interpreted the wrong way. When it comes to aggregation, the best-known statistical paradox is Simpson's paradox, named after Edward Simpson, who published a paper on contingency tables in the Journal of the Royal Statistical Society in 1951.

So what is this paradox, and why does it matter to data analysts? As we move towards big data, caution must be exercised when aggregating small data sets into bigger ones, because aggregation makes accurate interpretation harder. Simpson's paradox refers to a reversal in the association between two variables that can occur when the data are conditioned on a third variable: a trend that holds within every subgroup can disappear, or even reverse, once the subgroups are combined. The insights can be exactly opposite!
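A short worked example shows how the reversal can happen. The notation and the numbers below are introduced here purely for illustration (they are hypothetical, not taken from any real study). The rate for a pooled group is a weighted average of its subgroup rates, and the weights depend on how that group's members are spread across the subgroups. Here n_{g,k} counts the members of group g in subgroup k and a_{g,k} counts their successes:

```latex
P_g = \frac{\sum_k a_{g,k}}{\sum_k n_{g,k}}
    = \sum_k w_{g,k}\, p_{g,k},
\qquad
p_{g,k} = \frac{a_{g,k}}{n_{g,k}},
\qquad
w_{g,k} = \frac{n_{g,k}}{\sum_j n_{g,j}}
```

Suppose group A has 100 members in an "easy" subgroup with an 80% success rate and 20 members in a "hard" subgroup with a 20% success rate, while group B has 20 members in the easy subgroup at 85% and 100 in the hard subgroup at 25%. B does better in both subgroups, yet pooling reverses the comparison, because most of B's weight sits on the hard subgroup:

```latex
P_A = \tfrac{100}{120}(0.80) + \tfrac{20}{120}(0.20) = 0.70,
\qquad
P_B = \tfrac{20}{120}(0.85) + \tfrac{100}{120}(0.25) = 0.35
```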

This paradoxical result can skew insights into something misleading. The University of California, Berkeley experienced it first-hand in the 1970s, when it faced accusations of discrimination for admitting a higher proportion of male applicants than female applicants to its graduate programs. Yet when the admissions were analysed department by department, the picture changed: in most departments, women were admitted at a similar or higher rate than men. In simple terms, if you compare male and female acceptance rates within each of, say, three subjects, the men's rates are lower; but if you combine all three subjects and compare overall acceptance rates, the opposite is true. The explanation lies in where people applied: women applied disproportionately to the more competitive departments, which rejected most applicants of either gender, and this pulled their overall acceptance rate down. It is all about aggregation!
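To make the mechanism concrete, here is a minimal Python sketch using hypothetical counts for three departments. The numbers are invented for illustration and are not the actual Berkeley data. In every department women have the higher acceptance rate, yet pooling the departments gives men the higher rate, because in this example most women applied to the most selective departments. The names `ADMISSIONS`, `per_department_rates`, `aggregate_rates` and `is_simpson_reversal` are illustrative only.

```python
from fractions import Fraction

# Hypothetical counts, NOT the real Berkeley figures:
# department -> {group: (applicants, admitted)}
ADMISSIONS = {
    "X": {"men": (400, 240), "women": (50, 35)},
    "Y": {"men": (100, 30), "women": (200, 70)},
    "Z": {"men": (50, 5), "women": (300, 45)},
}


def rate(applicants: int, admitted: int) -> Fraction:
    """Acceptance rate as an exact fraction (avoids float rounding in comparisons)."""
    return Fraction(admitted, applicants)


def per_department_rates(data):
    """Acceptance rate for each group within each department."""
    return {
        dept: {group: rate(*counts) for group, counts in groups.items()}
        for dept, groups in data.items()
    }


def aggregate_rates(data):
    """Acceptance rate for each group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for group, (applied, admitted) in groups.items():
            prev_applied, prev_admitted = totals.get(group, (0, 0))
            totals[group] = (prev_applied + applied, prev_admitted + admitted)
    return {group: rate(*counts) for group, counts in totals.items()}


def is_simpson_reversal(data, g1, g2):
    """True if g2 beats g1 in every department but g1 beats g2 overall."""
    by_dept = per_department_rates(data)
    g2_wins_everywhere = all(r[g2] > r[g1] for r in by_dept.values())
    overall = aggregate_rates(data)
    return g2_wins_everywhere and overall[g1] > overall[g2]


if __name__ == "__main__":
    for dept, r in per_department_rates(ADMISSIONS).items():
        print(f"{dept}: men {float(r['men']):.0%}, women {float(r['women']):.0%}")
    overall = aggregate_rates(ADMISSIONS)
    print(f"All: men {float(overall['men']):.0%}, women {float(overall['women']):.0%}")
    print("Reversal:", is_simpson_reversal(ADMISSIONS, "men", "women"))
```

Running it prints per-department rates (men vs women) of 60% vs 70%, 30% vs 35% and 10% vs 15%, but 50% vs 27% once the departments are pooled. Exact fractions are used so that the "who is ahead" comparisons are not affected by floating-point rounding.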

Hence, statistics can be deceptive and sometimes carry a much deeper meaning than is apparent. The paradox can also affect decisions in industries where the general effects of practices involving a number of variables are studied and averaged.

As Karl Pearson put it, “Statistics is the grammar of science.” Data is usually telling a story, and you may need to read between the lines to understand its underlying implications, especially when making decisions!

As the saying goes, "bigger is better", but complexity grows along with size. Understanding your data is the key, and learning to read between the lines comes with experience and a 360-degree view. "Data is the next Intel Inside," says Tim O'Reilly. Will large datasets offer a higher form of intelligence and knowledge that can generate insights that were previously not available? Yes, they can, and they will.

More articles by Shubhra Kumar

  • Buyers Journey in Digital Marketing

    Even before I delve into what “Buyers Journey” is, let’s first understand the necessity of Digital Marketing. Why is…

  • You’re not old enough!

    The other day i was reading an article on psychological ageing and how it affects people and their careers, what…

  • Marketing skewing toward a specialized field

    Gone are the days when the marketing guy was just driving sales or building brands in organizational silos…

  • Gen Y and Apps and Ads

    Gen Y mobility not happy with ad-tivity in apps. What marketers need to learn about marketing in personal space is…
