Just got chi-squared!
banner image credit belongs to visage.co


One of these graphs is telling a false story! The question we are attempting to answer is, “Do marital status and/or gender affect the type of car an individual drives?”


This question reminds me of an impressive experience I had with an Adobe Analytics product and UX manager, whose goal was to verify whether I could understand and navigate a new feature of their product. I walked away wondering how they would quantify my response and apply their findings to a larger population. The intent of this article is to show one standard way to investigate a relationship between categorical variables. As always, the immediate audience for this article is myself.

The mosaic graph of marital status vs. type of car tells a story: the married segment drives more family-type cars than the single segment, and the single segment drives more sporty cars than the married segment. From my experience and observations, this makes sense! Similarly, the mosaic graph of gender vs. type of car tells another story. The female population tends to drive more family cars than the male population. In contrast, males tend to drive more sporty cars than females. From my experience and observations, this also makes sense!

How do we determine if a relationship exists if all we have are several nominal and ordinal measurements? How can we confirm if marital status and gender indeed have an effect on the type of car an individual drives? 

This is where the chi-square test of independence comes in handy. In some cases we have to dummy code the responses. A chi-square statistic is used to learn about the relationship between two qualitative variables: we can investigate whether the distributions of binomial or multinomial measures differ from one another. Responses to questions such as “What is your marital status?” or “What type of car do you own?” are categorical because they yield data such as “Single” or “Sporty”.
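Before any test can run, the raw answers have to be counted into a contingency table. Here is a minimal sketch of that step in Python. The responses are made up for illustration (they are not the survey data behind the graphs), and the “Other” car type and the helper name `contingency_table` are my own assumptions.

```python
from collections import Counter

# Hypothetical survey answers, invented for illustration only:
# each tuple is (marital status, type of car).
responses = [
    ("Married", "Family"),
    ("Married", "Family"),
    ("Married", "Sporty"),
    ("Married", "Other"),
    ("Single", "Sporty"),
    ("Single", "Sporty"),
    ("Single", "Family"),
    ("Single", "Other"),
]

def contingency_table(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})   # row labels, alphabetical
    cols = sorted({c for _, c in pairs})   # column labels, alphabetical
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = contingency_table(responses)
print(cols)
for label, row in zip(rows, table):
    print(label, row)
```

Each row of `table` holds the observed counts for one marital status, and each column holds the counts for one car type.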

Chi-square is based on the difference between the observed count in each cell and the count we would expect if the two variables were independent. Look at the deviations in the contingency tables.
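To make “expected vs. observed” concrete, here is a small sketch of the arithmetic. Under independence, the expected count of a cell is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The counts below are hypothetical, not the ones from my tables, and the function names are my own.

```python
def expected_counts(observed):
    """Expected cell counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical counts: rows = Married, Single; columns = Family, Sporty, Other.
observed = [
    [30, 10, 10],
    [10, 30, 10],
]

expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
print(expected)
print(statistic)
```

The further the observed counts drift from the expected ones, the larger the statistic becomes.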





Marital Status and Type of Car:

Here we see the p-value is < 0.05. This indicates statistical significance: the difference between the expected and observed counts is too large to be plausibly explained by sampling error alone. Hence, we conclude that marital status is related to the type of car an individual drives.

Gender and Type of Car:

Here we see the p-value is > 0.05, indicating that the difference between the expected and observed counts is not big enough to rule out sampling error as the explanation. Hence, we cannot conclude that gender has any effect on the type of car an individual drives.
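The decision rule used in both comparisons can be sketched in a few lines. This is only an illustration under stated assumptions: it handles a 2x3 table (two groups, three car types), where the degrees of freedom are (2 − 1) × (3 − 1) = 2 and the chi-square p-value has the closed form exp(−statistic / 2). For other table shapes, a library routine such as `scipy.stats.chi2_contingency` would be the usual choice. Both tables of counts are hypothetical and only mimic a strong and a weak pattern.

```python
import math

ALPHA = 0.05  # significance threshold used in this article

def chi_square_test_2x3(observed):
    """Return (statistic, p_value) for a 2x3 table of counts."""
    if len(observed) != 2 or any(len(row) != 3 for row in observed):
        raise ValueError("this sketch only handles 2x3 tables")
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected under independence
            statistic += (o - e) ** 2 / e
    # With 2 degrees of freedom, P(chi-square > x) = exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value

def verdict(p_value, alpha=ALPHA):
    """Translate a p-value into the decision described in the text."""
    if p_value < alpha:
        return "reject independence"
    return "fail to reject independence"

# Hypothetical counts; columns = Family, Sporty, Other.
strong_pattern = [[30, 10, 10], [10, 30, 10]]  # large deviations from expected
weak_pattern = [[21, 19, 10], [19, 21, 10]]    # small deviations from expected

for name, counts in [("strong", strong_pattern), ("weak", weak_pattern)]:
    stat, p = chi_square_test_2x3(counts)
    print(name, round(stat, 3), p, verdict(p))
```

Note that “fail to reject independence” is weaker than “no relationship exists”: the data simply did not provide enough evidence of one.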

Conclusion

This test brings a bit more transparency to what “makes sense” in categorical data! From the chi-squared test we learned that even though the relationship between gender and car type seemed to make sense, the deviation is not big enough to confirm any relationship between these variables. Understanding relationships between variables in this way can help us avoid faulty conclusions and expensive managerial mistakes. That's how I got chi-squared'ed ;)
