Do Your Homework Before Analyzing
Image credit: http://www.cardiff.ac.uk/__data/assets/image/0005/95486/knowledge-transfer-open-minds.jpg?w=240


Many empirical studies show that, when building predictive and explanatory models, incorporating domain knowledge is vastly superior to blindly including predictors. Researching the topic to gain a deeper understanding should therefore never be overlooked.
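To make the point about blindly including predictors concrete, here is a minimal simulation sketch in Python (NumPy). It is an illustrative assumption, not a reproduction of the cited studies: the outcome depends only on two predictors that domain knowledge is presumed to identify correctly, and 30 extra columns are pure noise. The function name `simulate_test_mse`, the sample sizes and the coefficients are hypothetical choices. Both models are fit by ordinary least squares on a small training set and scored on fresh test data.

```python
import numpy as np


def simulate_test_mse(n_train=60, n_test=1000, n_noise=30, n_reps=200, seed=0):
    """Average out-of-sample MSE of two OLS fits on simulated data.

    Hypothetical setup: y depends only on x1 and x2 (the predictors that
    domain knowledge would point to); n_noise extra columns are pure noise.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([2.0, -1.0])  # assumed true effects of x1 and x2
    p = 2 + n_noise
    mse_domain, mse_kitchen = [], []
    for _ in range(n_reps):
        X_train = rng.normal(size=(n_train, p))
        X_test = rng.normal(size=(n_test, p))
        # Only the first two columns carry signal; noise variance is 1
        y_train = X_train[:, :2] @ beta + rng.normal(size=n_train)
        y_test = X_test[:, :2] @ beta + rng.normal(size=n_test)

        def fit_and_score(cols):
            # Ordinary least squares with an intercept column
            A_train = np.column_stack([np.ones(n_train), X_train[:, cols]])
            A_test = np.column_stack([np.ones(n_test), X_test[:, cols]])
            coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
            return np.mean((y_test - A_test @ coef) ** 2)

        mse_domain.append(fit_and_score(slice(0, 2)))   # domain-knowledge model
        mse_kitchen.append(fit_and_score(slice(0, p)))  # "kitchen sink" model
    return float(np.mean(mse_domain)), float(np.mean(mse_kitchen))


domain, kitchen = simulate_test_mse()
print(f"domain-knowledge model test MSE: {domain:.2f}")
print(f"kitchen-sink model test MSE:     {kitchen:.2f}")
```

Because the simulated noise has variance 1, a test MSE close to 1 is the best any model can achieve. With only 60 training rows, the 33-parameter kitchen-sink fit should land well above that, while the two-predictor model stays close to it. The gap narrows as `n_train` grows relative to the number of predictors.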

Research into the topic can be as informal as discussions with subject matter experts or as formal as reading scientific publications. Access to this information is a lot easier today than it used to be. However, precisely because there is so much information, it is important to be selective and cautiously skeptical:

  • Avoid second-hand reporting and go straight to the primary sources.
  • Google Scholar
  • Trade journals
  • Conference proceedings
  • Blogs
  • Books
  • Discussions with colleagues and subject matter experts

Similarly, we should not mistake our own ignorance for ignorance in the field: just because we do not know something does not mean nobody does. In the early stages of data analysis, it's easy to fall for the "kitchen sink" temptation: collect everything. Despite low storage costs, organizations are better off taking a principled approach: too much data can do nearly as much harm as no data, and it brings additional liabilities.


References

The following references discuss these and other issues:

Harrell FE Jr, Lee KL, Califf RM, et al. Regression modelling strategies for improved prognostic prediction. Stat Med 1984;3:143–52.

Steyerberg EW, Eijkemans MJ, Harrell FE Jr, et al. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med 2000;19:1059–79.

Steyerberg EW, Eijkemans MJ, Van Houwelingen JC, et al. Prognostic models based on literature and individual patient data in logistic regression analysis. Stat Med 2000;19:141–60.

Spiegelhalter DJ. Probabilistic prediction in patient management and clinical trials. Stat Med 1986;5:421–33.


About the Author

Thomas Speidel, P.Stat., is a Canadian statistician. He spent ten years working in cancer research before moving to the energy industry. Thomas often writes and comments on issues of statistical literacy on LinkedIn, Twitter and several blogs, and is a co-founder of About Data Analysis, a LinkedIn group.

LinkedIn: ca.linkedin.com/in/speidel/en 

Twitter: @ThomasSpeidel

About.me: http://about.me/Thomas.Speidel


