Do Your Homework Before Analyzing
Many empirical studies confirm that, when building predictive and explanatory models, incorporating domain knowledge is vastly superior to blindly including predictors. Therefore, the effort of researching the topic to gain a deeper understanding should never be overlooked.
Research into the topic can be as informal as discussions with subject matter experts or as formal as reading scientific publications. Access to this information is a lot easier today than it used to be. However, precisely because there is so much information, it is important to be selective and cautiously skeptical:
- Avoid second-hand reporting and go straight to the primary sources.
- Use Google Scholar
- Trade journals
- Conference proceedings
- Blogs
- Books
- Discussion with colleagues and subject matter experts
Similarly, we should not mistake our own ignorance for ignorance of the field. In the early stages of data analysis, it's easy to fall for the "kitchen sink" temptation: collect everything. Despite low storage costs, organizations are better off using a principled approach: too much data can harm nearly as much as no data, and it presents additional liabilities.
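To make the point about blindly including predictors concrete, here is a minimal simulation sketch in Python (it assumes NumPy is installed). Everything in it is an illustrative assumption rather than something taken from the references below: the sample sizes, the two "true" predictors, the 40 pure-noise predictors, and the function name simulate are all hypothetical. It fits ordinary least squares twice on a small training set, once with only the predictors that actually drive the outcome and once with the noise predictors thrown in, then compares the prediction error on held-out data.

```python
import numpy as np

def simulate(n_train=50, n_test=2000, n_noise=40, seed=0):
    """Return held-out MSE for an informed model and a kitchen-sink model.

    All settings are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    n = n_train + n_test

    # Two predictors that truly drive the outcome (the "domain knowledge" set).
    x_signal = rng.normal(size=(n, 2))
    # Predictors unrelated to the outcome (the "kitchen sink").
    x_noise = rng.normal(size=(n, n_noise))
    y = 1.0 + 2.0 * x_signal[:, 0] - 1.0 * x_signal[:, 1] + rng.normal(size=n)

    def test_mse(x):
        # Add an intercept column, fit OLS on the training rows only,
        # then score on the held-out rows.
        design = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(design[:n_train], y[:n_train], rcond=None)
        resid = y[n_train:] - design[n_train:] @ beta
        return float(np.mean(resid ** 2))

    informed = test_mse(x_signal)
    kitchen_sink = test_mse(np.column_stack([x_signal, x_noise]))
    return informed, kitchen_sink

informed_mse, kitchen_sink_mse = simulate()
print(f"informed model, held-out MSE:     {informed_mse:.2f}")
print(f"kitchen-sink model, held-out MSE: {kitchen_sink_mse:.2f}")
```

With so few training observations relative to the number of predictors, the kitchen-sink model fits the training noise and should predict new data noticeably worse. In real work the "informed" set is not known in advance, which is exactly why the background research described above matters.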
References
The following references discuss these and other issues:
Harrell FE Jr, Lee KL, Califf RM, et al. Regression modelling strategies for improved prognostic prediction. Stat Med 1984;3:143–52.
Steyerberg EW, Eijkemans MJ, Harrell FE Jr, et al. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med 2000;19:1059–79.
Steyerberg EW, Eijkemans MJ, Van Houwelingen JC, et al. Prognostic models based on literature and individual patient data in logistic regression analysis. Stat Med 2000;19:141–60.
Spiegelhalter DJ. Probabilistic prediction in patient management and clinical trials. Stat Med 1986;5:421–33.
About the Author
Thomas Speidel, P.Stat., is a Canadian statistician. He spent ten years working in cancer research before moving to the energy industry. Thomas often writes and comments on issues of statistical literacy on LinkedIn, Twitter, and several blogs, and is a co-founder of About Data Analysis, a LinkedIn group.
LinkedIn: ca.linkedin.com/in/speidel/en
Twitter: @ThomasSpeidel
About.me: http://about.me/Thomas.Speidel
I've been making a career out of looking at what others have done and (usually) ignoring it. I'm even giving a presentation on student-level data, showing that a lot of the assumptions behind student success are flat wrong, and that the anecdotal "evidence" of the few is misplaced, reflects confirmation bias, and is not supported by data. When modeling, I always try to get an actionable model. Delivering possible insight is out of sight.