The “Instinct” of Data Analysis
I had the pleasure of visiting our wonderful colleagues in the Meinig School of Biomedical Engineering at Cornell University in Ithaca just before the blizzard arrived. John F. Zimmerman, PhD, generously shared his fascinating work on heart tissue engineering. Over dinner, I asked him whether he would trust emerging agentic or robotic systems to perform wet-lab experiments on behalf of scientists. His answer was a thoughtful no.
“You develop a sense—a feel—in your hands during these experiments,” he explained. “It’s difficult to put into words because it’s subtle, but it’s critical to success.” His perspective was echoed by Marjolein C H van der Meulen, who also joined us for dinner. Marjolein, a renowned biomedical engineer in orthopedic mechanics and former head of the school, described a student who excelled at a heart-related procedure that no one else could master.
This kind of “sense” or “feel” reflects the instinct of exceptional bench scientists—not intuition in a mystical sense, but tacit knowledge forged through repeated encounters with failure modes, biological variability, and the physical constraints of bench experiments.
Interestingly, this was not the first time the word “instinct” came up during my one-day visit to the school. After lunch with students, I spoke with Alexandra Werth, and we strongly agreed on the importance of teaching fundamental machine learning concepts so that students can develop an instinct for data analysis. Alex has been deeply engaged in education research, particularly in understanding how AI tools are reshaping teaching and learning.
We both shared a concern that today’s AI tools can give students the false impression that they no longer need to understand core machine learning principles, as they can simply rely on the tools themselves. This, we agreed, is risky. “Students need a basic sense of data analysis—knowing which methods are appropriate in which scenarios, and what an evaluation metric actually means,” Alex said. “That’s what gives them instinct when doing data analysis.”
Experienced data scientists develop a sense for when data are fragile, when patterns are artifacts of preprocessing, or when a model’s behavior is inconsistent with how the data were generated. This instinct often appears as a quiet unease: Why is the performance so good? Why does the model rely so heavily on one covariate? Why does performance improve after a transformation that should not add information? As in bench experiments, these judgments arise not from explicit protocols but from a deep internalization of process and failure.
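That quiet unease can be made concrete. Below is a minimal sketch of one such instinct check, assuming scikit-learn and a synthetic dataset chosen purely for illustration: refit the pipeline on randomly shuffled labels, and worry if the score stays far above chance.

```python
# A minimal sketch of one "instinct check": if a pipeline still scores well
# after the labels are randomly shuffled, something is leaking information.
# The dataset and model here are illustrative assumptions, not a prescription.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rng = np.random.default_rng(0)
y_shuffled = rng.permutation(y)  # destroys any real feature-label signal

model = RandomForestClassifier(random_state=0)
real = cross_val_score(model, X, y, cv=5).mean()
null = cross_val_score(model, X, y_shuffled, cv=5).mean()

print(f"real labels:     {real:.2f}")  # well above chance on real signal
print(f"shuffled labels: {null:.2f}")  # should sit near 0.5; if not, dig in
```

If the shuffled-label score is also high, the “performance” was never about the signal in the first place.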
Large language models (LLMs), despite their impressive capabilities, do not possess this kind of instinct. They are optimized to produce plausible, fluent outputs based on patterns in text, but not to reason about data-generating processes, statistical dependence, or algorithmic failure modes. They can write convincing analysis code, explain methods eloquently, and even suggest reasonable modeling choices. But they lack an internal sense of wrongness. An LLM will confidently fit a model on data with leakage, accept spurious correlations, or optimize a metric without questioning whether the metric itself is meaningful. When the analysis “works,” it works syntactically and rhetorically, not epistemically.
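The leakage failure is easy to reproduce. The sketch below, assuming scikit-learn and pure-noise data invented for illustration, selects features on the full dataset before cross-validating; it is exactly the kind of pipeline an LLM will write fluently and defend eloquently.

```python
# A minimal sketch of feature-selection leakage: choosing the "best" features
# on the full dataset lets test-fold labels influence the model, so a dataset
# of pure noise appears to have real predictive structure.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))   # random features
y = rng.integers(0, 2, size=100)   # random labels: no signal exists

# Leaky: feature selection has already seen every fold's labels
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Honest: selection is refit inside each training fold
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky accuracy:  {leaky:.2f}")   # typically far above 0.5 on noise
print(f"honest accuracy: {honest:.2f}")  # hovers near 0.5, as it should
```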
This is why blindly trusting LLMs for data analysis is dangerous. Data analysis is not just about executing procedures; it is about diagnosing whether those procedures are appropriate. It requires judgment about sample size adequacy, bias, confounding, non-stationarity, missingness mechanisms, and evaluation mismatch. These judgments rely on understanding why an algorithm behaves the way it does, not just how to invoke it. Without this understanding, users may accept outputs that look rigorous but are fundamentally flawed, much like trusting a perfectly written protocol that ignores an unspoken constraint of the wet lab.
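Evaluation mismatch, one of the judgments above, is the easiest to demonstrate. A minimal sketch with invented numbers: on a heavily imbalanced dataset, a classifier that never predicts the rare class reports excellent accuracy while being useless.

```python
# A minimal sketch of evaluation mismatch: on 99%-negative data, predicting
# the majority class every time scores 99% accuracy and 0% recall.
# The data and class ratio are illustrative assumptions.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = np.array([1] * 10 + [0] * 990)  # 1% rare positive class

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = clf.predict(X)

print(f"accuracy: {accuracy_score(y, pred):.2f}")  # 0.99, looks rigorous
print(f"recall:   {recall_score(y, pred):.2f}")    # 0.00, catches nothing
```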
Fundamental machine learning knowledge is what allows humans to develop analytical instinct. Knowing the bias-variance tradeoff changes how one interprets performance gains. Understanding overfitting alters how one reacts to complex models outperforming simpler baselines. Familiarity with optimization dynamics, regularization, and data leakage creates a reflexive skepticism toward “too good to be true” results. These concepts are not optional background; they are the substrate that allows analysts to question, stress-test, and reinterpret model outputs—especially when tools automate more of the surface-level work.
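Here is a minimal sketch of the reflex this knowledge builds, again on an illustrative synthetic dataset: compare train and test scores as model capacity grows, and treat a widening gap as overfitting rather than progress.

```python
# A minimal sketch of the overfitting reflex: an unconstrained decision tree
# memorizes noisy training data; only the held-out score exposes the problem.
# Dataset, noise level, and depths are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, flip_y=0.2,
                           random_state=0)  # 20% label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (3, None):  # shallow (regularized) vs. unconstrained
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
# Typically the unconstrained tree hits ~1.00 on train yet trails the
# shallow tree on test: the extra "performance" was memorized noise.
```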
In this sense, LLMs amplify both competence and incompetence. For data scientists with solid background knowledge, LLMs are accelerators: they reduce friction, speed up exploration, and offload boilerplate while leaving judgment intact. For users without that foundation, LLMs can create an illusion of understanding, where fluent explanations substitute for insight and polished results mask fragile reasoning. The risk is not that LLMs make mistakes; it is that they make mistakes convincingly.
In an era of powerful AI assistants, the goal should not be to remove human judgment from data analysis, but to sharpen it. Instinct, whether at the bench or at the keyboard, is not an obstacle to automation, but the safeguard that keeps science honest.
Could not agree more. But how do we ensure the right instinct is passed down through generations of researchers?
LLMs exist today because of decades of foundational work in AI, ML, and particularly NLP by many, many talented people who worked diligently and understood the foundational concepts of information theory, statistics, computer science, and more. Attention didn’t come overnight. Before Transformers, attention had been explored for years in RNNs (e.g., Luong Attention), and that work laid the foundation for Transformers. We learn the foundations of the past so that we can innovate at the cutting edge of the future. Students need to learn decision trees so that they can learn entropy, Gini impurity, and non-linear modeling. I like Turing Award winner Yann LeCun’s advice to young students wanting to go into AI: “Study things that have a long shelf life. Things that are fundamental and won't go out of fashion. Things that will help you learn new things faster throughout your career. Because technological evolution is accelerating.” https://www.businessinsider.com/yann-lecun-advice-ai-careers-computer-science-degree-2025-12 Thank you, Professor Wang, and keep on teaching the foundations of AI in your classroom. Ultimately, the future of AI will be in their hands one day!
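The fundamentals this comment names are small enough to compute by hand. A minimal sketch, using a toy label set invented for illustration, of the split criteria one meets when first learning decision trees:

```python
# A minimal sketch of entropy and Gini impurity, the impurity measures used
# to score decision-tree splits. The label counts below are invented.
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    """Gini impurity of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

node = np.array([0, 0, 0, 1, 1, 1, 1, 1])  # toy node: 3 of class 0, 5 of class 1
print(f"entropy: {entropy(node):.3f} bits")  # ~0.954
print(f"gini:    {gini(node):.3f}")          # ~0.469
```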
Another wonderful contribution from Prof. Wang. This passage is going into my book of notable quotes: “In this sense, LLMs amplify both competence and incompetence. For data scientists with solid background knowledge, LLMs are accelerators: they reduce friction, speed up exploration, and offload boilerplate while leaving judgment intact. For users without that foundation, LLMs can create an illusion of understanding, where fluent explanations substitute for insight and polished results mask fragile reasoning. The risk is not that LLMs make mistakes; it is that they make mistakes convincingly.”
I think the same concept applies to performing finite element analysis. You need a sense of whether the solution makes sense, and you should start with simple cases for which you have some intuition, which comes from closed-form solutions.
Did you follow Raiffa's approach to teaching decision trees?