Machine Learning: The Cost of Getting it Wrong?
The Proteomics Winter

“When you worship technology, you stop questioning it.”

The Ovarian Cancer Story

In the early 2000s, proteomics promised to revolutionise cancer diagnostics. Instead, it generated a textbook example of Statistical Debt in biomedical research - costing upwards of $1 billion, setting the field back a decade, precipitating the Proteomics Winter, and eroding public trust in cancer diagnostics and science.

The Proteomics Gold Rush

At the turn of the century, Correlogic developed an early diagnostic test, OvaCheck, for ovarian cancer. Using mass spectrometry patterns from tiny patient cohorts and sophisticated machine learning algorithms, they developed exciting and seductive models. On paper, those models were highly accurate, promising hope and early intervention for patients with life-threatening ovarian cancers.

But the reality was simpler and harsher.

The models had no biological grounding, were not predictive, and collapsed upon independent validation. They were overfitting to noise and batch variability. We learned the hard way that you can't validate a model using the data used to build that model.

“The models didn’t find biology - they found batch differences.”
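
To see how easily this happens, here is a minimal sketch (illustrative only, built on simulated noise rather than the original OvaCheck data): a toy classifier fitted to pure noise, with far more spectral "peaks" than patients, looks flawless when "validated" on the data used to build it, and collapses on an independent cohort.

```python
# Minimal sketch: pure-noise "spectra" with far more features than patients.
# Evaluating on the training data itself gives seductive accuracy;
# an independent cohort reveals the model is no better than a coin flip.
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_features = 10, 5000   # tiny cohort, high-dimensional mass-spec-like data

def simulate():
    """Pure noise: the 'cancer' and 'control' groups differ only by chance."""
    X = rng.standard_normal((2 * n_per_class, n_features))
    y = np.repeat([0, 1], n_per_class)
    return X, y

def fit_centroids(X, y):
    """A deliberately simple classifier: one mean profile per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    """Assign each sample to the nearest class centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

X_train, y_train = simulate()     # data used to build the model
X_indep, y_indep = simulate()     # genuinely independent cohort

centroids = fit_centroids(X_train, y_train)
resub_acc = (predict(centroids, X_train) == y_train).mean()
indep_acc = (predict(centroids, X_indep) == y_indep).mean()

print(f"Accuracy on the data used to build the model: {resub_acc:.0%}")  # ~100%
print(f"Accuracy on an independent cohort:            {indep_acc:.0%}")  # ~50%
```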

Correlogic wasn’t alone. Dozens - if not hundreds - of academic labs and several startups joined the gold rush, claiming biomarkers for ovarian, prostate, pancreatic cancers, and anything else that moved.

Venture capital followed the hype.

Money was raised and lost.

Hopes were lifted, then dashed.

Counting the Cost

Let’s conservatively estimate the tangible costs of this misadventure.

  • Academic & Early-stage Research: Around 50 significant programmes were launched, each burning roughly $500k in expensive proteomics equipment and personnel. Say $25 million.
  • Commercialisation & Clinical Trials: Correlogic raised an estimated $30-50 million before failure. Two major companies went as far as Phase III trials, each spending around $75 million. Say $150 million.
  • Clean-up & Validation: Public agencies like the NCI spent heavily on multi-site validation efforts, including the Clinical Proteomic Technology Assessment for Cancer consortium, designed to check for exactly these flaws. Say $25 million.

Total direct costs? Say $200 million.

But, the real damage came later.

The Proteomics Winter

An entire generation of postdocs and PIs had built their careers on these flawed foundations. When the house of cards collapsed, careers were damaged and intellectual effort was wasted. Estimated lost human capital: at least $50 million.

Investor scepticism triggered a Proteomics Winter. Venture capital dried up. Legitimate proteomics companies struggled for funding. Meanwhile, the entire field’s scientific credibility took a hit - good work became harder to publish, reviewers and editors having been burned by hype. Crucially, this fiasco delayed the development of robust, quantitative proteomics (such as modern SWATH/DIA-MS) by 5-10 years.

The real cost of the Proteomics Winter lies in the opportunities it stole. Quantifying this isn’t straightforward, but we can triangulate it in three ways: by looking at lost investment, lost time, and lost impact.

Lost Investment - The Counterfactual Pipeline

In the early 2000s, diagnostics platforms that caught investor and public funding waves - such as next-generation sequencing - typically attracted multiple, large-scale R&D programmes. Had proteomics retained investor confidence, it’s reasonable to expect that 5-10 legitimate diagnostic development programmes, each with budgets of $100-150 million, would have progressed through the pipeline over the following decade. This puts the foregone R&D investment in the $0.5-1 billion range, over and above the $200 million already burned on the flawed wave.

Lost Time - The Decade That Wasn’t

Investor scepticism and reputational damage delayed serious investment in proteomics by at least 5-10 years. Oncology diagnostics typically attract $100-200 million in annual global R&D spending when a platform is considered promising. Multiplying that by a lost decade gives a temporal opportunity cost of roughly $0.5-2 billion in delayed or diverted innovation. This is how economists often assess the cost of infrastructure delays - and the same principle applies here. A decade of cold investment meant slower technology maturation, fewer candidates in the pipeline, and ultimately, delayed patient benefit.
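
As a back-of-the-envelope check, using only the figures quoted above (nothing new), the arithmetic looks like this:

```python
# Back-of-envelope sketch of the "lost decade" estimate, using the ranges quoted above.
annual_rd_spend = (100e6, 200e6)   # annual global R&D spend on a promising platform ($)
years_delayed = (5, 10)            # estimated delay caused by the Proteomics Winter

low = annual_rd_spend[0] * years_delayed[0]
high = annual_rd_spend[1] * years_delayed[1]
print(f"Temporal opportunity cost: ${low / 1e9:.1f}-{high / 1e9:.1f} billion")
# -> Temporal opportunity cost: $0.5-2.0 billion
```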

Lost Impact - Health and Societal Costs

Finally, there is the human dimension. Even modest improvements in early ovarian cancer detection can translate into hundreds or thousands of lives saved annually. We can value these savings through standard QALY (Quality Adjusted Life Year) or Value of Statistical Life metrics - typically $100,000-150,000 per life-year in the US. These societal opportunity costs run comfortably into the billions. This doesn’t require heroic assumptions: accelerating the availability of just one moderately effective diagnostic test by five years would have had major clinical and economic impact.
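
For a sense of scale, here is a deliberately rough sketch. Only the $100,000-150,000 per life-year range comes from the paragraph above; the lives-saved, life-years-gained, and delay figures are hypothetical placeholders.

```python
# Rough QALY-style valuation. Only value_per_life_year reflects the range quoted
# in the text; the other inputs are hypothetical placeholders for illustration.
lives_saved_per_year = 1_000               # "hundreds or thousands" - assume a midpoint
life_years_gained = 10                     # assumed life-years gained per earlier detection
value_per_life_year = (100_000, 150_000)   # standard US QALY / VSL range ($ per life-year)
years_of_delay = 5                         # assume the test is delayed by five years

low = lives_saved_per_year * life_years_gained * value_per_life_year[0] * years_of_delay
high = lives_saved_per_year * life_years_gained * value_per_life_year[1] * years_of_delay
print(f"Societal opportunity cost: ${low / 1e9:.1f}-{high / 1e9:.1f} billion")
# -> Societal opportunity cost: $5.0-7.5 billion
```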

Taken together, these estimates give us a conservative figure of somewhere between $0.5 and $2+ billion in opportunity costs. For argument's sake, let's say that we're talking about $1 billion in round figures.

That's $1 billion in total.

This is the true cost of the Proteomics Winter. The financial burn was bad enough, but the compounded “interest” on the statistical debt was far, far greater - measured not just in dollars, but in lost innovation, delayed diagnostics, and avoidable deaths.

In addition, although OvaCheck never reached the market, the hype around it raised false hopes among patients. Its collapse contributed to public distrust in early detection tests and the erosion of trust in science itself. And trust, once lost, is expensive to rebuild.

This was not a one-off, single bad paper.

This was Glitter Blindness. A system-wide failure: seductive technology + machine learning + compelling clinical need, combined with a lack of statistical and biological rigour. Statistical Debt may start small - a tiny advance on some “statistical nicety”. But left unchecked, it compounds. And interest is paid in wasted resources, misdirected research, delayed discovery, and lost human capital.

Statistical debt is real, and it is accruing in a lab near you, right now.

There was nothing wrong per se with the various supervised machine learning classifiers. Genetic algorithms, decision trees, and support vector machines are all useful tools. The core problem wasn’t the algorithm itself, but the lack of statistical discipline in how it was trained, tested, and interpreted.
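
To make that discipline concrete, here is a small sketch on simulated noise (assumed toy data, not the original workflow). It contrasts a "leaky" protocol, where the most discriminatory peaks are selected using the whole dataset before cross-validation, with a disciplined one where selection is repeated inside each training fold:

```python
# Feature-selection leakage on pure noise: 40 "patients", 5000 noise "peaks".
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5000))   # no real signal anywhere
y = np.repeat([0, 1], 20)             # arbitrary case/control labels

# Leaky protocol: pick the 20 most class-correlated peaks using ALL samples,
# then cross-validate - the test folds have already leaked into the selection.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_acc = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Disciplined protocol: feature selection is re-fitted inside every training fold.
honest = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest_acc = cross_val_score(honest, X, y, cv=5).mean()

print(f"Leaky cross-validated accuracy:       {leaky_acc:.0%}")   # typically well above chance
print(f"Disciplined cross-validated accuracy: {honest_acc:.0%}")  # hovers around 50%
```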

Statistical rigour isn’t a bureaucratic nicety. It’s an economic and strategic imperative.


The Proteomics Winter of Discontent

Reference: Ransohoff DF (2005). Ovarian cancer screening and serum proteomics. J Natl Cancer Inst, 97(4):315-319. DOI: 10.1093/jnci/dji054

David Ransohoff’s paper is a landmark critique of early claims that serum proteomic patterns could detect ovarian cancer with extraordinary sensitivity and specificity. He shows that the impressive results reported in early studies (notably the OvaCheck test) were artefacts of flawed study design, not genuine biological breakthroughs.

Good design matters more than algorithmic cleverness.

His critique punctured the hype surrounding proteomic diagnostics for ovarian cancer and became a turning point in recognising the dangers of overfitting, lack of replication, and systematic bias in high-dimensional biomarker research. It remains one of the clearest, most influential statements of why rigorous study design matters more than algorithmic cleverness.


More Statistical Tails of the Unexpected

Statistical Tails of the Unexpected
"This isn’t a stats textbook. It’s a demolition job on bad science."

https://www.amazon.co.uk/Apes-Anoraks-Statistical-Tails-Unexpected-ebook/dp/B0DVZQH3FR


💪🏻 "Good design matters more than algorithmic cleverness." Love that quote. When you create a high-quality and representative dataset and train a statistical model with discipline and good practice, the results are a lot more convincing and solid.

Glitter Blindness!!! “The core problem wasn’t the algorithm itself, but the lack of statistical discipline in how it was trained, tested, and interpreted.”

Thanks again for another enlightening post Dennis Lendrem. This reminds me of one of the quotes attributed to George Box: "All models are wrong, but some are useful". In particular, don't trust a model until it has been verified with data that was not used to develop it.

There is a precursor story to this which relates to the current fad of "digital twins", Systems Biology. A company called Merrimack Pharmaceuticals, which in the end probably spent close to a billion dollars, failed miserably. (If you look at the old financial documents you see roughly a billion dollars of investor money taken in, and in the end only a third was ever recovered, so a huge loss.) At the end, all the company had to show for its endeavours was a reformulation of irinotecan, which clearly has little to do with systems biology! It's fascinating that there is so little literature on such failures.
