The Paradox of Performance
Is a diagnostic that performs at 100% better than one that performs at 75%?
The answer is obvious...or is it?
I often see headlines announcing new high-performance biomarkers. I am suspicious of the ones that claim 100%.
Why? You have to ask yourself how good the established gold standard is.
If, for example, a first diagnostic prostate biopsy identifies only 75% of the men who actually have prostate cancer, and a new biomarker predicts that biopsy outcome with 100% accuracy, it suggests the biomarker has learnt something specific to the data set it was given rather than the underlying cancer biology.
So in that case I would be more optimistic about the prospects of a biomarker that performed at 75% than of one that performed at 100%.
There are very few clinical data sets that would let you confidently claim 100% accuracy, because very few gold standards perform at that level.
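To make that concrete, here is a minimal simulation in Python. It is a sketch, not real data: I assume a hypothetical population with 25% cancer prevalence and a reference biopsy that produces no false positives but, as in the example above, finds only 75% of true cancers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

prevalence = 0.25    # assumed prevalence of cancer in the biopsied population (hypothetical)
sensitivity = 0.75   # the first biopsy finds only 75% of true cancers, as in the example above

truth = rng.random(n) < prevalence             # true disease status
gold = truth & (rng.random(n) < sensitivity)   # imperfect "gold standard": misses 25% of cancers, no false positives

# A hypothetical marker that tracks the underlying biology exactly
marker_call = truth

# Apparent accuracy judged against the imperfect gold standard
print(f"Apparent accuracy of a biologically perfect marker: {(marker_call == gold).mean():.3f}")
# prints roughly 0.94 -- never 1.0, because this marker refuses to copy the biopsy's misses
```

Against an imperfect reference, an honest marker tops out well below 100%; the only way to hit 100% is to reproduce the reference's mistakes as well as the true cancers.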
Ankerst et al. show a fantastic example of the problems of assessing performance:
PCPTRC (Prostate Cancer Prevention Trial Risk Calculator) risks were calculated for 25,733 biopsies.
So, a big analysis of many well-designed studies using a well-established and controlled assay.
AUCs of the PCPTRC ranged from a low of 56% in the ERSPC Goeteborg Rounds 2–6 cohort to a high of 72%.
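To put those numbers in context, here is a rough sketch with simulated data (hypothetical labels and scores, nothing to do with the actual PCPTRC cohorts): a completely uninformative score gives an AUC near 0.5, while shifting case scores up by about 0.8 standard deviations gets you into the low 0.70s.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 10_000
y = rng.random(n) < 0.25   # hypothetical case/control labels, 25% cases

# A completely uninformative score: AUC hovers around 0.50 (chance)
print(f"Uninformative score AUC:        {roc_auc_score(y, rng.normal(size=n)):.2f}")

# Cases shifted up by ~0.8 standard deviations: AUC lands in the low 0.70s
print(f"Modestly informative score AUC: {roc_auc_score(y, rng.normal(size=n) + 0.8 * y):.2f}")
```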
So performance ranged from pretty much no better than chance to quite encouraging. Why?
External validation of the PCPTRC across ten cohorts revealed varying degrees of success, highly dependent on the cohort, most likely due to different criteria for, and work-up before, biopsy.
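One way the same calculator can look so different across cohorts is selective work-up. The sketch below uses entirely made-up data and thresholds: when only men with an already elevated marker get worked up to biopsy, the restricted range drags the apparent AUC down, even though the marker itself has not changed.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical latent risk drives both a PSA-like marker and true cancer status
risk = rng.normal(size=n)
marker = risk + rng.normal(size=n)
cancer = rng.random(n) < 1 / (1 + np.exp(-(risk - 1)))

# Cohort A: every man is biopsied, whatever his marker value
print(f"AUC, biopsy for everyone:         {roc_auc_score(cancer, marker):.2f}")

# Cohort B: only men with an elevated marker are worked up to biopsy,
# so the marker's range (and its apparent discrimination) is restricted
biopsied = marker > 0.5
print(f"AUC, biopsy only if marker > 0.5: {roc_auc_score(cancer[biopsied], marker[biopsied]):.2f}")
```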
So whenever you see a report of a brand-new high-performance marker, ask:
What was the gold standard? And what was the target population?
The further the population it was tested on is from the real clinical population, the more cautious we should be.