Beware of the Morlocks:
The Sphinx in The Time Machine

If you use a Bayesian Time Machine you may be in for some surprises

“It sounds plausible enough tonight,” said the Medical Man; “but wait until tomorrow. Wait for the common sense of the morning.”

HG Wells, The Time Machine, Ch II

Guide Rail

The recent FDA Guidance [1] on using Bayesian methods has attracted much interest and examples of an enthusiastic welcome are not hard to find. But as I often say:

At any celebration, the role of a statistician, especially a frequentist one, is often that of a bad fairy. Nobody invited them, they turn up at the end, spread gloom and send everyone to sleep.

So I am going to play the role of a bad fairy here and rail against the guidance. I am less than enamoured of its recommendations, not because it is Bayesian per se but because, in its enthusiasm for Bayesian methods, it is in danger of promoting debased Bayes and possibly even debayesed Bayes.

Praise for Bayes

But first, let me quote some praise.

If the FDA follows through with the proposed guidelines, and they are not fatally twisted by pressure from the medical establishment and health care industry, it should bring fresh air and sunlight into the approval process. It should save money and speed innovation, with better health outcomes.

Aaron Brown. How To Speed Up the Search for Cures Through a Change in Probability Theory.

“Bayesian methodologies help address two of the biggest problems of drug development: high costs and long timelines,” said FDA Commissioner Dr. Marty Makary in a press release announcing the draft guidance. “Providing clarity around modern statistical methods will help sponsors bring more cures and meaningful treatments to patients faster and more affordably.”

PharmaVoice, New FDA guidance that’s a ‘huge deal’ for clinical trials: Why using Bayesian statistics could transform trial design for rare diseases and beyond.

Richard Lilford, professor of public health at the University of Birmingham, UK, has long called for greater adoption of bayesian approaches, such as in drug development for rare diseases, and was excited by the new guidance. “It’s good that after years of prompting, a decision body has decided to accept ‘grown up’ statistics,” Lilford told The BMJ …

Peter Doshi, BMJ 2026;392:s180

Grown Up or Blown Up?

Grown up statistics? I am not so sure. In my opinion, the issue is not so much Bayesian versus frequentist statistics as a number of matters related to concurrent control.

The classical clinical trial uses concurrent control, is randomised and double blind. It is not always appreciated that the degree of guaranteed blinding is constrained by the degree to which the random sequence used can be guessed[2]. Furthermore, blinding, which is often taken as a way of dealing with patient expectations, has a side-effect that is frequently overlooked. It makes everything else also random. See Blind Date.

I am going to illustrate the problems that can arise by considering what happens once you abandon concurrent control. The particular context is that of adaptive designs. I have been a member of at least two data safety monitoring boards in which allocation ratios were varied and I found this a challenging experience. I am going to illustrate why adopting a so-called Bayesian Time Machine will not necessarily eliminate the problems with concurrent control that adaptive designs may create. My starting point is the paper by Saville et al that the FDA guidance cites[3].

Time's chariot

But at my back I always hear/ Time’s wingèd chariot hurrying near;

Andrew Marvell, To His Coy Mistress

Figure 1 below is based on a 2022 paper by Saville et al[3]. It is a hypothetical example in which arms are added or dropped over time in an adaptive design. Initially, 50 patients are allocated to Arm 1 and 50 to Control. In period 2, 50 further patients are allocated to each arm. In period 3, 33 patients are added to each of Arm 1 and Control, but a new treatment, Arm 2, is now included in the trial and 33 patients are allocated to it, and so forth.

Figure 1. Hypothetical allocation of patient numbers over 10 periods to five treatment arms and a control arm in an adaptive design. Based on Saville et al (2022)

If, in such a trial, you wish to eliminate a time trend, the classic frequentist way is to make sure that any treatment estimate is constructed from contrasts that respect concurrent control. Thus, you cannot compare Arm 3 and Arm 1 directly, since there is no time period in which both are given. However, if Control can be regarded as being the same at all times (a point that will be examined in due course), then you could compare the difference of Arm 3 to Control in period 5 with the difference of Arm 1 to Control in periods 1 to 4 in order to obtain an indirect comparison of Arm 3 and Arm 1. This is the sort of thing that has a long history in incomplete block designs and a more recent one in network meta-analysis.
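The arithmetic of such an indirect comparison can be sketched briefly. The patient numbers below are hypothetical, not those of the Saville et al design, and the calculation assumes independent group means with a common residual variance sigma squared.

```python
# Hypothetical numbers: indirect comparison of Arm 3 with Arm 1 via Control.

n_arm1, n_ctrl_early = 133, 133   # Arm 1 and its concurrent Control (hypothetical)
n_arm3, n_ctrl_late = 33, 33      # Arm 3 and its concurrent Control (hypothetical)

# Direct contrasts, each with variance multiplier 1/n_treatment + 1/n_control
vm_arm1_vs_ctrl = 1 / n_arm1 + 1 / n_ctrl_early
vm_arm3_vs_ctrl = 1 / n_arm3 + 1 / n_ctrl_late

# Indirect comparison (Arm 3 - Control) - (Arm 1 - Control): the two direct
# contrasts involve disjoint patients, so their multipliers simply add.
vm_indirect = vm_arm3_vs_ctrl + vm_arm1_vs_ctrl
print(round(vm_indirect, 4))  # -> 0.0756
```

Note that the indirect contrast pays for both direct comparisons at once: its variance multiplier is the sum of the two, which is why such comparisons are costly relative to concurrent ones.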

Saville et al call this Time Categorical Analysis. Figure 2 below shows variance multipliers (what you would have to multiply the residual mean square by in order to get the variances of estimated effects) for the contrasts of each of the five arms compared with Control, assuming a linear model applied to a continuous outcome. Two models are used. The first ignores time period and the second adjusts for it, treating time as levels of a categorical factor, as suggested by Saville et al.

Figure 2. Variance multipliers for five treatments compared to control using two different models. Dashed lines: period effects are ignored. Solid lines: period effects are eliminated.


What is plotted is the variance multipliers for the contrasts that could be calculated at the given time period. (In other words, results that lie in the future compared to that time period could not be used.) This is the situation that any data safety monitoring board would be faced with in 'real time'. (Note that because similar numbers are allocated at similar times to Arm 3 and Arm 4, their values are very similar.)
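Such variance multipliers can be computed from the design matrix alone. The sketch below uses a small hypothetical two-period allocation (not the design of Figure 1): a linear model is set up with treatment dummies, with or without period as a categorical factor, and the multiplier for a treatment-versus-control contrast is read off the relevant diagonal element of the inverse cross-product matrix.

```python
import numpy as np

def variance_multiplier(alloc, arm, adjust_for_period):
    """Variance multiplier c'(X'X)^{-1}c for the contrast arm - Control ("C").

    alloc maps (period, arm) to the number of patients allocated."""
    periods = sorted({p for p, _ in alloc})
    arms = sorted({a for _, a in alloc if a != "C"})  # Control is the reference
    X = []
    for (p, a), n in alloc.items():
        row = ([1.0]                                          # intercept
               + [1.0 if a == t else 0.0 for t in arms])      # treatment dummies
        if adjust_for_period:
            row += [1.0 if p == q else 0.0 for q in periods[1:]]  # period dummies
        X.extend([row] * n)
    X = np.array(X)
    XtX_inv = np.linalg.inv(X.T @ X)
    j = 1 + arms.index(arm)  # with Control as reference, the arm's coefficient
    return XtX_inv[j, j]     # *is* the arm - Control contrast

# Hypothetical allocation: Arm A runs in both periods, Arm B joins in period 2
alloc = {(1, "C"): 50, (1, "A"): 50,
         (2, "C"): 33, (2, "A"): 33, (2, "B"): 33}

vm_ignore = variance_multiplier(alloc, "B", adjust_for_period=False)
vm_adjust = variance_multiplier(alloc, "B", adjust_for_period=True)
print(vm_ignore, vm_adjust)  # adjusting for period increases the multiplier
```

In this toy example the unadjusted multiplier for Arm B is simply 1/33 + 1/83, since ignoring period pools all 83 control patients, while the period-adjusted multiplier is larger because the Arm B dummy is partially confounded with the period-2 indicator, which is the pattern the solid versus dashed lines in Figure 2 display.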

If the figure is studied, two important features can be noted. 1) The variance multipliers adjusting for period are generally higher than those not adjusting. 2) The variances reduce over time. An exception to 1) is the variance multiplier for Arm 1 in the first 4 periods. This is because the design is such that the information for Arm 1 is balanced with respect to Control. This is not the case for other treatments and it is not the case even for Arm 1 for later periods. In fact, even though, from period 5 onwards, Arm 1 is no longer studied, the variance multiplier continues to reduce because other information, in particular as regards Control, continues to accrue.

Blind faith

As some of the authors pointed out in an earlier paper in the same journal[4]:

In a platform trial with many experimental agents it may be difficult or impossible to blind patients to every possible arm. They may have different modes of administration or dosing in such ways that blinding to all arms becomes incredibly difficult and burdensome. It is not uncommon that patients in platform trials are unblinded to which possible treatment arm they receive, but remain blinded to whether they receive active or placebo of that treatment.

p. 365

Some years ago, I referred to trials in which patients do not know which treatment they are receiving but do know of at least one treatment in the trial they are not receiving as veiled[5]. Consider a placebo-controlled trial involving two doses of a hormone replacement therapy to be delivered transdermally using adhesive patches. Unless every patient is given two patches, any patient will know from the size of the patch either that they are not being given the lower dose or that they are not being given the higher dose. Now, suppose that there are two possible placebo patches that can be used: a smaller one as a placebo to the lower dose and a larger one as a placebo to the higher dose. There will then be four treatment groups: active lower dose, placebo to the lower dose, active higher dose and placebo to the higher dose. The trial is then veiled.

This has implications for analysis. For example, to compare the higher and lower doses in such a design in a way that would be credibly blinded, you need first to compare each dose to its corresponding placebo and then compare the differences to each other, even though the two doses are given concurrently.
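The cost of this double-difference analysis is easy to quantify. A minimal sketch, assuming equal (and hypothetical) group sizes and independent group means:

```python
# Veiled analysis: high vs low dose estimated as a difference of differences,
# each dose first compared with its own placebo. Group sizes are hypothetical.

n = 50                        # patients per group (hypothetical, equal for simplicity)

vm_high = 1 / n + 1 / n       # high dose - its own placebo
vm_low = 1 / n + 1 / n        # low dose - its own placebo
vm_veiled = vm_high + vm_low  # (high - placebo_h) - (low - placebo_l): four means

vm_naive = 1 / n + 1 / n      # naive direct contrast high - low: two means

print(vm_veiled / vm_naive)   # -> 2.0
```

Respecting the veiling doubles the variance of the dose comparison relative to the naive direct contrast, because the two placebo groups must be carried along in the estimate.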

However, Saville et al claim[3]:

Unlike typical historical controls or real-world evidence, these ‘‘contemporary’’ controls are enrolled with the same protocol, the same inclusion/exclusion criteria, and the same data elements. The only difference is time.

p. 491

This is not true. If blinding matters, then this creates further groups that cannot be categorised in terms of time period only. For example, in the design shown in Figure 1, in all periods from period 3 onwards at least two arms are being studied in addition to Control. If there is to be any attempt at blinding, each active treatment requires its own control. In total we end up with not 10 period categories but 24 placebo-and-time categories, and the resulting variance multipliers, if we respect the veiled design, will be much higher than those given in Figure 2.

A further problem, which I shall not discuss in detail here, is that it is not unusual for additional centres to be added as a trial progresses and sometimes for others to leave. If the allocation rule is unchanged, concurrent control is not compromised. But if the rule is changed, then this has complex consequences.

O Brave New World, That has such Bayesians in't

I want to make it clear that I am not criticising Bayesian statistics per se, nor even the FDA's newfound enthusiasm for it. Nor am I claiming that there are no Bayesians who understand the points I have made. Nevertheless, some of the hype attending adaptive designs is not only ludicrous but dangerous. The principal values of adaptive designs are administrative efficiency and the ability to react swiftly to abandon unpromising treatments. Any claims beyond this should be viewed with suspicion[6].

References

  1. U.S. Food and Drug Administration. Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products: Guidance for Industry. Center for Drug Evaluation and Research (CDER) and Center for Biologics Evaluation and Research (CBER). Rockville, MD, 2026, p. 25.
  2. Senn SJ. Fisher's game with the devil. Statistics in Medicine 1994; 13: 217-230.
  3. Saville BR, Berry DA, Berry NS, et al. The Bayesian Time Machine: Accounting for temporal drift in multi-arm platform trials. Clinical Trials 2022; 19: 490-501. DOI: 10.1177/17407745221112013.
  4. Saville BR and Berry SM. Efficiencies of platform clinical trials: A vision of the future. Clinical Trials 2016; 13: 358-366. DOI: 10.1177/1740774515626362.
  5. Senn SJ. A personal view of some controversies in allocating treatment to patients in clinical trials. Statistics in Medicine 1995; 14: 2661-2674.
  6. Senn SJ. Being Efficient About Efficacy Estimation. Statistics in Biopharmaceutical Research 2013; 5: 204-210. DOI: 10.1080/19466315.2012.754726.

The new (draft) guidance is almost devoid of adaptive designs. It consistently refers to the 2019 FDA guidance on adaptive designs (link below). I think your discussion of concurrent controls applies to frequentist and Bayesian approaches alike. I do agree that many of the superlatives in press releases and first takes on the new guidance are not in sync with its actual content. As I see it, the "breakthrough" is mostly that industry now has a clearer view of how the FDA will evaluate applications of Bayesian statistical methods. This by itself will probably lead to wider adoption. Link to the 2019 guidance on adaptive designs: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/adaptive-design-clinical-trials-drugs-and-biologics-guidance-industry


Stephen, as always, this is a thoughtful and characteristically sharp perspective. I’ve always appreciated how you push the field to separate statistical enthusiasm from statistical rigor. As someone who is primarily a frequentist, I admit that when collaborators ask me about Bayesian approaches, my first instinct is usually skepticism and caution. Not because the methods lack merit, but because the assumptions, particularly around borrowing information across time or populations, can become quite consequential if they’re not carefully justified. I agree that the real issue is less Bayes vs frequentist and more about preserving the integrity of trial design, especially concurrent control and protection against time trends in adaptive or platform trials. The recent guidance from the U.S. Food and Drug Administration is an important signal that regulators are open to modern statistical tools. But methodological flexibility should never be mistaken for immunity from bias. And Stephen… I may already be heading in the “bad fairy” direction myself — I just didn’t know what to call it until now. 🧚♀️

Thanks for this Important discussion. A productive way forward is not to frame this as Bayesian versus frequentist, but to structure how both are used. In forthcoming work in Statistics in Medicine, we introduce CARE (Clarify, Apply, Refine, Evaluate) for cluster trials. The approach anchors inference in a design-based, cluster-robust benchmark that remains valid under heterogeneity and imbalance, and then allows assumption-rich models—including Bayesian ones—as transparent refinements rather than defaults. The sequencing is deliberate: start with what the data support under minimal assumptions, then layer additional structure only when it is justified and computationally stable. That avoids false confidence from fragile covariance assumptions while still permitting efficient Bayesian learning where appropriate. In short, robustness and efficiency do not have to be competing camps—there is a disciplined way to do both. Preprint: https://www.researchgate.net/publication/376204429_Cluster_trials_inference_with_CARE

Thank you for the article. As per usual, it's the hype that is the danger. You said it best: "[A]dministrative efficiency": 🍾🎉 Rigorous biostatistician: 😱


I try to stay out of these as it’s outside my wheelhouse, but I’ve been reading content both on here and in more sophisticated frameworks and it concerns me. There are two hundred and fifty years of successes and failures, and I am concerned that people may try to learn new ways to do things the wrong way rather than build a framework of formal tradeoffs. My concern is that it is happening in a less formal way than is advised for such an important process. I am not concerned by the use of Bayes. I am concerned that the rule-making process is less than what should be used. I argue in my field that everything except subjective Bayes is illegal to use. And everything except subjective Bayes should be illegal to use. I have no problem with Bayes. I have a problem with shortcuts which produce bad results. I am less than certain that the framework is enough for the problem. But I don’t know the people. I am sure that they are serious and careful people. I am used to working with less than careful people. My concerns may be mooted by the skill of the people. As I said, it’s not my wheelhouse.

