Bleatorial on blocks, causal analysis and statistical thinking

Background

Judea Pearl drew attention on Twitter to a preprint by Abhishek Umrawal that applies directed acyclic graphs (DAGs) to blocking in experimental designs. Although I disagree with some of what the preprint says, it has helped me clarify some matters. This blog post presents the resulting tweetorial, which might therefore be called a bleatorial.

Confession and motivation

I am a complete beginner when it comes to causal analysis as developed by Judea Pearl. However, I am also an admirer of what he has achieved. My motivation here is that I think what the Rothamsted School achieved in experimental analysis is worth close examination, and that some aspects of it could make causal analysis even better. I also cannot claim to be an expert on the extremely deep and beautiful theory of experimental design. I have been a user of it for many years, but that is not the same thing.

The tweetorial

1.      Thanks to Judea Pearl @yudapearl for drawing attention to this interesting paper, https://arxiv.org/pdf/2111.02306.pdf, which I have found helpful in understanding some causal thinking. I shall now illustrate a particular problem it raises, related to fixed and random effects.

2.      Have a look at Figures 1 and 2 in the paper. They suggest that if we can block for everything that is a direct ancestor of the outcome Y, we do not need to block for the ancestors of those ancestors. This seems like a perfectly reasonable causal principle.

3.      Now consider Appendix C, where we are invited to consider blocking for sex, age, weight, blood pressure and cholesterol: “factors” with 2, 4, 5, 5 and 5 “levels” (in stats speak) respectively.

4.      Suppose I can design a trial in which I can block for these and thousands of other factors, known and unknown, at least as successfully as by blocking for them individually.

5.      What is this blocking factor? It’s “patient”. In cross-over trials, of which I have made a particular study (http://senns.uk/cticr2.html), every patient is their own control. You are your own perfect twin.

6.      I will treat you on one occasion with one drug and on another occasion with another, thus controlling for all genetic factors and for your total life history up to (but not beyond) the start of the trial.

7.      How many levels does this factor “patient” have? As many levels as there are patients. However, I shall have two outcome observations per patient (one on each of two treatments, say A and B), so that does not matter.

8.      So are your genes an “ancestor” of you, or are you an “ancestor” of your genes? This depends on context. Is it really helpful to think of things this way? Maybe, maybe not. The important thing, however, is that for a blocking factor it doesn’t matter.

9.      If I put “patient” in the model as an effect, it accounts for all these things. In stats terms, I can, if I wish, put patient in the model as a “fixed” effect or as a “random” one. With one A observation and one B observation per patient, it does not matter.

10.  Suppose that I am now informed that I can observe patients in only one period. I will now have to run a parallel-group trial: patients get either A or B. I can’t put patient in the model as a factor with as many levels as there are patients. (It is “confounded”.)

11.  What do I do? I declare it “random”. In terms of causal modelling, I treat it as an exogenous U variable (I think), whereas previously it was an endogenous V variable. (The statistical distinction is not quite the same, but that’s another tweetorial.)

12.  OK, you may say: models depend on context and assumptions, so what? Fine, I answer. But what changed? Nothing at the level of deep mechanistic causation. What has changed is the design. I can only make progress by treating the patient effect as random.

13.  That’s enough for now. In a follow-up tweetorial I shall consider incomplete blocks, which show that simple examples, although useful, are far from covering what is necessary.
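The contrast between the cross-over and the parallel-group trial can be made concrete with a small simulation. The Python sketch below is not taken from the preprint or from the original tweets; it assumes a simple additive model (a patient effect u with standard deviation sd_between, within-patient noise with standard deviation sd_within, no period effects), and every function name and number in it is hypothetical. It does two things. First, it builds a design matrix with one dummy column per patient (“patient” as a fixed factor) plus a treatment indicator: with two observations per patient the matrix has full column rank, whereas with one observation per patient the treatment column is just the sum of the dummies of the patients given B, which is one way of seeing what “confounded” means here. Second, it simulates many trials: in the cross-over the patient effect cancels within each patient, while in the parallel-group trial it cannot be separated from the noise and, treated as random, ends up in the error variance.

```python
import random
import statistics
from fractions import Fraction


def matrix_rank(rows):
    """Rank by Gaussian elimination over exact fractions (adequate for small 0/1 matrices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column is a combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(n_patients, crossover):
    """One dummy column per patient ("patient" as a fixed factor), then a 0/1 column for B."""
    rows = []
    for i in range(n_patients):
        dummies = [1 if j == i else 0 for j in range(n_patients)]
        if crossover:
            rows.append(dummies + [0])  # patient i observed on A
            rows.append(dummies + [1])  # patient i observed on B
        else:
            rows.append(dummies + [i % 2])  # single period: A or B (hypothetical allocation)
    return rows


def simulate_trial(crossover, n_patients, effect, sd_between, sd_within, rng):
    """Treatment estimate (B minus A) from one trial under the assumed additive model."""
    diffs, arm_a, arm_b = [], [], []
    for i in range(n_patients):
        u = rng.gauss(0.0, sd_between)  # patient effect: genes, life history, ...
        if crossover:
            a = u + rng.gauss(0.0, sd_within)
            b = u + effect + rng.gauss(0.0, sd_within)
            diffs.append(b - a)  # u cancels within each patient
        elif i % 2 == 0:
            arm_a.append(u + rng.gauss(0.0, sd_within))
        else:
            arm_b.append(u + effect + rng.gauss(0.0, sd_within))
    if crossover:
        # Same as the least-squares estimate with patient as a fixed effect
        return statistics.mean(diffs)
    # u cannot be separated from the noise, so it inflates the error variance
    return statistics.mean(arm_b) - statistics.mean(arm_a)


rng = random.Random(2021)
n, sd_b, sd_w = 20, 2.0, 1.0
results = {}
for name, crossover in (("cross-over", True), ("parallel", False)):
    X = design_matrix(n, crossover)
    estimates = [simulate_trial(crossover, n, 3.0, sd_b, sd_w, rng) for _ in range(5000)]
    results[name] = (len(X[0]), matrix_rank(X), statistics.variance(estimates))
    cols, rank, var = results[name]
    print(f"{name}: columns={cols} rank={rank} variance of estimate={var:.3f}")
```

With 20 patients both matrices have 21 columns; the cross-over matrix has rank 21 and the parallel-group matrix rank 20, so the treatment effect is not estimable once patient is fitted as a fixed factor with one observation each. Under the assumed model the simulated variances should lie close to 2 × sd_within² / 20 = 0.1 for the cross-over and (sd_between² + sd_within²) × (1/10 + 1/10) = 1.0 for the parallel-group trial: the same patient variation that cancels in the first design is absorbed into the error term of the second.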


More articles by Stephen Senn

  • Cards on the Table
  • Causes and Covariates
  • Beware of the Morlocks
  • Die, Dichotomy
  • Pooling the Interaction
  • Two ways to leave your ANOVA
  • Double Trouble
  • Illegible Eligible
  • Knowing ANOVA
  • Bridge Over Trebled Order