Bleatorial on blocks, causal analysis and statistical thinking
Background
Judea Pearl drew attention on Twitter to a preprint by Abhishek Umrawal that applied directed acyclic graphs (DAGs) to blocking in experimental designs. Although I disagree with some of what the preprint says, it has been useful in clarifying some matters for me. This blog represents the resulting tweetorial and might be called a bleatorial.
Confession and motivation
I am a complete beginner when it comes to causal analysis as developed by Judea Pearl. However, I am also an admirer of what he has achieved. My inspiration here is that I think that what the Rothamsted School achieved in experimental analysis is worth close examination and that there might be some aspects that could make causal analysis even better. I also cannot claim to be an expert on the extremely deep and beautiful theory of experimental design. I have been a user of it for many years but that is not the same thing.
The tweetorial
1. Thanks to Judea Pearl @yudapearl for drawing attention to this interesting paper, https://arxiv.org/pdf/2111.02306.pdf which I have found helpful in understanding some causal thinking. I shall now illustrate a particular problem related to fixed and random effects that it raises.
2. Have a look at Figures 1 and 2 in the paper. The figures suggest that if we can block for everything that is a direct ancestor of the outcome Y, we do not need to block for ancestors of the ancestors. This seems like a perfectly reasonable causal principle.
3. Now consider Appendix C, where we are invited to consider blocking for sex, age, weight, blood pressure and cholesterol, “factors” with “levels” (in stats speak) 2, 4, 5, 5, 5 respectively.
4. Suppose I can design a trial in which I can block for these and thousands of other factors known and unknown at least as successfully as blocking them individually.
5. What is this blocking factor? It’s “patient”. In cross-over trials, of which I have made a particular study (http://senns.uk/cticr2.html), every patient is their own control. You are your own perfect twin.
6. I will treat you on one occasion with one drug and on another occasion with another, thus controlling for all genetic factors and your total life history up to (but not beyond) the start of the trial.
7. How many levels does this factor “patient” have? As many levels as there are patients. However, I shall have two outcome observations per patient (one on each of two treatments, say A and B) so that does not matter.
8. So are your genes an “ancestor” of you or are you an “ancestor” of your genes? This depends on context. Is it really helpful to think of things this way or not? Maybe, maybe not. The important thing, however, is that as a blocking factor it doesn’t matter.
9. If I put “patient” in the model as an effect it accounts for all these things. In stats terms, I can, if I wish, put patients in the model as a “fixed” effect or a “random” one. With one A observation and one B observation per patient it does not matter.
10. Suppose that I am now informed that I can have only one period to observe patients in. I will now have to run a parallel group trial. Patients get either A or B. I can’t put patient in the model as a factor with levels equal to the number of patients. (It's "confounded".)
11. What do I do? I declare it “random”. In terms of causal modelling, I treat it as an exogenous U variable. (I think.) But previously, it was an endogenous V variable. (The statistical distinction is not quite the same but that’s another tweetorial.)
12. OK, you may say. Models depend on context and assumptions, so what? Fine, I answer. But what changed? Nothing at the level of deep mechanistic causation. What has changed is the design. I can only make progress by treating the patient effect as random.
13. That’s enough for now. In a follow-up tweetorial I shall consider incomplete blocks. That shows that simple examples, although useful, are far from covering what is necessary.
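The point made in tweets 9 to 11 can be checked numerically. What follows is only a minimal sketch in Python with NumPy and is not part of the original thread. The number of patients, the treatment effect and the variance components are made-up values, and the "random" fit treats the variance components as known, which is a simplification (in practice they would be estimated). It simulates a two-treatment cross-over trial with one A and one B observation per patient, compares the treatment estimate with patient as a fixed effect, with patient as a random effect, and from simple within-patient differences, and then shows that in a parallel group design the fixed-effect design matrix loses rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12       # hypothetical number of patients
tau = 2.0    # hypothetical true effect of B relative to A

# Patient effects stand in for genes, life history and everything else.
patient = rng.normal(0.0, 5.0, size=n)

# Cross-over trial: every patient is observed once on A and once on B.
y_a = patient + rng.normal(0.0, 1.0, size=n)
y_b = patient + tau + rng.normal(0.0, 1.0, size=n)
y_cross = np.concatenate([y_a, y_b])
ids_cross = np.concatenate([np.arange(n), np.arange(n)])
treat_cross = np.concatenate([np.zeros(n), np.ones(n)])


def patient_dummies(ids):
    """One indicator column per patient."""
    return (ids[:, None] == np.arange(n)[None, :]).astype(float)


# (a) Each patient as their own control: mean within-patient difference.
est_diff = np.mean(y_b - y_a)

# (b) Patient as a fixed effect: least squares on patient dummies + treatment.
X_fixed = np.column_stack([patient_dummies(ids_cross), treat_cross])
beta_fixed, *_ = np.linalg.lstsq(X_fixed, y_cross, rcond=None)
est_fixed = beta_fixed[-1]

# (c) Patient as a random effect: generalised least squares with
#     Var(y) = sigma_e^2 * I + sigma_p^2 * Z Z'.
#     The variance components are ASSUMED known here (1 and 25).
Z = patient_dummies(ids_cross)
V = 1.0 * np.eye(2 * n) + 25.0 * (Z @ Z.T)
X_rand = np.column_stack([np.ones(2 * n), treat_cross])
V_inv = np.linalg.inv(V)
beta_rand = np.linalg.solve(X_rand.T @ V_inv @ X_rand, X_rand.T @ V_inv @ y_cross)
est_random = beta_rand[1]

print("within-patient differences:", est_diff)
print("patient fixed             :", est_fixed)
print("patient random            :", est_random)

# Parallel group trial: each patient is observed once, on A or on B.
ids_par = np.arange(n)
treat_par = (np.arange(n) >= n // 2).astype(float)
X_par = np.column_stack([patient_dummies(ids_par), treat_par])

# The treatment column is a sum of patient dummy columns, so the matrix
# has more columns than its rank: treatment is confounded with patient.
rank_par = np.linalg.matrix_rank(X_par)
print("parallel design: columns =", X_par.shape[1], "rank =", rank_par)
```

In this balanced cross-over layout the three estimates agree to rounding error, whatever values are assumed for the variance components. In the parallel group layout the fixed-effect model cannot separate treatment from patient, which is why the patient effect has to be treated as random there.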