Pooling the Interaction
...a further refinement, which I would not myself recommend, had been added; namely that if the two samples are in fact from the same population, the difference between the observed means will give an additional degree of freedom for error, and this can be included in the test conditions at the expense of some algebraic gymnastics...
(p. xvii) in Yates, 1990 [1]
Choices, choices
Analysis of variance (ANOVA) for a randomised block design will typically fit not only a factor for treatments but also a factor for the blocks. This means that not only the treatment sum of squares but also the block sum of squares will be subtracted from the total sum of squares in order to calculate the error sum of squares. This is only right and proper, since treatment will be varied within blocks and unaffected by variation between blocks. To paraphrase RA Fisher, that which has been removed by the design should be removed by the analysis.
When you have a randomised block design with replication, however, it becomes possible to identify a treatment-by-block interaction. The question then arises: should this also be removed from the error sum of squares? In this blog I shall consider arguments for and against this strategy. But first, I shall consider a concrete example.
Concrete case
Suppose that we have a clinical trial comparing two treatments, A and B, in k centres and that each centre has 2n patients, with n randomly allocated to A and n to B. We thus have 2kn patients in total. The two possible analysis of variance tables will have degrees of freedom (DF) as follows.
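As a sketch of how the two rival analyses might be run in practice, here is a small simulation of the design just described, using the `statsmodels` formula interface. The data, variable names and seed are all illustrative, not taken from any real trial:

```python
# Sketch: the two ANOVA models for a two-treatment trial in k centres,
# each centre having 2n patients (n per arm). Simulated, pure-noise data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(42)
k, n = 6, 10  # 6 centres, 10 patients per treatment per centre

df = pd.DataFrame({
    "centre": np.repeat(np.arange(k), 2 * n),
    "treatment": np.tile(np.repeat(["A", "B"], n), k),
})
df["y"] = rng.normal(size=len(df))  # outcome: noise only, for illustration

# Analysis 1: treatment and centre main effects only
m1 = smf.ols("y ~ C(treatment) + C(centre)", data=df).fit()
# Analysis 2: treatment-by-centre interaction also fitted
m2 = smf.ols("y ~ C(treatment) * C(centre)", data=df).fit()

print(anova_lm(m1))
print(anova_lm(m2))
print("Error DF without interaction:", int(m1.df_resid))  # 2kn - k - 1
print("Error DF with interaction:   ", int(m2.df_resid))  # 2k(n - 1)
```

The residual degrees of freedom from the two fits differ by the k − 1 degrees of freedom absorbed by the interaction term.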
Degrees of freedom
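Under the design above, with two treatments, k centres and 2n patients per centre, the degrees of freedom work out as follows:

| Source | Without interaction | With interaction |
|---|---|---|
| Treatment | 1 | 1 |
| Centres | k − 1 | k − 1 |
| Treatment × Centre | – | k − 1 |
| Error | 2kn − k − 1 | 2k(n − 1) |
| Total | 2kn − 1 | 2kn − 1 |

Fitting the interaction thus transfers k − 1 degrees of freedom out of the error line.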
I now consider the arguments on either side.
Against fitting
I can think of the following arguments against fitting the interaction.
For fitting
These are arguments for fitting the interaction that occur to me.
The meta-analysis connection
Note that argument 3) in the preceding section points to a strong analogy with fixed-effects meta-analysis of many trials. This was pointed out some years ago in The Many Modes of Meta [3], in which it was also argued that a fixed-effects meta-analysis does not require the assumption that the treatment effect in each trial is identical. In fact, the treatment-by-trial interaction has been removed from the implicit model. Instead, it assumes that some question is being answered for which a weighted combination of estimates is an appropriate means of answering it. One such question is 'was there a difference between treatments for at least some of the patients?'. Another might be 'what was the effect for the average patient?' See also chapters 14 & 16 of Statistical Issues in Drug Development [4].
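A minimal sketch of the fixed-effects (inverse-variance) combination that this analogy appeals to, with made-up trial-level numbers purely for illustration:

```python
# Sketch: fixed-effects (inverse-variance weighted) combination of
# per-trial treatment estimates. All numbers are illustrative.
import numpy as np

estimates = np.array([0.30, 0.10, 0.25, -0.05, 0.20])  # per-trial effects
variances = np.array([0.04, 0.09, 0.05, 0.08, 0.06])   # their variances

weights = 1.0 / variances                      # inverse-variance weights
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_var = 1.0 / np.sum(weights)             # variance of the pooled estimate

print(f"pooled estimate: {pooled:.4f}, SE: {np.sqrt(pooled_var):.4f}")
```

No assumption of a common true effect is needed for this weighted average to be computed; what matters is whether the question it answers is the one you want answered.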
My opinion
My opinion has shifted over the years. I used to think that there was something inappropriate about removing the treatment-by-block interaction from the residual sum of squares when examining the main effect of treatment. I now think that it is a reasonable thing to do. Nevertheless, I consider it a choice that the statistician has to make. (Since I am used to thinking about all such things in the context of drug development, I consider that it is a choice that has to be made in advance of seeing the data and that that choice should be registered in a statistical analysis plan.)
For an application of the above philosophy to n-of-1 trials, where various approaches to analysis are possible, see Senn, 2024 [5].
Footnote
*The passage quoted from Frank Yates at the head of this blog refers to a modification that Yates claims Fisher made in later editions of Statistical Methods for Research Workers to reflect this.
References
If I am caught in a fire in a hotel room, and the firefighter tells me to bring one item only with me on the ladder, it will be a hard choice between my undies and my dog-eared copy of "Statistics in Drug Development" 😀
Accounting carefully for treatment-by-block interactions can clarify your error structure and connect nicely to insights from fixed-effects meta-analysis.
You pose a question in the first sentence of your preface so I guess it’s ok to offer my 2 cents. The [treatment x block] interaction should be accounted for in the analysis. It should also be considered as a source of variability against which the treatment effect is compared. The idea: consider the treatment effect “significant”… large enough to be meaningful… only if it transcends noise from the environments (as generated by block differences) in which the treatments were tested.

At least: compare the 2 mean squares, treatment to [treatment x block], to assess the strength of the treatment effect. Use a rule of thumb, say a 10:1 ratio, to quantify the strength.

At most: the ratio of (independent) mean squares is an F-statistic, used to assess significance of the treatment effect. (Whether to replace the usual model error mean square with the [treatment x block] mean square is up to the researcher.)

Note: Considering a researcher’s desire to include as many covariates as possible in a model, and model misspecification (e.g., not taking into account sample sculpting), I think model errors are too small. Use of the [treatment x block] mean square is a way to make tests of treatment effects more conservative, if not “correct”.
I should have been clearer. My blog has nothing to do with detecting interactions. It’s only about the main effect of treatment. Should or should not the interaction be in the model when examining the main effect of treatment?
On point one for fitting: it is not only degrees of freedom, it is also balancing risk. The slightly diminished power from including the interaction term if there is no interaction is offset by the ability to detect an interaction. In many trials there are center-to-center differences in care that may not be addressed in the protocol.