Pooling the Interaction
...a further refinement, which I would not myself recommend, had been added; namely that if the two samples are in fact from the same population, the difference between the observed means will give an additional degree of freedom for error, and this can be included in the test conditions at the expense of some algebraic gymnastics...
(p. xvii) in Yates, 1990 [1]
Choices, choices
Analysis of variance (ANOVA) for a randomised block design will typically fit not only a factor for treatments but also a factor for the blocks. This means that not only the treatment sum of squares but also the block sum of squares will be subtracted from the total sum of squares in order to calculate the error sum of squares. This is only right and proper, since treatment will be varied within blocks and unaffected by variation between blocks. To paraphrase RA Fisher, that which has been removed by the design should be removed by the analysis.
When you have a randomised block design with replication, however, it becomes possible to identify a treatment-by-block interaction. The question then arises: should this also be removed from the error sum of squares? In this blog I shall consider arguments for and against this strategy. But first, I shall consider a concrete example.
Concrete case
Suppose that we have a clinical trial comparing two treatments, A and B, in k centres and that each centre has 2n patients, with n randomly allocated to A and n to B. We thus have 2kn patients in total. The two possible analysis of variance tables will have degrees of freedom (DF) as follows.
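As a sketch of how the two rival analyses might be run in practice, here is a small simulation of the design just described, using the `statsmodels` formula interface. The data, variable names and seed are all illustrative, not taken from any real trial:

```python
# Sketch: the two ANOVA models for a two-treatment trial in k centres,
# each centre having 2n patients (n per arm). Simulated, pure-noise data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(42)
k, n = 6, 10  # 6 centres, 10 patients per treatment per centre

df = pd.DataFrame({
    "centre": np.repeat(np.arange(k), 2 * n),
    "treatment": np.tile(np.repeat(["A", "B"], n), k),
})
df["y"] = rng.normal(size=len(df))  # outcome: noise only, for illustration

# Analysis 1: treatment and centre main effects only
m1 = smf.ols("y ~ C(treatment) + C(centre)", data=df).fit()
# Analysis 2: treatment-by-centre interaction also fitted
m2 = smf.ols("y ~ C(treatment) * C(centre)", data=df).fit()

print(anova_lm(m1))
print(anova_lm(m2))
print("Error DF without interaction:", int(m1.df_resid))  # 2kn - k - 1
print("Error DF with interaction:   ", int(m2.df_resid))  # 2k(n - 1)
```

The residual degrees of freedom from the two fits differ by the k − 1 degrees of freedom absorbed by the interaction term.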
Degrees of freedom
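Under the design above, with two treatments, k centres and 2n patients per centre, the degrees of freedom work out as follows:

| Source | Without interaction | With interaction |
|---|---|---|
| Treatment | 1 | 1 |
| Centres | k − 1 | k − 1 |
| Treatment × Centre | – | k − 1 |
| Error | 2kn − k − 1 | 2k(n − 1) |
| Total | 2kn − 1 | 2kn − 1 |

Fitting the interaction thus transfers k − 1 degrees of freedom out of the error line.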
I now consider the arguments on either side.
Against fitting
I can think of the following arguments against fitting the interaction.
For fitting
These are arguments for fitting the interaction that occur to me.
The meta-analysis connection
Note that argument 3) in the preceding section points to a strong analogy with fixed-effects meta-analysis of many trials. This was pointed out some years ago in The Many Modes of Meta [3], in which it was also argued that a fixed-effects meta-analysis does not require the assumption that the treatment effect in each trial is identical. In fact, the treatment-by-trial interaction has been removed from the implicit model. Instead, it assumes that some question is being answered for which a weighted combination of estimates is an appropriate means of answering it. One such question is 'was there a difference between treatments for at least some of the patients?'. Another might be 'what was the effect for the average patient?' See also chapters 14 & 16 of Statistical Issues in Drug Development [4].
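A minimal sketch of the fixed-effects (inverse-variance) combination that this analogy appeals to, with made-up trial-level numbers purely for illustration:

```python
# Sketch: fixed-effects (inverse-variance weighted) combination of
# per-trial treatment estimates. All numbers are illustrative.
import numpy as np

estimates = np.array([0.30, 0.10, 0.25, -0.05, 0.20])  # per-trial effects
variances = np.array([0.04, 0.09, 0.05, 0.08, 0.06])   # their variances

weights = 1.0 / variances                      # inverse-variance weights
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_var = 1.0 / np.sum(weights)             # variance of the pooled estimate

print(f"pooled estimate: {pooled:.4f}, SE: {np.sqrt(pooled_var):.4f}")
```

No assumption of a common true effect is needed for this weighted average to be computed; what matters is whether the question it answers is the one you want answered.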
My opinion
My opinion has shifted over the years. I used to think that there was something inappropriate about removing the treatment-by-block interaction from the residual sum of squares when examining the main effect of treatment. I now think that it is a reasonable thing to do. Nevertheless, I consider it a choice that the statistician has to make. (Since I am used to thinking about all such things in the context of drug development, I consider that it is a choice that has to be made in advance of seeing the data and that that choice should be registered in a statistical analysis plan.)
For an application of the above philosophy to n-of-1 trials, where various approaches to analysis are possible, see Senn, 2024 [5].
Footnote
*The passage quoted from Frank Yates at the head of this blog refers to a modification that Yates claims Fisher made in later editions of Statistical Methods for Research Workers to reflect this.
References
If I am caught in a fire in a hotel room, and the firefighter tells me to bring one item only with me on the ladder, it will be a hard choice between my undies and my dog-eared copy of "Statistics in Drug Development" 😀
Accounting carefully for treatment-by-block interactions can clarify your error structure and connect nicely to insights from fixed-effects meta-analysis.
You pose a question in the first sentence of your preface so I guess it’s ok to offer my 2 cents. The [treatment x block] interaction should be accounted for in the analysis. It should also be considered as a source of variability against which the treatment effect is compared. The idea: consider the treatment effect “significant”… large enough to be meaningful… only if it transcends noise from the environments (as generated by block differences) in which the treatments were tested.

At least: compare the 2 mean squares, treatment to [treatment x block], to assess the strength of the treatment effect. Use a rule of thumb, say a 10:1 ratio, to quantify the strength.

At most: the ratio of (independent) mean squares is an F-statistic, used to assess significance of the treatment effect. (Whether to replace the usual model error mean square with the [treatment x block] mean square is up to the researcher.)

Note: Considering a researcher’s desire to include as many covariates as possible in a model, and model misspecification (e.g., not taking into account sample sculpting), I think model errors are too small. Use of the [treatment x block] mean square is a way to make tests of treatment effects more conservative, if not “correct”.
I should have been clearer. My blog has nothing to do with detecting interactions. It’s only about the main effect of treatment. Should or should not the interaction be in the model when examining the main effect of treatment?
On point one for fitting: it is not only degrees of freedom, it is also balancing risk. The slightly diminished power from including the interaction term if there is no interaction is offset by the ability to detect an interaction. In many trials there are center-to-center differences in care that may not be addressed in the protocol.