Model-based hypotheses testing, part 2: the 2-sample z-statistic for proportions with pooled variances IS the Rao test of logistic regression
In my two previous articles, “Is logistic regression a regression? It has been a regression since its birth — and is used this way every day” and "Logistic regression can replicate multiple parametric and non-parametric tests of proportions", I explained why the common misconception that “logistic regression is not a regression” is fundamentally flawed on statistical grounds, and showed several examples of model-based testing in which logistic regression allows one not only to replicate a variety of the classic statistical tests of proportions (%), but also to extend them in many ways.
Showing the direct relationships for all listed tests would be rather challenging and make this article extremely long. But some relationships are really cool! Previously I proved the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with non-pooled variances and the average marginal effect of the logistic regression with a single binary predictor distinguishing the compared samples.
Today I will prove the equivalence between the 2-sample z-statistic for comparing proportions with pooled variances and the Rao score test of logistic regression with a single binary predictor.
This will be a bit more difficult (40 formulas) than previously, so hold tight! The result is just beautiful! (at least to me 😉).
⚠️Disclaimer:
If you are preparing for an exam, use this material with care.
We will assume independent binomial sampling and no small-sample corrections. The z-statistic for the difference of two proportions with pooled variances has the following form:
where pA stands for the estimated probability (sample proportion, %) in the 1st group, pB is the estimated probability in the 2nd group, nA and nB denote respective group sizes, and p is the pooled probability p = (xA + xB)/(nA + nB)
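The formula itself was an image in the original post; here is a LaTeX reconstruction from the definitions above (my notation, hats included):

```latex
z = \frac{\hat{p}_A - \hat{p}_B}
         {\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}},
\qquad
\hat{p} = \frac{x_A + x_B}{n_A + n_B}
```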
/ Sorry for the missing hats (^); pretend they are there. LinkedIn doesn't support LaTeX in the text. /
Traditionally the z statistic is expressed in the squared form, becoming: z^2~χ^2(df=1).
Both forms yield the same p-value. For convenience, I will show that this χ^2(df=1) statistic is 1:1 equivalent to the Rao score test of the logistic regression with a single binary predictor playing the role of an indicator of the compared samples.
To simplify calculations, I will derive the formula for pooled probability p and the overall statistic form separately.
The equation of logistic regression
Let’s start from the equation of the logistic regression with a single binary predictor:
or equivalently, after applying the inverse-logit, i.e. sigmoid function (let’s also simplify X1 to X)
where Yi are independent Bernoulli random variables with probabilities pi. The second, simpler form with η=β0+Xβ1 will facilitate later calculations.
Let’s encode the two levels {A, B} using a single binary predictor X such that A: X=0 and B: X=1. Simplifying the notation, we can express pA and pB in terms of the beta coefficients:
Let’s also skip the hat notation for the estimated p and simply use pA and pB until stated otherwise.
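The model equations from this section, reconstructed in LaTeX from the surrounding text (the originals were images):

```latex
\operatorname{logit}(p_i) = \log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 X_i,
\qquad
p_i = \frac{e^{\eta_i}}{1+e^{\eta_i}}, \quad \eta_i = \beta_0 + \beta_1 X_i,
\qquad
p_A = \frac{e^{\beta_0}}{1+e^{\beta_0}}, \quad
p_B = \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}}
```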
Pooled probability
Under H0, i.e. β1=0, pA = pB, and the best estimator of this common proportion is the pooled proportion pP. Let’s find its form:
We need to assume that the data consists of two independent binomial samples:
So the likelihood function is:
where xA and xB are the observed number of successes in groups A and B, respectively, and nA and nB are the total sample sizes in each group.
Under the null hypothesis, H0: pA = pB = p, both groups share a common (pooled) proportion p of some yet-unknown form.
Knowing that:
we obtain a single binomial likelihood where the total number of successes is xA+xB and the total number of trials is nA+nB, i.e.
Note: The binomial coefficients are multiplied rather than pooled, as they originally come from two independent samples; the truth of H0 is an assumption, not a property of the data. In any case, the choice doesn’t matter, as this term will vanish when taking the derivative.
Let’s simplify notation and replace p_pooled with just p. Now, the log-likelihood, log(L(p))=l(p), is defined as:
(I wrote const to highlight that this term will disappear after taking the derivative.) Now, by taking dl(p)/dp and setting it to 0, we obtain (provided that p ∉ {0, 1}):
Or, alternatively, since pi=xi/ni
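The whole derivation of this section, written out in LaTeX as a reconstruction of the original formula images:

```latex
L(p_A, p_B) = \binom{n_A}{x_A} p_A^{x_A} (1-p_A)^{n_A-x_A}
              \binom{n_B}{x_B} p_B^{x_B} (1-p_B)^{n_B-x_B}
\]
Under $H_0\!: p_A = p_B = p$:
\[
l(p) = \mathrm{const} + (x_A+x_B)\log p + (n_A+n_B-x_A-x_B)\log(1-p)
\]
\[
\frac{dl}{dp} = \frac{x_A+x_B}{p} - \frac{n_A+n_B-x_A-x_B}{1-p} = 0
\;\Rightarrow\;
\hat{p} = \frac{x_A+x_B}{n_A+n_B} = \frac{n_A p_A + n_B p_B}{n_A+n_B}
```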
Log-likelihood of the logistic regression with a single binary predictor
The log-likelihood function is of the form:
where β is the vector of estimated coefficients, i.e. (β0, β1). Let’s express p and 1-p in terms of η=β0+Xβ1
Then
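In LaTeX (reconstructed from the text, since the original images are gone):

```latex
l(\beta) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1-y_i)\log(1-p_i) \right],
\qquad
p_i = \frac{e^{\eta_i}}{1+e^{\eta_i}}, \quad 1-p_i = \frac{1}{1+e^{\eta_i}}
\;\Rightarrow\;
l(\beta) = \sum_{i=1}^{n} \left[ y_i \eta_i - \log\!\left(1+e^{\eta_i}\right) \right]
```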
Gradient, Score, Hessian, Fisher Information, Covariance…
We will need both the gradient and the Hessian of the log-likelihood. For future use, we will call the gradient the Rao score function, denoted by “U”, and the Hessian “H”.
First, let’s find the form of U(β):
By noticing that dηi/dβ0=1, dηi/dβ1=xi, and remembering that p=exp(η)/(1+exp(η)), we obtain:
and
So finally:
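The score vector, reconstructed in LaTeX:

```latex
U(\beta) = \nabla l(\beta) =
\begin{pmatrix}
\partial l / \partial \beta_0 \\[4pt]
\partial l / \partial \beta_1
\end{pmatrix}
=
\begin{pmatrix}
\sum_i (y_i - p_i) \\[4pt]
\sum_i x_i (y_i - p_i)
\end{pmatrix}
```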
Now, the Hessian:
The partial derivatives are as follows:
Therefore:
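The Hessian, reconstructed in LaTeX from the partial derivatives above:

```latex
H(\beta) =
\begin{pmatrix}
-\sum_i p_i(1-p_i) & -\sum_i x_i\, p_i(1-p_i) \\[4pt]
-\sum_i x_i\, p_i(1-p_i) & -\sum_i x_i^2\, p_i(1-p_i)
\end{pmatrix}
```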
Let’s also determine the Fisher Information matrix:
where qi = 1-pi.
This can be further expanded by replacing the sums with the appropriate per-group (A and B) counts, remembering that n = nA + nB, A: X=0, B: X=1, pA=p(Y=1|X=A) and pB=p(Y=1|X=B), and:
So the final, useful form is:
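In LaTeX (my reconstruction; note that with xi ∈ {0, 1}, xi² = xi, and xi = 1 only for the nB observations in group B):

```latex
I(\beta) = -\mathbb{E}[H] =
\begin{pmatrix}
\sum_i p_i q_i & \sum_i x_i\, p_i q_i \\[4pt]
\sum_i x_i\, p_i q_i & \sum_i x_i^2\, p_i q_i
\end{pmatrix}
=
\begin{pmatrix}
n_A p_A q_A + n_B p_B q_B & n_B p_B q_B \\[4pt]
n_B p_B q_B & n_B p_B q_B
\end{pmatrix}
```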
This matrix will be used to find the covariance matrix:
Another way to obtain I(β):
You can also obtain this matrix in a different way. I described it in my previous article https://www.garudax.id/pulse/2-sample-walds-z-statistic-proportions-w-unpooled-vs-ame-olszewski-obp4f/ Briefly:
where X is the design matrix with 2 columns (a column of ones for β0 and an indicator column for β1, equal to 1 when X=1), with nA and nB rows corresponding to groups A and B, respectively.
Now, W is the diagonal matrix of weights, of the block-diagonal form:
which can be expressed simpler as:
where InA and InB are respective identity matrices.
The result of the matrix multiplication can be expressed as appropriate sums:
where 1 is the result of multiplying 1 × 1 (the β0 column), and the X terms refer to the other products of the β0 and β1 columns. Notice that this is exactly matrix #24.
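This alternative route, in LaTeX (reconstructed from the text):

```latex
I(\beta) = X^{\top} W X,
\qquad
W = \operatorname{diag}\!\big(p_i q_i\big)
  = \begin{pmatrix} p_A q_A\, I_{n_A} & 0 \\ 0 & p_B q_B\, I_{n_B} \end{pmatrix}
```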
The Rao Score and Information Matrix under H0: β1=0
The Rao Score
Recall the formula #18:
Again, nA and nB denote the respective numbers of rows corresponding to groups A and B; for group A: X=0, for B: X=1, and n=nA+nB. Also, under H0, pA = pB = p, where p is the pooled probability:
Here, yi is the response vector, containing 1s in the respective group. Summing all those 1s yields the total number of successes in that group, which can be expressed as:
So the first element of the vector cancels out, and finally:
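In LaTeX (my reconstruction; the first component vanishes because the pooled p̂ = (xA+xB)/n is exactly the value that zeroes it, and Σ xi yi = xB = nB pB):

```latex
U(\hat\beta_0, 0) =
\begin{pmatrix}
\sum_i (y_i - \hat{p}) \\[4pt]
\sum_i x_i (y_i - \hat{p})
\end{pmatrix}
=
\begin{pmatrix}
x_A + x_B - n\hat{p} \\[4pt]
x_B - n_B \hat{p}
\end{pmatrix}
=
\begin{pmatrix}
0 \\[4pt]
n_B (p_B - \hat{p})
\end{pmatrix}
```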
Information Matrix
By recalling matrix #26 and remembering that under H0, pA=pB=p we obtain:
Let’s also calculate I^(-1):
After simplifying the denominator term:
we finally obtain:
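Reconstructed in LaTeX, with q̂ = 1 − p̂:

```latex
I = \hat{p}\hat{q}
\begin{pmatrix} n & n_B \\ n_B & n_B \end{pmatrix},
\qquad
\det I = (\hat{p}\hat{q})^2\, n_B (n - n_B) = (\hat{p}\hat{q})^2\, n_A n_B,
\qquad
I^{-1} = \frac{1}{\hat{p}\hat{q}\, n_A n_B}
\begin{pmatrix} n_B & -n_B \\ -n_B & n \end{pmatrix}
```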
Rao score test under H0: β1=0
The Rao score test (also called the Lagrange multiplier test) under H0 is defined as the following quadratic form:
But since the first element of U is 0, this quadratic form reduces to a scalar operation involving only the second component of U and the corresponding (2,2) element of I⁻¹:
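The final calculation, reconstructed in LaTeX (using pB − p̂ = nA(pB − pA)/n, which follows from p̂ = (nA pA + nB pB)/n):

```latex
T = U^{\top} I^{-1} U
  = \big(n_B(p_B-\hat{p})\big)^2 \cdot \frac{n}{\hat{p}\hat{q}\, n_A n_B}
  = \frac{n\, n_B\, (p_B-\hat{p})^2}{\hat{p}\hat{q}\, n_A}
  = \frac{n_A n_B (p_A - p_B)^2}{n\, \hat{p}\hat{q}}
  = \frac{(p_A-p_B)^2}{\hat{p}\hat{q}\left(\dfrac{1}{n_A}+\dfrac{1}{n_B}\right)}
  = z^2
```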
This way, I have shown the equivalence of the 2-sample z-statistic for comparing proportions with pooled variances and the Rao score test of the logistic regression with a single binary predictor distinguishing the compared samples.
Phew! I felt like I was back in college! Only this time I liked the challenge 😊
And, again, some comparisons in R:
> wald_z_test_pooled(x1 = 6, n1 = 20, x2 = 10, n2 = 20)
diff z chi2 se p.value p.value_1 LCI HCI
1 -0.2 -1.290994 1.666667 0.1549193 0.1967056 0.0983528 -0.5036363 0.1036363
> prop.test(c(6,10), c(20,20), correct = FALSE)
2-sample test for equality of proportions without continuity correction
data: c(6, 10) out of c(20, 20)
X-squared = 1.6667, df = 1, p-value = 0.1967
alternative hypothesis: two.sided
95 percent confidence interval:
-0.49724326 0.09724326
sample estimates:
prop 1 prop 2
0.3 0.5
> data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 20-6),
+ rep("Success", 10), rep("Failure", 20-10))),
+ grp = factor(rep(c("B", "A"), each=20)))
> m <- glm(response ~ grp, data = data, family = binomial(link = "logit"))
> anova(m, test = "Rao")
Analysis of Deviance Table
Model: binomial, link: logit
Response: response
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
NULL 39 53.841
grp 1 1.6805 38 52.160 1.6667 0.1967
The implementation of the z statistic:
wald_z_test_pooled <- function(x1, n1, x2, n2, conf.level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  # pooled proportion: (x1 + x2) / (n1 + n2)
  p_pool <- (p1 * n1 + p2 * n2) / (n1 + n2)
  se_p1 <- sqrt(p_pool * (1 - p_pool) / n1)
  se_p2 <- sqrt(p_pool * (1 - p_pool) / n2)
  se_diff <- sqrt(se_p1^2 + se_p2^2)
  z <- (p1 - p2) / se_diff
  p <- 2 * (1 - pnorm(abs(z)))
  hCI <- abs(qnorm((1 - conf.level) / 2)) * se_diff
  return(data.frame(diff = p1 - p2,
                    z = z,
                    chi2 = z^2,
                    se = se_diff,
                    p.value = p,
                    p.value_1 = p / 2,
                    LCI = (p1 - p2) - hCI,
                    HCI = (p1 - p2) + hCI,
                    row.names = NULL))
}
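As an extra cross-check of the algebra (independent of the R code above, so I sketch it in Python), the pooled z² and the Rao statistic computed directly from the closed forms derived in this article, U = (0, nB(pB − p̂)) and (I⁻¹)₂₂ = n/(p̂ q̂ nA nB), should agree exactly on the example data (6/20 vs 10/20):

```python
# Numerical check that pooled z^2 equals the Rao score statistic,
# using the example data from the article: x_A = 6, x_B = 10, n_A = n_B = 20.
x_A, n_A = 6, 20
x_B, n_B = 10, 20

p_A, p_B = x_A / n_A, x_B / n_B
n = n_A + n_B
p = (x_A + x_B) / n   # pooled proportion
q = 1 - p

# Classic 2-sample z statistic with pooled variance, squared
z2 = (p_A - p_B) ** 2 / (p * q * (1 / n_A + 1 / n_B))

# Rao score statistic: U^T I^{-1} U with U = (0, n_B (p_B - p)),
# which reduces to U_2^2 * n / (p q n_A n_B)
U2 = n_B * (p_B - p)
rao = U2 ** 2 * n / (p * q * n_A * n_B)

print(round(z2, 4), round(rao, 4))  # both print 1.6667
```

Both values match the 1.6667 reported by `prop.test` and `anova(m, test = "Rao")` above.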
If these formulas render poorly on your screen, find them on my GitHub: https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/logistic_regression_Rao_Wald_z_test_proportions.md
In the next articles I will try to prove other relationships, so stay tuned!