Model-based hypotheses testing, part 2: the 2-sample z-statistic for proportions with pooled variances IS the Rao test of logistic regression

In my two previous articles, “Is logistic regression a regression? It has been a regression since its birth — and is used this way every day” and “Logistic regression can replicate multiple parametric and non-parametric tests of proportions”, I explained why the common misconception that “logistic regression is not a regression” is fundamentally flawed on statistical grounds, and showed several examples of model-based testing in which logistic regression not only replicates a variety of classic statistical tests of proportions (%), but also extends them in many ways.

Showing the direct relationships for all listed tests would be rather challenging and make this article extremely long. But some relationships are really cool! Previously I proved the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with non-pooled variances and the average marginal effect of the logistic regression with a single binary predictor distinguishing the compared samples.

Today I will prove the equivalence between the 2-sample z-statistic for comparing proportions with pooled variances and the Rao score test of a logistic regression with a single binary predictor.

This will be a bit more difficult (40 formulas) than previously, so hold tight! The result is just beautiful! (at least to me 😉).


⚠️Disclaimer:

  1. What you will read below demonstrates the algebraic equivalence between selected classical hypothesis tests and their model-based analogues under common large-sample assumptions. My goal was to show how the formulas transition from one framework to another. The derivations focus on the analytic equivalence of test statistics and do not provide a fully rigorous proof of asymptotic distributions, nor do they verify all regularity conditions required in formal statistical theory. As such, this note should be viewed as a proof of concept and a kind of "conceptual bridge" rather than a complete theoretical course. In particular: distributional assumptions are stated informally rather than derived formally; asymptotic arguments (e.g., convergence to χ² distributions) are not proven (it is not a textbook!); and the conditions required for likelihood-based theory are not verified.
  2. You might notice the code uses the name "Wald". It's intentional. In the GLM family, the Wald, Rao (score), and likelihood-ratio tests only converge asymptotically. Strictly speaking, the pooled z-statistic derived here is a Rao score test, not a Wald test. However, the term "Wald test" is frequently (and incorrectly) used as an umbrella term for any z-ratio in software and textbooks, and I have seen many times that people search for it this way, so I've retained the name for consistency with common practice. The critical takeaway is the use of pooled standard errors, which aligns this calculation with Rao's approach rather than the unpooled Wald approach.

So, if you're preparing for an exam, use it with care.


We will assume independent binomial sampling and no small-sample corrections. The z-statistic for the difference between two proportions with pooled variances is of the following form:

$$z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}(1-\hat{p})\left(\tfrac{1}{n_A}+\tfrac{1}{n_B}\right)}} \tag{1}$$

where pA stands for the estimated probability (sample proportion, %) in the 1st group, pB is the estimated probability in the 2nd group, nA and nB denote the respective group sizes, and p is the pooled probability p = (xA + xB)/(nA + nB).

/ sorry for missing hats ^ in the text; pretend they're there, LinkedIn doesn't support LaTeX in the text. /

Traditionally the z statistic is expressed in the squared form, becoming: z^2~χ^2(df=1).

Both forms yield the same p-value. For convenience I will show that this χ^2(df=1) statistic is 1:1 equivalent to the Rao score test of the logistic regression with a single binary predictor playing the role of an indicator for the compared samples.
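As a quick sanity check in R (with an arbitrary illustrative z value, not tied to any dataset):

# Both forms of the test yield the same p-value
z <- 1.2909944
2 * pnorm(-abs(z))                        # two-sided p-value from z
pchisq(z^2, df = 1, lower.tail = FALSE)   # identical p-value from z^2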

To simplify calculations, I will derive the formula for pooled probability p and the overall statistic form separately.

The equation of logistic regression

Let’s start from the equation of the logistic regression with a single binary predictor:

$$\log\frac{p_i}{1-p_i} = \beta_0 + X_{i1}\beta_1 \tag{2}$$

or equivalently, after applying the inverse logit, i.e. the sigmoid function (let's also simplify X1 to X):

$$p_i = \frac{e^{\beta_0 + X_i\beta_1}}{1+e^{\beta_0 + X_i\beta_1}} = \frac{e^{\eta_i}}{1+e^{\eta_i}} \tag{3}$$

where the Yi are independent Bernoulli random variables with probabilities pi. The second, simpler form, with ηi = β0 + Xiβ1, will facilitate later calculations.

Let's encode the two levels {A, B} with a single binary predictor X such that A: X=0 and B: X=1, and express pA and pB in terms of the beta coefficients:

$$p_A = P(Y=1 \mid X=0) = \frac{e^{\beta_0}}{1+e^{\beta_0}}, \qquad p_B = P(Y=1 \mid X=1) = \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}} \tag{4}$$

Let's also skip the hat notation for the estimated p and simply use pA and pB until stated otherwise.
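As a minimal sketch of this parametrization (using the 6/20 vs 10/20 counts from the example at the end of this article), the inverse logit of the fitted coefficients recovers the two sample proportions exactly, per formula (4):

# Recovering pA and pB from the fitted coefficients
y <- c(rep(1, 6), rep(0, 14), rep(1, 10), rep(0, 10))   # 6/20 and 10/20 successes
x <- rep(c(0, 1), each = 20)                            # A: X = 0, B: X = 1
m <- glm(y ~ x, family = binomial)
plogis(coef(m)[1])                 # pA = 0.3, from beta0 alone
plogis(coef(m)[1] + coef(m)[2])    # pB = 0.5, from beta0 + beta1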

Pooled probability

Under H0, i.e. β1=0, pA = pB, and the best estimator of this common proportion is the pooled proportion pP. Let’s find its form:

We need to assume that the data consists of two independent binomial samples:

$$x_A \sim \mathrm{Bin}(n_A, p_A), \qquad x_B \sim \mathrm{Bin}(n_B, p_B) \tag{5}$$

So the likelihood function is:

$$L(p_A, p_B) = \binom{n_A}{x_A} p_A^{x_A} (1-p_A)^{n_A-x_A} \cdot \binom{n_B}{x_B} p_B^{x_B} (1-p_B)^{n_B-x_B} \tag{6}$$

where xA and xB are the observed number of successes in groups A and B, respectively, and nA and nB are the total sample sizes in each group.

Under the null hypothesis H0: pA = pB = p, the two samples share a common (pooled) proportion p, whose form is yet to be determined.

Knowing that:

$$p^{x_A}\, p^{x_B} = p^{x_A+x_B}, \qquad (1-p)^{n_A-x_A}\, (1-p)^{n_B-x_B} = (1-p)^{(n_A+n_B)-(x_A+x_B)} \tag{7}$$

we obtain a single binomial likelihood where the total number of successes is xA+xB and the total number of trials is nA+nB, i.e.

$$L(p) = \binom{n_A}{x_A}\binom{n_B}{x_B}\, p^{x_A+x_B}\, (1-p)^{(n_A+n_B)-(x_A+x_B)} \tag{8}$$

Note: The binomial coefficients are multiplied rather than pooled, as they originally come from two independent samples; the truth of H0 is an assumption, not a property of the data. Actually, the choice doesn't matter, as this term vanishes when taking the derivative.

Let’s simplify notation and replace p_pooled with just p. Now, the log-likelihood, log(L(p))=l(p), is defined as:

$$l(p) = \mathrm{const} + (x_A+x_B)\log p + \big((n_A+n_B)-(x_A+x_B)\big)\log(1-p) \tag{9}$$

(I wrote "const" to highlight that this term disappears after taking the derivative.) Now, by taking dl(p)/dp and setting it to 0, we obtain (provided that p ∉ {0, 1}):

$$\frac{dl(p)}{dp} = \frac{x_A+x_B}{p} - \frac{(n_A+n_B)-(x_A+x_B)}{1-p} = 0 \;\;\Rightarrow\;\; \hat{p} = \frac{x_A+x_B}{n_A+n_B} \tag{10}$$

Or, alternatively, since pi=xi/ni

$$\hat{p} = \frac{n_A p_A + n_B p_B}{n_A + n_B} \tag{11}$$
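A tiny numeric check of both forms (again with the 6/20 vs 10/20 counts used later):

xA <- 6; nA <- 20; xB <- 10; nB <- 20
(xA + xB) / (nA + nB)                        # formula (10): 0.4
(nA * (xA/nA) + nB * (xB/nB)) / (nA + nB)    # formula (11): the same 0.4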

Log-likelihood of the logistic regression with a single binary predictor

The log-likelihood function is of the form:

$$l(\beta) = \sum_{i=1}^{n} \big[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\big] \tag{12}$$

where β is the vector of estimated coefficients, i.e. (β0, β1). Let's express pi and 1−pi in terms of ηi = β0 + Xiβ1:

$$p_i = \frac{e^{\eta_i}}{1+e^{\eta_i}}, \qquad 1-p_i = \frac{1}{1+e^{\eta_i}} \tag{13}$$

Then

$$l(\beta) = \sum_{i=1}^{n} \big[\, y_i \eta_i - \log(1+e^{\eta_i}) \,\big] \tag{14}$$
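Both forms of the log-likelihood can be checked against logLik() in R (a sketch, using the same illustrative 6/20 vs 10/20 data as before):

y   <- c(rep(1, 6), rep(0, 14), rep(1, 10), rep(0, 10))
x   <- rep(c(0, 1), each = 20)
m   <- glm(y ~ x, family = binomial)
eta <- coef(m)[1] + coef(m)[2] * x
p   <- plogis(eta)
sum(y * log(p) + (1 - y) * log(1 - p))   # formula (12)
sum(y * eta - log(1 + exp(eta)))         # formula (14): the same value
logLik(m)                                # matches both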

Gradient, Score, Hessian, Fisher Information, Covariance…

We will need both the gradient and the Hessian of the log-likelihood. In what follows, we will call the gradient the Rao score function, denoted "U", and the Hessian "H".

First, let’s find the form of U(β):

$$U(\beta) = \nabla l(\beta) = \left[ \frac{\partial l(\beta)}{\partial \beta_0}, \; \frac{\partial l(\beta)}{\partial \beta_1} \right]^{T}, \qquad \frac{\partial l(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \left( y_i - \frac{e^{\eta_i}}{1+e^{\eta_i}} \right) \frac{\partial \eta_i}{\partial \beta_j} \tag{15}$$

By noticing that ∂ηi/∂β0 = 1 and ∂ηi/∂β1 = xi, and remembering that pi = exp(ηi)/(1+exp(ηi)), we obtain:

$$\frac{\partial l(\beta)}{\partial \beta_0} = \sum_{i=1}^{n} (y_i - p_i) \tag{16}$$

and

$$\frac{\partial l(\beta)}{\partial \beta_1} = \sum_{i=1}^{n} x_i (y_i - p_i) \tag{17}$$

So finally:

$$U(\beta) = \left[\; \sum_{i=1}^{n} (y_i - p_i), \;\; \sum_{i=1}^{n} x_i (y_i - p_i) \;\right]^{T} \tag{18}$$
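At the (unrestricted) MLE the score vector should be numerically zero, which is easy to verify (same illustrative data):

y <- c(rep(1, 6), rep(0, 14), rep(1, 10), rep(0, 10))
x <- rep(c(0, 1), each = 20)
m <- glm(y ~ x, family = binomial)
p <- fitted(m)
colSums((y - p) * cbind(1, x))   # both components of U are ~0 (up to convergence tolerance)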

Now, the Hessian:

$$H(\beta) = \begin{bmatrix} \dfrac{\partial^2 l}{\partial \beta_0^2} & \dfrac{\partial^2 l}{\partial \beta_0 \partial \beta_1} \\ \dfrac{\partial^2 l}{\partial \beta_1 \partial \beta_0} & \dfrac{\partial^2 l}{\partial \beta_1^2} \end{bmatrix} \tag{19}$$

The partial derivatives are as follows:

$$\frac{\partial^2 l}{\partial \beta_0^2} = -\sum_{i=1}^{n} p_i(1-p_i) \tag{20}$$
$$\frac{\partial^2 l}{\partial \beta_0 \partial \beta_1} = -\sum_{i=1}^{n} x_i\, p_i(1-p_i) \tag{21}$$
$$\frac{\partial^2 l}{\partial \beta_1^2} = -\sum_{i=1}^{n} x_i^2\, p_i(1-p_i) \tag{22}$$

Therefore:

$$H(\beta) = -\begin{bmatrix} \sum_i p_i(1-p_i) & \sum_i x_i\, p_i(1-p_i) \\ \sum_i x_i\, p_i(1-p_i) & \sum_i x_i^2\, p_i(1-p_i) \end{bmatrix} \tag{23}$$

Let’s also determine the Fisher Information matrix:

$$I(\beta) = -H(\beta) = \begin{bmatrix} \sum_i p_i q_i & \sum_i x_i\, p_i q_i \\ \sum_i x_i\, p_i q_i & \sum_i x_i^2\, p_i q_i \end{bmatrix} \tag{24}$$

where qi = 1-pi. (Note that the Hessian does not depend on yi here, so the observed and expected (Fisher) information coincide.)

This can be further expanded by splitting each sum into its group A and group B parts, remembering that n = nA + nB; that A: X=0 and B: X=1 (so xi = xi² is 0 in group A and 1 in group B); and that pA = P(Y=1|X=0) and pB = P(Y=1|X=1):

$$\sum_{i=1}^{n} p_i q_i = n_A p_A q_A + n_B p_B q_B, \qquad \sum_{i=1}^{n} x_i\, p_i q_i = \sum_{i=1}^{n} x_i^2\, p_i q_i = n_B p_B q_B \tag{25}$$

So the final, useful form is:

$$I(\beta) = \begin{bmatrix} n_A p_A q_A + n_B p_B q_B & n_B p_B q_B \\ n_B p_B q_B & n_B p_B q_B \end{bmatrix} \tag{26}$$

This matrix will be used to find the covariance matrix:

$$\Sigma = \mathrm{Cov}(\hat\beta) = I(\beta)^{-1} \tag{27}$$
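A quick check that inverting matrix (26) reproduces the covariance matrix reported by R (for the glm fit from the earlier snippets; the counts are the illustrative 6/20 vs 10/20):

pA <- 6/20; pB <- 10/20
qA <- 1 - pA; qB <- 1 - pB
nA <- 20; nB <- 20
I <- matrix(c(nA*pA*qA + nB*pB*qB, nB*pB*qB,
              nB*pB*qB,            nB*pB*qB), nrow = 2)   # matrix (26)
solve(I)   # matches vcov(m) of the fitted glm, up to numerical tolerance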

Another way to obtain I(β):

You can also obtain this matrix in a different way, which I described in my previous article https://www.garudax.id/pulse/2-sample-walds-z-statistic-proportions-w-unpooled-vs-ame-olszewski-obp4f/ Briefly:

$$I(\beta) = X^{T} W X \tag{28}$$

where X is the design matrix with 2 columns (a column of ones for β0 and an indicator column for β1, equal to 1 when X=1), with nA and nB rows corresponding to groups A and B, respectively.

$$X = \begin{bmatrix} \mathbf{1}_{n_A} & \mathbf{0}_{n_A} \\ \mathbf{1}_{n_B} & \mathbf{1}_{n_B} \end{bmatrix} \tag{29}$$

Now, W is the diagonal matrix of weights, of the block-diagonal form:

$$W = \mathrm{diag}\big(\underbrace{p_A q_A, \ldots, p_A q_A}_{n_A}, \; \underbrace{p_B q_B, \ldots, p_B q_B}_{n_B}\big) \tag{30}$$

which can be expressed more simply as:

$$W = \begin{bmatrix} p_A q_A\, I_{n_A} & 0 \\ 0 & p_B q_B\, I_{n_B} \end{bmatrix} \tag{31}$$

where I_nA and I_nB are identity matrices of the respective sizes.

The result of the matrix multiplication can be expressed as appropriate sums:

$$X^{T} W X = \begin{bmatrix} \sum_i 1\cdot p_i q_i & \sum_i x_i\, p_i q_i \\ \sum_i x_i\, p_i q_i & \sum_i x_i^2\, p_i q_i \end{bmatrix} \tag{32}$$

where the 1 in the top-left entry comes from multiplying the β0 column of ones by itself, and the xi terms come from the products involving the β1 (indicator) column. Notice that this is exactly matrix #24.
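This is easy to verify numerically (a sketch with the illustrative 6/20 vs 10/20 proportions):

nA <- 20; nB <- 20
pA <- 6/20; pB <- 10/20
X <- cbind(1, rep(c(0, 1), c(nA, nB)))                  # design matrix (29)
W <- diag(rep(c(pA*(1-pA), pB*(1-pB)), c(nA, nB)))      # weights (31)
t(X) %*% W %*% X                                        # reproduces matrices (24) and (26)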


The Rao Score and Information Matrix under H0: β1=0

The Rao Score

Recall the formula #18:

$$U(\beta) = \left[\; \sum_{i=1}^{n} (y_i - p_i), \;\; \sum_{i=1}^{n} x_i (y_i - p_i) \;\right]^{T} \tag{18}$$

Again, nA and nB denote the numbers of observations in groups A and B (for group A: X=0, for group B: X=1) and n = nA + nB. Also, under H0, pA = pB = p, where p is the pooled probability:

$$\hat{p} = \frac{x_A + x_B}{n_A + n_B} \tag{33}$$

Here, the yi are the elements of the response vector, equal to 1 for each success. Summing them within a group yields that group's total number of successes, so:

$$\sum_{i=1}^{n} y_i = x_A + x_B, \qquad \sum_{i=1}^{n} x_i y_i = x_B \tag{34}$$

So the first element of the vector cancels out, since Σ(yi − p̂) = (xA + xB) − n·p̂ = 0, and finally:

$$U = \big[\; 0, \;\; x_B - n_B \hat{p} \;\big]^{T} \tag{35}$$
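Numerically (with the 6/20 vs 10/20 counts): the pooled p is 0.4 and the only non-zero score component is:

xB <- 10; nB <- 20
p  <- (6 + 10) / (20 + 20)   # pooled p = 0.4
xB - nB * p                  # U2 = 2; the first component of (35) is exactly 0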

Information Matrix

By recalling matrix #26 and remembering that under H0, pA = pB = p, we obtain:

$$I = \hat{p}\hat{q} \begin{bmatrix} n & n_B \\ n_B & n_B \end{bmatrix} \tag{36}$$

Let’s also calculate I^(-1):

$$I^{-1} = \frac{1}{\hat{p}\hat{q}\,\big(n\,n_B - n_B^2\big)} \begin{bmatrix} n_B & -n_B \\ -n_B & n \end{bmatrix} \tag{37}$$

After simplifying the denominator term:

$$n\,n_B - n_B^2 = n_B(n - n_B) = n_A n_B \tag{38}$$

we finally obtain:

$$\Sigma = I^{-1} = \frac{1}{\hat{p}\hat{q}\, n_A n_B} \begin{bmatrix} n_B & -n_B \\ -n_B & n \end{bmatrix} \tag{39}$$
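Both the numerical inverse of matrix (36) and the closed form (39) agree, as a quick R sketch with the same illustrative counts shows:

nA <- 20; nB <- 20; n <- nA + nB
p  <- 0.4; q <- 1 - p
I0 <- p * q * matrix(c(n, nB, nB, nB), nrow = 2)            # matrix (36)
solve(I0)                                                   # numerical inverse
matrix(c(nB, -nB, -nB, n), nrow = 2) / (p * q * nA * nB)    # closed form (39)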

Rao score test under H0: β1=0

The Rao score test (also called the Lagrange multiplier test) under H0 is defined as the following quadratic form:

$$R = U^{T}\, I^{-1}\, U = U^{T}\, \Sigma\, U \;\overset{H_0}{\sim}\; \chi^2(df=1) \tag{40}$$

But since the first element of U is 0, the quadratic form reduces to a scalar operation involving only the second component of U and the corresponding diagonal element of Σ: R = U₂²Σ₂₂, with Σ₂₂ = n/(p̂q̂ nA nB). Noting that xB = nB·p̂B:

$$x_B - n_B\hat{p} = n_B\left(\hat{p}_B - \frac{n_A\hat{p}_A + n_B\hat{p}_B}{n}\right) = \frac{n_A n_B}{n}\,(\hat{p}_B - \hat{p}_A)$$

$$R = U_2^2\,\Sigma_{22} = \frac{n_A^2 n_B^2}{n^2}(\hat{p}_B - \hat{p}_A)^2 \cdot \frac{n}{\hat{p}\hat{q}\,n_A n_B} = \frac{(\hat{p}_A - \hat{p}_B)^2}{\hat{p}\hat{q}\left(\frac{1}{n_A}+\frac{1}{n_B}\right)} = z^2 \tag{41}$$

This way, I have shown the equivalence of the 2-sample z-statistic for comparing proportions with pooled variances and the Rao score test of the logistic regression with a single binary predictor distinguishing the compared samples.

Phew! I felt like I was back in college! Only this time I liked the challenge 😊
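Before the software comparison, here is formula (41) computed from scratch for the 6/20 vs 10/20 example used below:

xA <- 6; nA <- 20; xB <- 10; nB <- 20
n  <- nA + nB
p  <- (xA + xB) / n; q <- 1 - p
U2  <- xB - nB * p                        # score component, formula (35)
rao <- U2^2 * n / (p * q * nA * nB)       # formula (41): 1.666667
pchisq(rao, df = 1, lower.tail = FALSE)   # 0.1967056, matching the outputs below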

And, again, some comparisons in R:

> wald_z_test_pooled(x1 = 6, n1 = 20, x2 = 10, n2 = 20)
  diff         z     chi2        se   p.value p.value_1        LCI       HCI
1 -0.2 -1.290994 1.666667 0.1549193 0.1967056 0.0983528 -0.5036363 0.1036363

> prop.test(c(6,10), c(20,20), correct = FALSE)
	2-sample test for equality of proportions without continuity correction

data:  c(6, 10) out of c(20, 20)
X-squared = 1.6667, df = 1, p-value = 0.1967
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.49724326  0.09724326
sample estimates:
prop 1 prop 2 
   0.3    0.5 

> data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 20-6),
+                                        rep("Success", 10), rep("Failure", 20-10))),
+                    grp    = factor(rep(c("B", "A"), each=20)))

> m <- glm(response ~ grp, data = data, family = binomial(link = "logit"))
> anova(m, test = "Rao")
Analysis of Deviance Table

Model: binomial, link: logit
Response: response
Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev    Rao Pr(>Chi)
NULL                    39     53.841                
grp   1   1.6805        38     52.160 1.6667   0.1967        



The implementation of the z statistic:

wald_z_test_pooled <- function(x1, n1, x2, n2, conf.level = 0.95) {
  # Sample proportions
  p1 <- x1 / n1
  p2 <- x2 / n2
  
  # Pooled proportion: equals (x1 + x2) / (n1 + n2)
  p_pool <- (p1 * n1 + p2 * n2) / (n1 + n2)
  
  # Standard error of each proportion under the pooled variance
  se_p1 <- sqrt(p_pool * (1 - p_pool) / n1)
  se_p2 <- sqrt(p_pool * (1 - p_pool) / n2)
  
  # Pooled standard error of the difference
  se_diff <- sqrt(se_p1^2 + se_p2^2)
  
  # z statistic and two-sided p-value; z^2 ~ chi^2(df = 1)
  z <- (p1 - p2) / se_diff
  p <- 2 * (1 - pnorm(abs(z)))
  
  # Half-width of the confidence interval (note: based on the pooled SE,
  # hence the CI differs slightly from prop.test(), which uses unpooled SEs)
  hCI <- abs(qnorm((1 - conf.level) / 2)) * se_diff
  
  data.frame(diff = p1 - p2,
             z = z,
             chi2 = z^2,
             se = se_diff,
             p.value = p,
             p.value_1 = p / 2,
             LCI = (p1 - p2) - hCI,
             HCI = (p1 - p2) + hCI,
             row.names = NULL)
}

If these formulas render poorly on your screen, you can find the whole derivation on my GitHub: https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/logistic_regression_Rao_Wald_z_test_proportions.md

In the next articles I will try to prove other relationships, so stay tuned!
