Model-based hypotheses testing, part 2: the 2-sample z-statistic for proportions with pooled variances IS the Rao test of logistic regression

In my two previous articles, “Is logistic regression a regression? It has been a regression since its birth — and is used this way every day” and “Logistic regression can replicate multiple parametric and non-parametric tests of proportions”, I explained why the common misconception that “logistic regression is not a regression” is fundamentally flawed on statistical grounds, and showed several examples of model-based testing in which logistic regression not only replicates a variety of classic statistical tests of proportions (%), but also extends them in many ways.

Showing the direct relationships for all listed tests would be rather challenging and make this article extremely long. But some relationships are really cool! Previously I proved the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with non-pooled variances and the average marginal effect of the logistic regression with a single binary predictor distinguishing the compared samples.

Today I will prove the equivalence between the 2-sample z-statistic for comparing proportions with pooled variances and the Rao score test of a logistic regression with a single binary predictor.

This will be a bit more difficult (40 formulas) than previously, so hold tight! The result is just beautiful! (at least to me 😉).


⚠️Disclaimer:

  1. What you will read below demonstrates the algebraic equivalence between selected classical hypothesis tests and their model-based analogues under common large-sample assumptions. My goal was to show how the formulas transition from one framework to another. The derivations focus on the analytic equivalence of test statistics and do not provide a fully rigorous proof of asymptotic distributions, nor do they verify all regularity conditions required in formal statistical theory. As such, this note should be viewed as a proof of concept and a kind of "conceptual bridge" rather than a complete theoretical course. In particular: distributional assumptions are stated informally rather than derived formally; asymptotic arguments (e.g., convergence to χ² distributions) are not proven (it is not a textbook!); and the conditions required for likelihood-based theory are not verified.
  2. You might notice the code uses the name "Wald". It's intentional. In the GLM family, the Wald, Rao (score), and likelihood-ratio tests only converge asymptotically. Strictly speaking, the pooled z-statistic derived here is a Rao score test, not a Wald test. However, the term "Wald test" is frequently (and incorrectly) used as an umbrella term for any z-ratio in software and textbooks, and I have seen many times that people search for it this way, so I've retained the name for consistency with common practice. The critical takeaway is the use of pooled standard errors, which aligns this calculation with Rao's approach rather than the unpooled Wald approach.

So, if you're preparing for an exam, use it with care.


We will assume independent binomial sampling and no small-sample corrections. The z-statistic for the difference between two proportions with pooled variances is of the following form:

$$z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}(1-\hat{p})\left(\tfrac{1}{n_A}+\tfrac{1}{n_B}\right)}} \tag{1}$$

where pA stands for the estimated probability (sample proportion, %) in the 1st group, pB is the estimated probability in the 2nd group, nA and nB denote the respective group sizes, and p is the pooled probability p = (xA + xB)/(nA + nB).

/ sorry for missing hats ^ in the text; pretend they're there, LinkedIn doesn't support LaTeX in the text. /

Traditionally the z statistic is expressed in the squared form, becoming: z^2~χ^2(df=1).

Both forms yield the same p-value. For convenience I will show that this χ^2(df=1) statistic is 1:1 equivalent to the Rao score test of the logistic regression with a single binary predictor playing the role of an indicator for the compared samples.
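As a quick sanity check in R (with an arbitrary illustrative z value, not tied to any dataset):

# Both forms of the test yield the same p-value
z <- 1.2909944
2 * pnorm(-abs(z))                        # two-sided p-value from z
pchisq(z^2, df = 1, lower.tail = FALSE)   # identical p-value from z^2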

To simplify calculations, I will derive the formula for pooled probability p and the overall statistic form separately.

The equation of logistic regression

Let’s start from the equation of the logistic regression with a single binary predictor:

$$\log\frac{p_i}{1-p_i} = \beta_0 + X_{i1}\beta_1 \tag{2}$$

or equivalently, after applying the inverse logit, i.e. the sigmoid function (let's also simplify X1 to X):

$$p_i = \frac{e^{\beta_0 + X_i\beta_1}}{1+e^{\beta_0 + X_i\beta_1}} = \frac{e^{\eta_i}}{1+e^{\eta_i}} \tag{3}$$

where the Yi are independent Bernoulli random variables with probabilities pi. The second, simpler form, with ηi = β0 + Xiβ1, will facilitate later calculations.

Let's encode the two levels {A, B} with a single binary predictor X such that A: X=0 and B: X=1, and express pA and pB in terms of the beta coefficients:

$$p_A = P(Y=1 \mid X=0) = \frac{e^{\beta_0}}{1+e^{\beta_0}}, \qquad p_B = P(Y=1 \mid X=1) = \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}} \tag{4}$$

Let's also skip the hat notation for the estimated p and simply use pA and pB until stated otherwise.
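As a minimal sketch of this parametrization (using the 6/20 vs 10/20 counts from the example at the end of this article), the inverse logit of the fitted coefficients recovers the two sample proportions exactly, per formula (4):

# Recovering pA and pB from the fitted coefficients
y <- c(rep(1, 6), rep(0, 14), rep(1, 10), rep(0, 10))   # 6/20 and 10/20 successes
x <- rep(c(0, 1), each = 20)                            # A: X = 0, B: X = 1
m <- glm(y ~ x, family = binomial)
plogis(coef(m)[1])                 # pA = 0.3, from beta0 alone
plogis(coef(m)[1] + coef(m)[2])    # pB = 0.5, from beta0 + beta1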

Pooled probability

Under H0, i.e. β1=0, pA = pB, and the best estimator of this common proportion is the pooled proportion pP. Let’s find its form:

We need to assume that the data consists of two independent binomial samples:

$$x_A \sim \mathrm{Bin}(n_A, p_A), \qquad x_B \sim \mathrm{Bin}(n_B, p_B) \tag{5}$$

So the likelihood function is:

$$L(p_A, p_B) = \binom{n_A}{x_A} p_A^{x_A} (1-p_A)^{n_A-x_A} \cdot \binom{n_B}{x_B} p_B^{x_B} (1-p_B)^{n_B-x_B} \tag{6}$$

where xA and xB are the observed number of successes in groups A and B, respectively, and nA and nB are the total sample sizes in each group.

Under the null hypothesis H0: pA = pB = p, the two samples share a common (pooled) proportion p, whose form is yet to be determined.

Knowing that:

$$p^{x_A}\, p^{x_B} = p^{x_A+x_B}, \qquad (1-p)^{n_A-x_A}\, (1-p)^{n_B-x_B} = (1-p)^{(n_A+n_B)-(x_A+x_B)} \tag{7}$$

we obtain a single binomial likelihood where the total number of successes is xA+xB and the total number of trials is nA+nB, i.e.

$$L(p) = \binom{n_A}{x_A}\binom{n_B}{x_B}\, p^{x_A+x_B}\, (1-p)^{(n_A+n_B)-(x_A+x_B)} \tag{8}$$

Note: The binomial coefficients are multiplied rather than pooled, as they originally come from two independent samples; the truth of H0 is an assumption, not a property of the data. Actually, the choice doesn't matter, as this term vanishes when taking the derivative.

Let’s simplify notation and replace p_pooled with just p. Now, the log-likelihood, log(L(p))=l(p), is defined as:

$$l(p) = \mathrm{const} + (x_A+x_B)\log p + \big((n_A+n_B)-(x_A+x_B)\big)\log(1-p) \tag{9}$$

(I wrote "const" to highlight that this term disappears after taking the derivative.) Now, by taking dl(p)/dp and setting it to 0, we obtain (provided that p ∉ {0, 1}):

$$\frac{dl(p)}{dp} = \frac{x_A+x_B}{p} - \frac{(n_A+n_B)-(x_A+x_B)}{1-p} = 0 \;\;\Rightarrow\;\; \hat{p} = \frac{x_A+x_B}{n_A+n_B} \tag{10}$$

Or, alternatively, since pi=xi/ni

$$\hat{p} = \frac{n_A p_A + n_B p_B}{n_A + n_B} \tag{11}$$
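A tiny numeric check of both forms (again with the 6/20 vs 10/20 counts used later):

xA <- 6; nA <- 20; xB <- 10; nB <- 20
(xA + xB) / (nA + nB)                        # formula (10): 0.4
(nA * (xA/nA) + nB * (xB/nB)) / (nA + nB)    # formula (11): the same 0.4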

Log-likelihood of the logistic regression with a single binary predictor

The log-likelihood function is of the form:

$$l(\beta) = \sum_{i=1}^{n} \big[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\big] \tag{12}$$

where β is the vector of estimated coefficients, i.e. (β0, β1). Let's express pi and 1−pi in terms of ηi = β0 + Xiβ1:

$$p_i = \frac{e^{\eta_i}}{1+e^{\eta_i}}, \qquad 1-p_i = \frac{1}{1+e^{\eta_i}} \tag{13}$$

Then

$$l(\beta) = \sum_{i=1}^{n} \big[\, y_i \eta_i - \log(1+e^{\eta_i}) \,\big] \tag{14}$$
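Both forms of the log-likelihood can be checked against logLik() in R (a sketch, using the same illustrative 6/20 vs 10/20 data as before):

y   <- c(rep(1, 6), rep(0, 14), rep(1, 10), rep(0, 10))
x   <- rep(c(0, 1), each = 20)
m   <- glm(y ~ x, family = binomial)
eta <- coef(m)[1] + coef(m)[2] * x
p   <- plogis(eta)
sum(y * log(p) + (1 - y) * log(1 - p))   # formula (12)
sum(y * eta - log(1 + exp(eta)))         # formula (14): the same value
logLik(m)                                # matches both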

Gradient, Score, Hessian, Fisher Information, Covariance…

We will need both the gradient and the Hessian of the log-likelihood. In what follows, we will call the gradient the Rao score function, denoted "U", and the Hessian "H".

First, let’s find the form of U(β):

$$U(\beta) = \nabla l(\beta) = \left[ \frac{\partial l(\beta)}{\partial \beta_0}, \; \frac{\partial l(\beta)}{\partial \beta_1} \right]^{T}, \qquad \frac{\partial l(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \left( y_i - \frac{e^{\eta_i}}{1+e^{\eta_i}} \right) \frac{\partial \eta_i}{\partial \beta_j} \tag{15}$$

By noticing that ∂ηi/∂β0 = 1 and ∂ηi/∂β1 = xi, and remembering that pi = exp(ηi)/(1+exp(ηi)), we obtain:

$$\frac{\partial l(\beta)}{\partial \beta_0} = \sum_{i=1}^{n} (y_i - p_i) \tag{16}$$

and

$$\frac{\partial l(\beta)}{\partial \beta_1} = \sum_{i=1}^{n} x_i (y_i - p_i) \tag{17}$$

So finally:

$$U(\beta) = \left[\; \sum_{i=1}^{n} (y_i - p_i), \;\; \sum_{i=1}^{n} x_i (y_i - p_i) \;\right]^{T} \tag{18}$$
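At the (unrestricted) MLE the score vector should be numerically zero, which is easy to verify (same illustrative data):

y <- c(rep(1, 6), rep(0, 14), rep(1, 10), rep(0, 10))
x <- rep(c(0, 1), each = 20)
m <- glm(y ~ x, family = binomial)
p <- fitted(m)
colSums((y - p) * cbind(1, x))   # both components of U are ~0 (up to convergence tolerance)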

Now, the Hessian:

$$H(\beta) = \begin{bmatrix} \dfrac{\partial^2 l}{\partial \beta_0^2} & \dfrac{\partial^2 l}{\partial \beta_0 \partial \beta_1} \\ \dfrac{\partial^2 l}{\partial \beta_1 \partial \beta_0} & \dfrac{\partial^2 l}{\partial \beta_1^2} \end{bmatrix} \tag{19}$$

The partial derivatives are as follows:

$$\frac{\partial^2 l}{\partial \beta_0^2} = -\sum_{i=1}^{n} p_i(1-p_i) \tag{20}$$
$$\frac{\partial^2 l}{\partial \beta_0 \partial \beta_1} = -\sum_{i=1}^{n} x_i\, p_i(1-p_i) \tag{21}$$
$$\frac{\partial^2 l}{\partial \beta_1^2} = -\sum_{i=1}^{n} x_i^2\, p_i(1-p_i) \tag{22}$$

Therefore:

$$H(\beta) = -\begin{bmatrix} \sum_i p_i(1-p_i) & \sum_i x_i\, p_i(1-p_i) \\ \sum_i x_i\, p_i(1-p_i) & \sum_i x_i^2\, p_i(1-p_i) \end{bmatrix} \tag{23}$$

Let’s also determine the Fisher Information matrix:

$$I(\beta) = -H(\beta) = \begin{bmatrix} \sum_i p_i q_i & \sum_i x_i\, p_i q_i \\ \sum_i x_i\, p_i q_i & \sum_i x_i^2\, p_i q_i \end{bmatrix} \tag{24}$$

where qi = 1-pi. (Note that the Hessian does not depend on yi here, so the observed and expected (Fisher) information coincide.)

This can be further expanded by splitting each sum into its group A and group B parts, remembering that n = nA + nB; that A: X=0 and B: X=1 (so xi = xi² is 0 in group A and 1 in group B); and that pA = P(Y=1|X=0) and pB = P(Y=1|X=1):

$$\sum_{i=1}^{n} p_i q_i = n_A p_A q_A + n_B p_B q_B, \qquad \sum_{i=1}^{n} x_i\, p_i q_i = \sum_{i=1}^{n} x_i^2\, p_i q_i = n_B p_B q_B \tag{25}$$

So the final, useful form is:

$$I(\beta) = \begin{bmatrix} n_A p_A q_A + n_B p_B q_B & n_B p_B q_B \\ n_B p_B q_B & n_B p_B q_B \end{bmatrix} \tag{26}$$

This matrix will be used to find the covariance matrix:

$$\Sigma = \mathrm{Cov}(\hat\beta) = I(\beta)^{-1} \tag{27}$$
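A quick check that inverting matrix (26) reproduces the covariance matrix reported by R (for the glm fit from the earlier snippets; the counts are the illustrative 6/20 vs 10/20):

pA <- 6/20; pB <- 10/20
qA <- 1 - pA; qB <- 1 - pB
nA <- 20; nB <- 20
I <- matrix(c(nA*pA*qA + nB*pB*qB, nB*pB*qB,
              nB*pB*qB,            nB*pB*qB), nrow = 2)   # matrix (26)
solve(I)   # matches vcov(m) of the fitted glm, up to numerical tolerance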

Another way to obtain I(β):

You can also obtain this matrix in a different way, which I described in my previous article https://www.garudax.id/pulse/2-sample-walds-z-statistic-proportions-w-unpooled-vs-ame-olszewski-obp4f/ Briefly:

$$I(\beta) = X^{T} W X \tag{28}$$

where X is the design matrix with 2 columns (a column of ones for β0 and an indicator column for β1, equal to 1 when X=1), with nA and nB rows corresponding to groups A and B, respectively.

$$X = \begin{bmatrix} \mathbf{1}_{n_A} & \mathbf{0}_{n_A} \\ \mathbf{1}_{n_B} & \mathbf{1}_{n_B} \end{bmatrix} \tag{29}$$

Now, W is the diagonal matrix of weights, of the block-diagonal form:

$$W = \mathrm{diag}\big(\underbrace{p_A q_A, \ldots, p_A q_A}_{n_A}, \; \underbrace{p_B q_B, \ldots, p_B q_B}_{n_B}\big) \tag{30}$$

which can be expressed more simply as:

$$W = \begin{bmatrix} p_A q_A\, I_{n_A} & 0 \\ 0 & p_B q_B\, I_{n_B} \end{bmatrix} \tag{31}$$

where I_nA and I_nB are identity matrices of the respective sizes.

The result of the matrix multiplication can be expressed as appropriate sums:

$$X^{T} W X = \begin{bmatrix} \sum_i 1\cdot p_i q_i & \sum_i x_i\, p_i q_i \\ \sum_i x_i\, p_i q_i & \sum_i x_i^2\, p_i q_i \end{bmatrix} \tag{32}$$

where the 1 in the top-left entry comes from multiplying the β0 column of ones by itself, and the xi terms come from the products involving the β1 (indicator) column. Notice that this is exactly matrix #24.
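This is easy to verify numerically (a sketch with the illustrative 6/20 vs 10/20 proportions):

nA <- 20; nB <- 20
pA <- 6/20; pB <- 10/20
X <- cbind(1, rep(c(0, 1), c(nA, nB)))                  # design matrix (29)
W <- diag(rep(c(pA*(1-pA), pB*(1-pB)), c(nA, nB)))      # weights (31)
t(X) %*% W %*% X                                        # reproduces matrices (24) and (26)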


The Rao Score and Information Matrix under H0: β1=0

The Rao Score

Recall the formula #18:

$$U(\beta) = \left[\; \sum_{i=1}^{n} (y_i - p_i), \;\; \sum_{i=1}^{n} x_i (y_i - p_i) \;\right]^{T} \tag{18}$$

Again, nA and nB denote the numbers of observations in groups A and B (for group A: X=0, for group B: X=1) and n = nA + nB. Also, under H0, pA = pB = p, where p is the pooled probability:

$$\hat{p} = \frac{x_A + x_B}{n_A + n_B} \tag{33}$$

Here, the yi are the elements of the response vector, equal to 1 for each success. Summing them within a group yields that group's total number of successes, so:

$$\sum_{i=1}^{n} y_i = x_A + x_B, \qquad \sum_{i=1}^{n} x_i y_i = x_B \tag{34}$$

So the first element of the vector cancels out, since Σ(yi − p̂) = (xA + xB) − n·p̂ = 0, and finally:

$$U = \big[\; 0, \;\; x_B - n_B \hat{p} \;\big]^{T} \tag{35}$$
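Numerically (with the 6/20 vs 10/20 counts): the pooled p is 0.4 and the only non-zero score component is:

xB <- 10; nB <- 20
p  <- (6 + 10) / (20 + 20)   # pooled p = 0.4
xB - nB * p                  # U2 = 2; the first component of (35) is exactly 0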

Information Matrix

By recalling matrix #26 and remembering that under H0, pA = pB = p, we obtain:

$$I = \hat{p}\hat{q} \begin{bmatrix} n & n_B \\ n_B & n_B \end{bmatrix} \tag{36}$$

Let’s also calculate I^(-1):

$$I^{-1} = \frac{1}{\hat{p}\hat{q}\,\big(n\,n_B - n_B^2\big)} \begin{bmatrix} n_B & -n_B \\ -n_B & n \end{bmatrix} \tag{37}$$

After simplifying the denominator term:

$$n\,n_B - n_B^2 = n_B(n - n_B) = n_A n_B \tag{38}$$

we finally obtain:

$$\Sigma = I^{-1} = \frac{1}{\hat{p}\hat{q}\, n_A n_B} \begin{bmatrix} n_B & -n_B \\ -n_B & n \end{bmatrix} \tag{39}$$
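Both the numerical inverse of matrix (36) and the closed form (39) agree, as a quick R sketch with the same illustrative counts shows:

nA <- 20; nB <- 20; n <- nA + nB
p  <- 0.4; q <- 1 - p
I0 <- p * q * matrix(c(n, nB, nB, nB), nrow = 2)            # matrix (36)
solve(I0)                                                   # numerical inverse
matrix(c(nB, -nB, -nB, n), nrow = 2) / (p * q * nA * nB)    # closed form (39)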

Rao score test under H0: β1=0

The Rao score test (also called the Lagrange multiplier test) under H0 is defined as the following quadratic form:

$$R = U^{T}\, I^{-1}\, U = U^{T}\, \Sigma\, U \;\overset{H_0}{\sim}\; \chi^2(df=1) \tag{40}$$

But since the first element of U is 0, the quadratic form reduces to a scalar operation involving only the second component of U and the corresponding diagonal element of Σ: R = U₂²Σ₂₂, with Σ₂₂ = n/(p̂q̂ nA nB). Noting that xB = nB·p̂B:

$$x_B - n_B\hat{p} = n_B\left(\hat{p}_B - \frac{n_A\hat{p}_A + n_B\hat{p}_B}{n}\right) = \frac{n_A n_B}{n}\,(\hat{p}_B - \hat{p}_A)$$

$$R = U_2^2\,\Sigma_{22} = \frac{n_A^2 n_B^2}{n^2}(\hat{p}_B - \hat{p}_A)^2 \cdot \frac{n}{\hat{p}\hat{q}\,n_A n_B} = \frac{(\hat{p}_A - \hat{p}_B)^2}{\hat{p}\hat{q}\left(\frac{1}{n_A}+\frac{1}{n_B}\right)} = z^2 \tag{41}$$

This way, I have shown the equivalence of the 2-sample z-statistic for comparing proportions with pooled variances and the Rao score test of the logistic regression with a single binary predictor distinguishing the compared samples.

Phew! I felt like I was back in college! Only this time I liked the challenge 😊
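Before the software comparison, here is formula (41) computed from scratch for the 6/20 vs 10/20 example used below:

xA <- 6; nA <- 20; xB <- 10; nB <- 20
n  <- nA + nB
p  <- (xA + xB) / n; q <- 1 - p
U2  <- xB - nB * p                        # score component, formula (35)
rao <- U2^2 * n / (p * q * nA * nB)       # formula (41): 1.666667
pchisq(rao, df = 1, lower.tail = FALSE)   # 0.1967056, matching the outputs below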

And, again, some comparisons in R:

> wald_z_test_pooled(x1 = 6, n1 = 20, x2 = 10, n2 = 20)
  diff         z     chi2        se   p.value p.value_1        LCI       HCI
1 -0.2 -1.290994 1.666667 0.1549193 0.1967056 0.0983528 -0.5036363 0.1036363

> prop.test(c(6,10), c(20,20), correct = FALSE)
	2-sample test for equality of proportions without continuity correction

data:  c(6, 10) out of c(20, 20)
X-squared = 1.6667, df = 1, p-value = 0.1967
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.49724326  0.09724326
sample estimates:
prop 1 prop 2 
   0.3    0.5 

> data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 20-6),
+                                        rep("Success", 10), rep("Failure", 20-10))),
+                    grp    = factor(rep(c("B", "A"), each=20)))

> m <- glm(response ~ grp, data = data, family = binomial(link = "logit"))
> anova(m, test = "Rao")
Analysis of Deviance Table

Model: binomial, link: logit
Response: response
Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev    Rao Pr(>Chi)
NULL                    39     53.841                
grp   1   1.6805        38     52.160 1.6667   0.1967        



The implementation of the z statistic:

wald_z_test_pooled <- function(x1, n1, x2, n2, conf.level = 0.95) {
  # Sample proportions
  p1 <- x1 / n1
  p2 <- x2 / n2
  
  # Pooled proportion: equals (x1 + x2) / (n1 + n2)
  p_pool <- (p1 * n1 + p2 * n2) / (n1 + n2)
  
  # Standard error of each proportion under the pooled variance
  se_p1 <- sqrt(p_pool * (1 - p_pool) / n1)
  se_p2 <- sqrt(p_pool * (1 - p_pool) / n2)
  
  # Pooled standard error of the difference
  se_diff <- sqrt(se_p1^2 + se_p2^2)
  
  # z statistic and two-sided p-value; z^2 ~ chi^2(df = 1)
  z <- (p1 - p2) / se_diff
  p <- 2 * (1 - pnorm(abs(z)))
  
  # Half-width of the confidence interval (note: based on the pooled SE,
  # hence the CI differs slightly from prop.test(), which uses unpooled SEs)
  hCI <- abs(qnorm((1 - conf.level) / 2)) * se_diff
  
  data.frame(diff = p1 - p2,
             z = z,
             chi2 = z^2,
             se = se_diff,
             p.value = p,
             p.value_1 = p / 2,
             LCI = (p1 - p2) - hCI,
             HCI = (p1 - p2) + hCI,
             row.names = NULL)
}

If these formulas render poorly on your screen, you can find the whole derivation on my GitHub: https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/logistic_regression_Rao_Wald_z_test_proportions.md

In the next articles I will try to prove other relationships, so stay tuned!
