The Geometric Mean

The Beginning

I was recently tasked with creating an output for a clinical trial report that involved a geometric mean calculation. Having a mathematics and statistics background, I thought for sure that I had learned this at some point and simply didn't recall it. Since I had hung onto my college textbooks all these years, the search began there. Scanning the indexes of these books, I encountered Geometric Series, Geometric Distribution, Geometric Random Variable, and Geometric Probability Function, but no Geometric Mean. Finally, in the last book, a supplemental text from a mathematical statistics course called Stat Labs: Mathematical Statistics Through Applications by Deborah Nolan and Terry Speed, there it was: a nice crisp definition that couldn't be questioned the way a Google search or Wikipedia entry could. It gave the definition in product notation.
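In standard notation (the usual textbook form, given here as an assumption rather than a verbatim quote from Stat Labs, with G as an assumed symbol), the geometric mean of n values x_1, ..., x_n is:

$$
G = \left( \prod_{i=1}^{n} x_i \right)^{1/n}
$$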

It turns out that the capital pi symbol is not something from the language of an alien species; it is telling us to multiply all the numbers together. So, the geometric mean of 2, 4, 6 is the cube root of 2*4*6, or ~3.63. Also, with a little elbow grease and some dusted-off high school algebra, we can use logarithm and exponent properties to show that the geometric mean equals the exponential of the arithmetic mean of the logged values.
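The algebra can be sketched as follows, writing G (an assumed symbol) for the geometric mean of positive values x_1, ..., x_n and log for the natural log:

$$
G = \left( \prod_{i=1}^{n} x_i \right)^{1/n}
  = \exp\!\left( \log \left( \prod_{i=1}^{n} x_i \right)^{1/n} \right)
  = \exp\!\left( \frac{1}{n} \log \prod_{i=1}^{n} x_i \right)
  = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \log x_i \right)
$$

For the values 2, 4, 6, the logs sum to log 48 ≈ 3.8712, their mean is ≈ 1.2904, and exp(1.2904) ≈ 3.634, which agrees with the cube root of 48.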

Note that this now requires the extra condition that all of our original values satisfy x > 0, since log 0 is undefined. Also note that log here is the natural log (base e).

Now, let's go on to talk about how to do this as a SAS programmer.

The Middle

DATA STEP

Unbeknownst to me, SAS actually has a GEOMEAN() function. But, in the spirit of overkill, here are four ways to calculate the geometric mean in a data step, stored in variables X1 through X4.

The X1 variable uses the original product definition. Notice that GEOMEAN(), used for the X2 variable, has the advantage of only considering non-missing values. Another interesting function is CONSTANT(), used by X3. Other constants this function can handle are the glorious pi value (3.14159) and the Euler-Mascheroni constant (0.577), but I digress. Variables X3 and X4 use the equivalent alternative definition based on the log-transformed values.
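A minimal sketch of the four approaches described above might look like this. The data set name geo and the input variables a, b, and c are hypothetical, chosen to reuse the 2, 4, 6 example; only the X1 to X4 names and the functions come from the description.

```sas
/* Hypothetical one-row data set holding the values 2, 4, 6 */
data geo;
  a = 2; b = 4; c = 6;

  /* X1: product definition, the nth root of the product */
  x1 = (a * b * c) ** (1/3);

  /* X2: built-in function, which ignores missing arguments */
  x2 = geomean(a, b, c);

  /* X3: e raised to the arithmetic mean of the natural logs */
  x3 = constant('e') ** mean(log(a), log(b), log(c));

  /* X4: same idea, using EXP() instead of CONSTANT('e') */
  x4 = exp(mean(log(a), log(b), log(c)));

  /* Write the four results to the log for comparison */
  put x1= x2= x3= x4=;
run;
```

All four variables should come out at approximately 3.634. Unlike X2, the X1 expression returns a missing value if any of a, b, or c is missing.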

These methods of calculating the geometric mean are great and all, but data observations usually come in rows, not variables (or columns). Short of awkwardly transposing the data simply to apply the GEOMEAN() function, we are left to seek other solutions. I thought perhaps the SQL procedure would let us use GEOMEAN as an aggregate function, but that was met with high resistance.

LET'S GET MEAN

Since we have an alternative definition of the geometric mean, the best approach, in my opinion, is to:

  1. Apply the log transformation in a data step,
  2. Calculate the arithmetic mean of the logged values using the MEANS procedure, and
  3. Exponentiate that mean using the EXP() function in a subsequent data step.
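The three steps above can be sketched as follows. The data set pk, the variables trt and conc, and the sample values are all hypothetical, and the grouping by treatment is an assumption added for illustration.

```sas
/* Hypothetical input: one concentration per row, grouped by treatment */
data pk;
  input trt $ conc;
  datalines;
A 2
A 4
A 6
B 1
B 10
B 100
;
run;

/* Step 1: log-transform; values that are not > 0 are left missing */
data pk_log;
  set pk;
  if conc > 0 then log_conc = log(conc);
run;

/* Step 2: arithmetic mean of the logs within each treatment */
proc means data=pk_log noprint nway;
  class trt;
  var log_conc;
  output out=log_means mean=mean_log n=n;
run;

/* Step 3: back-transform the mean of the logs */
data geo_means;
  set log_means;
  geomean = exp(mean_log);
  keep trt n geomean;
run;

proc print data=geo_means noobs;
run;
```

With these sample values, treatment A should give approximately 3.634 and treatment B approximately 10. The same identity can also be written in a single PROC SQL step as `select trt, exp(mean(log(conc))) as geomean from pk where conc > 0 group by trt;`, offered here as an alternative rather than the route the article takes.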

And there we have it. All that's left is some extra data manipulation to incorporate this into the main report. However, the fun stuff is over, so let's wrap this up.

The End

The geometric mean is typically calculated when dealing with pharmacokinetic (PK) concentrations of a drug. However, there is usually a Lower Limit of Quantitation (LLOQ) below which the drug concentration cannot be reliably quantified. As it relates to the geometric mean calculation, it is good that we can't have negative numbers in this scenario, since they would violate the definition of a geometric mean. However, I am curious about the handling of these Below Quantitation Limit (BQL) results. What is the best way to handle those cases? If we handle them as being 0, this automatically makes the geometric mean 0 based on the original product definition. Is it better to treat them as missing, or to impute values as LLOQ divided by 2? Do these methods introduce any bias? If you are a statistician in the pharmaceutical industry, I'd love to hear from you in the comments below.
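To make the three options concrete, here is a small sketch of how each one changes the result. The LLOQ of 1 and the values 2, 4, 6 plus one BQL result are hypothetical, and the sketch only illustrates the arithmetic; it does not settle which handling is appropriate.

```sas
/* Hypothetical example: LLOQ = 1, three quantifiable values and one BQL result */
data bql;
  lloq = 1;

  /* BQL set to 0: any zero argument forces the geometric mean to 0 */
  gm_zero = geomean(2, 4, 6, 0);

  /* BQL set to missing: GEOMEAN() ignores it and uses the other three values */
  gm_missing = geomean(2, 4, 6, .);

  /* BQL imputed as LLOQ/2: the small value pulls the result down */
  gm_half = geomean(2, 4, 6, lloq / 2);

  put gm_zero= gm_missing= gm_half=;
run;
```

With these inputs, the zero rule gives 0, the missing rule gives approximately 3.634, and the LLOQ/2 rule gives approximately 2.213, so the choice of rule visibly moves the summary.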


By Christopher Smith