Ratio is Not Rational

We often use ratios as KPIs when we train models for classification or other tasks. However, if you think about it, a ratio is a poor way of capturing the relationship between two variables.

* Note - I am referring to variables with positive values only.

Let's say you want to compare the sales of products. You decide to take the sales in 2019 and divide them by the sales in 2018. This ratio is sometimes referred to as the lift in sales or as the sales growth. One problem with this KPI is that it is not bounded: the smaller the sales in 2018, the higher the ratio, and when the 2018 sales are 0 the ratio is infinite. Another problem is the symmetry (or lack of it) of the KPI. The smaller the sales in 2019, the closer the KPI gets to 0, which is very different from the behavior of the KPI when the sales in 2018 drop to 0.
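A minimal Python sketch of both problems, using made-up sales figures. The helper name `sales_ratio` and the choice to return infinity for a zero denominator are assumptions for illustration; plain Python division would raise `ZeroDivisionError` instead.

```python
import math


def sales_ratio(current: float, previous: float) -> float:
    """Plain ratio current / previous.

    Assumption: a zero denominator is reported as infinity (or NaN for 0/0)
    instead of raising ZeroDivisionError.
    """
    if previous == 0:
        return math.inf if current > 0 else math.nan
    return current / previous


# Shrinking the 2018 sales (denominator) makes the ratio explode ...
growing = [sales_ratio(100, prev) for prev in (100, 10, 1, 0)]
# ... while shrinking the 2019 sales (numerator) only pushes it toward 0.
shrinking = [sales_ratio(cur, 100) for cur in (100, 10, 1, 0)]

print(growing)    # [1.0, 10.0, 100.0, inf]
print(shrinking)  # [1.0, 0.1, 0.01, 0.0]
```

A tenfold drop in either year moves the KPI by very different amounts, which is the lack of symmetry described above.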

Meet the natural growth ratio, defined as (x-y)/(x+y), where x is the newer value (the 2019 sales in the example) and y is the older one (the 2018 sales). This ratio has some very nice characteristics:

  1. The ratio is (almost) bounded. Only one case needs to be addressed to make it fully bounded: when x+y=0. In this case, we set the KPI to 0, which keeps the KPI between -1 and 1.
  2. The ratio is symmetric around 0. If you plot the function Z=(X-Y)/(X+Y) and swap the values of X and Y, you get the mirror image on the negative side of the Z axis: the same magnitude with the opposite sign.
  3. The ratio produces the same ranking as X/Y. This means that if X1/Y1 > X2/Y2, then also (X1-Y1)/(X1+Y1) > (X2-Y2)/(X2+Y2).
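The three characteristics can be checked with a short Python sketch. The function name `natural_growth` and the sample values are assumptions for illustration, and the inputs are assumed to be non-negative, as in the note above.

```python
def natural_growth(x: float, y: float) -> float:
    """Return (x - y) / (x + y), or 0 when x + y == 0.

    Assumes x and y are non-negative.
    """
    if x < 0 or y < 0:
        raise ValueError("x and y are assumed to be non-negative")
    total = x + y
    if total == 0:
        return 0.0  # the single special case that keeps the KPI bounded
    return (x - y) / total


# 1. Bounded between -1 and 1, including the zero cases.
assert natural_growth(100, 0) == 1.0
assert natural_growth(0, 100) == -1.0
assert natural_growth(0, 0) == 0.0

# 2. Symmetric around 0: swapping x and y flips the sign.
assert natural_growth(120, 80) == -natural_growth(80, 120)

# 3. Same ranking as x / y (all y values here are positive).
pairs = [(50, 100), (100, 100), (300, 100), (120, 80)]
by_ratio = sorted(pairs, key=lambda p: p[0] / p[1])
by_growth = sorted(pairs, key=lambda p: natural_growth(*p))
assert by_ratio == by_growth

print(natural_growth(120, 80))  # 0.2
```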

Plotting (x-y)/(x+y) against x/y shows the scaling: the natural growth KPI asymptotically approaches 1 as x/y goes to infinity.
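A short derivation makes the relationship explicit. The symbols r (for the plain ratio x/y) and g (for the natural growth KPI) are introduced here only for illustration, and y is assumed to be positive.

```latex
\[
  r = \frac{x}{y}, \qquad
  g(r) = \frac{x - y}{x + y} = \frac{r - 1}{r + 1} = 1 - \frac{2}{r + 1}
\]
\[
  g(0) = -1, \qquad g(1) = 0, \qquad \lim_{r \to \infty} g(r) = 1
\]
\[
  g'(r) = \frac{2}{(r + 1)^2} > 0, \qquad
  g\!\left(\tfrac{1}{r}\right) = \frac{1 - r}{1 + r} = -g(r)
\]
```

Because g is strictly increasing in r, it ranks pairs in the same order as x/y, and because swapping x and y turns r into 1/r, the swap only flips the sign of the KPI.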

These characteristics make the natural growth KPI much better than the simple ratio most of us are used to calculating.

So, next time you are planning to compare variables, be rational and don't use a simple ratio.


By Roee Anuar