Misleading Graphs!
Poor graphical design leads to miscommunication. High-end programming tools let us handle data faster, but they don't tell us how to communicate the graphs or summaries we generate to decision makers.
I recently came across this graph. Its purpose is to show the change in prescription refills and revenue by consumer group.
At first glance, this looks meaningful and more importantly attractive. Let's evaluate the graph to make sure that it conveys the correct message and does not mislead the viewers.
Per the "refill per customer by Group" graph above, customers in Group B refill their prescriptions several times more often than customers in Group A. For instance, the orange bar for 2015 conveys the message that refills by Group B are 5 times those of Group A. Similarly, for 2019, the orange bar appears to be twice as tall as the blue bar, suggesting that refills by Group B are far higher. Is that true?
A few items to notice in this graph are -
- Chart type - Simplicity is the ultimate sophistication. The author has unnecessarily used a 3-D column chart simply to make it attractive, which does not add any value.
- Baseline in y-axis - The y-axis has a minimum bound of 4, which makes the orange bar appear larger relative to the blue bar than it actually is. Let's go back to the example for calendar year 2019: the refill count for Group A is slightly over 5, let's say 5.30, and the orange bar, i.e. the refill count for Group B, is close to 7, let's assume 6.90. This means that refills by Group B customers are about 30% higher than refills by Group A customers, not twice as high. By using a non-zero baseline, the author misleads viewers into believing that Group B is far more profitable than Group A. Excel, R, and Python make it very easy to change the scale, i.e. the minimum and maximum bounds, but this technique should be handled carefully.
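To make the baseline arithmetic concrete, here is a minimal Python sketch. The function names `percent_higher` and `apparent_ratio` are made up for this illustration, and the values 5.30 and 6.90 are the approximate readings assumed in the text, not exact data from the chart.

```python
def percent_higher(a: float, b: float) -> float:
    """Return how much higher b is than a, as a percentage of a."""
    return (b - a) / a * 100


def apparent_ratio(a: float, b: float, baseline: float) -> float:
    """Ratio of the visible bar heights when the y-axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)


# Approximate 2019 values assumed in the text (not exact chart data).
group_a, group_b = 5.30, 6.90

print(f"Actual difference: {percent_higher(group_a, group_b):.0f}% higher")
print(f"Bar-height ratio, y-axis from 4: {apparent_ratio(group_a, group_b, 4):.2f}x")
print(f"Bar-height ratio, y-axis from 0: {apparent_ratio(group_a, group_b, 0):.2f}x")
```

With a baseline of 4, the Group B bar is drawn a little over twice as tall as the Group A bar, even though the underlying value is only about 30% higher. With a baseline of 0, the bar heights match the real ratio.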
Like the refills graph above, this graph has a few flaws as well. Notice, however, that although both charts have a non-zero baseline, the line graph still conveys the message correctly.
The relative differences in the bar graph appear larger because of the non-zero minimum bound. A bar is read by its length, whereas a line graph is read by the position and slope of its points. Say I have two values, 5 and 6, to plot. If I start the y-axis at 4, the bars are drawn with heights 1 and 2, so 6 appears to be twice as large as 5 when it is actually only 20% larger. Hence, non-zero minimum bounds should never be used in a column graph.
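The 5-and-6 example can be drawn side by side to see the effect. This is only a sketch and assumes matplotlib is installed. The names `bar_heights` and `plot_baseline_comparison`, the labels "A" and "B", and the figure size are arbitrary choices for this illustration.

```python
def bar_heights(values, baseline):
    """Visible height of each bar when the y-axis starts at `baseline`."""
    return [v - baseline for v in values]


def plot_baseline_comparison(values=(5, 6), labels=("A", "B"), truncated_min=4):
    """Draw the same bars twice: truncated y-axis on the left, zero-based on the right."""
    # Imported here so the helper above works without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, (ax_trunc, ax_zero) = plt.subplots(1, 2, figsize=(8, 3))
    top = max(values) + 1  # same upper bound on both panels
    panels = (
        (ax_trunc, truncated_min, f"y-axis starts at {truncated_min}"),
        (ax_zero, 0, "y-axis starts at 0"),
    )
    for ax, y_min, title in panels:
        ax.bar(list(labels), list(values))
        ax.set_ylim(y_min, top)  # only the lower bound differs between panels
        ax.set_title(title)
    fig.tight_layout()
    return fig
```

Calling `plot_baseline_comparison().savefig("baseline.png")` (the file name is just an example) should produce two panels with identical data. In the left panel the visible bar heights are `bar_heights([5, 6], 4)`, i.e. 1 and 2, so the second bar looks twice as tall. In the right panel the heights are 5 and 6.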
Also, the revenue graph is displayed separately. Because the two graphs are related, a decision maker may find it confusing to see them presented separately.
A better representation would put these charts together, i.e. display both refill counts and revenue in the same graph, and fix the other issues by avoiding the 3-D display and the non-zero baseline.
Now, this graph conveys the message that although Group B filled a higher number of prescriptions in 2013 and 2019, their revenue per customer is not higher than Group A's. It is also clear from the above graph that average refills by Group B customers are not 3 or 4 times higher than Group A refills. Calendar year 2018 is profitable for Group B, as their average refill is close to $300, while for Group A it is $200.
Simplicity is the ultimate sophistication - Leonardo Da Vinci.
If you have any questions or concerns, please reach out to me at actuarialtools@gmail.com