Machine Learning Is Not An Open Book!!
In my 3+ years of experience in analytics, I have worked on various use cases in pharma and financial services that used machine learning methods for problem solving. After all this time, people often wonder: we have data readily available from clients and ready-to-use ML packages, so what's the ML hype all about? Isn't it easy to deliver solutions?
So, my friends, the deal here is not just to apply a package and get correct results on the given data. It's about our model being able to generalize equally well to everything that comes in the future. Sounds easy but is really hard!! Companies are making huge investments to analyse and predict sales, customer behavior, fraud and so on well in advance, not for us to just call sklearn functions.
Ensuring that the data we feed into the model makes proper business sense takes a major chunk of the model-building time. And with a boatload of ML methods to choose from, ensuring that your selected method aligns with the business ask carries equal weight.
The trickiest step is finding the set of parameters that makes your model perform best, i.e. parameter tuning. For this, knowing just the name of the package won't cut it. The only rescue is to study the inner workings of your algorithm (unless your stars are aligned and hit & trial hands you the best model)!!
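To make the hit & trial idea concrete, here is a minimal sketch of a grid search over two hypothetical hyperparameters. The `score` function is a stand-in: in a real project it would train the model with those parameters and return a validation metric (names and values here are purely illustrative).

```python
from itertools import product

def score(max_depth, min_samples):
    # Stand-in objective for illustration only; in practice this
    # would fit the actual model and return, say, validation F1.
    return -abs(max_depth - 6) - abs(min_samples - 20) / 10

# Candidate values to try for each hyperparameter
param_grid = {"max_depth": [2, 4, 6, 8], "min_samples": [10, 20, 50]}

best_params, best_score = None, float("-inf")
for depth, samples in product(param_grid["max_depth"],
                              param_grid["min_samples"]):
    s = score(depth, samples)
    if s > best_score:
        best_params, best_score = (depth, samples), s

print(best_params)  # (6, 20) for this toy objective
```

Libraries such as scikit-learn automate exactly this loop (e.g. `GridSearchCV`), but understanding what the loop is doing, and what metric it optimizes, is the part that actually matters.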
With an overconfident mindset, I once tried to build a model to predict fraudulent customers, and the client wanted good model precision (in layman's terms, precision means that most of the predictions in the model's predicted pool are correct). I went to my manager with a properly implemented model boasting 96.8% precision!!!
With a smile on his face, he bluntly rejected the model. I was so sure the precision was good, the packages were right, the data was correct! So, where did I go wrong?
It turned out that my focus was so much on methods, precision and code building that I forgot to look at the other model performance metrics. People in analytics will know that there is a trade-off between precision and recall. With my selected model parameters, 96.8% precision came with a recall of just 2-3%.
In layman's terms, recall denotes coverage, i.e. how many of the actual positives our model was able to capture. For example, out of 1,000 cancer patients, 3% recall means my model identified only 30 of them. The model made 31 predictions overall, of which 30 were true positives and 1 was a false positive. Hence the precision = 96.8% (30/31). Such blunders are highly unaffordable, as we are losing out on so many positives despite having high precision. In the business world, this could have a terrible impact!
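The arithmetic behind that story can be checked in a few lines. This sketch just plugs in the toy numbers from the example above (1,000 actual positives, 31 flagged, 30 of them correctly):

```python
# Toy confusion-matrix counts from the example above
true_positives = 30                       # flagged and actually positive
false_positives = 1                       # flagged but not positive
false_negatives = 1000 - true_positives   # positives the model missed

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print(f"Precision: {precision:.1%}")  # 96.8%
print(f"Recall:    {recall:.1%}")     # 3.0%
```

Both numbers come from the same confusion matrix, which is why judging a model on precision alone can be so misleading.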
So, ML is not only about data and packages. It's more about knowing the inner workings of the algorithms and being able to see the broader picture: ensuring that the picture you're painting uses the right colors and brushes, and that the finished canvas isn't a photo of a dog instead of the Mona Lisa.
To conclude, someone rightly said –
What we know is a DROP. What we don’t know is an OCEAN!!