Evaluation Techniques for Machine Learning Models
Today, Machine Learning lets us gain insight from data through predictive modelling. At its core, it involves fitting mathematical models to the data at hand in order to generate business insights. Once a model has been fitted to prepared data, it can be used to make predictions on newly observed data.
In Machine Learning, models can only be as useful as the quality of their predictions; hence our fundamental goal is never simply to create models, but to create high-quality models with strong predictive power.
Alright, I’m done being a traditional lecturer. Let’s examine strategies for evaluating the quality of models that are generated by ML algorithms.
1. Binary Classifier Evaluation Metrics
Should I explain binary classifiers? No, not in this article, please.
When it comes to evaluating a binary classifier, I'll assume you've already built one; if not, Kaggle has great resources to get you started.
Accuracy is a popular performance evaluation metric. It can be used to distinguish a strong classification model from a weak one.
a. Accuracy is, simply put, the proportion of all observations that have been correctly predicted. There are four (4) main components that make up the mathematical formula for calculating accuracy:
· TP – True Positive (the total number of labels that belong to the positive class and have been predicted correctly)
· TN – True Negative (the total number of labels that belong to the negative class and have been predicted correctly)
· FP – False Positive (the total number of labels that have been predicted to belong to the positive class but actually belong to the negative class; also known as a Type I Error)
· FN – False Negative (the total number of labels that have been predicted to belong to the negative class but actually belong to the positive class; also known as a Type II Error)
These four components grant us the ability to explore other ML model evaluation metrics. The formula for calculating accuracy is:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
The main appeal of the accuracy metric is its ease of use and interpretation.
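To make this concrete, here is a minimal sketch of computing accuracy both by hand from the definition and with scikit-learn's accuracy_score; the labels below are made up purely for illustration:

```python
from sklearn.metrics import accuracy_score

# Made-up true and predicted labels, purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# By hand: correctly predicted observations (TP + TN) over all observations.
correct = sum(t == p for t, p in zip(y_true, y_pred))
print(correct / len(y_true))           # 0.75

# The same number from scikit-learn.
print(accuracy_score(y_true, y_pred))  # 0.75
```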
My disclaimer, or if I were a lawyer, CAVEAT!
Accuracy, as it is, is an evaluation metric that does not perform well when the classes are imbalanced. It suffers from what is known as the accuracy paradox: the accuracy value may be high while the model is seriously lacking in predictive power, and most, if not all, of its predictions for the minority class will be incorrect.
For this reason, we are highly compelled to turn to other evaluation metrics in the scikit-learn arsenal (obviously not England's Arsenal FC; no offence, none taken, lol) to understand this better.
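To see the paradox in action before we reach the case study, here is a small sketch on a hypothetical, heavily imbalanced dataset; the numbers are invented purely to show the effect:

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical, heavily imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that blindly predicts the majority class every time.
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -- yet not a single positive case is caught
```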
Our Case Study
Most importantly, before we go on: you learn ML best with hands-on practice. Let's use the Heart Disease Dataset available on the UCI repository. You can download the clean dataset from here and my notebook from here.
Let's look at the confusion matrix values from the model:
A confusion matrix is an N x N matrix, where N is the number of labels being predicted. For this demonstration, let's have N = 2, and hence we get a 2 x 2 matrix.
From our train and test data, we already know that our test data consisted of 91 data points; that is the value at the intersection of the 3rd column and 3rd row, at the end of the matrix. We have also noticed that the matrix contains both actual and predicted values. The actual values are the number of data points that were originally categorized as 0 or 1. The predicted values are the number of data points our KNN model predicted as 0 or 1.
The actual values are: 42 data points in the positive class (heart disease present) and 49 in the negative class.
While the predicted values are: 49 data points predicted as positive and 42 predicted as negative.
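For reference, here is a hedged sketch of how such a matrix is typically produced with scikit-learn; the synthetic dataset below is only a stand-in for the heart disease data, not my notebook's actual pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

# Synthetic stand-in data; in our case study this would be the UCI
# Heart Disease features and labels.
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier().fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
# For binary labels, scikit-learn lays the matrix out as:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)
```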
All the values we obtained above have a name: they are exactly the TP, TN, FP, and FN counts we defined earlier. Let's now put them to work, one metric at a time.
Back to base!
b. Precision – Precision is the proportion of all observations predicted to belong to the positive class that are actually positive. In clearer terms, it is the ratio between the True Positives and all predicted positives. For our problem statement, that would be the measure of patients that we correctly identify as having heart disease out of all the patients the model predicts as having it. Mathematically:
Precision = TP / (TP + FP)
What is the precision for our model? Yes, it is 0.816 (40 / (40 + 9)); in other words, when the model predicts that a patient has heart disease, it is correct around 82% of the time.
c. Recall – Recall is the measure of a model correctly identifying True Positives. Thus, for all the patients who actually have heart disease, recall tells us how many we correctly identified as having heart disease. For our model, Recall = 0.95 (40 / (40 + 2)).
Recall provides insight into how well our model is able to identify the relevant data. It is also referred to as Sensitivity or the True Positive Rate (TPR). What if a patient has heart disease, but no treatment is given because our model predicted they were negative? That is a disastrous situation!
Mathematically:
Recall = TP / (TP + FN)
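We can sanity-check both numbers with plain arithmetic on the confusion matrix counts we already have (scikit-learn's precision_score and recall_score would give the same results on the underlying label arrays):

```python
# Counts taken from our confusion matrix: TP = 40, FP = 9, FN = 2.
tp, fp, fn = 40, 9, 2

precision = tp / (tp + fp)  # 40 / 49
recall = tp / (tp + fn)     # 40 / 42

print(round(precision, 3), round(recall, 3))  # 0.816 0.952
```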
d. F1 Score – In ML, there is always a trade-off between precision and recall. For example, in our case study, we can argue that achieving a high recall is more important than achieving a high precision: we would like to detect as many heart patients as possible.
For some other models, like loan default classification (deciding whether a bank customer is likely to default), high precision is more desirable, since the bank wouldn't want to lose customers who were denied a loan based on the model's incorrect prediction that they would default.
The F1-Score is the harmonic mean of the precision and recall values for a classification problem. Because it is a harmonic mean, it is high only when both precision and recall are high, making it a single measure of how well the model balances the two. The formula for the F1 Score is:
F1 = 2 * (Precision * Recall) / (Precision + Recall)
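Putting it together for our model, here is the F1 computation from the precision and recall values derived above:

```python
# Precision and recall from our model, as computed above.
precision = 40 / 49  # ~0.816
recall = 40 / 42     # ~0.952

# Harmonic mean of the two.
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 3))  # 0.879
```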
2. Regression Analysis Evaluation Metrics
In regression analysis, one of the most widely used and well-known evaluation metrics is the MSE, which stands for Mean Squared Error.
Mean Squared Error is the average of the squared differences between the predicted and true values:
MSE = (1/n) * Σ (y_i - ŷ_i)^2
A good MSE should stay as close to 0 as possible.
In clearer terms: the higher the MSE value, the worse the quality of the model's predictions, because a high value indicates a large average squared error between predictions and true values.
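Here is a minimal sketch with made-up regression values, computing MSE by hand and confirming it with scikit-learn's mean_squared_error:

```python
from sklearn.metrics import mean_squared_error

# Made-up true and predicted values, purely for illustration.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# By hand: the mean of the squared differences.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(mse)                                 # 0.875

# scikit-learn agrees.
print(mean_squared_error(y_true, y_pred))  # 0.875
```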