Evaluation Techniques for Machine Learning Models


Today, Machine Learning lets us gain insight from data through predictive modelling. It involves fitting mathematical models to data in order to generate business insights, and once a model has been fitted to prepared data, it can be used to make predictions on newly observed data.

In Machine Learning, a model is only as useful as the quality of its predictions; hence our fundamental goal is never just to create models, but to create high-quality models with strong predictive power.

Alright, I’m done being a traditional lecturer. Let’s examine strategies for evaluating the quality of models that are generated by ML algorithms.

1.     Binary Classifier Evaluation Metrics

Should I explain binary classifiers? No, not in this article, please.

When it comes to evaluating a binary classifier, I'll assume you've already built one; if not, Kaggle has great resources to get you started.

Accuracy is a popular performance evaluation metric. It can be used to tell a strong classification model apart from a weak one.


a.      Accuracy is, simply put, the proportion of all observations that have been correctly predicted. Four (4) main components make up the mathematical formula for calculating Accuracy:

·        TP – True Positive: the total number of observations that belong to the positive class and were predicted correctly.

·        TN – True Negative: the total number of observations that belong to the negative class and were predicted correctly.

·        FP – False Positive: the total number of observations predicted to belong to the positive class that actually belong to the negative class. Also known as a Type I Error.

·        FN – False Negative: the total number of observations predicted to belong to the negative class that actually belong to the positive class. Also known as a Type II Error.

These four components also underpin the other evaluation metrics we will explore. The formula for calculating accuracy is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The main appeal of the Accuracy metric is its ease of use and interpretation.
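As a quick illustration, accuracy can be computed directly with scikit-learn's accuracy_score. The labels below are made up for demonstration, not taken from the case study later in the article:

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

# TP = 3, TN = 3, FP = 1, FN = 1
# Accuracy = (TP + TN) / (TP + TN + FP + FN) = 6 / 8 = 0.75
print(accuracy_score(y_true, y_pred))  # 0.75
```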

My disclaimer, or if I were a lawyer: CAVEAT!

Accuracy, as it is, does not perform well when the classes are imbalanced. It suffers from a paradox: the accuracy value may be high even though the model has little real predictive power, with the minority class being predicted incorrectly most, if not all, of the time.

For this reason, we are compelled to turn to other evaluation metrics in the scikit-learn arsenal (obviously not England's Arsenal FC; no offence, none taken, lol) to understand our models better.
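The accuracy paradox is easy to demonstrate with a small, made-up example: a "model" that blindly predicts the majority class scores high accuracy on imbalanced data while finding none of the positive cases.

```python
from sklearn.metrics import accuracy_score, recall_score

# Imbalanced toy labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks impressive...
print(recall_score(y_true, y_pred))    # 0.0  -- ...but it finds no positives
```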

Our Case Study

Most importantly, before we go on: you learn ML best through hands-on practice. Let's use the Heart Disease Dataset available on the UCI repository. You can download the clean dataset from here and my notebook from here.
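As a rough sketch of the modelling step, the workflow looks like the following. Note that this uses a synthetic stand-in for the features (the real notebook would load the UCI CSV with pandas, and its exact column names are not shown here); 303 samples with 13 features mirrors the shape of the UCI Heart Disease data, and a 30% test split yields the 91 test points discussed below:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

# Synthetic stand-in for the heart-disease features
X, y = make_classification(n_samples=303, n_features=13, random_state=42)

# 30% held out for testing -> 91 test data points
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit a KNN classifier and inspect its confusion matrix
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(confusion_matrix(y_test, knn.predict(X_test)))
```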

Let's look at the confusion matrix values from the model:

The confusion matrix (values reconstructed from the counts discussed below):

              Predicted: 0   Predicted: 1   Total
  Actual: 0        40              9          49
  Actual: 1         2             40          42
  Total            42             49          91

A confusion matrix is an N × N matrix, where N is the number of labels being predicted. For this demonstration, N = 2, and hence we get a 2 × 2 matrix.

From our train/test split, we already know that the test data consisted of 91 data points; that is the grand total in the bottom-right cell of the matrix. The matrix compares actual and predicted values: the actual values are the number of data points originally labeled 0 or 1, while the predicted values are the number of data points our KNN model predicted as 0 or 1.

The actual values are:

  • The patients who actually don't have heart disease = 49
  • The patients who actually do have heart disease = 42

While the predicted values are:

  • Number of patients predicted as not having heart disease = 42
  • Number of patients predicted as having heart disease = 49

Each of the values obtained above has a name. Let's go over them one by one:

  • 40 cases where the patients actually did not have heart disease and our model also predicted that they did not. These are called True Negatives.
  • 40 cases where the patients actually had heart disease and our model also predicted that they did. These are called True Positives.
  • 9 cases where the patient actually had no heart disease, but our model predicted that they did. These are False Positives.
  • 2 cases where the patient actually had heart disease, but our model predicted that they did not. These are False Negatives.
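These four counts can be reproduced with scikit-learn's confusion_matrix. The label arrays below are reconstructed to match the case-study counts (they are not the actual dataset labels):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Reconstructed labels matching the case study:
# TN = 40, FP = 9, FN = 2, TP = 40
y_true = np.array([0] * 49 + [1] * 42)                       # 49 healthy, 42 diseased
y_pred = np.array([0] * 40 + [1] * 9 + [0] * 2 + [1] * 40)   # model predictions

# ravel() on a 2x2 matrix returns tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 40 9 2 40
```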

Back to base!

b.     Precision – Precision is the proportion of all observations predicted to belong to the positive class that are actually positive. In clearer terms, it is the ratio between the True Positives and everything predicted positive. For our problem statement, that is the share of patients we identify as having heart disease who actually have it. Mathematically:

Precision = TP / (TP + FP)

What is the Precision for our model? Yes, it is 0.816 (40 / (40 + 9)); in other words, when it predicts that a patient has heart disease, it is correct around 82% of the time.

c.      Recall – Recall measures how well a model identifies True Positives. Thus, for all the patients who actually have heart disease, recall tells us how many we correctly identified as having it. For our model, Recall = 0.95 (40 / (40 + 2)).

Recall provides insight into how well our model identifies the relevant cases. It is also referred to as Sensitivity or the True Positive Rate (TPR). What if a patient has heart disease, but no treatment is given because our model predicted they were negative? That would be a serious mess!

Mathematically:

Recall = TP / (TP + FN)

d.     F1 Score – In ML, there is always a trade-off between precision and recall. For example, for our case study, we can consider that achieving a high recall is more important than getting a high precision – we would like to detect as many heart patients as possible.

For some other models, like loan default classification, classifying whether a bank customer is a loan defaulter or not, it is desirable to have a high precision since the bank wouldn’t want to lose customers who were denied a loan based on the model’s prediction that they would be defaulters.

The F1-Score is the harmonic mean of the precision and recall values for a classification problem. Because it is a harmonic mean, it is high only when both precision and recall are high, which makes it a useful single-number summary when you care about the trade-off between the two. The formula for the F1 Score is:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
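Using the same reconstructed labels as the confusion-matrix walkthrough (TN = 40, FP = 9, FN = 2, TP = 40), scikit-learn confirms all three metrics for the case study:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Reconstructed labels matching the case-study counts
y_true = [0] * 49 + [1] * 42
y_pred = [0] * 40 + [1] * 9 + [0] * 2 + [1] * 40

print(round(precision_score(y_true, y_pred), 3))  # 0.816  (40 / 49)
print(round(recall_score(y_true, y_pred), 3))     # 0.952  (40 / 42)
print(round(f1_score(y_true, y_pred), 3))         # 0.879  (harmonic mean)
```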


2.     Regression Analysis Evaluation Metrics

In Regression analysis, you will find that one of the most widely used and well-known evaluation metrics is the MSE. MSE stands for Mean Squared Error.

Mean Squared Error is calculated by averaging the squared differences between the predicted and true values.

MSE = (1/n) Σ (yᵢ - ŷᵢ)²


A good MSE should stay as close to 0 as possible.

In clearer terms: the higher the MSE, the worse the quality of the model's predictions, because a high value indicates a large average squared error between the predictions and the true values.
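A quick sketch with made-up regression values shows the computation:

```python
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted values
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# MSE = mean of squared residuals:
# ((0.5)^2 + 0^2 + (1.5)^2 + (1.0)^2) / 4 = 3.5 / 4 = 0.875
print(mean_squared_error(y_true, y_pred))  # 0.875
```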
