Evaluation Metrics in Classification Problem


Classification is a supervised machine learning task in which a model learns from labeled training data and then assigns new observations to one of a set of classes. Choosing the correct evaluation metric for a classification problem is important, as the right choice varies from problem to problem. Let's walk through the common classification evaluation metrics.

1. Accuracy

Accuracy simply measures how often the model predicts correctly. It is the number of correct predictions divided by the total number of predictions.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

TP: Number of positive labels that are also predicted as positive. 

FP: Number of negative labels that are predicted as positive.

TN: Number of negative labels that are also predicted as negative.

FN: Number of positive labels that are predicted as negative.

But accuracy only takes the correctly classified predictions into account, not how the errors are distributed. Let's see an example. Suppose there are 99 positive labels and 1 negative label. A classifier that always predicts positive achieves 99% accuracy (TP = 99, TN = 0, FP = 1, FN = 0), which looks like a good score, but the model is poor, as it never predicts the negative label; it is biased toward the majority class. To tackle this situation, we need some other metrics. Accuracy is only useful when the target classes are well balanced.
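Here is a minimal sketch of this pitfall, assuming scikit-learn is installed and using made-up labels:

from sklearn.metrics import accuracy_score, confusion_matrix

# 99 positive labels (1) and 1 negative label (0)
y_true = [1] * 99 + [0]
# A degenerate "model" that always predicts positive
y_pred = [1] * 100

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                  # 99 0 1 0
print(accuracy_score(y_true, y_pred))  # 0.99 -- looks great, yet the negative class is never found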

2. Precision

Precision is defined as the number of true positives divided by the number of positive predictions (TP + FP). It tells us how many of all positive predictions are actually correct. It is useful when a false positive (FP) is a higher concern than a false negative (FN).

Precision = TP / (TP + FP)
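As a minimal sketch, assuming scikit-learn and a small made-up label set:

from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

# TP = 2, FP = 1, so precision = 2 / (2 + 1)
print(precision_score(y_true, y_pred))  # 0.666...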

3. Recall

Recall is defined as the number of true positives divided by the number of actual positives (TP + FN). It tells us how many of all actual positive labels are correctly predicted. It is useful when a false negative (FN) is a higher concern than a false positive (FP). For example, in cancer detection a false positive is tolerable, but an actual positive case should not go undetected.

Recall = TP / (TP + FN)
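Using the same made-up labels as in the precision sketch (again assuming scikit-learn):

from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

# TP = 2, FN = 1, so recall = 2 / (2 + 1)
print(recall_score(y_true, y_pred))  # 0.666...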

We want both precision and recall to be high, but there is a trade-off between them. Hence we have the F1 score.

4. F1 Score

The F1 score combines precision and recall: it is calculated as their harmonic mean. It is useful when FN and FP are equally important. It is maximized when precision and recall are equal.

F1 Score = 2 × Precision × Recall / (Precision + Recall)
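A minimal sketch, assuming scikit-learn, showing that f1_score matches the harmonic mean computed by hand:

from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
print(2 * p * r / (p + r))       # harmonic mean by hand
print(f1_score(y_true, y_pred))  # same value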

5. AUC-ROC

The ROC (Receiver Operating Characteristic) curve is a graph showing the performance of a classification model at various threshold values. It plots TPR vs. FPR. The True Positive Rate (TPR) is nothing but recall. The False Positive Rate (FPR) is the ratio of FP to the number of actual negatives (FP + TN).
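A minimal sketch, assuming scikit-learn and hypothetical predicted probabilities, showing the (FPR, TPR) points that make up the ROC curve:

from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical positive-class probabilities

fpr, tpr, thresholds = roc_curve(y_true, scores)
print(fpr)  # x-axis: FP / (FP + TN) at each threshold
print(tpr)  # y-axis: TP / (TP + FN), i.e. recall, at each threshold
print(thresholds)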

AUC is the Area Under the Curve of the ROC plot. The larger the area, the better the classification model. When AUC is 1, the classifier is able to perfectly distinguish between the positive and negative classes.

[Figure: ROC curve plotting TPR against FPR at varying thresholds; the area under the curve is the AUC. Source: GeeksforGeeks, auc-roc-curve]

We can compare the performance of multiple models and choose the model with the highest AUC value, which makes it a good metric for comparing two or more models.
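A minimal sketch of this kind of comparison, assuming scikit-learn and hypothetical scores from two models:

from sklearn.metrics import roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1]
scores_a = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]  # hypothetical model A probabilities
scores_b = [0.5, 0.6, 0.30, 0.7, 0.4, 0.6]  # hypothetical model B probabilities

# The model with the higher AUC ranks positives above negatives more often.
print(roc_auc_score(y_true, scores_a))
print(roc_auc_score(y_true, scores_b))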

These are the most commonly used classification evaluation metrics; they should be chosen according to the given problem.

End Notes

Thanks for reading! I hope this has given you a basic understanding of evaluation metrics for classification problems. I am always open to your questions and suggestions. Do connect with me on LinkedIn.

