Evaluating Classification Models - ML
There are several techniques to evaluate classification algorithms such as Logistic Regression, SVM, and Decision Trees. Listed below are evaluation techniques that help to choose the best model.
Confusion Matrix:
Assume we want to predict whether a patient would potentially suffer from diabetes based on diet, age, etc. Our prediction model should say true or false, i.e., diabetic / non-diabetic patient. The confusion matrix would be a 2*2 matrix (it could be n*n for multi-class problems, but in the above case it is binary: 0 or 1). A short code sketch follows the definitions below.
True Positive (TP): The model correctly predicted that the patient has diabetes.
False Positive (FP): The model predicted that the patient has diabetes, but the patient is actually not diabetic - predicted incorrectly.
True Negative (TN): The model correctly predicted that the patient is not diabetic.
False Negative (FN): The model predicted that the patient is not diabetic, but the patient is actually diabetic - predicted incorrectly.
Type 1 Error - A false positive is often called a Type 1 error.
Type 2 Error - A false negative is often called a Type 2 error.
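A minimal sketch of how these four counts can be read off a confusion matrix, using scikit-learn and made-up labels purely for illustration (1 = diabetic, 0 = non-diabetic):

from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions (1 = diabetic, 0 = non-diabetic)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, sklearn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # TP=3 FP=1 TN=3 FN=1

The same y_true / y_pred pair is reused in the sketches below so the numbers stay comparable.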
Accuracy:
How well does the model perform overall, i.e., the ratio of correct predictions to the total number of samples:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
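Computing it by hand and via scikit-learn's accuracy_score, reusing the hypothetical labels above:

from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = 3, 3, 1, 1  # counts from the confusion matrix sketch above

print((tp + tn) / (tp + tn + fp + fn))  # 0.75
print(accuracy_score(y_true, y_pred))   # 0.75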
Precision:
How precise is the model with respect to its positive predictions, i.e., of all the samples predicted positive, how many are actually positive?
Precision = TP / (TP + FP)
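The same check for precision, by hand and via scikit-learn's precision_score:

from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fp = 3, 1  # counts from the confusion matrix sketch above

print(tp / (tp + fp))                   # 0.75
print(precision_score(y_true, y_pred))  # 0.75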
Recall:
Recall, otherwise called sensitivity or the true positive rate, captures the model's behavior with respect to false negatives (cases where the model should have identified a positive but flagged it negative): of all the actual positives, how many did the model catch?
Recall = TP / (TP + FN)
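And likewise for recall, via scikit-learn's recall_score:

from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
tp, fn = 3, 1  # counts from the confusion matrix sketch above

print(tp / (tp + fn))                # 0.75
print(recall_score(y_true, y_pred))  # 0.75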
F1 Score:
The F1 score is the harmonic mean of precision and recall. A high F1 score implies that both precision and recall are high.
An interesting read on why a harmonic mean and not a simple arithmetic mean: the intention behind choosing the harmonic mean is to punish outliers, i.e., a model with one very high and one very low value gets a low F1 score (see the sketch after the links below).
a) https://stackoverflow.com/questions/26355942/why-is-the-f-measure-a-harmonic-mean-and-not-an-arithmetic-mean-of-the-precision
b) http://groups.di.unipi.it/~bozzo/The%20Harmonic%20Mean.htm
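A small sketch of why the harmonic mean punishes an imbalanced model where the arithmetic mean would not (the skewed precision/recall pair is hypothetical):

from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))  # 0.75 (precision and recall are both 0.75 here)

# A skewed model: high precision but very low recall
precision, recall = 0.9, 0.1
print((precision + recall) / 2)                       # 0.5  - arithmetic mean hides the weak recall
print(2 * precision * recall / (precision + recall))  # 0.18 - harmonic mean punishes it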
ROC / AUC:
ROC: The receiver operating characteristic curve plots the TPR (recall) against the false positive rate (FPR = FP / (FP + TN)) at different classification thresholds. The ROC curve helps us choose the threshold that balances recall and specificity. AUC, the area under the ROC curve, summarizes this trade-off in a single number: the closer it is to 1, the better the model separates the two classes.
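A minimal sketch using scikit-learn's roc_curve and roc_auc_score; the predicted probabilities here are made up purely for illustration:

from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [1, 0, 1, 1, 0, 0, 1, 0]
y_scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]  # hypothetical predicted probabilities

# Each (FPR, TPR) pair corresponds to one candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

print("AUC:", roc_auc_score(y_true, y_scores))  # 0.9375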
References:
https://en.wikipedia.org/wiki/F-score
https://stackoverflow.com/questions/26355942/why-is-the-f-measure-a-harmonic-mean-and-not-an-arithmetic-mean-of-the-precision