An Introduction to Machine Learning Models using the kNN Algorithm
Recently, I was looking for a Machine Learning model I had not used before, and I came across something called kNN and started learning about it and coding it. kNN stands for k Nearest Neighbours. The algorithm is primarily used for classification problems. Here is everything you need to know about this simple and easy-to-use algorithm.
How do kNNs work?
Let's start off with an easy case. Here is a plane with a set of data points. In this case, the points are either red circles or green squares.
But there is also a blue star in the dataset, and we need to work out which class it belongs to. Is the blue star a red circle, or is it a green square?
This is where the kNN algorithm can help us. The "k" in kNN is the number of nearest neighbours we want to look at in order to classify the blue star. Let's give "k" the value of 3 in this case, and take a look at the 3 closest data points to the blue star.
The 3 closest data points to the blue star are all red circles, so we can safely predict that the blue star is a red circle. Seems easy enough? The truth is, this is all there is to the algorithm.
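To make this concrete, here is a minimal sketch of the idea in Python. The coordinates are made up for illustration (the original figure's positions are not given); the logic is just "measure distances, take the k closest, vote":

```python
import math
from collections import Counter

def knn_classify(points, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    # Sort the training points by Euclidean distance to the query point
    # and keep the indices of the k closest ones.
    nearest = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))[:k]
    # The most common label among those neighbours wins.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data mirroring the red-circle / green-square example.
points = [(1, 1), (2, 1), (1.5, 2), (6, 6), (7, 5), (6.5, 7)]
labels = ["red circle"] * 3 + ["green square"] * 3
print(knn_classify(points, labels, query=(2, 2), k=3))  # the blue star -> "red circle"
```

The blue star at (2, 2) sits among the three red-circle points, so all 3 of its nearest neighbours are red circles and the vote is unanimous.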
Here is another example:
The test sample (green circle) should be classified either as the first class, blue squares, or as the second class, red triangles. If k = 1, it is assigned to the first class because its single nearest neighbour is a blue square. If k = 3, it is assigned to the second class because the 3 nearest neighbours include only one blue square but 2 red triangles.
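The choice of k can flip the result, as this example shows. Here is a small sketch of that effect; the coordinates are assumptions chosen to mirror the example, not taken from the original figure:

```python
import math
from collections import Counter

# Hypothetical coordinates mirroring the example (not the original figure):
# the single closest point is a blue square, but two red triangles sit just behind it.
train = {(0.9, 1.1): "blue square", (1.0, 2.0): "red triangle",
         (1.5, 1.8): "red triangle", (3.0, 3.0): "blue square",
         (3.5, 2.5): "blue square"}
query = (0.8, 1.2)  # the green circle (test sample)

results = {}
for k in (1, 3):
    # Sort training points by distance to the query and keep the k nearest.
    nearest = sorted(train, key=lambda p: math.dist(p, query))[:k]
    votes = Counter(train[p] for p in nearest)
    results[k] = votes.most_common(1)[0][0]

print(results)  # k = 1 follows the lone blue square; k = 3 flips to red triangle
```

This is why k is treated as a tuning parameter: too small and a single nearby point dominates, too large and distant points drown out the local pattern.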
Applications of kNNs
- Healthcare: Who has a tumour like mine? Was it a malignant or benign cancer? Was it an operable or inoperable condition? Do I need a biopsy?
- Banks: Should a bank loan me money? Am I likely to pay my interest? Do people with characteristics like mine pay their interest?
- Handwriting recognition: Whose known handwriting samples are most similar to mine?
Summary:
Pros:
- Simple algorithm
- High accuracy - relatively high for such a simple method, though often not enough to compete with stronger Supervised Learning models such as SVMs or Naive Bayes
- The algorithm is easy to understand and versatile across different industries (see the applications of kNNs above)
Cons:
- High memory requirement
- Stores all of the training data (the algorithm needs it at prediction time)
- Prediction can be slow, since the distance to every stored training point must be computed for each new sample (and a large value of "k" adds further cost)
Steps of a kNN:
- A positive integer value is given to k
- The "k" nearest neighbours of the new sample are found
- The most common classification among those neighbours is found
- That classification is assigned to the sample
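In practice you rarely code these steps by hand; a library such as scikit-learn wraps all four in a few lines. A minimal sketch, with made-up coordinates for the red-circle / green-square example:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: three red circles and three green squares.
X = [[1, 1], [2, 1], [1.5, 2], [6, 6], [7, 5], [6.5, 7]]
y = ["red circle"] * 3 + ["green square"] * 3

model = KNeighborsClassifier(n_neighbors=3)  # step 1: choose a value for k
model.fit(X, y)                              # kNN "training" just stores the data
print(model.predict([[2, 2]]))               # steps 2-4: find neighbours, vote, assign
```

Note that `fit` does almost no work here, which reflects the cons above: the model simply memorises the training set, and the real cost is paid at prediction time.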
I hope this gave you a good introduction to Machine Learning models as well as what a kNN model is. Please feel free to comment or ask questions below.