How KNN Algorithm Works, When do we use KNN and How do we choose the factor K in KNN?
In this article I am making an honest attempt to simplify the KNN algorithm and do a deep dive into it, so that students or anyone new to machine learning, who is digging through the internet and other resources and probably getting confused about when to use which algorithm, can understand KNN better and implement this lazy, simple algorithm in a real-world classification problem.
Here I am trying to give a point-wise explanation of how KNN is implemented. As I mentioned, this article is for those who are new to machine learning; readers who are already Data Scientists with some hands-on experience may find it very simple, but needless to say, the intention of this article is to make the KNN algorithm easy to understand for a novice.
In future articles, I will try to organize and simplify the other machine learning algorithms and map them to real-world implementations.
Let’s start with a point-wise understanding of KNN:
What is KNN?
The KNN algorithm is a robust and versatile classifier that is often used as a benchmark for more complex classifiers such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM). Despite its simplicity, KNN can outperform more powerful classifiers and is used in a variety of applications such as economic forecasting, data compression and genetics.
Let’s first start by establishing some definitions and notation. We will use x to denote a feature (aka predictor, attribute) and y to denote the target (aka label, class) we are trying to predict.
KNN falls in the supervised learning family of algorithms. Informally, this means that we are given a labelled dataset consisting of training observations (x, y) and would like to capture the relationship between x and y. More formally, our goal is to learn a function h : X → Y so that, given an unseen observation x, h(x) can confidently predict the corresponding output y.
The KNN classifier is also a non-parametric, instance-based learning algorithm.
· Non-parametric means it makes no explicit assumptions about the functional form of h, avoiding the dangers of mismodeling the underlying distribution of the data. For example, suppose our data is highly non-Gaussian but the learning model we choose assumes a Gaussian form. In that case, our algorithm would make extremely poor predictions.
· Instance-based learning means that our algorithm doesn’t explicitly learn a model. Instead, it chooses to memorize the training instances which are subsequently used as “knowledge” for the prediction phase. Concretely, this means that only when a query to our database is made (i.e. when we ask it to predict a label given an input), will the algorithm use the training instances to spit out an answer.
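To make the “lazy” part concrete, here is a minimal sketch in Python (the class and method names are my own illustration, not any library’s API) of what instance-based learning means in practice: fitting does no computation beyond memorizing the training set, and all of the work is deferred to query time.

```python
# Hypothetical sketch: "training" is nothing more than storing the instances.
import numpy as np

class MemorizingClassifier:
    def fit(self, X, y):
        # No function h is learned here -- the stored instances are the "model".
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self
```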
How does the KNN algorithm work?
In the classification setting, the K-nearest neighbor algorithm essentially boils down to forming a majority vote between the K most similar instances to a given “unseen” observation. Similarity is defined according to a distance metric between two data points. A popular choice is the Euclidean distance given by
d(x, x′) = √((x₁ − x′₁)² + … + (xₙ − x′ₙ)²)
but other measures can be more suitable for a given setting and include the Manhattan, Chebyshev and Hamming distance.
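As a quick illustration, here is how these distance measures could be computed with NumPy for two small made-up points x and x′ (the values are purely for demonstration):

```python
import numpy as np

x       = np.array([1.0, 2.0, 3.0])
x_prime = np.array([2.0, 0.0, 4.0])

euclidean = np.sqrt(np.sum((x - x_prime) ** 2))  # sqrt(1 + 4 + 1) = sqrt(6) ≈ 2.449
manhattan = np.sum(np.abs(x - x_prime))          # 1 + 2 + 1 = 4
chebyshev = np.max(np.abs(x - x_prime))          # max(1, 2, 1) = 2

print(euclidean, manhattan, chebyshev)
```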
More formally, given a positive integer K, an unseen observation x and a similarity metric d, the KNN classifier performs the following two steps:
· It runs through the whole dataset computing d between x and each training observation. We’ll call the K points in the training data that are closest to x the set A. Note that K is usually odd to prevent tie situations.
· It then estimates the conditional probability for each class, that is, the fraction of points in A with that given class label. (Note that I(x) is the indicator function, which evaluates to 1 when the argument x is true and 0 otherwise.)
P(y = j | X = x) = (1/K) ∑_{i ∈ A} I(y⁽ⁱ⁾ = j)
Finally, our input x gets assigned to the class with the largest probability.
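Putting the two steps together, the following is a minimal Python sketch of a KNN prediction. The function name knn_predict and the toy data are illustrative only, not a reference implementation:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    # Step 1: compute the Euclidean distance from x to every training observation
    # and take the indices of the K closest points -- the set A.
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    A = np.argsort(distances)[:k]

    # Step 2: P(y = j | X = x) is estimated by the fraction of points in A with
    # label j; the class with the largest fraction (the majority vote) wins.
    votes = Counter(y_train[A])
    return votes.most_common(1)[0][0]

# Tiny usage example with made-up data.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(['red', 'red', 'green', 'green'])
print(knn_predict(X_train, y_train, np.array([0.05, 0.1]), k=3))  # -> 'red'
```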
When do we use KNN?
We can use KNN when all of the points below hold:
· When Data is labelled:
What do we mean when we say the data is labelled? It means we already know the correct output for each observation in the dataset under analysis, and based on this we are trying to make our model learn how to classify future, unknown data.
· When Data is noise free:
Noisy data is data with a large amount of additional meaningless information in it. This includes corrupted data, and the term is often used as a synonym for corrupt data. It also includes any data that a system cannot understand and interpret correctly. In the data science space, noise means unwanted data items, features or records which don’t help in explaining the feature itself, or the relationship between feature and target. Noise often causes algorithms to miss patterns in the data.
Noise in tabular data can be of three types:
§ Anomalies in certain data items (Noise 1: certain anomalies in features & target)
§ Features that don’t help in explaining the target (Noise 2: irrelevant/weak features)
§ Records which don’t follow the form or relation which rest of the records do (Noise 3: noisy records)
Note: a feature is an individual measurable property or characteristic of a phenomenon being observed; features are the data attributes you use to train your machine learning models. In future articles I will write on feature selection techniques, noise identification and dealing with noisy data in machine learning.
· When Dataset is small:
If the dataset is too big, KNN may underperform: KNN is a lazy learner that does not learn a discriminative function from the training set, so every prediction requires computing distances to all stored training points, which becomes expensive on large datasets.
How do we choose the factor K in KNN?
We can choose the factor K by following the steps below:
· Take the square root of the number of data points; that number can be used as K.
e.g.: if you have 100 data points, then K = 10.
· But always make sure you decide on an odd value of K rather than an even one, to avoid ties between two classes of data.
e.g.: in the above case K came out as 10 after taking the square root, so we can choose either K = 9 or K = 11 just to make sure that K is odd.
As a rule of thumb, setting K to the square root of the number of training samples can lead to better results.
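A quick Python sketch of this square-root rule of thumb (the helper name suggest_k is hypothetical):

```python
import math

def suggest_k(n_samples):
    k = round(math.sqrt(n_samples))
    # Nudge K to an odd value so a two-class vote cannot end in a tie.
    if k % 2 == 0:
        k += 1
    return k

print(suggest_k(100))  # sqrt(100) = 10 -> choose the odd value 11
```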
Choosing the right value for K:
To select the K that’s right for your data, we run the KNN algorithm several times with different values of K and choose the K that reduces the number of errors we encounter while maintaining the algorithm’s ability to accurately make predictions when it’s given data it hasn’t seen before.
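One common way to do this, sketched below with scikit-learn (assuming it is installed) on a built-in toy dataset purely for illustration, is to score several candidate values of K with cross-validation and keep the best one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score several candidate (odd) values of K with 5-fold cross-validation.
scores = {}
for k in range(1, 26, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```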
Here are some things to keep in mind:
1. As we decrease the value of K to 1, our predictions become less stable. Just think for a minute: imagine K = 1 and we have a query point surrounded by several red points and a single green point, and that green point happens to be the single nearest neighbour. Reasonably, we would think the query point is most likely red, but because K = 1, KNN incorrectly predicts that the query point is green.
2. Inversely, as we increase the value of K, our predictions become more stable due to majority voting / averaging, and thus more likely to be accurate (up to a certain point). Eventually, we begin to witness an increasing number of errors. It is at this point we know we have pushed the value of K too far.
3. In cases where we are taking a majority vote (e.g. picking the mode in a classification problem) among labels, we usually make K an odd number to have a tiebreaker.
I hope all the links below help you dig into KNN further:
YouTube Links:
https://www.youtube.com/watch?v=gHdbqsDK9YY&list=PLBv09BD7ez_48heon5Az-TsyoXVYOJtDZ
https://www.youtube.com/watch?v=4HKqjENq9OU&t=8s
https://www.youtube.com/watch?v=6kZ-OPLNcgE&t=768s
https://www.youtube.com/watch?v=6kZ-OPLNcgE
https://www.youtube.com/watch?v=ZBHStob8lhc
Blogs:
https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/
https://blog.usejournal.com/a-quick-introduction-to-k-nearest-neighbors-algorithm-62214cea29c7