KNN ALGORITHM

The K-Nearest Neighbors (K-NN) algorithm is a popular machine learning algorithm used mostly for solving classification problems. Evelyn Fix and Joseph Hodges developed the algorithm in 1951, and Thomas Cover later expanded it.

It belongs to the supervised learning domain and finds intense application in pattern recognition, data mining, and intrusion detection. It is widely applicable in real-life scenarios since it is non-parametric, meaning it does not make any underlying assumptions about the distribution of the data.

How Does the K-Nearest Neighbors Algorithm Work?

K-NN compares a new data entry to the values in a given data set. Based on its closeness to a given number (K) of neighbors, the algorithm assigns the new entry to a class or category in the data set.

Step 1 - Assign a value to K. K represents the number of nearest neighbors that need to be considered when making a prediction.

Step 2 - Calculate the distance between the new data entry and all other existing data entries. Arrange them in ascending order.

Step 3 - Find the K nearest neighbors to the new entry based on the calculated distances.

Step 4 - Assign the new data entry to the majority class in the nearest neighbors.
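The four steps above can be sketched in a few lines of plain Python. This is a minimal, generic sketch (the function name, sample data, and labels are illustrative, not part of the original article); a fuller worked example appears later in this article.

```python
from collections import Counter
import math

def knn_predict(data, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.
    `data` is a list of (point, label) pairs."""
    # Step 2: compute the distance from the query to every stored point
    distances = [(math.dist(point, query), label) for point, label in data]
    # Step 3: sort ascending and keep the K nearest neighbors
    nearest = sorted(distances)[:k]
    # Step 4: assign the majority class among those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

data = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((5.0, 5.0), "B"), ((6.0, 5.5), "B")]
print(knn_predict(data, (1.2, 1.5), k=3))  # two of the three nearest neighbors are "A"
```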


Why do we need a KNN algorithm?

  • It is a versatile and widely used machine learning algorithm, valued primarily for its simplicity and ease of implementation.
  • It does not require any assumptions about the underlying data distribution.
  • It can handle both numerical and categorical data, making it a flexible choice for various types of datasets in classification and regression tasks.
  • The K-NN algorithm works by finding the K nearest neighbors to a given data point based on a distance metric, such as Euclidean distance. The class or value of the data point is then determined by the majority vote or the average of the K neighbors. This approach allows the algorithm to adapt to different patterns and make predictions based on the local structure of the data.

Distance Metrics Used in KNN Algorithm

Euclidean Distance

Euclidean distance is the Cartesian distance between two points in a plane or hyperplane. It can be visualized as the length of the straight line that joins the two points under consideration.
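As a quick sketch, Euclidean distance is the square root of the summed squared coordinate differences (the function name here is illustrative):

```python
import math

def euclidean(a, b):
    # Straight-line distance: sqrt of the summed squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((0, 0), (3, 4)))  # 5.0 (the classic 3-4-5 right triangle)
```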

Manhattan Distance

Manhattan distance is generally used when we are interested in the total distance traveled by an object rather than its displacement. It is calculated by summing the absolute differences between the coordinates of the points across n dimensions.
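A minimal sketch of that sum of absolute differences (the function name is illustrative):

```python
def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance)
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan((0, 0), (3, 4)))  # 7: three blocks east, four blocks north
```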

Minkowski Distance

Both the Euclidean and the Manhattan distance are special cases of the Minkowski distance of order p: setting p = 1 gives the Manhattan distance, and p = 2 gives the Euclidean distance.
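This relationship can be sketched directly (the function name is illustrative):

```python
def minkowski(a, b, p=2):
    # Generalized distance: p = 1 is Manhattan, p = 2 is Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(minkowski((0, 0), (3, 4), p=1))  # 7.0 (Manhattan)
print(minkowski((0, 0), (3, 4), p=2))  # 5.0 (Euclidean)
```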

Advantages of the KNN Algorithm

  • Easy to Implement – the algorithm itself is simple and its complexity is low.
  • Adapts Easily – KNN stores all the training data in memory, so whenever a new data point is added, the algorithm incorporates it immediately and it contributes to future predictions.
  • Few Hyperparameters – training a KNN model only requires choosing the value of K and the distance metric.

Disadvantages of the KNN Algorithm

  • Does Not Scale – KNN is often called a lazy algorithm because it defers all computation to prediction time, which demands significant computing power and data storage. This makes the algorithm both time-consuming and resource-intensive.
  • Curse of Dimensionality – KNN is affected by the peaking phenomenon: when the dimensionality is too high, the algorithm struggles to classify data points correctly.
  • Prone to Overfitting – because of the curse of dimensionality, the algorithm is also prone to overfitting. Feature selection and dimensionality reduction techniques are therefore commonly applied to mitigate this problem.

Example Program:

Assume 0 and 1 are the two classes.

import math

def classifyAPoint(points, p, k=3):
    # points: dict mapping class label -> list of (x, y) training points
    # p: the new point to classify; k: number of neighbors to consider
    distance = []
    for group in points:
        for feature in points[group]:
            # Euclidean distance between the training point and p
            euclidean_distance = math.sqrt((feature[0] - p[0])**2 + (feature[1] - p[1])**2)
            distance.append((euclidean_distance, group))
    # Sort by distance and keep the k nearest neighbors
    distance = sorted(distance)[:k]
    freq1 = 0  # votes for class 0
    freq2 = 0  # votes for class 1
    for d in distance:
        if d[1] == 0:
            freq1 += 1
        elif d[1] == 1:
            freq2 += 1
    return 0 if freq1 > freq2 else 1

def main():
    points = {0: [(1, 12), (2, 5), (3, 6), (3, 10), (3.5, 8), (2, 11), (2, 9), (1, 7)],
              1: [(5, 3), (3, 2), (1.5, 9), (7, 2), (6, 1), (3.8, 1), (5.6, 4), (4, 2), (2, 5)]}
    p = (2.5, 7)  # the unknown point to classify
    k = 3
    print("The value classified to unknown point is: {}".format(classifyAPoint(points, p, k)))

if __name__ == '__main__':
    main()

Output: 

The value classified to unknown point is: 0

Time Complexity: O(N log N), dominated by sorting the N computed distances.

Auxiliary Space: O(N), for the list of distances.
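In practice, KNN is rarely hand-rolled; scikit-learn provides a ready-made classifier. A minimal sketch, assuming scikit-learn is installed (the training data here is a subset of the example above):

```python
from sklearn.neighbors import KNeighborsClassifier

# A small subset of the example's training points and their class labels
X = [[1, 12], [2, 5], [3, 6], [3, 10], [5, 3], [3, 2], [7, 2], [6, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

model = KNeighborsClassifier(n_neighbors=3)  # K = 3
model.fit(X, y)
print(model.predict([[2.5, 7]]))  # [0]
```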


A few applications of the KNN algorithm are:

  1. Data preprocessing
  2. Pattern recognition
  3. Recommendation engines


What is the difference between KNN and K-means?

  • KNN is a supervised machine learning model used for classification problems, whereas K-means is an unsupervised machine learning model used for clustering.
  • The “K” in KNN is the number of nearest neighbors, whereas the “K” in K-means is the number of clusters.

