KNN ALGORITHM
The K-Nearest Neighbors (K-NN) algorithm is a popular machine learning algorithm used mostly for solving classification problems. Evelyn Fix and Joseph Hodges developed this algorithm in 1951, and it was subsequently expanded by Thomas Cover.
It belongs to the supervised learning domain and finds intense application in pattern recognition, data mining, and intrusion detection. It is widely applicable in real-life scenarios since it is non-parametric, meaning it does not make any underlying assumptions about the distribution of the data.
How Does the K-Nearest Neighbors Algorithm Work?
K-NN compares a new data entry to the entries in a given data set. Based on its closeness to a given number (K) of neighbors, the algorithm assigns the new entry to a class or category in the data set.
Step 1 - Assign a value to K. K represents the number of nearest neighbors that need to be considered while making the prediction.
Step 2 - Calculate the distance between the new data entry and all other existing data entries. Arrange them in ascending order.
Step 3 - Find the K nearest neighbors to the new entry based on the calculated distances.
Step 4 - Assign the new data entry to the majority class in the nearest neighbors.
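The four steps above can be sketched in plain Python. The tiny data set and query point here are illustrative, not from this article:

import math

# Step 1 - choose K
k = 3

# Toy labeled data set: (x, y, class_label) - illustrative values
data = [(1, 1, "A"), (2, 2, "A"), (3, 3, "A"),
        (6, 6, "B"), (7, 7, "B"), (8, 8, "B")]
new_point = (2, 3)

# Step 2 - compute the distance from the new point to every entry
distances = [(math.dist(new_point, (x, y)), label) for x, y, label in data]

# Steps 2 and 3 - sort in ascending order and keep the K nearest neighbors
nearest = sorted(distances)[:k]

# Step 4 - assign the majority class among those neighbors
labels = [label for _, label in nearest]
prediction = max(set(labels), key=labels.count)
print(prediction)  # "A": the three nearest neighbors are all class A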
Why do we need a KNN algorithm?
KNN is useful when the decision boundary between classes is irregular and we know nothing about how the data is distributed. It requires no explicit training phase - the labeled data itself is the model - and it adapts immediately as new labeled examples are added.
Distance Metrics Used in KNN Algorithm
Euclidean Distance
Euclidean distance is the Cartesian distance between two points in the plane or hyperplane. It can be visualized as the length of the straight line that joins the two points under consideration.
Manhattan Distance
Manhattan distance is generally used when we are interested in the total distance traveled by an object rather than its displacement. This metric is calculated by summing the absolute differences between the coordinates of the points in n dimensions.
Minkowski Distance
Both Euclidean and Manhattan distance are special cases of the Minkowski distance, which raises the absolute coordinate differences to a power p, sums them, and takes the p-th root. Setting p = 1 gives the Manhattan distance, and p = 2 gives the Euclidean distance.
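To make that relationship concrete, here is a small pure-Python sketch of the Minkowski distance; the function name and sample points are mine, not from this article:

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (1, 2), (4, 6)
print(minkowski(a, b, 1))  # 7.0 -> Manhattan: |1-4| + |2-6|
print(minkowski(a, b, 2))  # 5.0 -> Euclidean: sqrt(3**2 + 4**2)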
Advantages of the KNN Algorithm
- Simple to understand and implement, with no explicit training phase (it is a "lazy" learner).
- Makes no assumptions about the underlying data distribution (non-parametric).
- Naturally handles multi-class problems and adapts as new labeled data is added.
Disadvantages of the KNN Algorithm
- Prediction is slow on large data sets, since every query must compute distances to all stored points.
- Sensitive to the choice of K and to the scale of the features, so normalization is usually required.
- Accuracy degrades in high dimensions (the "curse of dimensionality") and with noisy or irrelevant features.
Example Program:
Assume 0 and 1 as the two class labels.

import math

def classifyAPoint(points, p, k=3):
    """Classify point p using the k nearest neighbors in points.

    points: dict mapping a class label (0 or 1) to a list of (x, y) tuples
    p: the (x, y) point to classify
    k: number of nearest neighbors to consider
    """
    distance = []
    for group in points:
        for feature in points[group]:
            # Euclidean distance between the training point and p
            euclidean_distance = math.sqrt((feature[0] - p[0]) ** 2 +
                                           (feature[1] - p[1]) ** 2)
            distance.append((euclidean_distance, group))

    # sort by distance and keep the k nearest neighbors
    distance = sorted(distance)[:k]

    # count how many of the k neighbors belong to each class
    freq1 = 0  # frequency of class 0
    freq2 = 0  # frequency of class 1
    for d in distance:
        if d[1] == 0:
            freq1 += 1
        elif d[1] == 1:
            freq2 += 1

    return 0 if freq1 > freq2 else 1

def main():
    # training points, keyed by class label
    points = {0: [(1, 12), (2, 5), (3, 6), (3, 10), (3.5, 8),
                  (2, 11), (2, 9), (1, 7)],
              1: [(5, 3), (3, 2), (1.5, 9), (7, 2), (6, 1),
                  (3.8, 1), (5.6, 4), (4, 2), (2, 5)]}

    p = (2.5, 7)   # unknown point to classify
    k = 3          # number of neighbors to consider

    print("The value classified to unknown point is: {}".format(
        classifyAPoint(points, p, k)))

if __name__ == '__main__':
    main()
Output:
The value classified to unknown point is: 0
Time Complexity: O(N * logN), dominated by sorting the N computed distances.
Auxiliary Space: O(N), for the list of distances to all training points.
A few applications of the KNN algorithm are:
- Pattern recognition, such as classifying handwritten digits.
- Recommendation systems that suggest items similar to those a user has already liked.
- Data mining and intrusion detection, where unusual entries are flagged by their distance from known classes.
What is the difference between KNN and K-means?
KNN is a supervised learning algorithm: it uses labeled data and classifies a new point by the majority class among its K nearest neighbors. K-means is an unsupervised clustering algorithm: it partitions unlabeled data into K clusters by repeatedly assigning points to the nearest cluster centroid. The "K" also means different things in the two names - the number of neighbors in KNN, and the number of clusters in K-means.