Big Data Analytics

Chinthala Yeshwanth Reddy

Published Apr 25, 2022

K-Means Clustering

The K-means clustering algorithm computes centroids and repeats the computation until the centroids stop changing. The result is a local optimum, which is not guaranteed to be the best possible clustering. The number of clusters is assumed to be known in advance. K-means is also known as a flat clustering algorithm. The letter ‘K’ in K-means denotes the number of clusters the method finds in the data.

In this method, data points are assigned to clusters in such a way that the sum of the squared distances between the data points and the centroid of their cluster is as small as possible. It is essential to note that less variation within a cluster means the data points in that cluster are more similar to one another.
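The quantity being minimized can be written out explicitly. The symbols below are notation assumed here for illustration (they do not appear in the original text): C_j is the set of points in cluster j, mu_j is its centroid, and x_i is a data point.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

A smaller J means the points sit closer to their own centroids, which is the "less variation within a cluster" described above.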

Working of the K-Means Algorithm

The following steps will help us understand how the K-Means clustering technique works:

Step 1: First, we need to provide the number of clusters, K, that the algorithm should generate.
Step 2: Next, choose K data points at random and assign each one to its own cluster. In other words, each of the K chosen points seeds one cluster.
Step 3: The cluster centroids are then computed.
Step 4: Repeat the steps below until the assignment of data points to clusters no longer changes, which means the centroids have stabilized.
4.1 First, calculate the squared distances between the data points and the centroids.
4.2 Next, allocate each data point to the cluster whose centroid is closest to it.
4.3 Finally, recompute the centroid of each cluster by averaging all of the cluster’s data points.
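
The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names (`squared_distance`, `mean_point`, `kmeans`), the `max_iter` and `seed` parameters, and the sample data are all assumptions made for this example. It also assumes Euclidean distance and lets an empty cluster keep its previous centroid.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Steps 1-3: K is given; K randomly chosen data points act as the
    # initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 4.1-4.2: compute squared distances and assign every point
        # to the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4.3: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Two visibly separated groups of 2-D points (made-up data).
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

On this data the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initial choice.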

Applications:

Vector quantization
Cluster analysis
Feature Learning

K-means uses an Expectation-Maximization style strategy to solve the problem. The Expectation step assigns each data point to the cluster with the nearest centroid, and the Maximization step recomputes the centroid of each cluster.
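
To make the two steps concrete, here is a small sketch of a single Expectation step followed by a single Maximization step on one-dimensional data. The function names `e_step` and `m_step`, the values, and the starting centroids are assumptions chosen for illustration. The sketch also assumes every cluster receives at least one point.

```python
def e_step(values, centroids):
    """Assign each value to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: (v - centroids[j]) ** 2)
        for v in values
    ]


def m_step(values, labels, k):
    """Recompute each centroid as the mean of the values assigned to it.

    Assumes every cluster has at least one member.
    """
    new_centroids = []
    for j in range(k):
        members = [v for v, lab in zip(values, labels) if lab == j]
        new_centroids.append(sum(members) / len(members))
    return new_centroids


values = [1.0, 2.0, 10.0, 11.0]
centroids = [0.0, 5.0]  # arbitrary starting guesses

labels = e_step(values, centroids)
centroids = m_step(values, labels, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [1.5, 10.5]
```

After one round the centroids have moved from the starting guesses to the means of the two groups. Repeating the two steps would leave the labels unchanged, which is the stopping condition.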

The k-means clustering technique is an algorithm that tries to minimize the distance between the points in a cluster and their centroid.

K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to assign a point to a cluster. In K-Means, each cluster is associated with a centroid.

The main objective of the K-Means algorithm is to minimize the sum of squared distances between the points and their respective cluster centroids.
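
As a rough check on this objective, taken here as the sum of squared distances described earlier in the article, the sketch below scores two different groupings of the same four points. The helper name `wcss` (within-cluster sum of squares) and the data are assumptions made for this example.

```python
def wcss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


points = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0)]

# Grouping A keeps neighbouring points together.
good = wcss(points, [0, 0, 1, 1], [(1.5, 1.0), (8.5, 8.0)])
# Grouping B pairs a near point with a far point in each cluster.
bad = wcss(points, [0, 1, 0, 1], [(4.5, 4.5), (5.5, 4.5)])

print(good)  # 1.0
print(bad)   # 98.0
```

The grouping that keeps nearby points together has the much lower score, so it is the one K-means would prefer.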

Let’s now take an example to understand how K-Means actually works:

Output images:

Thank You.

Chinthala Yeshwanth Reddy.
