Big Data Analytics

K-Means Clustering

The K-means clustering algorithm computes centroids and repeats until the centroids stop changing. This end point is a local optimum, not necessarily the best possible clustering. The number of clusters is assumed to be known in advance. K-means is also known as a flat clustering algorithm. The letter ‘K’ in K-means denotes the number of clusters the method finds in the data.

In this method, data points are assigned to clusters so that the sum of the squared distances between the data points and the centroid of their cluster is as small as possible. Lower variation within a cluster means that the data points in that cluster are more similar to one another.

Working of the K-Means Algorithm

The following steps show how the K-Means clustering technique works:

  • Step 1: First, specify the number of clusters, K, that the algorithm should generate.
  • Step 2: Next, choose K data points at random and use each one as the starting point of a cluster.
  • Step 3: Compute the cluster centroids.
  • Step 4: Repeat the steps below until the centroids are stable, that is, until the assignment of data points to clusters no longer changes.
  • 4.1 First, compute the squared distance between each data point and each centroid.
  • 4.2 Then, assign each data point to the cluster whose centroid is closest to it.
  • 4.3 Finally, recompute the centroid of each cluster by averaging all of the cluster’s data points.
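
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `squared_distance` and `k_means`, the `max_iters` and `seed` parameters, the sample points, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for this sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index
    of points[i].
    """
    rng = random.Random(seed)
    # Steps 2-3: pick K distinct data points; each one is the initial
    # centroid of its own cluster.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None

    for _ in range(max_iters):
        # Steps 4.1-4.2: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4 stopping rule: the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels

        # Step 4.3: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centroids, labels


# Hypothetical sample data: two visibly separated groups of 2-D points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, labels = k_means(points, k=2)
print("centroids:", centroids)
print("labels:", labels)
```

Which cluster receives index 0 depends on the random starting points, so only the grouping of the points is meaningful, not the label numbers themselves.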

Applications:

  1. Vector quantization
  2. Cluster analysis
  3. Feature Learning

K-means follows an Expectation-Maximization style strategy to solve the problem. The Expectation step assigns each data point to the nearest cluster, and the Maximization step recomputes the centroid of each cluster.
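
To make the two steps concrete, here is a small one-dimensional sketch that alternates them and records the sum of squared distances after each step. The helper names `e_step`, `m_step`, and `sse`, the four sample values, and the deliberately poor starting centroids are assumptions chosen so the arithmetic can be checked by hand.

```python
def e_step(points, centroids):
    """Expectation step: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: (p - centroids[k]) ** 2)
        for p in points
    ]


def m_step(points, labels, centroids):
    """Maximization step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        # Assumption of this sketch: an empty cluster keeps its old centroid.
        new_centroids.append(sum(members) / len(members) if members else old)
    return new_centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum((p - centroids[lab]) ** 2 for p, lab in zip(points, labels))


points = [1.0, 2.0, 9.0, 10.0]
centroids = [1.0, 2.0]  # deliberately poor starting centroids
history = []

for _ in range(2):
    labels = e_step(points, centroids)
    history.append(sse(points, labels, centroids))
    centroids = m_step(points, labels, centroids)
    history.append(sse(points, labels, centroids))

print(history)    # [113.0, 38.0, 14.0, 1.0]
print(centroids)  # [1.5, 9.5]
```

Neither step can increase the sum of squared distances, which is why the recorded values only go down and the procedure eventually settles.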

The k-means clustering technique tries to minimize the distance between the points in a cluster and that cluster's centroid.

K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to assign a point to a cluster. In K-Means, each cluster is associated with a centroid.

The main objective of the K-Means algorithm is to minimize the sum of squared distances between the points and their respective cluster centroids.
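
This objective can be written as a formula. The notation is an assumption introduced here for illustration: C_k is the set of points assigned to cluster k, mu_k is the centroid of that cluster, and x_i is a data point.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
```

For fixed assignments, the mean of a cluster's points is the choice of mu_k that makes the inner sum smallest, which is why the centroids are recomputed by averaging.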

Let’s now take an example to understand how K-Means actually works:

[Image: worked example, not available in this text version]

Output images:

[Image: output of the example, not available in this text version]

Thank You.

Chinthala Yeshwanth Reddy.
