How to transform a supervised problem into an unsupervised one?

Supervised learning uses labeled input and output data, while unsupervised learning works with unlabeled input data. Supervised problems are generally easier to tackle than unsupervised ones; the example below illustrates the relationship between the two by solving the same problem both ways.
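As a minimal sketch of that difference (the choice of logistic regression here is my own illustration, not part of the article): a supervised model is fit on both the inputs and the labels, while an unsupervised model sees only the inputs.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_digits(return_X_y=True)

# supervised: the model is fit on both the images X and the labels y
clf = LogisticRegression(max_iter=5000).fit(X, y)

# unsupervised: the model is fit on the images alone; no labels are passed
km = KMeans(n_clusters=10, n_init=10, random_state=42).fit(X)
```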

Here we start with scikit-learn's digits dataset, which contains 8x8 pixel images of handwritten digits; the goal is to predict the target, i.e. the digit, by looking at the image.

import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

_, axes = plt.subplots(nrows=1, ncols=10, figsize=(15, 6))

for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest")
[Image: the first ten digit images plotted in grayscale]

Now let's switch to MNIST (28x28 pixel images), reduce the dimensionality to two with t-distributed Stochastic Neighbour Embedding (t-SNE), and plot the first 4,000 digits as a scatter plot.


import numpy as np
import pandas as pd
from sklearn import datasets, manifold

data = datasets.fetch_openml('mnist_784', version=1, return_X_y=True)
pixel_values, targets = data
targets = targets.astype('int')

tsne = manifold.TSNE(init='random', learning_rate='auto', n_components=2, random_state=42)
transformed_data = tsne.fit_transform(pixel_values.loc[:4000, :].values)

tsne_df = pd.DataFrame(
    np.column_stack((transformed_data, targets.loc[:4000].values)),
    columns=["x", "y", "targets"]
)
tsne_df.loc[:, "targets"] = tsne_df.targets.astype(int)

tsne_df.head(10)
[Image: the first ten rows of tsne_df, with columns x, y, and targets]


import seaborn as sns

grid = sns.FacetGrid(tsne_df, hue="targets", height=8)
grid.map(plt.scatter, "x", "y").add_legend()

The following is the plot of the supervised data, coloured by its labels.

[Image: t-SNE scatter plot with each digit class shown in a different colour]

grid = sns.FacetGrid(tsne_df, height=8)
grid.map(plt.scatter, "x", "y")

The following is the same plot without labels, which is all an unsupervised algorithm gets to see.

[Image: the same t-SNE scatter plot with all points in a single colour]

Some clusters are clearly visible in the data. Let's apply an unsupervised clustering algorithm and see what clusters it creates.



from sklearn.cluster import KMeans

df = pd.DataFrame(transformed_data, columns=["x", "y"])

kmeans = KMeans(n_clusters=10)
label = kmeans.fit_predict(df)

df_with_target = pd.DataFrame(
    np.column_stack((transformed_data, label)),
    columns=["x", "y", "targets"]
)
df_with_target.loc[:, "targets"] = df_with_target.targets.astype(int)

grid = sns.FacetGrid(df_with_target, hue="targets", height=8)
grid.map(plt.scatter, "x", "y").add_legend()


[Image: t-SNE scatter plot coloured by the k-means cluster assignments]

There is one drawback: the cluster ids assigned by k-means are arbitrary and don't correspond to the actual digit labels, so the labels need to be remapped after inspecting some of the data in each cluster.
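One common way to do that remapping, sketched below on toy arrays (the helper function and its name are my own illustration, not from the article): assign each cluster id the true label that occurs most often among its members, i.e. a majority vote per cluster.

```python
import numpy as np

def map_cluster_labels(cluster_labels, true_labels):
    """Map each arbitrary cluster id to the most common true label inside it."""
    mapping = {}
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        # majority vote: pick the digit that appears most often in this cluster
        values, counts = np.unique(members, return_counts=True)
        mapping[c] = values[np.argmax(counts)]
    return np.array([mapping[c] for c in cluster_labels])

# toy example: cluster 0 mostly contains digit 7, cluster 1 mostly digit 3
clusters = np.array([0, 0, 0, 1, 1, 1])
truth = np.array([7, 7, 1, 3, 3, 3])
print(map_cluster_labels(clusters, truth))  # [7 7 7 3 3 3]
```

In practice you would pass the k-means labels and a sample of hand-checked digits instead of the toy arrays.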

The k-means clustering algorithm is able to recover the clusters, though its accuracy may not match that of supervised algorithms, since supervised models are trained directly on the labeled data. Still, we have solved a supervised learning problem with an unsupervised algorithm.
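Because the cluster ids are arbitrary, a label-agnostic metric is handy for judging how well the clustering matches the true digits. One option (my suggestion, not mentioned in the article) is the adjusted Rand index from scikit-learn, which compares two groupings regardless of how the groups are numbered:

```python
from sklearn.metrics import adjusted_rand_score

# ARI is 1.0 for a perfect clustering, near 0.0 for a random one,
# and it ignores the actual ids used, only the grouping matters.
true_labels = [0, 0, 1, 1, 2, 2]
cluster_labels = [5, 5, 3, 3, 9, 9]  # same grouping, different ids
print(adjusted_rand_score(true_labels, cluster_labels))  # 1.0
```

On the MNIST example above you would call it as `adjusted_rand_score(targets.loc[:4000], label)`.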
