Implementation of ML project using GIT(I)

Hello #connections, this is my 2nd day of learning Machine Learning with Kushal Sharma!

Today's session covered ML preprocessing, a clustering algorithm (K-Means) and version control, and it ended with Git and GitHub.

Let's start with the K-Means algorithm.

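K-Means is a centroid-based clustering algorithm: it calculates the distance between each data point and every centroid, assigns each point to the cluster of its nearest centroid, then moves each centroid to the mean of the points assigned to it, repeating until the assignments stop changing. The goal is to partition the dataset into K groups, where K is chosen in advance.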
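The goal of "K groups" can be written down as an objective. In the standard formulation (the symbols below are chosen here for illustration), K-Means looks for clusters C_1, ..., C_K with centroids mu_j that minimise the total squared distance from every point to the centroid of its own cluster; each centroid is the mean of its points, and each point goes to its nearest centroid:

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i,
\qquad
c_i = \arg\min_{j} \lVert x_i - \mu_j \rVert
```

Alternating the assignment step (c_i) and the update step (mu_j) never increases J, which is why the algorithm settles down, although it can settle in a local rather than global minimum.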

How KMeans works

[Figure: How K-Means works]
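To make the working concrete, here is a minimal from-scratch sketch of Lloyd's algorithm, the usual way K-Means is computed: assign each point to its nearest centroid, move each centroid to the mean of its points, and stop when nothing moves. It is an illustration, not scikit-learn's implementation; the function name `kmeans`, the toy points, and the deterministic farthest-first initialisation (used instead of scikit-learn's random k-means++ so the result is repeatable) are all assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Hypothetical minimal K-Means (Lloyd's algorithm) for illustration."""
    X = np.asarray(X, dtype=float)
    # Initialise: start from the first point, then repeatedly add the point
    # farthest from all centroids chosen so far (farthest-first traversal)
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Converged: centroids stopped moving, so the labels are final
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two obvious groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels)  # [0 0 0 1 1 1]
```

Here the first three points end up around a centroid near (1.03, 0.97) and the last three around (8, 8).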


[Figure: Example of the K-Means algorithm]

ML Preprocessing

Understanding the data is the foundation of any successful Machine Learning project, and how well the data is preprocessed and cleaned can make or break a model's performance. I'm now equipped with techniques for handling missing data, feature scaling and more, so that our models are fed with the best-quality data possible!
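As a small illustration of two of those steps, the sketch below fills missing values and then scales the features with scikit-learn. The toy DataFrame and the choice of median imputation with standard scaling are assumptions for this example; other strategies (mean imputation, min-max scaling) are equally common.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one missing value in each column
df = pd.DataFrame({
    "Spend": [1000.0, 200.0, np.nan, 2000.0],
    "Age":   [41.0, np.nan, 19.0, 18.0],
})

# 1. Missing data: replace each NaN with the median of its column
imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(df)

# 2. Feature scaling: give every column mean 0 and standard deviation 1,
#    so a large-valued column like Spend doesn't dominate distance-based models
scaler = StandardScaler()
scaled = scaler.fit_transform(filled)

print(pd.DataFrame(scaled, columns=df.columns).round(2))
```

The missing Spend becomes 1000 (median of 200, 1000, 2000) and the missing Age becomes 19 (median of 18, 19, 41) before scaling.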

Let's take an example:

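# Data handling: load the CSV (it has no header row) and name the columns
import pandas as pd
data = pd.read_csv('/content/ML.CSV', header=None)
data.columns = ['Id', 'Spend', 'Age']
data   # in a notebook, this line displays the DataFrame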

# Dataset (output of `data`)

   Id  Spend  Age
0   1   1000   41
1   2    200   20
2   3    150   19
3   4   2000   18
4   5    500    5
5   6    600   33


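# Model workflow: Create -> Train -> Predict
from sklearn.cluster import KMeans

# Create: ask for 4 clusters
model = KMeans(n_clusters=4)

# Train: fit the centroids (every column, including Id, is used as a feature here)
model.fit(data)

# Predict: the cluster label assigned to each row
y_pred = model.predict(data)
print(y_pred)

# Output (cluster numbers are arbitrary labels and can change between runs)
[0 3 3 2 1 1]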
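The code above feeds every column into K-Means, including Id (which only labels rows), and Spend's range of hundreds to thousands outweighs Age's range of tens in the distance calculation. A possible variant, sketched under those assumptions, drops Id, scales the features first, and fixes the random seed so the labels repeat between runs. The six rows are typed in from the table so the sketch runs without ML.CSV; `random_state=0` and `n_init=10` are choices made for this example. Because Age now carries as much weight as Spend, the grouping can differ from the unscaled run.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The same six rows as the dataset table
data = pd.DataFrame({
    "Id":    [1, 2, 3, 4, 5, 6],
    "Spend": [1000, 200, 150, 2000, 500, 600],
    "Age":   [41, 20, 19, 18, 5, 33],
})

# Id is only a row identifier, so it is left out of the features
X = data[["Spend", "Age"]]

# Scale first, then cluster; the pipeline applies both steps in order
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=4, n_init=10, random_state=0),  # fixed seed: repeatable labels
)
labels = model.fit_predict(X)
print(labels)
```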
        

Version Control

Version control tracks and manages changes to a collection of files over time. Because every change is recorded, you can recall, revert, compare, reference, and restore earlier versions whenever you need to. Version control is also known as source control or revision control.
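"Compare" is the operation you use most often. As a rough analogue of what a tool like `git diff` shows, Python's standard `difflib` can produce a unified diff between two versions of a file. The file name `train.py` and its two versions are hypothetical, and this is not how Git computes diffs internally; it only shows the idea of recording and comparing changes.

```python
import difflib

# Two hypothetical versions of a training script
v1 = ["model = KMeans(n_clusters=4)\n", "model.fit(data)\n"]
v2 = ["model = KMeans(n_clusters=3)\n", "model.fit(data)\n"]

# Unified diff: lines starting with '-' were removed, '+' were added
diff = list(difflib.unified_diff(v1, v2, fromfile="train.py@v1", tofile="train.py@v2"))
print("".join(diff))
```

Reading the output, the `n_clusters=4` line is marked with `-`, its replacement with `+`, and the unchanged `model.fit(data)` line is kept as context.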

There are two main types of version control:

1) Centralized Version Control Systems

A Centralized Version Control System (CVCS) keeps all the files and their history on a single central server; developers check out the repository from that server and commit their changes back to it.

These systems keep the full codebase in one place, so it is easy to control, and everyone is aware of the changes that happen. However, work slows down when the connection to the central server is poor, and keeping all the backups in one place is risky.

For ML work, this approach suits a model that is mature and already part of a product: different teams are working on many features and changes, developers don't need all the code on their own computers, and you want to reduce the complexity of merging changes.

2) Distributed Version Control Systems

A Distributed Version Control System (DVCS) gives each developer the full codebase, including its history, on their own computer. This lets developers create and merge branches locally, without being connected to a remote server or any network at all. Git is an example of such a system.

Using this approach for model development lets you work privately on your own machine without being online, and you don't rely on a single server for backups. This is useful for training purposes and while the project is still small. However, as the repository grows, every developer's computer needs a lot of storage to hold the full history and all the branch data, which is where a centralized approach can be the better fit.
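The key property of a distributed system, that a clone carries the whole history and can record commits offline, can be shown with a deliberately tiny toy model. `ToyRepo` and its methods are hypothetical and say nothing about how Git actually stores data; the point is only that `clone()` copies every commit, not just the latest files.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class ToyRepo:
    """Hypothetical toy repository: an ordered list of commits (not Git's format)."""
    commits: list = field(default_factory=list)

    def commit(self, message, snapshot):
        # Recorded locally: no server or network is involved
        self.commits.append({"message": message, "snapshot": dict(snapshot)})

    def clone(self):
        # A distributed clone copies the entire history, not only the latest files
        return ToyRepo(copy.deepcopy(self.commits))

origin = ToyRepo()
origin.commit("add data loader", {"train.py": "load()"})
origin.commit("add KMeans", {"train.py": "load(); fit()"})

laptop = origin.clone()  # the laptop now holds both earlier commits
laptop.commit("try 3 clusters", {"train.py": "load(); fit(k=3)"})  # works offline

print(len(origin.commits), len(laptop.commits))  # 2 3
```

The same sketch also hints at the storage trade-off above: every clone grows with the full history.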

Git: Git is a distributed version control system that tracks changes in any set of computer files. It is usually used to coordinate work among programmers who develop source code together. Its goals include speed, data integrity, and support for distributed, non-linear workflows.
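One way Git achieves data integrity is content addressing: every stored object is named by a hash of its contents, so any corruption changes the name and is detected. For file contents (a "blob") in a repository using the default SHA-1 object format, the ID is the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the raw bytes. The helper name `git_blob_id` is hypothetical; the value it prints should match what `git hash-object` reports for the same content.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Hypothetical helper: the object ID Git gives these file contents (SHA-1 format)."""
    # Header "blob <size in bytes>\0", then the contents themselves
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Changing even one byte of the content produces a completely different ID, which is what lets Git notice tampering or disk corruption.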

GitHub: GitHub, Inc. is a platform and cloud-based service for software development and version control using Git, allowing developers to store and manage their code.

I am thankful to our HOD, Dr. Meenakshi Thalor, for arranging this course, and to Kushal Sharma, who is guiding us through this incredible experience. Your dedication and passion for teaching are making a significant impact on all of us.


More articles by Rohit Utekar

  • Comparing MLOPs Libraries

    In this journey of learning Machine Learning, we will now look into the libraries which can be used in MLOps for…

  • Implementation of ML project using GIT(II)

    🚀 Excited to share my Day 3 experience at the ongoing Value-Added Course for Machine Learning with Kushal Sharma sir !…

  • Introduction to ML,DevOps,MLOps.

    Hello #connections Rohit Utekar here! This was my first day of learning ML,DevOps and MLOps with Kushal Sharma First…
