The 7 Steps of Machine Learning

Parth Mehta

Published Aug 23, 2019

+ Follow

1 - Data Collection

The quantity & quality of your data dictate how accurate our model is
The outcome of this step is generally a representation of data (Guo simplifies to specifying a table) which we will use for training
Using pre-collected data, by way of datasets from Kaggle, UCI, etc., still fits into this step

2 - Data Preparation

Wrangle data and prepare it for training
Clean that which may require it (remove duplicates, correct errors, deal with missing values, normalization, data type conversions, etc.)
Randomize data, which erases the effects of the particular order in which we collected and/or otherwise prepared our data
Visualize data to help detect relevant relationships between variables or class imbalances (bias alert!), or perform other exploratory analysis
Split into training and evaluation sets

3 - Choose a Model

Different algorithms are for different tasks; choose the right one

4 - Train the Model

The goal of training is to answer a question or make a prediction correctly as often as possible
Linear regression example: algorithm would need to learn values for m (or W) and b (x is input, yis output)
Each iteration of process is a training step

5 - Evaluate the Model

Uses some metric or combination of metrics to "measure" objective performance of model
Test the model against previously unseen data
This unseen data is meant to be somewhat representative of model performance in the real world, but still helps tune the model (as opposed to test data, which does not)
Good train/eval split? 80/20, 70/30, or similar, depending on domain, data availability, dataset particulars, etc.

6 - Parameter Tuning

This step refers to hyperparameter tuning, which is an "artform" as opposed to a science
Tune model parameters for improved performance
Simple model hyperparameters may include: number of training steps, learning rate, initialization values and distribution, etc.

7 - Make Predictions

Using further (test set) data which have, until this point, been withheld from the model (and for which class labels are known), are used to test the model; a better approximation of how the model will perform in the real world

In short details mention below

Data collection

→ Defining the problem and assembling a dataset (1)

Data preparation

→ Preparing your data (4)

Choose model

Train model

→ Developing a model that does better than a baseline (5)

Evaluate model

→ Choosing a measure of success (2)

→ Deciding on an evaluation protocol (3)

Parameter tuning

→ Scaling up: developing a model that overfits (6)

→ Regularizing your model and tuning your parameters (7)

Predict

To view or add a comment, sign in

The 7 Steps of Machine Learning

Parth Mehta

Others also viewed

Understanding the Essentials of Machine Learning: A Deep Dive into Module 2 of Introduction to Data Mining by Tan ,Steinbach - Machine Learning Book

How to create a train and test dataset

Training Data is Crucial for Building an Accurate Model

A side note on imbalanced datasets and how to handle them for predictive modelling.

Sampling Sorcery: The Magic of Data Selection

Feature Engineering: A Complete Guide to Transforming Raw Data

Predictions with Uncertainties

Data Cleansing and Transformation in Machine Learning

How to Optimize Machine Learning Performance

Tips for Machine Learning Success

Tips for Creating a Machine Learning Experimentation Environment

How Quantization is Transforming Model Performance

How To Fine-Tune AI Models On Small Datasets

Explore content categories