The 7 Steps of Machine Learning

The 7 Steps of Machine Learning

1 - Data Collection

  • The quantity & quality of your data dictate how accurate our model is
  • The outcome of this step is generally a representation of data (Guo simplifies to specifying a table) which we will use for training
  • Using pre-collected data, by way of datasets from Kaggle, UCI, etc., still fits into this step

 

2 - Data Preparation

  • Wrangle data and prepare it for training
  • Clean that which may require it (remove duplicates, correct errors, deal with missing values, normalization, data type conversions, etc.)
  • Randomize data, which erases the effects of the particular order in which we collected and/or otherwise prepared our data
  • Visualize data to help detect relevant relationships between variables or class imbalances (bias alert!), or perform other exploratory analysis
  • Split into training and evaluation sets

 

3 - Choose a Model

  • Different algorithms are for different tasks; choose the right one

 

4 - Train the Model

  • The goal of training is to answer a question or make a prediction correctly as often as possible
  • Linear regression example: algorithm would need to learn values for m (or W) and b (x is input, yis output)
  • Each iteration of process is a training step

 

5 - Evaluate the Model

  • Uses some metric or combination of metrics to "measure" objective performance of model
  • Test the model against previously unseen data
  • This unseen data is meant to be somewhat representative of model performance in the real world, but still helps tune the model (as opposed to test data, which does not)
  • Good train/eval split? 80/20, 70/30, or similar, depending on domain, data availability, dataset particulars, etc.

 

6 - Parameter Tuning

  • This step refers to hyperparameter tuning, which is an "artform" as opposed to a science
  • Tune model parameters for improved performance
  • Simple model hyperparameters may include: number of training steps, learning rate, initialization values and distribution, etc.

 

7 - Make Predictions

  • Using further (test set) data which have, until this point, been withheld from the model (and for which class labels are known), are used to test the model; a better approximation of how the model will perform in the real world



In short details mention below


Data collection

→ Defining the problem and assembling a dataset (1)

Data preparation

→ Preparing your data (4)

Choose model

Train model

→ Developing a model that does better than a baseline (5)

Evaluate model

→ Choosing a measure of success (2)

→ Deciding on an evaluation protocol (3)

Parameter tuning

→ Scaling up: developing a model that overfits (6)

→ Regularizing your model and tuning your parameters (7)

Predict

To view or add a comment, sign in

Others also viewed

Explore content categories