Machine Learning

Steps for Machine Learning Prediction

1. Understand the problem:

Before getting the data, we need to understand the problem we are trying to solve.

2. Hypothesis Generation:

Hypothesis generation means listing the factors (candidate features) that could plausibly influence the target variable. We do this before looking at the data so that our thinking is not biased by whatever happens to be available. If these hypotheses are later tested statistically, a 95% confidence level is a common convention rather than a fixed rule. This step often helps in creating new features.

3. Get Data:

Now, collect the data and look at it. Determine which features are available and which aren't, how many of the features we proposed during hypothesis generation are actually present, and which ones could still be created. Answering these questions will set us on the right track.

4. Data Exploration:

We can't determine everything by just looking at the data. We need to dig deeper. This step helps us understand the nature of the variables (skewed distributions, missing values, zero-variance features) so that they can be treated properly. It involves creating charts and graphs (univariate and bivariate analysis) and cross-tables to understand the behavior of features.
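As a rough illustration of these checks, the sketch below summarizes a single numeric column using only the Python standard library. The `summarize` helper and the sample columns are hypothetical; in a real project a data-frame library would typically do this work.

```python
import statistics

def summarize(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    std = statistics.pstdev(present)
    # Population skewness: the mean cubed z-score (0.0 for a constant column)
    if std == 0:
        skew = 0.0
    else:
        skew = sum(((v - mean) / std) ** 3 for v in present) / len(present)
    return {
        "missing": len(values) - len(present),
        "zero_variance": std == 0,
        "skew": skew,
    }

# Hypothetical columns: one with an outlier and a gap, one constant
income = [20, 22, 25, None, 21, 150]
constant = [1, 1, 1, 1]

income_summary = summarize(income)
constant_summary = summarize(constant)
```

A clearly positive `skew` hints at a long right tail, and `zero_variance` flags a column that carries no information for the model.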

5. Data Preprocessing:

Here, we impute missing values, clean string variables (remove stray spaces and irregular tabs, standardize date-time formats), and remove anything else that shouldn't be there. This step is usually carried out alongside the data exploration stage.
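The cleaning steps above might look like the following minimal sketch. Median imputation is only one possible strategy, and the helper names, sample values, and accepted date formats are assumptions made for illustration.

```python
import statistics
from datetime import datetime

def impute_median(values):
    """Replace missing values (None) with the median of the present ones."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]

def clean_text(text):
    """Trim the ends and collapse runs of spaces or tabs into one space."""
    # split() with no arguments splits on any run of whitespace
    return " ".join(text.split())

def parse_date(text, formats=("%Y-%m-%d", "%d/%m/%Y")):
    """Try each assumed format in turn; return None if none matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

ages = impute_median([30, None, 40, 50])
city = clean_text("  New \t  Delhi ")
dates = [parse_date(d) for d in ["2021-03-05", "05/03/2021", "unknown"]]
```

Returning `None` for an unparseable date keeps the bad value visible, so it can be counted and handled deliberately instead of being silently guessed.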

6. Feature Engineering:

Now, we create and add new features to the data set. Most of the ideas for these features come during the hypothesis generation stage.
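For instance, new features can often be derived from columns that already exist. The sketch below assumes hypothetical order records with `order_date`, `price`, and `quantity` fields; none of these names come from a real data set.

```python
from datetime import date

def add_features(row):
    """Return a copy of the row with two derived features added."""
    enriched = dict(row)
    # Interaction feature: total amount spent on the order
    enriched["total"] = row["price"] * row["quantity"]
    # Calendar feature: weekday() returns 5 for Saturday and 6 for Sunday
    enriched["is_weekend"] = row["order_date"].weekday() >= 5
    return enriched

rows = [
    {"order_date": date(2021, 3, 6), "price": 2.5, "quantity": 4},
    {"order_date": date(2021, 3, 8), "price": 10.0, "quantity": 1},
]
features = [add_features(r) for r in rows]
```

Both derived columns encode a hypothesis (spending level and weekend behavior may matter) that the model can then confirm or reject.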

7. Model Training:

Using a suitable algorithm, we train the model on the given data set.
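To make "training" concrete, here is a minimal sketch that fits a straight line to toy data by ordinary least squares. The algorithm choice, the `fit_line` helper, and the data are illustrative assumptions; a real problem would normally call for a library implementation and an algorithm suited to the task.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy training data generated from y = 2x + 1
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(train_x, train_y)
```

The learned `slope` and `intercept` are the model's parameters; training any other algorithm follows the same pattern of estimating parameters from the training data.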

8. Model Evaluation:

Once the model is trained, we evaluate its performance using a suitable error metric. We also examine variable importance, i.e., which variables contribute most to predicting the target variable. Based on this, we can shortlist the best variables and train the model again.
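One way to sketch both ideas is with root mean squared error (RMSE) as the metric and a simplified permutation importance as the importance measure. Everything here is an assumption for illustration: the `predict` function stands in for an already trained model, and the column is reversed instead of randomly shuffled so the result is reproducible.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def predict(row):
    # Hypothetical trained model: depends on x1 and ignores x2
    return 2.0 * row["x1"] + 1.0

def permutation_importance(rows, targets, feature):
    """Increase in RMSE after breaking the link between a feature and the target."""
    baseline = rmse(targets, [predict(r) for r in rows])
    # Reversing the column is a deterministic stand-in for a random shuffle
    permuted_values = [r[feature] for r in reversed(rows)]
    permuted_rows = [dict(r, **{feature: v}) for r, v in zip(rows, permuted_values)]
    return rmse(targets, [predict(r) for r in permuted_rows]) - baseline

rows = [
    {"x1": 1.0, "x2": 5.0},
    {"x1": 2.0, "x2": 3.0},
    {"x1": 3.0, "x2": 8.0},
    {"x1": 4.0, "x2": 1.0},
]
targets = [3.0, 5.0, 7.0, 9.0]

baseline_rmse = rmse(targets, [predict(r) for r in rows])
importance = {f: permutation_importance(rows, targets, f) for f in ("x1", "x2")}
```

A feature whose permutation leaves the error unchanged (here `x2`) is a candidate for removal before retraining.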

9. Model Testing:

Finally, we test the model on unseen data (the test set).
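A test set is only "unseen" if it is set aside before training. The sketch below shows one simple way to hold out part of the data and score a model on it; the `holdout_split` helper, the 25% test fraction, the toy data, and the stand-in `predict` function are all assumptions for illustration.

```python
import random

def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def predict(x):
    # Stand-in for a model that was fitted on the training rows only
    return 2.0 * x + 1.0

# Toy data set of (x, y) pairs generated from y = 2x + 1
data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
train, test = holdout_split(data)

test_mae = mean_absolute_error(
    [y for _, y in test],
    [predict(x) for x, _ in test],
)
```

Because the test rows never influence training, the test error is a fairer estimate of how the model will behave on new data than the training error.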


More articles by Aashish Pandey
