Data Science Lifecycle

Bhushan Dhamankar

Published Feb 19, 2021

The Data Science Lifecycle revolves around using machine learning and other analytical methods to extract hidden insights and predictions from data in order to achieve a business objective or to build business logic. The entire process involves several steps, such as data cleaning, preparation, modeling, and model evaluation.

1. Data Gathering

Data Gathering is the process of collecting and measuring information on variables of interest in an established, systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. It involves collecting data from various sources such as databases, third-party APIs, web scraping, and many more. This step is mainly performed by Big Data Engineers.
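
As a rough illustration of combining sources, the sketch below merges rows from a database with records from an API-style JSON response. The `customers` table, its columns, the `results` key, and the sample values are all hypothetical, and an in-memory SQLite database stands in for a real one.

```python
import json
import sqlite3


def rows_from_database(conn):
    """Read customer rows from a SQL table into a list of dicts."""
    cursor = conn.execute("SELECT id, age, city FROM customers")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def rows_from_api_payload(payload):
    """Parse a JSON string shaped like a (hypothetical) API response."""
    return json.loads(payload)["results"]


def gather(conn, payload):
    """Merge both sources, keeping the first record seen for each id."""
    merged = {}
    for row in rows_from_database(conn) + rows_from_api_payload(payload):
        merged.setdefault(row["id"], row)
    return [merged[key] for key in sorted(merged)]


# Hypothetical sample data: an in-memory database and a canned API response.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 34, "Pune"), (2, 51, "Mumbai")],
)
payload = (
    '{"results": [{"id": 2, "age": 51, "city": "Mumbai"},'
    ' {"id": 3, "age": 27, "city": "Nagpur"}]}'
)

records = gather(conn, payload)
print([r["id"] for r in records])
```

The record with id 2 appears in both sources, so the merge keeps a single copy of it.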

2. Feature Engineering

Feature Engineering is one of my favorite stages of the Data Science pipeline. It is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. It can be considered applied machine learning in itself.
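
A minimal sketch of the idea, assuming a hypothetical raw order record with `placed_at`, `total`, and `items` fields: domain knowledge (evening and weekend orders may behave differently, and price per item matters more than the raw total) is turned into explicit features.

```python
from datetime import datetime


def engineer_features(order):
    """Derive model-ready features from one raw order record."""
    placed = datetime.strptime(order["placed_at"], "%Y-%m-%d %H:%M")
    return {
        "hour": placed.hour,                    # time-of-day signal
        "is_weekend": placed.weekday() >= 5,    # Saturday=5, Sunday=6
        "price_per_item": order["total"] / order["items"],
    }


# Hypothetical raw record; the field names are assumptions for illustration.
order = {"placed_at": "2021-02-20 18:30", "total": 1200.0, "items": 4}
features = engineer_features(order)
print(features)
```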

3. Feature Selection

Feature Selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model.
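
One simple way to do this is a filter method: score each input variable against the target and keep only the best few. The sketch below ranks features by absolute Pearson correlation. This is just one of many possible selection techniques, and the column names and values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_top_k(columns, target, k):
    """Keep the k features most strongly correlated with the target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical housing data: four candidate inputs, one target (price).
columns = {
    "area": [50, 60, 80, 100, 120],
    "rooms": [1, 2, 2, 3, 4],
    "constant": [1, 1, 1, 1, 1],
    "noise": [3, 1, 4, 1, 5],
}
target = [100, 120, 160, 200, 240]

selected = select_top_k(columns, target, k=2)
print(selected)
```

The constant column carries no information and the noise column is only weakly related to the target, so both are dropped.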

4. Model Training

The process of training a Machine Learning model involves providing an ML algorithm with a training dataset to learn from. The term ML model refers to the model artifact that is created by the training process. Model creation requires an analyzed data report and knowledge of Machine Learning and Deep Learning concepts.
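
To make the distinction between algorithm and model artifact concrete, here is a small sketch using one-variable least-squares regression as the algorithm. The training function consumes a dataset and returns the artifact, which here is just a dict of learned parameters. The data values are invented for illustration.

```python
def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    # The returned dict is the "model artifact" produced by training.
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Apply the trained artifact to a new input."""
    return model["slope"] * x + model["intercept"]


# Hypothetical training dataset following y = 2x + 1.
model = train_linear_model([1, 2, 3, 4], [3, 5, 7, 9])
print(model, predict(model, 10))
```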

5. Model Testing

Testing model performance means running the model on test datasets and comparing its performance in terms of metrics such as accuracy. If the model fails to show good accuracy, which may be due to too little data or improper Feature Engineering, the model is rejected. This sends us back to the Data Gathering or Feature Engineering stage, depending on the dataset.
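
The accept-or-reject decision can be sketched as follows. The 0.8 accuracy threshold, the 25% test fraction, and the toy predictors are assumptions chosen for illustration, not recommended values.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for x, label in rows if predict(x) == label)
    return correct / len(rows)


def evaluate(predict, test_rows, min_accuracy=0.8):
    """Return the score and whether the model clears the threshold."""
    score = accuracy(predict, test_rows)
    return score, ("accept" if score >= min_accuracy else "reject")


# Hypothetical labeled data: the label is True when x >= 5.
rows = [(x, x >= 5) for x in range(10)]
train_rows, test_rows = train_test_split(rows)

good_result = evaluate(lambda x: x >= 5, test_rows)   # matches the labels
bad_result = evaluate(lambda x: True, rows)           # always predicts True
print(good_result, bad_result)
```

A rejected model is the signal to return to an earlier stage rather than to deploy.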

6. Deployment

Deployment is the method by which Data Scientists integrate a machine learning model into an existing production environment to make practical business decisions based on data. It is one of the last stages in the machine learning life cycle, carried out after Model Testing is successful, and it can be one of the most cumbersome.
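
Real deployments vary widely, but the core hand-off can be sketched in a few lines: the trained artifact is saved to disk, loaded by the production code, and used to answer requests. The file name, the parameter values, and the `handle_request` function are hypothetical stand-ins for a real serving layer.

```python
import json
import os
import tempfile


def save_model(model, path):
    """Persist the trained parameters so another process can load them."""
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path):
    """Load previously saved parameters."""
    with open(path) as f:
        return json.load(f)


def handle_request(model, payload):
    """Stand-in for a production endpoint: validate input, return a prediction."""
    x = float(payload["x"])
    return {"prediction": model["slope"] * x + model["intercept"]}


# Hypothetical artifact from a finished training run (y = 2x + 1).
model = {"slope": 2.0, "intercept": 1.0}
path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(model, path)

served_model = load_model(path)
response = handle_request(served_model, {"x": 3})
print(response)
```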

