Data Science Lifecycle
by Bhushan Dhamankar


The Data Science Lifecycle revolves around using machine learning and other analytical methods to uncover hidden insights and produce predictions from data, in order to achieve a business objective or to build business logic. The process involves several steps, such as data cleaning, preparation, modeling, and model evaluation.

   1. Data Gathering

Data Gathering is the process of collecting and measuring information on variables of interest in an established, systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. It involves collecting data from various sources such as databases, third-party APIs, web scraping, and many more. This step is usually handled by Big Data Engineers.
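
As a rough illustration of pulling data from more than one source, the sketch below merges rows from a database with rows from a CSV export. The `customers` table, its columns, and the sample values are hypothetical and exist only for this example; an in-memory SQLite database stands in for a real production source.

```python
import csv
import io
import sqlite3


def gather_from_database(conn):
    """Read rows from the (hypothetical) 'customers' table as dictionaries."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT id, age, city FROM customers ORDER BY id"
    ).fetchall()
    return [dict(row) for row in rows]


def gather_from_csv(text):
    """Parse CSV text into dictionaries with the same keys and types."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(r["id"]), "age": int(r["age"]), "city": r["city"]}
        for r in reader
    ]


# In-memory database standing in for a real data source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 34, "Pune"), (2, 27, "Mumbai")],
)

# CSV text standing in for a file export or an API response (assumption).
csv_export = "id,age,city\n3,45,Delhi\n"

# Combine both sources into one list of records with a shared schema.
records = gather_from_database(conn) + gather_from_csv(csv_export)
print(records)
```

Normalizing every source to the same keys and types at this stage keeps the later stages independent of where each record came from.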

   2. Feature Engineering 

Feature Engineering is one of my favorite stages of the Data Science pipeline. It is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. It can be considered applied machine learning in its own right.
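
To make this concrete, here is a minimal sketch that turns one raw transaction record into model-friendly features. The field names (`timestamp`, `amount`) and the chosen features are assumptions made for illustration; which features are useful depends on the domain.

```python
import math
from datetime import datetime


def engineer_features(record):
    """Derive simple features from a raw record (hypothetical schema)."""
    ts = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M")
    return {
        # Hour of day can capture daily usage patterns.
        "hour": ts.hour,
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "is_weekend": ts.weekday() >= 5,
        # log1p compresses large, skewed amounts into a smaller range.
        "log_amount": math.log1p(record["amount"]),
    }


raw = {"timestamp": "2023-07-15 21:30", "amount": 1200.0}
features = engineer_features(raw)
print(features)
```

The raw timestamp string carries little signal on its own, while the derived hour and weekend flag are values a learning algorithm can use directly.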

   3. Feature Selection 

Feature Selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model.  
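
One simple way to reduce the number of input variables is to drop features that barely vary, since a near-constant column cannot help distinguish one example from another. The sketch below applies such a variance filter; the column names, values, and the threshold are illustrative assumptions, and this is only one of many selection techniques.

```python
from statistics import pvariance


def select_by_variance(columns, threshold=0.01):
    """Keep the names of columns whose population variance exceeds threshold."""
    return [
        name
        for name, values in columns.items()
        if pvariance(values) > threshold
    ]


# Hypothetical feature columns, one list of values per feature.
columns = {
    "age": [34, 27, 45, 52],
    "country_code": [1, 1, 1, 1],  # constant, so it carries no information
    "income": [40.0, 32.5, 61.0, 75.5],
}

selected = select_by_variance(columns)
print(selected)
```

Here the constant `country_code` column is dropped, leaving fewer inputs for the model to process.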

   4. Model Training 

Training a Machine Learning model involves providing an ML algorithm with a training dataset to learn from. The term ML model refers to the model artifact that is created by the training process. Model creation requires an analyzed data report and knowledge of Machine Learning and Deep Learning concepts.
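
The idea of a model artifact can be sketched with a very small example: fitting a straight line by ordinary least squares, where the artifact is just the learned slope and intercept. The data and the function name are made up for illustration; real projects would normally rely on an ML library rather than hand-written fitting code.

```python
def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The returned dictionary is the "model artifact" of this toy example.
    return {"slope": slope, "intercept": intercept}


# Hypothetical training data that follows y = 2x + 1 exactly.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = train_linear_model(train_x, train_y)
print(model)
```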

   5. Model Testing 

Model Testing is about running the model on test datasets and comparing its performance in terms of metrics such as accuracy. If the model fails to show good accuracy, which may be due to insufficient data or improper Feature Engineering, the model is rejected. This brings us back to the Data Gathering or Feature Engineering stage, depending on the dataset.
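
The accept-or-reject decision described above can be sketched as follows. The classifier, the test data, and the 0.8 acceptance threshold are all assumptions chosen for illustration; the right metric and threshold depend on the business objective.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def evaluate(predict, test_features, test_labels, min_accuracy=0.8):
    """Score a model on held-out data and decide whether to accept it."""
    predictions = [predict(x) for x in test_features]
    score = accuracy(test_labels, predictions)
    return score, score >= min_accuracy


def threshold_classifier(x):
    """Stand-in for a trained model: predicts 1 when x is at least 0.5."""
    return 1 if x >= 0.5 else 0


# Hypothetical held-out test set, never used during training.
test_features = [0.1, 0.4, 0.6, 0.9, 0.45]
test_labels = [0, 0, 1, 1, 1]

score, accepted = evaluate(threshold_classifier, test_features, test_labels)
print(score, accepted)
```

If `accepted` came back `False`, the workflow would return to an earlier stage instead of moving on to deployment.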

   6. Deployment 

Deployment is the method by which Data Scientists integrate a machine learning model into an existing production environment to make practical business decisions based on data. It is one of the last stages in the machine learning lifecycle, carried out once Model Testing is successful, and it can be one of the most cumbersome.
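
At its simplest, deployment means saving the trained model in a form that another program can load and use to answer requests. The sketch below serializes a toy linear model to JSON and loads it back for prediction; the parameter values and function names are hypothetical, and a real deployment would add concerns such as an API layer, monitoring, and versioning.

```python
import json


def save_model(model):
    """Serialize the model parameters to a JSON string."""
    return json.dumps(model)


def load_model(payload):
    """Restore the model parameters from a JSON string."""
    return json.loads(payload)


def predict(model, x):
    """Apply the loaded linear model to a single input value."""
    return model["slope"] * x + model["intercept"]


# Hypothetical parameters produced by an earlier training step.
artifact = save_model({"slope": 2.0, "intercept": 1.0})

# In production, a separate serving process would load the stored artifact.
served_model = load_model(artifact)
prediction = predict(served_model, 5.0)
print(prediction)
```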

