Data Science Lifecycle
The data science lifecycle revolves around using machine learning and other analytical methods to uncover hidden insights and predictions from data in order to achieve a business objective or to build business logic. The entire process involves several steps, such as data gathering, cleaning, preparation, modeling, and model evaluation.
1. Data Gathering
Data gathering is the process of collecting and measuring information on variables of interest in an established, systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. It involves gathering data from various sources such as databases, third-party APIs, web scraping, and many more. This step is typically performed by big data engineers.
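As a minimal sketch of this stage, the snippet below loads tabular data with pandas. The in-memory CSV string stands in for what would normally be a file, database export, or API response; the column names are illustrative, not from any real dataset.

```python
import io
import pandas as pd

# Stand-in for a CSV file or database export gathered from a source system.
raw_csv = """age,income,churned
34,52000,0
45,61000,1
29,48000,0
51,75000,1
"""

df = pd.read_csv(io.StringIO(raw_csv))
print(df.shape)  # → (4, 3)
```

In practice the same `read_csv` call would point at a file path or URL, and API sources would be pulled with an HTTP client before being assembled into a DataFrame.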
2. Feature Engineering
Feature engineering is one of my favorite stages of the data science pipeline. It is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms, and the activity can be considered applied machine learning in itself.
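A small example of the idea, assuming pandas: from two raw timestamp columns (hypothetical names here), we derive a feature a model can actually use, the number of days of account activity.

```python
import io
import pandas as pd

raw = io.StringIO(
    "signup_date,last_login\n"
    "2023-01-05,2023-03-01\n"
    "2023-02-10,2023-02-15\n"
)
df = pd.read_csv(raw, parse_dates=["signup_date", "last_login"])

# Engineer a numeric feature from the raw timestamps:
# tenure in days between signup and last login.
df["tenure_days"] = (df["last_login"] - df["signup_date"]).dt.days
print(df["tenure_days"].tolist())  # → [55, 5]
```

Raw dates mean little to most algorithms, but a duration like this encodes the domain knowledge that engagement length matters.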
3. Feature Selection
Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables both to lower the computational cost of modeling and, in some cases, to improve the performance of the model.
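One simple selection strategy, sketched here with scikit-learn's `VarianceThreshold`, drops features that carry no information. The toy matrix is made up for illustration: its middle column is constant, so it is removed.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Three input features; the middle column is constant and uninformative.
X = np.array([[1.0, 7.0, 0.1],
              [2.0, 7.0, 0.4],
              [3.0, 7.0, 0.9]])

selector = VarianceThreshold(threshold=0.0)  # drop zero-variance features
X_reduced = selector.fit_transform(X)
print(X_reduced.shape)  # → (3, 2)
```

More sophisticated alternatives (univariate tests, recursive feature elimination, model-based importances) follow the same fit/transform pattern.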
4. Model Training
Training a machine learning model involves providing an ML algorithm with a training dataset to learn from. The term "ML model" refers to the model artifact created by the training process. Model creation draws on the analyzed data report together with machine learning and deep learning concepts.
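The training step can be sketched in a few lines with scikit-learn; the tiny dataset below is fabricated purely so the `fit` call has something to learn from.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training dataset: a single feature with a clear class boundary.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Training produces the model artifact: learned coefficients and intercept.
model = LogisticRegression()
model.fit(X, y)
train_accuracy = model.score(X, y)
```

Whatever the algorithm, the pattern is the same: the estimator consumes the training data and emits a fitted artifact that can later be evaluated and deployed.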
5. Model Testing
Testing model performance means evaluating the model on test datasets and comparing performance in terms of metrics such as accuracy. If the model fails to show good accuracy, which may be due to insufficient data or improper feature engineering, it is rejected, and we return to the data gathering or feature engineering stage, depending on the dataset.
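A common way to run this step, shown below with scikit-learn on synthetic data: hold out a test split the model never saw during training, then measure accuracy on it. The data and the 0.8 acceptance bar are illustrative choices, not fixed rules.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic separable data: label is 1 when the feature exceeds 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = (X[:, 0] > 0.5).astype(int)

# Hold out 25% of the data as an unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))

# An illustrative acceptance check: reject the model if accuracy is too low.
acceptable = acc > 0.8
```

If `acceptable` is false, the lifecycle loops back to data gathering or feature engineering rather than proceeding to deployment.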
6. Deployment
Deployment is the method by which data scientists integrate a machine learning model into an existing production environment to make practical business decisions based on data. It is one of the last stages of the machine learning lifecycle, performed after model testing succeeds, and it can be one of the most cumbersome.
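One common precursor to deployment, sketched here with the standard library's `pickle`: serialize the tested model so a separate serving process (an API server, a batch job) can load the artifact and make predictions without retraining. This is only one of several approaches; formats like ONNX or joblib files are frequent alternatives.

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

# A trained model, standing in for the artifact that passed testing.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Serialize the artifact; in production this blob would be written to disk
# or an artifact store, then loaded by the serving environment.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# The restored model must behave identically to the original.
match = bool((restored.predict(X) == model.predict(X)).all())
```

The serving side then wraps `restored.predict` behind whatever interface the production environment expects, such as an HTTP endpoint or a scheduled scoring job.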