From the course: Python: Working with Predictive Analytics
Divide the data into test and train - Python Tutorial
Divide the data into test and train
- [Instructor] We are still in the data preparation step of the predictive analytics roadmap. At this stage, we need to divide the data into train and test datasets. The train dataset contains known outputs and is used to train the prediction model. The test dataset, however, is used to evaluate how well the model performs on unseen new data. Imagine our data now as separate wooden blocks where each column is an individual data frame. Stacking them together gives us the final data frame. Sometimes we might reduce dimensions to make processing faster, but we won't cover that here. Well, why do we need to split the data? The train dataset is used to train the model, while the test dataset lets us check whether the model generalizes well to unseen data. In other words, the model doesn't just memorize the training data. Think of it like a fast food line. You've seen kids ordering from the kids menu many, many times before. Consider that as training data. When you see a new kid in the line, that represents unseen new…
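The split described in the transcript can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the course's own code: the helper name `split_train_test`, the 20% test fraction, the seed, and the toy data are all assumptions made for the example.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly, then hold out a fraction as the test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]   # held back; never shown to the model in training
    train = shuffled[n_test:]  # rows with known outputs used to fit the model
    return train, test

# Hypothetical toy data: (feature, known_output) pairs.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = split_train_test(data, test_fraction=0.2, seed=42)
print(len(train), len(test))  # 8 2
```

Shuffling before slicing keeps any ordering in the original rows from leaking into the split, and fixing the seed makes the split repeatable. In practice, scikit-learn's `train_test_split` function provides the same behavior through its `test_size` and `random_state` parameters.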
Contents
- Differentiate data types (5m 46s)
- Python libraries and data import (7m 31s)
- Handling missing values (12m 36s)
- Solution: Handling missing values (2m 32s)
- Convert categorical data into numbers (12m 59s)
- Divide the data into test and train (8m 32s)
- Feature scaling (11m 35s)
- Solution: Feature scaling (2m 44s)