From the course: Introduction to Python: Learn How to Program Today with Python by Pearson

Clean some data

Okay, now let's look a bit more at this data and make sure it's clean and ready to be analyzed. There are a few things to check. First, is there missing data? If so, should you give it a default value, remove the row from the data set, or just leave it out of the analysis for that column? Second, is there data that was input wrong or in a different format? If so, normalize it so the values are all consistent. Third, are some values unrealistic, maybe because they were entered in the wrong units and are an order of magnitude off? Or are there huge outliers skewing the results, like the mean of one of the columns? Should you ignore those, or handle them some other way? There's no one right way of doing it. But one thing we definitely should do, which I'll show you now, is check the inputs and make sure they're consistent. So we can take tips and look at the sex column. And again, we can get…
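The three checks described above (missing values, inconsistent input, outliers) can be sketched roughly as follows. This is a minimal illustration, not the instructor's code: it assumes `tips` is a pandas DataFrame, and the sample rows, the `sex_map` dictionary, the `sex_clean` column, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the course's `tips` data; every value here is invented.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, None, 23.68, 2100.0],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.00],
    "sex": ["Female", "male", "M", " Female ", None],
})

# 1. Missing data: count the gaps per column before deciding whether to
#    fill in a default, drop the row, or skip it for that column only.
missing_counts = tips.isna().sum()
print(missing_counts)

# 2. Inconsistent input: list the distinct spellings, then map them onto
#    one canonical label. `sex_map` is an assumed mapping for this sample.
print(tips["sex"].unique())
sex_map = {"female": "Female", "f": "Female", "male": "Male", "m": "Male"}
tips["sex_clean"] = tips["sex"].str.strip().str.lower().map(sex_map)
print(tips["sex_clean"].value_counts(dropna=False))

# 3. Outliers: flag values far outside the interquartile range. The 1.5 * IQR
#    cutoff is one common convention, not the only reasonable choice.
q1, q3 = tips["total_bill"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (tips["total_bill"] < q1 - 1.5 * iqr) | (tips["total_bill"] > q3 + 1.5 * iqr)
print(tips[is_outlier])

# Compare the mean with and without the flagged rows to see how much they skew it.
print(tips["total_bill"].mean(), tips.loc[~is_outlier, "total_bill"].mean())
```

Writing the cleaned labels to a new column instead of overwriting `sex` keeps the raw input available, which helps if the mapping later turns out to miss a spelling.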
