From the course: Data Preparation, Feature Engineering, and Augmentation for AI Models

Statistical techniques for data quality assessment

- [Instructor] Statistical methods provide objective measures of data quality. These measures help us identify patterns, anomalies, and data quality issues at scale. They include distribution analysis, outlier detection, correlation and consistency analysis, and sampling techniques. Let's take a look at distribution analysis. With distribution analysis, we examine how the data is distributed in order to identify quality issues. For example, we might look at the price point distribution to detect pricing errors, at purchase frequency patterns to find abnormal customer behavior, or at inventory level distributions to identify data collection issues. We have a variety of tools that help with distribution analysis. We can use visual techniques, like histograms and box plots, as well as statistical calculations that give us numerical measures, such as distribution tests like the Shapiro-Wilk test. Now, I will say if you're not familiar with…
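
As a rough illustration of the price point example, the sketch below computes the numbers behind a histogram and a box plot using only Python's standard library. The `prices` sample, the helper names `histogram` and `box_plot_summary`, the bin width, and the 1.5 × IQR fence are all assumptions made for this example and are not taken from the course.

```python
import statistics
from collections import Counter


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def box_plot_summary(values, fence=1.5):
    """Return the quartiles and the values outside the box-plot whiskers.

    The 1.5 * IQR fence is the common box-plot convention; it is an
    assumption here, not a rule stated in the course.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - fence * iqr, q3 + fence * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical price points; the last one looks like a misplaced decimal.
prices = [9.99, 10.49, 10.99, 11.49, 11.99, 12.49, 12.99, 1299.0]

hist = histogram(prices, bin_width=5)
summary = box_plot_summary(prices)

print(hist)                 # counts per 5-unit price bin
print(summary["outliers"])  # values flagged as possible pricing errors
```

Here the one price far from the rest lands in its own histogram bin and falls outside the upper whisker, which is the kind of pattern that suggests a pricing error worth checking. For a formal normality test, SciPy provides `scipy.stats.shapiro`, which returns a test statistic and a p-value; it is left out of the sketch so the example has no third-party dependencies.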
