From the course: Data Quality: Core Concepts
Null rates
- [Instructor] I argue that one of the most important data quality signals to look for is the null rate. Now, nulls are extremely important because they represent missingness due to data loss, or non-inner joins where there are no matching values. Another case is a null that intentionally states that the value is not present. Nulls are often the first attribute I check when doing root cause analysis for data quality issues. And so I'll pull up SQL, I'll go to a database, and I'll check for null values and group by the dates. Now, there are various types of nulls. There are true nulls, so not all nulls are bad. A lack of information is a data point in itself. Examples of this could be a demographic question that's not applicable in a survey, drop-off in the product funnel, or people intentionally identifying where joins don't match. So say, for instance, I was doing an analysis where I wanted to see whether a user visited and when they dropped off on a certain date. I would merge the information on the…
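
The check described above (count nulls, group by date) can be sketched as follows. This is a minimal illustration, not the instructor's actual query: the `visits` and `purchases` tables, the `visit_purchases` view, the column names, and the `null_rate_by_date` helper are all hypothetical, and an in-memory SQLite database stands in for a real warehouse. The left join is there to show how a non-inner join with no matching row surfaces as a null.

```python
import sqlite3


def null_rate_by_date(conn, table, column, date_column):
    """Return [(day, total_rows, null_rows, null_rate), ...] ordered by day.

    Identifiers cannot be bound as SQL parameters, so this sketch assumes
    the table and column names come from trusted code, not user input.
    """
    query = f"""
        SELECT {date_column} AS day,
               COUNT(*) AS total_rows,
               SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS null_rows,
               AVG(CASE WHEN {column} IS NULL THEN 1.0 ELSE 0.0 END) AS null_rate
        FROM {table}
        GROUP BY {date_column}
        ORDER BY {date_column}
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: user 2 visited on 2024-01-01 but never purchased.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (user_id INTEGER, visit_date TEXT);
    CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT, amount REAL);

    INSERT INTO visits VALUES
        (1, '2024-01-01'), (2, '2024-01-01'),
        (3, '2024-01-02'), (4, '2024-01-02');
    INSERT INTO purchases VALUES
        (1, '2024-01-01', 9.99),
        (3, '2024-01-02', 5.00),
        (4, '2024-01-02', 12.50);

    -- LEFT JOIN keeps every visit; visits with no matching purchase get NULL amount.
    CREATE VIEW visit_purchases AS
        SELECT v.user_id, v.visit_date, p.amount
        FROM visits AS v
        LEFT JOIN purchases AS p
               ON p.user_id = v.user_id
              AND p.purchase_date = v.visit_date;
""")

rows = null_rate_by_date(conn, "visit_purchases", "amount", "visit_date")
for day, total_rows, null_rows, null_rate in rows:
    print(day, total_rows, null_rows, null_rate)
```

In this toy data the null on 2024-01-01 is a true null: it records that a visitor did not purchase rather than that data was lost. Whether a given null rate signals a problem depends on which kind of null it is, so a per-date breakdown like this is a starting point for investigation, not a verdict.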