From the course: Data Quality: Core Concepts

Null rates

- [Instructor] I argue that one of the most important data quality signals to look for is the null rate. Now, nulls are extremely important because they represent missingness, whether due to data loss or to non-inner joins where there are no matching values. Another case is a null that intentionally states that the value is not present. Nulls are often the first attribute I check when doing root cause analysis for data quality issues. And so I'll pull up SQL, I'll go to a database, and I'll check for null values and group by the dates. Now, there are various types of nulls. There are true nulls, so not all nulls are bad. A lack of information is a data point in itself. Examples of this could be a demographic question that's not applicable in a survey. Maybe there's drop-off in the product funnel, or people are intentionally identifying where joins don't match. So say, for instance, I was doing an analysis where I wanted to see whether a user visited and when they dropped off for a certain date. I would merge the information on the…
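The check described above, counting nulls and grouping by date, could be sketched roughly as follows. This is a minimal illustration and not the instructor's actual query: it uses Python's built-in `sqlite3` module with an in-memory database, and the `signups` table, its columns, the sample rows, and the `null_rates_by_date` helper are all hypothetical names made up for the example.

```python
import sqlite3


def null_rates_by_date(conn, table, column, date_column):
    """Return (date, total_rows, null_rows, null_rate) tuples ordered by date.

    Table and column names are interpolated into the SQL text because
    identifiers cannot be bound as parameters, so pass trusted names only.
    """
    query = f"""
        SELECT {date_column},
               COUNT(*)                                      AS total_rows,
               COUNT(*) - COUNT({column})                    AS null_rows,
               1.0 * (COUNT(*) - COUNT({column})) / COUNT(*) AS null_rate
        FROM {table}
        GROUP BY {date_column}
        ORDER BY {date_column}
    """
    # COUNT(column) skips NULLs, so COUNT(*) - COUNT(column) is the null count.
    return conn.execute(query).fetchall()


# Hypothetical sample data: the email column goes missing over three days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (signup_date TEXT, user_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?, ?)",
    [
        ("2024-01-01", 1, "a@example.com"),
        ("2024-01-01", 2, "b@example.com"),
        ("2024-01-02", 3, None),
        ("2024-01-02", 4, "d@example.com"),
        ("2024-01-03", 5, None),
        ("2024-01-03", 6, None),
    ],
)

rates = null_rates_by_date(conn, "signups", "email", "signup_date")
for signup_date, total_rows, null_rows, null_rate in rates:
    print(signup_date, total_rows, null_rows, null_rate)
```

With this sample data the null rate climbs from 0.0 to 0.5 to 1.0 across the three dates. A jump like that on a particular day is the kind of signal that helps narrow down when a data quality issue started, although whether those nulls are a problem or a legitimate "true null" still depends on what the column means.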
