Types of missing values in data
Missing values in datasets present a common challenge, and understanding their nature is crucial for accurate analysis. These missing values generally fall into four categories: Missing Completely At Random (MCAR), where missingness is entirely random and unrelated to any other data; Missing At Random (MAR), where missingness is related to observed data but not to the unobserved values; Missing Not At Random (MNAR), where missingness is related to the unobserved value itself; and Structurally Missing (SM), where values are missing because of the data collection design or the nature of the variable itself. Recognizing these distinctions is vital, as each type calls for a different handling strategy to prevent bias and ensure reliable results.
We will now explore each of these four types in more detail.
1. Missing Completely at Random (MCAR)
In MCAR, the probability of a value being missing is the same for every record and is unrelated to any observed or unobserved variable in the dataset. Essentially, missing values occur by pure chance.
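The definition above can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library; the income figures, the 20% missing rate, and the helper name `make_mcar` are assumptions made up for illustration, not data from a real survey. Every value gets the same chance of being dropped, so the mean of the remaining values should stay close to the mean of the full data.

```python
import random

def make_mcar(values, missing_rate, seed=0):
    """Hypothetical helper: blank out each value with the same probability."""
    rng = random.Random(seed)
    # The coin flip ignores the value itself and every other variable.
    return [None if rng.random() < missing_rate else v for v in values]

# Simulated incomes (illustrative numbers only).
rng = random.Random(42)
incomes = [rng.gauss(50_000, 10_000) for _ in range(10_000)]

incomes_mcar = make_mcar(incomes, missing_rate=0.2)
observed = [v for v in incomes_mcar if v is not None]

missing_fraction = 1 - len(observed) / len(incomes)
mean_full = sum(incomes) / len(incomes)
mean_observed = sum(observed) / len(observed)

print(f"missing fraction: {missing_fraction:.3f}")
print(f"mean of full data: {mean_full:.0f}")
print(f"mean of observed data only: {mean_observed:.0f}")
```

Because the missingness carries no information, dropping the incomplete records here costs sample size but should not systematically shift the estimate.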
Example (Survey Data Collection)
Sample Data:
Characteristics:
2. Missing at Random (MAR)
In MAR, the probability of a value being missing can be explained by other observed data, but not by the missing value itself. In other words, we can predict the likelihood of a value being missing from other available information.
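A short simulation can make this concrete. The sketch below is a hypothetical healthcare-style dataset invented for illustration: the variable names, the age cutoff of 40, and the missing rates of 40% and 10% are all assumptions. Blood pressure is dropped more often for younger patients, and the drop decision looks only at the observed age, never at the blood pressure reading.

```python
import random

rng = random.Random(0)
records = []
for _ in range(10_000):
    age = rng.randint(20, 80)
    # Made-up relationship between age and blood pressure, for illustration.
    blood_pressure = rng.gauss(100 + 0.5 * age, 10)
    # Missingness depends only on the observed variable (age).
    p_missing = 0.4 if age < 40 else 0.1
    if rng.random() < p_missing:
        blood_pressure = None
    records.append({"age": age, "blood_pressure": blood_pressure})

def missing_rate(rows):
    """Share of rows whose blood_pressure is missing."""
    return sum(r["blood_pressure"] is None for r in rows) / len(rows)

young = [r for r in records if r["age"] < 40]
older = [r for r in records if r["age"] >= 40]
rate_young = missing_rate(young)
rate_older = missing_rate(older)

print(f"missing rate, age < 40:  {rate_young:.3f}")
print(f"missing rate, age >= 40: {rate_older:.3f}")
```

Comparing missing rates across groups of an observed variable, as done at the end, is one simple way to spot a pattern consistent with MAR. It cannot rule out MNAR on its own.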
Example (Healthcare Research)
Sample Data:
Characteristics:
3. Missing Not at Random (MNAR)
MNAR occurs when the probability of a value being missing is related to the unobserved value itself.
Note: This is the most challenging type of missing data to handle, as the missing values are tied to the very information we lack.
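To see why MNAR is hard to handle, consider the following sketch. It assumes a made-up distress score on a 0 to 27 scale and a hypothetical rule under which respondents with scores above 15 skip the question 60% of the time, compared with 10% for everyone else. None of these numbers come from a real study.

```python
import random

rng = random.Random(1)

# Simulated "true" scores, clipped to the 0-27 range (illustrative only).
true_scores = [min(max(rng.gauss(10, 5), 0), 27) for _ in range(10_000)]

def p_missing(score):
    """Hypothetical rule: high scores are much more likely to go unreported."""
    return 0.6 if score > 15 else 0.1

# Missingness depends on the very value that ends up missing.
reported = [None if rng.random() < p_missing(s) else s for s in true_scores]
observed = [s for s in reported if s is not None]

mean_true = sum(true_scores) / len(true_scores)
mean_observed = sum(observed) / len(observed)

print(f"mean of true scores:     {mean_true:.2f}")
print(f"mean of reported scores: {mean_observed:.2f}")
```

In this simulation the reported scores understate the true average, and nothing in the observed data alone reveals by how much. In practice the true scores are unavailable, which is exactly what makes MNAR difficult.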
Example (Mental Health Survey)
Sample Data:
Characteristics:
4. Structurally Missing (SM)
Here the missing data follows a clear pattern or structure. Values are absent because they are logically not applicable to the record or because the data collection design never asked for them.
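One practical consequence is that structurally missing values can often be labeled with a simple rule instead of being imputed. The sketch below assumes a hypothetical questionnaire in which the employment question is only put to respondents aged 18 or over; the field names, the age threshold, and the helper `classify_missing` are illustrative assumptions.

```python
ADULT_AGE = 18  # assumed cutoff below which the question is not asked

respondents = [
    {"age": 34, "employment_status": "employed"},
    {"age": 12, "employment_status": None},
    {"age": 45, "employment_status": None},
    {"age": 9, "employment_status": None},
]

def classify_missing(row):
    """Separate 'not applicable' blanks from blanks that need investigating."""
    if row["employment_status"] is not None:
        return "observed"
    if row["age"] < ADULT_AGE:
        # The question was never applicable, so the blank is structural.
        return "not_applicable"
    # An adult left the question blank: could be MCAR, MAR, or MNAR.
    return "missing"

labels = [classify_missing(r) for r in respondents]
print(labels)
# prints ['observed', 'not_applicable', 'missing', 'not_applicable']
```

Keeping "not applicable" as its own category avoids filling in values that could never have existed, while the remaining blanks can still be examined as one of the other three types.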
Example 1 (Age-Specific Questionnaire)
Sample Data:
Example 2 (Professional Role Survey)
Sample Data:
Characteristics:
Relationship between the type of missing values and our confidence about why the data is missing:
As we move through the different types of missing data, our certainty about the reasons for those missing values grows. With Missing Completely At Random (MCAR), missingness is pure chance, so there is no systematic cause to identify. With Missing At Random (MAR), we gain some insight because missingness can be linked to observed variables, which allows for more informed treatment strategies. With Missing Not At Random (MNAR), we can usually propose a probable explanation tied to the missing value itself, even though the data alone cannot confirm it. Finally, with Structurally Missing (SM) data, we have near-absolute certainty about the cause, because it follows from the data collection design. The more confident we are about the cause, the better we can choose a method for addressing the missing data.
Conclusion
Missing data is a common problem when working with data. In this article, we covered four types of missing data: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR), and structurally missing (SM).