Types of missing values in data

Missing values in datasets present a common challenge, and understanding their nature is crucial for accurate analysis. These missing values generally fall into four categories: Missing Completely At Random (MCAR), where missing values are entirely random and unrelated to any other data; Missing At Random (MAR), where missing values are related to observed data but not to unobserved data; Missing Not At Random (MNAR), where missing values are related to the unobserved data itself; and Structurally Missing (SM), where missing values are inherent to the data collection process or to the nature of the variable itself. Recognizing these distinctions is vital, as each type calls for a different handling strategy to prevent bias and ensure reliable results.

We will now explore each of these four types in more detail.


1. Missing Completely at Random (MCAR)

In MCAR, the probability of a value being missing is entirely random and unrelated to any observed or unobserved variable in the dataset. Essentially, missing values occur by pure chance.


Example (Survey Data Collection)

  • Some survey responses are lost due to mail delivery errors, technical glitches in the online survey platform, or random human error during data entry

Sample Data:

[Figure: Customer Satisfaction Survey]


Characteristics:

  • Missing data is purely accidental
  • No predictable pattern
  • Probability of missing values is equal across all observations
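
The characteristics above can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library; the satisfaction scores, the 20% missing rate, and the helper name `make_mcar` are assumptions made for illustration, not data from the survey. Because every value has the same chance of being dropped, the mean of the observed scores should stay close to the mean of the full data.

```python
import random


def make_mcar(values, missing_rate, seed=0):
    """Return a copy of `values` with entries replaced by None at random."""
    rng = random.Random(seed)
    # Every entry has the same chance of being dropped, whatever its value.
    return [None if rng.random() < missing_rate else v for v in values]


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical satisfaction scores on a 1-5 scale.
score_rng = random.Random(42)
scores = [score_rng.randint(1, 5) for _ in range(10_000)]

observed = make_mcar(scores, missing_rate=0.2)

missing_fraction = sum(v is None for v in observed) / len(observed)
full_mean = mean(scores)
observed_mean = mean([v for v in observed if v is not None])

print(f"missing fraction:        {missing_fraction:.3f}")
print(f"mean of all scores:      {full_mean:.3f}")
print(f"mean of observed scores: {observed_mean:.3f}")
```

In this setting, simply analysing the observed rows loses sample size but should not systematically shift the estimate.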


2. Missing at Random (MAR)

In MAR, the probability that a value is missing can be explained by other observed data but not by the missing value itself. In other words, we can predict the likelihood of a missing value from other available information.


Example (Healthcare Research)

  • Older patients are less likely to report their income, so whether income is missing is related to age

Sample Data:

[Figure: Patient Health Survey]


Characteristics:

  • Missing income data correlated with age
  • Can be predicted using other observed variables
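
To make the age and income example concrete, here is a minimal sketch in standard-library Python. The age cut-off of 60, the missing probabilities (10% and 50%), and the income distribution are all assumptions chosen for illustration. The key point is that the chance of income being missing depends only on age, which is observed, and not on the income value itself.

```python
import random

rng = random.Random(0)

patients = []
for _ in range(10_000):
    age = rng.randint(20, 89)
    income = rng.gauss(50_000, 10_000)
    # Missingness depends only on age (observed), never on income itself.
    p_missing = 0.5 if age >= 60 else 0.1
    reported = None if rng.random() < p_missing else income
    patients.append({"age": age, "income": reported})


def missing_rate(rows):
    """Share of rows whose income was not reported."""
    return sum(r["income"] is None for r in rows) / len(rows)


younger = [p for p in patients if p["age"] < 60]
older = [p for p in patients if p["age"] >= 60]

rate_younger = missing_rate(younger)
rate_older = missing_rate(older)

print(f"income missing, age < 60:  {rate_younger:.3f}")
print(f"income missing, age >= 60: {rate_older:.3f}")
```

Comparing missing rates across groups of an observed variable, as done here, is one simple way to spot a pattern consistent with MAR.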


3. Missing Not at Random (MNAR)

MNAR occurs when the probability of a value being missing is related to the unobserved value itself.

Note: This is the most challenging type of missing data to handle, as the missing values are tied to the very information we lack.


Example (Mental Health Survey)

  • People with severe depression are less likely to complete mental health questionnaires

Sample Data:

[Figure: Depression and Reporting Behavior]


Characteristics:

  • Missing data directly related to the severity of the condition
  • Non-random missing values based on the unobserved value
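
The bias this creates can be sketched with a short simulation in standard-library Python. The 0 to 27 severity scale and the formula for the missing probability are hypothetical choices for illustration. Because higher scores are more likely to go unreported, the observed mean should understate the true mean, and nothing in the observed data alone reveals by how much.

```python
import random

rng = random.Random(1)

# Hypothetical depression severity scores from 0 (none) to 27 (severe).
true_scores = [rng.randint(0, 27) for _ in range(10_000)]

observed = []
for score in true_scores:
    # The chance of non-response rises with the (unobserved) score itself:
    # 10% at score 0, up to 80% at score 27.
    p_missing = 0.1 + 0.7 * (score / 27)
    observed.append(None if rng.random() < p_missing else score)

observed_values = [s for s in observed if s is not None]

true_mean = sum(true_scores) / len(true_scores)
observed_mean = sum(observed_values) / len(observed_values)

print(f"true mean severity:     {true_mean:.2f}")
print(f"observed mean severity: {observed_mean:.2f}")
```

In real data the true scores are never available for this comparison, which is why MNAR usually requires outside knowledge or sensitivity analysis rather than a purely data-driven fix.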


4. Structurally Missing (SM)

The missing data exhibits a pattern or structure. Data is absent because it is logically irrelevant to the respondent or because of the data collection design.


Example 1 (Age-Specific Questionnaire)

  • Young people do not answer the questions in a retirement planning survey because the questions do not apply to them

Sample Data:

[Figure: Retirement Planning Survey]


Example 2 (Professional Role Survey)

  • Leadership and management questions are skipped by respondents in entry-level roles

Sample Data:

[Figure: Management Experience Tracking]


Characteristics:

  • Missing data follows a logical, predictable pattern
  • Missing values are intentional and meaningful
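
Since these gaps are meaningful, one reasonable approach is to label them as "not applicable" instead of treating them as unknown. The sketch below, in standard-library Python, assumes a hypothetical skip rule (retirement questions are only asked from age 30) and made-up respondents; the names `MIN_AGE` and `classify` are illustrative.

```python
# Hypothetical respondents; None means no answer was recorded.
respondents = [
    {"age": 22, "retirement_plan": None},
    {"age": 45, "retirement_plan": "pension"},
    {"age": 19, "retirement_plan": None},
    {"age": 58, "retirement_plan": None},
    {"age": 63, "retirement_plan": "savings"},
]

MIN_AGE = 30  # assumed skip rule: younger respondents are not asked


def classify(row):
    """Separate structural gaps from answers that are genuinely missing."""
    if row["retirement_plan"] is not None:
        return "answered"
    if row["age"] < MIN_AGE:
        return "not_applicable"  # structurally missing: question skipped by design
    return "missing"  # asked but not answered: needs a different treatment


labels = [classify(r) for r in respondents]
print(labels)
# ['not_applicable', 'answered', 'not_applicable', 'missing', 'answered']
```

Keeping the "not_applicable" rows out of any imputation step avoids inventing answers to questions that were never relevant.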


Relationship between the type of missing value and our confidence about why the data is missing:

As we move through the types of missing data, our certainty about why values are missing grows. With Missing Completely At Random (MCAR), the missingness is pure chance, so the data point to no identifiable cause. With Missing At Random (MAR), observed variables give some insight into the mechanism, which allows more informed treatment strategies. With Missing Not At Random (MNAR), we can propose a probable explanation tied to the missing values themselves, although the data alone cannot confirm it. Finally, with Structurally Missing (SM) data, we are nearly certain of the cause, because the value was never applicable or was never collected by design. The more confident we are about the cause, the better we can choose a method for handling the missing data.


Conclusion

Missing data is a common problem when working with data. In this article, we covered four types of missing data: structurally missing, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).


More articles by Bassel AbdulHak
