Debiasing AI, a word of caution
Bias in AI is a term one often hears and uses in the context of Responsible AI. Every responsible AI framework, whether written by an enterprise or a multinational body, addresses it in some form. For example, the first OECD AI principle is that “AI should benefit people and the planet by driving inclusive growth, sustainable development and well-being”. It further states that the inclusive growth, sustainable development and well-being principle “… recognises that AI systems could perpetuate existing biases and have a disparate impact on vulnerable and underrepresented populations, …”
While well thought out and well intended, this principle can lead to a classic catch-22 in its application. My word of caution concerns what happens when one attempts to remedy this dilemma.
What is this catch-22?
Simply put, you may want to evaluate your AI model for potentially harmful biases against a given underrepresented minority, but you don’t have the data about who is or isn’t a member of that minority. Hence you are unable to evaluate your AI for potential biases, let alone debias your model.
Faced with this dilemma, and with very good intentions, you may be tempted to infer who among your users is or isn’t a minority, with the explicit desire and stated goal of preventing potential harm to these underrepresented populations by debiasing your AI.
My word of caution is that if you were to follow through on this intent, you must act according to deontological principles. You absolutely cannot take a consequentialist perspective.
As we build AI models, we evaluate them on how they perform. We evaluate them on a ‘golden dataset’ in a test environment, and again when we ramp them up in production after they have met some baseline criteria in a training environment. A model is deemed good if it causes an increase in the desired outcome metric. The consequences, or impact, of a model are what define it as good or bad. A naturally consequentialist perspective.
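To make that concrete, here is a minimal sketch of such a consequentialist evaluation loop, using scikit-learn and entirely synthetic data; the baseline figure, dataset names and decision rule are illustrative assumptions, not a description of any real system.

```python
# Minimal sketch of the consequentialist evaluation loop described above:
# a candidate model is "good" if it lifts the desired outcome metric on a
# held-out golden dataset. All names and numbers are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for training data and a curated golden dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, golden_X, y_train, golden_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

candidate = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# "Good" is defined purely by the outcome metric relative to a baseline,
# e.g. the metric achieved by the model currently in production (made up here).
baseline_auc = 0.70
candidate_auc = roc_auc_score(golden_y, candidate.predict_proba(golden_X)[:, 1])
ship_it = candidate_auc > baseline_auc
print(f"candidate AUC = {candidate_auc:.3f}, ship: {ship_it}")
```

Note that nothing in this loop asks who the model might be harming; it only asks whether the aggregate metric went up, which is exactly the consequentialist framing discussed here.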
If this approach is so ingrained and natural in AI, why would I argue against it in the case of this catch-22?
Simply put, history is full of examples where the ‘good intentions’ of the majority were extremely prejudicial to a minority. Inevitably, the argument goes: we (the majority) believe that they (the minority) would be ‘ok’ with it, as it is for their own good. The majority may have had the best of intentions, but the impact of its actions often fell far short and created further real harm. One clear lesson history should have taught us is that it is not for the majority to presume what the minority needs. It is for the majority to reach out to minorities, hear their perspectives and give them control.
Acting deontologically in our catch-22 would require not just being transparent about what you are attempting, but reaching out to obtain the consent of the very groups you are trying to ensure your AI is not biased against. It would also require doing so very early in your attempt to ‘help’ these underrepresented groups. Finally, acting ethically means that if these minorities do not want you to perform this inference, you respect their wishes and stop. No ifs or buts.
Interesting read, Igor Perisic.
Frequent corrections to models are necessary to resolve issues of underrepresentation, unavailability and changing trends.
I think there is a good technical solution here (if I understood the problem): if Y is the outcome and X the underrepresented group, we can model the joint as P(Y, X) = P(Y | X) P(X). By treating X as a random variable, we get rid of the fallacy. The estimate of P(X) can be biased if a large percentage of X is missing and not missing at random, but that can be corrected by statistical methods (e.g., propensity matching).
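As a hedged illustration of this comment's suggestion (not the commenter's own code), here is a small Python sketch: group membership X is treated as a random variable that is observed for only part of the population and not missing at random, and the estimate of P(X) is corrected with standard inverse-probability weighting, a close cousin of the propensity matching mentioned above. The data is simulated and all variable names are illustrative.

```python
# Sketch: factor the joint as P(Y, X) = P(Y | X) P(X), where X (group
# membership) is only partially observed and not missing at random.
# Correct the estimate of P(X) by weighting observed records by the inverse
# of their estimated probability of being observed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)                        # covariate tied to both X and missingness
x = rng.binomial(1, p=1 / (1 + np.exp(-z)))   # true (partly unobserved) group membership
y = rng.binomial(1, p=0.3 + 0.2 * x)          # outcome depends on group

# X is reported more often for part of the population (not missing at random).
p_observed = 1 / (1 + np.exp(-(0.5 + 1.5 * z)))
observed = rng.binomial(1, p=p_observed).astype(bool)

naive_px = x[observed].mean()                 # biased estimate of P(X = 1)

# Model the probability that X is observed, given the covariate, then reweight.
obs_model = LogisticRegression().fit(z.reshape(-1, 1), observed.astype(int))
w = 1.0 / obs_model.predict_proba(z[observed].reshape(-1, 1))[:, 1]

weighted_px = np.average(x[observed], weights=w)        # corrected estimate of P(X = 1)
mask_x1 = x[observed] == 1
py_given_x1 = np.average(y[observed][mask_x1], weights=w[mask_x1])  # P(Y = 1 | X = 1)

print(f"true P(X=1) ~ {x.mean():.3f}, naive {naive_px:.3f}, weighted {weighted_px:.3f}")
print(f"weighted P(Y=1 | X=1) ~ {py_given_x1:.3f}")
```

The naive estimate overstates P(X = 1) because the records where X is reported are not a random sample; the weighted estimate recovers something close to the true proportion, which is the correction the comment alludes to.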
Igor Perisic Great points about the underrepresented. For me, it's also a point about misclassification. Let's say I make an algorithm to decide the credit amount for an individual, and it uses only salary as the variable. To me, this is unbiased and credible. You, however, may measure my algorithm and find that I make bigger loans to white, middle-aged men: my estimator is correlated with protected variables. But does this mean I should be giving more loans to other groups? Well, no. So a statistical adjustment or normalisation is inappropriate. The problem is that the sample data itself is biased: income is not evenly distributed. If you tested my model on a simulated dataset with a perfectly even distribution, you would find it unbiased.
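Again as a hedged sketch rather than the commenter's actual setup, the toy simulation below shows the point: a loan rule that looks only at income yields different average loan amounts across two groups when their income distributions differ, and shows no disparity when the distributions are made identical. The rule, distributions and numbers are all made up for illustration.

```python
# Toy illustration: an income-only loan rule can show group disparities purely
# because income is unevenly distributed between groups.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

def loan_amount(income):
    """Illustrative rule: loan capped at 4x income."""
    return 4.0 * income

# Real-world-like case: group A earns more on average than group B.
income_a = rng.lognormal(mean=11.0, sigma=0.5, size=n)
income_b = rng.lognormal(mean=10.6, sigma=0.5, size=n)
print("unequal incomes -> mean loan ratio A/B:",
      loan_amount(income_a).mean() / loan_amount(income_b).mean())

# Counterfactual test: identical income distributions for both groups.
income_a_eq = rng.lognormal(mean=10.8, sigma=0.5, size=n)
income_b_eq = rng.lognormal(mean=10.8, sigma=0.5, size=n)
print("equal incomes   -> mean loan ratio A/B:",
      loan_amount(income_a_eq).mean() / loan_amount(income_b_eq).mean())
```

The disparity measured in the first case comes entirely from the input distributions, not from the rule itself, which is the distinction the comment is drawing.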
My takeaway is to set out principles and be open about them so everyone working on this is on the same page.