Clusterplot
When presented with sets of objects, statisticians often make sense of them with cluster analysis. It tries to group the objects into the namesake clusters, so that objects within a cluster are more similar to each other than they are to objects in any other cluster. There are many mathematical models that can do this, and applying different algorithms to the same dataset can lead to dramatically different conclusions; choosing the most appropriate clustering method, one that describes the data in a meaningful way, is sometimes more art than science. If the statistician is lucky, their data will look like this, with two clear, well-defined clusters, so they can easily separate them and knock off for lunch early.
Most datasets in science will instead probably look much more like this, where there can be two clusters, or one cluster, or three clusters, depending on the model, and so it’s another late night trying to understand the graph.
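To make the "depending on the model" point concrete, here is a minimal sketch in Python. It assumes k-means as the clustering method (my choice for illustration, not necessarily what any particular paper used) and synthetic points drawn from a single continuous blob. The names `kmeans`, `nearest` and `cloud` are hypothetical helpers made up for this example.

```python
import random


def nearest(p, centres):
    """Index of the centre closest to point p (squared Euclidean distance)."""
    return min(
        range(len(centres)),
        key=lambda c: (p[0] - centres[c][0]) ** 2 + (p[1] - centres[c][1]) ** 2,
    )


def kmeans(points, k, iterations=50):
    """Plain k-means on 2-D points; returns one cluster label per point."""
    # Deterministic start (an assumption of this sketch): take k points
    # evenly spaced through the list as the initial centres.
    step = len(points) // k
    centres = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [nearest(p, centres) for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


# Synthetic data: ONE continuous blob, with no real groups inside it.
rng = random.Random(0)
cloud = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]

# Ask for one, two or three clusters: the algorithm obliges every time.
for k in (1, 2, 3):
    labels = kmeans(cloud, k)
    sizes = [labels.count(c) for c in range(k)]
    print(f"k={k}: cluster sizes {sizes}")
```

The algorithm hands back as many groups as it is asked for, even though the points were generated as a single cloud. The number of clusters here is an input chosen by the analyst, not something the data revealed.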
But choosing a questionable model may have even more dire consequences than just a dodgy graph, if the underlying data represent a thorny issue with political and social implications. This is exactly what happened to the authors of a Nature paper last month, which included a chart of genetic ancestry that seemed to suggest that different human ethnicities can be separated into different clusters, when in fact their data, and the general scientific consensus, state the contrary. The plot has added to the longstanding debate of how to publish work that will avoid willful misinterpretation by those seeking to propagate discredited theories.
Let’s get one thing straight: all human beings belong to the same species, Homo sapiens, and the notion of biological race, as something that exists innately within our DNA, is soundly rejected by anthropologists and geneticists. Although the idea of categorising groups of humans has existed since the dawn of mankind, its modern formulation largely comes from theories developed to justify colonialism and nationalism. These theories distorted some facts, ignored others, and flat out made up many in order to create a hierarchy of races, almost invariably ascribing all good qualities to white Europeans and placing them at the top of a divinely created “order of things”, while humans of other continents and skin colours were deemed less evolved, more brutish and nearly animal-like. But although scientific racism is no longer the consensus in academia, some aspects of it remain embedded in the popular consciousness. These can range from vaguely complimentary, such as believing that certain ethnicities have innate athletic ability or higher intelligence, to potentially life-threatening, like believing that someone is not at risk of a disease because of their skin colour.
In reality, humans are really similar to each other. Whether it’s the person you’re sitting right next to or someone from the other side of the world, you share about 99.9% of your genetic code with them. In fact, there is more genetic variation within any group that you might consider a “race” than there is between such groups, so you may well be more different from someone of your own “race” than from someone who is not. Genetic variation across human populations is continuous, representing a long history of mixing and interbreeding, and it is impossible to divide humanity into ethnicity-based genetic groups. To get back to our example from the top, we are very much many dots in the same big cluster, too close to separate.
At the same time, humans are undeniably different from each other, in ways that often seem to follow skin colour or nationality. If not from genes, where does this discrepancy come from? Almost always, it is socioeconomic factors, such as poverty and systemic discrimination, that account for this. In other words, race is a social construct, a categorisation born of human-made factors lacking a biological basis. Genes and DNA offer a convenient cover to avoid having to address bias and prejudice: it is easier to believe that there is something innate and unchangeable deep inside our cells that predetermines our successes and failures than to confront the reality that human choices have led to inequality.
This is why the plot in the Nature article can be so dangerous: it can be exploited by those who want to reinforce pseudoscientific narratives of race and genetics. By using fancy jargon and misleading graphs, they try to give a semblance of legitimacy to these discredited theories and claim they are founded in research, while never addressing the root causes of a system that favours some people over others.
Science has a duty to be impartial and try to approach every subject without bias or agenda, but it must always be vigilant in how its work is represented. Scientists and researchers cannot just show what they have done: it is imperative that they communicate appropriately and clearly. Bad graphs and poorly worded presentations can do more than just make for boring conferences: they can end up reinforcing bigotry and injustice.