Big Data has failed us
Let me first preface this article by saying I am not a data analyst, data scientist, or epidemiologist. I am, however, a father, a partner, and the son of two elderly parents, and that is the perspective from which I am writing this article.
On the weekend, I posted the following thought on one of Dan Andrews' posts.
It received the usual amount of engagement. Someone who didn't read my post properly. Someone who posted a link to the type of visualisation I was saying wasn't detailed enough. The naysayer who said it's impossible.
Coincidentally, on the same day, The Age posted an article calling for transparency on Victorian COVID-19 data, which echoed my thoughts and detailed some of the problems we are facing.
Transparency
"The first question is what data do the government actually have? Because they haven’t really told us," said epidemiologist Professor John Mathews from the University of Melbourne.
It has been five months since COVID-19 hit Victoria, and we have seen little data transparency from either the state or federal government. When they have released data, the visualisation of that data is often left to news outlets or citizens, which can lead to problems like misinterpretation, injection of bias, and plain old human error.
Searching the DHHS website — the main content hub for COVID-19 information in Victoria — for "graphs" rolls doughnuts.
To their credit, they have created a Power BI dashboard which contains a number of charts and graphs with reasonably up-to-date data. However, it's only accessible via a small text link on the home page that says "View full report".
Not exactly a great user experience.
Quality and accuracy
As I mentioned in my Facebook post, my main problem with the current data available is its lack of accuracy. Take, for example, this graph from The Guardian.
Now, I greatly admire The Guardian data team, in particular Nick Evershed, but this is in no way meaningful or helpful on any practical level. This isn't a criticism of The Guardian team; they did the best with what they had.
Lack of insights
"We as a citizenry need better data," Professor Blakely said. "We don't want to just be berated about behaving badly. We want to know why [the cases are increasing] ... how we can fix it. Everybody needs to feel like they are in the control room."
This really gets to the heart of my concerns. As a citizen, I'd like to know what's going on around me so I can make more informed decisions about how I conduct my life.
Currently, if infections occur in workplaces, it is up to those workplaces to communicate to their customers that one of their staff has been infected. This can be difficult, as in-store transactions are often anonymous, and announcements are often disseminated through different channels.
If I had data available to me that displayed the rough location of infected people, overlaid with their contact tracing information/travel patterns, I could decide where I shop, where I exercise, whether or not I keep my kids home from daycare, and even whether or not I might have been exposed.
Individual privacy vs. community safety
As I mentioned in my Facebook post, there is a real issue around making data about individual cases and their contact tracing data public, but we need to weigh that against the greater good.
One technique that I have used to anonymise/obfuscate geolocation data with Trueme is to use a geocoding system called Geohash.
Geohash, put simply, is a way to convert high-precision GPS coordinates to a low-precision grid reference.
The precision of the grid can be changed simply by changing the length of the Geohash string, from 1 character (an area of about 5000 x 5000km) to 12 characters (an area of just a few centimetres). A similar geocoding implementation you might be familiar with is What3Words.
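To make that concrete, here is a minimal sketch of the standard Geohash encoding in Python. It is not the code used in Trueme; the function name geohash_encode and the sample coordinates are my own assumptions, included purely for illustration.

```python
# Minimal sketch of the standard Geohash encoding (illustrative only).
# geohash_encode is a hypothetical helper name, not part of any library.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a Geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0          # accumulates 5 bits per output character
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1   # upper half of the range
            rng[0] = mid
        else:
            bits = bits << 1         # lower half of the range
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    # Approximate coordinates for central Melbourne (an assumption for the demo).
    lat, lon = -37.8136, 144.9631
    for length in (3, 5, 7):
        print(length, geohash_encode(lat, lon, length))
```

Because each extra character only subdivides the previous cell, a shorter Geohash is always a prefix of a longer one for the same point. Publishing only the first five or so characters would place a case within a cell a few kilometres across, without revealing the exact address.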
For me, this would be a good way to visualise transmission in our community that balances accuracy against privacy.
A map similar to the one below visualising cases and hot spots in Melbourne suburbs would be far more useful than the one above.
What's the solution?
The short answer is, I don't know. The problem is huge, rapidly changing, and I am only looking at one small part of it.
I am really keen to hear your thoughts on this, particularly if you work for the state government, or work in data science or epidemiology.
Here are a few thought starters:
What data would you like to see? Are there any states/countries that are using data really well? How is the contact tracing data being stored/analysed? How are you making decisions about how you and your family live your lives right now?
How do we best balance individual privacy against the common good? What part should Apple and Google play in augmenting this data? Which government department would be best equipped to tackle these problems? Is this something private enterprise could tackle? What channels are best for sharing this kind of data?
Let me know in the comments!
Not specifically data related, but some interesting thoughts for developers and public health officials on what the public thinks about COVID-19 technologies. https://www.adalovelaceinstitute.org/no-green-lights-no-red-lines/
Department of Health & Human Services, Victoria keen to hear your thoughts on this
Hey Psy, thought you might be interested to see this - https://powerplatform.microsoft.com/en-au/return-to-the-workplace/?ocid=AID3017346_QSG_EML_440404
Great post! As the article from The Age highlighted, I think sharing what information the government actually have, and are operating with, would be a good first step. I wonder if there is a correlation between the granularity of the data we are seeing in the charts and the timeliness of the data. As I understand it, by the time we have information about an infected person, that information might be as much as two weeks old. Then starts the process of trying to work out where that person has been and who they may have been in contact with in that time. The juxtaposition of individual privacy vs common good is an interesting one. There has been a large number of cases reported where people have ignored self isolation directives after testing positive. Does that mean we should release the information of the whereabouts of those individuals and where they have been?
I think the potential risk that gets run with something like this is that if people see they're in a nice happy green area, they relax, and it further promotes the spread if something does make it in. That being said, more info would be awesome. Maybe a business/POI lookup for links to cases that would help make those decisions you mentioned better?