My Postulates About Data Sharing
There has been a recent fiasco related to data sharing by Facebook. I have yet to understand the reason behind such a hue and cry over the incident. It has been well known for ages that Facebook (like so many other companies) feeds on your data, and that users of its apps (those surveys, dumbed-down quizzes, etc.) pass on not just their own data, but their friends’ data as well, to app developers. I’m not surprised, shocked or agitated at all, because I firmly believe in the two data postulates presented below.
1. Data, once shared, can’t be controlled in any way
Once you’ve shared a dataset, you can’t dictate any terms on its use, traceability, or deletion, not even by demanding compliance with some contract. At the very least, exercising these controls and ensuring compliance is prohibitively difficult or expensive. Once a copy of the data is created and passed on, you don’t ‘really’ dictate how it may be handled, how many times it may be copied, where those copies may get stored (disks, cloud, version control), and so on.
The reason this data was once considered useful never really goes away. If not useful immediately, the data will be zipped and tucked away, but it will never be deleted. It is better to assume that data once shared will never be deleted, even if someone claims it has been. My advice is to know and understand this before the decision to share data is taken.
2. Data considered anonymized may still reveal personal information
One might think that ‘anonymizing’ the data before sharing makes it safe, since it would no longer reveal anything specific to a person. We need to understand that through brute force, aggregation, summarization, or with help from supplementary public information, even ‘anonymized’ data may reveal:
* The same information you wanted to protect in the first place (Re-identification possible with Australian de-identified Medicare and PBS open data)
* Some other secret that should have remained secret (Strava Data Heat Maps Expose Military Base Locations Around the World)
Check the reports cited above, both about recent real-life mistakes, for a better understanding of this.
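To make the ‘supplementary public info’ point concrete, here is a minimal Python sketch of a linkage attack. Everything in it is made up for illustration: the records, the names, the field names (postcode, birth_year, sex, diagnosis) and the reidentify helper are my own assumptions, not taken from either incident above. The idea is simply that a release with names stripped can be joined to a public list on the fields they have in common.

```python
from collections import defaultdict

# Fields that are not names, but together can single out a person.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")

# Hypothetical "anonymized" release: names removed, other fields kept.
ANONYMIZED_RELEASE = [
    {"postcode": "3000", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "3000", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "3121", "birth_year": 1982, "sex": "M", "diagnosis": "migraine"},
]

# Hypothetical public information, e.g. scraped profile pages.
PUBLIC_PROFILES = [
    {"name": "Alice", "postcode": "3000", "birth_year": 1975, "sex": "F"},
    {"name": "Bob", "postcode": "3000", "birth_year": 1982, "sex": "M"},
    {"name": "Dave", "postcode": "3121", "birth_year": 1982, "sex": "M"},
    {"name": "Dan", "postcode": "3121", "birth_year": 1982, "sex": "M"},
]


def quasi_key(record):
    """Return the combination of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(release, profiles):
    """Map names to diagnoses wherever the link is unambiguous.

    A link counts only when exactly one released row and exactly one
    public profile share the same quasi-identifier combination.
    """
    rows_by_key = defaultdict(list)
    for row in release:
        rows_by_key[quasi_key(row)].append(row)

    names_by_key = defaultdict(list)
    for person in profiles:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = {}
    for key, rows in rows_by_key.items():
        names = names_by_key.get(key, [])
        if len(rows) == 1 and len(names) == 1:
            matches[names[0]] = rows[0]["diagnosis"]
    return matches


if __name__ == "__main__":
    for name, diagnosis in reidentify(ANONYMIZED_RELEASE, PUBLIC_PROFILES).items():
        print(name, "->", diagnosis)
```

In this toy data, Alice and Bob are singled out because each is the only person with their combination of postcode, birth year and sex. Dave and Dan share a combination, so the third row stays ambiguous. Real attacks are messier, but the more columns a release keeps, the fewer people share any one combination.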
Happy data sharing! (but be careful)…