My Postulates About Data Sharing

There has been a recent fiasco over data sharing by Facebook. I have yet to understand the reason behind the hue and cry over the incident. It has been well known for ages that Facebook (like so many other companies) feeds on your data, and that users of apps (those surveys, dumbed-down quizzes, etc.) pass not just their own data but their friends’ data as well to app developers. I’m not surprised, shocked or agitated at all, because I firmly believe in the two data postulates presented below.

1. Data, once shared, can’t be controlled in any way

Once you’ve shared a dataset, you can’t dictate any terms on its use, traceability, or deletion, not even through compliance with a contract. At best, exercising these controls and ensuring compliance is prohibitively difficult and/or expensive. Once a copy of the data is created and passed on, you don’t ‘really’ dictate how it is handled, how many times it is copied, where those copies get stored (disks, cloud, version control), and so on.

The reason this data was once considered useful never really goes away. If it is not useful immediately, the data will be zipped and tucked away, but it will never be deleted. It is better to assume that data once shared will never be deleted, even if someone claims it has been. My advice is to know and understand this before the decision to share data is taken.

2. Data considered anonymized may reveal personal information

One might think that ‘anonymizing’ the data before sharing makes it safe, as it would not reveal anything specific to a person. We need to understand that through brute force, aggregation, summarization, or with help from supplementary public info, even ‘anonymized’ data may reveal:

* The same info which you wanted to protect in the first place (Re-identification possible with Australian de-identified Medicare and PBS open data)

* Some other secret that should have remained secret (Strava Data Heat Maps Expose Military Base Locations Around the World)

Check the articles named above, about recent real-life mistakes, for a better understanding of this.
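
To make the ‘supplementary public info’ route concrete, here is a minimal sketch of a linkage attack in Python. Everything in it is made up for illustration: the records, the names, and the choice of zip, birth_year and sex as the fields that survive ‘anonymization’ are my assumptions, not details from the incidents mentioned above. The idea is simply that a released row can be tied back to a person when its leftover fields are unique in the release and also appear in some public list.

```python
from collections import Counter

# Hypothetical "anonymized" release: names removed, but some fields kept.
released = [
    {"zip": "2000", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "2000", "birth_year": 1980, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "2010", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
    {"zip": "2010", "birth_year": 1975, "sex": "F", "diagnosis": "eczema"},
]

# Hypothetical supplementary public info (e.g. a profile page or a public list).
public = [
    {"name": "Alice", "zip": "2000", "birth_year": 1980, "sex": "F"},
    {"name": "Carol", "zip": "2010", "birth_year": 1975, "sex": "F"},
]

# Fields assumed to be present in both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build the combination of fields used to link the two datasets."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def reidentify(released_rows, public_rows):
    """Return {name: diagnosis} for people whose key is unique in the release."""
    counts = Counter(key(r) for r in released_rows)
    unique_rows = {key(r): r for r in released_rows if counts[key(r)] == 1}
    return {
        p["name"]: unique_rows[key(p)]["diagnosis"]
        for p in public_rows
        if key(p) in unique_rows
    }


matches = reidentify(released, public)
print(matches)
```

In this toy data, Alice is linked to exactly one released row, while Carol stays hidden only because another row happens to share her zip, birth year and sex. Real attacks are messier, but the mechanism is the same: removing names does not help if the remaining fields single a person out.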

Happy data sharing! (but be careful)…
