GC Data Analytics - Cloud Changes Everything
I've been told I have a gift for writing boring titles. This one is pretty good, but not competitive with Canada's Best.
Thanks to Transport Canada, my client for the past few months, I've had a chance to spend serious time thinking about data analytics in a GC (Government of Canada) context.
People who actually know anything about data analytics can stop reading now. I'm a generalist, not an expert on any one thing (although I'm an invited speaker at the upcoming conference on AI in the Public Sector, which will perhaps surprise anyone who knows anything about AI).
But I did come up with a couple of useful perspectives.
Data is a Thing
In the past, I thought of data as part of a computer system. It was something produced and consumed by the important thing, the IT Application. OK, I was thinking of food and excrement when I wrote that; I can't deny it.
After reading, like, a decade's worth of smart people writing about Data, blah-blah, the light finally went on: the data (information) is the valuable thing. The IT apps are just ways to collect/cleanse/process/analyse/distribute the data. So THAT's what "data-centric" means, huh.
Data Stewards Should be a Thing
Data stewards are to data as Ents are to trees. They cherish data, protect it, and cultivate it. I see Data Management groups emerging in different GC departments. In the past, I thought they were annoying librarian-wannabes who wanted to book my time to discuss "Open Data" or "retention periods". Now, enlightened, I understand that they are the custodians of the Data. Data sensitivity ("Protected B") is only the merest shadow of the complex nature of GC data.
When I do architecture now, I draw a big Visio box around the datastores. This reminds me that the datastores really need to be managed by departmental Data stewards.
The Cloud Makes This All Easy
In the cloud, data is not the same thing as "storage". Storage belongs to applications and lives by the application's rules. Data has moved out and gotten its own apartment.
I create data storage accounts of various temperatures (hot, cool, or archive access tiers) in various locations. I give ACCESS, but not ownership, of these storage accounts/datasets/databases to IT applications. It's easy to be promiscuous: I can give MANY applications access to my data. I can give the global public access to my data. (If I do this accidentally, it's a "data breach". If I do it on purpose, it's "open government".)
So, in the cloud, the model of independent data stewards is finally easy to implement.
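To make the "access, not ownership" idea concrete, here is a minimal Python sketch. It assumes a made-up in-memory model rather than any real cloud SDK: the `Dataset` and `Tier` classes, the `grant`/`revoke`/`publish` methods, and the dataset, steward, app, and region names are all hypothetical. In a real cloud, the same idea is expressed through the provider's own access-control features (role assignments, IAM policies, and so on).

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    """Storage "temperature": how often the data is expected to be read."""
    HOT = "hot"
    COOL = "cool"
    ARCHIVE = "archive"


@dataclass
class Dataset:
    """A dataset owned by a data steward, not by any application.

    All names here are hypothetical; this is not any cloud provider's API.
    """
    name: str
    steward: str                 # the owner: a departmental data steward
    tier: Tier
    region: str
    readers: set[str] = field(default_factory=set)  # apps with ACCESS only
    public: bool = False         # open to everyone ("open government")

    def _require_steward(self, requester: str) -> None:
        # Only the steward may change who can see the data.
        if requester != self.steward:
            raise PermissionError(f"{requester} does not steward {self.name}")

    def grant(self, requester: str, app: str) -> None:
        """Give an application read access (never ownership)."""
        self._require_steward(requester)
        self.readers.add(app)

    def revoke(self, requester: str, app: str) -> None:
        """Withdraw an application's access; the data itself stays put."""
        self._require_steward(requester)
        self.readers.discard(app)

    def publish(self, requester: str) -> None:
        """Deliberately open the dataset to the public."""
        self._require_steward(requester)
        self.public = True

    def can_read(self, principal: str) -> bool:
        return self.public or principal == self.steward or principal in self.readers


# Usage: one dataset, one steward, many applications (all names hypothetical).
incidents = Dataset("incident-reports", steward="data-stewards",
                    tier=Tier.COOL, region="canada-central")
incidents.grant("data-stewards", "dashboard-app")
incidents.grant("data-stewards", "ml-pipeline")
incidents.revoke("data-stewards", "ml-pipeline")
print(incidents.can_read("dashboard-app"))  # True
print(incidents.can_read("ml-pipeline"))    # False
```

The design choice worth noticing: applications appear only in the `readers` set, so retiring or replacing an app never touches the data, and only the steward can turn a dataset "public".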