The problem of spreadsheets
If you live in the UK you can't have missed the email that cost £7bn. For those outside the UK, several years ago during the big pull-out of Afghanistan, someone in the MoD emailed out a spreadsheet with the details of some 19,000 people who had applied for a scheme to be evacuated to Britain, notionally because they had been collaborating with the British forces. The subsequent coverup and resettlement scheme are predicted to cost at least £7bn, or slightly over 1/3rd of the Chagos Islands Debacle.
I don't want to retread the politics here, and I'm sure that many of you have already tuned out, what I want to talk about here is the problem of spreadsheets and how it's really not as easy to fix as we (techy people) make out.
Needless to say, from "our side" of the Internet there has been a lot of condemnation of the data storage (I've engaged in it myself) used and of course it is absolutely, categorically bad to store data like that in a spreadsheet, and we have to acknowledge the other IT failings here - why was the data in a spreadsheet, in an unsecured environment where it could just be emailed? It is infuriating to learn that the MoD takes less care of such sensitive data than we keep of, say credentials for throwaway test VMs.
We've spent a lot of time, money and sanity on working out how to safely manage medical data in compute facilities at UCL and indeed have also engaged in such efforts at a national level. It's not an easy task but one would hope that the MoD where similarly engaged. Apparently not?
But again, I'm drifting off into a side quest.
Spreadsheets.
Spreadsheets are misused a lot and it's pretty common to complain that person X is using a spreadsheet when they shouldn't (and yes, I complain about this too, particularly because they are very bad at storing textual data but people will make spreadsheets with thousands of rows of that) but we don't really talk about why they are doing it.
They are doing it because we have failed them.
I've had this rant in a different context before (namely scientific misuse of Excel which has irrevocably corrupted research data sets by misinterpreting things as dates) but there I think there are at least tools to process numerical data correctly. You can use data frames and Python, or R, or MATLAB or any number of other data processing tools.
I'm not sure what a non-techy person is supposed to do when managing a list of names, addresses or other non-numerical data.
The obvious answer is "a database" to which my response would be "which database?" What databases do non-technical people have access to? I remember in the good old days, when the world was young and 32 bits was an exotic luxury, there were a lot of "consumer" databases - every "works" suite had one - ClarisWorks, Microsoft Works - there were common easy to use products like Borland dBase, Microsoft FoxPro and FileMaker Pro and more complex baby relational databases like Microsoft Access, Borland Paradox and so on.
(I have used every single one of those tools in the pre-2000s, except FoxPro)
Now what is there? All of the "easy" databases (the ones that came with the "works" packages) are gone. The midrange databases like dBase and FoxPro are gone. FileMaker Pro still exists but no-one remembers it exists and Paradox, bizarrely also still exists as part of Corel WordPerfect Office (which I bet you don't know still exists).
Microsoft Access still exists but up until recently it was only available on the more expensive tiers of Microsoft Office (it's now included on all Office 365 plans) and remains Windows only. It was never the simplest of tools on the one end, and on the other, more technical end, you will find people treat it with almost the same derision as Excel, largely because it's not something it was never meant to be - namely it is not a multi-user database server. It is, I'll confess, a slightly awkward fish, being much more powerful than, say, Works, but then not actually powerful enough for large projects. It's a bit like the Microsoft XEDOS of the database world.
Recommended by LinkedIn
There is also the rank outsider of LibreOffice Base, something that does indeed exist, but again, I would absolutely never suggest that anyone non-techy download it and install it because malvertising is now a terribly common thing (thanks Google) and in most environments it's probably a firing offense.
With both Access and LibreOffice Base you need to a) know they exist and b) be moderately techy.
It doesn't matter though, because of course the people saying you shouldn't use a spreadsheet for tables of textual data don't really want you to use Access (or Base) either. They want you to use a "real" database, one that lives on a server and you connect client applications to - MariaDB or DB2 or Postgres.
I need you to sit back and think a bit about why that might be difficult. Take your time.
OK. For those who find it difficult to put themselves in someone else’s shoes, it is absolutely not reasonable to expect normal people to stand up a Postgres server somewhere, and manage it. Indeed thanks to the prehistoric security model a lot of databases have (encryption has only just been invented and is super optional) it would be an even worse security mess than putting a spreadsheet on OneDrive.
“But that’s what IT teams are for”
Yes, yes it is. And have you considered how difficult it is to get the local overstretched IT team to set up a database in the enterprise database server for you, even to set up a table in an existing database for you? And then you are limited in what you can do to the structure? And the DBAs start asking you for a schema (whatever that is) and all you want to do was store a list of names and addresses?
And then someone needs to write an application for you or you need to use an interface even less friendly than Access. And then your colleague down the hall needs access and then they need and account and then…
(Now there is a rank spoiler in here which is that GDPR means you should really do a data risk assessment (I’d love to have seen the one for the Afghan leak) which means you should probably be at least talking to your data controller about this but let’s transport this outside of regulatory land because that’s a whole other side quest.)
How are you supposed to store that data?
Well there’s an application installed by IT on all your computers. It lays out data in a table which is how you model the data in your head. You can share it on SharePoint/OneDrive/Teams and make co-operative updates to it. You can insert columns and rows as you need.
It’s called Excel.
And that my fellow techy folks, is the root of the problem. We have failed normal people because we have not created the tools for them to do their jobs safely and, worse, destroyed the ones that did exist.
100% with you. 30 years ago as a dba I was tearing my hair out at the number of access databases etc were presented to my team with the comment can you stop it doing x or can it now do y. But no budget etc etc etc. I hated all user databases since. But you're correct I could offer no alternative decent solution that worked and was secure and above all cheap to implement
Owain I'm with you
Great write up Owain Kenway. I always think it was a huge mistake when MS dropped the Excel binding to SharePoint lists for editing. The remaining ability to link to and present the data maintained elsewhere into Excel is still very powerful. Set the data connection correctly and the data is cleared on closing and refreshed again on re-opening, and when updated. Upshot for the MoD fiasco would have been emailing an empty spreadsheet.
I do love a good spreadsheet though..