Using Excel as 'My First Database' - Not!
I'll admit it; I'm old enough to remember reading the Janet and John books (in the USA think Alice and Jerry) when I was at infant school, as they were called back then. And I'm sure we all remember 'My First …' whatever it was; a bicycle, a pet, a toy. But they're never quite as good as the grown up equivalents, Janet and John is not Shakespeare and a Play-Doh oven is never going to produce haute cuisine.
And by the time he's done with it, oh and renamed his copy, then Spot the Dog has a go too (though there's a good chance that he'll make a better job of it).
And I'm sure many of us remember our first foray into the world of spread sheets (and yes, mine was Lotus 123). And aren't they great! They certainly are, but only for what they're intended to be used for, which essentially is maths and calculations. They're not word processors, though some use them as such, they're not presentation tools, though some use them as such, they're not collaboration tools, though some try and use them as such too. I well remember the time a Partner at a law firm furiously demanded that the IT Department 'force' standard Excel to be able to be updated by more than one person at a time because otherwise a deadline would be missed (of course we used a bit of judicious copy 'n' paste and hit the deadline).
But why was that demand made in the first place? It was because a client had supplied a list of property details which needed to be updated; and they'd sent it in a spread sheet. If often happens; Excel is ubiquitous, well known, has a built in GUI and its layout has rows and columns in a 'table'. So it very often gets used as 'My First Database'. But like Janet and John, it's not exactly grown up – sure, it's very fit for its intended purpose, but that purpose is not to be a database. And using it as such can often lead to issues and wasted time… which I'll come on to.
But first, I should mention that Microsoft is not entirely without fault here. MS Access was fine, and was good because it was very easy to create a UI and yet enforced data types - but it's largely gone the way of the Dodo now for good reasons I won't dwell on here. And because Excel is so ubiquitous then over the years MS have added features to the product that have encouraged users (lacking Access) to try and use it as a Janet and John DB primer; I refer particularly to external data connections. Yes it's a good idea to be able to get data into Excel from external sources, but unfortunately it encourages people to think of Excel as a database in itself, and that’s where the problems start.
So how does using Excel as 'My First Database' cause us issues? Well in and of itself, it doesn't really – if things stay in Excel then that’s fine; it's when the Excel 'data' wants to start playing with the grown-ups that the problems start and by grown-ups I mean 'real' databases.
- Formatting – getting well formatted data, of particular data types, from 'grown-up' databases into Excel is easy. And then users want it uploaded back to the 'grown up' system. But oops! They've added or deleted some columns, and really, really needed to add some comments into that value column, or change that numeric to currency because they wanted to see a currency symbol when they printed it.
- Excel itself doesn't help here. Paste a text value in and then spend an hour wondering where all the damned CR/LFs are coming from!
- Collaboration or lack thereof – So not only does Janet make all those formatting changes, she passed the spread sheet (or even worse, a copy of said spread sheet) to John, who also does the same. And by the time he's done with it, oh and renamed his copy, then Spot the Dog has a go too (though there's a good chance that he'll make a better job of it).
- Hidden data repositories. With GDRP just around the corner then this one really speaks for itself. What information do we have lurking undiscovered in the Janet and John data silo that shouldn't really be there?
- It's too easy to start going down the wrong track. Because Excel is on almost every desktop, then it's too easy to start using it for a purpose it's not suited too. But unfortunately by the time users realise that and start shouting for help from the adults (the IT department), then it’s too late. They already have reams of badly formatted and uncontrolled data that can't easily be ported.
None of these issues are real 'show stoppers', but they can be time consuming and I've spent many a happy and expensive hour re-formatting data and having to clean up data so that it can actually become useful. And having to do so, despite informing the users in advance what they should do as best practice.
Now there are ways to mitigate some of this; locked and protected areas, controlled process and working practices, using a DMS to index contents etc. but these methods cannot mitigate against the last point. The fact is that Excel is really not intended to be used as a database (but yes, I appreciate, it's just so damned easy!).
So, what advice would I give around using Excel as 'My First Database'.
- Firstly – Don't. At least not without considering other options, such as; could a SharePoint list (which you can use collaboratively) do the same thing? Does the business need that I'm trying to address have a timescale that would allow us to develop something better (with appropriately skilled resource, an 'Access' type front end can easily be developed and deployed against a 'grown up' database as long as there is no convoluted business logic or validation required)?
- Secondly, before you decide to start in Excel, get some advice before you start down a course you'll live to regret.
- Lastly if you do absolutely need to use Excel as a data repository, for example if a client demands it:
- Maintain consistent formatting and use protected areas if required, particularly if the data has come from an external source where it will need to go back to.
- Use a DMS to control versioning and access. Additionally this will index the contents and make then more easily discoverable for GDPR purposes.
- KISS – Keep it Simple. Only values where values should be and Absolute Values please, no merged cells, only one row of column headings, no hidden columns. Keep it like a simple table; no vases of flowers on top to make it look pretty.
- Remember that CSV is your friend. When importing data from 'Excel', it's so much easier to import from a CSV. Users should save as a CSV and then re-format all values to 'General' numbers and save again. And don't allow Excel to open the CSV once it's saved in order to check it, as it has a nasty habit of re-formatting again - there's a great program for that already, and it's called Notepad.
Okay, so using Excel as 'My First Database' is a bit of a first world problem that only drives people like me to distraction because I have to deal with the fallout, but with a bit of forethought it's one that can be avoided entirely, or at least mitigated so as to reduce the business overheads of dealing with Janet and John (and Spot the Dog).
So glad I don't have to do that any more... dealing with spreadsheets populated by people with shoe horns and hobnail boots is tedious at best.
Thank you for this very witty article, and for the trip down memory lane. It reminded me of Harvard Graphics for some reason. Back when I worked for Amoco Oil Company, I was asked to create a bar chart for data that ended up being the number 2 shown over a period of time. I could not convince the requester that a bar chart of that particular data would not be a good idea until he actually saw the end result.
Great article! We've come a long way since Quattro & Lotus. 😀
You forgot to mention VisiCalc.... free clip on beard and snazzy check shirt with every copy....