Spreadsheet Hell

Spreadsheet Hell

The Dangerous Allure of the Spreadsheet

Spreadsheets are great. Really great. They can perform a seemingly unlimited number of calculations in a fraction of a second. There is a huge range of in-built functions from simple summation through to calculating complex polynomials. The flexible structure of sheets and grids enables data to be stored easily, and visualised as the user prefers. Features such as pivot tables and charts allow analysis and really effective presentation. And they are quick to get up and running. Need to perform a few calculations? Create a spreadsheet in a few clicks. Need to write a list? Create a new spreadsheet.

So what’s the problem with spreadsheets? Well actually there isn’t one. Spreadsheets are great (see paragraph 1 above). However there are a few things that they don’t do so well. For the purposes of this I’m going to assume we’re talking standard corporate desktop spreadsheet applications, typically the ubiquitous Microsoft Excel.

  • Spreadsheets are single-user. Whilst users can open as read-only they may only be worked on by a single user.
  • Spreadsheets have only the crudest of security and protection.
  • Spreadsheets do not have an audit trail of who changed what and when.
  • It’s pretty hard to prevent people from specifying invalid data values.
  • It’s almost impossible to prevent people from entering inconsistent data.

None of these limitations are terribly restrictive or problematic when you consider how spreadsheets are intended to be used.

The trouble with spreadsheets starts when you try to use them for something they’re not really good at. And it starts off ever so innocuously. Need a list? Create a spreadsheet.

“I need a list of environments” – create a spreadsheet. With a column for Environment Name and then probably a whole selection of other attributes about the environment. And a column saying which project is using the environment.

“I need a list of servers” – create a spreadsheet. In the server spreadsheet we create a column with “Environment Name”. So now we can track which server is assigned to which environment.

This is a typical start to Environment Management. I’ve been there and I’ve done it. More than once. Yesterday we didn’t have Environment Management. And today we do. This is good progress.

Pretty soon things start to get a bit more complicated. The trouble is that Environment Management is actually quite involved and there’s rather a lot of information that you find you need to capture. But that’s ok, we can add more columns for this data, or we can create whole new spreadsheets when we want to maintain a list of, say, applications, databases, releases and various hardware components.

After a short while of using our new Environment Management “system” (we’ve already given it a neat name, EMILY, short for "Environment Management Information LibrarY") we also find that having found what project an environment is assigned to we keep having to look up information elsewhere about the project. Who the Project Manager and Test Manager are and their phone numbers. Also what cost code the project has allocated to it. The simplest way to solve this is to add a few more columns to the environment spreadsheet. Now when we look at an environment we can immediately see all the information we need.

Joe, who fancies himself as a bit of a database expert, pipes up that we really ought to have a separate spreadsheet containing a list of projects with that information in so as to avoid duplication but he is overruled as that would still mean having to go somewhere else to look up the project details. Having filtered our environment list we just want it there, in the list, clearly visible. Joe suggests using a "vlookup" to copy the information automatically into the environments sheet but we find it unreliable when normal people need to maintain the projects and environment lists. He mentions something about buttons and macros. But nobody wants to press a button, and everyone remembers the last time someone embedded macros in a spreadsheet. It worked well (apart from the annoyance of having to consent to potentially disastrous actions whenever you opened the spreadsheet) right up until the point when the New Bloke changed the spreadsheet. Then it stopped working. And he didn’t know much about macros and the chap who had originally written the undocumented spaghetti had left. New Bloke muttered something about “probably MS Office version incompatabilities” and we never really got to the bottom of it. Or got it working again.

Anyway, I digress.

At some point it becomes evident that it’s all very well knowing who is currently using an environment but it would be jolly useful to know how long they’re going to be using it for. New projects are starting and they will need development and test environments at some point. Rather than building new environments it would be good to be able to reuse existing ones. So now we need to keep track of an environment’s schedule, or bookings.

It feels a step too far to add columns to the environment spreadsheet for the bookings as we’re not quite sure how many bookings there are going to be. And so we have to create a separate spreadsheet to hold the bookings. For each booking it lists the project name, the environment name and the From and To dates. Now we can find which environments are available for use by a project. Filter by from and to date and then scan the list, comparing the from and to date of the existing bookings with the slot we’re looking for. It’s not terribly easy because if an environment has multiple bookings we have to look at them all to find any gaps between them. But it’s possible. We even ask Joe if he can write a macro to do it but after quite a lot of head scratching and umming and ahhing he declares it beyond his abilities. And now whenever a new environment is built it has to be added to both the environment spreadsheet and the bookings spreadsheet otherwise no one will find it.

Inevitably people forget to add an environment to both spreadsheets. And sometimes they put a slightly different name in each which causes confusion. A 1 and an l may look similar but they play havoc with the filters. We also start to find that a number of bookings have started having from or to dates of TBA and ??? - “I wasn’t sure and I couldn’t get hold of the PM so I thought I’d go back to it later. And then forgot”. Occasionally a booking will disappear from the booking spreadsheet. The project who thought that they had the booking get a bit cross when the environment isn’t handed over to them on the start date. Unfortunately we can’t find out who deleted it and why (we suspect a bit of finger trouble) because there is no audit trail.

After lunch one day Simon is busy editing bookings in EMILY when he is called into a meeting that lasted the rest of the afternoon. When they couldn’t locate him one of the other guys in his team saved EMILY as EMILY2 and when Simon eventually reappeared he had to work out which changes he had saved and which he hadn’t and merge them into EMILY2 and then rename it back to EMILY. After vowing never to do it again the following week he disappears into another meeting leaving EMILY locked.

Many management reports are proving time consuming to construct as data has to be pulled from various spreadsheets and be matched or combined. It’s starting to take a few hours every week to manually pull the information together. And when we have problems – for example a server goes down – it can take quite a while to perform all the tracing through the various spreadsheets to work out the impact and for example which projects are going to be affected.

To return to the original point, spreadsheets are great - they just aren’t intended to be used as a replacement for a database. But we can see how we might easily be seduced into thinking that Environment Management is a good candidate for spreadsheets. It’s all just lists right?

One of the common problems is many-to-many relationships. We saw it above with projects and environments. A project can use many environments and an environment can be used by many projects. The solution was to create the bookings spreadsheet to avoid enormous amounts of duplication. But this is just one of the many-to-many relationships. The same exists between environments and servers (an environment contains many servers and a server may be contained within many environments) and servers and applications. And there are many more. This is beyond the capability of mere spreadsheets. It is possible to model this using spreadsheets but you effectively end up replacing a lot of the standard database functionality with manual effort - which is costly and unreliable.

When the actor in this story created those initial spreadsheets he didn’t really have much else that he could get up and running with quickly. Another option would have been to create a database with tables, joins and views. However, this is problematic for several reasons.

A simple database on its own is not that usable. The user interface is nowhere near as simple, intuitive or as fully featured as a spreadsheet. It doesn’t contain a library of formulae, and most corporate users do not have the software available to directly access databases. Spreadsheets make filtering and sorting easy but a casual database user really requires more SQL knowledge and experience than the casual database user normally has at their disposal.

A database on its own is not a viable solution, the database really needs a front-end application to be developed to be useful. At the very least a set of forms. Suddenly instead of quickly getting Environment Management up and running in a few days it now requires a small project. And if you’re going to do it well then it’s actually not that small a project. It requires quite a lot of thought, analysis and design to get the structure right for today and for the future. Maintenance becomes much harder and more expensive. The simplicity of adding another column now requires application changes.

The simplest, and yet worst, database option is to try and use Microsoft Access. But that’s a whole different story.

As an example, when I implemented our information management product Cimera for a large retail organisation a couple of years ago they were managing their Environment Management function with spreadsheets. Although “getting by” would probably be a more appropriate description. They had a dozen main spreadsheets, held in different locations, each containing multiple worksheets. Allocating a set of environments to a project would take several hours of filtering, sorting and searching to find suitable candidate environments that were free at the required times.

There was so much data duplication (or in database terms “denormalised data”) that the Environments team would have to make changes in several places. Maintaining the spreadsheets had become so onerous that people in the team responsible for the testing infrastructure would often batch up their changes and apply them at the end of the week. Immediately the spreadsheets are out of step with reality. And sometimes by the end of the week there would be higher priority work and the changes would not get made at all.

With so many spreadsheets in use carrying out any sort of audit was impossible. To audit the validity of the data would effectively require re-gathering the data and comparing the before and after. Just checking consistency within the spreadsheets would have required a number of substantial macros to be written.

When we analysed the data contained in the spreadsheets we identified over 40 different types of item that needed to be managed such as projects, environments, bookings, servers, applications, application instances, databases and infrastructure and network items (a retail test environment  contains devices such as tills, printers, scanners, routers and switches). Many of these items had multiple links to other items. After analysis we identified about 15,000 discrete objects and 30,000 links.

I asked them how accurate they thought their data was and they said over 80%.

When we loaded the data into Cimera it used the rules we’d agreed. There were many instances of invalid data –TBA, TBD, Not Sure, Unknown and a varying number of question marks were quite popular. There were also instances of data in different spreadsheets not aligning – an item that should exist in multiple places could not be matched, either mismatches due to misspellings or simply rows that did not exist.

Less than 20% of the data loaded without error. On further analysis many of the errors were fairly minor but some had more serious consequences.

Once the information started being managed by an appropriate tool efficiency dramatically improved. Finding suitable environments to match project requirements now takes minutes rather than hours and uses an interactive Gantt chart. Understanding the impact of problems or proposed changes is now just a few mouse clicks away. Information is only stored once and therefore only needs to be added or updated once. Information is now kept up to date at the time of making changes. The data has integrity and there are fewer mistakes and errors. Standard queries have been created and management can now get their own live reports based on more accurate and uptodate information.

Spreadsheets are great. But we must avoid the temptation to use them for purposes that they weren’t really intended to be used for. And it is tempting when we are all so familiar with spreadsheets. As Mark Twain said: “To a man with a hammer, everything looks like a nail”.

I migrated from spread sheets to Propel Systems Cimera at a well known telecoms company and building society in the UK. The spread sheets were at the point of breaking. There is only so much you can manage before things start to break or data input and validations becomes overwhelming. And to be honest only I understood what some of them did and how to use them. To add to that different departments had their own spreadsheets and so we had different opinions on the true story. The first time I implemented Cimera, the Propel Systems team helped me understand what they needed in order to develop a suitable system that would work for my team and other departments. As I learnt, one of the great things about Cimera is that the system, developed for you, can be easily changed and tweaked as you work out how you are really using the data rather than how you think you are using the data. I went from being at breaking point with spread sheets to having a systems that I felt I could pour data into within a matter a couple of months. It is fair to say that I fell in love with Cimera, the time savings were huge and the data accuracy night to day. What was the data being managed? Software release related; component names, versions, environments, deployments, people related with deployments, defects, incidents, RFCs. You name it. So much more data than had been possible with the spread sheets. So much less labor intensive, especially if you take data feeds from other sources e.g. Remedy and Quality Center.

Like
Reply

I have re-read the post again. I agree with the challenges of environment management and I agree that spreadsheets in this case are not the best tool for end to end environment management. The Cimera application looks promising, I will see what it offers :)

Like
Reply

This article is too long and jumps all over the place. The content is great but it could be shrinked into the dos and donts of using spreadsheets. The content is great, just needs to be a little more succinct.

Like
Reply

Finally! I absolutely agree. Use products (like spreadsheets) for what they should be used for. Excellent article.

To view or add a comment, sign in

Others also viewed

Explore content categories