Excel Formulae: Sheet References and Data Quality Risks
When using Excel you'll often have need to reference cells on other sheets within the same workbook, and be used to seeing these references in your formulae, e.g.:
- =Sheet1!B2 (where there are no spaces in the sheet name, the sheet name will be followed by an exclamation mark)
- ='Sample Date'!B3 (where there are spaces in the sheet name, it is surrounded by single quotation marks before the exclamation mark)
These sheet references can cause data quality issues when referencing cells on the same sheet, most commonly seen when linking to cells from more than one sheet in the same formula: If a cell on a different sheet is referenced, when returning to the original sheet to add subsequent cell references to the same formula these will also have a sheet reference assigned. Here are two instances where such same-sheet references can cause problems.
1. Copying formulae to new sheets
In this simple example the two formulae initially produce the same result.
The first includes a same-sheet reference, the second is the 'usual' format you would expect when linking to a cell on the same sheet. Now look what happens when these cells are copied and pasted into a new sheet:
The same-sheet reference in B4 is unchanged and refers to the original Sheet1. When the value in cell B2 is changed on Sheet2, only cell B5 shows the correct answer to the calculation. Whilst there could be circumstances when this is the desired result, care must be taken.
2. Sorting data containing formulae
If you have a data set containing cells with same-sheet references, sorting the data can cause serious issues. In the sheet above there are 3 columns of data:
- Column B contains hardcoded values
- Column C cells contain a link to the cell on the same row in column B
- Column D cells contain a link to the cell on the same row in column B, but also include a sheet reference
By sorting the table of data into ascending order based on values in column B we can see how problems arise:
Because the cells in column D contain a sheet reference they 'follow' the value they were originally linked to whereas the formula itself doesn't move. As an example, the formula in cell D2 now refers to cell B5 instead of B2. D9 is the only correct cell, because the absolute value of 40 hasn't moved to a new row - more luck than judgement!
If you relied on this table of data for subsequent analysis, for example with SUMIF or VLOOKUP, sorting the data would give incorrect results.
How to avoid same-sheet reference issues?
Simply ensure same-sheet name references are not included. If your formulae are already in place, do a find and replace in Excel (CTRL + H) and replace the same-sheet name reference with a null (blank) value. Using the List sheet example:
Also, when receive a workbook from someone else containing tables of data with formulae in, check for sorting errors!
Surely, if you were concerned about legibility you shouldn't be using referencing by relative worksheet location (A1 notation) in the first place. The concept, though it be the default convention of spreadsheets, is simply an accident looking for somewhere to happen. The difference in behaviour with or without the sheet name is interesting but is merely one more trap for the unwary. I would recommend Names every time. For imported data the List/Table gives a means of reference that is both dynamic and meaningful. Within the table = 2 * [@Value] is clear. Externally to the table, the possibilities are = 2 * List[@Value] or, better still, with dynamic arrays = 2 * List[Value] returns the whole range from a formula that only exists within a single cell. It could be wrong but it cannot be inconsistent; if fact, none of these alternatives ever give rise to inconsistent formulas down the column. According to the Excel version you are using, either the whole output range should be named or, with DA, the formula cell can be named 'double' (say) and the array would be 'double#'.
using the current sheet name in the formula increases the formula's illegibility