Human-Centric Approach to Programming Errors
To deliver quality, you need to control the process.
Introduction
Software fails. It fails because software is written by people. Yet, how often do we consciously take a human-centric perspective on bugs?
We know a lot about how errors are made, but this knowledge tends to stay in the realm of interaction design and safety-critical systems. As a software developer, I find that unfortunate; every programmer can benefit from knowing their shortcomings. Thus, inspired by Donald Norman’s “The Design of Everyday Things”, I want to bring to your attention the interaction between a programmer and their tools, how errors are made, and why, without a proper development environment, we cannot even begin to talk about software quality.
Basics of Interaction
One of the key concepts of how people interact with the outside world, according to Norman, is bridging the gulfs of execution and evaluation. In other words, people have to carry out some action and check whether it accomplished their goals. Both of these tasks are non-trivial.
Gulfs of Execution and Evaluation
This model is useful for understanding the issues that your users are facing. Imagine that you are writing a function for your fellow programmers that communicates its success via error codes. When they call it, is it easy to perceive whether the function succeeded, or can a programmer forget to check the return code? Is it easy to interpret whether 0 means success or failure? How can the programmer be sure that their goals were met?
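As a rough illustration of those questions, here is a minimal C++17 sketch of a function that makes its outcome harder to ignore and easier to interpret. The names `ParseStatus` and `parse_port` are hypothetical and exist only for this example.

```cpp
#include <string>

// Hypothetical status type. Named values remove the "does 0 mean success?" question,
// and [[nodiscard]] asks the compiler to warn when a caller ignores the result.
enum class [[nodiscard]] ParseStatus { Ok, Empty, NotANumber, OutOfRange };

// Hypothetical function: parses a TCP port number from text into `port`.
// `port` is only written when the function returns ParseStatus::Ok.
ParseStatus parse_port(const std::string& text, int& port) {
    if (text.empty()) return ParseStatus::Empty;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return ParseStatus::NotANumber;
        value = value * 10 + (c - '0');
        if (value > 65535) return ParseStatus::OutOfRange;
    }
    port = value;
    return ParseStatus::Ok;
}
```

A caller who writes `parse_port(text, port);` and discards the result should get a compiler warning, and a caller who does check it compares against `ParseStatus::Ok` rather than a bare number.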
As a designer, you should ask such questions about every aspect of your code because it will be used by others. But as a programmer, you yourself are interacting with your tools and the codebase, and thus you err when making changes and analysing their impact. Being responsible both for yourself and for others, you need to pay double the amount of attention to what kinds of mistakes people make.
Types of Error
There are two large groups of error: action (called slips) and thinking (called mistakes). They occur at different stages of execution and evaluation:
Occurrence of mistakes and slips during interaction.
Slips can be further classified into three categories:
- Capture: when instead of a desired activity you perform the most common one, like adding a semicolon in a language that does not require it.
- Description similarity: acting upon items similar to the target. A real-life example would be throwing dirty laundry into the rubbish bin instead of the laundry basket. A code example would be using the median when intending to use the mean.
- Mode error: when a device has different states in which the same control has different meanings, and we fail to recognise which mode we are in. The vim text editor is particularly prone to this: lowercase and uppercase letters are completely different commands, so if Caps Lock is hit accidentally, havoc ensues. Stateful services are also prone to mode errors: the same call leads to different outcomes depending on the past execution path, which is often difficult to track.
A capture slip if no malign intentions were present.
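The mode error in a stateful service, mentioned in the list above, can be sketched in a few lines of C++. The `Journal` class is hypothetical; it only shows how the same call can mean two different things depending on hidden state.

```cpp
#include <string>
#include <vector>

// Hypothetical stateful service: what log() does depends on a hidden mode.
class Journal {
public:
    void begin_batch() { batching_ = true; }

    // Leaves batch mode and writes everything that was held back.
    void end_batch() {
        batching_ = false;
        written_.insert(written_.end(), pending_.begin(), pending_.end());
        pending_.clear();
    }

    // Same call, two meanings: written immediately, or held until end_batch().
    void log(const std::string& line) {
        (batching_ ? pending_ : written_).push_back(line);
    }

    const std::vector<std::string>& written() const { return written_; }

private:
    bool batching_ = false;
    std::vector<std::string> pending_;
    std::vector<std::string> written_;
};
```

A caller who forgets that `begin_batch()` was called earlier will see `log()` apparently do nothing, which is exactly the kind of slip that is hard to notice from the call site alone.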
Slips usually occur when we are performing familiar actions. When the situation is less familiar, we need to choose our actions consciously, based on some rules, or if there are none — general knowledge. This is when mistakes occur.
Some mistakes are relatively easy to spot. If we decide on the right goal but mistakenly apply a wrong action, we can still notice that our goals were not met. An overly specific example of such a mistake is calling virtual functions from constructors in C++: while the base class is being constructed, the call does not dispatch to the derived class's override. However, when our goals are wrong but the actions are correct for those goals, detecting failure can become extremely challenging.
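The C++ example can be made concrete with a minimal sketch. The classes `Base` and `Derived` are hypothetical; the behaviour shown is standard C++.

```cpp
#include <string>

struct Base {
    std::string name_seen_in_ctor;

    // While Base is being constructed, the object is not yet a Derived,
    // so this call resolves to Base::name() even when building a Derived.
    Base() { name_seen_in_ctor = name(); }

    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};
```

After `Derived d;`, `d.name_seen_in_ctor` holds `"Base"`, while `d.name()` returns `"Derived"`. The mismatch is visible as soon as the result is inspected, which is why this kind of mistake is comparatively easy to catch.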
On top of slips and mistakes, we have memory lapses. These can occur at every stage of bridging the gulfs of execution and evaluation, during the transitions between phases. For example, when managing resources manually, we can forget to release them, even though we know that we should. This is a memory-lapse slip. Or, when designing a program, we can forget that the input data needs to be normalised, because we were interrupted during the process, and thus omit this requirement. This is a memory-lapse mistake.
Dealing with Error
Given a program, can you tell me how many bugs it has? As a general rule, no. And neither can the regulators tasked with ensuring that the software which flies our planes and runs our power stations is sound. The only thing they can really do at any meaningful scale is ensure that the process of producing this software was acceptable.
The logic here is simple, and maybe simplistic. If the manufacturing process is sound, then the end product is probably sound too. Unfortunately, this is the best tool at your disposal. While you might not have to comply with heavy tomes of industrial standards, having conscious control over your production environment is a must.
Memory lapses
In order to minimise the number of memory lapses, you need to minimise interruptions and always keep crucial information in your visual field. Get into the habit of writing lists for yourself. For example, when practising Test-Driven Development, you have to iterate between writing tests and code. Having a list of all the test cases you’d like to write will keep you from forgetting corner cases in the process.
Of course, even better than lists is removing the need to remember something in the first place. For example, in order not to leak resources, you should use the RAII idiom in C++: acquire resources in the constructor and release them in the destructor, since the destructor runs automatically when the object goes out of scope. Every time there is a pattern to the actions you have to perform, make the computer do it for you.
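A minimal sketch of RAII follows. The `Handle` class and the `open_handles` counter are hypothetical stand-ins for a real resource such as a file, lock, or socket.

```cpp
#include <stdexcept>

// Hypothetical resource bookkeeping: counts how many handles are currently held.
int open_handles = 0;

class Handle {
public:
    Handle() { ++open_handles; }     // acquire in the constructor
    ~Handle() { --open_handles; }    // release in the destructor

    // Copying is disabled so that each acquisition has exactly one release.
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
};

// Fails half-way through. Nobody has to remember to release the handle:
// the destructor runs during stack unwinding once the exception is caught.
void work_that_fails() {
    Handle h;
    throw std::runtime_error("something went wrong");
}
```

If a caller wraps `work_that_fails()` in a `try`/`catch`, `open_handles` is back to its previous value afterwards, with no release call anywhere in the code.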
Slips
The key to handling slips is feedback. In the first Lithuanian Olympiad in Informatics, you had to code on paper. Then, you would submit the solution to a typist in front of a terminal, who would type your solution in, letter for letter with all the typos, and after a while — give you a printout with compiler errors. The programmers of the past were dedicated, but not very productive.
Make sure you’ve moved on since then. Unit tests that you run during development and a continuous integration pipeline that is triggered automatically by version control are not optional. The alternative of having feedback once per release cycle is worse than waiting for a human typist. There are no excuses for this in 2016.
Use all the automated help you can get: static analysers, linters, compiler warnings. But as the second line of defence you still need people. Do code reviews rigorously. It can be particularly uncomfortable for a junior to review a senior developer’s code and demand changes. However, you have to remember that slips are mostly made by skilled individuals, precisely because they are skilled to the point of acting automatically.
Keep code review differentials short, though. This will both allow you to move in shorter iterations and make the reviews meaningful. Reviewing a 1,000-line diff is a mere formality for all but the most meticulous developers.
Finally, to minimise slips, you need to be consistent. Consistency should be a theme for everything: error handling, code style, etc. Most inconsistency stems from a lack of thought. Be a thoughtful programmer.
Mistakes
Mistakes are probably the most challenging to overcome. You need a combination of training and decision aids. You also need to understand what you are doing. In huge projects this is not always possible, so your code has to stay small and self-contained, with a clear purpose. In many safety-critical systems, you have to be able to prove, with documentation, that every line of your code is there to implement a certain requirement. If you would not know where to begin when asked to do something similar, many mistakes have already been made.
Code is full of ephemeral assumptions. Make them real. Assert everything you can, and communicate the rest via types and naming. For example, if you expect an element to be present in a container, assert that the container is not empty. If you expect a container to be sorted, you should probably name it accordingly. This is much weaker than an assertion, but still much stronger than knowledge captured only in your head.
In general, code is mostly read and only occasionally modified. Any misinterpretation will lead to difficult-to-trace mistakes. Thus, you should take professional pride in writing clean code. Uncle Bob’s “Clean Code” is still one of the best books on this.
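The two assumptions mentioned above, "not empty" and "sorted", can be sketched as follows. `SortedInts` and `largest` are hypothetical names, and wrapping the container in a type goes one step beyond naming it; treat it as one possible approach rather than the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical wrapper: the type name records the "sorted" assumption,
// and the constructor checks it once, where the data enters.
class SortedInts {
public:
    explicit SortedInts(std::vector<int> values) : values_(std::move(values)) {
        assert(std::is_sorted(values_.begin(), values_.end()));
    }

    // Binary search is only valid on sorted data, which the type guarantees.
    bool contains(int x) const {
        return std::binary_search(values_.begin(), values_.end(), x);
    }

private:
    std::vector<int> values_;
};

// The precondition "not empty" is asserted instead of living in someone's head.
int largest(const std::vector<int>& values) {
    assert(!values.empty());
    return *std::max_element(values.begin(), values.end());
}
```

Note that `assert` is compiled out when `NDEBUG` is defined, so these checks document and test assumptions during development rather than validate untrusted input in production.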
Conclusion
Software is the most complex thing we humans produce, often needlessly so. We need all the help we can get in order to write correct software.
While most people agree that an efficient work environment and coding discipline are good, the emphasis is usually on how they affect productivity. However, the quality of software suffers far more from their absence. Until programs are written by robots, the lack of rapid feedback, coupled with a focus on solving problems as quickly as possible instead of maximising clarity and documenting intent, will ensure that errors are made. And after a few rounds of BDD (bug-driven development), another monster will be born.
As engineers and professionals, let’s not forget: there are standards we have to live up to. And then let’s practise what we preach.
I originally published this article on my blog on Medium. The opinions expressed here represent my own and not those of my employer.