Debugging
At Last the 1948 Show: The Four Yorkshiremen

I was reading an interesting article I found on Hacker News about a bug in Google Docs that took the engineer days to track down. This is not an unusual experience in software development, and the comments section on Hacker News for this article is replete with war story contributions and variations on "you thought that was difficult? We had...".

Having read the article though, I thought about two things:

  1. How could this have been easier to debug?
  2. How could the problem have been found earlier?

Debugging

The first challenge is often just getting a reproduction. Once you have this, you can start debugging!

The author describes using a debugger, setting breakpoints and adding logging: everything you would expect. Even so, the engineer constantly had to rerun the test case as they moved the breakpoints further back in the code. Admittedly, the incident was 12 years ago, so maybe the tooling just wasn't there. Today, something like Undo's time travel debugger would have made this much easier.

If you had been using Antithesis's deterministic simulation environment, reproduction would have been taken care of, and you could then have used their interactive debugging to investigate further.

Testing

As for finding the problem earlier: I'm sure a lot of readers were asking why this wasn't caught by something in the test pyramid (unit test, system test and so forth)?

To recreate the problem, the engineer created a 50 page Google Doc, then scripted bolding and unbolding the entire document repeatedly until the problem manifested.
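
A scripted reproduction of this kind can be sketched as a loop that repeats an edit and checks an invariant after every step. The sketch below is only an illustration: ToyDoc, its planted bug and the threshold of 5000 are all hypothetical, and nothing here reflects the real Google Docs code or the actual cause of the bug in the article.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a document; not the real Google Docs model."""
    pages: int
    bold: bool = False
    history: list = field(default_factory=list)  # grows on every edit

    def set_bold(self, value: bool) -> None:
        self.bold = value
        self.history.append(value)
        # Deliberately planted toy bug (an assumption for illustration):
        # once enough edits have accumulated, the formatting is flipped.
        if len(self.history) * self.pages > 5000:
            self.bold = not value


def repeat_until_failure(step: Callable[[int], bool],
                         max_iterations: int) -> Optional[int]:
    """Run step(i) until it returns False; return that iteration, or None."""
    for i in range(max_iterations):
        if not step(i):
            return i
    return None


def make_step(doc: ToyDoc) -> Callable[[int], bool]:
    def step(i: int) -> bool:
        want = (i % 2 == 0)      # bold on even iterations, unbold on odd ones
        doc.set_bold(want)
        return doc.bold == want  # the check: did the formatting stick?
    return step


failing = repeat_until_failure(make_step(ToyDoc(pages=50)), max_iterations=1000)
small = repeat_until_failure(make_step(ToyDoc(pages=1)), max_iterations=1000)
print(failing, small)  # prints: 100 None
```

In this toy, the 50 page document fails on iteration 100, while the 1 page document survives all 1000 iterations. The point is only that the size of the input and the number of repetitions can both decide whether a scripted reproduction ever trips the bug.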

To me, the reproduction case highlights why hand-written tests will often fail to detect this kind of bug: engineer-led, or even QA-led, test cases like this are hard to conceive of a priori, because they just don't seem reasonable. How many variations would be necessary? Would a 1 page Google Doc work just as well, only taking longer? What if you put a 'wait' in between cycles of changing the formatting? Is it just bold formatting? What mix of formatting changes should we have?

Clearly it is just too complex to imagine someone having thought of all this and coded it into test cases.

Is the answer generating the test cases? Maybe. There are a ton of tools out there that use generative AI to create test cases, probably using something like code coverage as a guide to whether sufficient test cases have been created. You now have thousands of test cases — good thing you have test automation in place!

Would those generated test cases have recreated the reproduction conditions? Hard to say with certainty, but I doubt it.

Personally, I think model- and property-based testing combined with fuzzing (see The Fuzzing Book) would give better results, having seen this work extremely well at Antithesis. Combine this with engineering approaches like 'buggification' that FoundationDB uses.
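
As a rough illustration of the model-based idea, the sketch below applies random bold and unbold edits to two implementations and compares them after every step: a trivially simple model (one flag per character) and a more compact span-based implementation. Everything here is hypothetical: ModelDoc, SpanDoc and the planted `buggy` switch are assumptions for the example, and only the standard library is used. Dedicated tools such as Hypothesis add features like shrinking the failing sequence, which this sketch omits.

```python
import random


class ModelDoc:
    """Reference model: one bold flag per character. Slow but obviously right."""
    def __init__(self, length):
        self.bold = [False] * length

    def apply(self, op, start, end):
        for i in range(start, end):
            self.bold[i] = (op == "bold")

    def flags(self):
        return list(self.bold)


class SpanDoc:
    """Implementation under test: sorted, non-overlapping [start, end) spans."""
    def __init__(self, length, buggy=False):
        self.length = length
        self.spans = []
        self.buggy = buggy  # planted bug, for illustration only

    def apply(self, op, start, end):
        kept = []
        for s, e in self.spans:
            if e <= start or s >= end:
                kept.append((s, e))      # span untouched by this edit
                continue
            if s < start:
                kept.append((s, start))  # part left of the edit survives
            # Planted bug: forget the right-hand part when the edit lands
            # strictly inside an existing span.
            if e > end and not (self.buggy and s < start):
                kept.append((end, e))    # part right of the edit survives
        if op == "bold":
            kept.append((start, end))
        self.spans = sorted(kept)

    def flags(self):
        out = [False] * self.length
        for s, e in self.spans:
            for i in range(s, e):
                out[i] = True
        return out


def run_case(seed, length=20, steps=30, buggy=False):
    """Return the failing operation sequence for this seed, or None."""
    rng = random.Random(seed)
    model, sut = ModelDoc(length), SpanDoc(length, buggy=buggy)
    ops = []
    for _ in range(steps):
        op = rng.choice(["bold", "unbold"])
        start = rng.randrange(0, length)
        end = rng.randrange(start + 1, length + 1)
        ops.append((op, start, end))
        model.apply(op, start, end)
        sut.apply(op, start, end)
        if sut.flags() != model.flags():  # the property: both must agree
            return ops
    return None


def find_failure(seeds, buggy):
    for seed in seeds:
        ops = run_case(seed, buggy=buggy)
        if ops is not None:
            return seed, ops
    return None


print(find_failure(range(200), buggy=False))  # correct version: no failure found
hit = find_failure(range(200), buggy=True)
if hit is not None:
    print("seed", hit[0], "fails after", len(hit[1]), "operations")
```

Nobody had to think of the failing sequence in advance, and because the seed is reported, the failure can be replayed exactly.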

Similarly, being in a position to run a simulation yields incredible benefits: see TigerBeetle on simulation testing and "Designing Dope Distributed Systems for Outer Space with High-fidelity Simulation" — amazing talk from the last Strange Loop conference.
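
To give a feel for why simulation helps with reproduction, here is a minimal sketch of a seeded, single-threaded simulation with a virtual clock and injected faults. It is a toy under stated assumptions: the `simulate` function, the message model and the drop rate are all hypothetical, and it does not represent how TigerBeetle, FoundationDB or Antithesis implement their simulators. The key idea is that every source of randomness, including the faults, comes from one seed.

```python
import heapq
import random


def simulate(seed, messages=5, drop_rate=0.2):
    """Toy message-delivery simulation; all randomness derives from `seed`."""
    rng = random.Random(seed)
    queue = []  # min-heap of (virtual_time, message_id)
    trace = []
    for msg in range(messages):
        delay = rng.randint(1, 10)  # simulated network latency
        heapq.heappush(queue, (delay, msg))
    while queue:
        now, msg = heapq.heappop(queue)  # advance the virtual clock
        if rng.random() < drop_rate:
            # Injected fault: the message is lost and retried later.
            trace.append((now, msg, "dropped"))
            heapq.heappush(queue, (now + rng.randint(1, 10), msg))
        else:
            trace.append((now, msg, "delivered"))
    return trace


first = simulate(seed=42)
second = simulate(seed=42)
print(first == second)  # prints: True
```

Because no wall-clock time or real network is involved, a failing run can be replayed as often as needed from its seed, which is the property that makes the "getting a reproduction" step far less painful.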

Conclusion

When you next face a "this took me ages to debug" problem, and you will, because software is complex and complicated, hopefully these pointers will help.

Better debuggers on the day will bring immediate benefit.

Learning from the bugs and improving testability, which cannot be built in a day, should be the goal. It is worth considering retrofitting testing changes, or at least being aware of these options before starting the next major component build or rewrite.

Aside: The header image is a frame from the Four Yorkshiremen sketch, in which each man claims to have had a worse childhood than the last: "We lived in a box in the middle of the road..."

More articles by Marcus Edwards