The Bug
Oil field pump jack, By Sanjay Acharya - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=7935275

One of my former students asked about the trickiest bug anyone had run into -- and that got me reminiscing about The Bug. I have some tricky code that I need to write, but I don't want to do it, so I'll stall by rambling about The Bug for a few minutes.

Back in the late 1980s, when I had a brand-spanking-new BS degree in Computer Science, I went to work for a startup founded by some of my college friends. We were putting small microprocessor-powered control units onto oil pump jacks, which were connected by radio to a base station PC (running MS-DOS!) back at the offices. The controllers (Texas Instruments Remote Telemetry Units, or RTUs) would turn the pump jacks on and off to maximize production, reduce the chances of equipment failure, and so on.

We had solar panels to charge batteries, which then supplied the power to the control units and radios. There's not generally a handy plug-in strip in the middle of an oil field in Wyoming.

Everything was working fine until around October. Then, on some mornings, we'd come into the office, and find the base station PC had rebooted. All the data from the night before was lost. Your friendly MS-DOS PC just staring at you, with no clue at all what had happened or why.

For months, we'd sit by the PC, late into the night, waiting for the machine to reboot. And it wouldn't. Unless we got tired of waiting, decided it wasn't going to happen, and then went to bed. After everyone was sleeping, sometimes it would reboot. Did I mention that this was in Wyoming? Not even in a nice part of Wyoming; Casper the Friendly Ghost Town was a bustling metropolis compared to where we were. We were in One-Horse-Oilfield-Shack, in the middle of winter. Yeah, good times.

This was The Bug. And it kicked our butts for quite a while, until Greg figured it out. I remember when that happened. Greg was in the office next to mine, and I heard a short blast of high-volume profanity (trust me, oil field swearing can be pretty intense), followed by a coffee mug smashing against a wall. Greg was a little bit pissed off. But he found The Bug.

Here's the sequence of events for The Bug.....

On a cold and cloudy day, the solar panels wouldn't gather much energy. If that happened for a few days in a row, the batteries would run low. When the batteries ran low, the radio would still turn on to transmit from the control unit, but it only sent zeroes (seriously?!?). If the number of bytes sent was odd, the message would pass the simple RTU checksum verification (thanks a lot, Texas Instruments RTU protocol developers) and be treated as a valid message. The base station would then take this message and update the entry it named (that's entry zero) in the database. Entry zero was not a valid spot in the database, though, so the update turned into a write to the operating system memory space.
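For anyone who wants to see how an all-zero message can look legitimate, here is a minimal sketch. It is not the real TI RTU protocol: the additive checksum, the frame layout (first byte names the entry, last byte is the checksum), and every function name are assumptions made up for illustration, and the sketch does not model the odd-length condition from the story.

```python
def checksum(payload: bytes) -> int:
    """Hypothetical additive checksum: sum of the payload bytes, modulo 256."""
    return sum(payload) % 256


def is_valid_frame(frame: bytes) -> bool:
    """Assumed layout: the last byte is the checksum of everything before it."""
    if len(frame) < 2:
        return False
    return checksum(frame[:-1]) == frame[-1]


def entry_index(frame: bytes) -> int:
    """Assumed layout: the first byte names the database entry to update."""
    return frame[0]


# A radio running on a nearly dead battery sends nothing but zeroes.
dead_battery_frame = bytes(7)

# The zero payload sums to zero, which matches the zero checksum byte.
frame_ok = is_valid_frame(dead_battery_frame)

# The "valid" frame then asks the base station to update entry zero.
target_entry = entry_index(dead_battery_frame)
```

With these assumptions, `frame_ok` is true and `target_entry` is 0: a frame that carries no information at all passes verification and points at an entry that doesn't exist, which is the combination the story describes.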

Y'know one of the advantages of Windows over MS-DOS? That would be memory protection. On a DOS box, you could change pretty much any chunk of memory at any time. Wham, bam, blue screen of death, ma'am.

Long story short, I don't trust batteries, and I'm not particularly fond of Wyoming. If we had coded everything in Rust, we could have avoided The Bug entirely.

Now that the story is done, I should jump onto the tricky code that I need to write. If I get the frames of reference exactly right, it'll be tight, clean, elegant, work immediately, and be awesome. Or it could be a few days of weeding through debug output to figure out what I screwed up. Potentially followed by a blast of profanity, but I don't usually throw my coffee mug. And I'm not in Wyoming, so that's cool.

Hmmm. I should make a cup of coffee first, before I jump into the tricky code. Yeah, that's a good idea. No, I'm not stalling. A cup of coffee will help.

I am the "Greg" in this story. I am so glad Patrick remembers the particulars so well! I had forgotten some of them. One moral is: don't trust any data that originates outside your program. Try to have an active imagination about how bad the worst thing that could happen actually is, and validate, validate! It was so embarrassing to realize how gullible I'd been. Even the most rudimentary sanity check on the received bytes would have prevented the problem. Something like Ada's range-constrained integral types or Chapel's range types could even have done it without the clutter of explicit checks in the code, but none of that was available to us back then. Aside: you probably shouldn't even trust data that originates INside your program, but then where does it all end? After our little startup cratered, I went to work at Cray for a nice long career working mostly on compilers and runtime libraries for parallel programming models. I wrote some more wild store bugs (argh! darn it!), but at least they only affected (my own) user space, and none of them were as diabolical as The Bug.
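A rough idea of what that "rudimentary sanity check" might look like, as a sketch only. The valid entry range (1 to 64), the frame layout (first byte names the entry), and all of the names here are hypothetical; the original system's limits aren't given in the story.

```python
class FrameRejected(ValueError):
    """Raised when a received frame fails a sanity check."""


# Hypothetical bounds for valid database entries; entry 0 is deliberately excluded.
FIRST_ENTRY = 1
LAST_ENTRY = 64


def checked_entry_index(frame: bytes) -> int:
    """Return the entry index named by the frame, or raise FrameRejected.

    Assumes the first byte of the frame names the database entry.
    """
    if not frame:
        raise FrameRejected("empty frame")
    if not any(frame):
        # Every byte is zero: almost certainly a dying radio, not real data.
        raise FrameRejected("frame is all zeroes")
    index = frame[0]
    if not FIRST_ENTRY <= index <= LAST_ENTRY:
        raise FrameRejected(
            f"entry {index} is outside {FIRST_ENTRY}..{LAST_ENTRY}"
        )
    return index
```

The range check plays the role that a range-constrained type would play in Ada or Chapel: the out-of-range index is refused before it can ever be used to address the database.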

What's the tricky code about?

Always made my day hearing stories like this in cs375 😂
