Frankenstein's Monster and the Problem of Solving Problems
Imagine for a moment a world where every time you go to the doctor's office, the doctor lays you out on a table, flips a couple of latches on your side, opens you up like a violin case, and goes to work fixing whatever issue you came in with. The doctor can turn you off, restart you, connect diagnostic machinery to specific organs -- whatever helps. At the end, he closes you up, secures the latches, and sends you on your way.
To me, this actually sounds wonderful.
The problem, of course, is that our bodies don't work that way. You can't just shut us off, put us in diagnostic mode, or really even open us up in any kind of meaningful way without difficulty or consequence. Modern medicine has had to learn how to work around these limitations and engineer solutions. We've developed diagnostic tooling -- everything from the humble, iconic stethoscope to feats of engineering like MRI machines -- and we've had to discover non-invasive therapies and medicines for treating issues once discovered and properly diagnosed.
But I'm not a doctor. I'm an engineer. If you're reading this, you probably are too. This is the best metaphor my tired mind could conjure while driving into the office today to answer a question that's been bothering me lately.
The long, jargon-riddled version of the question is this: "Why is it so difficult to effectively communicate the value proposition of compliant, production-ready diagnostic and support tooling to product teams?"
Put much more simply: "Why do product teams often overlook the need for tools required to support their product in production until after they've deployed to production?"
The second question is perhaps a generalization, bordering on stereotyping. The first is hard on the eyes but more accurate. Regardless, the conclusion I arrived at, which was something of an epiphany for me but which may (or may not) sound obvious to you, is this:
Solving a problem and solving the problem of solving problems are two very different problems.
If your company is like the company I work for, you have compliance restrictions on access to production environments. Returning to the metaphor, your production environment is a human. Humans and production environments cannot stop. There is no maintenance mode. You cannot simply unlatch it, open it up like a violin case, and do whatever you want when it's not working right. As with a human, you cannot open it up in any kind of meaningful way without difficulty or consequence.
But here's where it gets interesting: unlike real humans, who grow up and mature in the state we're comparing to a production software environment, we build our humans in a lab.
When we build them, we have the ability to do anything we want with them. No system is a black box. Debuggers can be hooked up at will. Free access to the data layer is a given. We flay the beast and watch every organ as it functions -- and it's all allowed. So we come to a place where we've trained ourselves to solve problems in a lab, but when Frankenstein's monster comes alive and walks out the door we cannot put it back on the table.
We try, though.
We try to reproduce errors observed in production in our dev environments, flailing blindly, hoping to stumble across the root cause, often armed with only two data points (the version of our deployed application and the reported error state) and our intuition to connect them. But it's not the same. It's not the same, and the whole time, while we're torturing the clone in our lab, our monster is out burning villages.
And we do this, at least in part, because we're myopic, obsessed parents coming to the aid of a child in distress, laboring under the illusion that these crises are -- and will be -- rare.
To come back to the epiphany that probably confused you, what I'm trying to articulate is that the problem of fixing immediate software issues and the problem of providing the capability to do so in production are two very different problems, and as product developers, we tend to only think about (or focus on) one of them: the former.
Put in the terms of the metaphor, when we are tasked with building our human, we are also effectively being tasked with building an MRI, a stethoscope, and everything in between. We are tasked with discovering medicines and non-invasive treatments. In our minds these tasks are entirely different, but both are required for success.
Fundamentally:
Building Frankenstein's monster is solving a problem, and building the MRI is solving the entirely different problem of how we solve problems with the monster.
So how do we express the value proposition of magnetic resonance imaging and penicillin to engineers? It really comes down to the value that we place on unburned villagers. When Frankie gets that look in his eye and reaches for the torch, he must be stopped. Fast.
At the end of the day, it is the village that suffers our presence and allows us the freedom to play in our labs.