The Hardest Bug
One day, I was tasked with improving a utility that tests various services by sending network requests and measuring their processing time.
I quickly added the required code, launched the utility, and… it immediately crashed with a memory access violation. Some context: long ago, our project had developed its own binary message protocol, similar to protobuf, with a custom C++ code generator and its own encoding and decoding routines. This part of the codebase was old, and nobody wanted to touch it.
The debugger showed that the crash occurred inside the message-parsing code. I hadn't touched that code, but just in case I regenerated it. That didn't help.
My first thought was that my new code was corrupting memory somewhere. To find the culprit, I decided to build the project with AddressSanitizer. I asked my colleagues whether they had used it before; they mentioned a few unsuccessful attempts. After half a day of effort, I had a build with the sanitizer enabled. Unfortunately, it found nothing.
The strange thing was that the very message that crashed the utility passed between the services without trouble and was parsed successfully there. The services and my utility used exactly the same parsing code from the same static library. So why did the services work fine while the utility crashed?
I had no choice but to dig into the "scary" code manually. It looked something like this:
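```cpp
// Simplified excerpt of the generated code; the return type and the
// rest of the body are omitted.
Parse_MessageName(const void* buffer, size_t size) {
    MessageName_Model model(buffer, size);  // root model over the raw bytes
    MessageName message;
    model.Parse(message);                   // the crash happened in this call
    // ...
}
```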
The crash occurred inside the Parse call. The model itself was structured like a nesting doll, with a sub-model for each part of the message. The root model's constructor created a buffer and passed it by reference to its sub-models. In the services, the root constructor initialized the sub-models correctly; in the utility, it didn't.
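To make the "nesting doll" layout concrete, here is a minimal sketch under stated assumptions. All names (Buffer, HeaderModel, BodyModel, RootModel) are hypothetical and not taken from the real generated code. The root model owns a buffer, and each sub-model only keeps a reference to it, so the sub-models are valid only if the root constructor actually runs their constructors.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the buffer the root model creates.
struct Buffer {
    std::vector<std::uint8_t> bytes;
    Buffer(const void* data, std::size_t size)
        : bytes(static_cast<const std::uint8_t*>(data),
                static_cast<const std::uint8_t*>(data) + size) {}
};

// Hypothetical sub-models: each one only stores a reference to the shared buffer.
struct HeaderModel {
    const Buffer& buffer;
    explicit HeaderModel(const Buffer& b) : buffer(b) {}
};

struct BodyModel {
    const Buffer& buffer;
    explicit BodyModel(const Buffer& b) : buffer(b) {}
};

// Hypothetical root model: owns the buffer and hands it to its sub-models.
// Members are initialized in declaration order, so `buffer` is declared
// before the sub-models that refer to it.
struct RootModel {
    Buffer buffer;
    HeaderModel header;
    BodyModel body;

    RootModel(const void* data, std::size_t size)
        : buffer(data, size), header(buffer), body(buffer) {}

    // Copying would leave the copy's sub-models pointing at the original buffer.
    RootModel(const RootModel&) = delete;
    RootModel& operator=(const RootModel&) = delete;
};
```

A constructor compiled from a different class definition, one without the sub-models, would never touch the memory where the library expects `header` and `body` to be, and any code relying on the full layout would then read garbage.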
I had to look at the assembly. I'm no assembly expert, but it was clear that in the utility the root model's constructor was suspiciously short and never called the sub-model constructors, so the sub-models' memory was left uninitialized. In the services, by contrast, the constructor looked normal and initialized everything it should.
Why the difference? Compiler flags, maybe? I checked them: identical. A compiler bug, then? To test that theory, I built the code with GCC, and the utility worked fine! It seemed that MSVC was at fault, but why?
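Looking carefully at the constructor, I noticed that it was defined entirely in a header file. That meant every source file that used it compiled its own copy, and the linker silently kept only one of them. If those copies differed, which one ended up in the executable was essentially up to the linker. I moved the constructor into a .cpp file, updated the code generator, regenerated the files, and started the build.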
The services built successfully, but the utility did not: the linker complained that the constructor was defined both in the library and in one of the utility's source files (let's call it test.cpp). How could that be? I deleted the object files and rebuilt the project from scratch, with the same result. Where was this constructor in test.cpp coming from?
To find out, I commented out the whole of test.cpp, and the linker error disappeared. Uncommenting the code piece by piece, I eventually found the culprit: a call to a template function foo instantiated for the MessageName type made the constructor appear. It turned out that foo created a variable of type MessageName_Model.
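Finally, the picture came together. MessageName_Model was actually Model<MessageName>, an explicit specialization of the class template Model. The general template was defined in the main proto-library header, but the specializations were scattered across different headers. test.cpp saw only the general template, so the compiler built its own, incomplete version of Model<MessageName> without any sub-models, unaware that a proper specialization existed. Both versions produced a constructor with the same symbol name, and in the utility the linker happened to keep the incomplete one. The C++ standard classifies this situation as ill-formed, no diagnostic required, which is why neither the compiler nor the sanitizer complained.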
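The setup can be sketched in a single file. The file names in the comments (proto_lib.h, message_name.h), the kSubModels member, and the signature of foo are all hypothetical stand-ins, not the real project's code. The general template has a body, so it can be instantiated for any type. That is exactly what makes a missing include dangerous.

```cpp
#include <cstddef>

// ---- proto_lib.h (hypothetical): the general template, WITH a body ----
template <typename T>
class Model {
public:
    Model(const void*, std::size_t) {}
    // The generic version knows nothing about T's parts: no sub-models.
    static constexpr int kSubModels = 0;
};

// ---- message_name.h (hypothetical): the real specialization ----
struct MessageName {};

template <>
class Model<MessageName> {
public:
    Model(const void*, std::size_t) {}
    static constexpr int kSubModels = 2;  // e.g. header and body sub-models
};

// ---- test.cpp (hypothetical): a helper template like `foo` ----
template <typename T>
int foo(const void* data, std::size_t size) {
    Model<T> model(data, size);  // implicitly instantiates Model<T>
    (void)model;
    return Model<T>::kSubModels;
}
```

In this file the specialization is visible before foo<MessageName> is used, so the call picks it up. If test.cpp had included only proto_lib.h, the very same line would have silently instantiated the general template for MessageName, and nothing in the language forces a diagnostic.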
As soon as I removed the body of the base template, leaving only the declaration, the compiler immediately pointed out the missing specialization. I included the proper header, rebuilt the project, and everything worked!
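Here is a minimal sketch of that fix, reusing the same hypothetical names as before. The base template is only declared. A use of Model<T> with no visible specialization now fails to compile because Model<T> is an incomplete type, rather than quietly falling back to a generic body.

```cpp
#include <cstddef>

// proto_lib.h (hypothetical): the base template is declared but never defined.
template <typename T>
class Model;

// message_name.h (hypothetical): the specialization provides the only definition.
struct MessageName {};

template <>
class Model<MessageName> {
public:
    Model(const void*, std::size_t) {}
    static constexpr int kSubModels = 2;
};

// test.cpp (hypothetical)
template <typename T>
int foo(const void* data, std::size_t size) {
    Model<T> model(data, size);  // requires a complete Model<T>
    (void)model;
    return Model<T>::kSubModels;
}

// foo<MessageName>(nullptr, 0) compiles and runs.
// foo<int>(nullptr, 0) would be rejected: Model<int> is an incomplete type.
```

A source file that forgets to include message_name.h now gets the same incomplete-type error, instead of building a different class behind the programmer's back.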
From this experience I learned a critical lesson: never leave a base template open to unintended instantiation, because it can lead to obscure, hard-to-diagnose errors. If that is impractical because the base template handles most cases, it is safer to keep the base template and all of its specializations together in one place, so that no source file can see one without the others. Letting users specialize templates freely is a real hazard in C++ that demands extra care and vigilance from developers.
This turned out to be the trickiest and most elusive bug of my life. It took several days to unravel, but the joy of finally conquering it made every moment worth the effort.