The Abandoned Memory Problem
In this article I will describe a technique for tracking the total amount of heap memory allocated (malloc-ed) and a method for locating memory that has been allocated and is still accessible, but is no longer being used.
In a previous life I worked on a software product (developed in C) with two operations which were in theory equal and opposite (for the purposes of this article let's call these operations Show and Hide). By equal and opposite I mean that executing Hide after Show should have returned the system, and specifically the amount of memory allocated, to the state it was in before Show was executed. However, on careful inspection, each time the Show/Hide operations were repeated the amount of allocated memory gradually saw-toothed upwards.
As is usually the case when memory gradually goes missing, a memory leak was suspected, so I started out by downloading the Valgrind instrumentation framework [1],[2], which includes a tool, Memcheck, for detecting memory management issues and in particular memory leaks. However, Valgrind/Memcheck didn't report a single memory leak, and I came to the realization that the memory was not being leaked (it was still accessible and was being freed on application shutdown) but rather abandoned (allocated but left unused). A different technique was needed to discover where the memory was going and why it wasn't being freed during the Hide operation.
Before explaining how I went about locating the abandoned memory, I'll start by explaining how I was able to determine that the amount of memory allocated was increasing over time. The technique requires creating wrapper functions around the basic memory management functions malloc, calloc, realloc and free, as well as any other functions that allocate memory and return a pointer to it (notably the POSIX function strdup). The main problem these wrappers solve is that while functions that allocate memory know exactly how much space they are allocating (they receive the value as an input parameter), functions that free memory don't know how much they are freeing (they only receive a pointer to the allocated memory). To solve this problem, the wrappers for the allocating functions allocate slightly more than is requested and use the extra space to record the allocated amount. The wrappers for the freeing functions can then read this value back. Since both sides now know how much memory they are allocating or freeing, it is easy to keep a running total of the amount of memory currently allocated.
Listing 1 shows the code for the wrapper functions. For the code to work, all source files need to include the common header so that calls to the basic memory management functions are replaced with calls to the wrapper functions.
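The idea behind these wrappers can be sketched roughly as follows. This is a minimal illustration rather than the original listing: the names mem_malloc, mem_calloc, mem_realloc, mem_free, mem_strdup and mem_get_total, and the mem_header layout, are assumptions made for this example. The sketch is single-threaded and does not guard against overflow when the header size is added to the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored in front of every block. The union pads it
   to the strictest alignment so the pointer handed to the caller stays
   correctly aligned (max_align_t needs C11). */
typedef union {
    size_t size;        /* number of bytes the caller asked for */
    max_align_t align;
} mem_header;

static size_t mem_total = 0;  /* bytes currently allocated via the wrappers */

size_t mem_get_total(void) { return mem_total; }

void *mem_malloc(size_t size)
{
    mem_header *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->size = size;           /* note the amount in the extra space */
    mem_total += size;
    return h + 1;             /* caller sees the memory after the header */
}

void mem_free(void *ptr)
{
    if (ptr == NULL) return;
    mem_header *h = (mem_header *)ptr - 1;  /* step back to the header */
    assert(mem_total >= h->size);
    mem_total -= h->size;
    free(h);
}

void *mem_calloc(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size) return NULL;
    void *p = mem_malloc(count * size);
    if (p != NULL) memset(p, 0, count * size);
    return p;
}

void *mem_realloc(void *ptr, size_t size)
{
    if (ptr == NULL) return mem_malloc(size);
    mem_header *h = (mem_header *)ptr - 1;
    size_t old_size = h->size;
    h = realloc(h, sizeof *h + size);
    if (h == NULL) return NULL;  /* original block and total are unchanged */
    mem_total = mem_total - old_size + size;
    h->size = size;
    return h + 1;
}

char *mem_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = mem_malloc(n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}
```

In a scheme like this, the common header would redirect the standard names with macros along the lines of `#define malloc(n) mem_malloc(n)`. One consequence of the hidden header is that pointers obtained from code that was not compiled with the wrappers must never be passed to mem_free, since they have no size recorded in front of them.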
To track down the abandoned memory I created a second set of wrapper functions that dump all the information relevant to each memory operation to a file: a character to indicate the operation, the memory location involved, and the file name and line number the call came from. I also added functions to start and stop the dump, so that by switching on the dump just before a Show operation and switching it off again after a Hide operation I could capture only the memory operations relevant to my hunt for abandoned memory.
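A dump wrapper of this kind might look something like the sketch below. The function names (memdump_start, memdump_stop, dump_malloc, dump_free), the operation characters 'M' and 'F' and the exact record layout are assumptions made for illustration; only malloc and free are covered to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static FILE *dump_file = NULL;   /* NULL while the dump is switched off */

/* Switch the dump on. Returns 0 on success, -1 if the file cannot be opened. */
int memdump_start(const char *path)
{
    if (dump_file != NULL) return 0;   /* already switched on */
    dump_file = fopen(path, "w");
    return dump_file != NULL ? 0 : -1;
}

/* Switch the dump off and flush everything written so far. */
void memdump_stop(void)
{
    if (dump_file != NULL) {
        fclose(dump_file);
        dump_file = NULL;
    }
}

/* One record per line: operation, address, file:line (assumed format). */
static void memdump_record(char op, void *ptr, const char *file, int line)
{
    assert(file != NULL);
    if (dump_file != NULL)
        fprintf(dump_file, "%c %p %s:%d\n", op, ptr, file, line);
}

void *dump_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p != NULL) memdump_record('M', p, file, line);
    return p;
}

void dump_free(void *ptr, const char *file, int line)
{
    /* Record before freeing: the pointer value is only valid until then. */
    if (ptr != NULL) memdump_record('F', ptr, file, line);
    free(ptr);
}

#ifdef MEMDUMP
/* Redirect ordinary calls so that each record carries its call site. */
#define malloc(n) dump_malloc((n), __FILE__, __LINE__)
#define free(p)   dump_free((p), __FILE__, __LINE__)
#endif
```

With this layout, a test run would call memdump_start just before Show and memdump_stop just after Hide, leaving a text file with one line per memory operation.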
Finally, to pinpoint which memory was being abandoned, I wrote a Perl script to process the dump file. The script uses a simple hash to map each memory location to the corresponding allocation and its related data. When the script matches a free operation to its corresponding allocation, it removes that entry from the hash. After iterating over the whole dump file, the script outputs the details of whatever is left in the hash, i.e. the allocations corresponding to abandoned memory.
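The matching step can be illustrated in C as well, to stay in the language of the rest of the code. The sketch below is not the Perl script: it swaps the hash for a plain growable array with a linear search, which is simpler to show but slower, and it assumes each dump line has the form `<op> <address> <file>:<line>` with op 'M' for an allocation and 'F' for a free. The names live_alloc, live_table and find_abandoned are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char addr[32];     /* address as text, exactly as it appears in the dump */
    char where[128];   /* file:line of the allocating call */
} live_alloc;

typedef struct {
    live_alloc *items; /* allocations not yet matched by a free */
    size_t count;
    size_t capacity;
} live_table;

/* Search newest-first, since recent allocations tend to be freed first. */
static long find_live(const live_table *t, const char *addr)
{
    for (size_t i = t->count; i-- > 0; )
        if (strcmp(t->items[i].addr, addr) == 0) return (long)i;
    return -1;
}

/* Reads the whole dump from `in`. On return the table holds every
   allocation that was never freed. Returns 0, or -1 if out of memory.
   The caller releases t->items with free(). */
int find_abandoned(FILE *in, live_table *t)
{
    char op, addr[32], where[128];

    assert(in != NULL && t != NULL);
    t->items = NULL;
    t->count = t->capacity = 0;

    while (fscanf(in, " %c %31s %127s", &op, addr, where) == 3) {
        if (op == 'M') {
            if (t->count == t->capacity) {
                size_t cap = t->capacity ? t->capacity * 2 : 64;
                live_alloc *grown = realloc(t->items, cap * sizeof *grown);
                if (grown == NULL) return -1;
                t->items = grown;
                t->capacity = cap;
            }
            strcpy(t->items[t->count].addr, addr);
            strcpy(t->items[t->count].where, where);
            t->count++;
        } else if (op == 'F') {
            long i = find_live(t, addr);
            /* Matched: overwrite the entry with the last one and shrink. */
            if (i >= 0) t->items[i] = t->items[--t->count];
        }
    }
    return 0;
}
```

For a dump of several hundred megabytes a real hash table, as in the Perl script, would be the better choice; the array only keeps the sketch short.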
Listing 2 contains the sources for the memory dump wrapper functions; the pre-processor define MEMDUMP can be used to switch these definitions on and off. My dump file was a massive 300MB, but after processing with the Perl script (Listing 3) only about 150 lines were left, which was a lot more manageable.
However, just knowing which operations had given rise to the abandoned memory was not enough on its own. The same invocations had been made thousands of times during the Show and Hide operations, but only a few of them had resulted in abandoned memory. To get to the bottom of the problem there was one more thing I had to do: inspect the memory locations that had been abandoned and see what was in them. (In Visual Studio you can inspect memory using the Debug→Windows→Memory menu; in gdb use the x command [3] to examine memory.) Inspecting memory proved to be the key; I was able to identify a very precise subset of data that was causing the memory to be abandoned. Having identified this subset, I was able to re-run the Show/Hide test with just this subset, identify the precise causes of the abandoned memory and fix them.
So where did the memory go?
There were two principal causes of the abandoned memory.
The first was an array of pointers, working like a simple cache, which grew over time. Pointers to objects that were no longer needed were NULLed, but the empty slots this created were never reused. This was simple enough to solve: when inserting a new pointer into the array, instead of simply putting it at the end of the array I looked for the first empty slot and used that.
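The slot-reuse fix can be sketched like this. The ptr_cache structure and the function names are hypothetical stand-ins for the product's actual cache, which is not shown in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    void **slots;      /* NULL marks an empty, reusable slot */
    size_t count;      /* number of slots handed out so far */
    size_t capacity;   /* number of slots allocated */
} ptr_cache;

/* Inserts obj and returns its index, or -1 if the array cannot grow. */
long cache_insert(ptr_cache *c, void *obj)
{
    assert(c != NULL && obj != NULL);

    /* The fix: reuse the first empty slot before growing the array. */
    for (size_t i = 0; i < c->count; i++) {
        if (c->slots[i] == NULL) {
            c->slots[i] = obj;
            return (long)i;
        }
    }

    /* No empty slot, so append at the end, growing if needed. */
    if (c->count == c->capacity) {
        size_t cap = c->capacity ? c->capacity * 2 : 8;
        void **grown = realloc(c->slots, cap * sizeof *grown);
        if (grown == NULL) return -1;
        c->slots = grown;
        c->capacity = cap;
    }
    c->slots[c->count] = obj;
    c->count++;
    return (long)(c->count - 1);
}

/* Marks a slot as empty; the array itself is not shrunk. */
void cache_remove(ptr_cache *c, size_t index)
{
    if (index < c->count) c->slots[index] = NULL;
}
```

With the scan in place, repeated insert/remove cycles stay within the slots already allocated instead of extending the array every time. The linear scan costs time on each insert; whether that matters depends on how large the array gets.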
The second cause also involved objects in a cache. These objects were reference counted, and when an object's reference count reached 0 it was removed from the cache. However, in a very few specific cases the reference count was not being decremented correctly, leaving the affected objects to live on in the cache until they were eventually removed on application shutdown.
Conclusions
- When an application's allocated memory unexpectedly grows over time, it is not necessarily the case that memory is being leaked. It may simply have been abandoned.
- Even if the quantity of memory operations is extremely large, dumping a record of the operations to a text file and analyzing the output using a simple script can break the problem down to a manageable size.
- By analyzing memory while the application is still running, it is possible to inspect the abandoned memory locations to further break down the problem.
[1] http://www.cprogramming.com/debugging/valgrind.html
[2] http://valgrind.org/docs/manual/mc-manual.html
[3] http://www.delorie.com/gnu/docs/gdb/gdb_56.html