Debugging software - A step by step checklist
Debugging is a reality for all engineers. I was reading "The Practice of Programming" by Kernighan and Pike today, and found the chapter on debugging really useful. It takes a very organized approach that, I felt, summarizes a number of lessons engineers learn the hard way. So I wrote down a high-level checklist for myself for the next time I am dealing with a bug. Depending on whether the bug is in my code or someone else's, easy or hard, reproducible or not, I would work through different parts of this list. Here is what I gathered in my notes:
Good Clues, Easy Bugs:
- Look for familiar patterns - Have you seen such failures before? Where?
- Examine the most recent change - The last change likely caused the bug, or exposed it.
- Don't make the same mistake twice - Once you find a bug, find where else it exists and fix them all.
- Debug it now, not later - Later would be far more costly: the debugging data would be gone, and the bug may not even repeat easily.
- Get a stack trace - Knowing the line number where it crashed helps a ton. The call trace, unexpected values of variables, etc. also help.
- Read the code before typing a fix - Resist the urge to start changing code before you understand what is going wrong.
- Explain your code to someone else - Works more often than you would expect.
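As an illustration of the stack trace item above, here is a minimal Python sketch that captures the full call trace of a failure instead of just the error message. The functions `parse_port` and `load_config` and the sample settings are hypothetical, made up only for this example.

```python
import traceback


def parse_port(value):
    # Hypothetical function with a bug: it assumes value is always numeric.
    return int(value)


def load_config(settings):
    # Hypothetical caller, so the trace shows more than one frame.
    return parse_port(settings["port"])


def run_and_capture(settings):
    """Run load_config and return the formatted stack trace on failure."""
    try:
        load_config(settings)
    except Exception:
        # format_exc() returns the call chain and the failing line as text,
        # which can be printed or written to a log file.
        return traceback.format_exc()
    return None


trace = run_and_capture({"port": "80a"})
print(trace)
```

The printed trace names both `load_config` and `parse_port`, so you can see not only where the program crashed but also how it got there.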
No Clues, Hard Bugs:
- Make the bug reproducible - Have a one-click reproduction, because you will likely need to recreate the bug several times to debug it.
- Divide and conquer - Narrow down the input to the smallest input that still causes the bug.
- Is there any pattern in the failures? - E.g. every nth request is impacted, or it happens every 5 minutes. Are you hitting a boundary condition?
- Print output as needed to understand flow or state of program
- Insert assertions and other self-checking code
- Write debug details into log files
- Draw a picture to explain the flow/state - It's worth a thousand words; visuals help.
- Annotate data structures with statistics and graph them - See if the data structures look healthy (data distribution, values, operations, time per operation, etc.).
- Use tools (debugger)
- Keep records - because you would otherwise forget what you have already tested
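The divide and conquer item in the list above can be sketched in code. This is a simplified input-shrinking loop of my own, not the book's code: it tries to drop chunks of a failing input and keeps any smaller input that still fails. The `fails` predicate is a hypothetical stand-in for "run the program on this input and check whether the bug shows up".

```python
def shrink(items, fails):
    """Return a smaller list for which fails(...) is still True."""
    assert fails(items), "start from an input that reproduces the bug"
    chunk = len(items) // 2
    while chunk >= 1:
        i = 0
        while i < len(items):
            # Try the input with one chunk removed.
            candidate = items[:i] + items[i + chunk:]
            if fails(candidate):
                items = candidate  # the chunk was irrelevant, drop it
            else:
                i += chunk  # the chunk is needed, keep it and move on
        chunk //= 2  # retry with finer-grained chunks
    return items


def fails(xs):
    # Hypothetical bug: triggered only when both 7 and 42 are in the input.
    return 7 in xs and 42 in xs


smallest = shrink(list(range(100)), fails)
print(smallest)  # [7, 42]
```

A loop like this only works once the bug is reproducible, because every candidate input has to be re-run to see whether it still fails.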
Last Resorts:
- Mental model bugs - Are you looking in the wrong place, or looking in the right place and just not seeing the problem? Step through with a debugger.
- Follow what the program is doing, not what you think it is doing.
Non-Reproducible Bugs:
- The fact that a bug is not reproducible is itself information.
- It probably means the algorithm is okay and you are using information that changes from run to run: uninitialized variables, overwritten values. Does the bug persist in debug mode?
- If the crash is far from anything that could be wrong - Look at shared resources (files, heap, memory, sockets, etc.).
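One way to act on "information that changes each time" is to pin down the source of variation so that a failing run can be replayed. The Python sketch below assumes the varying input is random data, which is only one possible cause. `flaky_step` and its failure condition are hypothetical.

```python
import random


def flaky_step(rng):
    # Hypothetical bug: fails only for some random draws.
    values = [rng.randint(0, 9) for _ in range(5)]
    if sum(values) > 30:
        raise RuntimeError(f"unexpected total for {values}")
    return values


def find_failing_seed(max_seed=1000):
    """Try seeds in order and return the first one that triggers the failure."""
    for seed in range(max_seed):
        rng = random.Random(seed)  # explicit seed makes the run repeatable
        try:
            flaky_step(rng)
        except RuntimeError:
            return seed
    return None


seed = find_failing_seed()
print("failing seed:", seed)
```

Once a failing seed is recorded, `flaky_step(random.Random(seed))` fails the same way every time, which turns an intermittent bug into a reproducible one.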
Other People's Bugs:
- Get some understanding of the code, and of how the developer thought about and wrote it.
- grep, finding references, stepping through, and revision history are some ways to understand it.
What other steps do you take for debugging? Please share in the comments.
Thanks
Umesh