Book Review: “Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems” by David J. Agans

I picked up this book while browsing the library shelves. I was intrigued by the idea of putting a methodology around debugging and was pleasantly surprised by the contents: the rules match what I practice when I need to debug a hardware or software issue. Some of the examples are quite old and may be hard for some readers to follow, but I appreciate them because they put context around the framework. I think this book is a must-read for any engineer who has been through some debugging wars and is desperately looking for a method to the madness. This is it!

The 9 Rules are:

1. Understand the System. a) Read the manual. b) Read everything in depth. c) Know the fundamentals. d) Know the road map. e) Understand your tools. f) Look up the details.

2. Make It Fail. It seems easy, but if you don’t do it, debugging is hard. a) Do it again. b) Start at the beginning. c) Stimulate the failure. d) But don’t simulate the failure. e) Find the uncontrolled condition that makes it intermittent. f) Record everything and find the signature of intermittent bugs. g) Don’t trust statistics too much. h) Know that “that” can happen. i) Never throw away a debugging tool.

3. Quit Thinking and Look: You can think up thousands of possible reasons for a failure. You can see only the actual cause. a) See the failure. The senior engineer saw the real failure and was able to find the cause. The junior guys thought they knew what the failure was and fixed something that wasn’t broken. b) See the details. c) Build instrumentation in. d) Add instrumentation on. e) Don’t be afraid to dive in. f) Watch out for Heisenberg. Don’t let your instruments overwhelm your system. g) Guess only to focus the search.

4. Divide and Conquer: a) Narrow the search with successive approximation. b) Get the range. (If the number is 135 and you think the range is 1 to 100, you’ll have to widen the range.) c) Determine which side of the bug you are on. d) Use easy-to-spot test patterns. e) Start with the bad: start where it’s broken and work your way back up to the cause. f) Fix the bugs you know about. Bugs defend and hide one another, so take them out as soon as you find them. g) Fix the noise first. Watch for stuff that you know will make the rest of the system go crazy, but don’t get carried away on margin problems or aesthetic changes.
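
To make points a) and b) concrete, here is a minimal Python sketch of successive approximation with a range that widens when the first guess is too small. It is my own illustration, not code from the book. The function name `find_first_bad` and the `is_bad` callback are hypothetical, and the sketch assumes the failure is monotonic (once an input is bad, every larger input is bad too), that a failing input exists, and that the starting upper bound is positive.

```python
def find_first_bad(is_bad, lo=1, hi=100):
    """Return the smallest n >= lo for which is_bad(n) is True.

    Assumes is_bad is monotonic (False ... False True ... True),
    that some bad value exists, and that hi is positive.
    """
    # Get the range: if the upper bound still passes, the bug lies
    # beyond it, so widen the range until the bound is on the bad side.
    while not is_bad(hi):
        lo = hi + 1
        hi *= 2

    # Successive approximation: each probe tells us which side of the
    # bug we are on, and we discard the other half.
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid        # the first failure is at mid or earlier
        else:
            lo = mid + 1    # the first failure is after mid
    return lo


# The review's example: the number is 135, but we guessed 1 to 100.
first_bad = find_first_bad(lambda n: n >= 135)
print(first_bad)
```

With the guessed range of 1 to 100, the first loop widens the range to 101 to 200 before the halving starts, which is the situation described in point b).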

5. Change One Thing at a Time: a) Isolate the key factor. b) Grab the brass bar with both hands. (Like the brass bar on the instrumentation panel of a nuclear submarine: hold on and look at the dials and indicators carefully.) c) Change one test at a time. d) Compare it with a good one. e) Determine what you changed since the last time it worked.

6. Keep an Audit Trail: a) Write down what you did, in what order, and what happened as a result. b) Understand that any detail could be the important one. c) Correlate events: “It made a noise for four seconds starting at 21:04:53” is better than “It made a noise.” d) Understand that audit trails for design are also good for testing. e) Write it down!
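
In software, one cheap way to follow point c) is to let the logging facility timestamp every entry so events can be correlated later. The following is a minimal sketch using Python's standard `logging` module. The helper name `make_audit_log`, the logger name, and the sample messages are my own assumptions for illustration, not something taken from the book.

```python
import io
import logging


def make_audit_log(stream, name="debug_audit"):
    """Return a logger that prefixes every entry with a timestamp."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep entries out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s  %(message)s",
                          datefmt="%Y-%m-%d %H:%M:%S")
    )
    logger.addHandler(handler)
    return logger


# Write to an in-memory buffer here; a real session would use a file.
buffer = io.StringIO()
audit = make_audit_log(buffer)
audit.info("Did: swapped the cable for a known-good spare")
audit.info("Saw: noise started and lasted about four seconds")
print(buffer.getvalue(), end="")
```

Each line records what was done or seen together with when it happened, which is the difference between “It made a noise” and a note you can line up against other events.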

7. Check the Plug: Obvious assumptions are often wrong. Assumption bugs are usually the easiest to fix. a) Question your assumptions. b) Don’t start at square three. c) Test the tool.

8. Get a Fresh View: You need to take a break anyway. a) Ask for fresh insights. b) Tap expertise. c) Listen to the voice of experience. d) Know that help is all around you. e) Don’t be proud. f) Report symptoms, not theories. g) Realize that you don’t have to be sure.

9. If You Didn’t Fix It, It Ain’t Fixed: a) Check that it’s really fixed. b) Check that it’s really your fix that fixed it. c) Know that it never just goes away by itself. d) Fix the cause. e) Fix the process.

The author’s website may be helpful.
The Debugging Rules Poster