Notions of Debugging

Bugs can be evasive and produce surprising, even "impossible" behaviors. Part of a software developer's job is to relentlessly hunt them down and destroy them, but often at least half the battle is finding them in the first place. The best debuggers I have seen rely heavily on experience, both with the application and with debugging in general. The former is situation-specific, and there may not be any good way to advance in that regard other than practice. However, we can extract some good debugging practices that apply in general.

Dig for Clues Using All Available Resources

The first step in debugging an issue is ideally to reproduce the problem. This isn't always possible, but if you want any significant level of confidence that you have fixed a bug, you need to observe the broken behavior and understand how to trigger it from a user's perspective. Sometimes we're lucky, and a tester has taken the time to write detailed steps on how to reproduce the issue. At other times, we get a vague description of what someone was doing when the application crashed, and no description concerning what the user was doing in the two hours of application use leading up to that moment.

In the less informative cases, application logs can be very handy. Some applications even have session or recovery files that allow for scripted playback of a user's actions. Use these tools to your maximum advantage. I have debugged several issues where scouring a log or recovery file allowed me to reproduce a problem I never would have been able to reproduce otherwise. One of the greatest advantages of these resources is that they don't forget or confuse the details. I've encountered several cases where a tester reported doing one thing, but a recovery file indicated that a different sequence of steps was actually taken, which led to reproducing the issue.

Debugging without a Debugger

There are several situations where you might not want to use a debugger, or might not be able to. Some types of issues, such as timing-sensitive ones that behave differently when execution is paused, don't lend themselves well to a debugger. At times you may want to do a little analysis first without firing up and configuring a debugger, particularly if the issue occurs in a configuration or part of the application that you don't debug often.

One of the simplest techniques in debugging without a debugger is using print statements. Almost every programming language has a simple mechanism to print information to the console, and printing the values of variables that may be related to the bug is a good way to get a sense of the application's behavior. Print statements can also be used in conjunction with conditionals to alert you of unexpected values. For example:

// Report the unexpected value itself, not just the fact that it is bad.
if (suspectVar > 50) {
    std::cout << "suspectVar has a bad value: " << suspectVar << std::endl;
}

You can often get a lot of information using these techniques. The downsides are that you need to know which questions you want to ask before each run of your program, and in a compiled language you have to recompile every time you change them.
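One way to soften the recompile cost is to leave the diagnostic prints in place and gate them behind a runtime switch. The following is only a minimal sketch: the environment variable APP_DEBUG and the helpers debugEnabled, describeValue and debugLog are hypothetical names chosen for illustration, not part of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical switch: diagnostics are on when the APP_DEBUG environment
// variable is set to a non-empty value.
bool debugEnabled() {
    const char* flag = std::getenv("APP_DEBUG");
    return flag != nullptr && flag[0] != '\0';
}

// Formats one diagnostic line; kept separate from printing so it is easy to test.
std::string describeValue(const std::string& name, int value, int limit) {
    assert(!name.empty());  // a label is required to make the output searchable
    std::ostringstream out;
    out << name << " = " << value;
    if (value > limit) {
        out << " (above expected limit " << limit << ")";
    }
    return out.str();
}

// Writes the line to stderr only when the switch is on.
void debugLog(const std::string& message) {
    if (debugEnabled()) {
        std::cerr << "[debug] " << message << std::endl;
    }
}
```

A call such as debugLog(describeValue("suspectVar", suspectVar, 50)) can then stay in the code between runs. You still have to decide up front which values to print, so this only removes the rebuild step, not the guesswork.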

Debugging with a Debugger

When you do have the power of a debugger at your disposal, there are some very effective tools available to you. Debugging with a debugger can be somewhat distilled to determining where to set breakpoints. Two of my favorite techniques are breaking on exceptions and setting conditional breakpoints.

Break on Exception

Most debuggers allow you to trigger a break when an exception is thrown. You will usually need to make ample use of the filtering options, however, because many applications throw exceptions far too frequently for you to debug effectively if you break on every one. If you know the kind of exception you're looking for, either from log output or from an error message presented to the user, you can choose to break on only that type of exception. Otherwise, there is often a base exception class from which most or all of the exceptions you are interested in derive, and where the debugger supports it you can filter on that class. It's also generally useful to always break on critical exceptions such as access violations in C++ or NullPointerExceptions in Java.
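To make the base-class idea concrete, here is a minimal C++ sketch. The names AppError, ConfigError, parsePort and checkPort are hypothetical and exist only for this illustration. With GDB, the `catch throw` command stops at the point where an exception is thrown. Whether a filter on a base class also matches derived classes depends on the debugger, so treat that part as something to verify in your own tool.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical base class for every error the application throws on purpose.
class AppError : public std::runtime_error {
public:
    explicit AppError(const std::string& message) : std::runtime_error(message) {}
};

// A more specific error that derives from the shared base class.
class ConfigError : public AppError {
public:
    explicit ConfigError(const std::string& message) : AppError(message) {}
};

// Parses a TCP port number, throwing ConfigError on malformed input.
int parsePort(const std::string& text) {
    if (text.empty() || text.size() > 5) {
        throw ConfigError("port must have 1 to 5 digits: '" + text + "'");
    }
    int port = 0;
    for (char c : text) {
        if (c < '0' || c > '9') {
            throw ConfigError("port is not a number: '" + text + "'");
        }
        port = port * 10 + (c - '0');
    }
    if (port < 1 || port > 65535) {
        throw ConfigError("port out of range: " + text);
    }
    return port;
}

// Catching by the base class is the in-code analogue of filtering on it.
std::string checkPort(const std::string& text) {
    try {
        int port = parsePort(text);
        assert(port >= 1 && port <= 65535);  // parsePort guarantees this range
        return "ok";
    } catch (const AppError& error) {
        return error.what();
    }
}
```

Breaking at the throw site rather than at the catch is what makes this useful: the call stack still shows which input and which caller produced the ConfigError.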

Conditional Breakpoints

Many modern IDEs allow you to set breakpoints that will only be triggered under certain conditions. Often these conditions will be as simple as a method parameter being equal to some value of interest, but they can be more sophisticated if needed. I have used this technique to great effect when I suspect a problem in a particular invocation of a method that is called very frequently. I might know which object is causing the problem, but I don't want to wade through 10,000 calls to the method before I can observe it being called with the object of interest. This technique can also be very helpful in conjunction with event listeners: set a breakpoint that looks for a specific value or condition when an event handling method is triggered.
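The 10,000-call scenario can be sketched as follows. Item, processItem and processAll are hypothetical names, and the id 4242 is an arbitrary value of interest. The comment shows roughly what a GDB conditional breakpoint for this function could look like, and processAll shows a fallback that tests the same condition in code for cases where conditional breakpoints are unavailable or too slow.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record type standing in for "the object of interest".
struct Item {
    int id;
    double weight;
};

// A frequently called function. With GDB, a conditional breakpoint could look
// something like:  break processItem if item.id == 4242
double processItem(const Item& item) {
    assert(item.weight >= 0.0);  // weights are assumed non-negative here
    return item.weight * 2.0;
}

// Fallback: test the condition in code and put an ordinary breakpoint on the
// marked line. Returns how many calls matched; the sum goes into total.
int processAll(const std::vector<Item>& items, int idOfInterest, double& total) {
    int hits = 0;
    total = 0.0;
    for (const Item& item : items) {
        if (item.id == idOfInterest) {
            ++hits;  // an ordinary breakpoint goes here
        }
        total += processItem(item);
    }
    return hits;
}
```

The in-code version needs a rebuild, so the debugger's own condition is usually the first thing to try. The fallback earns its place mainly when the condition is evaluated so often that the debugger slows the program to a crawl.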

When the Debugger isn't Enough

Sometimes you just can't get enough information from the log files, print statements or the debugger to figure out what is going wrong. This can especially be the case when you are crashing in a third-party library for which you do not have access to the source code. If you find yourself scratching your head in bewilderment, utterly at a loss to explain why the code, which looks perfectly correct, is failing miserably, you can sometimes still identify the problem through a steady, often tedious, process of elimination.

The basic idea here is to find some code that "works" by adding or deleting code. When I say "works", I mean the code doesn't exhibit the bug you are trying to track down. It may be missing some important functionality that needs to be there, but if the code is crashing right now, the working code shouldn't crash even if it achieves that crash-free state by doing nothing at all. At this point, you will have the bad code that you started with and code that does not exhibit the bad behavior. You can proceed to make changes from either end to bring the two closer together until you get to the point that a very small change to one or the other makes the working code bad or the bad code work.

Binary Search

One convenient way to do this, particularly in the case of a regression, is to use binary search to track down a bad commit. Some source control tools have built-in support for this, such as Git's bisect command. In the absence of such automation, you can still conduct the search manually: test a commit in the middle of the range between a known good commit and a known bad commit, and eliminate half of the remaining commits with each iteration.
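The manual search can be sketched in a few lines. This is an illustration, not a description of how Git implements bisect. It assumes the commits are numbered oldest to newest, that the commit at index good passes, that the commit at index bad fails, and that every commit after the first bad one also fails. The function name firstBadCommit and the isBad callback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the index of the first failing commit in the range (good, bad].
// isBad(i) is expected to build and test commit i and report whether the
// bug is present.
std::size_t firstBadCommit(std::size_t good, std::size_t bad,
                           const std::function<bool(std::size_t)>& isBad) {
    assert(good < bad);  // the search needs a known good and a later known bad
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        if (isBad(mid)) {
            bad = mid;   // the bug is already present at mid
        } else {
            good = mid;  // the bug was introduced after mid
        }
    }
    return bad;
}
```

With 100 commits between the known good and known bad ends, this needs at most 7 test runs. Git automates the same loop with `git bisect start`, `git bisect good`, `git bisect bad`, and `git bisect run` for a scripted test.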

One Small Change at a Time

In the worst case, you may just need to comment out one or a few lines of code at a time until the bug disappears. Whichever line of code you commented out just before the issue vanished is often, but not always, the problem. Even if it isn't the problem, it will likely give you good insight into where the problem is or, if nothing else, serve as a starting point toward working around the issue.

I worked on a project where I did some significant refactoring of the threading logic. It went very smoothly in the code line I was working in. Then, it was merged with a different code line, and it broke. We spent days debugging it and could not figure out what the problem was. Another developer found a workaround that didn't seem related to the issue (it turned out it wasn't, but it did hide the problem). We submitted that workaround, as we still had no idea what the real problem was and couldn't continue to spend all of our time trying to find it.

However, I was determined to find the problem, so I laid out code that worked prior to the merge and the code that didn't work after the merge and diffed the two. I took one small piece at a time from the failing code and added it to the working code and tested it. It all worked until the last few lines. Once I was nearly done, tediously transforming the working code into the broken code, I finally stumbled upon the piece that broke it. It was an ugly way to debug, but when you're really stumped, sometimes it works when all else fails.
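The piece-by-piece transformation described above amounts to a linear search over the differences. A minimal sketch, under the assumption that each difference can be applied as an independent step and that a single check tells you whether the code still works; firstBreakingChange, changes and stillWorks are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Applies the changes in order, re-testing after each one. Returns the index
// of the first change after which stillWorks() reports failure, or
// changes.size() if every change can be applied without breaking anything.
std::size_t firstBreakingChange(const std::vector<std::function<void()>>& changes,
                                const std::function<bool()>& stillWorks) {
    assert(stillWorks());  // the starting point must be the known working code
    for (std::size_t i = 0; i < changes.size(); ++i) {
        changes[i]();
        if (!stillWorks()) {
            return i;
        }
    }
    return changes.size();
}
```

Unlike binary search, this costs one test run per change, but it stays valid when the pieces interact, because the earlier changes are all in place by the time the breaking one is applied.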

Process of Elimination

In the end, debugging is a process of elimination. Sometimes your experience and intuition will take you directly to the source of the problem, but other times, you need to wade through every line until you find the sly piece of code dragging the whole system down.
