Bugs can be really evasive and produce some surprising, even "impossible" behaviors. Part of a software developer's job is to relentlessly hunt them down and destroy them. Often, though, at least half the battle is finding them to begin with. The best debuggers I have seen rely a lot on experience, both with the application and with debugging in general. The former is situation-specific, and there may not be any good way to advance in that regard other than practice. However, we can extract some general lessons about good debugging practices.
Dig for Clues Using All Available Resources
The first step in debugging an issue is ideally to reproduce the problem. This isn't always possible, but if you want any significant level of confidence that you have fixed a bug, you need to observe the broken behavior and understand how to trigger it from a user's perspective. Sometimes we're lucky, and a tester has taken the time to write detailed steps on how to reproduce the issue. At other times, we get a vague description of what someone was doing when the application crashed, and no description of what the user was doing in the two hours of application use leading up to that moment.
In the less informative cases, application logs can be very handy. Some applications even have session or recovery files that allow for scripted playback of a user's actions. Use these tools to your maximum advantage. I have debugged several issues where scouring a log or recovery file allowed me to reproduce a problem I never would have been able to otherwise. One of the greatest advantages of these resources is that they don't forget or confuse the details. I've encountered several cases where a tester reported doing one thing, but a recovery file indicated that a different sequence of steps was actually taken, which led to reproducing the issue.
Debugging without a Debugger
There are several situations where you might not want to or be able to use a debugger. Some types of issues don't lend themselves well to debugging with a debugger. At times you may want to do a little analysis first without firing up and configuring a debugger, particularly if this issue occurs in a configuration or part of the application that you don't debug often.
One of the simplest techniques in debugging without a debugger is using print statements. Almost every programming language has a simple mechanism to print information to the console, and printing the values of variables that may be related to the bug is a good way to get a sense of the application's behavior. Print statements can also be used in conjunction with conditionals to alert you of unexpected values. For example:
if (suspectVar > 50) {
    cout << "suspectVar has a bad value" << endl;
}
You can often get a lot of information using these techniques. The downsides are that you need to know which questions you want to ask before each run of your program, and you have to recompile every time you change what is printed.
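One way to soften the recompile cost is to leave the print statements in place and switch them on at run time. The following is only a minimal sketch under stated assumptions: the environment variable APP_DEBUG and the names debugEnabled, formatTrace, and TRACE_VAR are hypothetical, chosen for illustration. You still have to compile a trace statement in once, but you can then turn the output on or off without rebuilding.

```cpp
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

// Returns true when the (hypothetical) APP_DEBUG environment variable is set
// to a non-empty value, so tracing can be toggled without recompiling.
inline bool debugEnabled() {
    const char* flag = std::getenv("APP_DEBUG");
    return flag != nullptr && flag[0] != '\0';
}

// Builds one trace line from a source location, a variable name, and its value.
template <typename T>
std::string formatTrace(const char* file, int line, const char* name,
                        const T& value) {
    std::ostringstream out;
    out << file << ':' << line << ' ' << name << " = " << value;
    return out.str();
}

// Prints the variable's name and value to stderr, but only when tracing is on.
#define TRACE_VAR(var)                                              \
    do {                                                            \
        if (debugEnabled()) {                                       \
            std::cerr << formatTrace(__FILE__, __LINE__, #var, (var)) \
                      << std::endl;                                 \
        }                                                           \
    } while (0)
```

With this in place, writing TRACE_VAR(suspectVar); next to the suspicious code stays silent in normal runs and reports the file, line, name, and value when the program is started with APP_DEBUG set. Writing to stderr keeps the trace output separate from the program's regular console output.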
Debugging with a Debugger
When you do have the power of a debugger at your disposal, there are some very effective tools available to you. Debugging with a debugger can be somewhat distilled to determining where to set breakpoints. Two of my favorite techniques are breaking on exceptions and setting conditional breakpoints.
Break on Exception
Most debuggers allow you to trigger a break when an exception is thrown. You will usually need to make ample use of the filtering options, however, because many applications throw exceptions far too frequently for you to debug effectively if you break on every one. If you know the kind of exception you're looking for, either from log output or an error message presented to the user, you can choose to break on only that type of exception. Otherwise, you will often have an exception class from which most or all of the exceptions you are interested in derive. It's also generally useful to always break on critical exceptions such as access violations in C++ or NullPointerExceptions in Java.
Conditional Breakpoints
Many modern IDEs allow you to set breakpoints that will only be triggered under certain conditions. Often these conditions will be as simple as a method parameter being equal to some value of interest, but they can be more sophisticated if needed. I have used this technique to great effect when I suspect a problem in a particular invocation of a method that is called very frequently. I might know which object is causing the problem, but I don't want to wade through 10,000 calls to the method before I can observe it being called with the object of interest. This technique can also be very helpful in conjunction with event listeners by setting a breakpoint that looks for a specific value or condition when an event handling method is triggered.
When the Debugger isn't Enough
Sometimes you just can't get enough information from the log files, print statements or the debugger to figure out what is going wrong. This can especially be the case when you are crashing in a third-party library for which you do not have access to the source code. If you find yourself scratching your head in bewilderment, utterly at a loss to explain why the code, which looks perfectly correct, is failing miserably, you can sometimes still identify the problem through a steady, often tedious, process of elimination.
The basic idea here is to find some code that "works" by adding or deleting code. When I say "works", I mean the code doesn't exhibit the bug you are trying to track down. It may be missing some important functionality that needs to be there, but if the code is crashing right now, the working code shouldn't crash even if it achieves that crash-free state by doing nothing at all. At this point, you will have the bad code that you started with and code that does not exhibit the bad behavior. You can proceed to make changes from either end to bring the two closer together until you get to the point that a very small change to one or the other makes the working code bad or the bad code work.
Binary Search
One convenient way to do this, particularly in the case of a regression, is to use binary search to track down a bad commit. Some source control products have built-in support for this, such as Git's bisect command. In the absence of such automation, you can still conduct the search manually: test a commit in the middle of the range between a known good commit and a known bad commit, and eliminate half of the remaining commits with each iteration.
One Small Change at a Time
In the worst case, you may just need to comment out one or a few lines of code at a time until the bug disappears. Whichever line of code you just commented out at the point the issue vanished is often, but not always, the problem. However, if it isn't the problem, it will likely give you good insight into where the problem is or, if nothing else, serve as a starting point toward working around the issue.
I worked on a project where I did some significant refactoring of the threading logic. It went very smoothly in the code line I was working in. Then, it was merged with a different code line, and it broke. We spent days debugging it and could not figure out what the problem was. Another developer found a workaround that didn't seem related to the issue (it turned out it wasn't, but it did hide the problem). We submitted that workaround, as we still had no idea what the real problem was and couldn't continue to spend all of our time trying to find it.
However, I was determined to find the problem, so I laid out the code that worked prior to the merge and the code that didn't work after the merge and diffed the two. I took one small piece at a time from the failing code, added it to the working code, and tested it. It all worked until the last few lines. Once I was nearly done tediously transforming the working code into the broken code, I finally stumbled upon the piece that broke it. It was an ugly way to debug, but when you're really stumped, sometimes it works when all else fails.
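When the differences between working and broken code can be applied in a fixed order, the halving idea from the Binary Search section applies to them just as it does to commits. The sketch below is illustrative only and rests on assumptions that will not always hold: zero changes applied is known good, all changes applied is known bad, and once the bug appears it stays present as further changes are added. The function name firstBreakingChange and the isBroken callback are hypothetical; in practice isBroken would mean "apply the first k changes, build, and run the reproduction steps".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Finds the smallest k in [1, changeCount] for which the code with the first
// k changes applied exhibits the bug.
// Assumptions: 0 changes is known good, changeCount changes is known bad,
// and the bug persists once it has been introduced.
std::size_t firstBreakingChange(
        std::size_t changeCount,
        const std::function<bool(std::size_t)>& isBroken) {
    assert(changeCount > 0);
    std::size_t good = 0;           // most changes applied while still working
    std::size_t bad = changeCount;  // fewest changes applied while broken
    while (bad - good > 1) {
        // Test the midpoint and discard the half that cannot hold the culprit.
        const std::size_t mid = good + (bad - good) / 2;
        if (isBroken(mid)) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    return bad;  // the change at this position is the first one that breaks
}
```

For 40 small pieces this needs at most six build-and-test cycles, where moving one piece at a time could need up to 40. If the failure is intermittent or depends on a combination of changes that are not adjacent, the persistence assumption breaks down and the slower one-piece-at-a-time approach is the safer fallback.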