More about debugging.

One of the blogs that I have on my feed list recently reposted this tidbit:

There are only two debugging techniques in the universe:

  1. printf.
  2. /* */

Since I recently posted my own little list of some debugging techniques, I can’t resist weighing in on this assertion. While their list is a bit flip, changing behavior and running experiments to see the changes is, of course, a core debugging technique. However, the claim that debuggers are just extensions of this is a bit too reductionist for my tastes – it’s like claiming that a car is just a horse that takes longer to get tired, or that the printing press is just a scribe that works faster.

I also feel that “debugging techniques” shouldn’t be limited to “things you do to the program,” either – thought experiments, code change analyses, and the like are all valid in my book as well. My definition includes any form of investigation that gives you more insight into the problem.

Completely separate from the article linked above, I had a glance at Wikipedia’s article on debugging, and…well, there’s a lot of stuff in there that gets my back up:

  • Citing language choice as having an impact on the debugging process is silly – your choice of language may make it more difficult to write buggy code, but once a bug is in there, I’m hard pressed to think of a reason why language choice would have a substantive impact on the actual debugging process. (Saying that C++ makes debugging easier than C because it has single-line comments is not funny.)
  • “Generally, high-level programming languages, such as Java, make debugging easier, because they have features such as exception handling that make real sources of erratic behaviour easier to spot. In lower-level programming languages such as C or assembly, bugs may cause silent problems such as memory corruption, and it is often difficult to see where the initial problem happened.”

    1) Exception handling is only as useful as the exceptions that are thrown in the code.
    2) The claim that “high-level programming languages” like Java don’t suffer from “memory corruption” is misleading. Any imperative language with side effects is going to be capable of bugs that look like “memory corruption,” and they can effectively be treated the same way with regard to the debugging process. (As an aside, it’s interesting to see how the definition of what qualifies as a “low-level language” has shifted over the years…yikes.)

  • Static code analysis tools are meant to be used to fix code problems before you run into the bugs they cause. Using them as an example of a debugging tool is kind of missing the point of using them entirely. lint isn’t going to help you debug why something is busted – it’s only going to tell you that you’re using an uninitialized variable, and that’s something that shouldn’t have been in your compiled code in the first place.

 

Addendum to the earlier debugging post: One important quality of the printf and commenting techniques that I didn’t mention is that they work even if you can’t actually attach a debugger to the process (or if no debugger exists for the environment in which you’re working). Sometimes this characteristic can really save your bacon – I remember having to change the screen background color register (the poor man’s printf, which doesn’t even need the C standard library) on a certain console in order to debug the DVD boot loading code of a game. Each line of code was prefaced with a call to set the background color to a distinct value, and the color remaining on the screen when the console froze allowed me to determine the location of the crash (and, eventually, the solution).
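For the curious, the trick looks roughly like the sketch below. The register is simulated here by a plain global so the code can run anywhere; on real hardware it would be a write through a volatile pointer to a console-specific address, and all of the names and color assignments are invented for illustration:

```cpp
#include <cstdint>

// Stand-in for a memory-mapped background-color register. On a real
// console this would be: *(volatile std::uint32_t*)BG_REG = rgb;
static std::uint32_t g_bg_color_reg = 0;

inline void debug_color(std::uint32_t rgb) {
    g_bg_color_reg = rgb;
}

// Each stage of the boot path gets a distinct color. Whatever color is
// left on screen when the machine hangs tells you how far it got.
bool load_boot_data(bool read_succeeds) {
    debug_color(0xFF0000u); // red: entered the loader
    debug_color(0x00FF00u); // green: media opened
    if (!read_succeeds)
        return false;       // a hang here leaves the screen green
    debug_color(0x0000FFu); // blue: read complete
    return true;
}
```

No I/O, no libc, no debugger – just one store per milestone, which is why it works in environments where nothing else does.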

Me and Google

Two unrelated stories:

  • We’ve both recently been the target of hacking attempts originating from China. Of course, the attack they suffered was much more serious and alarming than mine, which appeared to be a bot trying to log into my wireless router (running Tomato) and was mindlessly trying every entry in a password dictionary. I turned off remote access in the admin panel, and that was that.
  • I own a Garmin nuvi 200, and got accustomed to using the Communicator plugin with Google Maps to easily input addresses into it without having to use the touchscreen interface. Tonight, though, I tried to use it and discovered that Google Maps no longer shows the Send link off of which the browser integration hangs. Not sure why it’s no longer there, and an update to the plugin didn’t change anything, so it seems like it’s just broken. Argh.

Debugging!

Here’s another Reddit stub dealing with a topic that is near and dear to my heart: debugging! Unfortunately, the comments on that article seem to focus more on the “fuzzy” aspects of debugging – the “go home and mull it over while watching TV” kind of stuff, rather than more concrete debugging techniques.

Whenever I run into a bug whose cause is not immediately obvious, I have a standard bag of tricks that I fall back upon. Every programmer has a toolbox like this – I figured I would write about some of the techniques that I use, and why I use them. Some of them are not applicable to every situation, but there are still many that can be applied to any given bug. These are presented in no particular order.

  1. Change the inputs.

    Many times, by changing the inputs to a function, you can cause a recognizable change to occur in the output. This helps you to envision what’s actually happening in the function, and where things might be going wrong. This can include changing parameter values, input files, textures, etc.

  2. Do things in a different sequence, or with different timing.

    Pretty self-explanatory, and related to the first item. The idea is to observe differences in the behavior of the program in similar circumstances. This is mostly useful for interactive programs.

  3. Run all of your automated test code, even the slow tests, and examine any issues that are reported.

    This is helpful if for no other reason than as a sanity check.

  4. Check the logs.

    This one is pretty standard. Even though most debug logs tend to be overflowing with spam messages, you might still find a smoking gun in there.

  5. Ensure that you’re validating all of the return values of function calls.

    This is also known as the “be paranoid” rule, or perhaps the “re-check all of your assumptions” rule. It’s easy to forget to check return values, but it’s crucial to do so. Code that silently ignores failure can cause problems or symptoms unrelated to the function call that actually failed.

    A related problem is returning pointers to objects on the stack – this will result in havoc, since their storage is reclaimed as soon as the function returns.
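To make the stack-return pitfall concrete, here’s a minimal sketch (the names are made up). Most compilers warn about the buggy form; take that warning seriously:

```cpp
#include <cstring>
#include <string>

// BUG: returns a pointer into the function's own stack frame. The
// buffer's storage is reclaimed on return, so the caller reads garbage.
//
//   const char* make_greeting() {
//       char buf[32];
//       std::strcpy(buf, "hello");
//       return buf; // dangling!
//   }

// Fix: return an object that owns its storage.
std::string make_greeting() {
    return "hello";
}
```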

  6. Ensure that pointer values get cleared out when the struct or object to which they point is freed.

    Using stale pointer values is a surefire way to get in trouble. Clearing them out when the associated object is freed (except in very, very special circumstances) will help keep you sane.
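One way to enforce this in C-style code is to free through a pointer-to-pointer, so the caller’s copy gets nulled in the same place the memory is released (hypothetical names, just a sketch):

```cpp
#include <cstdlib>

struct Node { int value; };

// Nulling the caller's pointer at the point of free means a later
// accidental use fails fast and predictably, instead of silently
// reading whatever now occupies the stale memory.
void destroy_node(Node** node) {
    if (node && *node) {
        std::free(*node);
        *node = nullptr;
    }
}
```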

  7. Check for any masked exceptions in managed code.

    I wrote about this the other day. Ensure that no unexpected exceptions are being silently masked in your code.

  8. Check the data.

    Make sure that the data you’re trying to use is actually valid! The GIGO principle is as true as ever. Check for data out of the expected range, QNaNs being generated (which are infamous for screwing up subsequent floating-point operations), the ordering of data, legacy data, and correct offsets/sizes of data.
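A cheap data check can rule a lot of this out early. Here’s a hypothetical range-and-NaN validator for a buffer of floats – NaN comparisons are always false, which is exactly why they slip through naive range checks:

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Rejects NaNs (which silently poison later arithmetic) and any value
// outside the expected [lo, hi] range.
bool validate_samples(const std::vector<float>& samples,
                      float lo, float hi) {
    for (float s : samples) {
        if (std::isnan(s)) return false;    // QNaN in the data
        if (s < lo || s > hi) return false; // out of expected range
    }
    return true;
}
```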

  9. Put in additional logging or debug visualization code.

    This can help provide additional information, but this strategy can also backfire, as it can significantly change the timing of your code. (Disk, socket, and/or pipe I/O are relatively expensive operations.) Use with caution. If you have general-purpose code for validating the state of the application, sprinkle calls to that code throughout the application – this can be useful in determining when things go off the rails.
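The “sprinkle validation calls” idea can be as simple as this sketch (the state and invariant are invented). The point is that calling it between phases tells you *when* the state went bad, not merely that it did:

```cpp
#include <cstdio>

// Hypothetical application state with one cheaply checkable invariant.
struct World {
    int live_objects = 0;
    int total_objects = 0;
};

// General-purpose state check; call it after each phase ("update",
// "render", ...) to bracket the moment things go off the rails.
bool validate_world(const World& w, const char* where) {
    bool ok = (w.live_objects >= 0) && (w.live_objects <= w.total_objects);
    if (!ok)
        std::fprintf(stderr, "state invalid after %s\n", where);
    return ok;
}
```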

  10. Check the crash dump, if you have one.

    Post-mortem analysis of core dumps is often extremely useful in tracking down bugs that you didn’t personally witness, or for which the steps to reproduce are lengthy or time-consuming.

  11. Step through the code in the debugger.

    It can sometimes be quite slow if you’re processing large data sets, but it’s often the easiest way to monitor the control flow of a function.

  12. Inspect the disassembly.

    This sounds hardcore, but being able to do this is invaluable in some cases. It’s useful not only for checking the compiler’s output, but also for cases where you’re examining a minidump of managed code. Unless you’re working with CLR 4 and Visual Studio 2010, opening these minidumps in WinDbg results in callstacks that don’t include line numbers. You do have the instruction pointer value, though, so you can actually print out the disassembly with the !u SOS command, and compare the disassembly with the original source code of the function to figure out the exact point at which the crash occurred.

  13. Inspect memory.

    If you have pointer problems, look at the contents of memory in the debugger to try and figure out what’s going on. A frequent problem is an invalid offset, which results in struct member values “shifting” forwards or backwards. It helps to be familiar with memory representations of things like floating-point numbers – knowing a couple of common values (0x3f800000 == 1.0f, etc.) can be very handy.
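If you want to check those bit patterns programmatically rather than by eye, the well-defined way to get at a float’s representation is memcpy, not a pointer cast (which is undefined behavior under strict aliasing):

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw IEEE-754 bit pattern of a 32-bit float, so values
// like 0x3f800000 (1.0f) are easy to recognize in a memory view.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```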

  14. Use conditional or memory breakpoints to isolate the bug.

    If you know that a particular object or memory address is related to the bug, you can set up breakpoints to pick out a particular loop iteration or write to a memory location. In cases where you’re interacting with a large body of unknown code, memory breakpoints can be particularly useful for tracking state changes.

  15. Try a different build configuration (or turn on asserts).

    This is intended to provoke behavior changes, add validation, and otherwise provide additional data points for determining exactly what’s going on. Turning on validation such as array bounds checks and heap checking can help find some tough bugs (albeit at a tremendous cost in execution speed).

  16. Run on a different platform, or build with a different compiler.

    There are often significant differences in timing and other behavior when you run software on a different platform. Endianness and word size also often differ between platforms, which can expose problems or bad assumptions about data in code. Like changing the inputs, careful observation of these differences can help you get an understanding of what’s actually happening. Additionally, if you are using a different compiler, you may see different warnings or code behavior due to optimization.

  17. Check the version control history for anything suspicious.

    Examining changes to the source code can give you a good idea of how the behavior of the program has changed (even if you don’t know much about the changed code to begin with), and can shed some light on a bug. Checking all of the cases where a function is called to ensure that they take into account any changed behavior is essential.

  18. If your codebase easily allows you to do so, try running the buggy code synchronously instead of asynchronously, for testing purposes.

    This can help determine if a race condition is what’s causing the bug to occur. (It should be noted that I’m not crazy about just adding random sleeps into an asynchronous function to determine this – it’s too unreliable for my tastes.)
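If the codebase doesn’t already have such a switch, it can be as small as this sketch (names invented): a single flag that routes the work onto the calling thread. If the bug vanishes in synchronous mode, a race condition becomes the prime suspect.

```cpp
#include <functional>
#include <future>

// Dispatch a job either synchronously (deterministic: same thread,
// same ordering) or on a background thread, controlled by one flag.
int run_job(const std::function<int()>& job, bool run_async) {
    if (!run_async)
        return job();
    std::future<int> f = std::async(std::launch::async, job);
    return f.get();
}
```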

  19. Check (and re-check) your data dependencies in asynchronous code.

    Don’t fall into the trap of trying to envision multithreaded code by imagining each possible combination of instruction pointer values. Instead, when trying to prove correctness, focus entirely on data dependencies and ensuring that locks are used correctly and respected. (For deadlock bugs, inspect the order in which locks are taken, and check for proper use of back-off and other algorithms for avoiding deadlock.)

    Note that applying these techniques won’t help you write optimal multithreaded code – this requires much broader insight into the particular algorithm in question, and the overall architecture of the code. However, they will help in tracking down correctness issues.
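As one concrete lock-ordering fix: when two locks must be held at once, acquiring them through std::scoped_lock (which uses a deadlock-avoidance algorithm under the hood) removes the classic deadlock where two call sites nest the same locks in opposite orders. The account example below is hypothetical:

```cpp
#include <mutex>

struct Account {
    std::mutex m;
    int balance = 0;
};

// Two concurrent transfers in opposite directions can't deadlock here,
// unlike two nested lock_guard acquisitions in inconsistent order.
void transfer(Account& from, Account& to, int amount) {
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}
```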

  20. Turn off chunks of the code, or switch to a different implementation of an interface.

    If you have multiple providers of an interface available, try using a different one, and see how the behavior of the program changes. (Using null/echo interfaces is a common debugging technique.) Additionally, you can try disabling features of your application to see if they are somehow related to the bug.
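A null provider costs almost nothing to write. In this made-up sketch, swapping NullLogger in for the real implementation tells you whether the bug lives in the provider or in the code that calls it:

```cpp
#include <string>

// A minimal interface with two providers.
struct Logger {
    virtual ~Logger() = default;
    virtual void write(const std::string& msg) = 0;
};

// Null implementation: deliberately does nothing. If the bug persists
// with this plugged in, the provider is exonerated.
struct NullLogger : Logger {
    void write(const std::string&) override {}
};

// A trivial "real" provider for comparison.
struct CountingLogger : Logger {
    int lines = 0;
    void write(const std::string&) override { ++lines; }
};
```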

  21. Use third-party validation tools and/or debugging information.

    This includes things like the Direct3D debug runtime, FxCop, lint, valgrind, the Visual C++ runtime debug heap functions, the Application Verifier, and the checked build of Windows. The more debugging aids you have active, the more likely that you’ll get a clue upon which you can act.

  22. If the code is unfamiliar, find out who wrote it, and start asking them basic questions about it.

    This is similar to the “be paranoid” rule, except that by asking all of these basic questions of the author, you’re forcing them to re-check all of their assumptions. It’s not uncommon to have a eureka moment while explaining a bit of code to someone else.

  23. Try turning off optimizations for a chunk of code.

    My experience has been that people tend to fall back on the “it must be a compiler bug” explanation way earlier than they really should. Nevertheless, turning off optimizations for a section of code might help you debug a problem that occurs in optimized builds. (Whether it’s a genuine optimizer bug, or, say, a misuse of the C99 restrict type qualifier, is for you to find out. Anyone interested in using the latter, incidentally, should really read this excellent article by Mike Acton on the topic.) Performing a quasi-“binary search” when turning off optimizations can help minimize the time spent searching for the problem code snippet for a genuine optimizer bug.

  24. Try running on a different machine, or piece of hardware.

    Hardware failure is another bug explanation of which programmers tend to be a little too fond. However, it does happen occasionally, so it’s definitely something worth testing if you run out of other ideas.

That’s pretty much all I can think of at this point. There are a few more tips that spring to my mind, but they are pretty specific to Windows or Visual Studio development, so I won’t recount them here.

First Chance .NET Exception Handling

I saw this article posted on Reddit’s programming feed*, which talks about a Visual Studio debugging technique for getting first crack at exceptions, before any upstream handlers run. The Debug->Exceptions dialog can be used to set the debugger to break before any exception handlers fire for a particular exception. This is useful not only for debugging code that interfaces with third-party libraries (where it is often unclear why an exception might be thrown), but also your own code. Why?

Imagine that you have some code that runs in an interactive session, but which has a high-level catch block to catch and report errors. This catch block may attempt to continue execution when an error occurs – for example, writing one file in a batch might fail, but the code should continue writing other files. Unfortunately, the exception log produced by this might not provide sufficient information to debug the issue. For example, many of the I/O exception types fail to include information about which file or directory was being modified when the exception was thrown!

In tracking down cases like these, it’s often easier to set the debugger to catch that particular exception at the first chance it gets, and then examine the state of the calling function where the exception was thrown. Having both a call stack and the specific line of code where the exception was thrown often allows you to see the problem immediately, without further investigation.

Another illuminating debugging exercise is to turn on first chance exception handling for all exceptions, and then see where they are thrown. Code that masks exceptions (by using an untyped, empty catch block) is particularly nasty, as it can leave the program in an ill-defined state, but without any feedback to indicate that something failed. A different problem results from the frequent use of exceptions: they have a significant negative performance impact. Turning on first chance exception handling makes it trivial to find these cases, if you hadn’t already noticed the debug window spam that frequent exceptions tend to create.
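The exception-masking pattern isn’t unique to .NET; here’s the same pathology sketched in C++ terms (all names hypothetical). The commented-out form is exactly the untyped, empty catch block described above – the caller can’t tell anything failed, and the program limps on in a bad state:

```cpp
#include <stdexcept>
#include <string>

int files_written = 0;

// Hypothetical batch step; throws on bad input.
void write_file(const std::string& name) {
    if (name.empty())
        throw std::runtime_error("empty file name");
    ++files_written;
}

// BAD: the failure vanishes without a trace.
//   try { write_file(name); } catch (...) {}
//
// Better: if you must continue the batch, at least surface the failure.
bool try_write_file(const std::string& name) {
    try {
        write_file(name);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```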

 

 

* There’s a lot of stuff that gets posted there that gets my (figurative) blood pressure up, so we’ll see how long I stick with it. But I figured it would be a good exercise to read it regularly and write about things I see there.

Windows 7 on my netbook

I decided to try running Windows 7 Ultimate on my netbook (an Asus Eee PC), since I had heard that it was nearly as fast as XP (and much more modern). I installed it by burning the ISO (acquired by virtue of being an MSDN subscriber) and then booting the netbook from an external DVD drive. I am happy to report that Windows 7 works like a charm, with the only major wrinkle being that I had to reinstall the ACPI drivers along with some of the other Asus utilities to get the special function keys working as they did before. (The link there is a decent summary of what is required, although I actually came across this information elsewhere.)

Windows 7 takes longer to boot than XP on the Eee PC, but performance is comparable once the OS is loaded — I have no complaints about the experience so far. The only other strange thing is that the video driver seems to have issues with coming out of sleep mode — the nice thing is that the Vista driver model allows the OS to completely restart the graphics system, so the machine actually recovers gracefully from this. I’m hoping that there will be future driver updates to solve this issue, but it’s not a big deal at this point.

Using GPPG and GPLEX with Visual Studio

Here’s a quick note on a problem I ran into a while back. I was using the GPPG and GPLEX parser tools as part of a Visual Studio project – the input files for these tools generate C# source files which are then compiled into the project. However, I noticed a problem with the recommended project setup (basically, setting up a MSBuild .targets file for GPPG and GPLEX source files, and then including that target file in the project). Changes that I made to the grammar would only take effect the second time I built the project. The samples and documentation for MPPG and MPLEX (earlier versions of GPPG and GPLEX) are silent on this issue. After inserting some debug code into my build targets, I determined that the files being output by the GPPG and GPLEX target handlers were correct, but the compiler was still using the older versions of the files. It seemed like it was using a cached copy of the old version of the grammar.

As it turns out, there is a bit of chicanery going on inside Visual Studio that results in this behavior. Visual Studio actually runs the C# compiler in-process, as an optimization to avoid process start overhead. This in-process compiler gets fouled up, however, when C# source files are generated as part of a build step – it seems to load all of the source files when the build starts, so it winds up using the old version instead of the freshly generated one.

The solution is to add the UseHostCompilerIfAvailable property to the .csproj project file, like this:

<PropertyGroup>
	<UseHostCompilerIfAvailable>False</UseHostCompilerIfAvailable>
</PropertyGroup>

This will force Visual Studio to use the out-of-process compiler, which will cause the correct version of the grammar to be built. Building the project will be a little slower, but it’s better than having to build twice!

Big Trouble

The D-League is undergoing quite a bit of turmoil at the moment. The Arsenal have left town, and the Bakersfield Jam’s owner, in one of the more amusing euphemisms I’ve ever seen, has declared that “I wouldn’t say we’re folding. We’re just not going to operate anymore.” The Albuquerque Thunderbirds have laid off all of their staff, and even the D-League champion Colorado 14ers may be folding up their tent. While the D-League isn’t threatening to turn into the USBL or ABA, this is still cause for concern.

When even NBA teams are having money issues, it’s no surprise that the minor leagues are also suffering. Anecdotally, the crowds have been quite a bit smaller this season — I’m guessing that the discretionary income of people who might be D-League customers in better times is severely restricted or non-existent right now. I’m not really sure what the answer to this problem is, apart from better cost controls and more aggressive promotions and community outreach. The best-attended Arsenal games tended to be ones that had community involvement — large groups performing during halftime or otherwise involved in some sort of festivities.

With the support of the NBA, I don’t think that the D-League is in any danger of going away, but the ownership groups of some teams may have dug themselves holes that will be difficult to escape. An additional complicating factor is that the NBA’s collective bargaining contract restricts the interactions of NBA teams and their D-League affiliates — with some reforms, I feel that NBA teams would have more flexibility to assign players and use the D-League, and build a stronger relationship with their affiliates.

As a final note, I was pleased to see that former Arsenal player Marcin Gortat came through in a big way for the Magic in game 6 of their series against the Sixers, with 11 points and 15 boards.

Anaheim Arsenal: 2006-2009

The Arsenal recently finished out their season with an epic 12-game losing streak. As it turns out, this will be their final season — the team announced that the franchise was being transferred to another organization, which will relocate the team to Springfield, MA for next season. I am, naturally, pretty disappointed, although the move is not entirely unexpected.

It’s sort of strange — I feel that, overall, the team personnel this year was a little bit better than last year. Kedrick Brown, Cedric Bozeman, and James White all had solid years, and I really feel like Brown in particular improved his game quite a bit since last year. However, it seems like the team’s offense relied too much on a philosophy of more shooting from the guards this year — I feel like the team’s ball-handling took a dive this year, and it really hurt them down the stretch in close games. (And this is even after the team parted ways with Tierre Brown, who never met a shot he didn’t like…) There were a number of games where the Arsenal were close or slightly ahead in the final minutes, but had key turnovers on the offensive end that were the difference makers. The lack of solid ball-handlers, and guards who could drive and dish under pressure, seems like the biggest difference between last year’s team (which had a solid second half and very nearly climbed back to .500) and this year’s, which finished tied for the worst record in the league.

The one player who I feel was pretty much a total bust was Malick Badiane, who had “project player” written all over him. He didn’t have much of a shot, particularly once you got past about 5 feet from the basket, didn’t rebound well, didn’t pass well, didn’t have great court awareness (particularly on the offensive end without the ball), was only an okay defender, and was prone to committing poor fouls. In short, not a heck of a lot of upside for a big man.

Sadly, it also felt like the team quit down the stretch — I’m not sure if they knew the news about the move, or if there was contention between the players and the coach, but there was a palpable lack of enthusiasm and team play as the season went on. I don’t have any sort of inside information on this matter, but it was pretty clear to me as someone who was at nearly every home game.

The expanded D-League playoffs are about to begin, but it’s hard for me to muster up a lot of enthusiasm about it. This is the first time that a team I followed and rooted for has simply ceased to exist, and without the prospect of a local replacement (I can’t support the D-Fenders, if only because of the Laker connection), I find myself in the strange position of feeling my fandom being taken away from me, instead of me drifting away from it.

Broken Headset

A few weeks ago, my beloved Plantronics Discovery 925 started acting up — the microphone on it was apparently failing. People that I called using the headset said that they could only hear me very faintly, although I could still hear them just fine. A couple of support e-mails to Plantronics later, and I received a refurb replacement. I’m not thrilled with the fact that it’s a refurb, but it seems to work just fine.

Normally I would be annoyed that something I bought failed so soon after I bought it, but I like the headset so much that I felt more relief than anger when I got my replacement.