Static program analysis with LLVM and clang

"Static program analysis" is a technique whereby a program is verified for errors without actually running it. Finding bugs manually with a debugger and one's brain is tedious, so every shortcut that can help you avoid having to do so is great.

There are some commercial tools available to do such analysis, some of which are rather expensive; but there are also some open source tools available to do similar things. One of these is built into clang, the C compiler of the LLVM project.

Using it is fairly simple. Instead of compiling something with 'make', compile it with 'scan-build make'. This will set the CC (and similar) environment variable(s) so that before the compiler is ran, the clang static-analysis checker is run over the very same source code. The output of this checker is an HTML file with your source, but with comments added to explain the bugs which the tool found.

What does that mean? well, let's look at an example, shall we?

The iframe above contains one out of three reports produced by a scan-build run over the nbd source code (if there's no iframe, someone scrubbed some HTML in your RSS reader. Just follow the link just above instead). The other two are 'dead assignments', which might mean that I'm currently depending on some undefined behaviour (which would be bad), or it might mean that I'm being overly cautious (which makes my code future-proof, which would be good), or it might mean something else—I still need to investigate. But this one is pretty interesting.

In the example, there are eight, numbered, comments in the source code. The first seven show the code path which scan-build took through my code before getting at the eighth comment; and the eighth is where things go bad. In this case, when going through the function as shown, we have a NULL pointer dereference.

When looking at the scan-build output, it's important to realize a few things. First, the code path shown may be just one of a number of possible code paths. For instance, the null pointer dereference would still happen if the phase function parameter would not contain the NEG_INIT bit with the client pointer set to NULL. However, clang does not show these other code paths; this is presumably an optimization ("if we've already shown that this kind of bug is possible at a particular location through one code path, don't bother recording future instances of that very same bug at the exact same location through another code path"). This means that sometimes, some of the branches shown may be completely irrelevant to the bug at hand. In this particular case, in fact, it's possible to show the bug with just the eighth comment; the first seven are in fact totally irrelevant.

Second, the fact that the clang static analyser found a bug does not mean that it's possible to crash the application. Yet. In this particular case, the negotiate function will never be called with the NEG_INIT or NEG_MODERN bits not set, and with the client parameter set to NULL. That's an implicit assertion; there are a few ways in which this function may be called, but the client parameter may only be NULL if NEG_INIT and NEG_MODERN are both set at the same time.

Since nbd-server doesn't currently call the negotiate function in that way, it is not currently possible to crash the server by exploiting this bug. But that doesn't mean it won't ever be possible, nor that there isn't a bug in the code. We may assume that the above rules are true, but we never check it. Adding an assertion to that effect should make sure that no future change to the code will accidentally introduce that error and cause a NULL pointer dereference.

Is this a silly and useless precautionary measure? Not really. Usually, bugs happen in code not because someone wasn't thinking straight, but because there's so much going on inside a piece of software that it can be too much for any programmer to remember. If a function assumes that its parameters are within a given subset of all possible states, but does not check that this is in fact true, then when (not if) some future change incorrectly introduces a state that is outside of the assumed states, things will break. And that's Bad(TM).