On beards and politics.
I'd almost forget...
Before:
During:
After:
They finally made it.
Static program analysis with LLVM and clang
"Static program analysis" is a technique whereby a program is verified for errors without actually running it. Finding bugs manually with a debugger and one's brain is tedious, so every shortcut that can help you avoid having to do so is great.
There are some commercial tools available to do such analysis, some of which are rather expensive; but there are also some open source tools available to do similar things. One of these is built into clang, the C compiler of the LLVM project.
Using it is fairly simple. Instead of compiling something with 'make', compile it with 'scan-build make'. This will set the CC (and similar) environment variable(s) so that before the compiler is ran, the clang static-analysis checker is run over the very same source code. The output of this checker is an HTML file with your source, but with comments added to explain the bugs which the tool found.
What does that mean? well, let's look at an example, shall we?
The iframe above contains one out of three reports produced by a scan-build run over the nbd source code (if there's no iframe, someone scrubbed some HTML in your RSS reader. Just follow the link just above instead). The other two are 'dead assignments', which might mean that I'm currently depending on some undefined behaviour (which would be bad), or it might mean that I'm being overly cautious (which makes my code future-proof, which would be good), or it might mean something else—I still need to investigate. But this one is pretty interesting.
In the example, there are eight, numbered, comments in the source code. The first seven show the code path which scan-build took through my code before getting at the eighth comment; and the eighth is where things go bad. In this case, when going through the function as shown, we have a NULL pointer dereference.
When looking at the scan-build output, it's important to realize a few things. First, the code path shown may be just one of a number of possible code paths. For instance, the null pointer dereference would still happen if the phase function parameter would not contain the NEG_INIT bit with the client pointer set to NULL. However, clang does not show these other code paths; this is presumably an optimization ("if we've already shown that this kind of bug is possible at a particular location through one code path, don't bother recording future instances of that very same bug at the exact same location through another code path"). This means that sometimes, some of the branches shown may be completely irrelevant to the bug at hand. In this particular case, in fact, it's possible to show the bug with just the eighth comment; the first seven are in fact totally irrelevant.
Second, the fact that the clang static analyser found a bug does not mean that it's possible to crash the application. Yet. In this particular case, the negotiate function will never be called with the NEG_INIT or NEG_MODERN bits not set, and with the client parameter set to NULL. That's an implicit assertion; there are a few ways in which this function may be called, but the client parameter may only be NULL if NEG_INIT and NEG_MODERN are both set at the same time.
Since nbd-server doesn't currently call the negotiate function in that way, it is not currently possible to crash the server by exploiting this bug. But that doesn't mean it won't ever be possible, nor that there isn't a bug in the code. We may assume that the above rules are true, but we never check it. Adding an assertion to that effect should make sure that no future change to the code will accidentally introduce that error and cause a NULL pointer dereference.
Is this a silly and useless precautionary measure? Not really. Usually, bugs happen in code not because someone wasn't thinking straight, but because there's so much going on inside a piece of software that it can be too much for any programmer to remember. If a function assumes that its parameters are within a given subset of all possible states, but does not check that this is in fact true, then when (not if) some future change incorrectly introduces a state that is outside of the assumed states, things will break. And that's Bad(TM).
Rsync'ing over a newer subversion (fsfs) repository
So there are two servers; let's call them srv1 and srv2. Srv1 contains a bunch of subversion repositories, but these are to be migrated to srv2. Since the repositories are not (just) used for ascii-only files, they're fairly big (several tens of gigabytes, altogether), so copying them from one server to the other would take a while. In order to make sure this would happen quickly, we had already copied them over to the new server, so that on the final switch would be quick (an rsync that would copy over just the new done transactions).
That final switch was today. Only I didn't know that instead of just testing, the customer had already started using one of the repositories (and they'd forgotten to remind me), so the subversion repository suddenly jumped backwards in time. In addition, the new server wasn't being backed up yet (at least not for the subversion bits), so restoring from backups wasn't an option. Oops.
Luckily the solution is fairly simple. You see, fsfs stores each revision in a unique file; that means that as long as nobody has committed to the repository yet (which they couldn't, since system users aren't the same on both servers, so the webserver didn't have write permissions on the repository after the rsync), nothing is lost. One only needs to manually change the repository so that whatever subversion thinks is the latest commit, actually is the latest commit.
That information is stored in a file called db/current inside the repository. What's in that file depends on the repository version, which is stored in a file called db/format in the repository. For versions 1 and 2, the format is a single line with three space-separated values, of which the first is the last revision number used in the repository. The other two are counters that are used to give transactions unique names; and they, too, need to be up-to-date. For version 3 and above, the file contains only the revision number; there, the other two are derived from that instead of having their own unique number.
Figuring out the last used revision number in an fsfs repository is ridiculously easy:
ls -v db/revs|tail -n1
So if you've got a repository of fsfs version 3 or above, just change the revision number in the db/current file (after taking backups and making sure nobody can access the repository while you're doing this, of course), and you're all set.
Unfortunately, in my case, the repository was still in fsfs version 2, which meant I could not change just the revision number and not expect trouble. I suppose it should've been possible to figure out what the last transaction numbers are, somehow, so that I could fix the current file completely, but I reasoned that upgrading to a newer repository format might have other advantages too, so I just dumped the repository and reloaded it, and everything worked at that point.