Buildd maintenance

In Debian, I've been a buildd maintainer since 2001; most of that time was for the m68k port (where I'm still active, though not as much as I used to be), but there's also been a short stint with armeb, and for a while now I've also been a PowerPC buildd maintainer. At first I handled just one powerpc host, but these days I maintain both malo and voltaire, with Philipp Kern doing praetorius.

This probably makes me one of the more experienced buildd maintainers in Debian today, together with the likes of LaMont Jones and Ryan Murray. I did give a talk about how this is all supposed to work at FOSDEM 2004, but that's now five years ago, and some things have changed since. Also, a talk that wasn't videotaped isn't very helpful if you weren't there.

So I thought I'd write up what it means to be a buildd maintainer. There's of course the documentation on the website, but that only explains how the system works in theory; it does not explain what we buildd maintainers tend to do on a daily basis.

So let's have a look at that, shall we?

Basically, the work of a buildd maintainer is pretty monotonous, and an experienced buildd maintainer will usually have a set of scripts to help them. The work falls into three main categories. In order of frequency, these are:

  1. Log handling;
  2. State handling;
  3. Host and chroot maintenance.

Log handling

The first is the most obvious one. Every time the buildd builds a package, it will send a full log of the build to the central log archive and to myself. The successful ones are signed with a simple script:

tmpfile=$(mktemp)
# extract the embedded .changes file: everything between the ".changes:"
# header line and the first blank line, with 8-bit characters stripped
sed -e '1,/\.changes:$/d;/^[[:space:]]*$/,$d' "$1" | tr -d '\200-\377' > "$tmpfile"
cat "$tmpfile" > "$1"
rm "$tmpfile"

Easy: use a sed command to fish out the embedded .changes file, and write that back to the original file. I use a folder-hook to set this script as my 'editor' in mutt when I'm in my buildd mail directory, so the result is mailed off to the buildd. In that same folder-hook, mutt is also configured to send the reply gpg-signed in the 'traditional' format, without confirmation, and with just one keystroke, so that (after I have entered my GPG key passphrase) I can send off all the signed changes files in one go. A possible improvement would be to change the macro so that it works with mutt's 'tag' feature (it doesn't, currently), but that's not a big issue; doing 100 mails currently takes a few seconds and some careful counting.
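For the curious, the relevant bits of mutt configuration look roughly like this. Note that the paths, folder names, and keybinding are made up, and the pgp option names are from mutt of that era (newer versions spell them crypt_autosign and friends), so treat this as a sketch rather than a copy-paste recipe:

```
# muttrc fragment -- hypothetical names and paths
folder-hook buildd/success 'set editor="~/bin/extract-changes"'
folder-hook buildd/success 'set pgp_autosign=yes pgp_create_traditional=yes'
# one keystroke: reply (which runs the 'editor' script on the quoted
# body) and then send the result without further confirmation
folder-hook buildd/success 'macro index B "<reply><send-message>" "sign and return changes"'
```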

Note the 'tr'; this prevents 8-bit characters from appearing in the mail, which might otherwise be converted to their quoted-printable form in transit to the buildd. Since buildd-mail (the part that receives that mail) does not understand MIME, such a conversion would corrupt the GPG signature. This way we do lose a few characters from the changelog, but that doesn't really matter -- the source still contains the unmodified changelog entry.
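A quick illustration of what that 'tr' does: the two bytes of a UTF-8 'é' both fall in the \200-\377 range, so the character simply disappears from the log rather than risking quoted-printable mangling:

```shell
# Strip all bytes with the high bit set, as the signing script does;
# a UTF-8 'é' is the two bytes \303 \251, so it is dropped entirely.
printf 'caf\303\251 build\n' | tr -d '\200-\377'
# prints: caf build
```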

With this script, I often handle my 'success' folder several times a day. It's no effort, anyway.

The somewhat harder but much more fun part of log handling is the handling of failure mails. Since there are loads and loads of possible failures, the scripts to handle these are somewhat more involved. I received a script from LaMont a few years ago, which I've since built on and improved. It's not perfect, but it does handle a few common cases with no extra input from me. Some of the others are not so easy, however.
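As a sketch of the kind of thing such a script grows into (the patterns below are illustrative guesses of my own, not LaMont's actual rules):

```shell
# Hypothetical failure-log classifier: grep for known error signatures
# and suggest what to do with the package.
classify_log() {
  if grep -qi 'internal compiler error' "$1"; then
    echo 'ICE: file a bug against the compiler'
  elif grep -q 'but it is not going to be installed' "$1"; then
    echo 'dep-wait candidate: walk the dependency tree'
  elif grep -qi 'Unable to fetch some archives' "$1"; then
    echo 'transient network failure: give-back'
  else
    echo 'unknown: needs a human'
  fi
}

# demonstrate on a fabricated log file
log=$(mktemp)
echo 'E: Unable to fetch some archives, maybe run apt-get update' > "$log"
classify_log "$log"    # prints the give-back suggestion
rm -f "$log"
```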

One of the more common cases that cannot easily be automated is the case of the buildd failing to install a certain package, because 'foo depends on bar, but it will not be installed'. This is apt's way of telling you that bar depends on foobar which depends on quux (>= 1:2.3.4-5) which depends on libfrobnitz2, but that has now been replaced by libfrobnitz3. Or some such. The only way to figure out what the hell the problem is, is to walk the dependency tree and figure out stuff from there.

There is a tool called 'edos-debcheck' that can reportedly help with this; personally, I wrote a set of perl scripts that cache a Packages file into DBM files and then allow me to walk over them to figure out what's wrong. They're not perfect, but if I use the '-v' option to check-dep-waits and verify the output when it tells me about missing libraries, it can figure out the whole dependency tree I described above, and allows me to write a proper dep-wait response, so that the buildd host automatically retries the package when the missing dependency is available.
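Those perl scripts aren't reproduced here, but the core idea can be sketched in a few lines of shell over a made-up Packages-style fragment (package names borrowed from the example above):

```shell
# Walk a dependency chain through a (fake, minimal) Packages file
# until we hit a package the file doesn't know about.
pkgfile=$(mktemp)
cat > "$pkgfile" <<'EOF'
Package: foo
Depends: bar

Package: bar
Depends: quux (>= 1:2.3.4-5)

Package: quux
Depends: libfrobnitz2
EOF

walk() {
  echo "$1"
  # first dependency of package $1, version constraint stripped
  next=$(awk -v p="$1" '
    $1 == "Package:" { cur = $2 }
    $1 == "Depends:" && cur == p { sub(/,$/, "", $2); print $2; exit }
  ' "$pkgfile")
  [ -n "$next" ] && walk "$next"
}

walk foo    # prints foo, bar, quux, libfrobnitz2 -- the chain to inspect
```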

Also somewhat common and routine are things like:

  * transient network failures, in which case we use either 'retry' or 'give-back' (if the buildd hasn't figured that out by itself and done the latter already);
  * the maintainer uploading a new version of the package while the previous version is building. This results in wanna-build firing off an email to the buildd host, which in turn results in buildd killing the build by removing the build directory. Such a failure is not always easily distinguishable from a regular one, so I commonly respond to that mail with a failure message; if the build did indeed fail because of a newer version, buildd will notice that and ignore my mail;
  * the incoming.d.o Packages file (which is only available to buildd hosts, so don't ask) being out of sync with reality, which happens four times a day for about an hour each time. In this case build-deps will fail to install, requiring a retry or give-back.

Other things are less common; but because of that, they are not routine and require an in-depth investigation. Sometimes the fix is to just file a bug report and/or to mark the package as 'failed' (and let the maintainer or a porter handle the problem); sometimes the failure is due to a maintainer script in a package being utterly broken, resulting in either some build-deps being uninstallable or (worse) the buildd chroot being fucked up. Sometimes a build is interrupted halfway through, leaving the chroot in an unclean state (sbuild is not pbuilder, and does not remove and recreate its chroot between builds). That pushes us into category 3 of our work.

Basically, however, figuring out which is which takes some experience. Not all compilers are based on gcc (there are some really weird languages in Debian), and thus not all of their error output is the same; learning their different error modes can help quite a lot. Additionally, by continually compiling 10G worth of software, you'll be stress-testing your toolchain. If you've never seen an 'Internal Compiler Error' before, you will once you become a buildd maintainer, and it helps if you know what they are and how to deal with them (even if there isn't much one can do beyond filing bugs).

Obviously, handling failures takes more time than handling success mails, and it's not something I do quite as often. The exact interval varies, but it's usually somewhere between a few days and one or two weeks. The exception is when I suddenly stop receiving success mails from one of my buildd hosts; in that case I know something is utterly wrong and will usually investigate immediately.

State handling

By 'state handling', I mean managing the state of a package in the wanna-build database. There's help with this from the people on the debian-wb-team mailinglist; call me old-fashioned, but I still consider this to be the final responsibility of the buildd maintainers. After all, the routine state changes are a result of decisions that I make; as such, if I fuck up, it should be me who fixes the fuckup. Also, if I mark a package as 'failed' because I believe the maintainer fucked up, then the debian-wb-team people may not know about my reasoning, and might give the package back, resulting in just another failure (although I would consider the latter pretty rare).

These requests are pretty common. Quite often, they're unnecessary; many maintainers are unaware of the intricacies of the wanna-build system, and may not realize that a build in dep-wait state will automatically migrate to needs-build once its dependencies are available. About as often, however, they are very much necessary, and since regular Debian package maintainers do not have access to the wanna-build database, they require someone who does have access to update it for them.

Having said that, there are cases where I will preemptively edit the wanna-build database. Usually this is to do something useful with packages in 'Building' state that have been in that state for far too long: either upload the package if its signature mail got lost (which happens once in a blue moon), or give the package back if its build was not actually attempted even though it is marked as such (this should not happen, but the system is not perfect and it does). Sometimes it is because I've figured out that some common build-dependency (say, the GTK or Qt libraries) is in a transitional state and currently not installable; rather than having a build daemon try a bunch of packages and fail them all, I may want to note in the wanna-build database that it should not bother attempting these 75 packages until the GTK package is done. This isn't needed as often on the official Debian machines (since the release managers will do it for me there), but on m68k we do need to do it ourselves.
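On a machine with wanna-build access, such manual state changes look roughly like this. The option names are from memory and the database path and package names are invented, so check wanna-build's own documentation before copying anything:

```
# give a stuck 'Building' package back to the pool of buildable packages:
wanna-build --database=m68k/build-db --give-back foo_1.2-3
# park a package until a build-dependency becomes installable:
wanna-build --database=m68k/build-db --dep-wait='libgtk2.0-dev (>= 2.16.0)' bar_0.5-1
```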

These kinds of requests come in once every few days to once every few weeks, and take little time to deal with.

Host and chroot maintenance

This is the hardest and least fun part of buildd maintenance, but it is just as necessary. Luckily, it isn't needed as often.

Because Debian Unstable is a system in a constant state of flux, things will often break. This is even more of a problem in a buildd chroot, since it builds out of incoming; a maintainer may upload a package with a fucked postinst script, have its build succeed, but then see it fail spectacularly to install. The maintainer may notice that, and may upload a new package half an hour later. As such, the broken package will never end up on the system of a user of Debian Unstable; but between the upload of the broken package and that of the fixed one, the old package will be available to buildd hosts, which may use it to completely and utterly destroy their build chroot. The joys of having a fast turnaround time.

Luckily, Debian package maintainers are not stupid, and this kind of fuckup does not happen every other day. It does happen, however, and when it does, it often means manual work for the buildd maintainer. In the best case, it's a matter of syntactically fixing a postinst script and calling 'apt-get -f install' or 'dpkg --configure -a'. In the worst case (which is almost, but not quite entirely, totally unheard of), it's a matter of rebuilding the buildd chroot. In addition, a machine that runs 24/7 for the sole purpose of building packages tends to generate quite a lot of disk activity, which in turn tends to be detrimental to the disk in the long run. If not looked after properly, disks will die, taking the entire buildd chroot with them; that, too, requires rebuilding it. Obviously, this last issue is dealt with by the Debian System Administration team in the case of official Debian hosts, but the same is not true for the m68k port.

A somewhat more common thing that needs to be taken care of is that buildd does not in all cases clean up after itself. For instance, when a new version of a package is uploaded to the archive between the time the buildd host built it and the time the buildd maintainer sent the signed .changes file back, buildd will say "I haven't got that package taken as Building" and refuse to upload it. This makes sense (you can't upload an old version of a package, since there wouldn't be any source for it, and dak would refuse the upload), but it does mean that the built packages aren't cleaned out. This is arguably a bug in buildd-mail; over time it results in the disk filling up with outdated packages, which require manual work from the admin. I recently (as in, a few hours ago now) finished a script to check each .changes file in the "build result" directory against the wanna-build database, and list those that are no longer necessary. I already had a script that, given a list of .changes files, removes every .deb file listed in them, and then removes the .changes files themselves. Combined, these make that kind of work somewhat less of a burden.
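The second of those scripts (remove the files a .changes lists, then the .changes itself) can be sketched like this; the .changes content and file names below are fabricated for the demonstration:

```shell
# Given .changes files in a directory, remove every file they list,
# then the .changes files themselves.  Everything lives in a scratch dir.
dir=$(mktemp -d)
cat > "$dir/foo_1.0-1_m68k.changes" <<'EOF'
Files:
 d41d8cd98f00b204e9800998ecf8427e 0 misc optional foo_1.0-1_m68k.deb
EOF
: > "$dir/foo_1.0-1_m68k.deb"

for changes in "$dir"/*.changes; do
  # the Files: section lists "md5sum size section priority filename"
  sed -n '/^Files:/,$p' "$changes" | awk 'NF == 5 { print $5 }' |
  while read -r f; do
    rm -f "$dir/$f"
  done
  rm -f "$changes"
done

ls "$dir" | wc -l    # 0: everything cleaned up
```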

As said, this kind of work does not need to be done all that often; for instance, I just cleaned the build result directory on voltaire and malo, my two powerpc buildd hosts, and found old files from late 2008...

And that's it, I guess. It may seem like quite a lot, but in reality it isn't; the thing I've always liked about buildd maintenance is that you do a little something for Debian every day, and it ends up being something big and helpful after a while.

Of course, the little things are the cherry on the cake. By looking at a lot of build logs, one eventually learns a thing or two about build systems, which is valuable knowledge. Getting build logs from the whole of Debian allows one to learn things about the archive that many people don't know about—for instance, did you know that we had a package called trousers? I didn't, until I signed the buildd log...

Update: changed the URL of this post to be under the buildd/ directory, rather than having it conflict with that and thus killing its permalink and making it impossible to comment on this post. Oops.

Interesting read

Interesting stuff, always fun to read about what goes on behind the scenes.

As a user of Debian, thank you for the effort you put in.

Comment by James early Sunday morning, July 26th, 2009