WEBlog -- Wouter's Eclectic Blog

Mon, 26 Jul 2010

Maintainer stupidity

As a buildd admin, you get to see the myriad of ways in which packages can fail to build. Sometimes this is due to interesting technical reasons about the architecture in question; sometimes... not so much.

cc1: unknown option: -mmmx

For reference, I maintain powerpc and m68k buildd hosts.

configure: error: pkg-config: command not found

Checking build-dependencies is hard. Right?

ccache: failed to create /home/buildd/.ccache (No such file or directory)

No, I don't manually install ccache inside the buildd chroot, which must mean some build-dependency pulled it in. Why one would want to do that, I dunno—it's not like a build that takes longer would produce a different package, so it clearly is not required for the build.

(BTW, the invalidity of $HOME is on purpose -- packages are not allowed to write outside their build directory during builds, which includes home directories)

... and more. Sometimes, I wonder how people get the right to upload. But, well.

Wed, 22 Jul 2009

Buildd maintenance

In Debian, I've been a buildd maintainer since 2001; most of that time was for the m68k port (I am still active there, though not as much as I used to be), but there's also been a short stint with armeb, and for a while now I've also been a PowerPC buildd maintainer. At first I did just one powerpc host, but now I maintain both malo and voltaire, with Philipp Kern doing praetorius.

This probably makes me one of the more experienced buildd maintainers in Debian today, together with the likes of LaMont Jones and Ryan Murray. I did do a talk about how this is supposed to work at FOSDEM 2004, but that's now five years ago, and some things have changed since. Also, a not-videotaped talk isn't very helpful if you weren't there.

So I thought I'd write up what it means to be a buildd maintainer. There's of course the documentation on the Debian.org website, but that only explains how the system works in theory; it does not explain what we buildd maintainers tend to do on a daily basis.

So let's have a look at that, shall we?

Basically, the work of a buildd maintainer is pretty monotonous, and an experienced buildd maintainer will usually have a set of scripts to help them. Their work falls into three main categories. In order of frequency, these are:

  1. Log handling;
  2. State handling;
  3. Host and chroot maintenance

Log handling

The first is the most obvious one. Every time the buildd builds a package, it will send a full log of the build to buildd.debian.org and to myself. The successful ones are signed with a simple script:

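#!/bin/bash
# Reduce a buildd success mail (file given as $1) to its embedded .changes file.
tmpfile=$(mktemp)
# Delete everything up to and including the line ending in ".changes:", then
# everything from the first blank line onwards; strip 8-bit bytes with tr.
# (No -i here: sed must write to stdout for the pipe to receive anything.)
sed -e '1,/\.changes:$/d;/^[[:space:]]*$/,$d' "$1" | tr -d '\200-\377' > "$tmpfile"
cat "$tmpfile" > "$1"
rm "$tmpfile"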

Easy: use a sed command to fish out the embedded .changes file, and write that to the original file. I use a folder-hook to set this script as my 'editor' in mutt when I'm in my buildd mail directory, so the result is what gets mailed off to the buildd. In that same folder-hook, mutt is also configured to send the reply gpg-signed in the 'traditional' format, without confirmation, and with just one keystroke, so that (after I have entered my GPG key passphrase) I can send off all the signed changes files in one go. A possible improvement could be to change the macro so that it would work with mutt's 'tag' feature (it doesn't, currently), but that's not a big issue (currently, doing 100 mails takes a few seconds and some careful counting).

Note the 'tr'; this prevents 8-bit characters from appearing in the mail, which might otherwise be converted to their quoted-printable version in transit to the buildd; and since buildd-mail (the part that receives that mail) does not understand MIME, this would corrupt the GPG signature. This way, we do lose a few characters from the changelog, but that doesn't really matter -- the source still contains the unmodified changelog entry.

With this script, I often handle my 'success' folder several times a day. It's no effort, anyway.

The somewhat harder but much more fun part of log handling is the handling of failure mails. Since there are loads and loads of possible failures, the scripts to handle these are somewhat more involved. I did receive a script from LaMont at some point, a few years ago, which I have since built on and improved. It's not perfect, but it does handle a few common cases with no extra input from me. Some of the others are not so easy, however.

One of the more common cases that cannot easily be automated is the case of the buildd failing to install a certain package, because 'foo depends on bar, but it will not be installed'. This is apt's way of telling you that bar depends on foobar which depends on quux (>= 1:2.3.4-5) which depends on libfrobnitz2, but that has now been replaced by libfrobnitz3. Or some such. The only way to figure out what the hell the problem is, is to walk the dependency tree and figure out stuff from there.

There is an 'edos-debcheck' that reportedly can help with this; personally, I wrote a set of perl scripts that will cache a Packages file into DBM files, and then allow you to walk over them to help you figure out what's wrong. They're not perfect, but if you use the '-v' option to check-dep-waits and verify the output when it tells you about missing libraries, it should be able to figure out the whole dependency tree I described above, and will allow me to write a proper dep-wait response, allowing the buildd host to automatically retry the package when the missing dependency is available.
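To make the idea of walking the dependency tree a bit more concrete, here is a minimal Python sketch. It is not the perl scripts mentioned above (those are not shown here); the `packages` dictionary is a hypothetical, already-parsed stand-in for a Packages file, and the sketch ignores versions, alternatives and virtual packages entirely.

```python
def find_missing(packages, root):
    """Walk the dependency tree of `root`; return chains ending in an unavailable package.

    `packages` maps each available package name to the list of package
    names it depends on (a simplified, hypothetical stand-in for a parsed
    Packages file: no versions, no alternatives, no virtual packages).
    """
    chains = []
    seen = set()

    def walk(name, path):
        path = path + [name]
        if name not in packages:
            # Not available at all: this is the package apt is tripping over.
            chains.append(path)
            return
        if name in seen:
            # Already explored (this also guards against dependency cycles).
            return
        seen.add(name)
        for dep in packages[name]:
            walk(dep, path)

    walk(root, [])
    return chains


# Toy data mirroring the foo/bar/quux example from the text.
packages = {
    "foo": ["bar"],
    "bar": ["foobar"],
    "foobar": ["quux"],
    "quux": ["libfrobnitz2"],
    "libfrobnitz3": [],
}

for chain in find_missing(packages, "foo"):
    print(" -> ".join(chain))  # foo -> bar -> foobar -> quux -> libfrobnitz2
```

The last element of each chain is the package one would name in a dep-wait; a real tool would of course also have to compare version constraints.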

Also somewhat common and routine are a few other cases. Transient network failures are one; there we use either 'retry' or 'give-back', if the buildd hasn't figured that out by itself and done the latter. Another is the maintainer uploading a new version of the package while the previous version is building. This results in wanna-build firing off an email to the buildd host, which in turn results in buildd killing the build by removing the build directory. This is not always easily distinguishable from a regular failure, so I commonly respond to that mail with a failure message; if it did indeed fail because of a newer version, then buildd will notice that and ignore my mail. A third is the incoming.d.o Packages file (which is only available to buildd hosts, so don't ask) being out of sync with reality, which happens 4 times a day for about an hour; in this case build-deps will fail to install, requiring a retry or give-back. And there are similar things.

Other things are less common; but because of that, they are not routine and require an in-depth investigation. Sometimes the fix is to just file a bug report and/or to mark the package as 'failed' (and let the maintainer or a porter handle the problem); sometimes the failures are due to a maintainer script in a package being utterly broken, resulting in either some build-deps being uninstallable or (worse) the buildd chroot being fucked up. Sometimes a build is interrupted halfway through, leaving the chroot in an unclean state (sbuild is not pbuilder, and does not remove and recreate its chroot between builds). This would push us to category 3 of our work.

Basically, however, figuring out which is which takes some experience. Not all compilers are based on gcc (there are some really weird languages in Debian), and thus not all of their error output is the same; learning their different error modes can help quite a lot. Additionally, by continually compiling 10G worth of software, you'll be stress-testing your toolchain. If you've never seen an 'Internal Compiler Error' before, you will once you become a buildd maintainer, and it helps if you know what they are and how to deal with them (even if there isn't much one can do beyond filing bugs).

Obviously, handling failures takes some more time than does handling success mails, and it's not something I do quite as often. The exact time between both varies, but it's usually somewhere between a few days and one or two weeks—unless I suddenly stop receiving success mails from one of my buildd hosts, in which case I know something is utterly wrong and will usually investigate immediately.

State handling

By 'state handling', I mean managing the state of a package in the wanna-build database. There's help with this from the people on the debian-wb-team mailing list; call me old-fashioned, but I still do consider this to be the final responsibility of the buildd maintainers. After all, the routine state changes are a result of decisions that I make; as such, if I fuck up, it should be me who fixes the fuckup. Also, if I mark a package as 'failed' because I believe the maintainer fucked up, then the debian-wb-team people may not know about my reasoning there, and might give the package back to another failure (although I would consider the latter pretty rare).

Requests from package maintainers to change such state are pretty common. Quite often, they're unnecessary -- many maintainers are unaware of the intricacies of the wanna-build system, and may not realize that when a package is in dep-wait state, it will automatically migrate to needs-build once its dependencies are available. About as often, however, they are very much necessary, and, since regular Debian package maintainers do not have access to the wanna-build database, require someone who does have access to said database to update it for them.
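The automatic move from dep-wait to needs-build can be pictured with a small Python sketch. It only illustrates the rule, under loud assumptions: the two data structures and all package names are made up, wanna-build's real interface is not shown in this post, and version constraints (which real dep-wait entries may carry) are ignored.

```python
def release_dep_waits(dep_waits, available):
    """Return the waiting packages whose dependencies are all available, sorted by name.

    `dep_waits` maps a waiting source package to the set of package names
    it is waiting for; `available` is the set of package names that can
    currently be installed. Both structures are hypothetical.
    """
    return sorted(pkg for pkg, needed in dep_waits.items() if needed <= available)


# Hypothetical example data.
dep_waits = {
    "gnome-foo": {"libgtk2.0-dev"},
    "kde-bar": {"libqt4-dev", "kdelibs4-dev"},
}
available = {"libgtk2.0-dev", "libqt4-dev"}

print(release_dep_waits(dep_waits, available))  # ['gnome-foo']
```

Here kde-bar stays in dep-wait because kdelibs4-dev is still missing; nobody has to ask for it to be released once that package shows up.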

Having said that, there are cases where I will preemptively edit the wanna-build database. Usually this is to do something useful with packages in 'Building' state that have been in that state for far too long; either upload the package if its signature mail got lost (which happens once in a blue moon), or give the package back if its build was not attempted even though it is marked as such (this should not happen, but the system is not perfect and it does). Sometimes this is because I figured out that some common build-dependency (say, the GTK or Qt libraries) is in a transitional state and currently not installable; and rather than having a build daemon try a bunch of packages and fail them all, I may want to note in the wanna-build database that it should not bother attempting these 75 packages before the GTK package is done. This isn't done as often on the official Debian machines (since the release managers will do it for me there), but in m68k we do need to do this ourselves.

These kinds of requests happen once every few days up to once every few weeks, and take little time to deal with.

Host and chroot maintenance

This is the hardest and least fun part of buildd maintenance, but it is just as necessary. Luckily, it is not needed as often.

Because Debian Unstable is a system that's in a constant state of flux, often things will break. This is even more of a problem on a buildd chroot, since it builds out of incoming; a maintainer may upload a package with a fucked postinst script, have its build succeed, but then fail spectacularly to install. This maintainer may notice that, and may upload a new package half an hour later. As such, the broken package will not end up on the system of a user of Debian Unstable, but between the time of the upload of the broken package and that of the new package, the broken package will be available to buildd hosts, which may use it to completely and utterly destroy their build chroot. The joys of having a high turnaround time.

Luckily, Debian package maintainers are not stupid, and this kind of fuckup does not happen every other day. It does happen, however, and when it does, this often means manual work for the buildd maintainer. In the best case, it's a matter of syntactically fixing a postinst script and calling 'apt-get -f install' or 'dpkg --configure -a'. In the worst case (which is almost, but not quite entirely, totally unheard of), it's a matter of rebuilding the buildd chroot. In addition to that, a machine which runs 24/7 for the sole purpose of building packages tends to generate quite a lot of disk activity, which in turn tends to be detrimental to the disk in the long run. If not looked after properly, disks will die, taking the entire buildd chroot with them. That requires rebuilding it. Obviously, this last issue is dealt with by the Debian System Administration team in the case of official Debian hosts, but the same is not true for the m68k port.

A somewhat more common thing that needs to be taken care of is the fact that buildd does not in all cases clean up after itself. For instance, when a new version of a package is uploaded to the archive between the time that the buildd host built it and the time the buildd maintainer sent the signed .changes file back, then buildd will say "I haven't got that package taken as Building" and refuse to upload it. This makes sense (you can't upload an old version of a package, since there wouldn't be any source for it, and dak would refuse the upload), but it does mean that the packages aren't cleaned out. This is arguably a bug in buildd-mail; over time it will result in the disk filling up with outdated packages, and those require manual work from the admin. I recently (as in, a few hours ago now) finished a script to check each .changes file in the "build result" directory against the wanna-build database, and list those that are no longer necessary. I already had a script that, given a list of .changes files, would remove every .deb file listed in the given .changes files, and then proceed to remove the .changes files themselves. Combined, these do make that kind of work somewhat less of a burden.
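The second of those two scripts is easy to picture. What follows is a rough Python sketch of the same idea, not the script itself: it reads the Files: field of a .changes file, removes the .deb files listed there from the same directory, and then removes the .changes file. The function names and the sample data are made up for illustration, and the check against the wanna-build database is left out, since its interface is not described here.

```python
import os


def files_in_changes(text):
    """Return the file names listed in the Files: field of a .changes file."""
    names = []
    in_files = False
    for line in text.splitlines():
        if line.startswith("Files:"):
            in_files = True
        elif in_files and line.startswith(" ") and line.strip():
            # Each entry reads: md5sum size section priority filename
            names.append(line.split()[-1])
        else:
            # Any other line (a new field, a blank line) ends the Files: field.
            in_files = False
    return names


def remove_build_results(changes_path):
    """Remove the .deb files listed in a .changes file, then the file itself."""
    directory = os.path.dirname(changes_path)
    with open(changes_path, encoding="utf-8", errors="replace") as fh:
        names = files_in_changes(fh.read())
    for name in names:
        target = os.path.join(directory, os.path.basename(name))
        if name.endswith(".deb") and os.path.exists(target):
            os.remove(target)
    os.remove(changes_path)


# Made-up sample, trimmed to the fields that matter here.
SAMPLE = """\
Format: 1.7
Source: foo
Architecture: powerpc
Version: 1.0-1
Files:
 0123456789abcdef0123456789abcdef 12345 utils optional foo_1.0-1_powerpc.deb
 fedcba9876543210fedcba9876543210 6789 libs optional libfoo1_1.0-1_powerpc.deb
"""

print(files_in_changes(SAMPLE))
```

Taking only the basename of each listed file keeps the removal confined to the directory the .changes file lives in.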

As said, this kind of work does not need to be done all that often; for instance, I just cleaned the build result directory on voltaire and malo, my two powerpc buildd hosts, and found old files from late 2008...

And that's it, I guess. It may seem to be quite a lot, but in reality it isn't; the thing I've always liked about buildd maintenance is the fact that you do something little for Debian every day, but that it ends up being something big and helpful after a while.

Of course, the little things are the cherry on the cake. By looking at a lot of build logs, one eventually learns a thing or two about build systems, which is valuable knowledge. Getting build logs from the whole of Debian allows one to learn things about the archive that many people don't know about—for instance, did you know that we had a package called trousers? I didn't, until I signed the buildd log...

Update: changed the URL of this post to be under the buildd/ directory, rather than having it conflict with that and thus killing its permalink and making it impossible to comment on this post. Oops.

Mon, 19 Jan 2009

Re: emulated buildds

Aurelien, your claim is wrong. About a year ago (IIRC; might've been longer), the Debian/m68k team decided that it wanted to do emulated buildd hosts, since that would allow us to more easily keep up with unstable. We discussed it in the team, we discussed it with ftp-masters, and we all decided to go for it. We've had emulated m68k builds go to the archive for quite a long time, with full knowledge and agreement of ftp-masters. We realized that emulated builds could be problematic, but we evaluated the issues and decided to take that risk, as a team.

The reason your key was rejected for uploads of arm binaries was that you started doing those emulated builds without discussing it with the arm buildd maintainers, and without discussing it with the arm porters. You just decided it might help, so it must be good, right?

Finding the difference is left as an exercise to the reader.

Sat, 05 Jan 2008

Since you asked for it...

Chris, buildd hosts keep those time and space usage statistics in a database, which can be queried...

Script started on Sat Jan  5 19:54:15 2008
wouter@country:~$ ssh kiivi.cyber.ee
Linux kiivi 2.6.18-4-mac #1 Fri Mar 30 23:05:11 CEST 2007 m68k
No mail.
Last login: Sat Jan  5 20:41:16 2008 from d51532c45.access.telenet.be
wouter@kiivi:~$ avg-pkg-build-time -s -t | head -n 20
boost:			2015312k (2015312k lastest)
iceweasel:		1846628k (1846628k lastest)
iceape:			1674172k (1674416k lastest)
icedove:		1624848k (1624848k lastest)
xulrunner:		1611880k (1611880k lastest)
mesa:			1230432k (1230432k lastest)
koffice:		1208062k (2115156k lastest)
gcc-snapshot:		1191866k (1207164k lastest)
gcc-4.1:		1185620k (1185620k lastest)
gtk+2.0:		1148634k (1208684k lastest)
k3d:			1145580k (1145580k lastest)
ardour:			1077800k (1077800k lastest)
lyx:			924876k (924876k lastest)
gcc-4.2:		909488k (909488k lastest)
gcc-3.4:		892015k (571084k lastest)
glibc:			891408k (891408k lastest)
linux-2.6:		725961k (744700k lastest)
kdebase:		720618k (1106804k lastest)
kdelibs:		716376k (1150372k lastest)
kdepim:			711858k (784356k lastest)
wouter@kiivi:~$ avg-pkg-build-time -t | head -n 20
gcc-snapshot:		220:52:14 (2 entries, sigma 27:38:21)
axiom:			216:27:57 (1 entry, sigma 00:00:00)
gcc-4.1:		193:47:30 (1 entry, sigma 00:00:00)
k3d:			177:22:10 (1 entry, sigma 00:00:00)
gcc-3.4:		171:40:28 (4 entries, sigma 25:10:55)
boost:			141:13:38 (1 entry, sigma 00:00:00)
koffice:		140:40:23 (2 entries, sigma 53:48:58)
ace:			130:04:14 (1 entry, sigma 00:00:00)
linux-2.6:		129:36:19 (3 entries, sigma 10:12:24)
qt-x11-free:		105:00:32 (1 entry, sigma 00:00:00)
xulrunner:		97:03:38 (1 entry, sigma 00:00:00)
iceape:			96:23:13 (2 entries, sigma 03:12:58)
kdepim:			96:06:54 (2 entries, sigma 19:09:39)
icedove:		94:35:48 (1 entry, sigma 00:00:00)
iceweasel:		93:56:26 (1 entry, sigma 00:00:00)
gcc-4.2:		91:08:32 (1 entry, sigma 00:00:00)
glibc:			91:01:58 (1 entry, sigma 00:00:00)
kdevelop3:		84:27:21 (2 entries, sigma 02:47:35)
kdeedu:			82:09:55 (3 entries, sigma 10:59:18)
vtk:			81:44:20 (3 entries, sigma 00:46:32)
wouter@kiivi:~$ logout
Connection to kiivi.cyber.ee closed.
wouter@country:~$ ssh arrakis
Linux arrakis 2.6.12-1-amiga #1 Sat Aug 13 22:59:59 CEST 2005 m68k

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Please read "links2 http://nefud/info.txt"

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

No mail.
Last login: Sat Jan  5 18:48:58 2008 from hahn.os.localnet on pts/5
wouter@arrakis:~$ avg-pkg-build-time -s -t | head -n 20
wxwidgets2.6:		1648020k (1648020k lastest)
qt4-x11:		1601086k (1753532k lastest)
gcj-4.1:		1135772k (1135772k lastest)
k3d:			1094345k (1159204k lastest)
kdebase:		1087375k (1087368k lastest)
python-kde3:		1018144k (1018144k lastest)
ghc6:			991524k (991524k lastest)
gcc-3.3:		988690k (990244k lastest)
glibc:			814536k (855720k lastest)
lyx:			804672k (804672k lastest)
bouml:			729888k (729888k lastest)
kdelibs:		685565k (1142304k lastest)
freedroidrpg:		673988k (673988k lastest)
mozilla:		663066k (636896k lastest)
ardour:			647580k (691128k lastest)
mozilla-thunderbird:	591384k (591384k lastest)
omniorb4:		576288k (576288k lastest)
subversion:		568764k (624768k lastest)
xorg-server:		527520k (527520k lastest)
mozilla-firebird:	498896k (502692k lastest)
wouter@arrakis:~$ avg-pkg-build-time -t | head -n 20
enlightenment:		198842:37:14 (3 entries, sigma 344403:17:35)
axiom:			164:43:54 (1 entry, sigma 00:00:00)
ghc6:			159:20:15 (1 entry, sigma 00:00:00)
k3d:			145:18:35 (3 entries, sigma 57:08:46)
python-kde3:		126:39:34 (1 entry, sigma 00:00:00)
qt4-x11:		121:26:11 (2 entries, sigma 03:07:35)
gcj-4.1:		116:05:34 (1 entry, sigma 00:00:00)
gcc-3.3:		108:45:19 (3 entries, sigma 02:01:50)
qt-x11-free:		99:12:17 (4 entries, sigma 14:12:32)
wxwindows2.4:		90:26:51 (1 entry, sigma 00:00:00)
openscenegraph:		74:37:21 (1 entry, sigma 00:00:00)
ardour:			63:44:55 (2 entries, sigma 07:23:05)
bouml:			62:01:16 (1 entry, sigma 00:00:00)
gcc-snapshot:		61:10:07 (1 entry, sigma 00:00:00)
mozilla-thunderbird:	60:47:29 (1 entry, sigma 00:00:00)
mico:			60:15:15 (1 entry, sigma 00:00:00)
kdebase:		58:41:04 (5 entries, sigma 08:00:17)
glibc:			58:04:26 (6 entries, sigma 40:01:12)
mozilla-firebird:	52:47:13 (2 entries, sigma 11:31:07)
zeroc-ice:		52:34:03 (2 entries, sigma 28:12:19)
wouter@arrakis:~$ logout
Connection to fremen-os.dyndns.org closed.
wouter@country:~$ exit

Script done on Sat Jan  5 22:28:46 2008

"avg-pkg-build-time" is a script that queries the statistics database. It's quite CPU-intensive, so I don't like to run it too often (certainly not from cron); but, hey, here you go.
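For the curious, the arithmetic behind lines like the ones above can be sketched in a few lines of Python. This is not avg-pkg-build-time itself: the function names and the list-of-seconds input are made up, and how the real script reads its database is not shown here. The choice of the sample standard deviation (dividing by n - 1) is an inference from the output, since the enlightenment line on arrakis has a sigma of almost exactly sqrt(3) times the mean over 3 entries, which is what that formula gives when one of three entries is wildly off.

```python
import statistics


def fmt_hms(seconds):
    """Format a duration in seconds as HH:MM:SS (hours may exceed 24)."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def summarize(name, durations):
    """Build one report line from a list of build durations in seconds."""
    mean = statistics.mean(durations)
    # The sample standard deviation needs at least two values; report zero
    # for a single entry, as in the "1 entry, sigma 00:00:00" lines above.
    sigma = statistics.stdev(durations) if len(durations) > 1 else 0.0
    entries = "entry" if len(durations) == 1 else "entries"
    return f"{name}: {fmt_hms(mean)} ({len(durations)} {entries}, sigma {fmt_hms(sigma)})"


print(summarize("axiom", [593034]))  # axiom: 164:43:54 (1 entry, sigma 00:00:00)
```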

In case you didn't know, "arrakis" and "kiivi" are both official m68k/unstable buildd hosts, and especially arrakis has been for quite a long time, although it's been reinstalled once or twice since. Oh, and yes, it does appear as though some bits of the data are invalid, but most of the idea should be alright.

(oh, and before you ask, no, it didn't take two and a half hours to run those commands -- I just needed to go while the last one was running, and only came back several hours later)

Mon, 30 Jan 2006

Ten ways to make a buildd admin's life miserable.

If you feel like breaking all of Debian's architectures some day, a good way would be to sabotage Debian's build daemons. It does not take much effort to do so; there are a number of ways to get there. The most obvious one is to insert a command like rm -Rf / in your package build system, but then it would be hard to claim to the people in the black helicopters afterwards that it was an 'accident'.

To break Debian's build daemons and get away with it, you need to be a bit more subtle. Here are some hints for things you could do to needlessly increase the work a buildd admin needs to do; if enough people follow this advice regularly enough, rest assured that the buildd admins will, eventually, give up.

  1. Make your build loop, but still produce output all the time. Your average run-of-the-mill buildd admin will wake up, read his email, and find that the build log mailbox is rather empty. He'll wonder why, log into the machine, curse you for wasting his buildd's precious CPU time, and then curse you again for wasting his bandwidth when the multi-megabyte log file is being sent out to his mailbox and to buildd.debian.org (or, as in this case, experimental.ftbfs.de).
  2. Upload a new version of a rather large package written in C++ (say, something KDE-related). Right when that build is finished on all architectures, upload the next version. Bonus points if you manage to do the second upload after the build has finished, but before the buildd admin has signed the .changes file. Extra bonus points if you manage to do all of that during a backlog.
  3. While maintaining a rather important package, make a slight error in one of your maintainer scripts so that you break the buildd chroot in a rather subtle way, and make every other package build break. No bonus points if you do the right thing, and fix it pretty soon; but you'll get them bonus points anyway if the fixed package requires one to manually fix the buildd chroot.
  4. Break your postinst script so that it does not ever exit. Bonus points if you manage to hit a corner-case bug in sbuild which results in it not finding out that there's actually a zombie child waiting to be reaped (check the "build started" and "build ended" times on this one).
  5. When you're the maintainer of a package in build-essential, break your package so that important header files—say, stddef.h—are gone.
  6. Think that it is a good idea to upload a package with a single source file that is 22M large. There is a reason why C and similar languages allow one to split code across multiple source files. Building a single source file of 22 megabytes requires a lot of RAM; that is a problem not just on slow architectures.
  7. Upload a package that hides the compiler command line while compiling. There's no better way to waste someone's time than having him/her figure out how the compiler is being called when they hit something which may be a compiler bug and have to reproduce it.
  8. Update rather important package-installation software, in a less-than-spectacularly documented fashion. Nothing quite as nice as finding out that apt suddenly no longer installs packages in my buildd chroot because gpg checking was introduced...
  9. Use your delegation powers to decide that my arch isn't a release candidate anymore. Okay, so that was our fault. Whatever.
  10. And, last but not least: take all of the above too seriously. No, really. It's only a joke.

Mon, 24 Oct 2005

Buildd docs

Steve Kemp wonders where the documentation for buildd is. Or how to set it up. Well, good news: it's on the Debian.org website. It has been for about a year, actually. It contains a cheat sheet for those wanting to set up their own buildd, some background information on the whole system, and an explanation about the different states a package can have in the wanna-build system, which was based on my (now defunct) own page. Plus some more on the overview page.

I'd say, go check it out; and if you've got any questions, feel free to ask...

Fri, 21 Oct 2005

More buildd work

I've gone from managing two buildd machines a few months ago, to five now, six or seven soon. Whew.

New machines are bob and wendy (armeb), and ska/kiivi (ska was set up around the time kiivi broke down; kiivi's been fixed now, so that means one extra box too). Future ones may be jazz (m68k; Quadra950) and a third armeb box. Getting pretty crowded -- but nothing I can't handle. Yet. Though it is the highest number of buildd machines I've ever managed.

Since Christian Steigies is back from holiday, he had a look at garkin, his atari-with-CT60-board. It had lost its serial connection while he was away, so it didn't build anymore. Apart from the lost connection, though, the box was still alive and kicking -- it just needed to download a few megabytes of updates, which takes rather long through serial. In case you wonder: there's no ethernet connection, because the ethernet boards haven't been manufactured yet (they've been designed and ordered at the factory which will create them, but that hasn't happened yet).

So with kiivi and garkin now both building again, the graph has been going in the right direction again for a day or two. It's even more visible on the second graph; if you have a look at the new quarter graph, you'll see that the level of up-to-dateness is currently at its highest level in weeks; I count 28 days between now and the previous peak that is at approximately the same level as today. Which is good, of course.

I'm quite happy about that quarter graph, BTW. The usefulness of the other two graphs (one with all information since the statistics started being gathered, one with the information of the last two weeks only) was seriously deteriorating. What this graph shows is way more useful... now if only there'd be a quarter version for the other graph as well...

Sun, 10 Jul 2005

Recent buildd frustrations

You know what happens when someone installs apt with authentication support without checking whether it's possible to disable that? Ugly things. Someone told me on IRC that it's supposed to be possible to disable that authentication by specifying 'APT::Get::AllowUnauthenticated "true";' in a file in /etc/apt/apt.conf.d, which is exactly what I did. No luck, however.

Another problem is the recent upgrade to GCC4, which seems to be fairly buggy on m68k. GCC3, which has existed for quite a long time, is now fairly bug-free (the last Internal Compiler Error I've seen from GCC3 must've been months ago), but since I installed GCC4 on kiivi, I've seen far, far higher numbers of those bastards. According to the interface behind crest.debian.org, we're on 13 of these bastards (and counting).

Grumble.

Guess we'll have some work to do.