WEBlog -- Wouter's Eclectic Blog

Fri, 05 Mar 2010

Netgear WNDR3700 and OpenWRT

I wanted a machine on which I could easily run OpenWRT. So I went to the #openwrt channel on freenode a while back, and just asked for suggestions; people suggested to me that the Netgear WNDR3700 was a good choice, so I ordered that.

I assumed that it would be easy enough to install OpenWRT on this device, but hadn't actually looked into it, planning to wait with that until the device had arrived. Little did I know that the machine actually comes with OpenWRT preinstalled. Now there's an interesting twist.

Now you do need to run some "telnetenable" thingy to be able to get a shell, after which "telnet <device>" gets you a root shell (with no username or password by default). Supposedly you should update that by using "passwd", but they managed to break that in the firmware that comes with the device.

I am missing a few things, though.

root@WNDR3700:/bin# dmesg
/bin/ash: dmesg: not found
root@WNDR3700:~# uname -a
/bin/ash: uname: not found
root@WNDR3700:~# hexdump /bin/config |more
/bin/ash: less: not found

Unh?

root@WNDR3700:~# alias
more='less'
vim='vi'
root@WNDR3700:~# 

Aahh.

And for those who were wondering: no, it does not have any 'vi' installed, either.

Oh well.

The fun thing is, this device has a USB connector, too; so it should be possible to connect a USB storage device, install Debian, and use it as a very potent home server/router/switch/whatever. That'd require me to understand how hostap works, though, which I haven't played with yet. I'm sure I'll figure that bit out -- at some point.

Wed, 03 Mar 2010

dpkg vs RPM

Thomas blogs about some issues he had with his N900's facebook plugin. This post isn't about that, as I don't use facebook.

But as part of his blog post, he mentions the following:

This reminded me of a pet peeve I have with those people who claim Debian’s packaging system to be far superior to rpm – apparently dpkg doesn’t have any equivalent of rpm -qv which allows you to verify that the files that should be installed by a package are indeed on disk

True, probably because the script would be so trivial:

for i in $(cat /var/lib/dpkg/info/nbd-client.list)
do
	[ -f "$i" -o -d "$i" ] || echo "$i missing"
done

There, that wasn't hard, was it?
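For what it's worth, here is a slightly more careful sketch of the same idea. It reads the list line by line, so paths containing spaces survive, and it accepts dangling symlinks as present. The helper name check_list is made up for this example; the only assumption about the input is one absolute path per line, which is what the .list files under /var/lib/dpkg/info contain.

```shell
# Report every path named in a dpkg-style .list file that is not on disk.
# check_list is a hypothetical helper name, not a dpkg tool.
check_list() {
    while IFS= read -r path; do
        # -e follows symlinks, so also test -L to accept dangling ones
        [ -e "$path" ] || [ -L "$path" ] || echo "$path missing"
    done < "$1"
}
```

It would be called as 'check_list /var/lib/dpkg/info/nbd-client.list'.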

Now I'm not sure whether rpm's -qv option actually checks the checksum of the files, too. If it does that, then the semantically similar way would be:

(cd / && md5sum -c var/lib/dpkg/info/nbd-client.md5sums)

... except that MD5 is totally and utterly useless these days, and that we should be changing to something else. And that md5sums is an optional feature, provided by some, but not all, packages. And it may also be that maemo packages don't have md5sums (which would make sense). But, anyway.

Wed, 17 Feb 2010

Booked flights

Whee.

As usual, I'll be there the whole time, both for debcamp and debconf. In addition, since I'll be halfway there anyway, I'll be paying my niece a visit after the conference; she lives in Portland, OR.

Should make for a good holiday, I would say.

Local kernels

LC_ALL=C debian/rules debian/control
md5sum --check debian/control.md5sum --status || \
		/usr/bin/make -f debian/rules debian/control-real
make[1]: Entering directory `/home/wouter/debian/other-peoples-source/linux-2.6-2.6.32'
chmod +x debian/bin/gencontrol.py
debian/bin/gencontrol.py
Traceback (most recent call last):
  File "debian/bin/gencontrol.py", line 331, in 
    Gencontrol()()
  File "debian/bin/gencontrol.py", line 14, in __init__
    self.process_changelog()
  File "debian/bin/gencontrol.py", line 305, in process_changelog
    (distribution, version))
RuntimeError: Can't upload to unstable with a version of 2.6.32-8~local1
make[1]: *** [debian/control-real] Error 1
make[1]: Leaving directory `/home/wouter/debian/other-peoples-source/linux-2.6-2.6.32'
make: *** [debian/control] Error 2

The issue at hand was that I'd created a version of '2.6.32-8~local1', to document that I'd locally branched version 2.6.32-7 with that one config option turned on, but that my version should not be deemed larger than 2.6.32-8 (signalled by the ~ and whatever follows that). However, something somewhere in the Debian kernel build system (I was unable to figure out what, exactly) disliked that version number and told me I could not upload to unstable with that version. So it borked, and told me I had to fix my version.

Well, no, I don't want to do that. Instead, I wanted to disable that check somehow. Turns out that isn't too hard; gencontrol.py not only checks the version number, it also checks the target distribution. So if you don't want to upload to unstable, you just need to tell the tool that:

-linux-2.6 (2.6.32-8~local1) unstable; urgency=low
+linux-2.6 (2.6.32-8~local1) local; urgency=low

Simple, but you'd need to know.
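As an aside, the way ~ sorts can be checked without building anything. The authoritative tool is 'dpkg --compare-versions'; the sketch below instead leans on GNU sort's -V option, on the assumption that its version ordering places ~ before the end of the string in the same way. The helper name sort_versions is made up.

```shell
# Sort version strings, lowest first.
# sort_versions is a hypothetical helper; it relies on GNU sort -V,
# not on dpkg's own comparison code.
sort_versions() {
    printf '%s\n' "$@" | sort -V
}

sort_versions "2.6.32-8" "2.6.32-8~local1" "2.6.32-7"
```

With GNU coreutils this should list 2.6.32-7 first, then 2.6.32-8~local1, then 2.6.32-8, which is exactly the "newer than -7, older than -8" ordering the local version was meant to express.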

Mon, 15 Feb 2010

Hum.

I wonder how these people will react to this.

At least it's going to be somewhat more open, I guess. Which is good.

Fri, 12 Feb 2010

FOSDEM 2010...

... is over (for almost a week now), and it was a blast again.

If you went to one of the distro devrooms, I would appreciate it if you were to reply to this mail. We need feedback to improve stuff for next year.

Thanks.

In related news, I've uploaded the slides of my (unexpectedly horrendously successful) talk here

Sat, 30 Jan 2010

On MySQL and Oracle

I think Monty has well and truly lost it.

The European Commission, after careful consideration, has cleared Oracle's purchase of Sun:

The Commission's investigation showed that another open source database, PostgreSQL, is considered by many database users to be a credible alternative to MySQL and could be expected to replace to some extent the competitive force currently exerted by MySQL on the database market.

I'd go one step further, and would say that MySQL is not a credible alternative to PostgreSQL. But whatever. Hopefully, if MySQL fails, then PostgreSQL will (finally) get the attention that it deserves. I'll have a real database every time over this piece of... anyway.[1]

This is a fair argument, and to be sure it is certainly not a problem for anyone to migrate from MySQL to a MySQL fork, or (with some work) from MySQL to PostgreSQL. But Monty seems to disagree, and now tries to get Russia and China to block the merger.

What's next, Andorra?

[1] comments on this blog item in defense of MySQL will be vigorously moderated away. MySQL is a POS that falls over if data is corrupt, that corrupts its own data (most distributions call 'mysql_recover' in their initscript for a reason), and whose C API does not properly support cursors unless you want to block concurrent access until the cursor is closed (paragraph 3). Every time a customer asks me about MySQL, I vigorously recommend against it, because it's a bad idea.

Tue, 26 Jan 2010

Going, obviously

If you thought otherwise, you're crazy, but just for reference:

I'm going to FOSDEM, the
Free and Open Source Software Developers' European Meeting

This year I volunteered to organize the "distributions" devroom/track, because it seemed to be going nowhere, and the people who were supposed to do so were too busy with other stuff. I'm still not very fond of the idea of mixing all distributions in one room, but at least we managed to avoid complete and utter disaster wherein nearly no talks would have been submitted.

Let's see how it goes, now.

Fri, 22 Jan 2010

Clijsters out?

Whoa.

Kim Clijsters, Belgium's number 1 female tennis player of the moment, just got booted out of the Australian Open by Nadia Petrova. Not what I'd expected—especially not with this kind of score; 6-0 6-1. To call this "unexpected" would be a severe understatement.

Seriously.

That leaves Belgium's hope with Justine Henin or Yanina Wickmayer. I say "or", because they'll next meet each other in Melbourne. Sounds like an interesting match, indeed.

Sat, 16 Jan 2010

ACCEPTED

My mutt said this last night:

 894   + Jan 15 Archive Adminis (0,4K) ipcfg_0.1_amd64.changes ACCEPTED

This obviously means that if you wish to use it, you no longer need to go through git; you can just add experimental to sources.list and run 'apt-get install ipcfg'. A few notes, though:

And in case you wonder why the hell I went from 0.1 to 0.3:

ipcfg (0.2) experimental; urgency=low

  * Rebuild without .git directory. D'oh.

 -- Wouter Verhelst   Tue, 12 Jan 2010 17:43:09 +0100

srsly

Wed, 13 Jan 2010

Using ipcfg, now

Yesterday, I had some time to debug ipcfg some more. The main blocker for me to upload it to unstable was the fact that I could not get WPA security to work; therefore, I could not use it myself. In the interest of "eat your own dogfood", I did not feel that uploading some experimental code to Debian that I'm not using myself would be a good idea.

That problem is now fixed, albeit through something of a hack, one that I hope will not be necessary forever: I decided to write a plugin to run ifupdown extension scripts (found in /etc/network/if-*.d). It does require some set-up, and there are still some severe issues; but as of now, I am using ipcfg rather than ifupdown on my laptop.

Those interested in trying it out can either wait for ftp-master to ack the upload and then install from unstable, or just fetch it from the git repository.

Wed, 30 Dec 2009

On cultural differences.

Christian writes about differences on the international level; how even something as simple as people's names isn't always easy, and that one has to be quite careful to avoid calling people 'Dear Verhelst', or some such, because you assumed that their first name was indeed their given name, while in their culture the last name is the given name.

Funnily, Christian advocates using capital letters to clarify which part of one's name should be considered the family name.

Sorry to disappoint you, Christian. Writing one's last name with all capitals is a French typographical convention. While it is understood in other places, it is not often used outside of France (except in organisations with a strong link to France, obviously); and it is, in fact, shouting.

It is also not actually helping any. I should, in fact, somewhere in my mailbox archive have an email reply from a South Korean guy that starts off like so:

Dear TIA,

to which I then had to explain that no, TIA is not my first name, it is an abbreviation for "Thanks In Advance".

Note the use of capitals.

Tue, 29 Dec 2009

Perl on the N900

I've had a Nokia N900 for about a week now. They're not actually on sale in Belgium yet, but I managed to get hold of one through a grey market shop.

The device is quite nice. I do have a few gripes with the interface here and there, but Nokia has a public bugzilla that I'll surely use to file bugs whenever and wherever appropriate. And after using it for about a week, I thought of migrating this small application which I once wrote in half an afternoon using perl, postgres, and Gtk2, to the device. It's mainly a cataloguing system that I store my DVD collection in (which has gotten large enough that this is necessary), nothing earthshattering. The ideal candidate to get my feet wet in developing for the N900, so to speak.

That effort got stopped dead in its tracks pretty quickly.

Nokia-N900-42-11:~# perl
use Gtk2;
Can't locate Gtk2.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.8.3 /usr/local/share/perl/5.8.3 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at - line 1.
BEGIN failed--compilation aborted at - line 1.
Nokia-N900-42-11:~# apt-cache search perl
perl - Larry Wall's Practical Extraction and Report Language.
libpcrecpp0 - Perl 5 Compatible Regular Expression Library - C++ runtime files
po4a - tools for helping translation of documentation
libpcre3 - Perl 5 Compatible Regular Expression Library - runtime files
libgdbm3 - GNU dbm database routines (runtime version)
perl-base - The Pathologically Eclectic Rubbish Lister.
liblocale-gettext-perl - Using libc functions for internationalization in Perl

Or, in other words: they only ship the perl bits needed to make dpkg run.

Hrmpf. This would make stuff slightly more... involved.

Oh well.

Sun, 20 Dec 2009

Re: Grrr

I received a number of comments on my "Grrr" post, all of which missed the point:

Yes, I am aware that there are many more ways to fix this issue beyond a memcpy. However, the example code is legal and would not crash the application, if not for the fact that libc thinks I am doing something wrong. On top of that, this kind of overflow "protection" only kicks in when the code is compiled with -O2 rather than with -g -O0. While I am not sure whether the difference is due to the absence of debugger symbols, or rather due to the different optimization levels, fact is that software which runs fine in debugging should also run fine in production.

There are good arguments for compiling all C code with -Wall -Werror, and I do that as a matter of course for all C software that I write. However, sometimes automated tools are just wrong in their compile- or run-time bug detection, in which case it should be possible to disable that detection. This is one such case, and my blog post was more about ranting about the inability to do so, rather than about the fact that I had to memcpy when in fact there were other options available.

But yeah, perhaps I should have been clearer about that. Forgive me for not being clear after having fought with compilers for far too long.

Good, not evil

There is a bit of a fuss online currently about the following clause in the "jsmin" code (whatever that is):

The software shall be used for good, not evil

This seems to have started when Google rejected a project based on that code due to its license being not free or open source according to their standards, and therefore not welcome anymore.

The arguments then quickly degenerated into things like 'when did Google stop being against evil'. But those are all beside the point.

One of the most important properties of free and open source software is that anyone can use it for any purpose; there are no restrictions on using it. The DFSG (and hence the OSD, which was derived from the DFSG) encode this as follows:

No Discrimination Against Fields of Endeavor
The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

You may think "how is 'evil' a field of endeavor", but that is the wrong question. To "use software for evil" can mean any of a number of things, including "nuclear research", "weapons development", "abortion", or, heck "the cash register in a butchery shop", depending on the ethical and moral framework through which the person writing the license sees the world.

The ability to give someone a CD or DVD with a bunch of software on there, being able to tell them that they can just use this in any which way they see fit, is a very strong and important feature of the free and open source community. Every time someone comes up with a clause like the above, however, this ability is diluted somewhat; and if it is readily accepted within the greater free and open source movement, then eventually everyone interested in using a piece of software must first check whether they're not trying to use software that forbids someone's pet evil, and we lose one of the greatest strengths that does exist for free software, but not for proprietary software.

The sad thing is, the jsmin author seems to agree. From a video/transcript in which he talks about his absurd license clause comes the following quote:

Also about once a year, I get a letter from a lawyer, every year a different lawyer, at a company – I don't want to embarrass the company by saying their name, so I'll just say their initials – IBM…
[laughter]
…saying that they want to use something I wrote. Because I put this on everything I write, now. They want to use something that I wrote in something that they wrote, and they were pretty sure they weren't going to use it for evil, but they couldn't say for sure about their customers. So could I give them a special license for that? Of course. So I wrote back – this happened literally two weeks ago – "I give permission for IBM, its customers, partners, and minions, to use JSLint for evil."

Or, in other words, all you have to do if you want to use this software for evil is set up a second company, tell Douglas that this second company wants to sell software that uses his software to people who might use it for "evil", even though the first company won't, and you're in business. Because Douglas doesn't really oppose evildoers.

So the question is, why is that clause in there in the first place? There are only two possibilities; either Douglas didn't really think about those issues, in which case I hope he will one day see the light and remove the clause; or he did, and decided to go ahead and put that clause there anyway. And that would be evil.

QNAP NAS device

A customer called me a while back.

They have a pretty high-end NAS (2U server with external disk enclosure) that they use for most of their needs: both document and other storage for management types, as well as storage for their EDA cluster. People familiar with EDA tools will know that they are pretty disk intensive. They tend to require shitloads of diskspace, preferably at high performance.

The problem was that extending the storage space of this high-end NAS device is rather expensive. Supported disks only come at 300G units with prices that are higher than the cheap TB disks available today for the desktop market; and while the price and performance of those disks is worth it for the EDA requirements, the number of people storing documents and similar on the same device is not so high that performance would be an issue.

Thus, what they wanted was a cheap solution to augment, rather than replace, their current storage solution, with the focus being on low cost at the possible expense of performance and, since bacula runs well there, reliability.

They were quite surprised when I offered them a solution based around a QNAP device; they had expected a price, without hard disks, of about three to four times the price of the QNAP, and therefore were very happy with this suggestion. Unfortunately, the devices produced by this manufacturer that are supported by Lenny, the current Debian stable release, were EOL'd just before we placed the order; however, their replacements will be supported by Squeeze, the next release, and with a bit of help from Martin Michlmayr, we were able to install Lenny with a somewhat more recent kernel. This drove the price up somewhat, both because the newer devices are a few tens of euros more expensive, and because we agreed to pay Martin to prioritize work on the specific device that we'd need, so that we wouldn't have to wait for him to get around to it.

As of this Thursday, the device, which had finally arrived after some initial delays, has been installed. Unfortunately, one of the disks that they'd bought was DOA; but they had the foresight to buy five, rather than four, disks, so that was not a major problem. In fact, the only real issues that I ran into were this one arm-specific bug, and the fact that the high-end NAS device is still running etch, which means that rsync --acl won't work. That means I'll have to go back there soon to do an upgrade of the high-end device (which had been planned for a while already); for the arm-specific bug, a workaround is already in place.

All in all, a pretty good experience.

Wed, 16 Dec 2009

Grrr

struct {
	char str[4];
	char sc1;
	char str2[3];
	char sc2;
} __attribute__((packed)) foo;

snprintf(foo.str, 5, "%04X", data);
foo.sc1 = ';';
snprintf(foo.str2, 4, "%03X", otherdata);
foo.sc2 = ';';

Yes, I know that both snprintf() calls in the above snippet will overflow their immediate buffer. Yet this code is safe; the network protocol for which this code is written does not actually need nor expect NUL bytes; instead, it wants semicolons. I could of course use a "char foo[9]" rather than a struct as above, but I find this to be slightly more convenient than to count offsets.
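To make the intended wire format concrete, here is the same nine-byte record produced with the shell's printf. This only illustrates the layout (four hex digits, a semicolon, three hex digits, a semicolon, and no NUL anywhere); it is not the C code in question. The helper name format_record and the sample values are made up.

```shell
# Build the record "<4 hex digits>;<3 hex digits>;" with no terminator.
# format_record is a hypothetical helper, not part of the original C code.
format_record() {
    printf '%04X;%03X;' "$1" "$2"
}

format_record 255 10    # prints 00FF;00A;
```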

However, this code does not work with glibc, because the buffer overflow detection kicks in.

Solution:

char buf[5];

snprintf(buf, 5, "%04X", data);
memcpy(foo.str, buf, 4);

In other words: add a stupid and useless memcpy, because someone thinks they're smarter than me. Stupid morons.

Tue, 08 Dec 2009

Hardware RNG

So, as many people probably know by now, the nice folks over at Simtec Electronics, some of whom were at DebConf9 last summer, have created a nice small device that plugs in a USB port and, with a userland daemon and some encryption for security, generates entropy (randomness) for the Linux kernel to use (through /dev/random and friends).

I've plugged one in my server today, and suddenly my server's entropy pool was full. This is a really nice thing. For a simple example of what happens when you insert such a thing into a server, check out my munin graphs.

Very nice for such a cheap device...

Fri, 04 Dec 2009

Dear lazyweb,

Following my disagreement with sony, I managed to convince my brother that swapping my PS3 for his is a good idea, so now I have a PS3 running Debian, and the fun can start.

Of course the reason why I wanted this machine in the first place was to hack for it, so that's what I'm trying to do now. If only a split-brain processor was something that automake would understand...

How do I make automake understand that it needs to do this?

spu-gcc -o foo-spu foo-spu.c
embedspu foo_spu foo-spu foo-spu.o
gcc -c -o foo-ppu.o foo-ppu.c
gcc -o foo foo-ppu.o foo-spu.o

I tried a Makefile.am like this:

bin_PROGRAMS = foo
foo_SOURCES = foo-ppu.c foo-spu.c
%-spu.obj: %-spu.c
	$(SPUGCC) $(SPUCCFLAGS) -o $@ $^
%-spu.o: %-spu.obj
	embedspu $(subst .,_,$<) $^ $@

... but that doesn't work, since automake inserts a more specific foo-spu.o target that uses $(CC). It only works if I then manually run 'make foo-spu.obj', before running 'make foo'. Obviously we don't want that.

Any help would be greatly appreciated.

Update: understood why some bits didn't work, and clarified here what the real problem was.

Wed, 18 Nov 2009

Debian @ FOSDEM '10

As I do every year, this year too I asked for a devroom and a booth at the yearly FOSDEM meeting in Brussels, Belgium.

We've been granted a booth. We've not been granted a devroom.

This is not because the organizers hate Debian, but because the organizers wish to organize things slightly differently this year. As a matter of fact, they've not granted a devroom to any distribution project.

Does that mean we can't hold talks at FOSDEM? Certainly not.

Instead of a bunch of distribution devrooms, there will be a 'distribution miniconf' that the Debian distribution has been invited to. What wasn't clear from the initial mail (at least not to me), however, was that talk proposals can already be sent in.

If you want to hold a talk about a Debian-specific subject, you should subscribe to the relevant FOSDEM mailing list, and send your proposal there. However, do note that since it will not be a Debian-specific event anymore, the talk may be about something related to Debian, but it should target people who may be involved with other distributions. The goal is to learn from each other.

With that out of the way, I guess the booth will gain in importance this year, since there will not be any other Debian-specific bits anymore. As such, if people would like to come up with suggestions on what to do with it, that would be greatly appreciated. These should probably go to debian-events-eu@lists.d.o.

See you at FOSDEM,

Dear lazyweb,

Which part of acpid is it that sends XF86Display key events to my X server? It's broken: when I suspend and then resume, it keeps sending those events over and over in an infinite loop; and since the script that I've hooked to that event takes an order of magnitude longer to run than the interval between said events, I'll just say that I find this highly annoying. I'm quite sure it's actually acpid that's sending them out, since the fix for this problem seems to be to call sudo /etc/init.d/acpid restart...

Incidentally, I don't think the ACPI daemon should be sending out these events at all (it seems to do so whenever I open or close my laptop's lid), but that's a different story...

Sun, 15 Nov 2009

NaN currency

The Daily WTF, an excellent website that I recommend everyone remotely involved in computers has a look at every now and then, has one series of posts where it shows 'strange' or 'funny' error messages from not just desktop computers, but also ATMs and other embedded devices.

One of the more recent entries of these Error'd series contained a picture that made me cringe; it showed 'NaN' for a currency value. Obviously this is an unhandled floating point error, but that's not what made me cringe.

What did make me cringe is that people who are supposed to work with shitloads of money are using floating point numbers to store monetary values. This is stupid. By their nature, floating point numbers are imprecise and inaccurate. The representation of a floating point number is an approximation. This is not the kind of thing one would like for bank transfers. "Mr Foo would like to transfer approximately €100 from his account to that of his neighbour." Err, no; it doesn't work that way.

Instead, whenever you use currency, you should use integer numbers. If your local currency is the US Dollar or the Euro or another similar currency where zero-point-something amounts of money are regularly used, work on the convention that "1" in your variable actually means ".01", or ".1", or whatever is most convenient, and make the right conversion whenever you need to do some output. Do not think you can deal with rounding errors in floating point, because you cannot.

COBOL (I can't believe I'm saying this) actually makes the latter easy, because you can create a statement that says "to output this variable, show the first three digits, then a point, then the last two digits". I believe it doesn't even have support for floating point numbers. Unfortunately this kind of thing is harder in C (and similar languages), but that doesn't mean you should just go ahead and assume that floating points are a good idea...
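Even without COBOL's picture clauses, the output conversion is small. The following is a minimal sketch in shell, assuming amounts are kept as a whole number of cents and that two decimal places are wanted; the helper name format_cents is made up for this example.

```shell
# Format an integer amount of cents as a decimal string.
# format_cents is a hypothetical helper; it assumes two decimal places.
format_cents() {
    cents=$1
    sign=
    if [ "$cents" -lt 0 ]; then
        sign=-
        cents=$(( 0 - cents ))
    fi
    printf '%s%d.%02d\n' "$sign" $(( cents / 100 )) $(( cents % 100 ))
}

balance=10000                    # 100.00, stored as cents
balance=$(( balance - 1999 ))    # subtract 19.99; integer math, no rounding
format_cents "$balance"          # prints 80.01
```

All arithmetic happens on integers; the decimal point only appears at the moment of output.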

This is the kind of error I could accept from a first-year graduate student, but not from someone who's supposed to work with money on a daily basis.

Oh well. At least the ATM is about US Dollars, so it's not likely that it's going to be my problem any time soon.

Sun, 11 Oct 2009

Fuck you, Sony.

I had wanted to buy a PS3 ever since I learned about this interesting processor that is called the Cell. Not that I'm very much into console gaming or any such thing; I'd have settled for any kind of affordable Cell hardware, really, but that basically is 'the PS3' these days.

What I had missed, however, was the fact that recent PS3 machines don't support running Linux anymore. Apparently this was all over the interwebs, but unfortunately I didn't see that.

That's €300 I won't see again, for a useless (to me) piece of hardware.

Stupid morons.

Thu, 24 Sep 2009

RedHat is so medieval

So there's three systems. One of them runs CentOS 4, the other two run RedHat EL 4. One of the RedHats has been in use for quite a while, but the other two are freshly installed, and would need to have the same functionality as the first, but don't yet, mainly because they miss a number of packages.

So we do this.

[wouter@working-host ~]$ rpm -qa --qf "%{NAME}\n" | sort -u > working-host
[wouter@CentOS ~]$ rpm -qa --qf "%{NAME}\n" | sort -u > CentOS

Next, we compare the two:

diff -u working-host CentOS | grep '^-' | tail -n +2 | sed -e 's/^-//' > CentOS-missing
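The same comparison can probably be written more directly with comm, which prints the lines unique to one of two sorted files. A sketch, assuming both lists were produced with the same sort as above; the helper name only_in_first is made up.

```shell
# Print the lines that occur in the first file but not in the second.
# Both files must already be sorted the same way.
# only_in_first is a hypothetical helper name.
only_in_first() {
    # -2 drops lines unique to the second file, -3 drops common lines
    comm -23 "$1" "$2"
}

# Example: only_in_first working-host CentOS > CentOS-missing
```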

At this point, the file CentOS-missing contains some 403 lines. So how does one install packages on CentOS4?

[root@CentOS ~]# yum install $(cat CentOS-missing)

It starts thinking about it, then after several minutes comes back to me with "foo is already installed in an older version. I can't do what you're asking me"

[root@CentOS ~]# grep foo CentOS-missing
[root@CentOS ~]#

In other words, it decided that, to satisfy one of the dependencies of one of the packages I'm asking it to install, it needs to have some other package installed, too, so adds it to the set. Then figures out that it's already installed, at an older version than what it was trying to install. Then bombs.

Okay Wouter, what were you thinking? Surely you can't expect a package manager to suddenly update packages when you ask it to install something else? That would mean upgrading and installing in the same command execution! Oh, the horror.

So, okay, we run yum update first.

Do you think that fixes it? Of course not!

It now comes back with... a message saying that a newer version of what we're trying to install is already installed.

Sigh.

So I'm now running 'for i in $(cat CentOS-missing); do yum -y install $i; done' on this host. Since it takes yum several minutes to resolve dependencies, that means it's going to take a while to have it install 403 packages.

The not-yet-working RedHat 4 installation seems to have the exact same problem, except that up2date takes even longer to resolve dependencies, so we're not beyond the 'upgrade' step there yet. I'm sure it will come up with all kinds of other interesting failure modes. Luckily the RedHat-missing file is much smaller (83 packages).

How do RedHat users not eat their keyboard in frustration?

Note: yes, I'm aware of apt-rpm. Unfortunately, that is not supported on 64bit installations, because apt (in either its dpkg or its rpm variant) does not understand the notion of multiarch, at least not at this time.

Fri, 18 Sep 2009

Announcement

I was really happy a few years back, when Martin announced he'd start working on netconf. Not just because I agreed with his assessment that ifupdown needed replacement; also because I had been thinking about a good way to implement such a thing, and had been planning to start writing 'soon'. With Martin's announcement, I made a few suggestions to give him some input, and then put my plans away—I had more important things to take care of, anyway.

Unfortunately, netconf did not manage to reach the ambitious goals that Martin set out, mainly due to lack of contributors. I wanted to help out; I really did. But Martin's choice of python to implement a 'prototype' which then had to be reimplemented in C or C++ didn't get me very thrilled. Not just because I have a severe dislike of python; also because implementing something twice is, in my humble opinion, not quite the best way to do something like this.

But hey, who am I, if I don't put my code out there.

Nevertheless, I tried helping. I really did. During debconf8 at Mar del Plata, I approached Martin with the suggestion that I start reimplementing those bits that were unlikely to change anymore in C, as a first step toward something that could be in the base system. Unfortunately, that didn't happen. After trying for about two weeks, I just gave up. I thought I could read python code and reimplement that so that it would work the same way, but I couldn't. I thought I could implement python modules in C without knowing python, but as it turns out that's a laughable idea.

For several months then, I didn't think about it anymore. In March of this year, however, I got a little bored. And what does one do when they get bored? Right, you find something to do. In my case, that was 'figure out how bison works'. I'd of course known about parser generators for quite a while, but just never had the time to dive in deep and figure out how they work. When I needed to do a config file for nbd, I instead hand-wrote my parser and used a lexer from libglib. But I wanted to learn how to do things properly, so I sat down.

Long story short, when I had to choose something somewhat more complex to implement, I thought 'how about a hypothetical config file for a network configuration utility', and I started implementing that. It turned out to be fun, so I didn't stop after I knew enough about bison. And just last week, for the first time I successfully used it to bring up my wired interface using DHCP -- and bring it down again. It can also already manage static interfaces through netlink, though it isn't quite able to do the equivalent of 'ip addr flush' yet.

Fun.

The system is quite flexible. It has a plugin interface (which needed some refinement that I did using some help on IRC just today), which should allow developers to implement extra functionality by just dropping a shared object in the right directory. In fact, I'm working on the wireless support (hence my blog post of last week) as a plugin.

Obviously the code isn't remotely ready yet. The wireless code has a long way to go. The firewall module needs to be started. It will probably crash and burn if put to the test. But even so, it already has one feature that ifupdown, in its 10 years of existence, never acquired: it will not try to DHCP off an interface if it finds out that there is no cable connected to it -- unless you wrote a configuration that asks it to do so.

I wanted things to be as simple as possible, and therefore the minimal valid configuration file that will do something useful is the following:


That's right, it's empty. This will cause the system to bring up the 'lo' interface at bootup, and do nothing else. If you say 'ifup eth0', where eth0 is a valid and existing interface, it will first check whether there is a link, and if so, DHCP off of it.
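To make the 'check for a link first, then DHCP' idea a bit more concrete, here's a minimal shell sketch. It is not the tool's actual code; it just assumes the Linux sysfs file /sys/class/net/<iface>/carrier, which reads "1" when a cable is connected. The function names has_link and plan_for, and the optional second argument to plan_for, are made up for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given carrier file reports a link.
# On Linux, /sys/class/net/<iface>/carrier contains "1" when the link is up.
# Reading it fails while the interface is administratively down, hence 2>/dev/null.
has_link() {
    local carrier_file="$1"
    [ -r "$carrier_file" ] && [ "$(cat "$carrier_file" 2>/dev/null)" = "1" ]
}

# Print "dhcp" if the interface has a link, "skip" otherwise.
# The optional second argument (an assumption, handy for testing)
# overrides the sysfs directory that is consulted.
plan_for() {
    local iface="$1" root="${2:-/sys/class/net}"
    if has_link "$root/$iface/carrier"; then
        echo dhcp
    else
        echo skip
    fi
}
```

Calling 'plan_for eth0' prints either 'dhcp' or 'skip'; a real implementation would of course start a DHCP client rather than print a word.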

There's much to do still, but I have now reached a point where I feel the system is ready to meet all challenges it should be able to meet (or so I hope), and where it might be nice if interested people check it out.

The code is in a git repository that's mirrored on git.debian.org, github.com, and my own git.grep.be. The latter is my --mirror repository, but is not available for cloning to anyone else. If you're interested in checking it out, you may want to start with doc/tech.txt, and/or examples/ipcfg.cfg. I updated the former just yesterday, so it's quite up-to-date. On the other hand, while the latter shows the direction I would like the system to take, the code is a far cry from implementing everything shown there, and it has been a while since I last checked that file out, so the comments in there are, in some cases, somewhat outdated. But that's good, because I want to keep the file as a reference of the goals that I originally had, before I'd written this whole corpus of code, so that I can go back at some undefined point in the future and fix things.

Comments are (more than) welcome.

Dear lazyweb,

A day or two ago, I got myself a nice new monitor. It's positively huge, which makes it useful even on its own.

There is, however, one little problem. When I call xrandr --output LVDS --right-of VGA, my laptop's display is situated at the top right of the monitor. This is not what I'm after; I'd much rather have it at the bottom right of the monitor, since

Hints as to how one should fix this are more than welcome.

Thu, 17 Sep 2009

#include <iwlib.h>

Writing a network configuration tool is... fun.

Well, mostly anyway.

/***************************** SCANNING *****************************/
/*
 * This one behave quite differently from the others
 *
 * Note that we don't use the scanning capability of iwlib (functions
 * iw_process_scan() and iw_scan()). The main reason is that
 * iw_process_scan() return only a subset of the scan data to the caller,
 * for example custom elements and bitrates are ommited. Here, we
 * do the complete job...
 */

/*------------------------------------------------------------------*/

(from iwlist.c)

Not. The. Way. To. Do. It. Especially not if you're the person who writes both libiw and iwlist. Grmbl.

Also, documentation for libiw seems to be, well, nonexistent.

Fun. Not.

Mon, 07 Sep 2009

Tennis: Back Again!

It wasn't too long ago that Kim Clijsters retired from tennis, with Justine Henin following her not too long after that. At that point, with the next best player, Kirsten Flipkens, being ranked a hundred-something, I thought it'd be a long time before we'd see Belgian women's tennis reappear. Certainly I didn't think we'd be hopeful this year already.

As such, I'm delighted to be able to say that not only is Kim Clijsters making a smashing come-back -- already beating Venus Williams, ranked 3rd on the WTA lists, at the US Open, and only a few months after her return to tennis at that -- but Yanina Wickmayer, ranked 50-something on the WTA lists by now, has managed to reach the last 16 players. With Kirsten besting her career record by reaching the 3rd round of the US Open (before being ousted by Kim, no less), I feel the future of Belgian women's tennis is looking very bright indeed.

Perhaps I should think about getting that cable subscription, so I can actually watch the matches.

Tue, 25 Aug 2009

Shirts

I so want the t-shirt...

Mon, 17 Aug 2009

New toy: Nikon NIKKOR 50mm f/1.4D prime

Ever since I first borrowed such a lens from Tiago, I loved it. People have often told me that the pictures I took at Debconf8 were great; well, they only were because I borrowed that lens for half a day. I did the same thing at Debconf9, and again came up with wonderful pictures. It's not just the lens, of course, but it helps. I've been wanting to buy this thing ever since.

I can't really afford it yet, but I just didn't care anymore. So today, I went to the shop and bought the thing.

In short: I'm not regretting it.

Lays and Kellog's Piano

More on my Flickr photo stream.

Thu, 06 Aug 2009

Re:

I recently (some of it during debconf, some of it right after) blogged a few things that are, shall we say, controversial. Obviously I got some response to that. Since I haven't replied individually to most of those replies, I thought a new blog post would work well.

On my very short and rather unconstructively hateful post about gnome-keyring, JanC wrote:

I can understand why you would get popups from seahorse-agent if you use that, but if ssh gives you popups from gnome-keyring, that sounds like something is misconfigured to me?
BTW: it's easy to configure things in such a way that seahorse-agent doesn't get used inside a terminal; it's just another ssh-agent after all...

Well, presumably. But I didn't do it.

wouter@celtic:~/data/blog/live/en/computer$ sudo dpkg-divert --list|grep gnome
Password: 
local diversion of /usr/bin/gnome-power-manager to /usr/bin/gnome-power-manager-sucks
local diversion of /usr/bin/gnome-keyring-daemon to /usr/bin/g-k-sucks
wouter@celtic:~/data/blog/live/en/computer$ svn up

** (process:10384): WARNING **: couldn't communicate with gnome keyring daemon via dbus: Failed to execute program /usr/bin/gnome-keyring-daemon: Success

** (process:10384): WARNING **: couldn't communicate with gnome keyring daemon via dbus: Failed to execute program /usr/bin/gnome-keyring-daemon: Success

** (process:10384): WARNING **: couldn't communicate with gnome keyring daemon via dbus: Failed to execute program /usr/bin/gnome-keyring-daemon: Success

** (process:10384): WARNING **: couldn't communicate with gnome keyring daemon via dbus: Failed to execute program /usr/bin/gnome-keyring-daemon: Success

** (process:10384): WARNING **: couldn't communicate with gnome keyring daemon via dbus: Failed to execute program /usr/bin/gnome-keyring-daemon: Success

** (process:10384): WARNING **: couldn't communicate with gnome keyring daemon via dbus: Failed to execute program /usr/bin/gnome-keyring-daemon: Success
At revision 1086.
wouter@celtic:~/data/blog/live/en/computer$ 

Note particularly the fact that the login works perfectly okay right after whatever causes this finally gives up trying to call gnome-keyring, and svn finds the authentication info in its config file (this is my blog; it's not as if there's anything of serious security there. Those repositories that are important are behind SSH).

I never even asked anything to run gnome-keyring-daemon; but if I allow svn to do its thing without asking, this is what happens. I don't have libpam-gnome-keyring installed, and I'm no longer running gdm; two sources of things that run gnome-keyring-daemon without my asking. Yet the bloody thing keeps cropping up everywhere I look. It is broken, and it should not exist. Thank god for dpkg-divert.

On my post about MS and the GPL, some people wrote that Microsoft only published the code for GPL compliance, not because they thought it a good idea or some such.

While this might be true, the fact of the matter is that, unlike many many many other companies out there, they decided to not just put a blob of code on their website and be done with it, but instead to do the right thing, and get the code merged. That is far beyond what the GPL requires them to do, and it signals to me that they did this because they are changing—the Microsoft of 10 years ago wouldn't even have written those drivers in the first place, let alone get them merged. That, I think, was the most important point of that post.

On my post about passwords, Matthew Johnson wrote:

An SSH key is very much not just "a password written down". Passwords are sent in the clear (inside the tunnel) and are therefore more subject to MITM attacks, keyloggers, etc. SSH keys, because of the asymmetric crypto, if you do manage an MITM attack using an SSH key does not reveal anything which could be used to authenticate another time.

Good point; I overlooked that. Yes, that does make passwords somewhat less secure, and defeats most of what I said. However, it does not defeat the fact that allowing only one way to login increases the chance that you will require some people to jump through hoops—me, I do not keep my SSH keys on my laptop's hard disk, so they don't get compromised should the laptop ever get stolen. That does make it harder for me to log on to a system using these keys, however, and it makes me consider generating a separate SSH key that I will keep on my laptop, just for Debian.org hosts. I need to log in to some of them far too often.

On that latter post about buildd maintenance, I received a response from Riku about how he feels using tr on the client-side is wrong, and how I should instead have patched buildd-mail to decode MIME encodings.

Technically, he's right. Practically, he's not. When I wrote this script, it was far easier to fix the problem once, on my side of the equation, rather than every time I install or upgrade buildd. At the time, the code was in a repository that only Ryan had commit access to, and as a result most people kept patches locally. I should still have a patch lying around somewhere that has the same effect as the one that got committed, but gave up trying to use it, as in my ever-changing set of buildd hosts, one would creep up again and again that did not have the patch applied, because it was recently reinstalled after a dead disk, or because buildd was upgraded, or some such, meaning that signatures would fail and I would have a shitload of unnecessary manual work. In such a context, writing code on my client side—in casu 21 characters—is far easier than trying to fix the code centrally.

'night

Didn't feel like cooking today, so I went out to have a steak in a nearby tavern. As I had finished my meal and walked home, I saw the moon peep through the clouds, right around the corner of where the apartment is. I just had to take a picture, so I went home, grabbed my camera, put on the 18-70mm lens, and went out again.

Onze-Lieve-Vrouwestraat

Taking a picture with 4 seconds of exposure isn't easy, especially so if you don't have a tripod. But after six tries, I managed to come up with the above. It was a matter of sitting down, leaning one arm against the wall to the left, and not breathing. Oh, and cropping some wall away afterwards. But hey.

I'm not 100% happy with the result (the top edge of the frame should've been a little higher), but it's close.

note: this was written last night, but the commit failed because I misconfigured my server. Whatever.

Thu, 30 Jul 2009

Passwords

The Debian System Administrators decided, apparently, that disabling password logons is a good thing that warrants a 'Good News' post.

Allow me to politely disagree, for two reasons:

First, an SSH key is a password that is stored on the hard disk, while a 'regular' password is only stored inside someone's brain. While torturing someone to get at their password is arguably possible, it is not possible to do so without this person noticing. The same cannot be said about someone secretly stealing a file from someone else's hard disk; and while it is certainly possible to protect an SSH key with a password, it is not at all required to do so in order to use such keys. As such, on the server end you have no way to know whether a remote client is in fact the person whom they claim to be, just because they happen to have an SSH key that just happens to match the original.

Second, security is not accomplished by forcing people to use things they do not want to use. If you do that, they will find ways to work around your security—leaving you with no security at all.

But oh well, it's not my call to make, so whatever.

The lying will stop

A few hours from now, this site will stop lying in its section of past events of the same type.

One might think I'd be happy about the end of lies; but given what it implies, not quite so.

I guess I can't wait until some other future site starts lying about past events.

We'll see.

Tue, 28 Jul 2009

Extremadura: gnuLinEx and NBD

So obviously I already knew that the region of Extremadura uses a version of Debian they call gnuLinEx, but I didn't know the specifics. As such, it was nice and interesting that they offered us the option of going to a local school, where we could see an installation of gnuLinEx in action. Obviously I went there.

This was an interesting experience, for sure. When I arrived in Cáceres, I learned from Vagrant Cascadian that the school installations make extensive use of LTSP. This, in turn, uses NBD. Since they run this on 80.000 computers, it's quite likely that they're the largest NBD installation in the world. I had no clue.

So, today, when noticing that nbd-client was used on the local machines, I had a short little chat with José, one of the guys from gnuLinEx who's a Debian Developer, about the fact that I've been wanting to do some work on NBD's performance (mostly profiling runs etc), but that I don't have the setup to do this efficiently. To keep a long story short: I now have a test lab the size of which is several times the country I live in. Whee.

I had my camera with me, and took some pictures. I'll upload them soon, but have some other, more urgent, matters to attend to now (such as 'eat').

Later.

Sat, 25 Jul 2009

RC NMUs

A few days ago, during DebCamp9, someone NMU'ed belpic to close #525593, a 'failed to build from source' bug that was filed against it on April 25th, 2009.

Since such bugs (for good reason) are deemed release critical, my package was facing the prospect of being thrown out of testing, and this person (who shall remain nameless in this blog post, since it is not about finger-pointing; but if you must, check the bug report to find out who) did an upload to prevent that from happening.

In and of itself, this is a good thing. I don't like it when my packages have bugs, and generally try to keep the number as low as possible. There used to be a time when my maintainer bug listing had zero open bugs for most of the time, and though I long ago gave up trying to keep it that way, it is still a goal I would like to reach at some undefined point in the future. Yet, with this particular upload, I was quite unhappy, even going so far as to cancel it, preventing it from going into the archive; and when, over lunch, I discussed the situation with Adeodato Simó, he seemed to feel that I improperly blocked this upload; that I instead should have allowed it to proceed. I felt as if he was almost hostile to my notion that the uploader should have coordinated more with me than had happened. I tried to explain why I felt the uploader had done something wrong, but I do not think I convinced him.

After having thought about it for a few days now, I still don't feel I was wrong in my actions; and I don't like the idea of possibly being a bad maintainer. So here's my position on the whole thing:

First of all, the bug was indeed open for a long time. The reason is that I didn't have much spare time in May to look at it. I also had no clue as to what the problem was, which makes it kinda hard to fix it. I did have some spare time in June, but since Belgian citizens have to file their tax reports in that month, and since you may need the software in these packages to be able to do so, I did not think it was proper to indeed do an upload before the deadline of June 30th; I did not want to have to scramble to fix a botched upload at exactly the wrong time.

On the 11th of this month, a patch was sent to the bug report that would fix it. This was the weekend before I would leave for DebCamp/DebConf, so I postponed working on it until I would arrive here in Cáceres.

Secondly, what I'm most upset about is the fact that the upload wasn't tested. It couldn't have been; the only way you can test this package is by using a smartcard reader and a Belgian ID card; since the uploader does not have the Belgian nationality, I doubt he has such a card. I don't personally do an NMU that often; but when I do, I consider it my duty to perform extensive testing, even more so than with my own packages. In this particular case, doing such tests is quite important, as I've had issues with the software in the past, where new builds wouldn't work properly for reasons that I haven't fully been able to pin down.

Of course the uploader couldn't know all that, but this is exactly why he should have talked to me before doing the upload. Of course, in theory, I could have documented all my knowledge about the package; but in practice, it's pretty hard to do that well (i.e., many of the things I know about my packages involve 'feeling', 'instinct', and 'experience', which does not easily translate to 'words').

For clarity: I'm not upset about the fact that I've been NMU'ed. That's happened in the past, even for this particular package, and that's been perfectly fine. What I am unhappy about, is that the NMU happened to DELAYED/1 (which gives me very little time to intervene) and that it happened without prior coordination with me.

So, was I really wrong in preventing the upload?

note: since the new upstream release that was available was eventually uploaded to experimental rather than unstable, I just uploaded 2.6.0-7 which also fixes the bug in question.

Fri, 24 Jul 2009

Bison

The Bison parser expects to report the error by calling an error reporting function named "yyerror", which you must supply. It is called by "yyparse" whenever a syntax error is found, and it receives one argument. For a syntax error, the string is normally "syntax error".
If you invoke the directive "%error-verbose" in the Bison declarations section, then Bison provides a more verbose and specific error message string instead of just plain "syntax error".

Sounds good, right?

Well, no, not entirely.

syntax error, unexpected $undefined

Well, goody. Now I know what's going on.

note: yes, I do know that there are other ways to debug a Bison parser than just to use the parser error string. It's just that this could have been more useful, like, say, provide the line on which the error is found? The file I'm trying to parse here is pretty large, thank you very much.

DebCamp 9: stuff

This has, by far, been the most productive DebCamp ever, for me.

Not that this means all that much—mostly that previous DebCamps haven't been productive at all—but I still got a few things done.

There've basically been three things that I worked on: d-i support for the Intel SS4000-E; a belpic/beid upstream update; and a minor incremental NBD update.

The latter was simple, and was basically the first thing I did. I had a chat with Vagrant Cascadian, who does a lot of LTSP stuff in Debian, and added some stuff to the package that would make his life a bit easier. Not a lot of work; as I'm also upstream for NBD, and as it's been one of those packages that I've maintained since an eternity, I know the code pretty well. Half a day later, all the code was there.

D-i support for the SS4000-E wasn't that hard (most of the hard parts had already been done by Martin Michlmayr), but unfortunately some bits are not yet completely in order—mostly having to do with the fact that the original firmware has a kernel command line embedded in the kernel. As such, for now, you'll have to connect to the serial line in order to fix the redboot config; maybe we'll come up with a sane way to fix that in the future, but as long as we don't, that does mean you need a serial null modem cable.

Not that you need to solder anything (the main board has a connector for a regular serial port; you just need to plug the right cable to the right connector), so it's not that bad.

The final thing was horrible. A piece of software that presumably works well, but initially wouldn't even compile on my laptop because of pointer/int confusion; a build system made of shell scripts and qmake; and other similar things.

Eventually, I just gave up and uploaded what I had to experimental. It works, to some extent, but should be improved over the next few weeks. That's not for today, however.

Microsoft and the GPL

For some strange reason, people all over the net are oohing and aahing over Microsoft releasing some drivers to use Linux on their proprietary virtualization software. I'm oohing too; not because of the drivers, but because of all the buzz that goes around it.

Ten years ago, I would have oohed and aahed, too. At that time, Microsoft was fighting open source and free software like a cancer. Today, they're not; they provide open source software for Windows themselves (such as an installer framework which they provide through sourceforge), and actively cooperate with many open source and free software projects through their open source labs. They even have a section of their website dedicated to open source software.

A large company like Microsoft can't survive if it tries to actively work against what the marketplace wants. The fact that they were indeed so actively fighting against open source software is quite likely why the first decade of the 21st century has seen such a huge loss of market share for them; like any large company, they needed to adapt or lose. They've chosen the first; good for them, and that might be good for us too.

Over the past decade, I've seen Microsoft warming up to open source, to some extent. This is why I don't understand much of the 'Boycott Novell' lunatics; sure, I don't trust Microsoft enough yet to be willing to say that they don't have any plans that will negatively affect us; however, that doesn't mean I will assume that evil things are their plans; unless proven otherwise, I will assume they have the best interest of their customers and/or shareholders as their main goal.

Which is why I was totally not surprised in seeing a GPL patch from Microsoft at this point in time. Rather, I find it normal and expected behaviour, a continuation of an evolution that has been going on for the better part of a decade now.

Wed, 22 Jul 2009

Buildd maintenance

In Debian, I've been a buildd maintainer since 2001; most of that time was for the m68k port (I still am active there, though not as much as I used to be), but there's also been a short stint with armeb, and for a while now I've also been a PowerPC buildd maintainer. I used to do just one powerpc host at first, but now I maintain both malo and voltaire, with Philipp Kern doing praetorius.

This probably makes me one of the more experienced buildd maintainers in Debian today, together with the likes of LaMont Jones and Ryan Murray. I did do a talk about how this is supposed to work at FOSDEM 2004, but that's now five years ago, and some things have changed since. Also, a not-videotaped talk isn't very helpful if you weren't there.

So I thought I'd write up what it means to be a buildd maintainer. There's of course the documentation on the Debian.org website, but that only explains how the system works in theory; it does not explain what we buildd maintainers tend to do on a daily basis.

So let's have a look at that, shall we?

Basically, the work of a buildd maintainer is pretty monotonous, and an experienced buildd maintainer will usually have a set of scripts to help them. Their work falls into three main categories. In order of frequency, these are:

  1. Log handling;
  2. State handling;
  3. Host and chroot maintenance

Log handling

The first is the most obvious one. Every time the buildd builds a package, it will send a full log of the build to buildd.debian.org and to myself. The successful ones are signed with a simple script:

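#!/bin/bash
# Reduce the build log in "$1" to just its embedded .changes file.
# No -i on sed: it has to write to stdout so that tr can strip the
# 8-bit characters before the result replaces the original file.
tmpfile=$(mktemp)
sed -e '1,/\.changes:$/d;/^[[:space:]]*$/,$d' "$1" | tr -d '\200-\377' > "$tmpfile"
cat "$tmpfile" > "$1"
rm "$tmpfile"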

Easy: use a sed command to fish out the embedded .changes file, and write that to the original file. I use a folder-hook to set this script as my 'editor' in mutt when I'm in my buildd mail directory, so the result is mailed off to the buildd. In that same folder-hook, mutt is also configured to send the reply gpg-signed in the 'traditional' format, without confirmation, and with just one keystroke, so that (after I have entered my GPG key passphrase) I can send off all the signed changes files in one go. A possible improvement could be to change the macro so that it would work with mutt's 'tag' feature (it doesn't, currently), but that's not a big issue (currently, doing 100 mails takes a few seconds and some careful counting).

Note the 'tr'; this is to prevent 8-bit characters from appearing in the mail, which might otherwise be converted to their quoted-printable version in transit to the buildd; and since buildd-mail (the part that receives that mail) does not understand MIME, this would corrupt the GPG signature. This way, we do lose a few characters from the changelog, but that doesn't really matter -- the source still contains the unmodified changelog entry.
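For those who'd like to see what the sed and tr steps actually do, here's a small sketch that runs them on a made-up log. The log contents, the package name and the function names extract_changes and sample_log are all invented for the example; a real buildd log is a lot longer and looks different in detail.

```shell
#!/bin/bash
# Read a build log on stdin and print only the embedded .changes part:
# drop everything up to and including the line ending in ".changes:",
# drop everything from the first blank line after that onward,
# and delete all bytes in the range 0x80-0xFF.
extract_changes() {
    sed -e '1,/\.changes:$/d;/^[[:space:]]*$/,$d' | LC_ALL=C tr -d '\200-\377'
}

# Invented sample log, for illustration only. The \303\251 is a UTF-8 'e acute'.
sample_log() {
    printf 'Build started\n'
    printf 'some compiler output\n'
    printf 'foo_1.0-1_powerpc.changes:\n'
    printf 'Format: 1.8\n'
    printf 'Changed-By: Jos\303\251 <jose@example.org>\n'
    printf '\n'
    printf 'Build finished\n'
}

sample_log | extract_changes
```

This prints the 'Format:' and 'Changed-By:' lines only, with the name reduced to 'Jos': the 8-bit bytes are simply gone, which is exactly the 'lose a few characters' trade-off mentioned above.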

With this script, I often handle my 'success' folder several times a day. It's no effort, anyway.

The somewhat harder but much more fun part of log handling, is the handling of failure mails. Since there are loads and loads of possible failures, the scripts to handle these are somewhat more involved. I did receive a script from LaMont at some point, a few years ago, which I then built on so as to improve it. It's not perfect, but it does handle a few common cases with no extra input from me. Some of the others are not so easy, however.

One of the more common cases that cannot easily be automated is the case of the buildd failing to install a certain package, because 'foo depends on bar, but it will not be installed'. This is apt's way of telling you that bar depends on foobar which depends on quux (>= 1:2.3.4-5) which depends on libfrobnitz2, but that has now been replaced by libfrobnitz3. Or some such. The only way to figure out what the hell the problem is, is to walk the dependency tree and figure out stuff from there.

There is an 'edos-debcheck' that reportedly can help with this; personally, I wrote a set of perl scripts that will cache a Packages file into DBM files, and then allow you to walk over them to help you figure out what's wrong. They're not perfect, but if you use the '-v' option to check-dep-waits and verify the output when it tells you about missing libraries, it should be able to figure out the whole dependency tree I described above, and will allow me to write a proper dep-wait response, allowing the buildd host to automatically retry the package when the missing dependency is available.
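To give an idea of what 'walking the dependency tree' means, here's a toy sketch in shell. It is not my perl scripts and doesn't read a Packages file; the tiny 'archive' is hardcoded in the hypothetical functions deps_of and is_available, version constraints are ignored, and find_missing is a name I made up for the example.

```shell
#!/bin/bash
# Invented toy archive: print the dependencies of a package.
deps_of() {
    case "$1" in
        foo)    echo "bar" ;;
        bar)    echo "foobar" ;;
        foobar) echo "quux" ;;
        quux)   echo "libfrobnitz2" ;;
    esac
}

# Succeed if the package exists in the toy archive.
is_available() {
    case "$1" in
        foo|bar|foobar|quux|libfrobnitz3) return 0 ;;
        *) return 1 ;;
    esac
}

# Print every dependency chain starting at $1 that ends in a missing package.
# No cycle detection: real archives have dependency loops, this toy one does not.
find_missing() {
    local pkg="$1" chain="${2:-$1}" dep
    for dep in $(deps_of "$pkg"); do
        if is_available "$dep"; then
            find_missing "$dep" "$chain -> $dep"
        else
            echo "$chain -> $dep (missing)"
        fi
    done
}

find_missing foo
```

For this toy data the output is 'foo -> bar -> foobar -> quux -> libfrobnitz2 (missing)', which is the piece of information you need to write a dep-wait on the package at the end of the chain.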

Also somewhat common and routine are things like the following, and similar:

  1. Transient network failures, in which case we use either 'retry' or 'give-back' if the buildd hasn't figured that out by itself and done the latter;
  2. The maintainer uploading a new version of the package while the previous version is building. This results in wanna-build firing off an email to the buildd host, which in turn results in buildd killing the build by removing the build directory. This is not always easily distinguishable from a regular failure, so I commonly respond to that mail with a failure message; if it did indeed fail because of a newer version, then buildd will notice that and ignore my mail;
  3. The incoming.d.o Packages file (which is only available to buildd hosts, so don't ask) being out of sync with reality, which happens 4 times a day for about an hour. In this case build-deps will fail to install, requiring a retry or give-back.

Other things are less common; but because of that, they are not routine and require an in-depth investigation. Sometimes the fix is to just file a bug report and/or to mark the package as 'failed' (and let the maintainer or a porter handle the problem); sometimes the failures are due to a maintainer script in a package being utterly broken, resulting in either some build-deps being uninstallable or (worse) the buildd chroot being fucked up. Sometimes a build is interrupted halfway through, leaving the chroot in an unclean state (sbuild is not pbuilder, and does not remove and recreate its chroot between builds). This would push us to category 3 of our work.

Basically, however, figuring out which is which takes some experience. Not all compilers are based on gcc (there are some really weird languages in Debian), and thus not all of their error output is the same; learning their different error modes can help quite a lot. Additionally, by continually compiling 10G worth of software, you'll be stress-testing your toolchain. If you've never seen an 'Internal Compiler Error' before, you will once you become a buildd maintainer, and it helps if you know what they are and how to deal with them (even if there isn't much one can do beyond filing bugs).

Obviously, handling failures takes some more time than does handling success mails, and it's not something I do quite as often. The exact time between both varies, but it's usually somewhere between a few days and one or two weeks—unless I suddenly stop receiving success mails from one of my buildd hosts, in which case I know something is utterly wrong and will usually investigate immediately.

State handling

With 'state handling', I mean managing the state of a package in the wanna-build database. There's help about this from the people on the debian-wb-team mailing list; call me old-fashioned, but I still do consider this to be the final responsibility of the buildd maintainers. After all, the routine state changes are a result of decisions that I make; as such, if I fuck up, it should be me who fixes the fuckup. Also, if I mark a package as 'failed' because I believe the maintainer fucked up, then the debian-wb-team people may not know about my reasoning there, and might give the package back to another failure (although I would consider the latter pretty rare).

These requests are pretty common. Quite often, they're unnecessary -- many maintainers are unaware of the intricacies of the wanna-build system, and may not realize that when a build is in dep-wait state, it will automatically migrate to needs-build once dependencies are available. About as often, however, they are very much necessary, and, since regular Debian package maintainers do not have access to the wanna-build database, require someone who does have access to said database to update it for them.

Having said that, there are cases where I will preemptively edit the wanna-build database. Usually this is to do something useful with packages that have been in 'Building' state for far too long: either upload the package if its signature mail got lost (which happens once in a blue moon), or give the package back if its build was never attempted even though it is marked as such (this should not happen, but the system is not perfect and it does). Sometimes this is because I figured out that some common build-dependency (say, the GTK or Qt libraries) is in a transitional state and currently not installable; rather than having a build daemon try a bunch of packages and fail them all, I may want to note in the wanna-build database that it should not bother attempting these 75 packages before the GTK package is done. This isn't done as often on the official Debian machines (since the release managers will do it for me there), but in m68k we do need to do this ourselves.

These kinds of requests happen once every few days up to once every few weeks, and take little time to deal with.

Host and chroot maintenance

This is the hardest and least fun part of buildd maintenance, but it is just as necessary. Luckily, it is not needed as often.

Because Debian Unstable is a system that's in a constant state of flux, often things will break. This is even more of a problem on a buildd chroot, since it builds out of incoming; a maintainer may upload a package with a fucked postinst script, have its build succeed, but then fail spectacularly to install. This maintainer may notice that, and may upload a new package half an hour later. As such, the broken package will not end up on the system of a user of Debian Unstable, but between the time of the upload of the broken package and that of the new package, the old package will be available to buildd hosts, which may use it to completely and utterly destroy their build chroot. The joys of having a high turnaround time.

Luckily, Debian package maintainers are not stupid, and this kind of fuckup does not happen every other day. It does happen, however, and when it does, this often means manual work for the buildd maintainer. In the best case, it's a matter of syntactically fixing a postinst script and calling 'apt-get -f install' or 'dpkg --configure -a'. In the worst case (which is almost, but not quite entirely, totally unheard of), it's a matter of rebuilding the buildd chroot. In addition to that, a machine which runs 24/7 for the sole purpose of building packages tends to generate quite a lot of disk activity, which in turn tends to be detrimental to the disk in the long run. If not looked after properly, disks will die, taking the entire buildd chroot with them, and the chroot then has to be rebuilt. Obviously, this last issue is dealt with by the Debian System Administration team in the case of official Debian hosts, but the same is not true for the m68k port.

A somewhat more common thing that needs to be taken care of is the fact that buildd does not in all cases clean up after itself. For instance, when a new version of a package is uploaded to the archive between the time that the buildd host built it and the time the buildd maintainer sent the signed .changes file back, then buildd will say "I haven't got that package taken as Building" and refuse to upload it. This makes sense (you can't upload an old version of a package, since there wouldn't be any source for it, and dak would refuse the upload), but it does mean that the packages aren't cleaned out. This is arguably a bug in buildd-mail; over time it will result in the disk filling up with outdated packages, and those require manual work from the admin. I recently (as in, a few hours ago now) finished a script to check each .changes file in the "build result" directory against the wanna-build database, and list those that are no longer necessary. I already had a script that, given a list of .changes files, would remove every .deb file listed in the given .changes files, and then proceed to remove the .changes files themselves. Combined, these do make that kind of work somewhat less of a burden.
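For illustration, the second of those scripts could look roughly like the Python sketch below. This is not the script I actually use: the function names are made up for this example, and it assumes that each .deb sits in the same directory as its .changes file and that the Files field uses the usual "checksum size section priority filename" layout.

```python
import os


def files_in_changes(text):
    """Return the file names listed in the Files: field of a .changes file."""
    names = []
    in_files = False
    for line in text.splitlines():
        if in_files:
            # Continuation lines of a field start with whitespace.
            if line.startswith((" ", "\t")) and line.strip():
                # The file name is the last column of each entry.
                names.append(line.split()[-1])
                continue
            # A blank line or a new field ends the Files: block.
            in_files = False
        if line.lower().startswith("files:"):
            in_files = True
    return names


def remove_build_results(changes_paths):
    """Remove the .deb files named in each .changes file, then the file itself."""
    for path in changes_paths:
        directory = os.path.dirname(path)
        with open(path, encoding="utf-8") as handle:
            names = files_in_changes(handle.read())
        for name in names:
            if name.endswith(".deb"):
                # basename() guards against entries pointing outside the directory.
                target = os.path.join(directory, os.path.basename(name))
                if os.path.exists(target):
                    os.remove(target)
        os.remove(path)
```

Deciding which .changes files are outdated is left out on purpose, since that requires querying the wanna-build database; the sketch only covers the removal step, and expects the list of outdated files as its input.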

As said, this kind of work does not need to be done all that often; for instance, I just cleaned the build result directory on voltaire and malo, my two powerpc buildd hosts, and found old files from late 2008...

And that's it, I guess. It may seem to be quite a lot, but in reality it isn't; the thing I've always liked about buildd maintenance is the fact that you do something little for Debian every day, but that it ends up being something big and helpful after a while.

Of course, the little things are the cherry on the cake. By looking at a lot of build logs, one eventually learns a thing or two about build systems, which is valuable knowledge. Getting build logs from the whole of Debian allows one to learn things about the archive that many people don't know about—for instance, did you know that we had a package called trousers? I didn't, until I signed the buildd log...

Update: changed the URL of this post to be under the buildd/ directory, rather than having it conflict with that and thus killing its permalink and making it impossible to comment on this post. Oops.

