Film: I Am Legend
On New Year's Eve, I went to a local movie theatre and watched the above movie.
I must say I'm impressed. It's been a while since a movie managed to get me hooked like this. I just love the way the backstory is revealed to the audience through little clues and flashbacks scattered throughout the movie. This certainly makes for an interesting way of discovering what exactly the hell is going on. Add to that a premise that is sure to raise some interest (New York abandoned? Quite some special effect), and you get a very entertaining night out.
But credit where credit is due: Will Smith's performance was outstanding. He's done a marvellous job of portraying a real human being: not a superhuman, but a person who's been lucky enough not to be affected by this horrible disease, and who's now slowly going insane because of his lack of contact with other human beings.
Really nice movie.
Broken C code
A few weeks back, someone asked on debian-68k about an unexpected result of some piece of software when it was compiled on ARAnyM, an m68k emulator that can run Linux.
His initial idea was that this could perhaps be a bug in ARAnyM, since he only saw it inside ARAnyM, and not on any of the other architectures he tried. In fact, it was a bug in his code.
Let's have a look at what's going on here. The example program which Sergei provided and which exhibits the problematic behaviour is spread over two files. The first file contains this:
```c
#include <stdio.h>

void *f();

main() {
    void *a;
    a = f();
    printf("%d\n", (long)a);
}
```

while the second contains this:
```c
#include <stdio.h>

long f();

long f() {
    long a = 84;
    printf("%d\n", a);
    return a;
}
```
Many people familiar with the C programming language will intuitively expect that this program, when run, will output the following:
```
84
84
```
In fact, when compiled and run on an m68k machine, the output is as follows:
```
84
-1072577264
```
What's going on here? To understand that, we have to take a look at the assembly code. We can get it with "objdump -d", which gives (among other things) the following output:
```
800004c4 <main>:
800004c4:  4e56 0000       linkw %fp,#0
800004c8:  61ff 0000 001a  bsrl 800004e4 <f>
800004ce:  2f08            movel %a0,%sp@-
800004d0:  4879 8000 05aa  pea 800005aa <_IO_stdin_used+0x4>
800004d6:  61ff ffff feec  bsrl 800003c4 <printf@plt>
800004dc:  508f            addql #8,%sp
800004de:  4e5e            unlk %fp
800004e0:  4e75            rts
800004e2:  4e75            rts

800004e4 <f>:
800004e4:  4e56 0000       linkw %fp,#0
800004e8:  4878 0054       pea 54 <_init-0x800002f0>
800004ec:  4879 8000 05aa  pea 800005aa <_IO_stdin_used+0x4>
800004f2:  61ff ffff fed0  bsrl 800003c4 <printf@plt>
800004f8:  7054            moveq #84,%d0
800004fa:  4e5e            unlk %fp
800004fc:  4e75            rts
800004fe:  4e75            rts
```
So what's happening here?
- main acquires some stack space, and then (0x800004c8) immediately branches to a subroutine 26 bytes ahead: our function f.
- f also acquires stack space (0x800004e4), then pushes two values onto the stack (...4e8 and ...4ec), and jumps to the printf function. The first of these is the constant 84 (0x54), which is hardcoded in the instruction and pushed with pea as if it were an address; the second is the address of our format string.
- f then moves the constant value 84 to the D0 register (...4f8), frees its stack space (4fa), and returns (4fc; the second rts can be ignored, that's a harmless quirk in the compiler).
- main now copies whatever is in the A0 register to the stack (4ce), pushes the address of the exact same format string onto the stack (4d0), and again jumps to the printf function (4d6; the encoded displacement differs from the one in f because bsr uses PC-relative addressing, so the offset depends on where the instruction itself is located).
- Finally, main cleans up after itself and returns.
The difference should be clear: the f function stores its result value in the D0 register, but main goes looking for it in A0.
The reason for this discrepancy is very simple. The m68k processor has three sets of registers: one set for integer values (D0 through D7), one for address values (A0 through A7), and one for floating-point values (FP0 through FP7). The m68k ABI specifies that integer return values are stored in an integer register, that pointer return values are stored in an address register, and that floating-point return values are stored in a floating-point register. This makes sense, since register-indirect addressing modes require the address to be in an address register, while calculating values requires the value to be in either an integer or a floating-point register. So when main looked for a return value in A0, it found something, but obviously not what it should have found...
This would have been a non-issue had the function been written the way it should have been, like so:
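```c
#include <stdio.h>

void *f(void);

int main(void) {
    long *a;
    a = (long *)f();
    printf("%ld\n", *a);    /* %ld is the conversion for long */
    return 0;
}

void *f(void) {
    static long a = 84;     /* static: valid for the whole run, not on the stack */
    printf("%ld\n", a);
    return &a;              /* a pointer, so it comes back in A0 */
}
```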
To wrap up: there's a reason why compilers emit warnings if you do something strange or "clever". In this particular case, the warning was avoided because each file contained its own declaration of f(), even though the two declarations were different. This is a horrible hack that works around those warnings, rendering them utterly useless.
Please, pretty please, with sugar on top: don't do something like that. If you write C code, compile it with -Wall -Werror, and make sure it compiles cleanly that way. If you want to access a function in a different file, create a common header file that both files #include, so that the compiler can notice any difference between the declaration and the definition of a function. And don't expect something to work just because you fixed it for a common and well-known case, because there will often still be other places where your bug will trigger. In this case, by declaring the function as one returning a long, they made sure it would not break on 64-bit architectures (on the LP64 platforms Linux uses, a long is as wide as a pointer). However, as shown above, that doesn't make the bug go away.
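To make the header advice concrete, here is a minimal sketch of what such a shared declaration could look like. The file names f.h and f.c and the F_H include guard are hypothetical, and the header text is pasted inline (instead of #include "f.h") so the block compiles as one file; main.c would #include the same header and call f(). Unlike the corrected version above, it declares f as returning long *, which removes the need for a cast.

```c
/* --- f.h (hypothetical shared header) --------------------------- */
#ifndef F_H
#define F_H
long *f(void);          /* the single declaration every file sees */
#endif

/* --- f.c --------------------------------------------------------- */
/* In a real project this file would start with #include "f.h"; the
 * header text above stands in for it so this compiles on its own. */
#include <stdio.h>

long *f(void)
{
    static long a = 84; /* static: valid for the whole run, not on the stack */
    printf("%ld\n", a);
    return &a;
}

/* --- main.c would also #include "f.h" and simply call f(). ------- */
```

With the prototype in scope, a definition such as `long f(void)` in f.c would be rejected as conflicting types, and a caller that forgot to include the header would get an implicit-declaration diagnostic, which -Werror turns into a build failure.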
Writing clean and portable code is way more fun. Trust me.
Since you asked for it...
Chris, buildd hosts keep those time and space usage statistics in a database, which can be queried...
```
Script started on Sat Jan 5 19:54:15 2008
wouter@country:~$ ssh kiivi.cyber.ee
Linux kiivi 2.6.18-4-mac #1 Fri Mar 30 23:05:11 CEST 2007 m68k
No mail.
Last login: Sat Jan 5 20:41:16 2008 from d51532c45.access.telenet.be
wouter@kiivi:~$ avg-pkg-build-time -s -t | head -n 20
boost: 2015312k (2015312k lastest)
iceweasel: 1846628k (1846628k lastest)
iceape: 1674172k (1674416k lastest)
icedove: 1624848k (1624848k lastest)
xulrunner: 1611880k (1611880k lastest)
mesa: 1230432k (1230432k lastest)
koffice: 1208062k (2115156k lastest)
gcc-snapshot: 1191866k (1207164k lastest)
gcc-4.1: 1185620k (1185620k lastest)
gtk+2.0: 1148634k (1208684k lastest)
k3d: 1145580k (1145580k lastest)
ardour: 1077800k (1077800k lastest)
lyx: 924876k (924876k lastest)
gcc-4.2: 909488k (909488k lastest)
gcc-3.4: 892015k (571084k lastest)
glibc: 891408k (891408k lastest)
linux-2.6: 725961k (744700k lastest)
kdebase: 720618k (1106804k lastest)
kdelibs: 716376k (1150372k lastest)
kdepim: 711858k (784356k lastest)
wouter@kiivi:~$ avg-pkg-build-time -t | head -n 20
gcc-snapshot: 220:52:14 (2 entries, sigma 27:38:21)
axiom: 216:27:57 (1 entry, sigma 00:00:00)
gcc-4.1: 193:47:30 (1 entry, sigma 00:00:00)
k3d: 177:22:10 (1 entry, sigma 00:00:00)
gcc-3.4: 171:40:28 (4 entries, sigma 25:10:55)
boost: 141:13:38 (1 entry, sigma 00:00:00)
koffice: 140:40:23 (2 entries, sigma 53:48:58)
ace: 130:04:14 (1 entry, sigma 00:00:00)
linux-2.6: 129:36:19 (3 entries, sigma 10:12:24)
qt-x11-free: 105:00:32 (1 entry, sigma 00:00:00)
xulrunner: 97:03:38 (1 entry, sigma 00:00:00)
iceape: 96:23:13 (2 entries, sigma 03:12:58)
kdepim: 96:06:54 (2 entries, sigma 19:09:39)
icedove: 94:35:48 (1 entry, sigma 00:00:00)
iceweasel: 93:56:26 (1 entry, sigma 00:00:00)
gcc-4.2: 91:08:32 (1 entry, sigma 00:00:00)
glibc: 91:01:58 (1 entry, sigma 00:00:00)
kdevelop3: 84:27:21 (2 entries, sigma 02:47:35)
kdeedu: 82:09:55 (3 entries, sigma 10:59:18)
vtk: 81:44:20 (3 entries, sigma 00:46:32)
wouter@kiivi:~$ logout
Connection to kiivi.cyber.ee closed.
wouter@country:~$ ssh arrakis
Linux arrakis 2.6.12-1-amiga #1 Sat Aug 13 22:59:59 CEST 2005 m68k

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please read "links2 http://nefud/info.txt"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
No mail.
Last login: Sat Jan 5 18:48:58 2008 from hahn.os.localnet on pts/5
wouter@arrakis:~$ avg-pkg-build-time -s -t | head -n 20
wxwidgets2.6: 1648020k (1648020k lastest)
qt4-x11: 1601086k (1753532k lastest)
gcj-4.1: 1135772k (1135772k lastest)
k3d: 1094345k (1159204k lastest)
kdebase: 1087375k (1087368k lastest)
python-kde3: 1018144k (1018144k lastest)
ghc6: 991524k (991524k lastest)
gcc-3.3: 988690k (990244k lastest)
glibc: 814536k (855720k lastest)
lyx: 804672k (804672k lastest)
bouml: 729888k (729888k lastest)
kdelibs: 685565k (1142304k lastest)
freedroidrpg: 673988k (673988k lastest)
mozilla: 663066k (636896k lastest)
ardour: 647580k (691128k lastest)
mozilla-thunderbird: 591384k (591384k lastest)
omniorb4: 576288k (576288k lastest)
subversion: 568764k (624768k lastest)
xorg-server: 527520k (527520k lastest)
mozilla-firebird: 498896k (502692k lastest)
wouter@arrakis:~$ avg-pkg-build-time -t | head -n 20
enlightenment: 198842:37:14 (3 entries, sigma 344403:17:35)
axiom: 164:43:54 (1 entry, sigma 00:00:00)
ghc6: 159:20:15 (1 entry, sigma 00:00:00)
k3d: 145:18:35 (3 entries, sigma 57:08:46)
python-kde3: 126:39:34 (1 entry, sigma 00:00:00)
qt4-x11: 121:26:11 (2 entries, sigma 03:07:35)
gcj-4.1: 116:05:34 (1 entry, sigma 00:00:00)
gcc-3.3: 108:45:19 (3 entries, sigma 02:01:50)
qt-x11-free: 99:12:17 (4 entries, sigma 14:12:32)
wxwindows2.4: 90:26:51 (1 entry, sigma 00:00:00)
openscenegraph: 74:37:21 (1 entry, sigma 00:00:00)
ardour: 63:44:55 (2 entries, sigma 07:23:05)
bouml: 62:01:16 (1 entry, sigma 00:00:00)
gcc-snapshot: 61:10:07 (1 entry, sigma 00:00:00)
mozilla-thunderbird: 60:47:29 (1 entry, sigma 00:00:00)
mico: 60:15:15 (1 entry, sigma 00:00:00)
kdebase: 58:41:04 (5 entries, sigma 08:00:17)
glibc: 58:04:26 (6 entries, sigma 40:01:12)
mozilla-firebird: 52:47:13 (2 entries, sigma 11:31:07)
zeroc-ice: 52:34:03 (2 entries, sigma 28:12:19)
wouter@arrakis:~$ logout
Connection to fremen-os.dyndns.org closed.
wouter@country:~$ exit
Script done on Sat Jan 5 22:28:46 2008
```
"avg-pkg-build-time" is a script that queries the statistics database. It's quite CPU-intensive, so I don't like to run it too often (certainly not from cron); but, hey, here you go.
In case you didn't know, "arrakis" and "kiivi" are both official m68k/unstable buildd hosts, and arrakis in particular has been one for quite a long time, although it's been reinstalled once or twice since. Oh, and yes, some bits of the data do appear to be invalid, but the general idea should be right.
(oh, and before you ask: no, it didn't take two and a half hours to run those commands; I had to leave while the last one was running, and only came back several hours later)
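The time output above lists, per package, an average, a number of entries, and a "sigma", which suggests a mean plus a standard deviation. The script's source isn't shown here, so what follows is only a hypothetical sketch of that arithmetic; the function names are made up. It assumes a sample standard deviation (dividing by n - 1), with sigma reported as 0 for a single entry. The enlightenment line on arrakis, whose sigma exceeds what a population standard deviation of three non-negative values with that mean could reach, seems to fit that assumption better than a divisor of n.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch of the arithmetic behind lines such as
 * "gcc-snapshot: 220:52:14 (2 entries, sigma 27:38:21)".
 * This is NOT the real avg-pkg-build-time script. */
struct build_stats {
    double mean;   /* average build time, in seconds */
    double sigma;  /* sample standard deviation (assumed n - 1), seconds */
};

/* Square root by Newton's method, so the sketch needs no -lm. */
static double newton_sqrt(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;   /* start at or above sqrt(x) */
    for (int i = 0; i < 100; i++)
        r = 0.5 * (r + x / r);
    return r;
}

struct build_stats summarise_builds(const double *secs, size_t n)
{
    struct build_stats s = {0.0, 0.0};
    if (n == 0)
        return s;
    for (size_t i = 0; i < n; i++)
        s.mean += secs[i];
    s.mean /= (double)n;
    if (n > 1) {                    /* a single entry keeps sigma at 0 */
        double sumsq = 0.0;
        for (size_t i = 0; i < n; i++)
            sumsq += (secs[i] - s.mean) * (secs[i] - s.mean);
        s.sigma = newton_sqrt(sumsq / (double)(n - 1));
    }
    return s;
}

/* Format seconds as HH:MM:SS; hours may exceed 24, as in "220:52:14". */
void format_hms(double secs, char *buf, size_t len)
{
    long t = (long)(secs + 0.5);    /* round to whole seconds */
    snprintf(buf, len, "%02ld:%02ld:%02ld", t / 3600, (t / 60) % 60, t % 60);
}
```

Running summarise_builds over a package's recorded build times and passing both fields through format_hms yields strings in the same shape as the output above.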
Broken C code: followup
I received a number of comments on my previous blog post, with people stating that my example code wasn't right, either.
For reference, here's my code snippet again:
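```c
#include <stdio.h>

void *f(void);

int main(void) {
    long *a;
    a = (long *)f();
    printf("%ld\n", *a);    /* %ld is the conversion for long */
    return 0;
}

void *f(void) {
    static long a = 84;     /* static: valid for the whole run, not on the stack */
    printf("%ld\n", a);
    return &a;
}
```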
First, let me make one thing perfectly clear: the "a" variable is not on the stack. It is a locally scoped variable, but it is also declared static, meaning that it remains valid for the entire run of the process; if it were on the stack, the variable would vanish as soon as the function returned. Yes, it's wrong to return a pointer to a variable on the stack, since there's no guarantee that the stack value will still be valid; but I'm not doing that.
Alternative implementations could have used a global variable, or could have malloc()'ed a pointer; but the assembly generated from my C code would differ from a global-variable version only in the label used for the variable's location, and from a malloc() version only in lacking the extra function call.
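For comparison, here is a small sketch of the three variants this paragraph mentions: the static local from the snippet above, a file-scope global, and a malloc()'ed object. The function names are made up for illustration. The genuinely broken stack-based version is left as a comment, since using its address after the function returns is undefined behaviour.

```c
#include <stdlib.h>

/* Variant 1 (as in the snippet): a static local. One object for the whole
 * run of the process; it does not live in the function's stack frame. */
long *f_static(void)
{
    static long a = 84;
    return &a;
}

/* Variant 2: a file-scope variable with the same lifetime; the generated
 * code differs mainly in the label used for the variable's location. */
static long global_a = 84;

long *f_global(void)
{
    return &global_a;
}

/* Variant 3: heap allocation. Costs an extra malloc() call, and the caller
 * must free() the result. Returns NULL if the allocation fails. */
long *f_malloc(void)
{
    long *p = malloc(sizeof *p);
    if (p != NULL)
        *p = 84;
    return p;
}

/* The actually broken variant, for contrast (do not do this): 'a' lives in
 * the stack frame, so the returned pointer dangles once the function returns.
 *
 *     long *f_stack(void) { long a = 84; return &a; }
 */
```

All three working variants return a pointer that stays valid after the call; only the malloc() one hands ownership to the caller.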
Second, no, using A0 as a return value for pointers is not a bug in the compiler, and has not been since 1990, when this interesting book entitled "System V ABI Motorola 68000 Processor Family Supplement" was written. Allow me to quote a snippet from page 3-14 of that book:
%a0 Pointer return values appear in %a0. When calling a function that returns a structure or union, the caller allocates space for the return value and sets %a0 to its address. A function that returns a structure or union value places the same address in %a0 before it returns.
To this day, ELF-conforming implementations do it this way. Really. Go look it up, if you wish; the ISBN is 0-13-877663-6.
Network booting
Those who keep track will know that I started writing support for NBD in debian-installer last weekend. But that's not what this post is about.
I've been setting up a laptop for my brother. He'd asked for "something about as fast as our parents' computer", which is a 600MHz PIII. After looking around on eBay for a while, I found a Dell Latitude L400, a 12" 700MHz PIII-based laptop, for a fairly low price, which seemed to fit that description, so I placed a bid. Only to figure out later that I should've read the description a bit more carefully, because this was an incomplete machine being sold "for parts": no RAM, no power adapter, no battery, no hard disk. Darn.
I got some compatible RAM and a power adapter from replacedirect (a shop I can really recommend, BTW: they have new batteries for almost every conceivable laptop model, including those that have been out of production for years), put the RAM in, connected the power adapter, and crossed my fingers. Luckily, the machine appears to work; it's not as if they took it apart because it had broken down, or some such.
I also still have a 2" hard disk to put in the machine, but apparently Dell uses some adapter that has to go between the disk and the rest of the machine, and that's missing; I guess I'll have to be a bit creative there. So, for the time being, I configured the machine to PXE boot, and set up something on the network to boot from. Since installer support for installing onto the network isn't finished yet, however, I had to do some things manually.
After making sure I could boot the machine (long live debootstrap), I set out to install additional packages. After downloading and installing a number of packages, however, the machine suddenly died. Unfortunately, rebooting didn't fix anything; the machine would die about halfway through the boot. Since it was quite, eh, "early" in the morning by then, I left it at that and went to bed.
Today, I easily figured out what the problem was: network-manager wants to take the network down and then run some scripts... which, on a machine whose root filesystem lives on the network, obviously breaks just about everything.
I guess that if I want to make partman-nbd work, I'll have to do some work there, too...
Dear world,
I'm a Debian developer; therefore, I have a GPG key.
Just because I do doesn't mean I want every random message sent to me to be encrypted.
Thank you for your attention.
On Sexism
Men are pigs.
Can we please move on now?
MySQL is a toy
Really.
```
mysql> select count(*) from $VERY_LARGE_TABLE
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql> select count(*) from $VERY_LARGE_TABLE
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    1
Current databse: $DATABASE

ERROR 2013 (HY000): Lost connection to MySQL server during query
```
Yes, I edited the output to hide table- and database-names. Other than that, this is exactly what happened.
I'll take PostgreSQL any day.
... in sickness and in health ...
That's how I'm passing this week.
What? No, I'm not getting married. It just feels like my body isn't quite sure whether it wants to be sick or not.
On Monday, I came down with influenza, mostly in the stomach area. Not only does that hurt and feel awfully bad, it also involves yesterday's lunch reversing direction halfway through the digestive system, and finding the wrong way out. Some people call that puking. Restless from doing nothing, I tried to do some work for Debian.
Now usually, when I get sick, I get really sick. For one day. The next day, I can work again. That doesn't mean I'm at my best, but at least I can handle some stuff. So on Tuesday, I went to work, since I had an appointment on Wednesday and still had to prepare some things for it.
I made the mistake, however, of not dressing warmer than usual. With my body still recovering, and with the extra bad weather in the evening, I caught a bad cold.
So on Wednesday, I stayed at home. Luckily for me, the customer had asked to postpone the appointment from Wednesday to Thursday, since he wasn't going to get everything ready in time otherwise. Me, I was still feeling the leftovers of the influenza, and now this cold was tearing me down.
But as before, after one day I felt much better, so I decided I'd just dress much warmer this time, go to that appointment, and make the best of it. I figured it couldn't be that bad.
Except that in a data center, the temperature is optimized for computers, not for sick human beings; and nobody cares about the noise, or about the fact that trying to yell above said noise isn't a very good idea if your voice is having issues because of some virus.
I guess the only nice thing that happened this week was that nice lady on the train who gave me a clean handkerchief when my own was... well... you don't want to know.
Tomorrow, I guess I'll just stay in bed. And on Saturday, and perhaps even Sunday, too. Get this week over with.
FOSDEM 2008 approaching...
In just over a month (1 month and 1 day, to be precise), it's FOSDEM again. The Debian Developers Room schedule is almost complete, and some interesting things are being published around it.
Such as the pre-FOSDEM beer event. In previous years, this always happened at "Le Roi d'Espagne", a bar on the Grand Place in Brussels. This was a nice place for many years, because it's easily found, and it has a backroom where quite a lot of people fit in. Unfortunately, "quite a lot" does not equal "a huge number", and as the years went by, the number of people going to the pre-FOSDEM beer event came ever closer to the latter. Last year, it finally became too crowded, and it was decided that the pre-FOSDEM beer event would change bars this year.
Which has now happened. This year, the pre-FOSDEM beer event will be at the Delirium Café, which is not on the Grand Place anymore, but just 100m away from it; the FOSDEM website has more information. A nice feature of the Delirium Café, besides the fact that it is much larger, is the huge number of different beers they have: over 2000 of them, which you can query online! And that's not even counting the soft drinks or waters or whatever, which they also have on their menu.
Another novelty this year is the streaming; the main tracks that will be held in Janson will be streamed out of the event, and the Debian video team is also planning to do some streaming. Meaning, even if you can't make it to FOSDEM, you'll still be able to follow some of it. You won't be able to follow everything, especially not the hallway track which at FOSDEM is a very important one; but at least you'll be able to sample some bits, which is nice.
Other than that, not much will change. There'll be a key signing party, organized by yours truly. There'll be Lightning talks, there'll be the Main track, and there'll be a number of Developer rooms, including the Debian one—which, occasionally, is also organized by yours truly.
This conference has always been my favourite (even though I haven't been to many conferences besides FOSDEM); I've never missed a single edition since the very first one back in 2001 (when it was still called OSDEM). And for a few years now, I've also been offering crashing space at our office for international geeks who don't mind roughing it a bit (it's an office: there's no shower, only floor space). This year, perhaps I haven't been spamming enough about that, but for now I only have two people who're interested in sleeping over (and they're not even sure); if you're going to FOSDEM and still need a place to sleep, you're (probably) welcome. Send me an email.
On the Debian front, things are going well, for the most part. The list of speakers for the Devroom is nearing completion, leaving me with little more to do than to start assigning times. We've been assigned a booth as well, but there I still need to figure out how to organize it, and what to do with it. There will of course be the regular UK T-shirts, and probably some other merchandise, but I also want something involving some kind of computer. We've had this d-i "babelbox" installation running a few times now, which was nice; but I do think we need some variation, too. The only problem is that I don't really know what that variation could be. Suggestions are welcome!
And so time goes on, and FOSDEM draws nearer. At some point it'll eventually happen, and then it will be over before one can say "blueberry pie". Sometimes I wonder whether all these months of preparation are actually worth it. But when FOSDEM is actually here, I'm sure I'll find that it is, indeed, worth it...
Adrian,
No, that is not the correct key combination. You were looking for start+r.
Samba v2
samba.grep.be.
This machine has been my trusty publicly accessible server for about four years now. It runs my website, including this blog, is my primary MX, and hosts my subversion repositories and my gallery. For slightly less than the first year of Planet Grep, it was the only machine running that site (it has since been migrated to two machines running in two different data centers, to cope with the ever-increasing bandwidth usage of that site). It runs the bacula-director for our backup system, and it contains the modem behind our fax number, allowing us both to receive faxes and to log in to our local network using PPP. Most importantly, I've been using it as an SSH jumphost and IRC box that allows me to connect to other machines whenever I'm behind an overly paranoid firewall.
But now it's time to retire the machine. The high number of services, combined with what is by today's standards a low amount of system resources, is beginning to be a problem. Several times already, I've had to reboot the machine because a load spike had made it unresponsive. So it has to go.
Many people will be shocked to find out that the machine powering samba up to now was an IBM SurePOS 500. No, really. The reasons are long and complex, but suffice it to say that there was a point in time where I wanted to set up a server for myself, and I had an idle, never-to-be-used-again machine that had cost over €2000 standing by. So I commandeered it as a server, and have given it near-100% uptime ever since. If you want to know just how near-100% that is:
```
  9 Power_On_Hours     0x0032   058   058   000    Old_age   Always   -   31170
[...]
 12 Power_Cycle_Count  0x0032   100   100   000    Old_age   Always   -   99
```
Or, in plain English: according to SMART, the hard disk has seen just 99 power cycles in four years.
This machine was never designed to do that. Side note: yes, that is a touch screen on top of that box.
The new machine is much, much better. We're going from this:
model name : Intel(R) Celeron(TM) CPU 1200MHz
to this:
model name : Dual-Core AMD Opteron(tm) Processor 1210
which is much, much better. The box actually containing this processor is also much, much better.
However, there was one problem: the old processor was a PentiumIV-class processor, requiring Debian's i386 port, whereas the new one (obviously) allows me to run the amd64 port, which I did intend to do. Additionally, I wanted to migrate from a simple "root-on-partitions" system to a "root-on-LVM" system, with Xen somewhere in between. As such, simply rsync'ing the entire hard disk of the old system over to the new system (my usual way to migrate a non-critical server) wasn't going to work.
Luckily, Debian isn't too hard to migrate from one machine to another.
- Installed Debian on the new machine, giving it the same hostname (samba) as the old one, but with root on LVM.
- Ran 'dpkg --get-selections' on the old machine, feeding the output to 'dpkg --set-selections' on the new one. Installed packages.
- Rsynced over /home
- Rsynced /etc over to a separate directory on the new box, and copied some files (such as fstab) from the live /etc on the new box into that separate directory.
- Rsynced /srv over
- Rsynced /var over to a separate LVM volume.
- Created a snapshot volume of that /var LVM volume, moved most of the live /var over to some place else, and added the original LVM volume (i.e., not the snapshot) to fstab. Obvious exceptions were such things as databases, which usually have an architecture-specific on-disk format, and the dpkg directory.
- Brought the original system to runlevel 1.
- Rsynced /home, /srv, and /var over again to get the last-minute changes in.
- Rsynced (with --delete) the separate /etc over the live /etc.
- Brought the original system down.
- Rebooted, and checked which services died because their on-disk format also differed between i386 and amd64. Those I utterly and completely killed, copied the files from the new system's original amd64 /var back again, and made a plain-text dump of the old data that I then imported into that service.
- Rinse, repeat until all those services work. Then, removed the /var snapshot (I didn't need that backup anymore by then), and voilà.
Or, well, that was the idea. Because I forgot about one silly detail: to make everything work out correctly, I would have had to keep UID numbers in sync. Unfortunately, I didn't, and as a result, a number of files were created with the wrong owner. This made all kinds of things fail horribly. Had I stopped to think about it, I would have copied the passwd file over from the old installation, added some extra lines for users that existed on only one of the two systems, and left it at that; everything would have been fixed. Instead, I started to change the ownership of files all over the filesystem, creating a mess of things.
Silly me.
But, well, in the end I did get everything to work correctly; even if it took me longer than expected. No worries.
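A check like the following, run before copying any files, would flag this kind of UID mismatch up front. It is a hypothetical sketch, not something used in the migration: it compares two passwd(5)-style listings (the old box's and the new box's), matching users by name, and prints those whose UID differs. It only looks at UIDs; the group file would need the same treatment for GIDs.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the name and UID from a passwd(5) line of the form
 * "name:passwd:uid:gid:gecos:home:shell". Returns 0 on success. */
static int parse_passwd_line(const char *line, char *name, size_t name_len,
                             long *uid)
{
    const char *c1 = strchr(line, ':');          /* end of the name field */
    if (c1 == NULL || (size_t)(c1 - line) >= name_len)
        return -1;
    const char *c2 = strchr(c1 + 1, ':');        /* end of the passwd field */
    if (c2 == NULL)
        return -1;
    char *end;
    long v = strtol(c2 + 1, &end, 10);
    if (end == c2 + 1 || *end != ':')            /* UID must be a number */
        return -1;
    memcpy(name, line, (size_t)(c1 - line));
    name[c1 - line] = '\0';
    *uid = v;
    return 0;
}

/* Print every user present in both listings with a different UID and
 * return how many there are; users found in only one listing are ignored. */
int report_uid_mismatches(const char *const *old_lines, size_t n_old,
                          const char *const *new_lines, size_t n_new)
{
    int mismatches = 0;
    for (size_t i = 0; i < n_old; i++) {
        char old_name[64];
        long old_uid;
        if (parse_passwd_line(old_lines[i], old_name, sizeof old_name,
                              &old_uid) != 0)
            continue;
        for (size_t j = 0; j < n_new; j++) {
            char new_name[64];
            long new_uid;
            if (parse_passwd_line(new_lines[j], new_name, sizeof new_name,
                                  &new_uid) != 0)
                continue;
            if (strcmp(old_name, new_name) == 0 && old_uid != new_uid) {
                printf("%s: old uid %ld, new uid %ld\n",
                       old_name, old_uid, new_uid);
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Fed the lines of both /etc/passwd files, it returns a non-zero count whenever files copied from the old box would end up with the wrong owner on the new one.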
White Zombie: Astro-Creep: 2000 -- Songs of Love, Destruction, and Other Synthetic Delusions of the Electric Head
I've had that record since late high school, over a decade ago, and loved it at the time. That was my metal period, when I would go for the heaviest metal I could find. I should still have a Sepultura album somewhere, a number of Metallica ones, and similar stuff.
I haven't put them on in years, though. These days, I much prefer lighter music, such as jazz, blues, and similar things. Not that I ever started to hate metal or something; it's just that it has moved from "things I love" to "things I used to love, but now just appreciate".
Today, I stumbled upon this White Zombie record, and put it on again. Even though it's been ages, and even though I don't dig metal as much as I used to, I still appreciate it. Astro-Creep: 2000 is just one hell of an album...
... even if I know a few people who would vehemently disagree. But that aside
Software RAID
Russell blogs about write intent bitmaps, an option in the Linux Software RAID subsystem that works somewhat like a journal at the RAID level: every time you write to the array, the regions you're going to write to are first marked as dirty, then written, and then marked as clean again. After a system crash, the RAID subsystem then only has to check the regions that are still marked dirty, where it would normally have to resynchronize the entire array.
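A toy model may help show why the bitmap shortens recovery. The sketch below is a deliberately simplified, hypothetical two-disk mirror with one dirty bit per region; it is not md's implementation (md keeps the bitmap on disk and clears bits lazily rather than right after each write). A write sets the region's bit, writes both disks, then clears the bit; a simulated crash between the two disk writes leaves the bit set, and recovery copies only the flagged regions instead of the whole array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy two-disk mirror with a write-intent bitmap: one bit per region.
 * Hypothetical and simplified; not how md actually stores or clears it. */
#define REGIONS     64
#define REGION_SIZE 16

struct mirror {
    unsigned char disk[2][REGIONS * REGION_SIZE];
    uint64_t dirty;             /* bit r set => region r may be out of sync */
};

/* Write up to REGION_SIZE bytes at the start of a region on both disks.
 * If crash_midway is non-zero, simulate a crash after disk 0 was written. */
void mirror_write(struct mirror *m, size_t region, const void *buf,
                  size_t len, int crash_midway)
{
    if (region >= REGIONS)
        return;
    if (len > REGION_SIZE)
        len = REGION_SIZE;
    m->dirty |= (uint64_t)1 << region;                    /* 1. mark dirty */
    memcpy(&m->disk[0][region * REGION_SIZE], buf, len);  /* 2. write disk 0 */
    if (crash_midway)
        return;                                           /* bit stays set */
    memcpy(&m->disk[1][region * REGION_SIZE], buf, len);  /*    and disk 1 */
    m->dirty &= ~((uint64_t)1 << region);                 /* 3. mark clean */
}

/* Recovery after a crash: copy disk 0 over disk 1 for dirty regions only.
 * Returns the number of regions copied; without a bitmap, all REGIONS
 * would have to be resynchronized. */
size_t mirror_resync(struct mirror *m)
{
    size_t copied = 0;
    for (size_t r = 0; r < REGIONS; r++) {
        if (m->dirty & ((uint64_t)1 << r)) {
            memcpy(&m->disk[1][r * REGION_SIZE],
                   &m->disk[0][r * REGION_SIZE], REGION_SIZE);
            m->dirty &= ~((uint64_t)1 << r);
            copied++;
        }
    }
    return copied;
}
```

The extra bitmap update before and after every write is where the continuous performance cost discussed below comes from; the payoff is that mirror_resync touches one region instead of sixty-four.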
He'd suggested this before on the debian-boot mailing list, and when I read that post, it seemed to make sense. However, after reading his blog post, I'm not so sure anymore. In his words:
The down-side to this feature is that it will slightly reduce performance. But when comparing the possibility of a few percent performance loss all the time and the possibility of a massive performance loss for an hour or two after a crash it seems that losing a few percent all the time is almost always the desired option.
I vehemently disagree there. Performance matters little if you have a large server park: in that case, adding another server to the park is relatively cheap (you don't run hundreds of servers on a small budget), and besides, in these days of virtualization, migrating a service from one physical server to another often isn't very hard.
However, this isn't true when you're talking about small businesses, or (especially) home servers. Given the choice between a large performance loss in a situation that happens only rarely (in my experience, the Software RAID subsystem is pretty good at recovering from a power loss without having to go through a RAID rebuild, which leaves only kernel crashes and similar events) and a small but continuous performance loss on my home server, there is no doubt in my mind that I will choose the former. First, my home server is a Thecus N2100, which, while powerful enough to run a number of services for my home network, is not a very fast system and has limited resources compared to many other systems; even a small loss of performance is probably noticeable. Second, the speed of recovery that the RAID subsystem uses (and, hence, its performance impact) can be tuned through the files /proc/sys/dev/raid/speed_limit_max and /proc/sys/dev/raid/speed_limit_min. Obviously, lowering the speed of the RAID rebuild will make the process take longer; but if performance matters that much to you, then lowering the rebuild speed can be an option. Finally, the RAID subsystem sometimes chooses to go through a lengthy check of the entire array; it would be interesting to know whether using the write intent bitmap feature avoids that, too. I suspect it does not, and if so, enabling this feature would cost some performance for little benefit: in normal situations these checks happen far more often than actual RAID rebuilds, so the most important source of performance loss would remain, while the bitmap adds a continuous loss of its own.
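The two rebuild-speed tunables each hold a single number; according to the kernel's md documentation, the unit is KiB/s, per device. A minimal sketch of reading and changing them from C follows. The helper names are hypothetical, writing requires root, and cat and echo achieve the same thing from a shell.

```c
#include <stdio.h>

/* Read a single integer from a file such as
 * /proc/sys/dev/raid/speed_limit_min (KiB/s per device).
 * Returns -1 if the file cannot be opened or parsed. */
long read_raid_speed_limit(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    long value;
    int ok = fscanf(fp, "%ld", &value) == 1;
    fclose(fp);
    return ok ? value : -1;
}

/* Write a new limit, e.g. to /proc/sys/dev/raid/speed_limit_max.
 * Needs root for the real /proc files. Returns 0 on success, -1 on error. */
int write_raid_speed_limit(const char *path, long kib_per_sec)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int ok = fprintf(fp, "%ld\n", kib_per_sec) > 0;
    if (fclose(fp) != 0)        /* the write may only fail at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

For instance, lowering speed_limit_max on a small home server trades a longer rebuild for a smaller impact on everything else while the rebuild runs.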
In closing, I guess the right answer is that it's a trade-off: the defaults should be chosen by upstream (to avoid confusion), and the user should somehow be given the option to enable or disable this feature at install time, overriding the defaults (perhaps only in d-i's expert mode).
FOSDEM schedule for the Debian Devroom
I spent a good few hours today creating a preliminary schedule for the Debian Developers' room at FOSDEM 2008. Just as I had finished it and sent it to the speakers to get their feedback before declaring it final, I received a reminder from the FOSDEM organizers that the schedule is due by the end of this week. My timing is impeccable.
Having said that, the schedule is full. If you wanted to do a talk at FOSDEM, you're probably too late by now. Try again next year.
Time to start focusing on the booth now, I guess...