WEBlog -- Wouter's Eclectic Blog

Sat, 05 May 2012

First-rate Linux support by Xerox

When buying hardware for a Linux system, we often need to hunt for its support status by searching for PCI or USB IDs in the kernel source, by looking up the printer model in the linuxprinting.org (now linuxfoundation.org) OpenPrinting database, or similar things. This is something I'm used to, and it's not at all unexpected anymore. After having done it a thousand times for several customers as well as for myself, it's become routine.

So in that light, I was pleasantly surprised last Monday, when I delivered and installed a Xerox WorkCentre 3220 at a customer, to see that Xerox supports Linux on the same level as Windows and MacOS: the "system requirements" part of the manual contains a section laying out the requirements for a computer running Linux, as does the "troubleshooting" section; and when there are Linux-specific bits to be said, there'll just be a Linux-specific section in the manual to tell you what to do. Also, the CD-ROM that came with the device uses the Rock Ridge extensions, which means that if you pop it into a Linux system you'll see an installer for CUPS and SANE drivers.

The only criticism I have is that it's an installer, and not an LSB package or some such. But hey, for once I didn't have to fight to get a printer to work properly!

Thu, 26 Apr 2012

Switching to duckduckgo

In the late 90s, google became popular for one reason: because they had a no-nonsense frontpage that loaded quickly and didn't try to play with your mind. Well, at least that was my motivation for switching. The fact that they were using a revolutionary new search algorithm which changed the way you search the web had nothing to do with it, but was a nice extra.

Over the years, that small hand-written frontpage has morphed into something else. A behind-the-scenes look at the page shows that it's no longer the hand-written simple form of old, but something horrible that went through a minifier (read: obfuscator). Even so, a quick check against the Internet Wayback machine shows that the size of that page has increased twenty-fold, which is a lot. But I could live with that, since at least it looked superficially similar.

Recently, however, they've changed their frontpage so that search-as-you-type is enabled by default. Switching that off requires you to log in. So, you have a choice between giving up your privacy by logging in before you enter a search term, or having everything you type, including any typos and stuff you may not have confirmed yet, be sent over to a data center god knows where. Additionally, at the first character you type, the front page switches away to the results page, causing me to go "uh?!?" as I try to find where they moved my cursor to. This is annoying.

Duckduckgo doesn't do these things; and since they also don't do things like combining my typing skills, phone contact list, calendar, and chat history to figure out that I might be interested in a date, I'm a lot more comfortable using them.

So a few days ago, I decided to switch my default search engine in chromium to duckduckgo. It still feels a bit weird, to be using a browser written by one search engine to search something on another; but all in all, it's been a positive experience. And the fact that wikipedia results are shown first, followed by (maybe) one ad, followed by other search results, is refreshing.

We'll see how far this gets us.

Wed, 18 Apr 2012

Screen scraping sucks

At a customer, I recently migrated a number of manually-maintained servers to being maintained through puppet. Since then, some more machines have been added, and getting them up and running properly was a breeze: do a base install, install puppet, sign the certificate, restart puppet, and then wait and twiddle thumbs while puppet does its magic. Easy as pie.

Now, a few months later, we needed to install a number of Windows machines for a lab (not my choice), and the person involved asked me to find some disk space so we could start creating images for those.

Not a chance.

Instead, I suggested looking for a configuration management system, similar to puppet. Since we're using Samba 3 to run the Windows network here, dropping everything in Active Directory was not an option. But a short while later, he came back with the note that puppet, in its 2.7 version, actually does support Windows as a platform for the managed machines.

Interesting.

The unfortunate bit was that puppet supports creating files and installing software when it is distributed as an MSI file, but not when it's distributed as a .exe file. This is not unexpected: MSI files can be installed noninteractively, but when something is distributed as a .exe file, it usually means it needs to be installed interactively, and puppet does not have the ability to interact with GUI software.

The workaround: use something that does have that ability (in my case, autoit), and use an exec block in puppet to make it call those scripts. In effect, that's a bit like screenscraping. Add a creates stanza to the block, so that the installer isn't started again if the software at hand has already been installed. This 'autoit' thing also comes with a recording utility, allowing one to create an initial script by just doing the installation, and having the tool just record stuff.

With that, the machines are installed 99% automated. I say 99%, because there are still some issues:

I'll have to think about this some more, I guess. First, it's clear that while puppet does have some Windows functionality, it's not entirely ready yet. And somehow, using autoit to add to Puppet functionality feels like an ugly hack.

We'll see what the future brings.

Sat, 14 Apr 2012

LOAD: tutorial on Debian Packaging

Two weeks ago, I was at LOAD, where I did a tutorial on Debian Packaging. Unfortunately, I wasn't doing very well physically, and as a result my preparation wasn't what I had hoped for. It ended up being not so much a tutorial as a walk-through of packaging a piece of software (in casu NBD) with debhelper in dh mode. Not very difficult, but since there's nothing for people to fall back on afterwards, it may help if they have a write-up of what I said during the session; and I promised to put it on my blog. So, here goes. However, note that the canonical written tutorial on Debian Packaging is here.

With apologies to readers of Planet Debian, for most of whom all this is probably old hat.

Unlike RPM packaging, in Debian, the data that tells the debian packaging system how to package a piece of software is functionally split among a number of files. All these files go in a toplevel directory in the source package, (unsurprisingly) called 'debian'.

There are four files that are required for every Debian package: the control, changelog, copyright, and rules files. Without these files, dpkg-dev will fail to produce a package—any package.

The first, the control file, contains metadata on packages: the name of the package, its description, the dependencies, etc. Basically, almost all the information you see when you run 'apt-cache show <package>' (with one exception) is in the control file. If you have a source package that builds multiple binary packages, then you should have multiple 'Package' stanzas in the control file.

In my opinion, creating a control file is easiest done by copying a control file from another (similar?) package, and modifying it to suit the software you're dealing with.

The one exception, the one bit of metadata that is not found in the control file is the version number: this data is contained in the changelog file. This file looks like a free-form file for the most part, but it really is a machine-parsable format; as such, it's best to edit it with specialized tools, such as the debian-changelog-mode in the emacs editor, or the debchange script, also available as dch, from the devscripts package. In the changelog, you should document any changes you make to the package, making sure the version number (top line), distribution (unstable, experimental, stable, ...; top line), urgency (top line; used mainly when testing needs to be updated urgently for things like security updates), author (bottom line) and date and time (also bottom line) are correct.
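To make the "machine-parsable" claim a bit more concrete, here is a minimal sketch that writes a made-up changelog entry into a scratch directory and pulls the fields out of the top line with sed. The package name foo, the version 1.0-1 and the maintainer are hypothetical. In practice you would let dch write the entry and dpkg-parsechangelog (from dpkg-dev) read it; the sed expressions are only there to show the structure.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)            # scratch directory; remove it when done
mkdir "$work/debian"

# A hypothetical entry: package, version, distribution and urgency on the
# top line; author and date on the bottom line.
cat > "$work/debian/changelog" <<'EOF'
foo (1.0-1) unstable; urgency=low

  * Initial release.

 -- Jane Doe <jane@example.org>  Sat, 14 Apr 2012 12:00:00 +0200
EOF

# Extract the fields of the top line by position.
package=$(sed -n '1s/^\([^ ]*\) .*/\1/p' "$work/debian/changelog")
version=$(sed -n '1s/^[^ ]* (\([^)]*\)).*/\1/p' "$work/debian/changelog")
distribution=$(sed -n '1s/^[^)]*) \([^;]*\);.*/\1/p' "$work/debian/changelog")

echo "$package $version $distribution"
```

Because the layout is this strict (two spaces before the bullet, " -- " before the author, two spaces before the date), hand-editing is easy to get subtly wrong, which is the reason for preferring the specialized tools.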

The copyright file, unsurprisingly, should contain the copyright statements of the original software, and the license (or a reference to a copy of the same license text under /usr/share/common-licenses). This file is still free-format in most packages today, although a machine-readable format for this file has recently been defined. Of the four required files, this is probably the most boring one, but hey, we can't like everything we do.

The last file, the rules file, is where all the action happens. This file is defined as a Makefile, of which the targets are called for various parts of the build system.

The rules file has a number of required targets (such as build, clean, and others), but if you're using debhelper in dh mode, you don't need to worry about those; instead, you can use the following simple rules file:

#!/usr/bin/make -f

%:
	dh $@

With the first line, the shebang, we make clear that this is a makefile, and that it should be called by make. The next is a generic target (the % is a wildcard for make), with just one command: 'call dh with the name of the target being called'. Since dh implements all required rules targets, that immediately gets you a working package. Go ahead, try! Run 'dpkg-buildpackage -rfakeroot', and see what happens.

Did that work? Maybe, maybe not. First, you'll have seen loads of warnings about something involving a "compatibility level". This is from debhelper; this compatibility level, or 'API level' if you wish, allows debhelper to move forward and make incompatible changes without breaking all the existing packages out there; whenever the compat level is bumped, some new functionality will only be made available if you also raise the compat level in your source package (and then you'll know you may have some changes to make in your package). Since the original debhelper, over a decade ago, did not have a compat level, the absence of a compat level signals compat level 1, which is far outdated now, and about to be unsupported. Hence the warning. The fix is simple: create a file debian/compat containing a single number: the compatibility level you're working with. It's best to use the most recent level which debhelper supports when you create your package, which as of this writing is level 9.
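As a small sketch of that fix: the commands below create the compat file in a scratch directory that stands in for the top of your source tree (the scratch directory is an assumption of the example; in a real package you would run the echo from the directory that contains debian/).

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)                 # stands in for the top of the source tree
mkdir "$work/debian"

# The file holds nothing but the compatibility level.
echo 9 > "$work/debian/compat"

cat "$work/debian/compat"
```

If you declare level 9 here, it makes sense to also require a debhelper that knows about it, by listing "debhelper (>= 9)" in the Build-Depends field of the control file.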

Ignoring the compatibility level, if you're building a software package which uses a well-established build system, and you only build one binary package out of that, chances are pretty high that everything worked as expected. If not, you'll have more work.

Building multiple packages requires that you tell debhelper somehow which file goes in which package. You can do this with a file debian/package.install, where package is the name of the binary package you're building.

The install file is read by dh_install, one of many tools in the debhelper suite. You see, in the old days (before debhelper 7), debhelper was just a suite of tools, which required that you wrote a debian/rules file calling all the individual tools in the correct order. This mode is still supported; in fact, even if you use dh you're still relying on it, since all dh really does is call the right tools in the right order; the real work is done by the individual tools. They all have their own man page; to understand how dh_install chooses which file to put in which package, go read man dh_install. Go ahead, do it now; I'll wait.

Back? Good. You've now learned that dh_install can install files either from the source directory (useful for packages containing only scripts or so), or from a directory debian/tmp. If you use an autotools-based software package and build more than one binary package, dh will install the software into debian/tmp for you.

When you called dpkg-buildpackage above, you may also have noticed that the output contained many lines starting with dh_, one of which said dh_install. As you may have guessed, dh echoes every debhelper command just before executing it. This allows you to look at the output, and see what's happening. For more detail, set the environment variable DH_VERBOSE to a non-zero value.
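For illustration, here is a minimal sketch of what such install files could look like for a source package that builds two binary packages. The package names foo and libfoo1 and the file patterns are hypothetical, and the files are written to a scratch directory rather than a real source tree. Each line is a pattern relative to debian/tmp (or the source directory); with only a pattern on the line, dh_install puts the file in the same directory inside the package.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)                 # stands in for the top of the source tree
mkdir "$work/debian"

# foo gets the programs and their man pages.
cat > "$work/debian/foo.install" <<'EOF'
usr/bin/*
usr/share/man/man1/*
EOF

# libfoo1 gets the shared library. The wildcard directory is an assumption:
# at compat level 9, libraries usually end up in a multiarch subdirectory.
cat > "$work/debian/libfoo1.install" <<'EOF'
usr/lib/*/libfoo.so.*
EOF

ls "$work/debian"
```

Both packages also need their own 'Package' stanza in the control file; the install files only decide where the built files go.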

I'm sure you'll have seen one or two debhelper tools that could make your package better. Go ahead, go and read their man pages to see what they do, and how they do it. Most of these will require you to create a file debian/package.toolname to specify details.

In some cases, you may need to specify command-line arguments to the tool to get it to do what you want. In yet other cases, the tool won't support what you need it to do, and you'll have to do something manual. What to do now? The dh command line doesn't support adding extra commands easily. Does this mean you'd have to revert to old-style long, non-dh debian/rules files?

Luckily, no. You can create an override target. If dh detects that you have such a target for a particular tool, it will call that target instead of the tool. This target's rules can then call the tool in question (or not), and add any command line arguments, or extra commands, as needed.

An override target is a normal Makefile target with a name of the form override_dh_something, where dh_something is the name of the tool you wish to override.
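A minimal sketch of such a rules file, written out from the shell into a scratch directory so the example is self-contained. The --enable-foo flag is hypothetical; everything after the "--" is handed by dh_auto_configure to the package's configure script.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)                 # stands in for the top of the source tree
mkdir "$work/debian"

# Note: the recipe lines inside the here-document must start with a tab.
cat > "$work/debian/rules" <<'EOF'
#!/usr/bin/make -f

%:
	dh $@

# dh calls this target instead of running dh_auto_configure itself.
override_dh_auto_configure:
	dh_auto_configure -- --enable-foo
EOF

chmod +x "$work/debian/rules"
cat "$work/debian/rules"
```

Leaving the recipe of an override target empty makes dh skip that tool altogether, which covers the "(or not)" case.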

At this point, I'd reached almost the end of my two allotted hours, and a member of the audience asked how to make dpkg deal with configuration files.

The answer is, you don't need to do anything! Not if you use debhelper, anyway; all you need to do is install the file in /etc. Since debian policy specifies that all files in /etc need to be configuration files and that no conffiles may be placed anywhere but in /etc, debhelper will automatically mark any file installed in /etc as a conffile, which will cause dpkg to ask the user what to do with changes to such files.

Note that in Debian parlance, a conffile is not the same thing as a configuration file: a conffile is a configuration file that is part of the binary package (i.e., if you call 'dpkg --contents' on the .deb file, you'll see it in the output), whereas a configuration file is any file that holds configuration, including files generated during or after installation of the .deb file on a system.

Incidentally, this is also why some differences to configuration files are managed through debconf, while others aren't: the ones managed through debconf are non-conffile configuration files managed through ucf, whereas the others are conffiles.

So in the simplest of cases, if you want to install a configuration file and you want to make sure it's protected against accidental overwrites by package upgrades, all you need to do is make sure it's installed to /etc; debhelper will do the rest.

And that concludes this introduction. Please note it's only an introduction, not a full-blown tutorial; while this will allow you to get started, you may have to learn a bit more if you wish to eventually upload a package to Debian.

Fri, 13 Apr 2012

On PHP

This dude nails it. Well, almost—can't say I agree with the python bit. But other than that, yeah, pretty much what's wrong with PHP.

(Add that to what google seems to consider my most popular bit of code, ever, and, well, hmpf).

Thu, 29 Mar 2012

Apple hates quality

I own a DAS Ultimate S keyboard, and am very happy with it. Yes, the price tag is pretty high, especially when compared to those cheap keyboards you'll find in every bakery shop these days; but the difference in quality isn't something to be ignored.

A while back, I got three PowerMac G4 machines from someone, so I could work on them for Debian. They'd been gathering dust for a while, since I'd been busy with other things, but this week I had a look at them.

One of the three was incomplete; it lacked a video card. The other two, however, were quite usable. They came with MacOS 10.4.something, and one of them had a 40GB hard disk, while the other had an 80GB one. The disks were quite noisy, but I've already replaced them with a CF-IDE adapter, so now they're running off some solid-state storage, which should be at least somewhat faster, if not entirely efficient.

However, installing a quality operating system on them proved to be somewhat harder than expected.

When I connect the DAS keyboard to the PowerMac, put the Debian CD in the drive, and reboot the machine, it will eject the CD rather than trying to boot off of it.

When I disconnect the keyboard, it will boot off the CD fine, but since OpenFirmware apparently doesn't support USB hotplugging, that means yaboot will sit there forever waiting for me to enter "install" and hit enter.

When I replace the high-quality, 130-euro DAS keyboard with a cheap crappy "I don't know what a proper ISO layout looks like" 5-euro Logitech "keyboard", suddenly everything works as it should.

Except that my hands hurt from trying to type on that thing, of course.

I swear, Apple hates quality.

Tue, 27 Mar 2012

Bjorn Monnens on Debian

If this were twitter (and if I had a twitter account, which I don't), this post would start with "RT". As it is, you'll have to make do with this link: Bjorn Monnens on why he chose Debian testing over Fedora.

This weekend I switched again to Debian 6 testing as this is also Gnome 3. I had to do the same stuff as for Fedora but I noticed already that it’s much more stable and it is also much faster than Fedora. Boot is really a difference of +30 seconds.

Go team!

(with apologies to those who read this blog through Planet Grep, as Bjorn is there too)

Wed, 15 Feb 2012

On documentation

If you're going to write documentation, then make sure it means something. So that if I find some term in your UI that I don't understand, and I decide to look up the term in the documentation, I get something more than just the fact that the term exists and that I can switch the feature, whatever it does, on or off.

Because that doesn't help me squat, thank you very much.

Wed, 28 Dec 2011

Rsync'ing over a newer subversion (fsfs) repository

So there are two servers; let's call them srv1 and srv2. Srv1 contains a bunch of subversion repositories, but these are to be migrated to srv2. Since the repositories are not (just) used for ASCII-only files, they're fairly big (several tens of gigabytes, altogether), so copying them from one server to the other would take a while. In order to make sure this would happen quickly, we had already copied them over to the new server, so that the final switch would be quick (an rsync that would copy over just the newly committed transactions).

That final switch was today. Only I didn't know that instead of just testing, the customer had already started using one of the repositories (and they'd forgotten to remind me), so the subversion repository suddenly jumped backwards in time. In addition, the new server wasn't being backed up yet (at least not for the subversion bits), so restoring from backups wasn't an option. Oops.

Luckily the solution is fairly simple. You see, fsfs stores each revision in a unique file; that means that as long as nobody has committed to the repository yet (which they couldn't, since system users aren't the same on both servers, so the webserver didn't have write permissions on the repository after the rsync), nothing is lost. One only needs to manually change the repository so that whatever subversion thinks is the latest commit, actually is the latest commit.

That information is stored in a file called db/current inside the repository. What's in that file depends on the repository version, which is stored in a file called db/format in the repository. For versions 1 and 2, the format is a single line with three space-separated values, of which the first is the last revision number used in the repository. The other two are counters that are used to give transactions unique names; and they, too, need to be up-to-date. For version 3 and above, the file contains only the revision number; there, the other two are derived from that instead of having their own unique number.

Figuring out the last used revision number in an fsfs repository is ridiculously easy:

ls -v db/revs|tail -n1

So if you've got a repository of fsfs version 3 or above, just change the revision number in the db/current file (after taking backups and making sure nobody can access the repository while you're doing this, of course), and you're all set.
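The procedure can be sketched as follows. This runs against a scratch directory that only mimics the files described above, not against a real repository, and it assumes the flat layout where every revision is a file directly under db/revs. As far as I know, repositories created by newer Subversion versions split db/revs into numbered subdirectories, in which case listing db/revs shows those directories rather than revisions and this sketch does not apply as-is.

```shell
#!/bin/sh
set -eu

# Scratch tree that mimics the relevant files; all values are made up.
repo=$(mktemp -d)
mkdir -p "$repo/db/revs"
echo 3 > "$repo/db/format"                 # pretend fsfs format 3
for rev in 0 1 2 9 10 11; do
    : > "$repo/db/revs/$rev"               # one (empty) file per revision
done
echo 9 > "$repo/db/current"                # stale value left by the rsync

format=$(head -n1 "$repo/db/format")
# Numeric sort, same result as 'ls -v' for plain revision numbers.
latest=$(ls "$repo/db/revs" | sort -n | tail -n1)

if [ "$format" -ge 3 ]; then
    cp "$repo/db/current" "$repo/db/current.bak"   # keep a backup first
    echo "$latest" > "$repo/db/current"
else
    echo "format $format: db/current has three fields, not touching it" >&2
fi

cat "$repo/db/current"
```

The format check is there because of the difference described above: for formats 1 and 2 the other two values on the line would have to be corrected as well.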

Unfortunately, in my case, the repository was still in fsfs version 2, which meant I could not change just the revision number and not expect trouble. I suppose it should've been possible to figure out what the last transaction numbers are, somehow, so that I could fix the current file completely, but I reasoned that upgrading to a newer repository format might have other advantages too, so I just dumped the repository and reloaded it, and everything worked at that point.

Fri, 23 Dec 2011

Static program analysis with LLVM and clang

"Static program analysis" is a technique whereby a program is verified for errors without actually running it. Finding bugs manually with a debugger and one's brain is tedious, so every shortcut that can help you avoid having to do so is great.

There are some commercial tools available to do such analysis, some of which are rather expensive; but there are also some open source tools available to do similar things. One of these is built into clang, the C compiler of the LLVM project.

Using it is fairly simple. Instead of compiling something with 'make', compile it with 'scan-build make'. This will set the CC (and similar) environment variable(s) so that before the compiler is run, the clang static-analysis checker is run over the very same source code. The output of this checker is an HTML file with your source, but with comments added to explain the bugs which the tool found.

What does that mean? Well, let's look at an example, shall we?

The iframe above contains one out of three reports produced by a scan-build run over the nbd source code (if there's no iframe, someone scrubbed some HTML in your RSS reader. Just follow the link just above instead). The other two are 'dead assignments', which might mean that I'm currently depending on some undefined behaviour (which would be bad), or it might mean that I'm being overly cautious (which makes my code future-proof, which would be good), or it might mean something else—I still need to investigate. But this one is pretty interesting.

In the example, there are eight, numbered, comments in the source code. The first seven show the code path which scan-build took through my code before getting at the eighth comment; and the eighth is where things go bad. In this case, when going through the function as shown, we have a NULL pointer dereference.

When looking at the scan-build output, it's important to realize a few things. First, the code path shown may be just one of a number of possible code paths. For instance, the null pointer dereference would still happen if the phase function parameter did not contain the NEG_INIT bit while the client pointer was set to NULL. However, clang does not show these other code paths; this is presumably an optimization ("if we've already shown that this kind of bug is possible at a particular location through one code path, don't bother recording future instances of that very same bug at the exact same location through another code path"). This means that sometimes, some of the branches shown may be completely irrelevant to the bug at hand. In this particular case, in fact, it's possible to show the bug with just the eighth comment; the first seven are totally irrelevant.

Second, the fact that the clang static analyser found a bug does not mean that it's possible to crash the application. Yet. In this particular case, the negotiate function will never be called with the NEG_INIT or NEG_MODERN bits not set, and with the client parameter set to NULL. That's an implicit assertion; there are a few ways in which this function may be called, but the client parameter may only be NULL if NEG_INIT and NEG_MODERN are both set at the same time.

Since nbd-server doesn't currently call the negotiate function in that way, it is not currently possible to crash the server by exploiting this bug. But that doesn't mean it won't ever be possible, nor that there isn't a bug in the code. We may assume that the above rules are true, but we never check them. Adding an assertion to that effect should make sure that no future change to the code will accidentally introduce that error and cause a NULL pointer dereference.

Is this a silly and useless precautionary measure? Not really. Usually, bugs happen in code not because someone wasn't thinking straight, but because there's so much going on inside a piece of software that it can be too much for any programmer to remember. If a function assumes that its parameters are within a given subset of all possible states, but does not check that this is in fact true, then when (not if) some future change incorrectly introduces a state that is outside of the assumed states, things will break. And that's Bad(TM).