Faster tar

I have a new laptop. The new one is a Dell Latitude 5521, whereas the old one was a Dell Latitude 5590.

As both the old and the new laptops are owned by the people who pay my paycheck, I'm supposed to copy all my data off the old laptop and then return it to the IT department.

A simple way of doing this (and what I'd usually use) is to just rsync the home directory (and other relevant locations) to the new machine. However, for various reasons I didn't want to do that this time around; for one, my home directory on the old laptop is a bit of a mess, and a new laptop is an ideal moment in time to clean that up. If I were to just rsync over the new home directory, then, well.

So instead, I'm creating a tar ball. The first attempt was quite slow:

tar cvpzf wouter@new-laptop:old-laptop.tar.gz /home /var /etc

The problem here is that the default compression algorithm, gzip, is quite slow, especially if you use the default non-parallel implementation.

So we tried something else:

tar cvpf wouter@new-laptop:old-laptop.tar.gz -Ipigz /home /var /etc

Better, but not quite great yet. The old laptop now has bursts of maxing out CPU, but it doesn't even come close to maxing out the gigabit network cable between the two.

Tar can compress to the LZ4 algorithm. That algorithm doesn't compress very well, but it's the best algorithm if "speed" is the most important consideration. So I could do that:

tar cvpf wouter@new-laptop:old-laptop.tar.gz -Ilz4 /home /var /etc

The trouble with that, however, is that the tarball will then be quite big.

So why not use the CPU power of the new laptop?

tar cvpf - /home /var /etc | ssh new-laptop "pigz > old-laptop.tar.gz"

Yeah, that's much faster. Except, now the network speed becomes the limiting factor. We can do better.

tar cvpf - -Ilz4 /home /var /etc | ssh new-laptop "lz4 -d | pigz > old-laptop.tar.gz"

This uses about 70% of the link speed, just over one core on the old laptop, and 60% of CPU time on the new laptop.

After also adding a bit of --exclude="*cache*", to avoid files we don't care about, things go quite quickly now: somewhere between 200 and 250G (uncompressed) was transferred into a 74G file, in 20 minutes. My first attempt hadn't even done 10G after an hour!

Posted