I have a home server.

Initially conceived and sized so I could digitize my (rather sizeable) DVD collection, I started using it for other things; I added a few play VMs on it, started using it as a destination for the deja-dup-based backups of my laptop and the time machine-based ones of the various macs in the house, and used it as the primary location of all the photos I've taken with my cameras over the years (currently taking up somewhere around 500G) as well as those that were taking at our wedding (another 100G). To add to that, I've copied the data that my wife had on various older laptops and external hard drives onto this home server as well, so that we don't lose the data should something happen to one or more of these bits of older hardware.

Needless to say, the server was running full, so a few months ago I replaced the 4x2T hard drives that I originally put in the server with 4x6T ones, and there was much rejoicing.

But then I started considering what I was doing. Originally, the intent was for the server to contain DVD rips of my collection; if I were to lose the server, I could always re-rip the collection and recover that way (unless something happened that caused me to lose both at the same time, of course, but I consider that sufficiently unlikely that I don't want to worry about it). Much of the new data on the server, however, cannot be recovered like that; if the server dies, I lose my photos forever, with no way of recovering them. Obviously that can't be okay.

So I started looking at options to create backups of my data, preferably in ways that make it easily doable for me to automate the backups -- because backups that have to be initiated are backups that will be forgotten, and backups that are forgotten are backups that don't exist. So let's not try that.

When I was still self-employed in Belgium and running a consultancy business, I sold a number of lower-end tape libraries for which I then configured bacula, and I preferred a solution that would be similar to that without costing an arm and a leg. I did have a look at a few second-hand tape libraries, but even second hand these are still way outside what I can budget for this kind of thing, so that was out too.

After looking at a few solutions that seemed very hackish and would require quite a bit of handholding (which I don't think is a good idea), I remembered that a few years ago, I had a look at the Amazon Storage Gateway for a customer. This gateway provides a virtual tape library with 10 drives and 3200 slots (half of which are import/export slots) over iSCSI. The idea is that you install the VM on a local machine, you connect it to your Amazon account, you connect your backup software to it over iSCSI, and then it syncs the data that you write to Amazon S3, with the ability to archive data to S3 Glacier or S3 Glacier Deep Archive. I didn't end up using it at the time because it required a VMWare virtualization infrastructure (which I'm not interested in), but I found out that these days, they also provide VM images for Linux KVM-based virtual machines (amongst others), so that changes things significantly.

After making a few calculations, I figured out that for the amount of data that I would need to back up, I would require a monthly budget of somewhere between 10 and 20 USD if the bulk of the data would be on S3 Glacier Deep Archive. This is well within my means, so I gave it a try.

The VM's technical requirements state that you need to assign four vCPUs and 16GiB of RAM, which just so happens to be the exact amount of RAM and CPU that my physical home server has. Obviously we can't do that. I tried getting away with 4GiB and 2 vCPUs, but that didn't work; the backup failed out after about 500G out of 2T had been written, due to the VM running out of resources. On the VM's console I found complaints that it required more memory, and I saw it mention something in the vicinity of 7GiB instead, so I decided to try again, this time with 8GiB of RAM rather than 4. This worked, and the backup was successful.

As far as bacula is concerned, the tape library is just a (very big...) normal tape library, and I got data throughput of about 30M/s while the VM's upload buffer hadn't run full yet, with things slowing down to pretty much my Internet line speed when it had. With those speeds, Bacula finished the backup successfully in "1 day 6 hours 43 mins 45 secs", although the storage gateway was still uploading things to S3 Glacier for a few hours after that.

All in all, this seems like a viable backup solution for large(r) amounts of data, although I haven't yet tried to perform a restore.