In my previous post, I explained how I recently set up backups for my home server to be synced using Amazon's services. I received a (correct) comment on that by Iustin Pop which pointed out that while it is reasonably cheap to upload data into Amazon's offering, the reverse -- extracting data -- is not as cheap.
He is right, in that extracting data from S3 Glacier Deep Archive costs over an order of magnitude more than it costs to store it there on a monthly basis -- in my case, I expect to have to pay somewhere in the vicinity of 300-400 USD for a full restore. However, I do not consider this to be a major problem, as these backups only exist to cover the rarer of the two types of backup cases.
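As a rough illustration of that asymmetry (the figures below assume a backup set of about 4 TB and approximate AWS list prices, so treat them as ballpark numbers only and check the current pricing pages):

```sh
# Illustrative only: ~4 TB (4096 GB) in S3 Glacier Deep Archive.
echo "storage per month: $(echo '4096 * 0.00099' | bc) USD"   # roughly 4 USD/month
echo "bulk retrieval:    $(echo '4096 * 0.0025'  | bc) USD"   # roughly 10 USD
echo "transfer out:      $(echo '4096 * 0.09'    | bc) USD"   # roughly 369 USD
# A full restore therefore costs on the order of a hundred months of storage.
```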
There are two reasons why you should have backups.
The first is the most common one: "oops, I shouldn't have deleted that file". This happens reasonably often; people will occasionally delete or edit a file that they did not mean to, and then they will want to recover their data. At my first job, a significant part of my job was to handle recovery requests from users who had accidentally deleted a file that they still needed.
Ideally, backups to handle this type of situation are easily accessible to end users, and are performed reasonably frequently. A system that automatically creates and deletes filesystem snapshots (such as the zfsnap script for ZFS snapshots, which I use on my server) works well. The crucial bit here is to ensure that it is easier to copy an older version of a file than it is to start again from scratch -- if a user must file a support request that may or may not be answered within a day or so, it is likely they will not do so for a file they were working on for only half a day, which means they lose half a day of work in such a case. If, on the other hand, they can just go into the snapshots directory themselves and it takes them all of two minutes to copy their file, then they will also do that for files they only created half an hour ago, so they don't even lose half an hour of work and can get right back to it. This means that backup strategies to mitigate the "oops I lost a file" case ideally do not involve off-site file storage, and instead are performed online.
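For illustration, a setup along these lines is what I mean (a minimal sketch: dataset names and the snapshot name format are made up, and exact zfSnap flags differ between versions, so check its manual):

```sh
# Hypothetical crontab entries: hourly snapshots that expire after a week,
# plus a daily cleanup run that removes expired snapshots.
0 * * * *  /usr/local/sbin/zfSnap -a 1w -r tank/home
0 3 * * *  /usr/local/sbin/zfSnap -d

# A user restoring yesterday's version of a file themselves, no support ticket needed:
cp /tank/home/.zfs/snapshot/2024-06-01_10.00.00--1w/alice/report.odt ~/report.odt
```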
The second case is the much rarer one, but (when required) has the much bigger impact: "oops the building burned down". Variants of this can involve things like lightning strikes, thieves, earthquakes, and the like; in all cases, the point is that you want to be able to recover all your files, even if every piece of equipment you own is no longer usable.
That being the case, you will first need to replace that equipment, which is not going to be cheap, and it is also not going to be an overnight thing. In order to still be useful after you have lost all your equipment, these backups must be stored off-site, and should preferably be offline backups, too. Since replacing your equipment is going to cost you time and money, it's fine if restoring the backups takes a while -- you can't really restore from backup any time soon anyway. And since you won't be able to create new content during the days when you can only fall back on your off-site backups, it's also fine if you lose a few days of content that you will have to re-create.
All in all, the two types of backups have opposing requirements: "oops I lost a file" backups should be performed often and should be easily available; "oops I lost my building" backups should not be easily available, and are ideally done less often, so you don't pay a high amount of money for storage of your off-sites.
In my opinion, if you have good "lost my file" backups, then it's also fine if recovery from your "lost my building" backups is a bit more expensive. You don't expect to have to ever pay for these; you may end up in a situation where you don't have a choice, and then you'll be happy that the choice is there, but as long as you can reasonably pay for the worst case scenario of a full restore, it's not a case you should be worried about much.
As such, and given that a full restore from Amazon Storage Gateway is going to be somewhere between 300 and 400 USD for my case -- a price I can afford, although it's not something I want to pay every day -- I don't think it's a major issue that extracting data is significantly more expensive than uploading data.
But of course, this is something everyone should consider for themselves...
There are some other tools which might be worth investigating. I've used duplicati for a while, which de-duplicates, compresses, encrypts and then stores on S3 or other backends, but I've been running into issues with larger backup sets.
Now I'm trialling restic. Same story: it's a CLI utility that backs up straight to e.g. S3 (or in my case, B2).
Data verification checks also happen in my script, as well as pruning of the history.
Restoring is a question of doing a mount command, which gives me visibility into all the snapshots.
I've got a lifecycle on it that gives me daily restore possibilities up to 5 years back - once the upload is done, which takes forever on Belgian ISP speeds.
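Roughly, the script boils down to something like this (bucket name, paths and retention values are just an example, not my exact setup):

```sh
# Example restic setup against Backblaze B2 (bucket name and paths made up).
export RESTIC_REPOSITORY="b2:my-backup-bucket:server"
export RESTIC_PASSWORD_FILE=/root/.restic-password
export B2_ACCOUNT_ID=...        # B2 key ID
export B2_ACCOUNT_KEY=...       # B2 application key

restic backup /home /etc                  # de-duplicated, encrypted snapshot
restic check                              # data verification
restic forget --keep-daily 1825 --prune   # keep roughly 5 years of daily snapshots
restic mount /mnt/restic                  # browse all snapshots for restores
```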
It might be an idea, as you don't need a big VM to run this.
Hi Wouter,
Of course, there's always more than one way to 'scratch an itch'. Just wanted to share how I solved the issue back here in Aalst:
* Hourly ZFS snaps for those 'lost my file' moments
* Encrypted & append-only borg-backup to local USB-mounted disk
* Encrypted & append-only borg-backup to borgbase.com
I have been persuaded by the 'encrypted & append-only' functionality of borg-backup, which supposedly helps mitigate ransomware attacks: even if the data is encrypted on your server, the backup server will not overwrite or allow write-access to the existing backups.
The locally mounted USB backup may not add a lot (except maybe if the internet were down during a restore) - but I had one lying around without a real purpose for it.
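For reference, the setup looks roughly like this (repository paths and the remote URL are made up; append-only is enforced on the server side, e.g. by restricting the SSH key to "borg serve --append-only"):

```sh
# Encrypted repository on the local USB-mounted disk (path made up):
borg init --encryption=repokey-blake2 /mnt/usb-backup/borg

# Encrypted repository at borgbase.com (URL made up); in append-only mode the
# server will not overwrite or delete existing archives:
borg init --encryption=repokey-blake2 ssh://xxxx@xxxx.repo.borgbase.com/./repo

# Daily archives to both repositories:
borg create --stats /mnt/usb-backup/borg::'{hostname}-{now}' /home /etc
borg create --stats ssh://xxxx@xxxx.repo.borgbase.com/./repo::'{hostname}-{now}' /home /etc
```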
Once you have TBs in plural, it gets kind of tricky to find the balance between easy and cheap.
I am pretty happy with my backup solution, but it includes a physical rotation that I would prefer to do in the cloud. In short, on my home server (that plays a NAS on television) I use a mirrored ZFS pool with automatic ZFS snapshotting for regular data, and a regular ZFS filesystem for device backups. The latter isn't snapshotted because the backup software on the client OS handles the versioning. This part is solid.
Then I dump the snapshots onto an encrypted ZFS disk that I store off-site and rotate, in principle, every week (2 disks). In principle. That last part sounds like a perfect match for remotely uploading to the cloud (like you, this only needs to be restored if the house burns down; snapshots and versioning cover the rest).
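In case it helps anyone, the weekly dump is essentially this (pool, dataset and snapshot names are made up, and it assumes native ZFS encryption on the off-site disk):

```sh
zpool import offsite                  # attach this week's rotation disk
zfs load-key -r offsite               # unlock its encrypted datasets

# First time: full replication of the snapshots
zfs send -R tank/data@2024-06-01 | zfs receive -F offsite/data

# Later weeks: only send the delta since the snapshot already on the disk
zfs send -R -i tank/data@2024-05-25 tank/data@2024-06-01 | zfs receive offsite/data

zpool export offsite                  # detach before the disk goes off-site
```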
Wherever I look there seem to be 2 types of solutions:
- complicated and fragile (buckets, blah) and not always compatible with ZFS snapshots.
- expensive, because a) cloud storage can be expensive and b) I need to set up a VM on top.
Maybe we just need to start a tit-for-tat service where we store each other's backups at night over a throttled connection. We just need an rpi4 to get started... oh wait. Never mind.