Efficient server backup 

I do back up my servers, of course1. I rsync the image files (VM snapshots, since all my servers are virtual) to a remote location. The nice thing about VM snapshots is that a consistent disk state can be copied while the machine is running, regardless of the guest OS and filesystem (snapshots alone are also possible with NTFS, Ext+LVM, ZFS and probably many other file systems, but each of those cases has to be handled individually).

So I devised the following strategy: since this site lives on a modest Linux machine with a net footprint of only 2GB, I decided to copy it off-site every night (rsync also compresses the data, so only 1GB has to be copied). For other (less actively changing) servers (net sizes of 10GB and 60GB), I decided on weekly and monthly backups, respectively. I have enough storage2, so I decided to keep old copies to have some fun (or rather, to see how long I’d last with Google’s strategy of not throwing anything away, ever).
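
In practice, the nightly copy boils down to a crontab entry along these lines (the host, paths and extra rsync flags here are just an illustration; -z is what does the on-the-fly compression):

# nightly at 03:00: push the VM image off-site, compressed in transit
# --inplace updates the existing remote copy instead of recreating it
0 3 * * * rsync -az --inplace /var/lib/vz/images/web.img backup@storage.example.net:/backups/web/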

2GB is clearly not much, but it adds up over time: 365 nightly copies at 2GB, plus 52 weekly copies at 10GB and 12 monthly copies at 60GB, come to just under 2TB per year.

Now, 2TB is also not that much. A 2TB drive costs about 80 EUR today3, and it will cost half as much before the year is over. But it becomes incredibly annoying to have your hard drive filled with nearly identical backup copies, gobbling up space like a starving sumo wrestler.

So my first idea for bringing the number down was to compress the backups on the fly. I played with NTFS compression on the target server and was quite satisfied: the images shrank to about 60% of their original size, which would yield 1.2TB/year. Better.

It's well known that NTFS compression is not very efficient, but it's pretty fast. That's a trade-off I don't really need for what is basically one-time archiving.

So I tried to shrink about 100GB of mostly duplicate data (50 almost identical images of 2GB, taken 24 hrs apart over 50 days) with different compression and deduplication software.

For starters, since I'd had good experiences with WinRAR, I tried that. WinRAR has a special “solid archive” mode which treats all files combined as a single data stream, which proves beneficial in most cases. Countless hours later, I was disappointed. The data only shrank to about 40%4, which is exactly what you get when compressing the files individually, without solid mode. Pretty much the same story with tar+bzip25, which performed a tad worse at 44%.
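
For the record, the invocations were roughly these (archive and directory names are placeholders; -s and -m5 are the command-line equivalents of the “solid archive” and “best” settings from footnote 4):

# solid RAR archive, best compression, recursing into the image directory
rar a -s -m5 -r backups.rar images/
# plain tar + bzip2, as in footnote 5
tar cjvf backups.tar.bz2 images/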

This is an excellent moment to ponder the difference between compression and deduplication: while they may seem the same (both reduce the size of a data stream by removing redundancy) and even operate in a similar manner (at least in my case, where the files were concatenated together using tar), they differ in the size of the window in which they look for redundancy. The bzips of the world only hunt for similarities within a small window (bzip2 tops out at 900KB blocks), while deduplication ignores fine-grained similarities and instead tries to identify identical file system blocks (which usually range from 4KB to 128KB).
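
To get a feel for what deduplication actually sees, here's a crude sketch (file names and block size are placeholders, and it's only practical for a handful of images): chop the images into fixed 128K chunks, hash them, and compare the number of unique chunks against the total.

# rough dedup-ratio estimate: chop images into 128K blocks, count unique ones
# (needs as much scratch space as the images themselves)
TMP=$(mktemp -d)
for img in backup-*.img; do
    split -a 5 -b 128K "$img" "$TMP/$(basename "$img")."
done
total=$(find "$TMP" -type f | wc -l)
unique=$(find "$TMP" -type f -exec md5sum {} + | awk '{print $1}' | sort -u | wc -l)
echo "$total blocks, $unique unique"
rm -rf "$TMP"

Note that this, like fixed-block deduplication itself, only catches data sitting at the same offset within a block, which is exactly where the alignment troubles further down come from.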

So I figured deduplication might be worth a try. I found an experimental filesystem with great promise, called Opendedup (also known as SDFS). I got it to work almost without hiccups: it creates a file-based container which can be mounted as a volume6. It turned out to be somewhat unreliable, though, and crashed a couple of times. I still got the tests done, and it significantly outperformed plain compression, bringing the data down to 20% (with a 128K block size; it might perform even better with 4K blocks, but that would take 32 times more memory, since all the hash tables for block lookup must be kept in RAM).

Next stop: ZFS. I repeated the same test with ZFS at 8K blocks, and it performed even better, with more than a 6-fold reduction in size: the resulting images were only around 15% of the original size. That's likely a combination of the smaller block size and the added gzip-9 block compression (which Opendedup doesn't support).
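
For reference, the ZFS side of the test is only a handful of commands (pool, dataset and device names are made up); compressratio and the DEDUP column report the two kinds of savings separately:

# 8K records, gzip-9 compression, dedup on
zpool create tank /dev/sdb
zfs create -o recordsize=8K -o compression=gzip-9 -o dedup=on tank/backups
# after copying the images in:
zfs get compressratio tank/backups
zpool list tank   # the DEDUP column shows the dedup ratio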

Impressive, but on the other hand, what the heck? There's no way 15GB of unique data has mysteriously appeared within this 2GB volume over 50 days. The machine is mostly idle, doing nothing but appending data to a couple of log files.

I read a lot about ZFS block alignment, and it still made no sense. Files within the VM image shouldn't just move around the filesystem, so where could the difference come from? As a last resort, I tried xdelta, a pretty efficient binary diff utility. It yielded about 200MB of changes between successive images, which would bring the 100GB down to 12% (2GB + 49 × 200MB ≈ 11.8GB).
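
If you want to reproduce this, xdelta3 produces (and later applies) such a binary delta like so (file names are placeholders):

# encode: the delta between two successive images
xdelta3 -e -s day1.img day2.img day1-to-day2.vcdiff
# decode: rebuild day2.img from day1.img plus the delta
xdelta3 -d -s day1.img day1-to-day2.vcdiff day2.restored.img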

Could there really be that much difference between the images? I suspected extfs' atime attribute, which (presumably) dirties a lot of blocks just by reading files. It would make perfect sense. So I turned it off7. Unfortunately, there was no noticeable improvement.
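
Footnote 7 in practice: append noatime to the mount options of the filesystem in /etc/fstab (the device and options below are just an example) and remount:

# /etc/fstab line, with noatime added:
# /dev/sda1  /  ext3  defaults,noatime,errors=remount-ro  0  1
mount -o remount,noatime /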

Then I tried snapshotting the system state every couple of hours. Lo and behold: deltas of only 100K for most of the day, and then suddenly a 200MB spike. It finally occurred to me that I'd been doing some internal dumping (mysqldump plus a tar of /var/www) every night as a cron job, yielding more than 200MB of data, which I'd SCP somewhere else and then immediately delete. And while deleting it does free the disk space as far as the filesystem is concerned, the data itself is still sitting there on disk, in a kind of digital limbo.
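
The job in question is essentially this (paths, credentials and the destination host are simplified placeholders); every night it writes a couple hundred MB to disk and deletes it again shortly after:

# nightly dump: database + web root, shipped off-site, then removed
mysqldump --all-databases | gzip > /root/db.sql.gz
tar czf /root/www.tar.gz /var/www
scp /root/db.sql.gz /root/www.tar.gz backup@storage.example.net:/dumps/
rm /root/db.sql.gz /root/www.tar.gz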

So I tried to zero out the empty space, like this (it writes a 100MB file full of zeros; increase the count to fill most of the free space; when done, delete it):

dd if=/dev/zero of=zero.file bs=1024 count=102400
rm zero.file

The results were immediate: the backup snapshot itself shrank to 1.7GB. More importantly, successive backups now differed by only about 700K (measured with xdelta again). I expect Opendedup and ZFS would also do a much better job now.

But having gone through all that trouble, I found myself longing for a more elegant solution. And in the long run, there really is just one option, which will likely become prevalent in a couple of years: to make efficient backups, you need an advanced filesystem which lets you back up only the snapshot deltas. Today, this means ZFS8. And there aren't that many options if you want ZFS. There's OpenIndiana (formerly OpenSolaris), which I hate. It's bloated, eats RAM for breakfast and has an awfully unfamiliar service management system. And to top it all off, it doesn't easily support everything I need (which is: loads of Debian packages).

Then there's ZFS-FUSE, and ZFS on BSD. Both lag significantly behind the latest zpool versions, not to mention their slow I/O. I just wouldn't put either into production.

Luckily, there's one last option: Nexenta Core, a Solaris kernel with a GNU userland. I'd seen some pretty good reviews which convinced me to give it a try, and I've now been running it for a couple of weeks without problems. If it delivers on its promise, it really is the best of both worlds. The included zpool (version 26) supports deduplication, endless snapshots, data integrity checking, and the ability to send an entire filesystem snapshot, or just the deltas, with the built-in zfs send. And the best part, compared to OpenIndiana: everything I need installs or compiles just fine. Expect to hear more about this.
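
The snapshot-delta workflow I'm after is pleasantly short (dataset names, snapshot names and the target host below are illustrative); after one full send, every subsequent transfer contains only the blocks that changed between two snapshots:

# one-time: full copy of the dataset to the backup machine
zfs snapshot tank/vm@day1
zfs send tank/vm@day1 | ssh backup "zfs receive backup/vm"
# nightly: send only the delta between yesterday's and today's snapshot
zfs snapshot tank/vm@day2
zfs send -i tank/vm@day1 tank/vm@day2 | ssh backup "zfs receive backup/vm"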

  1. last week was World Backup Day 2011
  2. two 2TB WD Elements drives attached to a recycled EEE701 netbook
  3. look here
  4. WinRAR 3.92, solid archive, compression=rar, method=best
  5. a simple tar cjvf, bzip2 v1.05
  6. I even succeeded in mounting it as a drive under Windows, using Dokan, a Windows version of FUSE
  7. adding the noatime flag in /etc/fstab and remounting the filesystem
  8. I'm eagerly awaiting BTRFS, but it's currently still pretty much in its infancy