Backing up large zpools

I recently heard the term “data gravity”, which implies that large amounts of data are next to impossible to move. I’ve experienced this with my ZFS NAS, which makes heavy use of compression and deduplication to cram tons of backup VMware images onto a 2TB mirrored volume.

To back it up I simply did what seems to be best practice around the Internets: I attached a USB volume, used zfs send to transfer only the changes made since the most recent snapshot common to both sides, and that was it. However, the process took days.
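
For reference, an incremental send of this kind looks roughly like the sketch below. The dataset tank/vm, the target usbbackup/vm and the snapshot names are made up for illustration; the function only prints the pipeline, so you can review it before running it as root.

```shell
#!/bin/sh
# Sketch only: tank/vm, usbbackup/vm and the snapshot names are hypothetical.

# Print the pipeline that sends only the blocks changed between two snapshots.
#   $1 = source dataset
#   $2 = snapshot that already exists on both sides
#   $3 = newer snapshot on the source
#   $4 = target dataset on the USB pool
incremental_send_cmd() {
    echo "zfs send -i $1@$2 $1@$3 | zfs receive -F $4"
}

incremental_send_cmd tank/vm backup-old backup-new usbbackup/vm
```

Note that zfs receive -F rolls the target back to its latest snapshot before receiving, so anything changed on the backup side in the meantime is discarded.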

Fast forward a couple of months, when I naively played with the thing and tried virtualizing it (VMware + raw device mapping), in the process corrupting the zpool so badly that it had to be restored from backup in its entirety.

If I thought syncing the changes to USB took long (days), this took weeks. That’s when I learned about mbuffer to speed up zfs send, but dedup and compression still took their toll. Plain uncompressed, non-deduplicated file systems synced at normal USB read speeds (about 15MB/s), while dedup slowed zfs send down by two orders of magnitude.
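
Putting mbuffer into the pipeline looks something like this. It is a minimal sketch: the dataset and snapshot names are hypothetical, and the 128k block size and 1G buffer are just starting values I picked for illustration, not tuned numbers. As written it only prints the command.

```shell
#!/bin/sh
# Sketch only: dataset and snapshot names are hypothetical, and the
# 128k block size / 1G buffer are illustrative starting values.

# Print an incremental send with mbuffer between sender and receiver,
# so bursts from one side are absorbed while the other side is busy.
#   $1 = source dataset, $2 = common snapshot, $3 = newer snapshot,
#   $4 = target dataset
buffered_send_cmd() {
    echo "zfs send -i $1@$2 $1@$3 | mbuffer -q -s 128k -m 1G | zfs receive -F $4"
}

buffered_send_cmd tank/vm backup-old backup-new usbbackup/vm
```

The buffer only smooths out stalls between the two ends of the pipe; it does nothing about the cost of dedup itself.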

So I’ve resorted to a little-advertised approach, which has so far also proved to be the most foolproof. It uses the zpool resilvering mechanism for what’s called a split mirror backup.

It goes like this: you create a 3-way mirror. Pull out one disk and shelve it as a backup. When you want to “refresh” your backup, plug it back in and bring it online using something like zpool online tank c5t0d0.
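
The whole cycle can be sketched as a few zpool commands. The pool and backup disk names are the ones from the example above; c3t0d0 is a hypothetical disk that is already part of the mirror. Taking the disk offline before pulling it is my assumption about a tidy way to do it, not a required step. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: c3t0d0 (a disk already in the mirror) is a hypothetical name.
POOL=tank
EXISTING_DISK=c3t0d0
BACKUP_DISK=c5t0d0
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One-time setup: attach the backup disk to the mirror, making it 3-way.
make_three_way() { run zpool attach "$POOL" "$EXISTING_DISK" "$BACKUP_DISK"; }

# Before pulling the disk: take it offline (assumed step, see above).
shelve_backup() { run zpool offline "$POOL" "$BACKUP_DISK"; }

# After plugging it back in: bring it online, which starts the resilver.
refresh_backup() { run zpool online "$POOL" "$BACKUP_DISK"; }

make_three_way
shelve_backup
refresh_backup
```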

A great surprise was how smart the resilvering process is. It doesn’t rewrite the entire disk, but only copies the changes. There are a lot of advantages over zfs send, at least from my point of view:

  • it’s fast, without the need to fiddle with mbuffer or copy each filesystem individually
  • no snapshots needed to serve as reference for incremental zfs send
  • creates an exact copy (entire zpool, with all the properties intact), so if your server burns, the backup drive can immediately serve as a seed for a new server
  • if you pull out a disk during resilvering, there seems to be no harm and the process just continues next time
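
Even if pulling the disk early seems harmless, the backup is only current once the resilver has finished. A small helper like the one below can tell you whether it is still running. It is a sketch that assumes zpool status reports a running resilver with the words "resilver in progress"; check the wording on your ZFS version.

```shell
#!/bin/sh
# Sketch only: assumes `zpool status` contains the phrase
# "resilver in progress" while a resilver is running.

# Reads `zpool status` output on stdin; succeeds while a resilver is running.
resilver_running() {
    grep -q "resilver in progress"
}

# Reads `zpool status` output on stdin and prints a one-line verdict.
backup_disk_verdict() {
    if resilver_running; then
        echo "resilver still running, backup disk not yet current"
    else
        echo "no resilver running"
    fi
}
```

You would use it as zpool status tank | backup_disk_verdict before pulling the disk out again.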

The only drawback is that your pool is always in a degraded state, because a disk is missing. But that’s really a minor inconvenience.