Howto: Restore ZFS Encryption Hierarchies

When backing up encrypted ZFS datasets you’ll notice that ZFS breaks up the encryption hierarchy. The backed-up datasets will look like they’ve all been encrypted separately. You can still use the (same) original key to unlock all of them, but you’ll have to unlock each dataset separately. 😐
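
You can see what happened by checking the encryptionroot property on the backup pool (the dataset names here match my layout below; each backed-up dataset shows up as its own encryption root):

zfs get -r -t filesystem,volume encryptionroot backup/laptop/rpool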

This howto should help you bring them back together when you have to restore from a backup.

Let’s assume we’ve created a new, encrypted pool to restore the backup to (I’ll call it new_rpool). We send our data from the backup pool to new_rpool.

sudo zfs send -w -v backup/laptop/rpool/ROOT@zrepl_20210131_223653_000 | sudo zfs receive -v new_rpool/ROOT
sudo zfs send -w -v backup/laptop/rpool/ROOT/ubuntu@zrepl_20210402_113057_000 | sudo zfs receive -v new_rpool/ROOT/ubuntu
[...]

Note that we’re using zfs send -w, which sends the encrypted blocks “as is” from the backup pool to new_rpool. This means that these datasets can only be decrypted with the key they were originally encrypted with.

Also note that you cannot overwrite an encrypted root/pool dataset with another encrypted one: i.e. we can’t restore the contents/snapshots of rpool itself to new_rpool (at least not without decrypting them first on the sender, sending them unencrypted, and re-encrypting them upon receive). Luckily for me that dataset is empty. 😎

Anyway … our new pool should look something like this now:

$ zfs list -o name,encryption,keystatus,keyformat,keylocation,encryptionroot -t filesystem,volume -r new_rpool
NAME                   ENCRYPTION   KEYSTATUS    KEYFORMAT   KEYLOCATION  ENCROOT
new_rpool              aes-256-gcm  available    passphrase  prompt       new_rpool
new_rpool/ROOT         aes-256-gcm  unavailable  raw         prompt       new_rpool/ROOT
new_rpool/ROOT/ubuntu  aes-256-gcm  unavailable  raw         prompt       new_rpool/ROOT/ubuntu
[...]

Note that each dataset is treated as if it had been encrypted by itself (visible in the encryptionroot property). To restore our ability to unlock all datasets with a single key we’ll have to do some work.

First we have to unlock each of these datasets. We can do this with the zfs load-key command (my data was encrypted using a raw key in a file, hence the -L file:///...):

sudo zfs load-key -L file:///tmp/backup.key new_rpool/ROOT
sudo zfs load-key -L file:///tmp/backup.key new_rpool/ROOT/ubuntu
[...]

Although zfs load-key has a -r option, it only accepts an alternate keylocation of ‘prompt’, so combining it with -L fails for me with the following error message 🤨:

sudo zfs load-key -r -L file:///tmp/backup.key new_rpool/ROOT

alternate keylocation may only be 'prompt' with -r or -a
usage:
        load-key [-rn] [-L <keylocation>] <-a | filesystem|volume>

For the property list, run: zfs set|get

For the delegated permission list, run: zfs allow|unallow
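
So we’re stuck with loading the key per dataset. If there are many of them, a small loop saves some typing (a sketch, assuming the raw key in /tmp/backup.key unlocks everything below new_rpool):

# load the key for every dataset that is still locked
for ds in $(zfs list -H -o name -r -t filesystem,volume new_rpool); do
  if [ "$(zfs get -H -o value keystatus "$ds")" = "unavailable" ]; then
    sudo zfs load-key -L file:///tmp/backup.key "$ds"
  fi
done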

The keystatus should have changed to available now:

$ zfs list -o name,encryption,keystatus,keyformat,keylocation,encryptionroot -t filesystem,volume -r new_rpool
NAME                   ENCRYPTION   KEYSTATUS    KEYFORMAT   KEYLOCATION  ENCROOT
new_rpool              aes-256-gcm  available    passphrase  prompt       new_rpool
new_rpool/ROOT         aes-256-gcm  available    raw         prompt       new_rpool/ROOT
new_rpool/ROOT/ubuntu  aes-256-gcm  available    raw         prompt       new_rpool/ROOT/ubuntu
[...]

We can now fix the encryption hierarchy by making each dataset inherit its encryption key from its parent (similar to inheriting regular dataset properties):

sudo zfs change-key -l -i new_rpool/ROOT
sudo zfs change-key -l -i new_rpool/ROOT/ubuntu
[...]
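
The same thing can be scripted if there are a lot of datasets. The sketch below walks the pool top-down and makes every child dataset that is still its own encryption root inherit the key from its parent (it assumes all keys are already loaded, as above):

# skip the pool root, then let every child that is its own encryptionroot inherit from its parent
for ds in $(zfs list -H -o name -r -t filesystem,volume new_rpool | tail -n +2); do
  if [ "$(zfs get -H -o value encryptionroot "$ds")" = "$ds" ]; then
    sudo zfs change-key -l -i "$ds"
  fi
done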

When we list our encryption properties now, we can see that all the datasets have the same encryptionroot. This means that unlocking new_rpool unlocks all the other datasets as well. 🎉

$ zfs list -o name,encryption,keystatus,keyformat,keylocation,encryptionroot -t filesystem,volume -r new_rpool
NAME                   ENCRYPTION   KEYSTATUS    KEYFORMAT   KEYLOCATION  ENCROOT
new_rpool              aes-256-gcm  available    passphrase  prompt       new_rpool
new_rpool/ROOT         aes-256-gcm  available    passphrase  none         new_rpool
new_rpool/ROOT/ubuntu  aes-256-gcm  available    passphrase  none         new_rpool
[...]

Restoring Dataset Properties

This howto doesn’t touch on restoring dataset properties, because I’ve not been able to reliably back up dataset properties using the -p and -b options of zfs send. Therefore I make sure that I have a (manual) backup of the dataset properties with something like `zfs get all -s local > zfs_all_local_properties_$(date -Iminutes).txt`.
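
Restoring then simply means re-applying the saved values by hand with zfs set, e.g. something like this (property and value are only an example):

sudo zfs set compression=lz4 new_rpool/ROOT/ubuntu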

Moving LXD Containers From One Pool to Another

When I started playing with LXD I just accepted the default storage configuration which creates an image file and uses that to initialize a ZFS pool. Since I’m using ZFS as my main file system this seemed silly as LXD can use an existing dataset as a source for a storage pool. So I wanted to migrate my existing containers to the new storage pool.

Although others seemed to have the same problem, there was no ready answer. Digging through the documentation I finally found out that the lxc move command has a -s option … I had an idea. Here’s what I came up with …

Preparation

First we create the dataset on the existing ZFS pool and add it to LXC.

sudo zfs create -o mountpoint=none mypool/lxd
lxc storage create pool2 zfs source=mypool/lxd

lxc storage list should show something like this now:

+-------+-------------+--------+--------------------+---------+
| NAME  | DESCRIPTION | DRIVER |       SOURCE       | USED BY |
+-------+-------------+--------+--------------------+---------+
| pool1 |             | zfs    | /path/to/pool1.img | 2       |
+-------+-------------+--------+--------------------+---------+
| pool2 |             | zfs    | mypool/lxd         | 0       |
+-------+-------------+--------+--------------------+---------+

pool1 is the old pool backed by the image file; as can be seen in the “Used By” column it is currently used by some containers. pool2 has been added but is not used by any containers yet.

Moving

We now try to move our containers to pool2.

# move container to pool2
lxc move some_container some_container-moved -s=pool2
# rename container back for sanity ;)
lxc move some_container-moved some_container

We can check with lxc storage list whether we succeeded.

+-------+-------------+--------+--------------------+---------+
| NAME  | DESCRIPTION | DRIVER |       SOURCE       | USED BY |
+-------+-------------+--------+--------------------+---------+
| pool1 |             | zfs    | /path/to/pool1.img | 1       |
+-------+-------------+--------+--------------------+---------+
| pool2 |             | zfs    | mypool/lxd         | 1       |
+-------+-------------+--------+--------------------+---------+

Indeed, pool2 is being used now. Just to be sure we check that zfs list -r mypool/lxd also reflects this.

NAME                                  USED  AVAIL  REFER  MOUNTPOINT
mypool/lxd/containers                 1,08G  92,9G    24K  none
mypool/lxd/containers/some_container  1,08G  92,9G   704M  /var/snap/lxd/common/lxd/storage-pools/pool2/containers/some_container
mypool/lxd/custom                       24K  92,9G    24K  none
mypool/lxd/deleted                      24K  92,9G    24K  none
mypool/lxd/images                       24K  92,9G    24K  none
mypool/lxd/snapshots                    24K  92,9G    24K  none

Awesome!

⚠ Note that this only moves the container, but not the LXC image it was cloned off of.

We can repeat this until all containers we care about are moved over to pool2.
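
If there are many containers, the move-and-rename dance can be scripted as well. This is only a sketch: it assumes every container listed should end up on pool2, so test it with a single container first.

# move every container to pool2, then rename it back
for c in $(lxc list --format csv -c n); do
  lxc move "$c" "${c}-moved" -s=pool2
  lxc move "${c}-moved" "$c"
done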

Cleanup

To prevent new containers from using pool1 we have to edit the default profile.

# change devices.root.pool to pool2
lxc profile edit default
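
Alternatively the root device of the profile can be pointed at pool2 directly, without going through the editor (hedged: double-check the syntax against your LXD version):

lxc profile device set default root pool pool2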

Finally … when we’re happy with the migration and have verified that everything works as expected, we can remove pool1.

lxc storage rm pool1