Fixing Dracut for Encrypted ZFS on Root on Ubuntu 25.10

I just upgraded from Ubuntu 25.04 to 25.10 … well, it was more of a reinstall really. Because I knew the new release switched the initrd-related tooling to Dracut, I tried to understand all the changes beforehand with a test installation in a VM. Still, I somehow managed to break Dracut’s ability to unlock my encrypted ZFS on root setup automatically.

Looking at journalctl, Dracut claimed it couldn’t find the key file:

dracut-pre-mount[940]: Warning: ZFS: Key /run/keystore/rpool/system.key for rpool/enc hasn't appeared. Trying anyway.
[...]
dracut-pre-mount[1001]: Key load error: Failed to open key material file: No such file or directory
[...]
systemd[1]: Mounting sysroot.mount - /sysroot...
mount[1007]: zfs_mount_at() failed: encryption key not loaded
systemd[1]: sysroot.mount: Mount process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: sysroot.mount: Failed with result 'exit-code'.
systemd[1]: Failed to mount sysroot.mount - /sysroot.
systemd[1]: Dependency failed for initrd-root-fs.target - Initrd Root File System.

All I could do was mount the keystore manually in the emergency console:

systemd-cryptsetup attach keystore-rpool /dev/zvol/rpool/keystore
mkdir -p /run/keystore/rpool
mount /dev/mapper/keystore-rpool /run/keystore/rpool

After pressing Ctrl-d, Systemd continued booting as if everything was OK. This worked, but was HUGELY annoying, especially since the emergency console was also using an English keyboard mapping. 🤬

After I was done setting up my desktop I took the time to investigate the issue. I compared everything I could between my real system and the freshly set up VM. After comparing the system startup plots (exported with systemd-analyze plot > plot.svg) I noticed that systemd-ask-password.service started quite late on my real system (i.e. only after I had manually mounted the keystore). I knew there was a bug report about teaching Dracut Ubuntu’s ZFS on root encryption scheme (i.e. putting the root ZFS dataset’s encryption keys in a LUKS container on a Zvol (rpool/keystore)). So I looked at the actual patch and tried to walk through how it would behave on my system. There I noticed that the script assumes the ZFS encryption root to be the same as the Zpool’s root dataset (e.g. rpool). 😯 I moved away from that kind of setup years ago as it makes restoring from a backup quite cumbersome. Instead I use a sub-dataset as the encryption root (e.g. rpool/enc, as seen in the log above), which messed up the logic that assumed the encryption root would only ever contain the pool name. 🤦‍♂️
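
If you want to check whether your own setup has the same mismatch, the relevant ZFS properties can be listed like this (a small sketch; rpool is my pool name, substitute your own):

# show which dataset is the encryption root and where its key is expected to live
zfs get -r encryptionroot,keylocation,keystatus rpool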

Long story short: the following patch determines the pool name of the encryption root before trying to open and mount the LUKS keystore:

--- zfs-load-key.sh.orig        2025-10-16 20:44:47.955349974 +0200
+++ zfs-load-key.sh     2025-10-16 20:55:00.229000464 +0200
@@ -54,9 +54,11 @@
     [ "$(zfs get -Ho value keystatus "${ENCRYPTIONROOT}")" = "unavailable" ] || return 0

     KEYLOCATION="$(zfs get -Ho value keylocation "${ENCRYPTIONROOT}")"
+    # `ENCRYPTIONROOT` might not be the root dataset (e.g. `rpool/enc`)
+    ENCRYPTIONROOT_POOL="$(echo "${ENCRYPTIONROOT}" | cut -d/ -f1)"
     case "$KEYLOCATION" in
-        "file:///run/keystore/${ENCRYPTIONROOT}/"*)
-            _open_and_mount_luks_keystore "${ENCRYPTIONROOT}" "${KEYLOCATION#file://}"
+        "file:///run/keystore/${ENCRYPTIONROOT_POOL}/"*)
+            _open_and_mount_luks_keystore "${ENCRYPTIONROOT_POOL}" "${KEYLOCATION#file://}"
             ;;
     esac
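
To actually get the fix into early boot, the patched script has to end up in the initrd again. Roughly like this (on my system the script lives in Dracut’s 90zfs module; adjust the path and patch file name to your setup):

# patch the installed dracut module script
sudo patch /usr/lib/dracut/modules.d/90zfs/zfs-load-key.sh < zfs-load-key.patch
# rebuild the initrd for the running kernel
sudo dracut --force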

🎉

Running k3s on Incus

I know the pain of managing a bunch of services on my own. Even when relying on Incus, Podman and Systemd as much as possible, held together by lots of Ansible duct tape, it’s still arduous. I convinced myself change was in order: … something something Kubernetes.

My main criteria are basically:

  • Must be able to run on a single node (for now), i.e. no clustered services or databases. (k3s looks like it fits the bill)
  • Services must be able to be deployed with public service definitions (Helm FTW)
  • These service definitions must lend themselves to be version controlled
  • All relevant data directories must live on separate ZFS datasets (see the sketch after this list)
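
To make the single-node and ZFS-dataset points a bit more concrete, this is roughly what I’m aiming for (a sketch only: the dataset name is made up and I actually drive the installation via Ansible instead of the official install script):

# dedicated dataset for k3s' state, mounted at k3s' default data directory
sudo zfs create -o mountpoint=/var/lib/rancher/k3s rpool/k3s
# single-node server install via the official install script
curl -sfL https://get.k3s.io | sh -s - server --data-dir /var/lib/rancher/k3s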

Running k3s in an Incus container

You can run k3s in an Incus container, but it gets increasingly difficult. There are reports of people getting it to run, but the list of required workarounds keeps growing. Even public LXD/LXC definitions for microk8s or k3s are quite old (as of 2025-08, 3 and 6 years old respectively) and blast HUGE holes in the sandbox. ☹️ K3s “requires” access to /dev/kmsg, several places in /proc and /sys as well as modprobing several kernel modules (it checks for access to them and spams the logs with warnings and errors). 😶
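
To give an idea of the kind of holes those definitions punch, they mostly boil down to instance config along these lines (purely illustrative, the container name is made up; please don’t actually do this):

# privileged container plus preloaded kernel modules and raw LXC mount overrides
incus config set k3s-ct security.privileged=true
incus config set k3s-ct linux.kernel_modules=overlay,br_netfilter,ip_tables
incus config set k3s-ct raw.lxc='lxc.mount.auto=proc:rw sys:rw cgroup:rw'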

It looks doable in a technical sense, but it’s a huge pain having to go through all of that with Incus without getting any of the (sandboxing/security) benefits. So the general wisdom is to just use a VM. (No, I didn’t try k3s’ experimental rootless mode.)
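
Creating such a VM is a one-liner anyway; something along these lines (image and resource limits are just an example, not necessarily what I used):

# a small VM dedicated to the single-node k3s "cluster"
incus launch images:debian/12 k3s --vm -c limits.cpu=4 -c limits.memory=8GiB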

Running k3s in an Incus VM

I started with a fresh VM and could reuse my now much simplified Ansible tasks for setting up k3s. But my happiness was cut short by the k3s service spamming the journal with useless

level=error msg="failed to ping connection: disk I/O error: no such device"

error messages. After removing all the directories and files from /var/lib/rancher/k3s and starting the server by hand I got:

Error: preparing server: failed to bootstrap cluster data: creating storage endpoint: failed to create driver for default endpoint: setup db: disk I/O error: no such device

Some more mucking around with the k3s server config revealed a puzzling, but more useful

failed to mount overlay: invalid argument.

Looking at what dmesg had to say I got:

overlayfs: upper fs does not support tmpfile.
overlayfs: failed to set xattr on upper
overlayfs: …falling back to redirect_dir=nofollow.
overlayfs: …falling back to uuid=null.
overlayfs: …falling back to xino=off.
overlayfs: try mounting with 'userxattr' option
overlayfs: upper fs missing required features.
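
The failure can be reproduced without k3s by trying to create an overlay mount whose upper and work directories live on the affected file system (the paths here assume k3s’ data directory; run as root):

# minimal overlayfs-on-virtiofs reproduction
mkdir -p /var/lib/rancher/k3s/ovl/{lower,upper,work,merged}
mount -t overlay overlay \
    -o lowerdir=/var/lib/rancher/k3s/ovl/lower,upperdir=/var/lib/rancher/k3s/ovl/upper,workdir=/var/lib/rancher/k3s/ovl/work \
    /var/lib/rancher/k3s/ovl/merged
# fails with "invalid argument" while dmesg shows the complaints above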

Long story short: it turns out that in my eagerness I had mounted a custom Incus volume as k3s’ data directory (/var/lib/rancher/k3s). Since this is a VM (instead of a container), the volume is passed in via virtiofs. And it turns out overlayfs doesn’t like being put on top of virtiofs (or NFS, it seems). 😵‍💫

But good news: it was fixable, although hacky. By grepping for “virtiofsd” processes I found out that Incus vendors its own virtiofsd binary in /opt/incus/bin/virtiofsd. It already runs it with the --posix-acl option, which implies the required --xattr option. But Incus currently doesn’t support any way of configuring virtiofsd. 😓 So the only solution (suggested by the main Incus maintainer, no less) is to replace /opt/incus/bin/virtiofsd with a shim script that calls the real virtiofsd binary with the additional --modcaps=+sys_admin option. Basically something silly like:

#!/usr/bin/bash
# pass everything through to the real (renamed) virtiofsd, just with CAP_SYS_ADMIN added
exec /opt/incus/bin/virtiofsd.orig --modcaps=+sys_admin "$@"
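
For reference, putting the shim in place looks roughly like this (the wrapper file name and VM name are placeholders; a package update will presumably overwrite the shim, so it has to be reapplied):

# move the vendored binary aside and install the wrapper in its place
sudo mv /opt/incus/bin/virtiofsd /opt/incus/bin/virtiofsd.orig
sudo install -m 0755 virtiofsd-wrapper.sh /opt/incus/bin/virtiofsd
# restart the instance so a fresh virtiofsd (now with CAP_SYS_ADMIN) gets spawned
incus restart k3s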

Yeah also, “try mounting with ‘userxattr’ option” was not helpful and sent me down the wrong path. 🤐

All in all … all these stumbling blocks ate my weekend. Which was kind of in line with my prejudices against Kubernetes. 😅

Using Posteo With Custom Domains

I tried to find out if you can use Posteo with a custom domain. And the short answer is: no.

Andy Schwartzmeyer found out the long answer. 👏😀 I implemented it for my blog and so far I’m happy. Interestingly enough it’s the receiving side that needs an external service. I currently use Forward Email’s free tier, because I already have publicly known email addresses that I automatically collect emails from.

I always had the impression that sending mails is made more difficult in order to combat spammers, but what do I know. 🤷 Sending only requires including Posteo in the domain’s SPF record and adding a sender identity (usable after a one-week cool-down 🙈).
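
For the SPF part that boils down to a DNS TXT record roughly like the one below (the exact include mechanism is Posteo-specific, so double-check it against Andy’s write-up and Posteo’s documentation before copying anything):

example.org.  3600  IN  TXT  "v=spf1 include:posteo.de ~all"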

Moving LXD Containers From One Pool to Another

When I started playing with LXD I just accepted the default storage configuration, which creates an image file and uses that to initialize a ZFS pool. Since I’m using ZFS as my main file system anyway this seemed silly, as LXD can use an existing dataset as the source for a storage pool. So I wanted to migrate my existing containers to a new storage pool.

Although others seemed to have the same problem, there was no ready answer. Digging through the documentation I finally found out that the lxc move command has a -s option … and that gave me an idea. Here’s what I came up with …

Preparation

First we create the dataset on the existing ZFS pool and add it to LXD as a new storage pool.

sudo zfs create -o mountpoint=none mypool/lxd
lxc storage create pool2 zfs source=mypool/lxd

lxc storage list should show something like this now:

+-------+-------------+--------+--------------------+---------+
| NAME  | DESCRIPTION | DRIVER |       SOURCE       | USED BY |
+-------+-------------+--------+--------------------+---------+
| pool1 |             | zfs    | /path/to/pool1.img | 2       |
+-------+-------------+--------+--------------------+---------+
| pool2 |             | zfs    | mypool/lxd         | 0       |
+-------+-------------+--------+--------------------+---------+

pool1 is the old pool backed by the image file; as the “Used By” column shows, it is currently used by some containers. pool2 has been added but is not used by any containers yet.

Moving

We now try to move our containers to pool2.

# move container to pool2
lxc move some_container some_container-moved -s=pool2
# rename container back for sanity ;)
lxc move some_container-moved some_container

We can check with lxc storage list whether we succeeded.

+-------+-------------+--------+--------------------+---------+
| NAME  | DESCRIPTION | DRIVER |       SOURCE       | USED BY |
+-------+-------------+--------+--------------------+---------+
| pool1 |             | zfs    | /path/to/pool1.img | 1       |
+-------+-------------+--------+--------------------+---------+
| pool2 |             | zfs    | mypool/lxd         | 1       |
+-------+-------------+--------+--------------------+---------+

Indeed, pool2 is being used now. Just to be sure, we check that zfs list -r mypool/lxd also reflects this.

NAME                                  USED  AVAIL  REFER  MOUNTPOINT
mypool/lxd/containers                 1,08G  92,9G    24K  none
mypool/lxd/containers/some_container  1,08G  92,9G   704M  /var/snap/lxd/common/lxd/storage-pools/pool2/containers/some_container
mypool/lxd/custom                       24K  92,9G    24K  none
mypool/lxd/deleted                      24K  92,9G    24K  none
mypool/lxd/images                       24K  92,9G    24K  none
mypool/lxd/snapshots                    24K  92,9G    24K  none

Awesome!

⚠ Note that this only moves the container, but not the LXC image it was cloned from.

We can repeat this until all containers we care about are moved over to pool2.
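
If there are many containers, the move-and-rename dance can be scripted (a rough sketch; depending on the LXD version, containers may need to be stopped before moving them between pools):

# move every container to pool2 and rename it back afterwards
for c in $(lxc list -c n --format csv); do
    lxc move "$c" "${c}-moved" -s pool2
    lxc move "${c}-moved" "$c"
done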

Cleanup

To prevent new containers from using pool1 we have to edit the default profile.

# change devices.root.pool to pool2
lxc profile edit default
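
If you prefer doing that non-interactively, the same change can be made directly (the exact key=value syntax may differ slightly between LXD versions):

# point the default profile's root disk device at the new pool
lxc profile device set default root pool=pool2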

Finally … when we’re happy with the migration and have verified that everything works as expected, we can remove pool1.

lxc storage rm pool1