Fixing Dracut for Encrypted ZFS on Root on Ubuntu 25.10

I just upgraded from Ubuntu 25.04 to 25.10 … well, it was more of a reinstall, really. Since I knew the new release switched the initrd tooling to Dracut, I tried to understand all the changes in a test installation in a VM first. Still, I somehow broke Dracut’s ability to unlock my encrypted ZFS on root setup automatically.

Looking at journalctl, I saw it couldn’t find the key file:

dracut-pre-mount[940]: Warning: ZFS: Key /run/keystore/rpool/system.key for rpool/enc hasn't appeared. Trying anyway.
[...]
dracut-pre-mount[1001]: Key load error: Failed to open key material file: No such file or directory
[...]
systemd[1]: Mounting sysroot.mount - /sysroot...
mount[1007]: zfs_mount_at() failed: encryption key not loaded
systemd[1]: sysroot.mount: Mount process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: sysroot.mount: Failed with result 'exit-code'.
systemd[1]: Failed to mount sysroot.mount - /sysroot.
systemd[1]: Dependency failed for initrd-root-fs.target - Initrd Root File System.

All I could do was mount the keystore manually in the emergency console:

systemd-cryptsetup attach keystore-rpool /dev/zvol/rpool/keystore
mkdir -p /run/keystore/rpool
mount /dev/mapper/keystore-rpool /run/keystore/rpool

After pressing Ctrl-D, Systemd continued booting as if everything was OK. This worked, but it was HUGELY annoying, especially since the emergency console was also using an English keyboard layout. 🤬

After I was done setting up my desktop, I took the time to investigate the issue. I compared everything between my real system and the freshly set-up VM. After comparing the system startup plots (exported with systemd-analyze plot > plot.svg) I noticed that systemd-ask-password.service started quite late on my real system (only after I had manually mounted the keystore). I knew there was a bug report about teaching Dracut Ubuntu’s ZFS on root encryption scheme (i.e. putting the root ZFS dataset’s encryption keys in a LUKS container on a Zvol (rpool/keystore)). So I looked at the actual patch and walked through how it would behave on my system. There I noticed that the script assumes the ZFS encryption root to be the same as the Zpool’s root dataset (e.g. rpool). 😯 I moved away from that kind of setup years ago because it makes restoring from a backup quite cumbersome. Instead I was using a sub-dataset for the encrypted data (e.g. rpool/enc), which broke the logic that assumed the encryption root to contain only the pool name. 🤦‍♂️
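To see why the match fails, here is a minimal sketch of the original logic with the zfs calls replaced by the values from my system (the dataset name and key path are taken from the log output above):

```shell
# Sketch of the original matching logic with the zfs(8) calls stubbed out.
# On Ubuntu's default layout ENCRYPTIONROOT is just the pool name ("rpool"),
# but on my system it is a sub-dataset:
ENCRYPTIONROOT="rpool/enc"
KEYLOCATION="file:///run/keystore/rpool/system.key"

case "$KEYLOCATION" in
    "file:///run/keystore/${ENCRYPTIONROOT}/"*)
        # Never reached: the pattern expands to
        # "file:///run/keystore/rpool/enc/"* and cannot match.
        echo "keystore matched" ;;
    *)
        echo "keystore NOT matched" ;;
esac
```

So the keystore is silently never opened, and the key file “hasn’t appeared” by the time Dracut gives up.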

Long story short, the following patch determines the pool name of the encryption root before trying to open and mount the LUKS keystore:

--- zfs-load-key.sh.orig        2025-10-16 20:44:47.955349974 +0200
+++ zfs-load-key.sh     2025-10-16 20:55:00.229000464 +0200
@@ -54,9 +54,11 @@
     [ "$(zfs get -Ho value keystatus "${ENCRYPTIONROOT}")" = "unavailable" ] || return 0

     KEYLOCATION="$(zfs get -Ho value keylocation "${ENCRYPTIONROOT}")"
+    # `ENCRYPTIONROOT` might not be the root dataset (e.g. `rpool/enc`)
+    ENCRYPTIONROOT_POOL="$(echo "${ENCRYPTIONROOT}" | cut -d/ -f1)"
     case "$KEYLOCATION" in
-        "file:///run/keystore/${ENCRYPTIONROOT}/"*)
-            _open_and_mount_luks_keystore "${ENCRYPTIONROOT}" "${KEYLOCATION#file://}"
+        "file:///run/keystore/${ENCRYPTIONROOT_POOL}/"*)
+            _open_and_mount_luks_keystore "${ENCRYPTIONROOT_POOL}" "${KEYLOCATION#file://}"
             ;;
     esac

🎉
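The fix hinges on deriving the pool name from the encryption root. cut with / as the delimiter does exactly that, and conveniently passes a plain pool name through unchanged (a quick sanity check, not part of the patch):

```shell
# Everything before the first "/" in a dataset name is the pool name.
echo "rpool/enc" | cut -d/ -f1    # prints: rpool
# A dataset name without "/" (the pool's root dataset) passes through:
echo "rpool" | cut -d/ -f1        # prints: rpool
```

That second case means the patch stays compatible with Ubuntu’s default layout, where the encryption root already is the pool’s root dataset.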

Running Circles Around Detecting Containers

Recently my monitoring service warned me that my Raspberry Pi was no longer syncing its time. I logged into the device and tried to restart systemd-timesyncd.service, but it failed.

The error it presented was:

ConditionVirtualization=!container was not met

I was confused. Sure, I was running containers on this device, but this service was running on the host! 😯

I checked the service definition and it indeed had this condition. Then I looked up the docs for the ConditionVirtualization setting and found out Systemd has a helper command, systemd-detect-virt, that can be used to find out whether it is running inside a container/VM/etc.

To my surprise, running systemd-detect-virt reported that it was running inside a Podman container, even though it was run on the host. I was totally confused. Does it detect any container on the system, or whether it is being run inside one? 😵‍💫

I tried to dig deeper, but the docs only tell you which container/VM solutions can be detected, not what the tool uses to detect them. So I searched the source code of systemd-detect-virt for hints on how it detects Podman containers … and I found it: it checks for the existence of a file at /run/.containerenv. 😯
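Boiled down, the Podman heuristic is just a file-existence test. A simplified sketch (my own re-implementation for illustration, not systemd’s actual code; the path parameter only exists to make it testable):

```shell
# Mimics systemd-detect-virt's Podman check: the mere existence of
# /run/.containerenv marks the environment as a Podman container.
detect_podman() {
    if [ -e "${1:-/run/.containerenv}" ]; then
        echo "podman"
    else
        echo "none"
    fi
}
```

So whoever creates that file — Podman itself or anything else — turns the whole host into a “container” in Systemd’s eyes.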

Checking whether this file existed on the host, I found out: it did!!! 😵 How could this be? I checked another device running Podman and the file wasn’t there!?! 😵‍💫 … Then it dawned on me. I was running cAdvisor on the Raspberry Pi, and it so happens that it wants /var/run mounted inside its container. /var/run is just a symlink to /run, and regardless of me mounting it read-only, the /run/.containerenv file gets created on the host!!! 🤯

I looked into /run/.containerenv, found it was empty, removed it, and could finally restart systemd-timesyncd.service. The /run/.containerenv file is recreated on every restart of the container, but at least I know what to look for. 😩
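Until Podman stops leaking the file through the bind mount, a small guard can clean it up before restarting the service. This is my own workaround, not an official fix: the stray copy on my host was empty, so only empty files are removed, and the path parameter exists just for testing:

```shell
# Remove a stray, empty /run/.containerenv left behind on the host
# by a container bind-mounting /var/run.
clean_stray_containerenv() {
    f="${1:-/run/.containerenv}"
    # Only delete empty files: a file with content may be real
    # container metadata written by Podman inside a container.
    if [ -f "$f" ] && [ ! -s "$f" ]; then
        rm -f "$f"
    fi
}
```

Running this before systemctl restart systemd-timesyncd.service gets time sync going again — at least until the cAdvisor container is restarted.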

Update 2025-03-23: I’ve created a bug report: https://github.com/containers/podman/issues/25655