So You Want To Delegate ZFS Datasets to Containers

ZFS has supported delegating datasets and their children to containers since version 2.2. Delegation moves control of the datasets from the host into a container’s user namespace (ZFS calls such datasets “zoned”). But it’s never as easy as it sounds: as with everything container-related, the shifting of user IDs plays weird tricks on you.

I recently tried experimenting with the ZFS delegation feature of Incus custom volumes. It allows Incus/LXD/LXC-style system containers to manage a sub-tree of ZFS datasets from inside the container. Everything is fine as long as you create the top dataset you want to delegate, delegate it to the container, and create all the necessary sub-datasets from inside the container. But things get weird when you have datasets created on the host that you want to move under the delegated dataset (e.g. zfs rename tank/some-where/some-data tank/incus/custom/default_c1-custom-volume/some-data).
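For reference, the happy path looks roughly like this (container c1, storage pool and project both named default in my setup, with zfs.delegate being the Incus volume option that enables the delegation; treat the exact dataset paths as illustrative):

# on the host: create a delegated custom volume and attach it to the container
incus storage volume create default c1-custom-volume zfs.delegate=true
incus config device add c1 data disk pool=default source=c1-custom-volume path=/data

# inside the container: the delegated dataset shows up in zfs list and
# sub-datasets can be created and managed freely
zfs create tank/incus/custom/default_c1-custom-volume/some-data
zfs set compression=zstd tank/incus/custom/default_c1-custom-volume/some-data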

It basically boils down to:

Even root can’t change or write data into a dataset that was created on the host and then moved under a container’s delegated custom volume. Creating a new dataset from inside the container doesn’t have the same problem.

I felt like this was a serious shortcoming that would impede migration scenarios like mine, so I reported it as a bug … it turns out I was holding it wrong. 😅

The Solution

To fix my situation and move externally created datasets into a zone, I needed to look up the Hostid fields in the container’s volatile.idmap.current option (one for UIDs and one for GIDs; both were 1000000 in my case).
Running chown -R 1000000:1000000 /mountpoint/of/the/dataset/to/be/moved on the host is where the magic lies. 😁
After moving the dataset by running zfs unmount ..., zfs rename ... and zfs set zoned=on ... on the host, I was not only able to zfs mount it in the container, but the IDs were now in the right range for the container to manage the data in it.
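Put together, the whole dance looks something like this (the 1000000 offset and the dataset names are from my setup; check volatile.idmap.current for yours):

# on the host: shift ownership into the container's ID range
chown -R 1000000:1000000 /mountpoint/of/the/dataset/to/be/moved

# on the host: move the dataset under the delegated custom volume
zfs unmount tank/some-where/some-data
zfs rename tank/some-where/some-data tank/incus/custom/default_c1-custom-volume/some-data
zfs set zoned=on tank/incus/custom/default_c1-custom-volume/some-data

# inside the container: mount it and manage it like any other delegated dataset
zfs mount tank/incus/custom/default_c1-custom-volume/some-data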

Running k3s on Incus

I know the pain of managing a bunch of services on my own. Even when relying on Incus, Podman and systemd as much as possible, held together by lots of Ansible duct tape, it’s still arduous. I convinced myself change was in order: … something something Kubernetes.

My main criteria are basically:

  • Must be able to run on a single node (for now), i.e. no clustered services or databases (k3s looks like it fits the bill)
  • Services must be able to be deployed with public service definitions (Helm FTW)
  • These service definitions must lend themselves to being version controlled
  • All relevant data directories must live on separate ZFS datasets

Running k3s in an Incus container

You can run k3s in an Incus container, but it gets increasingly difficult. There are reports of people getting it to run, but the workarounds pile up. Even the public LXD/LXC definitions for microk8s or k3s are quite old (as of 2025-08, 3 and 6 years old respectively) and blast HUGE holes in the sandbox. ☹️ K3s “requires” access to /dev/kmsg, several places in /proc and /sys, as well as modprobing several kernel modules (it checks for access to them and spams the logs with warnings and errors). 😶
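To give an idea of what “HUGE holes” means in practice, those definitions boil down to settings along these lines (a sketch for a hypothetical container c1; the exact module list and raw.lxc lines vary between the published profiles):

# essentially switch off the isolation the container would normally provide
incus config set c1 security.privileged=true
incus config set c1 security.nesting=true
incus config set c1 linux.kernel_modules=overlay,br_netfilter,ip_tables,ip6_tables,nf_nat
incus config set c1 raw.lxc="lxc.apparmor.profile=unconfined
lxc.mount.auto=proc:rw sys:rw"
# and hand /dev/kmsg into the container
incus config device add c1 kmsg unix-char source=/dev/kmsg path=/dev/kmsg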

It looks doable in a technical sense, but it’s a huge pain to push all of that through Incus without getting any of the (sandboxing/security) benefits in return. So the general wisdom is to just use a VM. (No, I didn’t try k3s’ experimental rootless mode.)

Running k3s in an Incus VM

I started with a fresh VM and could reuse my now much simplified Ansible tasks for setting up k3s. But my happiness got cut short by the k3s service spamming the journal with useless

level=error msg="failed to ping connection: disk I/O error: no such device"

error messages. After removing all the directories and files from /var/lib/rancher/k3s and starting the server by hand I got:

Error: preparing server: failed to bootstrap cluster data: creating storage endpoint: failed to create driver for default endpoint: setup db: disk I/O error: no such device

Some more mucking around with the k3s server config revealed a puzzling, but more useful

failed to mount overlay: invalid argument.

Looking at what dmesg had to say I got:

overlayfs: upper fs does not support tmpfile.
overlayfs: failed to set xattr on upper
overlayfs: …falling back to redirect_dir=nofollow.
overlayfs: …falling back to uuid=null.
overlayfs: …falling back to xino=off.
overlayfs: try mounting with 'userxattr' option
overlayfs: upper fs missing required features.
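(In hindsight, the failure can be reproduced without k3s at all by attempting a bare overlayfs mount on the affected directory; the paths below are just illustrative:)

# manually stack an overlayfs on top of the suspect filesystem
mkdir -p /var/lib/rancher/k3s/ovl-test/{lower,upper,work,merged}
mount -t overlay overlay \
  -o lowerdir=/var/lib/rancher/k3s/ovl-test/lower,upperdir=/var/lib/rancher/k3s/ovl-test/upper,workdir=/var/lib/rancher/k3s/ovl-test/work \
  /var/lib/rancher/k3s/ovl-test/merged
# fails with "invalid argument" while dmesg logs the messages above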

Long story short: it turns out that in my eagerness I had mounted a custom Incus volume as k3s’ data directory (/var/lib/rancher/k3s). This being a VM (instead of a container), Incus shares the volume with the guest via virtiofs. And it turns out overlayfs doesn’t like being put on top of virtiofs (or NFS, it seems). 😵‍💫

But good news: it was fixable, although hacky. By grepping for “virtiofsd” processes I found out that Incus vendors its own virtiofsd binary in /opt/incus/bin/virtiofsd. It already runs it with the --posix-acl option, which implies the required --xattr option. But Incus currently doesn’t offer any way to configure virtiofsd further. 😓 So the only solution (suggested by the main Incus maintainer, no less) is to replace /opt/incus/bin/virtiofsd with a shim script that calls the real virtiofsd binary with the additional --modcaps=+sys_admin option. Basically something silly like:

#!/usr/bin/bash
# shim: call the real (renamed) virtiofsd with the extra capability overlayfs needs
exec /opt/incus/bin/virtiofsd.orig --modcaps=+sys_admin "$@"
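Wiring it in is just a rename plus dropping the shim in place (and presumably redoing it whenever an Incus upgrade replaces the binary):

# move the real binary aside and install the shim in its place
mv /opt/incus/bin/virtiofsd /opt/incus/bin/virtiofsd.orig
cat > /opt/incus/bin/virtiofsd <<'EOF'
#!/usr/bin/bash
exec /opt/incus/bin/virtiofsd.orig --modcaps=+sys_admin "$@"
EOF
chmod +x /opt/incus/bin/virtiofsd
# restart the VM afterwards so a fresh virtiofsd gets started through the shim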

Yeah also, “try mounting with ‘userxattr’ option” was not helpful and sent me down the wrong path. 🤐

All in all … all these stumbling blocks ate my weekend. Which was kind of in line with my prejudices against Kubernetes. 😅