Configuring Custom Ingress Ports With Cilium

This is just a note for anyone looking for a solution to this problem.

While it’s extremely easy with Kubernetes’ newer Gateway API via listeners on Gateway resources, the Ingress resources seem to have always been meant to be used with (global?) default ports … mainly 80 and 443 for HTTP and HTTPS respectively. So every Ingress controller seems to have its own “side-channel solution” that leverages some resource metadata to convey this information. For Cilium this happens to be the sparsely documented ingress.cilium.io/host-listener-port annotation.
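
For comparison, with the Gateway API a custom port is just another listener on the Gateway resource. A rough sketch (names are placeholders and it assumes Cilium’s Gateway API support is enabled):

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ...
  namespace: ...
spec:
  gatewayClassName: cilium
  listeners:
  - name: custom-http    # listener on a non-default port
    port: 1234
    protocol: HTTP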

So your Ingress definition should look something like this (note that the annotation value has to be a string):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ...
  namespace: ...
  annotations:
    ingress.cilium.io/host-listener-port: "1234"
spec:
  ingressClassName: cilium
  rules:
  - http: ...

Fixing Dracut for Encrypted ZFS on Root on Ubuntu 25.10

I just upgraded from Ubuntu 25.04 to 25.10 … well, it was more of a reinstall really. Because I knew the new release switched the initrd-related tooling to Dracut, I tried to understand all the changes from a test installation in a VM. Still, I somehow broke Dracut’s ability to unlock my encrypted ZFS on root setup automatically.

Looking at journalctl it claimed it couldn’t find the key file:

dracut-pre-mount[940]: Warning: ZFS: Key /run/keystore/rpool/system.key for rpool/enc hasn't appeared. Trying anyway.
[...]
dracut-pre-mount[1001]: Key load error: Failed to open key material file: No such file or directory
[...]
systemd[1]: Mounting sysroot.mount - /sysroot...
mount[1007]: zfs_mount_at() failed: encryption key not loaded
systemd[1]: sysroot.mount: Mount process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: sysroot.mount: Failed with result 'exit-code'.
systemd[1]: Failed to mount sysroot.mount - /sysroot.
systemd[1]: Dependency failed for initrd-root-fs.target - Initrd Root File System.

All I could do was mount the keystore manually in the emergency console:

systemd-cryptsetup attach keystore-rpool /dev/zvol/rpool/keystore
mkdir -p /run/keystore/rpool
mount /dev/mapper/keystore-rpool /run/keystore/rpool

After pressing Ctrl-d Systemd continued booting as if everything was OK. This worked, but was HUGELY annoying, especially considering it was also using an English keyboard mapping. 🤬

After I was done setting up my desktop I took the time to investigate the issue. I compared everything between my real system and the freshly set-up VM. After comparing the system startup plots (exported with systemd-analyze plot > plot.svg) I noticed that the systemd-ask-password.service started quite late on my real system (after I had manually mounted the keystore). I knew there was a bug report for teaching Dracut Ubuntu’s ZFS on root encryption scheme (i.e. putting the root ZFS dataset’s encryption keys in a LUKS container on a Zvol (rpool/keystore)). So I looked at the actual patch and tried to walk through how it would behave on my system. There I noticed that the script assumes the ZFS encryption root to be the same as the Zpool’s root dataset (e.g. rpool). 😯 I moved away from this kind of setup years ago because it makes restoring from a backup quite cumbersome. Instead I was using a sub-dataset for the encrypted data (e.g. rpool/enc), which broke the logic that assumed the encryption root to only contain the pool name. 🤦‍♂️

Long story short: the following patch determines the pool name of the encryption root before trying to open and mount the LUKS keystore:

--- zfs-load-key.sh.orig        2025-10-16 20:44:47.955349974 +0200
+++ zfs-load-key.sh     2025-10-16 20:55:00.229000464 +0200
@@ -54,9 +54,11 @@
     [ "$(zfs get -Ho value keystatus "${ENCRYPTIONROOT}")" = "unavailable" ] || return 0

     KEYLOCATION="$(zfs get -Ho value keylocation "${ENCRYPTIONROOT}")"
+    # `ENCRYPTIONROOT` might not be the root dataset (e.g. `rpool/enc`)
+    ENCRYPTIONROOT_POOL="$(echo "${ENCRYPTIONROOT}" | cut -d/ -f1)"
     case "$KEYLOCATION" in
-        "file:///run/keystore/${ENCRYPTIONROOT}/"*)
-            _open_and_mount_luks_keystore "${ENCRYPTIONROOT}" "${KEYLOCATION#file://}"
+        "file:///run/keystore/${ENCRYPTIONROOT_POOL}/"*)
+            _open_and_mount_luks_keystore "${ENCRYPTIONROOT_POOL}" "${KEYLOCATION#file://}"
             ;;
     esac
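
For reference, applying it on my machine looked roughly like this (the module path is where the OpenZFS Dracut module lives on my install and the patch file name is just whatever you saved the diff as; double-check both locally):

cd /usr/lib/dracut/modules.d/90zfs   # Dracut ZFS module path (assumed, verify locally)
patch -b < /root/zfs-load-key.patch  # -b keeps a backup of the original script
dracut --force                       # rebuild the initrd for the running kernel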

🎉

Running k3s on Incus

I know the pain of managing a bunch of services on my own. Even when relying on Incus, Podman and Systemd as much as possible, held together by lots of Ansible duct tape, it’s still arduous. I convinced myself change was in order: … something something Kubernetes.

My main criteria are basically:

  • Must be able to run on a single node (for now), i.e. no clustered services or databases (k3s looks like it fits the bill)
  • Services must be able to be deployed with public service definitions (Helm FTW)
  • These service definitions must lend themselves to be version controlled
  • All relevant data directories must live on separate ZFS datasets

Running k3s in an Incus container

You can run k3s in an Incus container, but it gets increasingly difficult. There are reports of people getting it to run, but even the public LXD/LXC definitions for microk8s or k3s are quite old (as of 2025-08, 3 and 6 years respectively) and blast HUGE holes in the sandbox. ☹️ K3s “requires” access to /dev/kmsg, several places in /proc and /sys, as well as modprobing several kernel modules (it checks for access to them and spams the logs with warnings and errors). 😶
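
Just to illustrate the kind of loosening those definitions apply, here’s a rough sketch from memory (instance name and module list are made up, and this is not a recommendation):

incus config set k3s security.privileged=true security.nesting=true
incus config set k3s linux.kernel_modules=br_netfilter,overlay,ip_vs,nf_conntrack
# expose the host's /dev/kmsg inside the container
incus config device add k3s kmsg unix-char source=/dev/kmsg path=/dev/kmsg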

It looks doable in a technical sense, but it’s a huge pain having to go through Incus without getting any of its (sandboxing/security) benefits. So the general wisdom is to just use a VM. (No, I didn’t try k3s’ experimental rootless mode.)

Running k3s in an Incus VM

I started with a fresh VM and could reuse my now much simplified Ansible tasks for setting up k3s. But my happiness was cut short by the k3s service spamming the journal with useless

level=error msg="failed to ping connection: disk I/O error: no such device"

error messages. After removing all the directories and files from /var/lib/rancher/k3s and starting the server by hand I got:

Error: preparing server: failed to bootstrap cluster data: creating storage endpoint: failed to create driver for default endpoint: setup db: disk I/O error: no such device

Some more mucking around with the k3s server config revealed a puzzling but more useful

failed to mount overlay: invalid argument.

Looking at what dmesg had to say I got:

overlayfs: upper fs does not support tmpfile.
overlayfs: failed to set xattr on upper
overlayfs: …falling back to redirect_dir=nofollow.
overlayfs: …falling back to uuid=null.
overlayfs: …falling back to xino=off.
overlayfs: try mounting with 'userxattr' option
overlayfs: upper fs missing required features.

Long story short: it turns out that in my eagerness I had mounted a custom Incus volume as k3s’ data directory (/var/lib/rancher/k3s). This being a VM (instead of a container), the volume was mounted using the virtiofs protocol. And it turns out overlayfs doesn’t like being put on top of virtiofs devices (or NFS, it seems). 😵‍💫 But good news: it was fixable, although hacky. By grepping for “virtiofsd” processes I found out that Incus vendors its own virtiofsd binary in /opt/incus/bin/virtiofsd. It already runs it with the --posix-acl option which implies the required --xattr option. But Incus currently doesn’t support any way of configuring virtiofsd. 😓 So the only solution (suggested by the main Incus maintainer no less) is to replace /opt/incus/bin/virtiofsd with a shim script calling the real virtiofsd binary with the additional --modcaps=+sys_admin option. Basically something silly like:

#!/usr/bin/bash
exec /opt/incus/bin/virtiofsd.orig --modcaps=+sys_admin "$@"
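
Putting the shim in place boils down to something like this on the Incus host (be aware that the next Incus update will probably put the original binary back):

# move the real binary aside and drop the shim script in its place
mv /opt/incus/bin/virtiofsd /opt/incus/bin/virtiofsd.orig
# ... save the script above as /opt/incus/bin/virtiofsd ...
chmod +x /opt/incus/bin/virtiofsd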

Yeah also, “try mounting with ‘userxattr’ option” was not helpful and sent me down the wrong path. 🤐

All in all … all these stumbling blocks ate my weekend. Which was kind of in line with my prejudices against Kubernetes. 😅

Force VLC to use VA-API for Hardware Accelerated Video Decoding

tl;dr: add the --avcodec-hw=vaapi option on the command line or to the Exec option in the .desktop file.

It’s stupid, I know, but it’s been bothering me for a while now. Especially when I want to watch conference talks that are available in the AV1 video format (e.g. FOSDEM), the video always seems to hang (showing an old frame indefinitely), have broken decoding (alternating weirdly colored blocks), de-sync from audio or just stay black. This has been happening on both Intel and AMD integrated graphics for years now, and I somehow decided that VDPAU must be the culprit. I also definitely know that VA-API works on my machines, because I’ve tested it … so that can’t be the problem. 😇

VLC (generally) supports both the VA-API (mainly for Intel and AMD hardware) and VDPAU (for Nvidia) libraries for hardware-accelerated video decoding, but on my Ubuntu desktop machines it prefers VDPAU on any hardware for some reason. The settings don’t even show support for anything else: “Simple Preferences” -> “Input/Codecs” tab -> “Hardware-accelerated decoding” only shows the “Automatic”, “VDPAU video decoder” and “Disable” options. 😵‍💫 The only “variant” that correctly uses VA-API automatically on my machines is the VLC Flatpak. I checked which backend was used via the “Modules Tree” tab in the “Tools” -> “Messages” dialog: it will show something “vdpau”-related in the “video output” subtree (or not).

The Solution

So I dug through weird forums and tried different suggested options, many of which weren’t even supported, until I found the right incantation: --avcodec-hw=vaapi .
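
You can verify it from a terminal before touching any .desktop files (the file name is obviously just a placeholder); the “Modules Tree” check mentioned above should then no longer show the “vdpau” entries:

vlc --avcodec-hw=vaapi some-av1-talk.mkv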

Fixing the .desktop file

To make your desktop always call VLC with the right options we have to edit VLC’s so-called .desktop file. Mine was located in /usr/share/applications/vlc.desktop.
The relevant line looked like this: Exec=/usr/bin/vlc --started-from-file %U .

Copy the vlc.desktop file to the $HOME/.local/share/applications/ directory if you want to change the behavior only for your user. Alternatively, if you have root privileges, you can update vlc.desktop for all users of that machine by copying it to /usr/local/share/applications/ instead. NOTE: you may need to create those directories first.

Then edit the Exec= line to look like this: Exec=/usr/bin/vlc --avcodec-hw=vaapi --started-from-file %U

Or if you want to just copy the relevant commands:

# create the directory for personal .desktop files
mkdir -p $HOME/.local/share/applications/

# copy the original vlc.desktop to this directory
cp /usr/share/applications/vlc.desktop $HOME/.local/share/applications/

# edit the copied vlc.desktop by changing its "Exec" option to include the relevant VLC option
desktop-file-edit --set-key=Exec --set-value="/usr/bin/vlc --avcodec-hw=vaapi --started-from-file %U" $HOME/.local/share/applications/vlc.desktop

Enjoy!

Dropbear vs SSH woes between Ubuntu LTSes

Imagine you’re using dropbear-initrd to log in to a server during boot for unlocking the hard disk encryption and you’re greeted with the following error after a reboot:

root@server: Permission denied (publickey).

🤨😓😖 You start to sweat … this looks like extra work you didn’t need right now. You try to remember: were there any updates lately that could have messed up the initrd? … Deep breath, let’s take it slowly.

First try to get SSH to spit out more details:

$ ssh -vvv server-boot
[...]
debug1: Next authentication method: publickey
debug1: Offering public key: /home/user/.ssh/... RSA SHA256:... explicit
debug1: send_pubkey_test: no mutual signature algorithm
[...]

That doesn’t seem right … this worked before. The server is running Ubuntu 20.04 LTS and I’ve just upgraded my work machine to Ubuntu 22.04 LTS. I know that Dropbear doesn’t support ed25519 keys (at least not the version on the server), which is why I still use RSA keys for that. 🤔

Time to ask the Internet, but all the posts with a “no mutual signature algorithm” error message are years old … but most of them were circling around the SSH client having deprecated old key types (namely DSA keys). 😯

Can it be that RSA keys have also been deprecated? 😱 … I’ve recently upgraded my client machine 😶 … no way! … well, yes! That was exactly the problem.

Allowing RSA keys in the connection settings for that server allowed me to log in again 😎:

PubkeyAcceptedKeyTypes +ssh-rsa
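
In ~/.ssh/config terms that means a host entry roughly like this (host name and key path are placeholders for my actual values):

Host server-boot
    HostName server.example.org
    User root
    IdentityFile ~/.ssh/id_rsa_server
    PubkeyAcceptedKeyTypes +ssh-rsa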

But this whole detour unnecessarily wasted an hour of my life. 😓

Finding out what rules to add to /etc/gai.conf

I had a weird problem. I was using network prefix translation (NPT) for routing IPv6 packets to the Internet through a VPN. But while all devices could connect to the IPv6 Internet without problems, they never did so on their own. They always preferred IPv4 connections when they had the choice. 🤨

Problem Background

I knew that modern network stacks are generally configured to prefer IPv6 over IPv4, but I was baffled why mine wouldn’t use IPv6 since it was clear that connections to the Internet worked. A little bit of tinkering revealed that IPv4 connections to the Internet are preferred whenever a device has no global IPv6 addresses. Because I was relying on NPT my devices only had ULAs.

It turns out that the wise people making standards decided that when a device has only private IPv4 addresses and ULAs, IPv4 connections are preferred for the Internet, under the assumption that private IPv4 addresses are definitely NATed while IPv6 ULAs probably (definitely?) won’t be. 😯

Finding a Solution

A quick search for anything related to IPv4 vs. IPv6 priority leads exclusively to questions and posts where the authors want to always have IPv4 prioritized over IPv6. Although my case was the opposite, one thing became clear: it had to do with modifying /etc/gai.conf. It’s the file for configuring RFC 6724 (i.e. “Default Address Selection for Internet Protocol Version 6 (IPv6)”).

This allows influencing the selection algorithm, which seemed to be exactly what was needed to solve my problem. If you open the file it even has commented-out lines for solving the “always prefer IPv4 over IPv6” problem. The inverse case was not so simple: among the precedence rules there was no address range for ULAs, and adding one for my specific ULA didn’t solve the problem either:

[...]
# precedence  <mask>   <value>
#    Add another rule to the RFC 3484 precedence table.  See section 2.1
#    and 10.3 in RFC 3484.  The default is:
#
precedence  ::1/128       50
precedence  ::/0          40
precedence  2002::/16     30
precedence ::/96          20
precedence ::ffff:0:0/96  10
precedence fd00:11:22::/48  45  # <-- added my ULA, but didn't help
#
#    For sites which prefer IPv4 connections change the last line to
#
#precedence ::ffff:0:0/96  100
[...]

Manual Algorithm

I tried to take a step back and find out whether a precedence setting was even the right change. I bit the bullet and evaluated the “Source Address Selection” algorithm from RFC 6724 (Section 5) by hand.

Candidate Addresses

My candidate addresses for the destination (this server) were:

2a01:4f8:c2c:8101::1   # native IPv6
::ffff:116.203.176.52  # native IPv4 (mapped to IPv6 for this algorithm)

My candidate source addresses (from my WLAN connection) looked like:

fd00:11:22::aa  # global dynamic noprefixroute
fd00:11:22::bb  # global temporary dynamic
fd00:11:22::cc  # global mngtmpaddr noprefixroute
::ffff:10.0.0.50  # private IPv4 (mapped to IPv6 for this algorithm)

The Rules

Rule 1: Prefer same address.

skip, source and destination are not the same.

Rule 2: Prefer appropriate scope.

skip, connection is unicast, so no multicast.

Rule 3: Avoid deprecated addresses.

skip, no deprecated source addresses used.

Rule 4: Prefer home addresses.

skip? I was not sure what a “home address” is supposed to be, but it seems related to mobile networks. I just assumed all source addresses were “home” addresses.

Rule 5: Prefer outgoing interface.

skip, I was already only considering the outgoing interface here.

Rule 5.5: Prefer addresses in a prefix advertised by the next-hop.

skip? all next-hops were fe80::<router's EUI64>.

Rule 6: Prefer matching label.

We get the default labels from /etc/gai.conf (mine from Ubuntu 21.10):

[…]
#label ::1/128       0  # loopback address
#label ::/0          1  # IPv6, unless matched by other rules
#label 2002::/16     2  # 6to4 tunnels
#label ::/96         3  # IPv4-compatible addresses (deprecated)
#label ::ffff:0:0/96 4  # IPv4-mapped addresses
#label fec0::/10     5  # site-local addresses (deprecated)
#label fc00::/7      6  # ULAs
#label 2001:0::/32   7  # Teredo tunnels
[…]

Then the destination addresses would get labeled like this:

2a01:4f8:c2c:8101::1   # label 1
::ffff:116.203.176.52  # label 4

And the source addresses would get labeled like this:

fd00:11:22::aa  # label 6
fd00:11:22::bb  # label 6
fd00:11:22::cc  # label 6
::ffff:10.0.0.50  # label 4

Here we see why the IPv4 addresses get selected: their destination and source addresses have the same label while the IPv6 addresses don’t. 😔

So I could add a new label entry for my ULA with the same label as the ::/0 addresses (i.e. 1 here). I didn’t want to change the label on the fc00::/7 line in order not to change the behavior for all ULAs; I only wanted a special rule for my specific network. So I uncommented the default label lines and added the following line:

label fd00:11:22::/48 1  # my ULA prefix and the same label as ::/0

Reboot (it may not even be strictly necessary) … and lo and behold, it worked! 😎
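
A quick way to check the resulting order without rebooting anything is getent, which goes through getaddrinfo() and therefore honors /etc/gai.conf (any dual-stacked host name will do):

getent ahosts example.com
# the IPv6 addresses should now be listed before the IPv4 ones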

Conclusion

While this worked I really felt uneasy messing with the address prioritization, especially if you take into account that I’d have to do this on every device. And this is on top of the already esoteric setup for using NPT. 🙈

I later found out that when the VPN goes down (i.e. there’s no IPv6 Internet connectivity) it won’t (actually can’t) fall back to IPv4 for the Internet connection. 😓

Routing My Way Out With IPv6: NPT6

This article is part of a series about how I built a WireGuard tunnel for getting IPv6 connectivity. The last step was to figure out how to route packets from devices in my private network through the WireGuard tunnel to the Internet.

I’ve explored three different methods for solving this:

I’ll try to show how to set each of them up and try to convey their pros and cons.

TL;DR

You should always consider IPv6-PD first!

Consider any other option only if:

  • you have a “weird” setup or want to support an esoteric use case (like I do e.g. with too many local subnets for too long a public prefix)
  • you’re willing to set up, debug and maintain a somewhat experimental configuration
  • you more or less understand the tradeoffs
  • all of the above!

Starting Point

I’ll assume the following has been set up:

  • default OpenWRT networks named “LAN”, “WAN”, “WAN6”
  • default OpenWRT firewall rules
  • an ULA prefix of fd00:11:22::/48
  • an IPv6 WireGuard tunnel with the endpoint on our OpenWRT router being 2000:30:40:50::2
  • the remote WireGuard tunnel endpoint forwards the whole 2000:30:40:50::/64 to our OpenWRT router

NPTv6 (Network Prefix Translation)

This is probably the least publicly documented method of all. Discussions and tutorials are scarce. Its use cases are esoteric and probably better solved in other ways. But it’s the most interesting method, because it’s conceptually even simpler than NAT6, but only viable with IPv6 addresses.

NPT basically means that you swap the prefix part of an IPv6 address with another, same-sized prefix. It exploits two facts about IPv6 addresses. The first one is that prefixes can be at most 64 bits long (i.e. at most a /64), leaving the interface identifier (i.e. the second half of the IPv6 address) untouched. The second one is that interface identifiers are basically random (they’re either derived from (globally) unique MAC addresses or they’re randomly generated temporary addresses) and hence won’t clash. This allows for stateless, NAT-like behavior (i.e. without the “expensive” tracking of NATed connections).
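
Conceptually the whole mapping boils down to one prefix-rewriting rule per network pair. With the iptables-based firewall used below (the NETMAP target comes from the iptables-mod-nat-extra package) an outgoing mapping would look roughly like this; it’s an illustration using this article’s example prefixes, not a line taken from my actual script:

# rewrite the ULA prefix of outgoing LAN packets to the public WireGuard prefix
ip6tables -t nat -A POSTROUTING -s fd00:11:22::/64 -o wan6_wg -j NETMAP --to 2000:30:40:50::/64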

You can configure NPT to be bidirectional which maps prefixes in both directions basically creating a 1:1 mapping. If you’re doing this you’re probably better off just announcing multiple prefixes to your devices or creating custom routes to bridge two networks.

An even more esoteric use case is when you create one or more unidirectional mappings allowing you to multiplex multiple networks onto one. This works great, because the interface identifiers are basically random and can be left as they are. In my tests having one-way mappings still managed to route the responses correctly although strictly speaking it shouldn’t. 🤨 I suspect that this worked accidentally, because of the standard firewall “conntrack” (i.e. connection tracking) rules. 🤔

Setup

On the “Network > Interfaces” page edit the “WAN6” interface and set “Protocol” to “unmanaged”. And make sure the “WAN6_WG” addresses say 2000:30:40:50::2/64 (note the /64 at the end).

Update 2025-05-10: it seems there’s a hint of NPTv6 in the OpenWRT Wiki. The script provided there uses the current nft(ables) tools for configuring the firewall. Note that the simple example only configures the outgoing LAN -> WAN mapping. There’s also the “symmetric dynamic” variant for a true 1:1 mapping.

Similar to the NAT6 case we need a custom firewall script. You have to install the iptables-mod-nat-extra package. I’ve created a Gist for the script. Save it to /etc/firewall.npt6 and instruct the firewall to run it when being reloaded by adding the following section to /etc/config/firewall:

config include 'npt6'
        option path '/etc/firewall.npt6'
        option reload '1'

After restarting the firewall with /etc/init.d/firewall restart you should be good to go.

As described at the top of the firewall script you can configure mappings by adding npt6 config sections to /etc/config/firewall (sorry, there’s no UI for this 😅).

config npt6
        option src_interface 'lan'
        option dest_interface 'wan6_wg'
        option output '1'

This is the minimal setup. Just add more sections for more source and destination network pairs. Run /etc/init.d/firewall reload to apply new configurations.

In my tests all devices could connect to IPv6 services on the internet without problems. But devices always preferred IPv4 connections over IPv6 ones. This was tricky to solve, but it comes down to this:

When a domain has both public/global IPv4 and IPv6 addresses your device tries to determine how to connect to it. It’ll generally prefer IPv6 over IPv4, but it’s actually more complicated than that. All IPv4 addresses are treated as global during address selection, while IPv6 addresses are classified differently depending on the prefix. From the outside it ends up looking something like this: global IPv6 addresses > IPv4 addresses > IPv6 ULAs.

Since we don’t have a global IPv6 address, IPv4 is preferred assuming that private IPv4 addresses will generally be NATed to the Internet while ULA prefixes won’t. 😞

This was tricky to solve. All related questions on the Internet revolved around how to prefer IPv4 over IPv6, and that solution was not simply invertible. It boils down to changing /etc/gai.conf to classify your ULA prefix the same as global ones. You can accomplish this by adding a label line for your ULA (i.e. fd00:11:22::/48 here) and giving it the same label (i.e. the last number on the line) as the line with ::/0 (i.e. 1 in my case). Finding this out took me a week of trial and error until I resigned myself to doing the address selection algorithm by hand. 😅

Update 2025-05-10: to circumvent this the OpenWRT Wiki actually suggests using an unassigned prefix as your ULA (e.g. the example simply replaces the first hex digit “f” with a “d”). 😶 “Unassigned” here means nobody is using it for anything official yet. Such addresses count as globally routable and will hence be preferred over IPv4. But this circumvents protections that prevent leaking ULA traffic to the Internet. IIRC there’s talk about changing the RFCs regarding the IPv4 > ULA ordering. Nonetheless this is a grey area: it works as long as the prefix stays unassigned … and as long as you protect yourself against leaking this “fake ULA” traffic. 🙈
…. back to the old content:

I had to uncomment all the label configuration lines and then add my custom line, because once you add a custom rule all the default ones will be reset. So to add a rule on top of the default ones I ended up with the following (note that I only added the last line, all others were part of Ubuntu’s default configuration):

...
label ::1/128       0
label ::/0          1
label 2002::/16     2
label ::/96         3
label ::ffff:0:0/96 4
label fec0::/10     5
label fc00::/7      6
label 2001:0::/32   7
label fd00:11:22::/48 1
...

I only added my network’s ULA to preserve the default behavior as much as possible and only make an exception for my specific network. So this will change the behavior only when the device has addresses from this specific ULA.

You have to restart applications for them to pick up changes to /etc/gai.conf.

Pros

  • multiple internal networks can be multiplexed onto one upstream network (even when the upstream prefix is too long (e.g. for IPv6-PD))
  • internal devices are not directly reachable from the Internet (with unidirectional mapping) (this is not a replacement for a firewall!)

Cons

  • very little documentation and online resources
  • for your devices to use IPv6 by default you have to muck with address selection preferences on each and every one of them
  • it doesn’t fall back to IPv4 when the IPv6 tunnel goes down

Routing My Way Out With IPv6: NAT6

This article is part of a series about how I built a WireGuard tunnel for getting IPv6 connectivity. The last step was to figure out how to route packets from devices in my private network through the WireGuard tunnel to the Internet.

I’ve explored three different methods for solving this:

I’ll try to show how to set each of them up and try to convey their pros and cons.

TL;DR

You should always consider IPv6-PD first!

Consider any other option only if:

  • you have a “weird” setup or want to support an esoteric use case (like I do e.g. with too many local subnets for too long a public prefix)
  • you’re willing to set up, debug and maintain a somewhat experimental configuration
  • you more or less understand the tradeoffs
  • all of the above!

Starting Point

I’ll assume the following has been set up:

  • default OpenWRT networks named “LAN”, “WAN”, “WAN6”
  • default OpenWRT firewall rules
  • an IPv6 WireGuard tunnel with the endpoint on our OpenWRT router being 2000:30:40:50::2
  • the remote WireGuard tunnel endpoint forwards the whole 2000:30:40:50::/64 to our OpenWRT router

NAT6 a.k.a. Masquerading

NAT6 is basically a rehash of “the old way” of using NAT for the IPv4 Internet. The router/gateway replaces the internal source (i.e. sender) address of an outgoing packet with its own public address. The router makes note of the original sender and recipient to be able to reverse the process when an answer comes back: when it receives a response packet it forwards it to the actual recipient by replacing the destination address with the internal address of the original sender.

Setup

On the “Network > Interfaces” page edit the “WAN6” interface and set “Protocol” to “unmanaged”. Then follow the OpenWRT NAT6 and IPv6 Masquerading documentation.
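
For orientation, that documentation boils down to enabling IPv6 masquerading options on the “wan” zone in /etc/config/firewall, roughly like this (option names as I remember them from that guide; treat the wiki as authoritative):

config zone
        option name 'wan'
        # (existing options unchanged)
        option masq6 '1'
        option masq6_privacy '1'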

In my tests the masq6_privacy setting had no impact. All outgoing packets always had a source address of 2000:30:40:50::2 (i.e. the router’s WireGuard interface address). 😕 It seems using WireGuard interferes with OpenWRT’s ability to generate temporary addresses for the interface. No amount of fiddling (e.g. setting addresses with /64, setting suffixes to “random”, setting prefix filters, setting a delegatable prefix but disabling delegation, … I really got desperate) with the “WAN6_WG” interface’s settings, or creating a “WAN6” alias and doing the same to it, made the temporary addresses work. 😵 You could manually add addresses with random suffixes to the WireGuard interface … maybe even write a script that changes them periodically … 😅😐😞

Pros

  • multiple internal networks can be multiplexed onto one upstream network (even when the upstream prefix is too long (e.g. for IPv6-PD))
  • internal devices are not directly reachable from the Internet (this is not a replacement for a firewall!)

Cons

  • connections can only be started from internal devices
  • router needs to keep state for every connection
  • router needs to touch/manipulate every packet
  • you only have one static external address, because it seems temporary addresses (i.e. IPv6 privacy extensions) don’t work with WireGuard connections

Routing My Way Out With IPv6: IPv6-PD

Since I wrote my blog post about using a WireGuard tunnel for getting IPv6 connectivity there was one thing that was bugging me immensely: having to use NAT for IPv6. 😓

My initial howto used a private network for the WireGuard VPN, which led to having two translation steps: one when entering the WireGuard VPN and one when exiting. I later realized I could use the global /64 assigned to the cloud VPN endpoint for the WireGuard VPN itself and just forward all traffic to and from it on the cloud VPN endpoint. This was easy, because the address mapping was 1:1 (cloud server’s /64 ⇔ WireGuard VPN’s /64). This eliminated one translation.

The second translation (i.e. the one on the OpenWRT router) is more difficult to remove. The crux of the matter is that I only have a /64 for the tunnel which means I either have to select which internal network gets to be connected or I have to “multiplex” multiple internal /64s onto one upstream /64.

I’ve explored three different methods for solving this:

I’ll try to show how to set each of them up and try to convey their pros and cons.

TL;DR

You should always consider IPv6-PD first!

Consider any other option only if:

  • you have a “weird” setup or want to support an esoteric use case (like I do e.g. with too many local subnets for too long a public prefix)
  • you’re willing to set up, debug and maintain a somewhat experimental configuration
  • you more or less understand the tradeoffs
  • all of the above!

Starting Point

I’ll assume the following has been set up:

  • default OpenWRT networks named “LAN”, “WAN”, “WAN6”
  • default OpenWRT firewall rules
  • an IPv6 WireGuard tunnel with the endpoint on our OpenWRT router being 2000:30:40:50::2
  • the remote WireGuard tunnel endpoint forwards the whole 2000:30:40:50::/64 to our OpenWRT router

IPv6-PD (Prefix Delegation)

IPv6-PD (i.e. prefix delegation) is basically the built-in mechanism for sharing global IPv6 addresses with internal networks. As the word “delegation” implies, you give away portions (sub-prefixes) to “downstream” networks. This also implies that if you only get a long global prefix you may not be able to partition it for delegating it to (all) your internal networks. Normally you’ll get something like a /56 from your ISP, but I only have a /64, because I “hijacked” a cloud server’s addresses.

Setup

On the “Network > Interfaces” page edit the “WAN6” interface:

OpenWRT – WAN6 Interface: General Settings (for IPv6-PD)
  • Set “Protocol” to “static”.
  • Set “Device” to “Alias interface: @wan6_wg”.
  • Set “IPv6 routed prefix” to the WireGuard public prefix (i.e. 2000:30:40:50::/64 in our case).
  • Make sure that in the “Advanced Settings” tab “Delegate IPv6 prefixes” is enabled.
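
The same settings expressed in /etc/config/network look roughly like this (a sketch; the UCI option names are my mapping of the LuCI labels above, so double-check against what LuCI actually writes):

config interface 'wan6'
        option proto 'static'
        option device '@wan6_wg'
        list ip6prefix '2000:30:40:50::/64'
        option delegate '1'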

After saving and applying those settings the “Network > Interfaces” page should look like the following screenshot.

OpenWRT – Resulting WAN6 Interfaces in Overview (for IPv6-PD)

Make sure that your WireGuard interface has its address set to 2000:30:40:50::2/128. If you have something like 2000:30:40:50::2/64 (note the /64) set as described in an earlier version of the previous how-to you’ll get the same /64 route for both the “WAN6” and the “WAN6_WG” interfaces. In my case packets from the “LAN” network would reach the Internet correctly, but the responses would arrive at the OpenWRT router’s WireGuard interface but never turn up in the “LAN” network. The following screenshot shows a working configuration on the “Status > Routes” page.

OpenWRT – Active IPv6 Routes (for IPv6-PD)

Pros

  • simple, built-in
  • devices can “directly” connect to the Internet (“no NAT, no nothin”; see below)
  • middleware boxes don’t need to keep state (it’s all just routing)
  • few things can break

Cons

  • your global prefix needs to be short enough to be useful (i.e. shorter than /64)
  • internal devices have a routable address reachable from the Internet (i.e. your firewall should deny incoming connections from the Internet by default)

My First Container-based Postgres Upgrade

Yesterday I did my first container-based PostgreSQL version upgrade. In my case the upgrade was from version 13 to 14. In hindsight I was quite naïve. 😅

I was always wondering why distros kept separate data directories for different versions … now I know: you can’t do in-place upgrades with PostgreSQL. You need separate data directories as well as both versions’ binaries. 😵 Distros have their mechanisms for it, but in the container world you’re kind of on your own.

Well not really … it’s just different. I found there’s a project that specializes in exactly the tooling part of the upgrade. After a little trial and error (see below) it went quite smoothly.

Procedure

In the end it came down to the following steps:

  1. Stop the old postgres container.
  2. Backup the old data directory (yay ZFS snapshots).
  3. Create the new postgres container (with a new data directory; in my case via Ansible)
  4. Stop the new postgres container.
  5. Run the upgrade. (see command below)
  6. Start the new postgres container.
  7. Run vacuumdb as suggested at the end of the upgrade. (see command below)

The Upgrade Command

I used the tianon/postgres-upgrade container for the upgrade. Since my directory layout didn’t follow the “default” structure I had to mount each version’s data directory separately.

docker run --rm \
-e POSTGRES_INITDB_ARGS="--no-locale --encoding=UTF8" \
-v /tmp/pg_upgrade:/var/lib/postgresql \
-v /tank/containers/postgres-13:/var/lib/postgresql/13/data \
-v /tank/containers/postgres-14:/var/lib/postgresql/14/data \
tianon/postgres-upgrade:13-to-14

I set the POSTGRES_INITDB_ARGS to what I used when creating the new Postgres container’s data directory. This shouldn’t be necessary because we let the new Postgres container initialize the data directory. (see below) I left it in just to be safe. 🤷

I explicitly mounted something to the container’s /var/lib/postgresql directory in order to have access to the upgrade logs which are mentioned in error messages. (see below)

The Vacuumdb Command

Upgrading finishes with a suggestion like:

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /usr/lib/postgresql/14/bin/vacuumdb --all --analyze-in-stages

We can run the command in the new Postgres container:

docker exec postgres vacuumdb -U postgres --all --analyze-in-stages

We use the postgres user, because we didn’t specify a POSTGRES_USER when creating the database container.

Pitfalls

When you’re not using the default directory structure there’re some pitfalls. Mounting the two versions’ data directories separately is easy enough … it says so in the README. It’s what it doesn’t say that makes it more difficult than necessary. 😞

Errors When Initializing the New Data Directory

The first error I encountered was that the new data directory got initialized with the default initdb options, whereas I had used an optimized, cargo-culted incantation which was incompatible (in my case --no-locale --encoding=UTF8). The upgrade failed with the following error:

lc_collate values for database “postgres” do not match: old “C”, new “en_US.utf8”

So making sure I created the new database container (with the correct initdb args) before running the migration fixed this.

Extra Mounts for the Upgrade

What really tripped me up was that when something failed it said to look into a specific log file which I couldn’t find. 🤨 I had to also mount something to the container’s /var/lib/postgresql directory, which then contained all the upgrade log files. 😔

This also solved another of my problems where the upgrade tool wanted to start an instance of the Postgres database, but failed because it couldn’t find a specific socket … which also happens to be located in the directory mentioned above.

Authentication Errors After Upgrade

After the upgrade I had a lot of authentication errors although none of the passwords should have changed.

FATAL: password authentication failed for user “nextcloud”

After digging through the internet and comparing both the old and new data directories it looked like the password hashing method had changed from md5 to scram-sha-256 (in pg_hba.conf the line saying host all all all scram-sha-256). 😑 Just re-setting the passwords (i.e. setting the same passwords again) via ALTER ROLE foo WITH PASSWORD '...'; for all users fixed the issue. 🤐
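
For the record, re-setting a password can be done directly in the new container, roughly like this (role name taken from the error above, the password value is obviously a placeholder):

docker exec -it postgres psql -U postgres \
  -c "ALTER ROLE nextcloud WITH PASSWORD 'the-same-password-as-before';"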