A remarkably sober analysis of what problem systemd solves for Linux … at a BSD conference of all places.
Tag: Admin
Moving LXD Containers From One Pool to Another
When I started playing with LXD I just accepted the default storage configuration which creates an image file and uses that to initialize a ZFS pool. Since I’m using ZFS as my main file system this seemed silly as LXD can use an existing dataset as a source for a storage pool. So I wanted to migrate my existing containers to the new storage pool.
Although others seemed to have the same problem, there was no ready answer. Digging through the documentation I finally found out that the lxc move command has a -s option … and I had an idea. Here’s what I came up with …
Preparation
First we create the dataset on the existing ZFS pool and add it to LXD as a storage pool.
sudo zfs create -o mountpoint=none mypool/lxd
lxc storage create pool2 zfs source=mypool/lxd
lxc storage list should show something like this now:
+-------+-------------+--------+--------------------+---------+
| NAME  | DESCRIPTION | DRIVER | SOURCE             | USED BY |
+-------+-------------+--------+--------------------+---------+
| pool1 |             | zfs    | /path/to/pool1.img | 2       |
+-------+-------------+--------+--------------------+---------+
| pool2 |             | zfs    | mypool/lxd         | 0       |
+-------+-------------+--------+--------------------+---------+
pool1 is the old pool backed by the image file and is currently used by some containers, as can be seen in the “Used By” column. pool2 has been added but is not used by any containers yet.
Moving
We now try to move our containers to pool2.
# move container to pool2
lxc move some_container some_container-moved -s=pool2
# rename container back for sanity ;)
lxc move some_container-moved some_container
We can check with lxc storage list whether we succeeded.
+-------+-------------+--------+--------------------+---------+
| NAME  | DESCRIPTION | DRIVER | SOURCE             | USED BY |
+-------+-------------+--------+--------------------+---------+
| pool1 |             | zfs    | /path/to/pool1.img | 1       |
+-------+-------------+--------+--------------------+---------+
| pool2 |             | zfs    | mypool/lxd         | 1       |
+-------+-------------+--------+--------------------+---------+
Indeed, pool2 is being used now. Just to be sure, we check that zfs list -r mypool/lxd also reflects this.
NAME                                  USED   AVAIL  REFER  MOUNTPOINT
mypool/lxd/containers                 1,08G  92,9G  24K    none
mypool/lxd/containers/some_container  1,08G  92,9G  704M   /var/snap/lxd/common/lxd/storage-pools/pool2/containers/some_container
mypool/lxd/custom                     24K    92,9G  24K    none
mypool/lxd/deleted                    24K    92,9G  24K    none
mypool/lxd/images                     24K    92,9G  24K    none
mypool/lxd/snapshots                  24K    92,9G  24K    none
Awesome!
⚠ Note that this only moves the container, but not the LXC image it was cloned off of.
We can repeat this until all containers we care about are moved over to pool2.
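Repeating the two move commands by hand gets tedious with more than a handful of containers. The following is a minimal sketch of a loop around them; the function names and the example container names are made up for illustration, and it assumes the containers can be moved in their current state (you may have to stop them first).

```shell
#!/bin/sh
# Sketch: move several containers to the new pool, one after another.
# LXC can be overridden (e.g. LXC=echo) to print the commands instead of running them.
LXC=${LXC:-lxc}
TARGET_POOL=pool2

move_to_pool() {
    name=$1
    # move the container to the new pool under a temporary name ...
    $LXC move "$name" "$name-moved" -s "$TARGET_POOL" || return 1
    # ... and rename it back
    $LXC move "$name-moved" "$name"
}

move_all() {
    for c in "$@"; do
        move_to_pool "$c" || { echo "failed to move $c" >&2; return 1; }
    done
}

# usage (hypothetical container names):
#   move_all web db cache
```

The loop stops at the first failure so that a half-moved container (still carrying the -moved suffix) is easy to spot.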
Cleanup
To prevent new containers from using pool1 we have to edit the default profile.
# change devices.root.pool to pool2
lxc profile edit default
Finally, when we’re happy with the migration and have verified that everything works as expected, we can remove pool1.
lxc storage rm pool1
Backup And Restore Your Android Phone With ADB (And rsync)
Based on my previous scripts and inspired by two blog posts that I stumbled upon I tackled the “backup all my apps, settings and data” problem for my Android devices again. The “new” solutions both use rsync instead of adb pull for file transfers. They both use ADB to start an rsync daemon on the device, forward its ports to localhost and run rsync against it from your host.
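To make the mechanism concrete, here is a rough sketch of the three steps (forward a port, start the daemon, run rsync against it). It is not taken from either blog post: the port numbers, the config file path and the rsync module name "sdcard" are assumptions, and it presumes a suitable rsyncd.conf already exists on the phone.

```shell
#!/bin/sh
# Sketch only. Ports, config path and module name are assumptions for illustration.
DEVICE_PORT=1873                   # port the rsync daemon listens on (on the phone)
LOCAL_PORT=6010                    # forwarded port on the host
CONF=/data/local/tmp/rsyncd.conf   # hypothetical daemon config on the phone
RUN=${RUN:-}                       # set RUN=echo to print the commands instead of running them

backup_sdcard() {
    dest=$1
    # 1. forward a local port to the daemon's port on the device
    $RUN adb forward "tcp:$LOCAL_PORT" "tcp:$DEVICE_PORT"
    # 2. start the rsync daemon on the device
    $RUN adb shell "rsync --daemon --port=$DEVICE_PORT --config=$CONF"
    # 3. pull the files through the forwarded port
    $RUN rsync -av "rsync://localhost:$LOCAL_PORT/sdcard/" "$dest/"
    # 4. remove the port forward again
    $RUN adb forward --remove "tcp:$LOCAL_PORT"
}

# usage:
#   backup_sdcard "$HOME/phone-backup"
```

Because everything goes through an unprivileged port and adb forward, nothing in this sketch needs root on the host.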
Simon’s solution assumes your phone has rsync already (e.g. because you run CyanogenMod) and can become root via adb root. It clones all files from the phone (minus /dev, /sys, /proc etc.). He also configures udev to start the backup automatically when the phone is plugged in.
pts solves the setup without necessarily becoming root. He also has a way of providing an rsync binary to phones that don’t have one (e.g. when running OxygenOS), and a few tricks for debugging the rsync daemon setup on the phone.
I’ve tried to combine both methods. My approach doesn’t require adb or rsync to be run as root. It’ll use the system’s rsync when available or temporarily upload and use a backup one extracted from Cyanogen OS (for my OnePlus One). Android won’t allow you to chmod +x a file uploaded to /sdcard, but in /data/local/tmp it works.
The scripts currently only back up and restore your /sdcard directory. Assuming you’re also using something like Titanium Backup you’ll be able to back up and restore all your apps, settings and data. To reduce the amount of data to copy, they use rsync filters to exclude caches and other files that you definitely don’t want synced (.DS_Store files, anyone?).
At the moment there’s one caveat: I had to disable restoring modification times (i.e. use --no-times) because of an obnoxious error (they are backed up fine, only restoring is the problem):
mkstemp “…” (in root) failed: Operation not permitted (1)
Additionally if you’re on the paranoid side you can also build your own rsync for Android to use as the backup binary.
The code and a ton of documentation can be found on GitHub. Comments and suggestions are welcome.
Build Rsync for Android Yourself
To build rsync for Android you’ll need to have the Android NDK installed already.
Then clone the rsync for Android source (e.g. from LineageOS) …
git clone https://github.com/LineageOS/android_external_rsync.git
cd android_external_rsync
# checkout the most recent branch
git checkout cm-14.1
… create the missing jni/Application.mk build file (e.g. from this Gist) and adapt it to your case …
… and start the build with
export NDK_PROJECT_PATH=$(pwd)
ndk-build -d rsync
You’ll find your self-built rsync in obj/local/*/rsync.
Update 2017-10-06:
- Updated sources from CyanogenMod to LineageOS.
- Added links to Gist and Android NDK docs
- Updated steps to work with up-to-date setups
If you get something like the following warnings and errors …
[...]
./flist.c:454:16: warning: implicit declaration of function 'major' is invalid in C99
[-Wimplicit-function-declaration]
if ((uint32)major(rdev) == rdev_major)
^
./flist.c:458:41: warning: implicit declaration of function 'minor' is invalid in C99
[-Wimplicit-function-declaration]
if (protocol_version < 30 && (uint32)minor(rdev) <= 0xFFu)
^
./flist.c:467:11: warning: implicit declaration of function 'makedev' is invalid in C99
[-Wimplicit-function-declaration]
rdev = MAKEDEV(major(rdev), 0);
^
./rsync.h:446:36: note: expanded from macro 'MAKEDEV'
#define MAKEDEV(devmajor,devminor) makedev(devmajor,devminor)
^
3 warnings generated.
[...]
./flist.c:473: error: undefined reference to 'makedev'
./flist.c:454: error: undefined reference to 'major'
./flist.c:457: error: undefined reference to 'major'
./flist.c:458: error: undefined reference to 'minor'
./flist.c:467: error: undefined reference to 'major'
./flist.c:467: error: undefined reference to 'makedev'
./flist.c:617: error: undefined reference to 'major'
./flist.c:619: error: undefined reference to 'minor'
./flist.c:621: error: undefined reference to 'minor'
./flist.c:788: error: undefined reference to 'makedev'
./flist.c:869: error: undefined reference to 'makedev'
./flist.c:1027: error: undefined reference to 'minor'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [obj/local/armeabi-v7a/rsync] Error 1
… you probably need to update config.h and change /* #undef MAJOR_IN_SYSMACROS */ to #define MAJOR_IN_SYSMACROS 1.
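If you rebuild often, the config.h change can be scripted. The following is a small sketch using sed; the function name enable_sysmacros is made up, and it assumes the line appears in config.h exactly as quoted above.

```shell
#!/bin/sh
# Sketch: turn the commented-out #undef into a #define in the given config.h.
enable_sysmacros() {
    file=$1
    tmp=$(mktemp) || return 1
    # rewrite the one matching line, leave everything else untouched
    sed 's|^/\* #undef MAJOR_IN_SYSMACROS \*/$|#define MAJOR_IN_SYSMACROS 1|' "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# usage (run inside the cloned source directory):
#   enable_sysmacros config.h
```

Writing through a temporary file avoids relying on sed -i, whose syntax differs between GNU and BSD sed.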
Most Awesome Script Collection
CFSSL FTW
After reading how CloudFlare handles their PKI and that LetsEncrypt will use it I wanted to give CFSSL a shot.
Reading the project’s documentation doesn’t really help in building your own CA, but searching the Internet I found Fernando Barillas’ blog explaining how to create your own root certificate and how to create intermediate certificates from this.
I took it a step further and wrote a script that generates new certificates for several services with different intermediates and possibly different configurations (e.g. depending on your distro and services, certain ciphers such as those using ECC may not be supported).
I also streamlined generating service specific key, cert and chain files. 😀
Have a look at the full Gist or just the most interesting part:
You’ll still have to deploy them yourself.
Update 2016-10-04:
Fixed some issues with this Gist.
- Fixed a bug where intermediate CA certificates weren’t marked as CAs any more
- Updated the example CSRs and the script so it can now be run without errors
Update 2017-10-08:
- Cleaned up `renew-certs.sh` by extracting functions for generating root CA, intermediate CA and service keys.
A Service Monitor built with Polymer
I tried to build a service monitor having the following features:
- showing the reachability of HTTP servers
- plotting the amount of messages in a specific RabbitMQ queue
- plotting the amount of queues with specific prefixes
- showing the status of RabbitMQ queues i.e. how many messages are in there? are there any consumers? are they hung?
- showing the availability of certain Redis clients
Well, you can find the result on GitHub.
It uses two things I published before: polymer-flot and flot-sparklines. 😀
An example dashboard:
too long for Unix domain socket
If you’re an Ansible user and encounter the following error:
unix_listener: "..." too long for Unix domain socket
you need to set the control_path option in your ansible.cfg file to tell SSH to use shorter path names for the control socket. You should have a look at the ssh_config(5) man page (under ControlPath) for a list of possible substitutions.
I chose:
control_path = %(directory)s/ssh-%%C
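The error comes from the size limit on Unix domain socket paths. As a rough illustration, the sketch below checks whether a given path would fit. The limit of 108 bytes is the size of sun_path on Linux and is an assumption for other systems (BSD and macOS use 104); the function name is made up. ssh may also append a temporary suffix while creating the socket, so leave some headroom.

```shell
#!/bin/sh
# Sketch: does a control socket path fit into a Unix domain socket address?
# 108 is sun_path's size on Linux; override SUN_PATH_MAX for other systems.
SUN_PATH_MAX=${SUN_PATH_MAX:-108}

socket_path_fits() {
    path=$1
    # the path plus its terminating NUL byte must fit into sun_path
    # (${#path} counts characters, which equals bytes for plain ASCII paths)
    [ "${#path}" -lt "$SUN_PATH_MAX" ]
}

# usage:
#   socket_path_fits "$HOME/.ansible/cp/ssh-0123456789abcdef" && echo "fits"
```

This is also why %C helps: it expands to a fixed-length hash of the connection parameters instead of spelling out host, port and user.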
Windows Update FUBAR
Microsoft accidentally published a weird “test” patch via Windows Update … world-wide!
Update 2015-10-05: And now they also seem to use an untrusted certificate (German). o.O
Making RabbitMQ Recover from (a)Mnesia
In the company I work for we’re using RabbitMQ to offload non-time-critical processing of tasks. To be able to recover in case RabbitMQ goes down, our queues are durable and all our messages are marked as persistent. We generally have a very low number of messages in flight at any moment in time. There’s just one queue with a decent amount of them: the “failed messages” dump.
The Problem
It so happens that after a botched update to the most recent version of RabbitMQ (3.5.3 at the time) our admins had to nuke the server and install it from scratch. They had made a backup of RabbitMQ’s Mnesia database and I was tasked to recover the messages from it.
This is the story of how I did it.
Since our RabbitMQ was configured to persist all the messages this should be generally possible. Surely I wouldn’t be the first one to attempt this.
Looking through the Internet it seems there’s no way of ex/importing a node’s configuration if it’s not running. I couldn’t find any documentation on how to import a Mnesia backup into a new node or extract data from it into a usable form.
The Idea
My idea was to set up a virtual machine (running Debian Wheezy) with RabbitMQ and then to somehow make it read/recover and run the broken server’s database.
Below I’ll use the following placeholders:
- RABBITMQ_MNESIA_BASE will be /var/lib/rabbitmq/mnesia on Debian (see RabbitMQ’s file locations)
- RABBITMQ_MNESIA_DIR is just $RABBITMQ_MNESIA_BASE/$RABBITMQ_NODENAME
- BROKEN_NODENAME the $RABBITMQ_NODENAME of the broken server we have backups from
- BROKEN_HOST the hostname of said server
One more thing before we start: if I say “fix permissions” below I mean
sudo chown -R rabbitmq:rabbitmq $RABBITMQ_MNESIA_DIR
1st Try
My first try, just copying the broken node’s Mnesia files to the VM’s $RABBITMQ_MNESIA_DIR, failed. The files contained node names that RabbitMQ tried to reach but that were unreachable from the VM.
Error description:
{could_not_start,rabbit,
{{failed_to_cluster_with,
['$BROKEN_NODENAME'],
"Mnesia could not connect to any nodes."},
{rabbit,start,[normal,[]]}}}
So I tried to be a little bit more picky on what I copied.
First I had to reset $RABBITMQ_MNESIA_DIR by deleting it and having RabbitMQ recreate it. (I needed to do this way too many times.)
sudo service rabbitmq-server stop
rm -r $RABBITMQ_MNESIA_DIR
sudo service rabbitmq-server start
With RabbitMQ stopped, I tried to feed it the broken server’s data in piecemeal fashion. This time I only copied the rabbit_*.[DCD,DCL] files and restarted RabbitMQ.

Looking at the web management interface there were all the queues we were missing, but they were “down” and clicking on them told you
The object you clicked on was not found; it may have been deleted on the server.
Copying any more data didn’t solve the issue. So this was a dead end.
2nd Try
So I thought why doesn’t the RabbitMQ in the VM pretend to be the exact same node as on the broken server?
So I created a /etc/rabbitmq/rabbitmq-env.conf with NODENAME=$BROKEN_NODENAME in there.
I copied the backup to $RABBITMQ_MNESIA_DIR (now with the new node name) and fixed the permissions.
Now starting RabbitMQ failed with
ERROR: epmd error for host $BROKEN_HOST: nxdomain (non-existing domain)
I edited /etc/hosts to add $BROKEN_HOST to the list of names that resolve to 127.0.0.1.
Now restarting RabbitMQ failed with yet another error:
Error description:
{could_not_start,rabbit,
{{schema_integrity_check_failed,
[{table_attributes_mismatch,rabbit_queue,
[name,durable,auto_delete,exclusive_owner,arguments,pid,
slave_pids,sync_slave_pids,recoverable_slaves,policy,
gm_pids,decorators,state],
[name,durable,auto_delete,exclusive_owner,arguments,pid,
slave_pids,sync_slave_pids,mirror_nodes,policy]}]},
{rabbit,start,[normal,[]]}}}
Now what? Why don’t I try to give it the Mnesia files piece by piece again?
- Reset $RABBITMQ_MNESIA_DIR
- Stop RabbitMQ
- Copy rabbit_* files in again and fix their permissions
- Start RabbitMQ
All our queues were back and all their configuration seemed OK as well. But we still didn’t have our messages back yet.

Solution
So I tried to copy more and more files over from the backup, repeating the above steps. I finally reached my goal after copying rabbit_*, msg_store_*, queues and recovery.dets. After fixing their permissions and starting RabbitMQ, it had all the queues restored with all the messages in them.
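For reference, the selective copy can be sketched as a small shell function. The function name and the backup path in the usage comment are placeholders; stopping RabbitMQ, fixing permissions and starting it again are the same steps as described above.

```shell
#!/bin/sh
# Sketch: copy only the files that turned out to be needed from the backup
# into the (freshly reset) Mnesia directory of the node.
restore_mnesia_files() {
    backup_dir=$1
    mnesia_dir=$2
    mkdir -p "$mnesia_dir" || return 1
    for item in "$backup_dir"/rabbit_* "$backup_dir"/msg_store_* \
                "$backup_dir"/queues "$backup_dir"/recovery.dets; do
        # skip patterns that matched nothing
        [ -e "$item" ] || continue
        # -R because msg_store_* and queues are directories
        cp -R "$item" "$mnesia_dir"/ || return 1
    done
}

# usage (as a user allowed to write to $RABBITMQ_MNESIA_DIR, with RabbitMQ stopped):
#   restore_mnesia_files /path/to/backup "$RABBITMQ_MNESIA_DIR"
#   sudo chown -R rabbitmq:rabbitmq "$RABBITMQ_MNESIA_DIR"
#   sudo service rabbitmq-server start
```

Everything else in the backup is deliberately left behind, since copying the complete directory is what failed in the first try.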

Now I could use ordinary methods to extract all the messages. I dumped and examined them, and they looked OK. When I published the recovered messages to the new server I was pretty euphoric.
