Here’s a hard truth: regardless of the boilerplate in your privacy policy, none of your users have given informed consent to being tracked. Every tracker and beacon script on your web site increases the privacy cost they pay for transacting with you, chipping away at the trust in the relationship.
Because
The all too typical corporate big data strategy boils down to three steps:
Write down all the data
???
Profit
This never makes sense. You can’t expect the value of data to just appear out of thin air. Data isn’t fissile material. It doesn’t spontaneously reach critical mass and start producing insights.
Which leads to the realization:
Think this way for a while, and you notice a key factor: old data usually isn’t very interesting. You’ll be much more interested in what your users are doing right now than what they were doing a year ago. Sure, spotting trends in historical data might be cool, but in all likelihood it isn’t actionable. Today’s data is.
So
Actionable insight is an asset. Data is a liability. And old data is a non-performing loan.
It also has a discussion on how difficult it is to do a “psychological” assessment of toddlers’ behavior and derive concrete explanations or conclusions from them.
In the company I work for we’re using RabbitMQ to offload non-timecritical processing of tasks. To be able to recover in case RabbitMQ goes down our queues are durable and all our messages are marked as persistent. We generally have a very low number of messages in flight at any moment in time. There’s just one queue with a decent amount of them: the “failed messages” dump.
The Problem
It so happens that after a botched update to the most recent version of RabbitMQ (3.5.3 at the time) our admins had to nuke the server and install it from scratch. They had made a backup of RabbitMQ’s Mnesia database and I was tasked to recover the messages from it.
This is the story of how I did it.
Since our RabbitMQ was configured to persist all the messages this should be generally possible. Surely I wouldn’t be the first one to attempt this. ?
Looking through the Internet it seems there’s no way of ex/importing a node’s configuration if it’s not running. I couldn’t find any documentation on how to import a Mnesia backup into a new node or extract data from it into a usable form. ?
The Idea
My idea was to setup a virtual machine (running DebianWheezy) with RabbitMQ and then to somehow make it read/recover and run the broken server’s database.
In the following you’ll see the following placeholders:
My first try was to just copy the broken node’s Mnesia files to the VM’s $RABBITMQ_MNESIA_DIR failed. The files contained node names that RabbitMQ tried to reach but were unreachable from the VM.
Error description:
{could_not_start,rabbit,
{{failed_to_cluster_with,
['$BROKEN_NODENAME'],
"Mnesia could not connect to any nodes."},
{rabbit,start,[normal,[]]}}}
So I tried to be a little bit more picky on what I copied.
First I had to reset $RABBITMQ_MNESIA_DIR by deleting it and have RabbitMQ recreate it. (I needed to do this way too many times ?)
sudo service rabbitmq-server stop
rm -r $RABBITMQ_MNESIA_DIR
sudo service rabbitmq-server start
Stopping RabbitMQ I tried to feed it the broken server’s data in piecemeal fashion. This time I only copied the
rabbit_*.[DCD,DCL]
and restarted RabbitMQ.
RabbitMQ’s management interface lists all the queues, but it thinks the node they’re on is “down”
Looking at the web management interface there were all the queues we were missing, but they were “down” and clicking on them told you
The object you clicked on was not found; it may have been deleted on the server.
Copying any more data didn’t solve the issue. So this was a dead end. ?
2nd Try
So I thought why doesn’t the RabbitMQ in the VM pretend to be the exact same node as on the broken server?
So I created a
/etc/rabbitmq/rabbitmq-env.conf
with
NODENAME=$BROKEN_NODENAME
in there.
I copied the backup to $RABBITMQ_MNESIA_DIR (now with the new node name) and fixed the permissions.
Now starting RabbitMQ failed with
ERROR: epmd error for host $BROKEN_HOST: nxdomain (non-existing domain)
I edited
/etc/hosts
to add $BROKEN_HOST to the list of names that resolve to 127.0.0.1.
Now restarting RabbitMQ failed with yet another error:
Now what? Why don’t I try to give it the Mnesia files piece by piece again?
Reset $RABBITMQ_MNESIA_DIR
Stop RabbitMQ
Copy
rabbit_*
files in again and fix their permissions
Start RabbitMQ
All our queues were back and all their configuration seemed OK as well. But we still didn’t have our messages back yet.
The queues have been restored, but they have no messages in them
Solution
So I tried to copy more and more files over from the backup repeating the above steps. I finally reached my goal after copying
rabbit_*
,
msg_store_*
,
queues
and
recovery.dets
. Fixing their permissions and starting RabbitMQ it had all the queues restored with all the messages in them. ?
Queues and messages restored
Now I could use ordinary methods to extract all the messages. Dumping all the messages and examining them they looked OK. Publishing the recovered messages to the new server I was pretty euphoric. ?
I often tell myself and my students: medicine is the most human of all the sciences that is stuck with the least human of all the experiments: and that is the randomized trial.
Randomization doesn’t exist because doctors are malign or medicine is nasty it exists precisely for the utterly opposite reason: because we hope too much.
We’re so hopeful, that we want things to work so badly-especially against cancer-we want things to work so badly that we’ll trick ourselves to believing that they’re working.
And there’s nothing as toxic or as lethal as that trick: the trick of hope.
— Dr. Siddhartha Mukherjee in PBS’ Cancer: The Emperor of All Maladies
What’s so new about RC cars? Oh … they’re the real, “off-the-shelve” ones, those where people ride in … and they can be hijacked by anybody with the same cellular provider … over the Internet, no direct access required … o.O … WIRED has a piece.
The Intercept has a lengthy article on what we know on the NSA’s speech recognition capabilities. Putting aside the actual capabilities, just the fact that anything you say will be recorded, stored and may be accessed at any point in the future only protected by “policy” sends shivers down my spine.
“People still aren’t realizing quite the magnitude that the problem could get to,” Raj said. “And it’s not just surveillance,” he said. “People are using voice services all the time. And where does the voice go? It’s sitting somewhere. It’s going somewhere. You’re living on trust.” He added: “Right now I don’t think you can trust anybody.”
Also when all the voice data gets automatically transcribed, made keyword-searchable, flagged and presented to agents as “potentially interesting” there’s basically no way of producing any sort of indication for suspicion other than pointing at a black box and mumbling something vaguely resembling “correlation.”
“When the NSA identifies someone as ‘interesting’ based on contemporary NLP [Natural Language Processing] methods, it might be that there is no human-understandable explanation as to why beyond: ‘his corpus of discourse resembles those of others whom we thought interesting’; or the conceptual opposite: ‘his discourse looks or sounds different from most people’s.'”
If you use Python‘s Bottle micro-framework there’ll be a time where you’ll want to add custom plugins. To get a better feeling on what code gets executed when, I created a minimal Bottle app with a test plugin that logs what code gets executed. I uesed it to test both global and route-specific plugins.
When Python loads the module you’ll see that the plugins’
__init__()
and
setup()
methods will be called immediately when they are installed on the app or applied to the route. This happens in the order they appear in the code. Then the app is started.
The first time a route is called Bottle executes the plugins’
apply()
methods. This happens in “reversed order” of installation (which makes sense for a nested callback chain). This means first the route-specific plugins get applied then the global ones. Their result is cached, i.e. only the inner/wrapped function is executed from here on out.
Then for every request the
apply()
method’s inner function is executed. This happens in the “original” order again.
Below you can see the code and example logs for two requests. You can also clone the Gist and do your own experiments.