What Happens If Someone Reimplements Your Open Source Software with LLMs And Relicenses It?

It seems there’s a new use case for LLMs: letting it reimplement open-source software in order to re-license the result.

Armin Ronacher has some interesting thoughts on the licensing consequences:

What I think is more interesting about this question is the consequences of where we are. Copyleft code like the GPL heavily depends on copyrights and friction to enforce it. But because it’s fundamentally in the open, with or without tests, you can trivially rewrite it these days.

There are huge consequences to this. When the cost of generating code goes down that much, and we can re-implement it from test suites alone, what does that mean for the future of software? Will we see a lot of software re-emerging under more permissive licenses? Will we see a lot of proprietary software re-emerging as open source? Will we see a lot of software re-emerging as proprietary?

For me personally, what is more interesting is that we might not even be able to copyright these creations at all. A court still might rule that all AI-generated code is in the public domain, because there was not enough human input in it. That’s quite possible, though probably not very likely.

In the GPL case, though, I think it warms up some old fights about copyleft vs permissive licenses that we have not seen in a long time. It probably does not feel great to have one’s work rewritten with a Clanker and one’s authorship eradicated. Unlike the Ship of Theseus, though, this seems more clear-cut: if you throw away all code and start from scratch, even if the end result behaves the same, it’s a new ship. It only continues to carry the name. Which may be another argument for why authors should hold on to trademarks rather than rely on licenses and contract law.

Simon Willison has a timeline of how it came to the “LLM rewrite” of chardet summarizes the arguments of those involved. There’s also a comment by one of the authors of the GPLv3 and LGPLv3 Richard Fontana:

[…] FWIW, IANDBL, TINLA, etc., I don’t currently see any basis for concluding that chardet 7.0.0 is required to be released under the LGPL. AFAIK no one including Mark Pilgrim has identified persistence of copyrightable expressive material from earlier versions in 7.0.0 nor has anyone articulated some viable alternate theory of license violation. I don’t think I personally would have used the MIT license here, even if I somehow rewrote everything from scratch without the use of AI in a way that didn’t implicate obligations flowing from earlier versions of chardet, but that’s irrelevant.

Moving Passwords From Firefox (Lockwise) to KeePassXC

Update 2021-03-27: Good news, it seems KeepassXC 2.7.0 learned to import Firefox timestamps. 😀

Go to about:logins in Firefox.

Click on the three dots in the top right corner, select “Export Logins…” and save your passwords info a CSV file (e.g. exported_firefox_logins.csv).

There’s one complication: Firefox will save dates as Unix timestamps (with millisecond resolution) which KeePassXC doesn’t understand, so we’ll have to help it out.

Save the following script into a file (e.g. firefox_csv_date_fix.py).

#!/usr/bin/env python3
#
# Author: Riyad Preukschas <riyad@informatik.uni-bremen.de>
# License: Mozilla Public License 2.0
# SPDX-License-Identifier: MPL-2.0


import csv
import sys

from datetime import datetime


def main():
    if len(sys.argv) != 2:
        print("Usage: {} <exported_firefox_logins.csv>".format(sys.argv[0]), file=sys.stderr)
        exit(1)

    csv_file_name = sys.argv[1]

    with open(csv_file_name, 'rt') as f:
        # field names will be determined from first row
        csv_reader = csv.DictReader(f)
        # read all rows in one go
        rows = list(csv_reader)

    # add new columns with Iso-formatted dates
    for row in rows:
        row['timeCreatedIso'] = datetime.fromtimestamp(int(row['timeCreated'])/1000).isoformat()
        row['timeLastUsedIso'] = datetime.fromtimestamp(int(row['timeLastUsed'])/1000).isoformat()
        row['timePasswordChangedIso'] = datetime.fromtimestamp(int(row['timePasswordChanged'])/1000).isoformat()

    # write out the updated rows
    csv_writer = csv.DictWriter(sys.stdout, fieldnames=rows[0].keys())
    csv_writer.writeheader()
    csv_writer.writerows(rows)


if __name__ == '__main__':
    main()

Call it giving it the path of the exported_firefox_logins.csv file and redirect the output into a new file:

python3 ./firefox_csv_date_fix.py ./exported_firefox_logins.csv > ./fixed_firefox_logins.csv

It will add new columns (named: old column name + “Iso”) with the dates converted to the ISO 8601 format KeePassXC wants.

We can now use the fixed_firefox_logins.csv file to import the logins into KeePassXC.

Select Database -> Import -> CSV File... from the menu.

It will ask you to create a new database. Just do it you can get the data into your default database later (e.g. call it firefox_logins.kdbx for now).

In the import form select “First line has field names” and match up KeePassXC’s fields with the columns from the CSV:

KeePassXC FieldFirefox CSV ColumnNotes
GroupNot Present
TitleurlFirefox doesn’t have a title field so we use the URL for it.
Usernameusername
Passwordpassword
URLurl
NoteshttpRealmI’m not sure how KeePassXC deals with HTTP authentication. There’s no special field for it, so I decided to keep it in the Notes field.
TOTPNot Present
IconNot Present
Last ModifiedtimePasswordChangedIsothe “Iso” part is important!
CreatedtimeCreatedIsothe “Iso” part is important!

Have a look at the preview at the bottom. Does the data look sane? 🤨

Sadly KeePassXC doesn’t let us import Firefox’s timeLastUsedIso field into its Accessed field. 😑

All your imported Logins will be inside the “Root” group. I’d suggest creating a new group (e.g. “Firefox Import”) and moving all imported Logins into it.

You can now open your default database and use the Database -> Merge From Database ... menu item, select the firefox_logins.kdbx file to include it’s contents into the current database. You’ll see a new “Firefox Imports” group show up.

You can now move the single entries into the appropriate groups if you want (e.g. “KeePassXC-Browser Passwords” if you use the browser extension).

Scaffolding for fetching and parsing emails from IMAP with Python

A friend asked me if I could help him write a Python script for fetching and processing data from emails in his mailbox … Well, the thing with emails is that they’re a pain to work with (in any form). So, I tried to help him out with a little scaffolding (also available as a Gist).

Bottle Plugin Lifecycle

If you use Python‘s Bottle micro-framework there’ll be a time where you’ll want to add custom plugins. To get a better feeling on what code gets executed when, I created a minimal Bottle app with a test plugin that logs what code gets executed. I uesed it to test both global and route-specific plugins.

When Python loads the module you’ll see that the plugins’

__init__()

and

setup()

methods will be called immediately when they are installed on the app or applied to the route. This happens in the order they appear in the code. Then the app is started.

The first time a route is called Bottle executes the plugins’

apply()

methods. This happens in “reversed order” of installation (which makes sense for a nested callback chain). This means first the route-specific plugins get applied then the global ones. Their result is cached, i.e. only the inner/wrapped function is executed from here on out.

Then for every request the

apply()

method’s inner function is executed. This happens in the “original” order again.

Below you can see the code and example logs for two requests. You can also clone the Gist and do your own experiments.

https://twitter.com/riyadpr/status/617681143538786304

Poetic APIs

During PyCon 2014 Erik Rose gave a very insightful talk about dos and don’ts of designing APIs. Towards the end he “gets meta” and groups all his points into categories drawing connections how different design goals influence each other. You see two main groups–”lingual” and “mathematical”–and he closes with this gem: 😀

This spotlights something that programming languages have over ordinary human languages. Programs are alive! They not only mean things when people read them, but they actually do things when run. So, very literally a program with carefully chosen symbols is poetry in motion.
— Erik Rose (PyCon 2014)

https://youtu.be/JQYnFyG7A8c

MagicDict

If you write software in Python you come to a point where you are testing a piece of code that expects a more or less elaborate dictionary as an argument to a function. As a good software developer we want that code properly tested but we want to use minimal fixtures to accomplish that.

So, I was looking for something that behaves like a dictionary, that you can give explicit return values for specific keys and that will give you some sort of a “default” return value when you try to access an “unknown” item (I don’t care what as long as there is no Exception raised (e.g.

KeyError

 )).

My first thought was “why not use MagicMock?” … it’s a useful tool in so many situations.

from mock import MagicMock
m = MagicMock(foo="bar")

But using MagicMock where dict is expected yields unexpected results.

>>> # this works as expected
>>> m.foo
'bar'
>>> # but this doesn't do what you'd expect
>>> m["foo"]
<MagicMock name='mock.__getitem__()' id='4396280016'>

First of all attribute and item access are treated differently. You setup MagicMock using key word arguments (i.e. “dict syntax”), but have to use attributes (i.e. “object syntax”) to access them.

Then I thought to yourself “why not mess with the magic methods?”

__getitem__

  and 

__getattr__

  expect the same arguments anyway. So this should work:

m = MagicMock(foo="bar")
m.__getitem__.side_effect = m.__getattr__

Well? …

>>> m.foo
'bar'
>>> m["foo"]
<MagicMock name='mock.foo' id='4554363920'>

… No!

By this time I thought “I can’t be the first to need this” and started searching in the docs and sure enough they provide an example for this case.

d = dict(foo="bar")

m = MagicMock()
m.__getitem__.side_effect = d.__getitem__

Does it work? …

>>> m["foo"]
'bar'
>>> m["bar"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../env/lib/python2.7/site-packages/mock.py", line 955, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File ".../env/lib/python2.7/site-packages/mock.py", line 1018, in _mock_call
    ret_val = effect(*args, **kwargs)
KeyError: 'bar'

Well, yes and no. It works as long as you only access those items that you have defined to be in the dictionary. If you try to access any “unknown” item you get a

KeyError

 .

After trying out different things the simplest answer to accomplish what I set out to do seems to be sub-classing defaultdict.

from collections import defaultdict

class MagicDict(defaultdict):
    def __missing__(self, key):
        result = self[key] = MagicDict()
        return result

And? …

>>> m["foo"]
'bar'
>>> m["bar"]
defaultdict(None, {})
>>> m.foo
Traceback (most recent call last):
&nbsp; File "<stdin>", line 1, in <module>
AttributeError: 'MagicDict' object has no attribute 'foo'

Indeed, it is. 😀

Well, not quite. There are still a few comfort features missing (e.g. a proper

__repr__

). The whole, improved and tested code can be found in this Gist: