What Happens If Someone Reimplements Your Open Source Software with LLMs And Relicenses It?

It seems there’s a new use case for LLMs: using them to reimplement open-source software in order to relicense the result.

Armin Ronacher has some interesting thoughts on the licensing consequences:

What I think is more interesting about this question is the consequences of where we are. Copyleft code like the GPL heavily depends on copyrights and friction to enforce it. But because it’s fundamentally in the open, with or without tests, you can trivially rewrite it these days.

There are huge consequences to this. When the cost of generating code goes down that much, and we can re-implement it from test suites alone, what does that mean for the future of software? Will we see a lot of software re-emerging under more permissive licenses? Will we see a lot of proprietary software re-emerging as open source? Will we see a lot of software re-emerging as proprietary?

For me personally, what is more interesting is that we might not even be able to copyright these creations at all. A court still might rule that all AI-generated code is in the public domain, because there was not enough human input in it. That’s quite possible, though probably not very likely.

In the GPL case, though, I think it warms up some old fights about copyleft vs permissive licenses that we have not seen in a long time. It probably does not feel great to have one’s work rewritten with a Clanker and one’s authorship eradicated. Unlike the Ship of Theseus, though, this seems more clear-cut: if you throw away all code and start from scratch, even if the end result behaves the same, it’s a new ship. It only continues to carry the name. Which may be another argument for why authors should hold on to trademarks rather than rely on licenses and contract law.

Simon Willison has a timeline of how the “LLM rewrite” of chardet came about and summarizes the arguments of those involved. There’s also a comment by Richard Fontana, one of the authors of the GPLv3 and LGPLv3:

[…] FWIW, IANDBL, TINLA, etc., I don’t currently see any basis for concluding that chardet 7.0.0 is required to be released under the LGPL. AFAIK no one including Mark Pilgrim has identified persistence of copyrightable expressive material from earlier versions in 7.0.0 nor has anyone articulated some viable alternate theory of license violation. I don’t think I personally would have used the MIT license here, even if I somehow rewrote everything from scratch without the use of AI in a way that didn’t implicate obligations flowing from earlier versions of chardet, but that’s irrelevant.

Bionic Duckweed

bionic duckweed, noun

An as-yet-non-existent innovation, hyped with the aim not to sell it or to invent it, but simply to stop or stall the actually-existing competition.

In its broader sense, bionic duckweed can be thought of as a sort of unobtainium that renders investment in present-day technologies pointless, unimaginative, and worst of all On The Wrong Side Of History. […] A sort of promissory note in reverse, forcing us into inaction today in the hope of wonders tomorrow.

from Bionic Duckweed: making the future the enemy of the present.

On Moltbook

Bruce Schneier has probably found the best and most succinct quotes to summarize Moltbook:

Many people have pointed out that a lot of the viral comments were in fact posted by people posing as bots. But even the bot-written posts are ultimately the result of people pulling the strings, more puppetry than autonomy.

But it also has a very dystopian outlook on what might follow:

The theory is simple: First, AI gets accessible enough that anyone can use it. Second, AI gets good enough that you can’t reliably tell what’s fake. Third, and this is the crisis point, regular people realize there’s nothing online they can trust. At that moment, the internet stops being useful for anything except entertainment.

H-Bomb: A Frank Lloyd Wright Typographic Mystery

This is a pointless but fun investigation of why some of the letters “H” above the entrance of Frank Lloyd Wright‘s Unity Temple church in Oak Park, near Chicago, are upside-down. The author tries to track down historical documents and pictures to reconstruct the history of when those letters were put up and maybe taken down … and to ultimately see how far back the mistake goes.

Current

Terry Godier has found a great metaphor for a feed reader: a current. It steps out of the shadow of mail clients and models feeds as currents with different velocities: items automatically drift by and fade away if unread. While moving away from traditional mail-like UI concepts, feeds are still presented in order (in contrast to the “curated” feeds of social media).

I like the idea and how far the metaphor carries and applies to all the technical and usability bits. It’ll take time to see if it really “holds water,” 😜 but I’m intrigued.

So You Want To Delegate ZFS Datasets to Containers

Since version 2.2, ZFS supports delegating datasets and their children to containers. This moves control of the datasets from the host into a container’s namespace (ZFS also calls such datasets “zoned”). But it’s never as easy as it sounds: as with everything involving containers, the shifting of user IDs plays weird tricks on you.

I recently tried experimenting with the ZFS delegation feature of Incus custom volumes. This allows Incus/LXD/LXC-style system containers to manage a sub-tree of ZFS datasets from inside the container. Everything is fine as long as you create the top dataset you want to delegate, delegate it to the container, and create all the necessary sub-datasets from inside the container. But things get weird when you have datasets created on the host that you want to move under the delegated dataset (e.g. zfs rename tank/some-where/some-data tank/incus/custom/default_c1-custom-volume/some-data).
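For context, the “happy path” looks roughly like this (a sketch: the pool name tank and the container name c1 are inferred from the dataset path in the example above, and zfs.delegate is the Incus storage-volume option that enables delegation):

```shell
# Sketch of the working setup: create and delegate the volume first,
# then create sub-datasets from inside the container.
# Names (pool "default"/"tank", container "c1", volume "c1-custom-volume")
# are assumptions based on the example dataset path above.

# Create a custom volume with ZFS delegation enabled (needs ZFS >= 2.2):
incus storage volume create default c1-custom-volume zfs.delegate=true

# Attach it to the container:
incus storage volume attach default c1-custom-volume c1 /data

# Inside the container, root can now manage sub-datasets of that volume:
incus exec c1 -- zfs create tank/incus/custom/default_c1-custom-volume/some-data
```

Datasets created this way from inside the container get IDs in the container’s mapped range from the start, which is why this path works without any extra steps.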

It basically boils down to:

Even root can’t change or write data into a dataset that was created on the host and then moved under a container’s delegated custom volume. Creating a new dataset from inside the container doesn’t have the same problem.

I felt like this was a serious shortcoming and would impede migration scenarios like mine, so I reported it as a bug … it turns out, I was holding it wrong. 😅

The Solution

To fix my situation and move externally created datasets into a zone, I needed to find the Hostid fields from the container’s volatile.idmap.current option (one for UIDs and one for GIDs; both were 1000000 in my case).
Then running chown -R 1000000:1000000 /mountpoint/of/the/dataset/to/be/moved on the host is where the magic lies. 😁
After moving the dataset on the host by running zfs unmount ..., zfs rename ..., and zfs set zoned=on ..., I was not only able to zfs mount it in the container; the IDs were now in the right range for the container to manage the data in it.
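Put together, the fix looks roughly like this on the host (a sketch using the example paths from above; the container name c1 and the base ID 1000000 are from my setup and will differ on yours):

```shell
# 1) Look up the container's base host UID/GID (the Hostid fields):
incus config get c1 volatile.idmap.current

# 2) Shift ownership of the dataset's files into the container's ID range
#    (both Hostids were 1000000 in my case):
chown -R 1000000:1000000 /mountpoint/of/the/dataset/to/be/moved

# 3) Move the dataset under the delegated volume and mark it zoned:
zfs unmount tank/some-where/some-data
zfs rename tank/some-where/some-data tank/incus/custom/default_c1-custom-volume/some-data
zfs set zoned=on tank/incus/custom/default_c1-custom-volume/some-data

# 4) From inside the container, mount it and manage it as usual:
incus exec c1 -- zfs mount tank/incus/custom/default_c1-custom-volume/some-data
```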

What Was Actually Achieved By LLMs Building A C-Compiler

Ars Technica has put into perspective what it means that LLMs “created” a C compiler by themselves. My favourite quotes:

It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: The specification is decades old and well-defined, comprehensive test suites already exist, and there’s a known good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.

[…] Even with all optimizations enabled, it produces less-efficient code than GCC running with all optimizations disabled. […]

Anthropic describes the compiler as a “clean-room implementation” because the agents had no Internet access during development. But that framing is somewhat misleading. The underlying model was trained on enormous quantities of publicly available source code, almost certainly including GCC, Clang, and numerous smaller C compilers. In traditional software development, “clean room” specifically means the implementers have never seen the original code. By that standard, this isn’t one. […]

“It was rather a brute force attempt to decompress fuzzily stored knowledge contained within the network,”

None of this should obscure what the project actually demonstrates. A year ago, no language model could have produced anything close to a functional multi-architecture compiler, even with this kind of babysitting and an unlimited budget. The methodology of parallel agents coordinating through Git with minimal human supervision is novel, and the engineering tricks Carlini developed to keep the agents productive (context-aware test output, time-boxing, the GCC oracle for parallelization) could potentially represent useful contributions to the wider use of agentic software development tools.