What Happens If Someone Reimplements Your Open Source Software with LLMs And Relicenses It?

It seems there’s a new use case for LLMs: using them to reimplement open-source software in order to relicense the result.

Armin Ronacher has some interesting thoughts on the licensing consequences:

What I think is more interesting about this question is the consequences of where we are. Copyleft code like the GPL heavily depends on copyrights and friction to enforce it. But because it’s fundamentally in the open, with or without tests, you can trivially rewrite it these days.

There are huge consequences to this. When the cost of generating code goes down that much, and we can re-implement it from test suites alone, what does that mean for the future of software? Will we see a lot of software re-emerging under more permissive licenses? Will we see a lot of proprietary software re-emerging as open source? Will we see a lot of open-source software re-emerging as proprietary?

For me personally, what is more interesting is that we might not even be able to copyright these creations at all. A court might still rule that all AI-generated code is in the public domain, because there was not enough human input in it. That’s conceivable, though probably not very likely.

In the GPL case, though, I think it reignites some old fights about copyleft vs. permissive licenses that we haven’t seen in a long time. It probably does not feel great to have one’s work rewritten with a Clanker and one’s authorship eradicated. Unlike the Ship of Theseus, though, this seems more clear-cut: if you throw away all code and start from scratch, even if the end result behaves the same, it’s a new ship. It only continues to carry the name. Which may be another argument for why authors should hold on to trademarks rather than rely on licenses and contract law.

Simon Willison has a timeline of how the “LLM rewrite” of chardet came about and summarizes the arguments of those involved. There’s also a comment by Richard Fontana, one of the authors of the GPLv3 and LGPLv3:

[…] FWIW, IANDBL, TINLA, etc., I don’t currently see any basis for concluding that chardet 7.0.0 is required to be released under the LGPL. AFAIK no one including Mark Pilgrim has identified persistence of copyrightable expressive material from earlier versions in 7.0.0 nor has anyone articulated some viable alternate theory of license violation. I don’t think I personally would have used the MIT license here, even if I somehow rewrote everything from scratch without the use of AI in a way that didn’t implicate obligations flowing from earlier versions of chardet, but that’s irrelevant.

Bionic Duckweed

bionic duckweed, noun

An as-yet-non-existent innovation, hyped with the aim not to sell it or to invent it, but simply to stall or put a stop to the actually-existing competition.

In its broader sense, bionic duckweed can be thought of as a sort of unobtainium that renders investment in present-day technologies pointless, unimaginative, and worst of all On The Wrong Side Of History. […] A sort of promissory note in reverse, forcing us into inaction today in the hope of wonders tomorrow.

from Bionic Duckweed: making the future the enemy of the present.

On Moltbook

Bruce Schneier has probably found the best and most succinct quotes to summarize Moltbook:

Many people have pointed out that a lot of the viral comments were in fact posted by people posing as bots. But even the bot-written posts are ultimately the result of people pulling the strings, more puppetry than autonomy.

But it also has a very dystopian outlook on what might follow:

The theory is simple: First, AI gets accessible enough that anyone can use it. Second, AI gets good enough that you can’t reliably tell what’s fake. Third, and this is the crisis point, regular people realize there’s nothing online they can trust. At that moment, the internet stops being useful for anything except entertainment.

H-Bomb: A Frank Lloyd Wright Typographic Mystery

This is a pointless but fun investigation of why some of the letters “H” above the entrance of Frank Lloyd Wright’s Unity Temple church in Chicago are upside-down. The author tracks down historical documents and pictures to reconstruct when those letters were put up and maybe taken down … and ultimately to see how far back the mistake goes.

Current

Terry Godier has found a great metaphor for a feed reader: a current. It steps out of the shadow of mail clients and models feeds as currents with different velocities: items automatically drift by and fade away if left unread. While moving away from traditional mail-like UI concepts, feeds are still presented in order (in contrast to the “curated” feeds of social media).

I like the idea and how far the metaphor carries and applies to all the technical and usability bits. It’ll take time to see if it really “holds water,” 😜 but I’m intrigued.

What Was Actually Achieved By LLMs Building A C-Compiler

Ars Technica has put into perspective what it means that LLMs “created” a C compiler by themselves. My favourite quotes:

It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: The specification is decades old and well-defined, comprehensive test suites already exist, and there’s a known good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.

[…] Even with all optimizations enabled, it produces less-efficient code than GCC running with all optimizations disabled. […]

Anthropic describes the compiler as a “clean-room implementation” because the agents had no Internet access during development. But that framing is somewhat misleading. The underlying model was trained on enormous quantities of publicly available source code, almost certainly including GCC, Clang, and numerous smaller C compilers. In traditional software development, “clean room” specifically means the implementers have never seen the original code. By that standard, this isn’t one. […]

“It was rather a brute force attempt to decompress fuzzily stored knowledge contained within the network.”

None of this should obscure what the project actually demonstrates. A year ago, no language model could have produced anything close to a functional multi-architecture compiler, even with this kind of babysitting and an unlimited budget. The methodology of parallel agents coordinating through Git with minimal human supervision is novel, and the engineering tricks Carlini developed to keep the agents productive (context-aware test output, time-boxing, the GCC oracle for parallelization) could potentially represent useful contributions to the wider use of agentic software development tools.

The Worst Programming Language of All Time

You can argue that C++ shares this honor with the likes of JavaScript and TeX. Among them, only JavaScript managed to design itself out of the mess it was in during the early-to-mid 2000s. The ugly parts are still there, but each new iteration actually improved the language as a whole, all while keeping backward compatibility. TeX is odd and idiosyncratic, but it’s a “niche” language. And then there’s C++ … which managed to become more and more of a mess the more they tried to “improve” it: making big blunders when designing features, failing to rectify them in a timely manner, and then cowardly leaving “broken” features in the language to preserve backward compatibility. *sigh*

Here’s a great collection of grievances:

While many of the features are useful and necessary for a modern language, all the pieces are so shoddily Frankensteined together it is hilarious.

Just the amount of “separate” Turing-complete languages it contains is out of this world: C++, its C subset, Macros, Templates, Exceptions, constexpr/consteval, co-routines. All with separate syntax, semantics, inconsistencies and foot guns and no coherent design.

And even after all that, it’s still missing essential pieces for software development like dependency and build management, which the specification doesn’t even acknowledge as relevant. 🤯 This leads to weird edge cases like ODR violations or “ill-formed, no diagnostic required” atrocities, summarized best in a CppCon talk:

This is a language which has false positives for the question “was this a program?”

What is C++ – Chandler Carruth, Titus Winters – CppCon 2019 at 13:23