What Happens If Someone Reimplements Your Open Source Software with LLMs And Relicenses It?

It seems there’s a new use case for LLMs: letting them reimplement open-source software in order to relicense the result.

Armin Ronacher has some interesting thoughts on the licensing consequences:

What I think is more interesting about this question is the consequences of where we are. Copyleft code like the GPL heavily depends on copyrights and friction to enforce it. But because it’s fundamentally in the open, with or without tests, you can trivially rewrite it these days.

There are huge consequences to this. When the cost of generating code goes down that much, and we can re-implement it from test suites alone, what does that mean for the future of software? Will we see a lot of software re-emerging under more permissive licenses? Will we see a lot of proprietary software re-emerging as open source? Will we see a lot of software re-emerging as proprietary?

For me personally, what is more interesting is that we might not even be able to copyright these creations at all. A court still might rule that all AI-generated code is in the public domain, because there was not enough human input in it. That’s quite possible, though probably not very likely.

In the GPL case, though, I think it warms up some old fights about copyleft vs permissive licenses that we have not seen in a long time. It probably does not feel great to have one’s work rewritten with a Clanker and one’s authorship eradicated. Unlike the Ship of Theseus, though, this seems more clear-cut: if you throw away all code and start from scratch, even if the end result behaves the same, it’s a new ship. It only continues to carry the name. Which may be another argument for why authors should hold on to trademarks rather than rely on licenses and contract law.

Simon Willison has a timeline of how the “LLM rewrite” of chardet came about and summarizes the arguments of those involved. There’s also a comment by Richard Fontana, one of the authors of the GPLv3 and LGPLv3:

[…] FWIW, IANDBL, TINLA, etc., I don’t currently see any basis for concluding that chardet 7.0.0 is required to be released under the LGPL. AFAIK no one including Mark Pilgrim has identified persistence of copyrightable expressive material from earlier versions in 7.0.0 nor has anyone articulated some viable alternate theory of license violation. I don’t think I personally would have used the MIT license here, even if I somehow rewrote everything from scratch without the use of AI in a way that didn’t implicate obligations flowing from earlier versions of chardet, but that’s irrelevant.

On Moltbook

Bruce Schneier has probably found the best and most succinct quotes to summarize Moltbook:

Many people have pointed out that a lot of the viral comments were in fact posted by people posing as bots. But even the bot-written posts are ultimately the result of people pulling the strings, more puppetry than autonomy.

But his post also includes a very dystopian outlook on what might follow:

The theory is simple: First, AI gets accessible enough that anyone can use it. Second, AI gets good enough that you can’t reliably tell what’s fake. Third, and this is the crisis point, regular people realize there’s nothing online they can trust. At that moment, the internet stops being useful for anything except entertainment.

What Was Actually Achieved By LLMs Building A C-Compiler

Ars Technica has put into perspective what it means that LLMs “created” a C-compiler by themselves. My favourite quotes:

It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: The specification is decades old and well-defined, comprehensive test suites already exist, and there’s a known good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.

[…] Even with all optimizations enabled, it produces less-efficient code than GCC running with all optimizations disabled. […]

Anthropic describes the compiler as a “clean-room implementation” because the agents had no Internet access during development. But that framing is somewhat misleading. The underlying model was trained on enormous quantities of publicly available source code, almost certainly including GCC, Clang, and numerous smaller C compilers. In traditional software development, “clean room” specifically means the implementers have never seen the original code. By that standard, this isn’t one. […]

“It was rather a brute force attempt to decompress fuzzily stored knowledge contained within the network,”

None of this should obscure what the project actually demonstrates. A year ago, no language model could have produced anything close to a functional multi-architecture compiler, even with this kind of babysitting and an unlimited budget. The methodology of parallel agents coordinating through Git with minimal human supervision is novel, and the engineering tricks Carlini developed to keep the agents productive (context-aware test output, time-boxing, the GCC oracle for parallelization) could potentially represent useful contributions to the wider use of agentic software development tools.

Those ones were the expensive headcount anyway

Ars Technica reports on a study that measured the productivity of software developers from different open source projects working on different tasks, including non-coding ones.

In the comments there’s a snarky summary of the article’s main point:

“These factors lead the researchers to conclude that current AI coding tools may be particularly ill-suited to “settings with very high quality standards, or with many implicit requirements (e.g., relating to documentation, testing coverage, or linting/formatting) that take humans substantial time to learn.” While those factors may not apply in “many realistic, economically relevant settings” involving simpler code bases, they could limit the impact of AI tools in this study and similar real-world situations.”

So as long as I cull the experienced people and commit to lousy software the glorious Age of AI will deliver productivity gains? Awesome, those ones were the expensive headcount!

We’ll Ask The AI How to Make Money

We have no current plans to make revenue.

We have no idea how we may one day generate revenue.

We have made a soft promise to investors that once we’ve built a general intelligence system, basically we will ask it to figure out a way to generate an investment return for you.

Sam Altman to VCs in 2019

A video of this memorable moment … you can’t make this up.

Best “AI”-Rant

Most organizations cannot ship the most basic applications imaginable with any consistency, and you’re out here saying that the best way to remain competitive is to roll out experimental technology that is an order of magnitude more sophisticated than anything else your I.T department runs, which you have no experience hiring for, when the organization has never used a GPU for anything other than junior engineers playing video games with their camera off during standup, and even if you do that all right there is a chance that the problem is simply unsolvable due to the characteristics of your data and business? This isn’t a recipe for disaster, it’s a cookbook for someone looking to prepare a twelve course fucking catastrophe.

How about you remain competitive by fixing your shit? I’ve met a lead data scientist with access to hundreds of thousands of sensitive customer records who is allowed to keep their password in a text file on their desktop, and you’re worried that customers are best served by using AI to improve security through some mechanism that you haven’t even come up with yet? You sound like an asshole and I’m going to kick you in the jaw until, to the relief of everyone, a doctor will have to wire it shut, giving us ten seconds of blessed silence where we can solve actual problems.

After some general ranting, the author responds to several common “reasons” why a company might want to use LLMs/AI tools.