What Was Actually Achieved By LLMs Building A C Compiler

Ars Technica has put into perspective what it means that LLMs “created” a C compiler by themselves. My favourite quotes:

It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: The specification is decades old and well-defined, comprehensive test suites already exist, and there’s a known good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.

[…] Even with all optimizations enabled, it produces less-efficient code than GCC running with all optimizations disabled. […]

Anthropic describes the compiler as a “clean-room implementation” because the agents had no Internet access during development. But that framing is somewhat misleading. The underlying model was trained on enormous quantities of publicly available source code, almost certainly including GCC, Clang, and numerous smaller C compilers. In traditional software development, “clean room” specifically means the implementers have never seen the original code. By that standard, this isn’t one. […]

“It was rather a brute force attempt to decompress fuzzily stored knowledge contained within the network.”

None of this should obscure what the project actually demonstrates. A year ago, no language model could have produced anything close to a functional multi-architecture compiler, even with this kind of babysitting and an unlimited budget. The methodology of parallel agents coordinating through Git with minimal human supervision is novel, and the engineering tricks Carlini developed to keep the agents productive (context-aware test output, time-boxing, the GCC oracle for parallelization) could potentially represent useful contributions to the wider use of agentic software development tools.

Those ones were the expensive headcount anyway

Ars Technica reports on a study that measured the productivity of software developers across different open source projects performing a variety of tasks (including non-coding ones).

In the comments, there’s a snarky summary of the article’s main point:

“These factors lead the researchers to conclude that current AI coding tools may be particularly ill-suited to “settings with very high quality standards, or with many implicit requirements (e.g., relating to documentation, testing coverage, or linting/formatting) that take humans substantial time to learn.” While those factors may not apply in “many realistic, economically relevant settings” involving simpler code bases, they could limit the impact of AI tools in this study and similar real-world situations.”

So as long as I cull the experienced people and commit to lousy software the glorious Age of AI will deliver productivity gains? Awesome, those ones were the expensive headcount!

Moral parents, moral babies

Ars again covers interesting research on the psychology of toddlers. This time: toddlers whose parents have a lower tolerance for injustice show stronger differences in EEG readings when watching prosocial vs. antisocial behavior.

The article also discusses how difficult it is to do a “psychological” assessment of toddlers’ behavior and to derive concrete explanations or conclusions from it.