Cargo Cults, Compilers, and Codebots
or, On Understanding in the Age of Code-Generating Machines
The famous and fabulous “stochastic parrot” essay is making the rounds again. Written in 2021, it focuses on the use of large language models (LLMs) as conversational tools. In the years since, LLMs have risen to prominence as coding tools. They are still popular as chatbots, of course, but that terrain has been mapped extensively, so I’d like to examine their functionality as codebots.
It’s easy to see that a conversation with a chatbot is different from one with a human, due to a difference in understanding. The chatbot does not understand what it means to navigate the world in a human body. But a codebot navigating the world of machine bodies? Perhaps there is a parallel type of insight.
I’m old-fashioned and like to write code from scratch. I disable autocomplete and stick to the vanilla functions of a small number of languages. If I don’t understand a code snippet, I don’t use it. It’s a limiting approach, and in 25 years few of my peers or mentors have shared it! I’m often told to think of code like a magic spell: you don’t need to understand it for it to work.
Industry practice supports this view. It is not reasonable to expect a single engineer to understand a company’s full codebase. Our own software, Rookery, is relatively simple — and still it contains ten thousand lines of code, a dozen libraries, and several B2B microservices. The file tree alone is a small ecosystem. But some files are just a single line: an import statement. That imported library may contain hundreds of functions; we might use two. If the organization maintaining that library updates it, our code might need to change too. Maintenance becomes a logistical labyrinth. Modern applications are often — maybe always — more powerful than they strictly need to be.
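The one-line file described above is easy to picture concretely. A minimal sketch (the module and names here are illustrative, not taken from Rookery itself): the entire file is a single import, pulling in a library that exposes far more than we will ever call.

```python
# A hypothetical one-line module: the whole file is an import statement.
# The statistics library exposes a few dozen functions; we use exactly two.
from statistics import mean, median

print(mean([1, 2, 3]))    # 2
print(median([1, 2, 3]))  # 2
```

If the maintainers of that library rename or deprecate either function, this file, and everything importing it, has to change with it. That is the maintenance labyrinth in miniature.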
So even an old-fashioned coder like me, who insists on reading every line, stands on abstraction after abstraction. I understand my functions. I understand the libraries I call (partially). But what happens in the compiler? In the operating system? On the circuit board? My understanding becomes hazy, gestural. A modern chatbot can probably describe those lower layers more fluently than I can. It may even understand them better.
What does understanding mean in this context, exactly? It’s a semantic question that occupies entire schools of philosophy, so I won’t get too into the weeds here. But I think it’s useful to employ a definition that does not limit itself to humans. Birds understand color and song in a way we do not, for example. Perhaps understanding is a form of embodiment.
LLMs trained on code have seen millions of repositories, documentation files, error messages, and patches. They are remarkably good at reproducing idioms, translating between languages, and generating boilerplate that compiles. They think of code as distributions over valid continuations. They do not execute it internally. They do not experience debugging as frustration or surprise. But they have ingested vast statistical regularities about what tends to work.
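The phrase “distributions over valid continuations” can be made concrete with a toy sketch. Real LLMs use learned transformer weights over vocabularies of tens of thousands of tokens; the bigram lookup table below is purely illustrative, a statistical parrot at its smallest.

```python
# Toy sketch: next-token prediction as a probability distribution.
# A real LLM does this with a neural network, not a count table.
from collections import Counter

corpus = "for i in range ( n ) : total += i".split()

# Count bigrams: which token tends to follow which in the "training data".
bigrams = Counter(zip(corpus, corpus[1:]))

def continuations(context_token):
    """Return P(next token | context token) as a dict, from bigram counts."""
    counts = {b: c for (a, b), c in bigrams.items() if a == context_token}
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

print(continuations("in"))  # {'range': 1.0} — "range" always follows "in" here
```

No execution, no frustration, no surprise: just regularities about what tends to come next.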
Human coders, by contrast, seem to operate on something more like faith than statistics. We rely on frameworks we barely understand, copy-pasting large swaths of code from tutorials, help forums, and style guides. This has been called cargo cult programming, a term perfectly layered in esoteric abstraction.
The difference may not be as simple as “LLMs parrot, humans understand.” The sharper distinction might be this: humans bring intention and causal modeling; LLMs bring structural compression at scale.
When I write a function, I hold a goal in mind. I imagine runtime behavior. I anticipate edge cases — or fail to. When something breaks, I form hypotheses. I test them. Debugging reveals whether I truly grasp the system’s causal structure. Understanding, for a human, often becomes visible only under failure.
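A hypothetical example of the kind of edge case I mean: averaging a list of numbers works perfectly, until the list is empty, and the failure is what exposes the gap in the causal model.

```python
# Hypothetical illustration: an edge case that becomes visible only under failure.
def average(xs):
    return sum(xs) / len(xs)  # fine... until xs is empty (ZeroDivisionError)

# Debugging forms the hypothesis — len(xs) == 0 divides by zero — and tests it.
def average_safe(xs):
    if not xs:
        return 0.0  # one possible choice; raising a clearer error is another
    return sum(xs) / len(xs)

print(average([2, 4, 6]))  # 4.0
print(average_safe([]))    # 0.0
```

Whether returning 0.0 or raising is the right fix depends on the goal held in mind, which is exactly the point: the intention lives in the human, not in the code.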
In the computational domain, LLMs are embodied as machines in a way that humans are not. They are instantiated in hardware. They operate within the same substrate that executes the code they generate. The computational world is native terrain for them. For us, it is an abstraction layered over silicon we barely comprehend.
Computers and programming languages are human-designed. But modern LLMs are black boxes even to their creators. A single programmer cannot fully trace the internal representations of a large model. We have built a system that writes code for machines — and that system itself is opaque to us.
So who understands coding better? Of course the answer is the Cyborg: we are better together.
References
Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜”. FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610–623.
https://doi.org/10.1145/3442188.3445922