Vibe coding: is this really the future?
About a year ago, I mentioned that people were starting to integrate LLM-generated code into their software projects without looking at the code that was generated. Now, thanks to OpenAI grandee Andrej Karpathy, we have a term for this — vibe coding. The basic principle can be summed up by his quote “I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.”
Vibe coding is a bit like prototyping, a concept that’s been used in software engineering for some time. The idea behind prototyping is that you bung something together as quickly as possible to get an idea of what it might look like. Then you get feedback from the people who might use it. Typically you’d go through a series of prototypes that get incrementally closer to the final product. After this, you’d throw away the prototypes and develop the product properly. Which would involve doing things like code reviews and software testing to ensure that it works as expected.
However, vibe coding seems to differ from prototyping in that final quality checks are not part of the package. Which is worrying, because the code currently produced by LLMs is not reliable. In part, this is due to the use of informal natural language prompts as specifications, meaning that what an LLM produces may not be what the “programmer” intended. But it’s also unreliable because LLMs have been found to produce code that contains all sorts of errors. After all, they were trained on code written by human programmers, who are less than perfect.
To be fair to Andrej, he was talking about this in the context of hobby projects. But would anyone really apply vibe coding in a production environment? You know, where there’s a lot of code and a lot of users depending on it. Well, according to the CEO of a prominent technology startup accelerator, a quarter of startups have code bases that are almost entirely LLM-generated. I guess they may have tested all this code, but one reason why startups use generative AI to this degree is because of the considerable time and resource pressures they face. Testing is expensive and time-consuming, so there’s quite an incentive to cut corners. The fact that startups are actively recruiting vibe coders is also telling.
One of my first Substack posts was about the British Post Office scandal, where buggy software saw innocent post office workers imprisoned and stripped of their livelihoods after being unfairly accused of fraud. Basically, the developers had saved time by rapidly adapting some existing software and not carrying out proper testing of the resulting system. In a way, they were vibe coding before it became fashionable.
Buggy software has been around since computers were invented, and there are countless other examples of it causing havoc. Indeed, this happened so often in the early days of software development that it drove the emergence of robust software engineering practices designed to prevent this sort of thing from happening. By and large, this has been a success; we can now routinely develop software for even the most safety-critical of applications without worrying about it failing to work correctly.
Vibe coding is clearly a threat to this progress. It discourages code comprehension, it deprioritises testing, it doesn’t consider dependability. But the potential savings from vibe coding are immense. Writing code from scratch, reviewing other people’s code, and thoroughly testing software are all very expensive activities. If someone can come along and undercut the competition by not doing them, then someone probably will. All that’s standing in their way are things like regulation, reputation and retribution — all of which are, sadly, negotiable in a world where vibe trumps truth.
So vibe coding (and AI-assisted development more generally) isn’t likely to go away. Which means we have to come up with suitable ways of integrating it into our existing software engineering processes. And, funnily enough, one way of doing this is to increase the involvement of generative AI, specifically in the form of agentic AI.
Agentic AI involves using LLMs to generate sequences of interactions with other tools and systems, including other LLMs. In a software engineering context, these interactions involve things like executing code, invoking debuggers, generating unit tests, or even running proof checkers: the kinds of things that software engineers do in order to assess and improve the quality of their code. Broadly speaking, the idea is that you would create one of these agents and ask it not just to write some code, but also to do all the things that help build confidence that the code functions as expected. Or, to put it another way, do all the things that vibe coders don’t really have the time, resources or inclination to do themselves.
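To make the idea a bit more concrete, here is a minimal sketch of what such a loop might look like, assuming a Python setting with pytest as the test runner. The function call_llm is a hypothetical stand-in for whichever model API you happen to use; the rest is just the generate, test, feed-the-failures-back cycle described above, not a description of how any particular agent framework actually works.

```python
import pathlib
import subprocess
import tempfile


def call_llm(prompt: str) -> str:
    """Hypothetical stub for whichever LLM API is in use."""
    raise NotImplementedError


def agentic_code_loop(task: str, tests: str, max_rounds: int = 5) -> str:
    """Generate code, run the tests against it, feed failures back, repeat."""
    code = call_llm(f"Write a Python module, solution.py, that does the following:\n{task}")
    for _ in range(max_rounds):
        workdir = pathlib.Path(tempfile.mkdtemp())
        (workdir / "solution.py").write_text(code)
        (workdir / "test_solution.py").write_text(tests)
        result = subprocess.run(
            ["pytest", "-q"], cwd=workdir, capture_output=True, text=True
        )
        if result.returncode == 0:
            return code  # the tests pass: some evidence the code does what was asked
        # Otherwise hand the failure output back to the model and ask for a fix
        code = call_llm(
            "The following code failed its tests.\n\n"
            f"Code:\n{code}\n\nTest output:\n{result.stdout}\n\n"
            "Return a corrected version of the module."
        )
    raise RuntimeError("no passing version produced within the round limit")
```

Even a toy loop like this makes the trade-off visible: the agent only builds as much confidence as the tests it runs, so the quality of those tests still matters a great deal.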
So basically, if you can’t beat ‘em, join ‘em. Whilst there’s an element of AI marking its own homework going on here, I don’t think it’s impossible that this kind of approach could lead to more reliable AI-generated code. There’s certainly evidence that current LLMs are capable of doing things like iteratively refining code and generating test cases. However, it’s important to remember that software engineering is not just about producing code that works; it’s also about producing code that fulfils a need. And this brings us back to specification.
A specification is a description of what you want software to do. In times long gone, people didn’t worry too much about getting specifications right, and so developers commonly built software that no one wanted to use, because it didn’t adequately address a need. This made people unhappy. To address it, a whole ecosystem of approaches was developed at the specification end of the software engineering process, finding ways to capture and communicate exactly what users wanted. These varied from structured textual descriptions, through to diagrams (such as UML), and even formal specification languages (like B, which I talked about here).
And nowadays it’s specification errors, not implementation errors (i.e. bugs), which arguably cause the most suffering. A bug can be fixed relatively quickly, but a specification error might affect an entire code base. Which is why it’s important to get the specification right, ideally before you start coding [1]. So it’s worrying that specification does not feature highly in the vibe coding philosophy, where the specification is really just a series of unstructured natural language prompts, thrown together according to the whims of the person driving the LLM.
So one important question is whether we can tighten up the specification element of vibe coding. For instance, can we integrate it with existing, trusted forms of specifying software behaviour? I’d say the answer is a guarded yes. LLMs are capable of multimodal input (and output), so could potentially be fed (and update) existing specification documents containing things like structured text and UML diagrams [2]. It’s also possible that LLMs could help refine specifications. After all, they’ve seen a huge amount of software documentation during training, and may be able to anticipate more appropriate software solutions than humans can come up with. For instance, it’s not too hard to imagine LLMs critiquing their own software prototypes, and iteratively refining the design without a human in the loop.
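Purely as an illustration of that last point, a critique-and-refine loop over an existing structured specification might look something like the sketch below. As before, call_llm is a hypothetical stand-in for the model API, and the prompts are only indicative; this is a sketch of the pattern, not a claim about how any existing tool does it.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stub for whichever LLM API is in use."""
    raise NotImplementedError


def refine_specification(spec: str, rounds: int = 3) -> str:
    """Iteratively critique and tighten a structured specification."""
    for _ in range(rounds):
        critique = call_llm(
            "Review this software specification. List any ambiguities, missing "
            "edge cases, and requirements that cannot be tested:\n\n" + spec
        )
        spec = call_llm(
            "Rewrite the specification below so that it addresses the critique, "
            "keeping its existing structure (headings, user stories, acceptance "
            "criteria) intact.\n\n"
            f"Critique:\n{critique}\n\nSpecification:\n{spec}"
        )
    return spec
```

The point is not that this loop is trustworthy in itself, but that it anchors the model to a structured document that humans can still review, rather than to a pile of throwaway prompts.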
Yet there’s still a whole bunch of uncertainties hanging over LLMs. Whilst there is evidence that they can perform a wide range of software engineering tasks, there haven’t been sufficient studies to determine how well they can do these tasks, and hence to what degree we can trust them. But with the “move fast and break things” mentality of vibe coding, I guess we’ll find out pretty soon. Just assume the brace position in the meantime.
[1] Though a shout out to agile here, which is a software development methodology that’s designed to react to changes in specification.
[2] There are already examples of this sort of thing in the literature, e.g. this paper.


