A lot has been said about the potential impact of large language models (LLMs) on the jobs of programmers. LLMs can generate code, and they often generate pretty good code [1]. Over time we’d expect them to get even better at this, and at some point it won’t be a surprise if they write code as well as — or perhaps better than — humans. Less has been said about the impact of LLMs on programming languages. For the last 70 years or so, computer scientists have been churning out programming languages at breakneck speed. We can’t help ourselves; it’s something we just love to do. This has led to the production of lots of code in lots of languages, and this code has now been sucked up by a host of LLMs, each of which will happily generate more of it for us.
And this has led to a big change in the behaviour of programmers. Like most people, programmers want to make efficient use of their time, and the temptation to use LLM-generated code has become too big for many to resist. So we’re now in a situation where programmers are getting pretty used to using LLMs, and increasingly don’t see the value in writing code from scratch — why bother, when you can just ask your favourite LLM to do it for you?
The end is nigh
Given this situation, what happens when a computer scientist comes along and invents a new language? Well, there won’t be much existing code, so there’s little for an LLM to suck up. Without lots of sample code to learn from, the LLM won’t be able to generate new code. And so programmers will have to write code, from scratch, themselves. But why would they? By this point they’ve adopted LLMs into their workflow, and going back to writing all their code from scratch would seriously impair their productivity.
So there’s little incentive for programmers to take up new languages, and therefore little potential for adequate [2] volumes of sample code to be generated, so little possibility of using LLMs to generate code, and therefore little incentive for programmers to take up new languages. A vicious circle.
And all this means that the current generation of general-purpose programming languages might be the last one. As an example of this, I asked Google’s Gemini LLM to generate a program in the Lobster programming language, which has been around for a few years, but hasn’t yet made it to the mainstream:
Unfortunately, there isn't currently a widely recognized programming language called "Lobster". There might be custom languages or internal tools with that name within specific organizations, but it's not commonly used or publicly documented.
Therefore, without specific information about the syntax and functionalities of the "Lobster" language you're referring to, it's impossible to write a program in that language. If you have more details about the features and functionalities of "Lobster", I might be able to help you write the program.
So, the future is worrying for Lobster and its fellow emerging languages.
Putting things in context
However, it’s worth considering that there are other ways of training an LLM. One way is to fine-tune an existing model. That is, you take an existing LLM and tweak its weights until it learns something new. This usually requires a lot less data than the LLM’s initial training phase, meaning you could potentially bootstrap knowledge of a new programming language using fewer code samples. The downside is that you’d have to create a new customised LLM and distribute this to the people who want to use your language, which would be kind of messy.
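To make that a bit more concrete, here’s a minimal sketch of what this could look like using the Hugging Face transformers, peft and datasets libraries, with LoRA training a small adapter on a corpus of programs in the new language. The base model name, the corpus path and the hyperparameters are all placeholders rather than a recipe I’m claiming works for any particular language.

    # Sketch: fine-tune an existing code model on a small corpus of programs in
    # a new language, using LoRA so that only a small adapter is trained.
    # Model name, dataset path and hyperparameters are illustrative placeholders.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base_model = "bigcode/starcoder2-3b"           # placeholder base code model
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # needed by the data collator
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # LoRA trains a small set of adapter weights rather than the whole model,
    # which is what makes bootstrapping from a modest corpus plausible at all.
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                             target_modules=["q_proj", "v_proj"],
                                             task_type="CAUSAL_LM"))

    # A directory of source files in the new language (placeholder path).
    corpus = load_dataset("text", data_files={"train": "lobster_corpus/*.lobster"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    train_data = corpus["train"].map(tokenize, batched=True,
                                     remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="lobster-adapter", num_train_epochs=3,
                               per_device_train_batch_size=4, learning_rate=2e-4),
        train_dataset=train_data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained("lobster-adapter")  # ship the adapter, not a whole new LLM

Even then, you’re still distributing model weights of some sort alongside your shiny new language, which is exactly the messiness mentioned above.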
Another approach is to use in-context learning. This is when you tell an LLM how to do something before you ask it to do it. So, before you ask an LLM to write code, you tell it — at inference time — everything it needs to know about the programming language you want to use. In theory, the LLM could then adapt its general understanding of how to generate code to the specifics of this “context”. In practice, this only works with relatively small amounts of context. For instance, imagine teaching it Python: in addition to basic syntax, you’d have to teach it about common APIs, Python’s programming conventions, and all the other stuff it needs to know to generate good code. Given the way context windows are growing, this may become feasible in terms of space, but expecting an LLM to absorb all this new information at inference time would be a big ask [3].
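In case it helps to picture it, here’s roughly what that looks like in practice, using the openai Python package purely as an illustration. The model name and the contents of the primer file are placeholders; the important point is that everything the model is allowed to know about the language has to be squeezed into the prompt.

    # Sketch of in-context learning for a new language: put a hand-written
    # primer (syntax, core APIs, idioms, worked examples) into the prompt, then
    # ask for a program. The model name and primer file are placeholders.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()  # assumes an API key is available in the environment
    primer = Path("lobster_primer.md").read_text()

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You write programs in the Lobster language. Use only "
                        "the syntax and APIs described below.\n\n" + primer},
            {"role": "user",
             "content": "Write a Lobster program that reads a CSV file and "
                        "prints the average of its second column."},
        ],
    )
    print(response.choices[0].message.content)

The primer is doing all the work here, and for a realistic language it would have to be enormous, which is the scaling problem just described.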
Redefining nigh
But this all assumes that programmers care what language they’re generating code in. At the moment, it’s common practice to take chunks of code generated by LLMs and mix them with code written by a programmer, but over time it’s not hard to imagine the burden of work shifting increasingly towards the LLM. It’s already the case that programmers don’t always look at the code which LLMs generate — they just assume it’s correct and paste it into their program. At some point, it might become common practice for the programmer to just write a natural language description of what they want a program to do, and have the LLM do the rest.
And at this point, does it matter which “programming language” an LLM generates code in? If only a computer is going to read it, then it might as well be written in Sanskrit, for all that future generations of software engineers will care. So, even the last-generation languages — the Pythons and Javas of this world — may fade from human memory and become a mere implementation detail. And that’s assuming they don’t get refined away entirely some way down the line. That is, it’s not too hard to imagine an LLM outputting machine code [4], rather than bothering to use an intermediate high-level language.
So, in this respect, LLMs are changing the language of programming. In a way, they’ve fulfilled the dream of 1980s visionaries who talked about fourth-generation languages paving the way to programming computers in natural language. This is precisely what we’re doing when we ask an LLM to generate code for us. It’s perhaps not quite the path they envisaged [5], but that’s where we find ourselves.
However, computers will still need to be programmed. Even if we’re using natural language, we’ll still have to be pretty specific about what we want a program to do. For this reason, I find it hard to imagine that software engineers will become redundant — though perhaps the term “programmer” will become irrelevant as the actual business of programming gets taken over by LLMs.
Levelling up
And arguably it’s a good thing that we’ll be forced to focus on the high-level behaviour of programs rather than how they’re implemented. Many failings in software development are a result of programmers getting lost in the detail of implementation and losing sight of what their software should do. For decades, computer scientists have been pushing a spectrum of approaches to mitigate this, from unit tests up to formal specifications, but most of these have received only modest engagement. If software engineers are no longer able to get lost in the business of programming, then they’ll hopefully spend more time coming up with better specifications for their programs’ behaviours.
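To give a flavour of the lightweight end of that spectrum, here’s the sort of thing I mean by a specification of behaviour: a property-based test, written with the hypothesis library, that pins down what a function must do without saying anything about how it’s implemented. The sort routine is just a stand-in for code we might one day delegate entirely to an LLM.

    # An executable behavioural specification using the hypothesis library: it
    # states what a sort routine must guarantee, not how it is implemented.
    from collections import Counter
    from hypothesis import given, strategies as st

    def my_sort(xs):
        # Stand-in for an implementation we might delegate to an LLM.
        return sorted(xs)

    @given(st.lists(st.integers()))
    def test_sort_behaviour(xs):
        result = my_sort(xs)
        assert all(a <= b for a, b in zip(result, result[1:]))  # output is ordered
        assert Counter(result) == Counter(xs)                   # nothing added or lost

Formal specifications sit at the heavyweight end of the same spectrum; the common thread is describing behaviour rather than implementation.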
Though all of this ignores a big fat elephant in the room — the fact that, at present, LLMs cannot be relied on to refine a high-level specification into source code that actually implements the specification. Addressing this problem is a high priority within the code-generating LLM community [6], but it’s also generating a flurry of activity within the programming language community. For example, it’s known that some languages are a better fit for current LLMs than others, as shown in a study which compared the correctness of generated code across 10 languages. One approach to address this is to develop new languages which are a better fit for LLMs. An example is MoonBit, whose design favours linear code over nested code in order to match the sequential generation process of transformers. However, this does raise questions about over-fitting to particular LLMs, and whether pandering to the needs of LLMs will lead to languages which are fit for LLMs but not fit for humans.
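To illustrate what “linear over nested” means (in Python rather than MoonBit, so this shows only the general shape of the idea, not MoonBit’s actual design), here is the same check written both ways. The claim is that flat, sequential code is an easier target for a model generating tokens left to right than deeply nested code.

    # The same validation written twice, to illustrate "nested" versus "linear"
    # style. This is plain Python, not MoonBit; it only shows the general idea.
    def validate_nested(user):
        if user is not None:
            if "email" in user:
                if "@" in user["email"]:
                    return True
                else:
                    return False
            else:
                return False
        else:
            return False

    def validate_linear(user):
        if user is None:
            return False
        if "email" not in user:
            return False
        return "@" in user["email"]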
Another promising area for future languages is in scripting interactions with LLMs. LMQL is a good example of this. It provides a structured language, based around Python, for guiding interactions with a range of LLMs. This includes the use of standard control flow mechanisms, such as for loops and if statements, to generate sequences of queries, and using programmatic conditions to determine which queries are invoked. In theory this could automate a lot of the to-and-fro currently involved in getting sensible content out of LLMs. It’s also not hard to imagine approaches that focus specifically on code generation: for example, scripting an LLM to iteratively improve generated code by getting it to check for bugs and known vulnerabilities, verify its algorithmic choices, and criticise the structure or style of the code. Ultimately this might help fill the gaping chasm between “here’s a specification for a program” and “here’s how I get an LLM to write a program that correctly and sensibly implements the specification”.
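As a sketch of the kind of generate-and-critique loop I have in mind, here’s a plain Python version, again using the openai package purely as an illustration. The model name, the prompts and the “NO ISSUES” convention are all made up for this example, and a serious version would also run tests, linters and vulnerability scanners rather than trusting the model’s self-review.

    # Sketch: script an LLM to iteratively improve its own generated code by
    # alternating critique and revision for a fixed number of rounds.
    # Model name, prompts and the "NO ISSUES" convention are illustrative only.
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o"  # placeholder

    def ask(prompt):
        response = client.chat.completions.create(
            model=MODEL, messages=[{"role": "user", "content": prompt}])
        return response.choices[0].message.content

    def generate_and_refine(spec, rounds=3):
        code = ask(f"Write a Python program for this specification:\n{spec}")
        for _ in range(rounds):
            critique = ask("Review this code for bugs, known vulnerabilities and "
                           "poor structure. Reply NO ISSUES if it looks fine.\n\n"
                           + code)
            if "NO ISSUES" in critique:
                break
            code = ask("Revise the code to address this review.\n\nReview:\n"
                       + critique + "\n\nCode:\n" + code)
        return code

    print(generate_and_refine("Read a CSV file and print the average of column 2."))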
Living in interesting times
Thanks to the advent of LLMs, there is an understandable concern that certain activities will become unviable. Designing new general-purpose programming languages for use by humans is potentially one of these. It’s seldom that a step-change in how we do things comes along, but as I’ve hinted at in this post, change also brings opportunities. At the moment, I think the opportunities within programming languages, and software engineering more generally, are still poorly understood. However, it seems to me that the productive integration of LLMs within software engineering practice is going to require a lot of work by a lot of people. Certainly it doesn’t seem tenable that the current ad hoc approach of throwing together snippets of code generated by black boxes is anywhere near being a viable solution. So, watch this space, and I guess we’ll see where the future takes us.
[1] They also sometimes generate bad code. See Can LLMs spot machine learning pitfalls? for one example, but there are plenty more out there.
[2] With current LLM approaches, “adequate” means “a huge amount”. Maybe in the future they’ll be able to generalise from fewer examples.
[3] For instance, it’s known that the effectiveness of in-context learning seems to drop off as the amount of context grows; see this review.
[4] And not necessarily the kind of machine code we’re familiar with. For example, the “machine code” produced by LLMs in the future could be the architecture and weights of a neural network designed to implement a particular behaviour, running on special-purpose hardware rather than a conventional von Neumann computer.
[5] Their path would have involved a lot more focus on how behaviour is specified and verified. A black box churning out untested code in response to vague prompts may have been their nightmare.
[6] Somewhat outside the scope of this post, but a couple of examples: this paper presents a method to get an LLM to self-debug the code it generates, and this paper uses an LLM to generate a verifier to check the correctness of its generated code.