Context engineering in, prompt engineering out
Just like the regular changing of the guard at Buckingham Palace[1], the term prompt engineering has been marched back to barracks, and context engineering has been ceremonially piped in to take its place. Or so I assume, since AI people are suddenly talking about context engineering, and they weren’t previously.
There may be too much change around these days, but for once I like this one, and think it’s well overdue. The term prompt engineering was, in my view, far too narrowly focused on one way that information finds its way into an LLM’s context window. Yes, information enters through the current prompt, but the context window also contains all the stuff that the LLM generated itself, and all the previous stuff entered by an increasingly large cast of characters that includes the user, the system, agents, and anything else that happens to be hanging around the modern-day LLM ecosystem. But basically, an LLM only cares about what’s in its context window, not how it got there.
I should explain the term context window. Imagine you’re using an LLM. You load it up, you type in a prompt, and you hit enter[2]. At that point, your prompt becomes the LLM’s context — a string of words (or more correctly, tokens[3]) that are going to become the input to the model. The LLM then generates its response, and this gets appended to the context. So, at this point, the context contains both the prompt and the response, in that order. If you then ask the LLM something else, this too gets appended to the context, as does its response. And so on. But at some point, the amount of text in the context may be too long for the LLM to handle, and it will cut off and ignore the earliest part of the context. The remaining text — the bit it does look at — is known as its context window. The average user doesn’t need to worry about filling the context window, since context windows are so large these days, but the kind of user who engages in context engineering does.
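To make that concrete, here’s a toy sketch of the mechanism, with a comically small token limit so the truncation is actually visible. The whitespace “tokeniser” and the conversation class are invented for illustration; real models use subword tokens and allow vastly more of them.

```python
# A toy sketch of how a context grows and gets truncated. The tiny token
# limit and the whitespace "tokeniser" are invented for illustration; real
# models use subword tokens and allow hundreds of thousands of them.

CONTEXT_LIMIT = 12  # absurdly small, so the truncation is visible

def tokenise(text: str) -> list[str]:
    # Real tokenisers split text into subword units; words will do here.
    return text.split()

class Conversation:
    def __init__(self) -> None:
        self.context: list[str] = []  # the full running context, as tokens

    def append(self, text: str) -> None:
        # Prompts and responses alike are simply appended to the context.
        self.context.extend(tokenise(text))

    def window(self) -> list[str]:
        # The model only sees the most recent CONTEXT_LIMIT tokens; anything
        # earlier falls outside the window and is ignored.
        return self.context[-CONTEXT_LIMIT:]

convo = Conversation()
convo.append("What is a context window?")             # your prompt
convo.append("It is the text the model attends to.")  # the LLM's response
convo.append("And what happens when it overflows?")   # your next prompt
print(len(convo.context), "tokens in total, but the model sees only:")
print(" ".join(convo.window()))
```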
So who is this mystical user who cares so much about context engineering, and is not satisfied by mere prompt engineering? Well, typically someone who’s using agentic AI or retrieval-augmented generation (RAG), which I previously talked about here and here. But for those not inclined to look there and there, they’re both mechanisms for bringing external information into the context window. Agentic AI uses tools like search engines to source this information, and RAG uses a database. Increasingly, power users of LLMs are interested in using multiple sources of information, and this means there’s a lot of stuff that they can potentially stuff into the context window.
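As a rough sketch of what that stuffing looks like, here’s a toy RAG-style assembly. The retrieve() and web_search() functions are hypothetical stand-ins for a vector-database lookup and a search-engine tool; the point is simply that everything, whatever its source, ends up in one string.

```python
# A toy illustration of RAG-style context assembly. retrieve() and
# web_search() are hypothetical stand-ins for a vector-database lookup
# and a search-engine tool; a real system would call actual services.

DOCUMENTS = {
    "guards": "The guard at Buckingham Palace changes regularly.",
    "llms": "An LLM attends only to what is in its context window.",
}

def retrieve(query: str) -> str:
    # Stand-in for a similarity search over a document database (RAG).
    return DOCUMENTS["llms" if "context" in query else "guards"]

def web_search(query: str) -> str:
    # Stand-in for an agentic tool call to a search engine.
    return f"Top search result for {query!r} would appear here."

def build_context(question: str) -> str:
    # Whatever its source, everything ends up in one string of text.
    return "\n\n".join([
        "Background: " + retrieve(question),
        "Search: " + web_search(question),
        "Question: " + question,
    ])

print(build_context("Why does the context window matter?"))
```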
One set of questions in context engineering is about which stuff is best to stuff. Which information does the LLM really need, and what is just taking up unnecessary space? Can we trim verbose output, or get another LLM to summarise it before adding it to this LLM’s context window? Another set of questions is about the order of information. Which agent should we invoke when? In what order should their output be added to the context window? What if they’re running in parallel?
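Here’s a sketch of one such heuristic: compressing verbose output before it’s allowed into the main model’s context. The summarise() function is a placeholder; in a real system it might be a call to a smaller, cheaper LLM rather than the crude truncation used here.

```python
# A sketch of one trimming heuristic: compress verbose tool output before
# it is allowed into the main model's context window. summarise() is a
# placeholder; in practice it might be a call to a smaller, cheaper LLM
# rather than the crude truncation used here.

def summarise(text: str, max_tokens: int) -> str:
    # Placeholder summariser: keep only the first max_tokens words.
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[:max_tokens]) + " ..."

def add_to_context(context: list[str], source: str, text: str,
                   budget: int = 30) -> None:
    # Only spend context budget on a compressed version of long output.
    if len(text.split()) > budget:
        text = summarise(text, budget)
    context.append(f"[{source}] {text}")

context: list[str] = []
add_to_context(context, "search", "A very " + "long " * 100 + "result.")
add_to_context(context, "user", "What did the search turn up?")
print("\n".join(context))
```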
And this latter set of questions touches upon the increasingly dynamic ways in which people — or rather their scripts[4] — interact with LLMs. It’s no longer just a linear sequence of pre-planned interactions (e.g. retrieve this information from a database, add it to the prompt, ask this question, …) but rather a set of branching decision points administered both by external tools and by the LLM’s own reasoning abilities. So, context engineering is not just about pre-planning what to stuff the context window with and when, but about how to do this on the fly.
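To illustrate, here’s a minimal and entirely invented agent loop. At each step a decide() function, standing in for the LLM’s own reasoning, picks which tool to call next, and the result is appended to the context before the next decision is made.

```python
# A minimal, entirely invented agent loop: at each step decide(), which
# stands in for the LLM's own reasoning, picks which tool to invoke next,
# and the tool's output is appended to the context before the next
# decision. A real loop would ask the model itself what to do.

from typing import Optional

def search_tool(query: str) -> str:
    return f"search results for {query!r}"

def database_tool(query: str) -> str:
    return f"database rows matching {query!r}"

TOOLS = {"search": search_tool, "database": database_tool}

def decide(context: list[str]) -> Optional[str]:
    # Trivial rule: gather search results, then database rows, then stop.
    if not any(line.startswith("search:") for line in context):
        return "search"
    if not any(line.startswith("database:") for line in context):
        return "database"
    return None  # enough context gathered; time to generate an answer

context = ["user: How many guards are on duty today?"]
while (tool := decide(context)) is not None:
    result = TOOLS[tool](context[0])
    context.append(f"{tool}: {result}")  # appended on the fly

print("\n".join(context))
```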
Before I go on, I just wanted to point out that one of the weird things about LLMs is that everything does need to be stuffed into a single linear context window. This is not how computer systems handled things in the past: if a system wanted information from a particular source, it would query that source directly; you wouldn’t have to pre-assemble everything into one big string of text before invoking it. But, alas, this is the nature of transformers, at least for now[5].
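For what it’s worth, here’s that contrast in miniature, with two toy data sources standing in for real ones.

```python
# The contrast in miniature. Both "sources" here are toy dictionaries,
# but the shapes are real: a conventional program queries each source
# directly and on demand, while an LLM needs everything flattened into
# one linear string of text before it is invoked.

inventory = {"widgets": 42}
prices = {"widgets": 3.50}

# Conventional: structured, on-demand lookups.
def stock_value(item: str) -> float:
    return inventory[item] * prices[item]

# LLM-style: pre-assemble every relevant fact into one big string.
def flatten_for_llm(item: str) -> str:
    return (f"Inventory: {item} = {inventory[item]} units. "
            f"Price: {item} = {prices[item]} each. "
            f"Question: what is the total stock value of {item}?")

print(stock_value("widgets"))
print(flatten_for_llm("widgets"))
```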
Okay, so what constitutes context engineering in practice? Well, like prompt engineering, it’s far from being an engineering discipline, and it’s far from being a well-defined process. It’s more like a set of activities and heuristics around how you construct an LLM-based system. That is, how you formulate prompts, which agents you give the LLM access to (and when), which external libraries you use to rationalise and sanitise LLM interactions, and myriad other things, including those I covered in Plumbing LLMs into the world. I guess you could say it’s the knowledge and experience that developers of LLM-based systems inevitably accumulate.
And context engineer is fast becoming the ‘in’ job title in AI circles. But if you’re a prompt engineer, I wouldn’t panic too much. Context engineering is really just a new take on prompt engineering, one that reflects a change in how people think about LLM-based systems rather than anything fundamentally new, and I suspect most prompt engineers are already doing what’s now referred to as context engineering. So a quick find-and-replace in your CV should sort it.
But having said that in a slightly cynical way, I do — as I said earlier — approve of the emergence of this new term. I still think engineering is a serious exaggeration, but the move from prompt to context is welcome. May it enjoy its time in the limelight before it’s replaced by a newer, shinier term in a few months’ time.
[1] At last, a UK reference that everyone can enjoy!
[2] Other user interface elements are available, including big shiny buttons.
[3] The distinction is not worth talking about. Well, maybe it is, but it’s quite boring.
[4] At this point, my cat added “fdrcgc’[;/sdaepo;”. Make of that what you will.
[5] I don’t see why this always has to be the case. You could make transformer architectures more adaptable, and I’m sure someone will one day.