Plumbing is not the sexiest of pursuits, but life is pretty hard (and messy) without it. In previous LLM-themed posts, I’ve dwelled on how LLMs work, what they can do, and whether they might cause armageddon. I’ve said little about how they’re plumbed into the world around them, yet this is a topic of increasing importance.
Many people still think of LLMs as entities that hide behind chat interfaces. The name ChatGPT doesn’t help to dispel this idea. But increasingly, people are not using LLMs by typing/pasting text into a chat box, getting a response, and manually copying the response somewhere else. Instead, LLMs are being directly tied into other systems, cutting out the human in the middle.
People are even starting to talk about LLMs having a role that’s comparable to the CPU or operating system of a computer [1]. That is, a central controller that sits in the middle of a bunch of other components and orchestrates their activities in some way. And in principle this is not hard to imagine, since LLMs have shown themselves capable of sticking their fingers into a lot of pies (mmm, pies).
Yet to do so requires mechanisms that allow LLMs to understand, discover and interact with other components of a computer system. Or to put it simply, plumbing. In this post, I’m going to give a high-level, but hopefully informative, overview of a bunch of technologies that are commonly used for this.
ReAct Prompting
The first of these is not a technology as such, but rather a way of encouraging LLMs to interact with other things in sensible ways. ReAct prompting [2] is an extension to more classical forms of prompt engineering (see Deep Dips #5: Prompt Engineering) that encourages LLMs to frame their output — and hence their reasoning process [3] — in a particular way. For ReAct, the aim is to get an LLM to structure its output as a sequence of thought-action-observation blocks.
A ReAct prompt looks something like this [4]:
< Question to be answered goes here >
Answer the question to the best of your ability. You have access to the following tools:
< Description of available tools goes here, including how their inputs should be formatted, perhaps including examples of using them >
Use the following output format:
Thought - you should always think about what to do
Action - the action to take, specifying the tool and its input(s)
Observation - the result of the action
Thought/Action/Observation can repeat multiple times
After this, you should output your final answer.
Basically, this encourages an LLM to break down answering the question into a series of thought-action-observation stages. The “thought” identifies something the LLM needs to work out before it can generate an answer, the “action” then uses an external tool to work this out, and the “observation” extracts the relevant information from the tool’s response. The LLM is told about the available tools in the prompt, and can use them however it sees fit. Typical examples are running a piece of software that does numerical calculations, looking up information from files or the web, or compiling and executing code generated by the LLM.
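To make the mechanics concrete, here is a minimal sketch (in Python) of the loop that a ReAct framework runs around the LLM. It isn’t any particular library’s implementation: call_llm stands in for whatever API you use to query the model, the tools argument is just a dictionary of ordinary Python functions, and the template and parsing are deliberately bare-bones. The point is simply that the program pulls each Action out of the LLM’s output, runs the corresponding tool, and feeds the result back in as the Observation.

import re

# Hypothetical sketch of a ReAct loop, not a real library's implementation.
REACT_TEMPLATE = """{question}
Answer the question to the best of your ability. You have access to these tools: {tool_names}.
Use Thought / Action / Observation blocks, writing actions as
Action - tool_name[input], and finish with your final Answer."""

def react_loop(question, tools, call_llm, max_steps=5):
    # tools maps tool names to plain Python functions; call_llm queries the LLM.
    prompt = REACT_TEMPLATE.format(question=question, tool_names=", ".join(tools))
    for _ in range(max_steps):
        # Stop generation before the model invents its own Observation.
        output = call_llm(prompt, stop=["Observation"])
        prompt += "\n" + output
        if "Answer" in output:                        # the model has finished
            return output.rsplit("Answer", 1)[1].lstrip(" -:")
        # Parse e.g. "Action 1 - calculator[18 * 1000 / 1]"
        match = re.search(r"Action.*?-\s*(\w+)\[(.*)\]", output, re.DOTALL)
        if match is None:
            return output                             # the model went off-script
        tool_name, tool_input = match.group(1), match.group(2)
        result = tools[tool_name](tool_input)         # run the real tool
        prompt += f"\nObservation - {result}"         # feed the result back in
    return "No answer found within the step limit."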
LangChain
LangChain is probably the most famous framework for integrating LLMs into larger applications. In plumbing terms, it provides the pipes and connectors you need to assemble things. Chief among these is a mature implementation of ReAct prompting, which includes numerous wrappers for accessing common tools, plus facilities for creating your own. Here’s a hypothetical example [5]:
What is the volume of the world's largest cat?
Thought 1 - I need to search for the world's largest domestic cat.
Action 1 - wikipedia_search[input = World's largest domestic cat]
Observation 1 - The world's largest domestic cat was Meow, who weighed 18kg.
Thought 2 - The average density of a cat is roughly that of water, which is 1 gram per cubic centimeter. I need to convert 18kg into grams and then convert this into volume using volume = mass / density.
Action 2 - calculator[input = 18 * 1000 / 1]
Observation 2 - Meow's volume was approximately 18000 cm^3.
Answer - Great question! The world's largest cat, Meow, had a weight of 18kg, corresponding to a volume of approximately 18000 cm^3. Let me know if you'd like to know more about the volumes of large pets.
Yes, I like cats — but ignoring the specifics of cat volume calculation, this example shows how an LLM can use an external information resource (Wikipedia in this case) and something that can do maths reliably (i.e. a calculator) to supplement its built-in reasoning capabilities.
There are plenty of fuller examples online, but implementing this sort of behaviour takes only a modest number of LangChain function calls: a few to set up the LLM and the tools you want it to use (a search tool and a calculator in this case), and one more that carries out the ReAct prompting. It’s all pretty straightforward, and this is why frameworks like LangChain have become popular.
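As a rough illustration, the classic set-up looks something like the sketch below. The exact imports and function names have moved around between LangChain versions, so treat it as a flavour of the pattern rather than copy-paste-ready code.

from langchain_openai import ChatOpenAI
from langchain.agents import AgentType, initialize_agent, load_tools

# The LLM that will do the thinking (any supported model provider works).
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Built-in wrappers: a Wikipedia search tool and a calculator.
tools = load_tools(["wikipedia", "llm-math"], llm=llm)

# One call wires up the ReAct prompting behind the scenes...
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

# ...and another runs the thought-action-observation loop.
agent.run("What is the volume of the world's largest cat?")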
And this is at the heart of how many AI agents work. For example, chatbots used in customer service often use an LLM to search through product documentation and other information available to the company. To do this, their developers can use LangChain to give the LLM access to tools that can search through disk files or databases, whilst also handling other relevant plumbing, such as chunking up data, semantic matching [6] and formatting output.
PydanticAI
Formatting output? Yes, also not an exciting topic, but surprisingly important. Take classification, where an LLM is given some data and asked to say what it is. For example, in customer sentiment classification, an LLM may be given a customer’s review of a product and asked to say whether it’s positive or negative. The number of positives and negatives can then be summed and a sentiment score calculated.
But this only works if the LLM provides its response in a predictable manner, e.g. by outputting the single word positive or negative. You can of course ask the LLM to do this in the prompt. However, LLMs don’t always do what they’re told, and can in practice output as much or as little text as they feel like. And they often do. Which plays havoc with any less flexible components that the LLM is plumbed into.
And this is where technologies like PydanticAI come in, providing a sanitisation and cajoling layer between the LLM and whatever occurs downstream. It allows the programmer to specify exactly what format they want an LLM’s output to appear in, and it won’t relay that output until the LLM plays ball. Which in practice can involve careful prompt formulation and asking the LLM, potentially many times, to correct what it’s outputted. All of which is hidden away so that the programmer doesn’t have to worry about it.
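Very roughly, and with the caveat that the PydanticAI API has changed across versions so the details here are illustrative rather than definitive, the idea is that the programmer describes the acceptable output as a Pydantic model and hands it to an agent, which won’t return until the LLM’s output validates against it:

from typing import Literal
from pydantic import BaseModel
from pydantic_ai import Agent

# The only output we're prepared to accept: exactly one of two labels.
class Sentiment(BaseModel):
    label: Literal["positive", "negative"]

# The agent handles the prompting (and any re-prompting) needed to get
# output that validates against the Sentiment model.
agent = Agent("openai:gpt-4o", output_type=Sentiment)

result = agent.run_sync("Classify this review: 'Arrived broken, never again.'")
print(result.output.label)   # guaranteed to be "positive" or "negative"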
Model Context Protocol
Model Context Protocol, commonly known as MCP, is a relatively new addition to the plumbing landscape, and it’s been making a surprising number of waves. Surprising, that is, given that most people don’t care much about plumbing.
People often describe MCP as the USB of LLMs. Which annoys me, because it doesn’t have that much in common with USB, and because people have a habit of lazily parroting the same uninformative generalities when talking about AI [7]. But before I start ranting, let’s just say that MCP and USB do have in common that they’re both about providing standard interfaces between things that are being connected together.
And the things being connected are exactly the sort of things I’ve already been talking about: external tools and data sources on one side, LLMs on the other. MCP’s main contribution is to make the interfaces between these things clean and uniform, so that everything knows what to expect and in what format, without being tied to a particular programming language or library. Or to use a plumbing analogy, it’s like standardising the widths and connections of pipes. Which in turn makes it easier to implement and extend frameworks like LangChain [8].
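To give a feel for the level of abstraction, here’s a minimal sketch of a tool exposed over MCP using the official Python SDK’s FastMCP helper (the multiply tool is just a made-up example). Any MCP-aware client can then discover the tool and its input schema, and call it, all in a standard format.

from mcp.server.fastmcp import FastMCP

# A tiny MCP server exposing a single tool.
mcp = FastMCP("calculator")

@mcp.tool()
def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

if __name__ == "__main__":
    mcp.run()   # by default, serves requests over stdio for a local client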
But it’s not just developers who benefit from this kind of plumbing. To go full circle, it’s also good for those who experience LLMs through services like ChatGPT. Anthropic (the developers of MCP) give a demo here using their own LLM, Claude, showing how you can give it access to your own disk files by simply9 installing and configuring an appropriate MCP server. Of course, whether you want an LLM to rummage through your files and send them off to who-knows-where is another matter.
The future of LLM plumbing?
So far, the development of LLM plumbing has pretty much mirrored the development of any new computing technology: initial excitement about capabilities has gradually given way to a focus on how things can be used practically and seamlessly, all whilst lowering the barrier to entry.
One of the interesting things about LLMs though is that they’re not passive components — and, in principle, this means they could have a role in building and discovering their own interfaces with other components. As a simple and near-term example, they could determine by themselves which tool could help them to solve a problem and edit their own configuration files to install it.
In the longer term, LLMs may gradually replace passive components within computers, or at least get wrapped as interfaces around them. For instance, an LLM could provide an interface to a file system. And once this sort of thing starts happening, these LLMs could talk to each other to determine how to interact and collectively solve problems. So, plumbing, like many other things, could be delegated to LLMs to worry about. In which case, move on, nothing to see here…
Footnotes

[1] Whilst not the first to do so, Andrej Karpathy (co-founder of OpenAI and inventor of the term vibe coding) recently used the analogy of an LLM being like an OS. This has been somewhat misreported as him saying that LLMs will become an OS, ruffling the feathers of people who know about OSs. It’s hard to see LLMs being bothered with all the humdrum activities of OSs, like scheduling processes and managing memory, though even this is not beyond imagination.

[2] First described in this ICLR paper by authors at Princeton and Google.

[3] Since they repeatedly stuff their output text back into their input, the text they generate essentially guides their reasoning process. Whether you can say that it’s strictly equivalent to their reasoning process is more debatable.

[4] There’s a more in-depth example in this blog post by Harpreet Sahota. ReAct prompts can also include few-shot in-context examples, i.e. one or more concrete thought-action-observation blocks used to answer a related question.

[5] Reformatted to make it a bit more user-friendly. Note that most of the prompt (i.e. the part that tells the LLM about tools and the required output format) is hidden, as it typically is in the output provided by LangChain.

[6] This is used specifically in retrieval augmented generation (RAG - see Deep Dips #5: Prompt Engineering for a brief overview). It involves generating embeddings for both the user’s query and the documents being queried, and then finding close matches.

[7] Partly, I suspect, because this is the kind of description that LLMs give when asked about technologies like MCP and LangChain. Maybe this is an early sign of a doom loop where people only know what LLMs tell them, and LLMs only know what they already told people.

[8] Though note that a significant difference between the LangChain example I gave and MCP is that the latter uses a client-server architecture. Which basically means that the tool is not executed and then terminated once it’s done its job; instead, a request is sent to a persistent tool server (which can run on the same machine or anywhere on the internet) and that server can hang around indefinitely, responding to requests from many potential clients. An implementation detail, but a fairly significant one.

[9] Loosely speaking. It does involve editing config files, but presumably this kind of integration will get even simpler over time.