There’s a new buzzword in town: Generative AI. Or just GenAI to the cool kids. It’s about using AI techniques to create artefacts, most notably text and images.
There’s nothing particularly new about using AI to generate things. I should know — I was using evolutionary algorithms to generate electronic circuit designs way back in the noughties. However, what has changed is the ease with which things can be generated, and the breadth of things that can be generated. This is mostly[1] due to transformers (see Deep Dips #3: Transformers), especially when wrapped up in web-based services like ChatGPT.
The term GenAI is now becoming ubiquitous. You see it in the advertising of companies. You see innumerable new companies being built around it. You see it in job adverts. But what is it actually good for?
We can start by considering what it’s not good for. It isn’t a replacement for other types of AI. For instance, if you want to do machine learning, then you can’t generally use ChatGPT for this. There are some exceptions[2], but the vast majority of ML tasks should still be done using dedicated ML approaches. And these include transformers. That is, transformers are not the same thing as GenAI. You can of course use transformers to generate stuff, but they’re equally at home performing routine ML activities like classification and regression. Another way of putting this is that GenAI is a product of ML; it’s not presently a way of doing ML.
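To make the distinction concrete, here’s a minimal sketch of a transformer doing routine classification rather than generation. It uses the Hugging Face transformers library purely for illustration; nothing above prescribes that particular toolkit.

```python
# A transformer used for plain classification rather than generation, via the
# Hugging Face `transformers` library (an assumed toolkit, chosen for illustration).
from transformers import pipeline

# Loads a pre-trained sentiment classifier; weights are downloaded on first use.
classifier = pipeline("sentiment-analysis")

print(classifier("Transformers are not only for generating text."))
# Output is a label and a confidence score, e.g. [{'label': 'POSITIVE', 'score': 0.99}]
# (the exact values depend on the underlying model). No new text is generated.
```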
So what is GenAI good for? Well, the main use still lies in generating text, with most implementations taking the form of a chatbot. That is, you ask it questions and it gives you answers. And most industrial and business interest lies in making the underlying model more aware of the intricacies of a particular domain, so that it can answer questions that are relevant to people working in or interacting with that domain.
Much of this[3] rests on the idea of fine-tuning an off-the-shelf foundation model. A foundation model is a model that has been pre-trained on a huge amount of data. In the case of ChatGPT, this is pretty much all the text that can be readily sucked up from the web, and absorbing all this text gives it a general ability to interpret and generate written language. Fine-tuning then involves assembling a much smaller collection of text which is specific to a particular domain — think user manuals, policy documents, communication transcripts — and then re-training some components of the model so that it focuses on dealing with this specific kind of information. OpenAI (the creators of ChatGPT), for example, provide a web interface for this fine-tuning, so that even people from a non-technical background can do it. For a fee of course.
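For those who would rather script it than click through a web interface, OpenAI also exposes fine-tuning through its API. The sketch below is illustrative only: the file name is a placeholder, the training data needs to be in OpenAI’s chat-formatted JSONL layout, and the supported base models change over time, so check the current documentation before relying on any of it.

```python
# A rough sketch of fine-tuning via the OpenAI API (assumes the `openai` Python package
# and an API key in the OPENAI_API_KEY environment variable). File and model names are
# placeholders; check OpenAI's current docs for what is actually supported.
from openai import OpenAI

client = OpenAI()

# 1. Upload the domain-specific examples: a JSONL file where each line looks like
#    {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
training_file = client.files.create(
    file=open("domain_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start a fine-tuning job on a base model; the training runs on OpenAI's side.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# 3. Once the job completes, the resulting fine-tuned model gets its own identifier
#    and can be queried like any other chat model.
print(job.id)
```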
This kind of GenAI functionality is also being rolled out into consumer operating systems. So it shouldn’t be long before my computer can fine-tune a transformer on all my email communications and learn to respond to them for me. Phew!
Another example based around text is code generation. Code foundation models (and even general text models like ChatGPT) have become adept at generating code for pretty much any programming task. And this includes ML, although they don’t always get it right (see Can LLMs spot machine learning pitfalls?). However, it also introduces challenges to academics like me who are trying to assess students on their ability to write code, rather than their ability to ask an LLM to write it for them.
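As a made-up illustration of the kind of mistake that can slip through, here is the sort of ML code an LLM might produce: it looks plausible, but the first version fits the scaler on the whole dataset before splitting, quietly leaking test-set information into training.

```python
# Hypothetical example of a subtle ML pitfall that generated code can contain:
# fitting the scaler on ALL the data before the train/test split (data leakage).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Pitfall: the scaler sees the test data before the split.
X_leaky = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_leaky, y, random_state=0)

# Correct version: split first, then fit the scaler on the training data only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=5000).fit(scaler.transform(X_train), y_train)
print(model.score(scaler.transform(X_test), y_test))
```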
GenAI is not limited to generating just text. There are plenty of models that will generate images, and this includes the most recent generations of commercial models like ChatGPT, which can generate both text and images. There are also numerous models for generating video and audio. All of these models have introduced both benefits and challenges to the creative industries.
Basically, wherever there’s a big enough set of existing examples of something, someone can come along and train a foundation model to generate more of these somethings (and generally someone already has).
One nice example of this is proteins, the building blocks of biology. For years, people constructed intricate models to predict the shape of a protein from its amino acid sequence, without a huge amount of success. But then DeepMind threw a transformer at a big pile of protein data, and with AlphaFold it’s now possible to predict the shape of a protein quite accurately. This in turn has the potential to revolutionise fields like drug design.
Another emerging example is the use of GenAI in manufacturing, with companies like Autodesk introducing generative design tools that automatically produce designs matching some sort of specification. This means you can say things like “generate me a sprocket flange that decreases mechanical wear by 50%” and out will pop a potential design.
The downside of all this is that generative models don’t always get things right. Hallucination of facts by text-based transformers has become a well-known example of this, but the same is also true of other generative models. This means that deep domain knowledge is still required to determine whether the output of a generative model is correct. A more subtle issue is bias. That is, if the examples the models are trained on are biased in some way, then it’s likely that the models will reproduce these same biases in their outputs. An example of this is the widely-reported tendency of text and image-generation models to propagate racial and gender stereotypes.
Another problem with generative models is the extent to which they can generalise from their training data. Whilst there are plenty of examples of them generating solutions that are in some sense novel, there are also many examples of them reproducing their training data verbatim. This causes problems when these models are trained on other people’s intellectual property. And, due to their reliance on existing data, there will always be limits to how far neural network-based models like transformers can generalise beyond human knowledge. This means that they may not be capable of true novelty, but rather only novelty in the sense of reconstituting existing solutions in new ways.
This does mean there is still space for AI tools that can generate true novelty, which is good news for people like me who work with evolutionary algorithms. Whilst these algorithms are slow compared to transformers, and challenging to scale, they are capable of generating truly novel solutions that go beyond the sorts of things humans can imagine.
So, as with anything AI-related, there are both benefits and challenges to generative AI, and it can be hard to separate the hype from the true potential.
[1] But not exclusively. Diffusion models are pretty important for generating images.
[2] For example, LLMs have now widely replaced conventional ML models for text-based sentiment classification. There’s also been some success in using in-context learning for classification and regression tasks beyond language modelling.
[3] But not all of this. RAG (Retrieval Augmented Generation) is an alternative approach that rests on in-context learning. That is, adding relevant information retrieved from elsewhere into the prompt.
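To make that idea concrete, here is a toy sketch: retrieve the most relevant snippet for a question, then splice it into the prompt before it goes to the model. The documents, the word-overlap scoring and the prompt wording are all placeholders; a real system would use an embedding model and a vector store for retrieval.

```python
# A toy sketch of Retrieval Augmented Generation: find relevant text, add it to the prompt.
# The documents, the word-overlap scoring, and the prompt wording are all placeholders.
documents = [
    "Warranty claims must be submitted within 30 days of purchase.",
    "Charge the device for 8 hours before first use.",
    "Refunds are processed within 5 working days.",
]

def retrieve(question, docs, k=1):
    # Toy relevance score: number of words shared between the question and a document.
    def score(doc):
        return len(set(question.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

question = "How long do I have to submit a warranty claim?"
context = "\n".join(retrieve(question, documents))

prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
# `prompt` would then be sent to an LLM; the retrieved context is in-context learning at work.
print(prompt)
```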
I agree that the most promising route to truly creative computer systems is evolutionary computing, but even then a fitness/objective function is required.
I am now envisioning a future you who never reads any email but needs to take his GenAI assistant to all his in-person meetings so it can tell him what he needs to say 😀