Universities are having a rough ride. Those on the far right of politics think we’re a bloated wokeberry ripe for the compost heap. Even moderates don’t see universities as a priority. Yet, at the same time, everyone seems eager to splash the cash on big tech. In this context, I think it’s important to reflect on where the ideas underlying modern AI actually came from, and where they might come from in the future. Despite their contemporary association with big tech, they were mostly dreamt up long ago, and mostly by people working with modest resources at publicly-funded universities.
This story really starts way back in the 1600s, with Newton’s invention of calculus1 at the University of Cambridge. Because it’s the basis of backpropagation, without calculus we wouldn’t be able to train the large neural networks that we take for granted. In fact, we’d struggle to get beyond parameter counts in the 1000s2, which would be quite a limitation considering that current counts are up in the billions.
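To make the connection concrete, here's a minimal sketch of backpropagation as nothing more than the chain rule, written in Python with numpy. It's purely illustrative (the toy task, layer sizes and learning rate are all made up for this post): every parameter update below comes from an analytic derivative, which is exactly what calculus buys us.

```python
# Backpropagation in miniature: the chain rule applied layer by layer.
# An illustrative sketch only, not production code.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)   # a toy non-linear task

W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)  # hidden layer
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)  # output layer

for step in range(5000):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    # backward pass: differentiate the loss via the chain rule
    dlogit = (p - y) / len(X)             # d(mean cross-entropy)/d(output logit)
    dW2 = h.T @ dlogit; db2 = dlogit.sum(0)
    dh = dlogit @ W2.T * (1.0 - h**2)     # back through the tanh non-linearity
    dW1 = X.T @ dh; db1 = dh.sum(0)
    # gradient descent: nudge every parameter downhill at once
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 1.0 * grad

print("accuracy:", ((p > 0.5) == (y > 0.5)).mean())
```

Without those gradients, the only alternative is trial-and-error search over the parameters, which is why the limit mentioned in the footnote bites so hard.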
The other mathematical underpinning of neural networks, matrix multiplication, was also developed in academia, notably by Binet working at the École Polytechnique in Paris in the early 1800s. Since matrix multiplication is what makes training and running neural networks computationally efficient, without it we'd be far more limited in the models we could practically build.
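Purely as an illustration of that point (a throwaway comparison with made-up sizes, not a proper benchmark), here's one dense layer computed entry by entry in plain Python versus as a single matrix multiplication; the second form is what lets optimised linear-algebra libraries, and later GPUs, do the heavy lifting.

```python
# One dense layer, computed two ways. The matrix-multiply form hands the work
# to an optimised linear-algebra routine instead of interpreted Python loops.
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 256))    # a batch of 64 inputs
W = rng.normal(size=(256, 256))   # one layer's weight matrix

t0 = time.perf_counter()
slow = np.array([[sum(X[i, k] * W[k, j] for k in range(256))
                  for j in range(256)] for i in range(64)])
t1 = time.perf_counter()
fast = X @ W                      # the same computation as one matrix multiply
t2 = time.perf_counter()

print("results match:", np.allclose(slow, fast))
print(f"explicit loops: {t1 - t0:.3f}s, matrix multiply: {t2 - t1:.6f}s")
```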
And let’s not forget the biologists. Neural networks are based on an understanding of how biological neurons function, something that first came about thanks to Cajal’s work in the 1890s at the universities of Barcelona and Madrid.
This understanding of animal brains paved the way for the McCulloch-Pitts artificial neuron in the 1940s, developed at the universities of Chicago and Illinois, and followed up by Rosenblatt’s perceptron at Cornell in 1958. Practical neural networks emerged in the mid-80s, again in academia, when Rumelhart, Hinton3 and Williams at the University of California San Diego used backpropagation to train multilayer perceptrons (MLPs).
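For a sense of how simple those early models were, here's a toy rendering of a Rosenblatt-style perceptron learning the OR function (my own sketch, not the original 1958 formulation): a thresholded weighted sum whose weights get nudged whenever it makes a mistake.

```python
# A single threshold unit trained with the perceptron update rule.
# OR is linearly separable, so the rule is guaranteed to converge on it.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])                       # the OR function

w = np.zeros(2)
b = 0.0
for epoch in range(10):
    for xi, target in zip(X, y):
        pred = int(xi @ w + b > 0)               # fire if the weighted sum exceeds the threshold
        w = w + (target - pred) * xi             # perceptron learning rule
        b = b + (target - pred)

print([int(xi @ w + b > 0) for xi in X])         # expect [0, 1, 1, 1]
```

Its famous limitation, of course, is that a single threshold unit can only learn linearly separable functions, which is part of what made backpropagation-trained MLPs such a step forward.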
The 80s also saw the emergence of other components that are now considered fundamental to modern AI, including convolutional layers, ReLU activation functions, pooling, residual connections, encoder-decoder architectures, and early forms of attention. Fukushima deserves particular (and often overlooked) recognition here, having developed the first CNN; whilst he didn’t work at a university, he did work at the publicly-funded research lab of Japan’s state broadcaster.
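As a rough, ahistorical sketch of what a few of those building blocks actually do (plain numpy again, with every function name my own), here's a 1-D convolution, a ReLU, max pooling, and a residual connection:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)                      # pass positives, zero out negatives

def conv1d(x, kernel):
    # slide one small shared kernel along the signal (weight sharing, as in a CNN)
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    # keep only the strongest response in each window
    return x[: len(x) // size * size].reshape(-1, size).max(axis=1)

def residual_block(x, kernel):
    # add the block's input back onto its output (a skip connection)
    y = relu(conv1d(np.pad(x, 1), kernel))         # pad so the shapes line up
    return x + y

signal = np.sin(np.linspace(0, 6, 32))
kernel = np.array([-1.0, 0.0, 1.0])                # a crude slope detector
features = max_pool(relu(conv1d(signal, kernel)))
print(residual_block(signal, kernel).shape, features.shape)   # (32,) (15,)
```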
By this point, most of the ingredients of modern neural networks had been developed4, mostly in academia, and well before big tech started to take an interest. But this certainly wasn’t the end of academic leadership in AI. After a pause for the 90s AI winter, the early 2000s saw the first successes in using neural networks for natural language processing — a thread of research that would eventually lead to contemporary LLMs — notably Bengio’s work with feedforward architectures at the Université de Montréal and, later, Mikolov’s work on RNN-based models at Brno University of Technology.
After that, things get a bit fuzzier. By the 2010s, there were clear commercial opportunities for companies to come along and build on what had been done. However, many of these companies worked very closely with academia, and many of them sprang out of academic research groups. For example: two of the founders of DeepMind met whilst doing research at UCL, one of the founders of OpenAI came from the group Hinton set up at the University of Toronto, and more recently DeepSeek was formed by alumni of Zhejiang University5.
On top of this, many company researchers famous for their AI contributions owe a lot to their training in academia. LeCun, renowned for the modern incarnation of the CNN, developed it at Bell Labs only a year after leaving Hinton’s group. Mikolov, after earning his doctorate at Brno University of Technology, went on to work at Google, where he soon developed the influential word2vec text embedding model.
Of course, big tech has played an important role in recent developments, especially in having the money and resources to scale up neural architectures. They’ve also contributed to more fundamental work. Transformers are particularly noteworthy for having been developed by researchers at Google. Even though transformers build on a lot of existing work, they’re a good example of what companies can do when pushed6, in this case by the perceived threat from Apple’s Siri at the time.
But this is really the point — most companies will only invest in research where they see a near-term benefit to doing so. In the case of Google, they’re in fierce competition with other tech titans, and can’t afford to drop behind. Consequently, research that has a clear path to innovative new products is an easy sell to their shareholders. But would they invest in ideas that won’t deliver monetary outcomes for decades to come? Probably not.
And this is why universities are still necessary. Despite the desires of governments, we’re not here to directly drive economic growth. We’re here to deliver the raw ingredients of knowledge which will one day underpin many things, including economic growth. And this includes knowledge which doesn’t have a direct and measurable path to impact — but which could, one day, underlie big things.
Neural networks are a good example of this. Even ignoring the earlier work by mathematicians and biologists, it’s taken almost 80 years of effort, by hundreds of thousands7 of researchers, for this field to develop to a stage where it has clear real-world impact. That includes persevering through two AI winters. All of these factors would have made the journey infeasible for a company, so it’s really no coincidence that the ideas behind modern AI had to come from academia.
And what of the future? Despite concerns around the plateauing of transformer performance, I’m pretty sure we haven’t reached the end-point of AI. In fact, this plateauing may be a sign of the danger of focusing on one approach to the exclusion of others — something that is encouraged by the short-termism of big tech, which typically invests in what works and cuts off what currently doesn’t.
Academia is also not perfect8, but it does benefit from a multiplicity of viewpoints. Despite falling out of favour during the AI winters, academics who believed in the potential of neural networks were still able to work on them, and this ultimately delivered the goods we’re now benefitting from. Multiplicity can of course lead to inefficiency (and I could mention again the hundreds of thousands of researchers) but it also provides an ability to backtrack and explore other routes when one of them runs out of road.
So we’re getting to a time when academics may have to step in again and move the conversation forward to new ideas. Or perhaps old ideas, since there are plenty of promising ideas in the history of neural networks (and AI more generally) which didn’t lie on the path to transformers9. And with academia being academia, people are still studying them.
Credit is usually shared with Leibniz, who independently came up with the same idea a little while later.
This is roughly the limit for non-gradient based optimisation.
Hinton did his PhD at my neighbouring university, the University of Edinburgh.
Something I talked about in “Neural networks: everything changes but you”.
This is a very loose estimate, but given that a single academic conference can have thousands of papers, and there are quite a few of these each year, the number of people who have meaningfully contributed to current understanding of neural networks is probably above 100,000.
To mention just one problem, the incentive structure promotes a focus on process (e.g. getting grants) over outcomes (e.g. doing something useful with them).
To name a few: spiking neural networks, recurrent neural networks, and reservoir computers, all of which have much more in common with our brains than transformers.