Every now and then I like to Google[1] “programming languages for machine learning” and then get annoyed by the posts that list every language in the world as being good for machine learning. Really? Admittedly, a lot of the machine learning programming I do these days is of a vicarious nature, but even I know that pretty much everyone is using Python. It’s definitely not being done[2] in Java, C++ or Haskell, much as I might like that to be the case.
As the title of my post Python, destroyer of worlds suggests, I’m no great lover of Python. But I’m going to go out of character and say nice things about Python to begin with. Python is great for prototyping, throwing things together quickly, and scripting things written in other languages. It’s accessible, it isn’t loaded down with excess syntax, and it presents a low barrier to people new to programming and machine learning <rinses mouth out with salty water>. But it’s so darn slow and inefficient! And types: it doesn’t have any![3] As I discussed in Reverting to types, types are one of the great tools we have within programming for making code less error-prone and easier to understand — both things we should want in machine learning.
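To illustrate the point about types: Python does nowadays let you write type annotations, but the interpreter ignores them entirely at runtime. A minimal sketch (the function name and values are just for illustration):

```python
# Python annotations are not enforced at runtime: this function
# declares int arguments, but nothing stops us passing strings.
def add(x: int, y: int) -> int:
    return x + y

print(add(2, 3))      # 5, as intended
print(add("2", "3"))  # "23" -- silent string concatenation, no error
```

An external checker such as mypy would flag the second call, but the language itself never looks at the annotations.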
So what are the alternatives? Well, I think there are only three worth mentioning at present: R, Julia and Mojo[4].
R is also not one I’d encourage. It’s just as awful as Python when it comes to speed, efficiency, proneness to errors and comprehensibility. Or perhaps worse. R developed as an open source alternative to S, a language that was designed to replace Fortran for use in statistical programming. Which gives you a clue about how old it is. The good thing about R is that it does have a huge ecosystem of statistical and machine learning libraries, and sometimes these are more feature-rich than their Python equivalents. But — having used R quite a lot in the past — its syntax can be confusing, and it suffers from a lack of clear design due to being incrementally developed over decades.
Julia is more my cup of tea. It’s far more recent than either R or Python, having appeared in this century rather than the last. Like Python, it’s a dynamically typed language, meaning you can still shunt together code without thinking too much about what you’re doing. I’m not saying this is a good thing, but it does seem to be one of Python’s selling points. However, it also supports type annotations that are enforced by the language[5]. This means you can use types to improve the safety, readability or efficiency of your code, but you don’t have to. But the most visible departure from Python is that Julia is a compiled language, so it’s much faster than Python. Even when you use it within an interactive shell, it still attempts to compile as much as it can ahead of time. The only thing I’m not too keen on is Julia’s lack of classes. It kind of supports object-oriented programming, but not in the nice structured way of languages like C++ and Java. However, OOP is sadly a rare thing in machine learning, so this is probably not an issue for most practitioners.
Mojo is a more recent language. So recent, in fact, that it’s not finished. Mojo’s big selling point is that it’s both a superset of Python[6] and compatible with Python libraries — meaning that Python code will run in Mojo, whether written directly or called via libraries such as SciPy and NumPy[7]. Which seems to be enough of a selling point that people are willing to use pre-releases of the final language. Its other selling point is that it’s trying to be a systems programming language in addition to a scripting language. That is, it lets programmers write code that takes advantage of the low-level efficiencies of the system it’s running on, much like C and Rust. One way it achieves this is through the idea of gradual typing — that you can take some Pythonesque code without types, and gradually add them in order to speed things up. In practice this is similar to Julia’s approach, but the fact you can start with Python syntax is a particular advantage. It also enables programmers to leverage hardware resources like GPUs and TPUs, so in principle provides one language that can be used both for high-level scripting and the kind of low-level efficiency that currently requires C++ etc.
However, it’s the backwards compatibility with Python that is the real driver behind Mojo. This is arguably the factor that prevented Julia from making a sizeable impact on the machine learning landscape. Whilst Julia is superior to Python in many ways, it doesn’t allow programmers to leverage all the existing mass of Python code and teaching materials, and this has been a barrier to uptake. It’s too early to see whether Mojo will be able to scale this barrier, but there are already signs that it’s making an impact, despite its early stage of development.
Do I like Mojo? Well, I don’t hate it, and frankly, extending Python is the only way I see of a new language overcoming Python’s dominance in machine learning. But this is ignoring an elephantine presence in the room: LLMs. In a previous post, I talked about the difficulties any new language will face in an era where people are becoming reliant on LLMs to write code for them. LLMs gain their code generation abilities by training on existing code samples, plus information from websites such as StackOverflow. With few of the former, and the slow demise of the latter, it’s unclear how LLMs will learn new languages. As a quick test of this, I asked Microsoft Copilot to write some Mojo code to train a simple machine learning model. It happily spewed out code, but I couldn’t get the code to run in the Mojo playground, so I’m making the uneducated assumption that it wasn’t valid Mojo code.
One more thing to consider is whether people will continue to use programming languages like Python. If LLMs become reliable enough at generating code, will there be any need for people to think about the language that is being used? Instead, it’s not hard to imagine human activity moving to the level of specification — perhaps using some combination of natural language and visual language, rather than code as we currently think of it. Which language the code is generated in then becomes an implementation detail for the LLM, and not something we mere humans have to concern ourselves with. And at that point, maybe I can stop worrying about Python.
[1] Other search engines are available.
[2] Those actually implementing machine learning algorithms still use something sensible and fast like C++, but most machine learning is now done at the scripting level.
[3] Or at least it doesn’t readily expose any. It does have them deep down.
[4] There are other languages that people can use for machine learning, but they don’t have enough critical mass within machine learning to make them real contenders. There are also more niche languages like Scala and SQL which have particular roles within machine learning, but here I’m focusing on general purpose machine learning.
[5] Unlike Python, which does now let you specify types, but will then happily ignore them.
[6] The current release doesn’t yet implement all of Python. Lists, dictionaries and global variables are notably lacking, as are classes.
[7] The latter is also true of Julia.