I’m not a great fan of python[1]. Its use of dynamic types worries me. I’m also disturbed by the flexibility it gives when structuring code. Whilst both of these can make life easier for programmers, they also make it easier to write sloppy, poorly structured code that is prone to bugs, hard to maintain and more challenging for other programmers to read.
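As a small illustration of the kind of bug that dynamic typing lets through, here is a minimal sketch. The function name `total_cost` and its arguments are hypothetical, made up for this example. Nothing stops a caller passing text where a number was intended, and python returns a wrong answer rather than raising an error.

```python
def total_cost(quantity, unit_price):
    # Hypothetical helper: whoever wrote this expects two numbers,
    # but nothing in the language enforces that.
    return quantity * unit_price


# Intended use: both arguments are numbers.
correct = total_cost(3, 4)

# Accidental use: the quantity arrives as text, e.g. read from a form or a CSV file.
# str * int is valid python (string repetition), so no error is raised.
wrong = total_cost("3", 4)

print(correct)  # 12
print(wrong)    # 3333
```

The second call fails silently, which is arguably worse than crashing: the bad value carries on through the rest of the program.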
But what really concerns me is python’s inefficiency. It’s an interpreted language, which means that every time you run a python program, an interpreter has to work through it instruction by instruction whilst the program is executing, rather than the processor directly running machine code that was built in advance. This is a significant overhead, and it means that python code typically runs a lot slower than code written in a compiled language like C++ or Java[2], where the job of translating the program into machine code is mostly done before execution.
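To make this overhead a little more concrete: in CPython, the reference implementation, source code is first compiled to bytecode, and a virtual machine then executes that bytecode one instruction at a time. The standard `dis` module lets you peek at those instructions. The function `add` below is a hypothetical example, and the exact opcode names vary between python versions, so treat this as a sketch rather than a fixed listing.

```python
import dis


def add(a, b):
    # Hypothetical one-line function used only for illustration.
    return a + b


# Each entry is one bytecode instruction that the interpreter loop must
# fetch, decode and dispatch at run time.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]

print(len(opnames), "bytecode instructions for a single addition:")
for name in opnames:
    print(" ", name)
```

A compiled language would typically reduce the same function to a handful of machine instructions, with no dispatch loop sitting in between.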
This would concern me much less if python hadn’t become the standard language for AI and data science, which are two areas where efficiency and speed are particularly important. After all, every language has its use case: back in 1991[3], when python was first released, its use case was to allow programmers to develop prototypes quickly. And this is a big part of why python eventually became popular. AI and data science involve a lot of prototyping; that is, quickly throwing together bits of code to find out what works. So, in that regard, it’s natural to use a language that doesn’t have the development overheads of those designed for software engineering, such as C++ and Java. It’s also the case that AI and data science specialists are often not specialists in programming, so have a preference for a language with simpler syntax.
Now, at this point, you could argue that the inefficiency of python doesn’t matter, because python is mostly used as a glue language. That is, most of the stuff that requires efficiency is written in languages such as C and C++, and this code is then called from python. This is the case for many of the well-known AI and data science libraries in python: for instance, PyTorch, a popular deep learning library, is really just a thin layer of python wrapped over a C++ backend.
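The glue-language pattern can be seen without leaving the standard library. In CPython, the built-in `sum` is implemented in C, so calling it hands the loop over to compiled code, whereas a hand-written loop is executed by the interpreter. The sketch below times both with `timeit`. The function name `python_sum` and the input size are arbitrary choices for illustration, and the actual timings will depend on your machine and python version.

```python
import timeit


def python_sum(values):
    # Hand-written loop: every iteration runs through the interpreter.
    total = 0
    for value in values:
        total += value
    return total


numbers = list(range(100_000))

# Both approaches compute the same answer.
loop_result = python_sum(numbers)
builtin_result = sum(numbers)  # the loop runs inside CPython's C code

# Time 20 runs of each; absolute numbers will differ between machines.
loop_seconds = timeit.timeit(lambda: python_sum(numbers), number=20)
builtin_seconds = timeit.timeit(lambda: sum(numbers), number=20)

print(f"pure python loop: {loop_seconds:.4f}s")
print(f"built-in sum:     {builtin_seconds:.4f}s")
```

On a typical CPython install the built-in version tends to come out noticeably faster. That is the same effect, on a much smaller scale, that the big numerical libraries rely on.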
However, if a python-using AI or data science practitioner wants to implement something that isn’t in an existing library, it’s unlikely they would go off and write it in C++. As a consequence, there’s a lot of inefficient python code running within the world’s data centres, needlessly using more power[4] and emitting more heat than equivalent code written in a compiled language. So, in that respect, python is contributing to the slow heat death of the planet, and ultimately the universe.
But this isn’t the only reason I allude to python being a potential destroyer of worlds. This goes back to the sloppy programming practice I mentioned earlier. AI software is set to have a profound influence on our lives and livelihoods. It’ll probably be involved in everything, from deciding which medicines we should take to how we fight wars, and a bug in this code could have some quite destructive effects, potentially even on a planetary scale. It’s already challenging to verify the correctness of AI models, and adding a layer of poorly structured, difficult-to-maintain code on top of these only makes things worse.
Is it time we moved on to a different language for AI and data science? I would say so. In my opinion, there are better languages available. Julia, for instance, is a just-in-time compiled language that still allows for easy prototyping. It also has better support for type annotations, which make dynamic typing safer. However, the critical mass of python code and users means that a transition to another language is unlikely, at least in the near term, so I guess I’ll just have to keep on grinding my teeth for now.
For some more thoughts on programming languages, see “But which programming language should I learn?”
[1] The programming language, not the snake or the comedy troupe after whom the language was named.
[2] Technically Java is a just-in-time compiled language, but these run almost as fast as “proper” compiled languages these days.
[3] Something else that annoys me is people describing python as a modern language. It’s actually a few years older than Java, which is seen as a bit crusty.
[4] Okay, there are some subtleties here. Whilst there is some processing overhead due to interpreting rather than compiling, the resulting machine code doesn’t itself need more power to run. However, the virtual machine that runs it needs to persist longer. This means that the processor will produce more heat due to the overheads of running an (often quite heavyweight) virtual machine for longer. Running code slower also means that processors will be under-utilised, requiring other computers to be spun up to handle jobs waiting in the queue.