Reverting to types
Following my tirade against Python last week, I thought I’d say a bit more about why types are important in programming, and why it might be good to have more of them.
Until recently, types were a visible part of most programming languages. Whenever you created a variable, you had to declare its type. That is, you had to explicitly indicate whether it was going to hold an integer, a floating-point value, a character string, or perhaps something more complex such as an image. Nowadays, this is much less common, with dynamically-typed[1] languages such as Python almost removing the need for programmers to worry about types. This can make life a bit easier for programmers, but is it a good thing?
Well, to understand that, you need to understand why types were introduced to programming languages in the first place.
One reason is efficiency. At a low level, computers have to worry about where to store things. This includes finding large enough blocks of memory to store things in, but also making the best use of fast memory such as on-processor registers in order to speed things up. For this reason, languages such as C provide a range of different types for dealing with values of differing bit lengths. By carefully choosing which of these to use, a programmer can significantly reduce memory usage and increase the speed of their code. However, these days, efficiency is much less important, since people have computers that are much faster than they need to be for most applications, and most computers have a glut of memory[2].
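To put rough numbers on the memory point, here is a small Python sketch that uses the standard ctypes module to ask how many bytes some of C's fixed-width integer types occupy. The bytes_needed helper and the figure of a million values are my own choices for illustration, not anything from a real codebase.

```python
import ctypes

# Fixed-width C integer types, as exposed by Python's standard ctypes module.
C_TYPES = {
    "int8_t": ctypes.c_int8,
    "int16_t": ctypes.c_int16,
    "int32_t": ctypes.c_int32,
    "int64_t": ctypes.c_int64,
}


def bytes_needed(type_name: str, count: int) -> int:
    """Bytes a packed C array of `count` values of this type would occupy.

    Hypothetical helper, written only for this illustration.
    """
    return ctypes.sizeof(C_TYPES[type_name]) * count


if __name__ == "__main__":
    for name in C_TYPES:
        print(f"{name}: {bytes_needed(name, 1_000_000):,} bytes")
```

If the values are small enough to fit in 8 bits, picking int8_t over int64_t cuts the storage to an eighth, which is the kind of saving a C programmer gets by choosing types carefully.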
Another important driver behind the adoption of types was the desire to make code more expressive and less error-prone. This fragment of Java is a very simple example of how types help with this:
float interest = 100.0f;
interest = "Not interested";
The first line creates a variable called interest and stores a number in it — let’s say it represents the interest to be added to a bank account. It also declares float as its type; this tells the compiler that it will only ever contain a floating-point numeric value. The second line then attempts to store the character string “Not interested” in it. Since the compiler knows the variable can only store a floating-point number, it generates an error. This forces the programmer to engage in a period of constructive self-reflection.
Next, consider the Python version:
interest = 100.0
interest = "Not interested"
The first line again creates a variable called interest and stores a number in it. The second line then stores “Not interested” in it. No error is produced, and the programmer walks away, content at a job well done. The bank account owner, on the other hand, is less happy.
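To see where the trouble eventually shows up, here is a minimal sketch of what might happen later in the same program. The add_interest and try_add_interest functions and the balance of 250.0 are made up for illustration. The point is only that Python raises the error when the value is finally used as a number, which may be a long way from the line that caused the problem.

```python
def add_interest(balance, interest):
    """Return the balance with interest added; no types are checked up front."""
    return balance + interest


def try_add_interest(balance, interest):
    """Attempt the addition and report any runtime type failure as text."""
    try:
        return add_interest(balance, interest)
    except TypeError as error:
        return f"TypeError: {error}"


if __name__ == "__main__":
    interest = 100.0
    interest = "Not interested"  # accepted without complaint
    # The mistake only surfaces here, when the value is used as a number.
    print(try_add_interest(250.0, interest))
```

In a statically-typed language the same mistake would be reported at the assignment itself, before the program ever ran.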
The interest example is clearly very simple, but it shows the role that types play in steering programmers away from making errors. In much larger programs there are many more variables, and it’s much easier for a programmer to confuse them. Although type checking doesn’t stop all errors, it does prevent those that involve incompatible types.
Beyond this, types also help to self-document code. For instance, in the above Java code, the name of the variable interest is slightly ambiguous, but its type tells you that it holds a number, which is a clue that it’s related to money rather than to whether someone is interested in something. This self-documentation becomes particularly useful when using function calls in code, since the flow of types into and out of functions really helps to understand what they do. Personally, I find this lack of information quite an impediment when trying to read large Python programs.
Even in traditional statically-typed[3] languages, there has been a move away from explicit types. Java, for instance, now provides type inference for local variables, in which the compiler infers a variable’s type from the value it is initialised with, so the programmer doesn’t have to declare it explicitly. This is useful for cutting down on the sometimes unwieldy type declarations you get in Java[4]. However, I’ve seen novice programmers use type inference for everything, and in the process lose all the benefits that type declarations can bring.
Interestingly, whilst the prevailing trend is towards the use of languages with less visible types, programming language researchers are increasingly heading in the opposite direction, towards stricter types. This includes dependent types, which blur the distinction between types, values and code. Rather than just specifying the kind of thing a variable can hold, they can also express precise constraints about its exact form. For example, for a list-type variable, dependent types can express constraints on the size of the list and the ordering of its elements. In turn, this allows the language to analyse the code you write and tell you whether there’s any possibility of these constraints not being met, ruling out a lot of potential bugs in the process.
An interesting side-benefit of dependent types is the possibility of automatic program construction. The idea is that, if you use dependent types to specify the input and output of a piece of code, and these dependent types are sufficiently expressive, then the type system can actually write most of the code for you. This helps shift the programmer’s effort towards describing what the code should achieve, rather than how it should achieve it. It’s a little bit like using a large language model, except the code it produces is guaranteed to meet the specification expressed by its types. However, doing so involves a lot of maths, and this currently limits the use of dependent types to programming languages that can be easily reasoned about, notably functional programming languages such as Idris and Agda, with Haskell supporting them only partially through language extensions.
And it’s not all bad news at the dynamically-typed end of things. Python, for example, now supports type annotations, which allow programmers to indicate the types they intend variables to have. This can improve readability, though unfortunately the Python interpreter does not enforce these, meaning that, in practice, these variables could still contain anything. Other dynamically-typed languages go further than this. In Julia, for instance, type annotations are binding on the programmer when used, and the language then uses the information they provide to improve runtime efficiency. Whilst type purists such as me might still frown upon this kind of thing, it does show that it’s possible to achieve some balance between the benefits of explicit types and the care-free ease of dynamic typing.
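Here is a short sketch of what unenforced annotations look like in practice. The apply_interest and declared_types functions are hypothetical, written just for this post. The annotations are stored on the function where tools and readers can see them, but the interpreter itself never checks them.

```python
def apply_interest(balance: float, interest: float) -> float:
    """Annotated for floats, but the interpreter treats the hints as metadata."""
    return balance + interest


def declared_types(func):
    """Return the names of the types recorded in a function's annotations."""
    return {name: hint.__name__ for name, hint in func.__annotations__.items()}


if __name__ == "__main__":
    # The annotations are recorded and can be inspected.
    print(declared_types(apply_interest))

    # Strings are accepted despite the float annotations; + just concatenates.
    print(apply_interest("100", "Not interested"))

    # The same goes for annotated variables.
    interest: float = 100.0
    interest = "Not interested"  # still no runtime error
    print(type(interest).__name__)
```

An external static checker such as mypy can be run over code like this to flag the mismatches, but that is a separate, optional step rather than something the language does for you.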
[1] In a dynamically-typed language, the type of each variable is determined at runtime, and can change whilst a program is running.
[2] At least from the perspective of someone like me, who grew up squeezing things into a meagre 32k of RAM.
[3] In a statically-typed language, the type of each variable needs to be known at compile time, and remains fixed at runtime.
[4] For instance, when using collection classes with generics.