So we have seen that nowadays operating systems are multi-tasking, which means that they are able to run multiple processes (tasks) at the same time. They achieve this by interrupting each process every so often to let other tasks run, so that over a period of time every process gets a little runtime. This keeps the system responsive for the user and usually increases the overall throughput of the system.
We already know that processes are simply running instances of applications. Many applications allow us to run several instances simultaneously (e.g. you can open Notepad several times; each instance is a separate process).
Processes are also able to share data with each other. The mechanism used to share data between different processes is called I-PC (Inter-Process Communication). I have used the dash in the name in order to distinguish it from IPC (Instructions per Clock), which we defined in a previous episode. I-PC relies on a set of mechanisms that allow processes to communicate with each other: files, pipes, sockets, shared memory, etc.
Every time you save a file in one application in order to open it in another, whenever you pipe two commands in a Linux shell, whenever you simply copy some text from one application just to paste it into another, you're using inter-process communication. I-PC is also what allows you right now to read this web page via the Internet, except that this time the browser application on your system is exchanging data through a socket with a web-server application running on another machine sitting somewhere else on the Internet.
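As a minimal sketch of what one of these mechanisms looks like in code (this is my own illustrative example using the POSIX pipe() and fork() calls, not something taken from any particular application), here is a parent process sending a message to a child process through a pipe:

```cpp
// Minimal sketch of inter-process communication through a POSIX pipe.
// A parent process writes a message; a child process reads and prints it.
#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>
#include <cstring>

int main() {
    int fd[2];                     // fd[0] = read end, fd[1] = write end
    if (pipe(fd) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();            // duplicate the current process
    if (pid == 0) {                // child process: the reader
        close(fd[1]);              // the child only reads
        char buffer[64] = {0};
        read(fd[0], buffer, sizeof(buffer) - 1);
        printf("child received: %s\n", buffer);
        close(fd[0]);
    } else {                       // parent process: the writer
        close(fd[0]);              // the parent only writes
        const char* msg = "hello from the parent";
        write(fd[1], msg, strlen(msg) + 1);
        close(fd[1]);
        wait(nullptr);             // wait for the child to finish
    }
    return 0;
}
```

Notice how every byte has to travel through the operating system kernel on its way from one process to the other; this is part of the reason why I-PC is so flexible yet comparatively expensive.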
All this flexibility comes at a cost: latency. I-PC is not a very efficient way of sharing information in some cases. It is mostly useful when the communication between processes is sparse, doesn't require a lot of bandwidth and is not latency-sensitive. Such scenarios usually involve some kind of human-computer interaction. So for copy-pasting text, chatting, piping commands, web browsing, etc., I-PC is more than enough. But if you have an application which calculates lots of prime numbers on all its cores, a game which needs to draw the image on the screen within the same millisecond the mouse moved, or a rendering application which generates a 3D scene, sharing the load using I-PC is very inefficient.
Above, we can see a wonderfully well constructed comparison of the latencies that a CPU has to endure in order to perform our daily tasks. Please note that this picture reflects the latencies as they were estimated in 2012, but, with only minor changes, these numbers and their hierarchy are still relevant today. What matters here is not the precise values, but the ratio between the different latency classes. Looking at the table, we can see that accessing the L1 cache is around 2 orders of magnitude faster than accessing main memory.
We have seen that I-PC uses complex constructs to allow data sharing. If we look at the picture above, we will see that they mostly sit at the slow end of that table. So a processor that runs at an average speed of 4 GHz (that means 4 cycles per nanosecond) has to wait around 600 000 cycles just to read a file from a file-system located on an SSD: a random SSD read takes on the order of 150 µs, and 150 000 ns × 4 cycles/ns = 600 000 cycles. An eternity!
Multi-threading improves this situation, because the data is shared between the threads of the same program. As you can see in the table above, a mutex lock/unlock takes on the order of a few tens of nanoseconds, so it is roughly three to four orders of magnitude faster than going through I-PC. In other words, your 4 GHz CPU needs to wait only around 100 cycles, instead of hundreds of thousands of cycles.
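To make that contrast concrete, here is a minimal sketch (my own illustrative C++ example, not code from any particular application) of two threads of the same process updating the same variable directly, with only a mutex lock/unlock as the synchronization cost:

```cpp
// Two threads incrementing the same counter, protected by a mutex.
// No files, pipes or sockets involved: the data lives in the memory
// of the same process, and synchronization costs only a lock/unlock.
#include <iostream>
#include <mutex>
#include <thread>

int main() {
    long counter = 0;
    std::mutex counter_mutex;

    auto work = [&]() {
        for (int i = 0; i < 1'000'000; ++i) {
            std::lock_guard<std::mutex> lock(counter_mutex); // lock; unlocks on scope exit
            ++counter;
        }
    };

    std::thread t1(work);   // spawn two worker threads
    std::thread t2(work);
    t1.join();              // wait for both to finish
    t2.join();

    std::cout << "counter = " << counter << '\n';  // always 2000000
    return 0;
}
```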
The downside is that, in the case of multi-threading, the application programmer is once again in control: deciding how many threads to use, splitting the workload between them in an efficient way and, most importantly, synchronizing them. Multi-threading empowers the programmer with a lot more control over the application. But with great power comes great responsibility, so programmers have had to change their way of thinking and programming.
Just to clarify, multi-threading doesn’t replace multi-tasking. Both are employed and needed today. But multi-threading allows solving certain classes of problems more efficiently.
So what exactly is a thread? Well, the easiest way to think of it is as a function inside a program that runs "in parallel" with the main function.
Threads must be explicitly spawned (created) from the main function; they don't magically spawn themselves. In addition, any thread can spawn extra threads, and there is no difference between threads based on how or by whom they were spawned. And when the thread's function exits, the thread simply dies too. Therefore, long-lived threads are usually coded using loops.
In all other respects, a thread function works identically to any other application function. Thread code is no different from any other code. Remember, even traditional applications have a thread: the main thread.
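To make that picture concrete, here is a minimal C++ sketch (with illustrative names of my own) in which the main thread spawns a short-lived thread and a long-lived, loop-based thread:

```cpp
// A thread is "just a function running in parallel with main()".
// main spawns a short-lived worker and a long-lived worker;
// the long-lived one loops until it is told to stop.
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<bool> keep_running{true};

void short_task() {
    std::cout << "short task done\n";   // the thread dies as soon as this returns
}

void long_lived_worker() {
    while (keep_running) {              // long-lived threads are usually built around a loop
        // ... wait for work, process it ...
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    std::cout << "worker shutting down\n";
}

int main() {                            // this code runs on the main thread
    std::thread t1(short_task);         // threads must be spawned explicitly
    std::thread t2(long_lived_worker);

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    keep_running = false;               // ask the long-lived worker to exit its loop

    t1.join();                          // wait for both threads to finish
    t2.join();
    return 0;
}
```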
There are several strategies for managing threads. The simplest is to create and destroy threads as needed. When the program reaches a point in its execution where it needs to parallelize some work, it spawns a few threads which perform the workload (hopefully more efficiently than a single thread could) and then lets those threads die as they finish their work.
However, creating threads is a rather expensive operation, because it involves functionality at the operating-system level. As such, another strategy for managing threads is to pool them. This means that the application creates a bunch of threads up front which stay idle until the application has some workloads to throw at them. After performing the required workloads, the threads go back to idle mode, waiting for the next workload.
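A very small sketch of what such a pool could look like in C++ follows (names like ThreadPool and submit are my own, chosen for illustration; real applications usually rely on a library or on a more elaborate implementation):

```cpp
// Minimal sketch of the pooling strategy: a fixed set of worker threads
// created once, picking tasks from a shared queue until shutdown.
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_up_.notify_all();                 // wake idle workers so they can exit
        for (auto& w : workers_) w.join();
    }

    void submit(std::function<void()> task) {  // hand a workload to the pool
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_up_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_up_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;   // nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                            // run the workload outside the lock
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable wake_up_;
    bool stopping_ = false;
};

int main() {
    ThreadPool pool(4);                        // pay the thread-creation cost only once
    for (int i = 0; i < 8; ++i)
        pool.submit([i] { std::cout << "task " << i << " done\n"; });
    return 0;                                  // the destructor drains the queue and joins
}
```

The application would create the pool once at startup and then simply call submit() whenever it has work, paying the cost of thread creation only once instead of on every workload.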
Different applications employ one strategy or the other. There is no silver bullet here; it really depends on the nature of the application itself and on the environment for which the application is designed. It is up to the programmer to know which strategy better fits the application's use-case and constraints.
In conclusion, threads are useful in the following scenarios:
- performing the same operation on different data (the SIMD case, e.g. image processing); see the sketch after this list
- high-bandwidth data exchange requirements (e.g. video transcoding)
- interaction with different peripherals (e.g. gaming)
- low-latency data exchange (e.g. gaming)
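The first scenario in the list is the easiest one to sketch in code. Below is a minimal, illustrative C++ example (the names and numbers are mine) that brightens a grayscale image by splitting its rows across the available hardware threads; every thread performs the same operation, just on a different slice of the data:

```cpp
// Sketch of the "same operation on different data" scenario:
// brightening an image by splitting its rows across several threads.
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

void brighten_rows(std::vector<std::uint8_t>& pixels,
                   std::size_t width, std::size_t first_row, std::size_t last_row) {
    for (std::size_t row = first_row; row < last_row; ++row)
        for (std::size_t col = 0; col < width; ++col) {
            std::uint8_t& p = pixels[row * width + col];
            p = static_cast<std::uint8_t>(std::min<unsigned>(255u, p + 40u)); // same op, different data
        }
}

int main() {
    const std::size_t width = 1920, height = 1080;
    std::vector<std::uint8_t> image(width * height, 100);   // a fake grayscale image

    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> threads;
    std::size_t rows_per_thread = height / n;

    for (unsigned i = 0; i < n; ++i) {
        std::size_t first = i * rows_per_thread;
        std::size_t last  = (i + 1 == n) ? height : first + rows_per_thread;
        threads.emplace_back(brighten_rows, std::ref(image), width, first, last);
    }
    for (auto& t : threads) t.join();   // no mutex needed: each thread touches disjoint rows
    return 0;
}
```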
Otherwise, a combination of multi-tasking and single-threaded applications is more than enough. For very simple applications or tools, multi-threading can even be counter-productive, as multi-threaded code is more difficult to develop and test. As such, many operating systems provide a rich set of tools that are simple single-threaded apps, each doing one job well and communicating with each other via I-PC.
To be continued…
Previous: Episode II – Software
Next: Episode IV – Multi-threading in action