Python supports multiple threads in a program; a multi-threaded program can execute multiple sub-tasks (I/O-bound and CPU-bound) independently. Apart from intelligently mixing CPU-bound and I/O-bound threads, developers try to exploit the availability of multiple cores, which allows the parallel execution of tasks.
What is special about Python threads is that they are regular system threads! The Python interpreter maps Python thread requests to either POSIX threads (pthreads) or Windows threads. Hence, like ordinary threads, Python threads are handled by the host operating system.
There is no logic for thread scheduling in the Python interpreter. Thus thread priority, scheduling schemes, and thread pre-emption do not exist in the Python interpreter. The scheduling and context switching of Python threads are left entirely to the host scheduler.
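One way to observe this mapping from inside a program is to ask each thread for the identifier the operating system assigned to it. The following is a minimal sketch, assuming Python 3.8 or later (where `threading.get_native_id()` is available); the `report` function, the `seen` dictionary and the thread names are hypothetical and exist only for illustration.

```python
import threading

seen = {}                        # thread name -> OS-level thread ID (illustrative storage)
barrier = threading.Barrier(2)   # keeps both workers alive at the same time

def report():
    # get_native_id() returns the ID the kernel assigned to the calling thread
    seen[threading.current_thread().name] = threading.get_native_id()
    # Wait for the other worker, so both OS threads exist simultaneously
    barrier.wait()

workers = [threading.Thread(target=report, name="worker-%d" % i) for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

main_id = threading.get_native_id()
print("main thread:", main_id)
for name, native_id in sorted(seen.items()):
    print(name, native_id)
```

Each worker reports an identifier different from that of the main thread, and on Linux those identifiers can be matched against the entries under /proc/<pid>/task.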
Python threads are expensive
Python threads, especially those that are CPU-bound, are very expensive in terms of the usage of system calls. We created a small Python program with two threads, which performs a basic arithmetic task:
# A test snippet: thread.py
from threading import Thread

class mythread(Thread):
    def __init__(self):
        Thread.__init__(self)

    def count(self, n):
        while n > 0:
            n -= 1

    def run(self):
        self.count(10000000)

th1 = mythread()
th1.start()
th2 = mythread()
th2.start()
th1.join()
th2.join()
Let’s check the number of system calls used in this program:
vishal@tangerine:~$ strace -f -c python ./thread.py
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.97    0.361636           7     54446     26494 futex
  0.03    0.000124           1       238       163 open
  0.00    0.000000           0       116           read
  0.00    0.000000           0        75           close
  0.00    0.000000           0         1           execve
[...]
------ ----------- ----------- --------- --------- ----------------
100.00    0.361760                 55336     26738 total
The (outrageous number of) futex() calls are used to synchronise access to a global data structure (GIL), which is explained below. Next, we tweaked our program logic to run in a sequential flow, eliminating threads, and thus, ran it on a single core.
# thread.py (sequential version)
class my():
    def count(self, n):
        while n > 0:
            n -= 1

th1 = my()  # create the object before calling count() on it
th1.count(10000000)
th1.count(10000000)
vishal@tangerine:~$ strace -f -c python ./thread.py
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000040           0       238       163 open
  0.00    0.000000           0       116           read
  0.00    0.000000           0        75           close
[...]
------ ----------- ----------- --------- --------- ----------------
100.00    0.000040                   882       245 total
Now, let’s take a look at the time consumed by both versions of the program:
Two threads running on a dual-core Intel x86 machine:
vishal@tangerine:~$ time python ./thread.py
real    0m1.988s
user    0m1.856s
sys     0m0.384s
Sequential version of the same program:
vishal@tangerine:~$ time python ./thread.py
real    0m1.443s
user    0m1.436s
sys     0m0.004s
As is apparent, CPU-bound Python threads are very expensive in terms of:
- The number of system calls used
- The turn-around time of the application
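The same comparison can be made from inside Python rather than with the shell's time command. The sketch below is not the original test program: the function names `timed_sequential` and `timed_threaded` are hypothetical, and it assumes Python 3.3 or later for `time.perf_counter()`. The numbers it prints depend on the interpreter version and the machine, so no particular result is implied.

```python
import time
from threading import Thread

def count(n):
    # Pure-Python, CPU-bound loop (the same work as in the test program)
    while n > 0:
        n -= 1

def timed_sequential(n):
    # Run the loop twice, one after the other, in the calling thread
    start = time.perf_counter()
    count(n)
    count(n)
    return time.perf_counter() - start

def timed_threaded(n):
    # Run the loop twice, once in each of two threads
    threads = [Thread(target=count, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 10000000
    print("sequential: %.3f s" % timed_sequential(n))
    print("threaded:   %.3f s" % timed_threaded(n))
```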
Multi-threading is used to exploit redundant hardware, get better performance, and reduce turn-around time. For CPU-bound work, Python threads do not serve this purpose. What is the reason for such behaviour?
The cause of this inefficiency is the way Python provides interpreter access to a thread. Only one thread can be active in the Python interpreter at a time. Every thread in a Python program shares a global data structure called the Global Interpreter Lock (GIL).
The GIL is implemented with a mutex and a condition variable. It ensures that each running thread gets exclusive access to interpreter internals. A Python thread must acquire the GIL to become eligible to run.
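To make the idea of a lock built from a mutex and a condition variable concrete, here is a toy model written in Python itself. It is only an illustration of the technique and not CPython's actual implementation (which lives in the interpreter's C code); the `ToyGIL` class, the `worker` function and the `counter` variable are hypothetical names.

```python
import threading

class ToyGIL:
    """Toy model only: a flag guarded by a mutex, plus a condition variable."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._held = False

    def acquire(self):
        with self._cond:
            # Sleep until the current holder releases the lock
            while self._held:
                self._cond.wait()
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            # Wake one waiting thread so that it can try to acquire the lock
            self._cond.notify()

gil = ToyGIL()
counter = 0

def worker(iterations):
    global counter
    for _ in range(iterations):
        gil.acquire()
        try:
            counter += 1  # only the current holder touches the shared state
        finally:
            gil.release()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```

Every thread has to pass through `acquire()` before doing any work, which is the same constraint the GIL places on threads that want to run in the interpreter.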
A thread can be in any of the following states:
- Ready: Ready to run in the system scheduler.
- Blocked: Waiting for a resource.
- Running: Being run by the system scheduler.
- Terminated: The thread has exited normally.
A thread that’s in the “ready” state but does not have the GIL may get scheduled by the host scheduler, but will not proceed to the “running” state until it acquires the GIL. Since the GIL is a critical resource, the Python interpreter ensures that each thread releases and reacquires the GIL:
- After a pre-specified interval (counted in “ticks”).
- If the thread performs an I/O operation (read, write, send, etc.)
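The second rule can be observed with threads that spend their time blocked rather than computing. In the sketch below, `time.sleep()` stands in for a blocking I/O call (an assumption made for illustration; the text names read, write and send), and the helper names `blocking_task` and `run_overlapped` are hypothetical. As for the first rule, Python 3.2 and later express the interval as a time rather than in ticks, and report it through `sys.getswitchinterval()`.

```python
import sys
import time
from threading import Thread

def blocking_task(seconds):
    # time.sleep() stands in for a blocking I/O call; the thread gives up
    # the GIL while it waits, so other threads can run in the meantime
    time.sleep(seconds)

def run_overlapped(seconds, n_threads=2):
    threads = [Thread(target=blocking_task, args=(seconds,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_overlapped(0.2)
print("two 0.2 s waits took %.2f s in total" % elapsed)

# Python 3.2+ only: the interval after which a running thread is asked to release the GIL
print("switch interval: %s s" % sys.getswitchinterval())
```

Because the two waits overlap, the total is close to the duration of a single wait rather than the sum of both, in contrast to the CPU-bound test program.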
Ever wondered why you could not interrupt your multi-threaded Python program with Ctrl+C? Ctrl+C sends the SIGINT signal to a Linux process. In Python, only the “main” thread can handle this signal. As mentioned above, only a single thread can be active at a time in the Python interpreter. Also, the Python interpreter periodically asks the running thread to release the GIL, and gives another thread a chance to acquire it.
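The special role of the main thread is visible in the signal module: CPython refuses to install a signal handler from any other thread. The following minimal sketch shows this; the `try_install_handler` function and the `result` dictionary are hypothetical names used only for illustration.

```python
import signal
import threading

result = {}

def try_install_handler():
    try:
        # Installing a handler outside the main thread is rejected by CPython
        signal.signal(signal.SIGINT, signal.default_int_handler)
        result["error"] = None
    except ValueError as exc:
        result["error"] = exc

t = threading.Thread(target=try_install_handler)
t.start()
t.join()
print(type(result["error"]).__name__)  # ValueError
```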
Which thread acquires the GIL next is not moderated by the interpreter, and is effectively random. By switching threads, the Python interpreter makes sure that the “main” thread will eventually get a chance to acquire the GIL. The host scheduler will schedule the “main” thread and run the signal-handler code, so we have to wait till then. And then, what’s next?
Now, the main thread is often blocked on an uninterruptible thread-join or lock. Hence, it never gets a chance to process the signal handler. Thus, your Ctrl+C does not reach the main thread. The only solution is to kill the Python process.
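A commonly used way to avoid ending up in this situation is to have the main thread wait in short, timed steps instead of one indefinite join, so that it returns to the interpreter regularly and can run a pending signal handler. This is offered as a sketch under assumptions, not as something taken from the text above: it assumes Python 3, and the `wait_for` and `work` helpers are hypothetical names.

```python
import threading
import time

def wait_for(threads, poll=0.1):
    # Wake up every `poll` seconds so that the main thread gets regular
    # chances to run a pending signal handler (for example, for Ctrl+C)
    while any(t.is_alive() for t in threads):
        for t in threads:
            t.join(timeout=poll)

def work(seconds):
    time.sleep(seconds)  # placeholder for the real work of a thread

# Daemon threads do not keep the process alive once the main thread exits
workers = [threading.Thread(target=work, args=(0.2,), daemon=True) for _ in range(2)]
for w in workers:
    w.start()

try:
    wait_for(workers)
except KeyboardInterrupt:
    print("interrupted; daemon threads end when the main thread exits")
```

The polling interval is a trade-off: a shorter value makes the program respond to Ctrl+C sooner, at the cost of waking the main thread more often.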
Vikas Tomar is Senior Software Engineer at National Semiconductor, Bangalore-India. He likes driving in to the wild and spending time with nature.