Thanks to all the readers who responded to the problems we discussed in last month’s column. We had given a small snippet of multi-threaded code that exhibits false sharing of data, and had asked readers to rewrite it to avoid the false sharing issue. Congratulations to Nilesh Govande, Arivendu Bhardwaj, Vivek Goel and Miranda Walters for getting the correct answer. Here is the original buggy code snippet from the takeaway problem:
void incrementTicketCounter()
{
    int thread_id = pthread_self();
    num_tickets_per_thread[thread_id]++;
}

void PrintTotalTickets()
{
    //only the main thread calls this function, so no locking needed
    for (int i = 0; i < N; i++)
        num_total_tickets += num_tickets_per_thread[i];
    printf("%d", num_total_tickets);
}
Assuming a cache line size of 64 bytes and 4-byte integers, 16 elements of the array num_tickets_per_thread fit in the same cache line. Recall that a processor updating the array element num_tickets_per_thread[thread_id] ends up invalidating the copies of that cache line held by the other processors. Unlike true sharing, where the same datum is accessed by multiple threads, here different threads access different data items; but because those items are co-located in the same cache line, the line keeps migrating across the different processor caches. This phenomenon is known as false sharing. It results in an increased number of cache misses and shows up as an increase in the execution time of the application.
As our readers correctly pointed out, in order to avoid false sharing, each element of the array num_tickets_per_thread needs to be placed on a separate cache line. The modified code given below does not exhibit any false sharing. We add a pad of 60 bytes (assuming a cache line size of 64 bytes) after each element of the array so that each element is located on a different cache line. Hence, updating one element of the array does not invalidate copies of any other element present in the caches of other processors.

#define CACHE_LINE_SIZE 64 //we assume a 64-byte cache line size

struct ticket {
    int number;
    char dummy_pad[CACHE_LINE_SIZE - sizeof(int)]; //pad each element out to a full cache line
};

struct ticket num_tickets_per_thread[N]; //N is the number of threads in your application

void incrementTicketCounter()
{
    int thread_id = pthread_self();
    num_tickets_per_thread[thread_id].number++;
}

void PrintTotalTickets()
{
    //only the main thread calls this function, so no locking needed
    for (int i = 0; i < N; i++)
        num_total_tickets += num_tickets_per_thread[i].number;
    printf("%d", num_total_tickets);
}
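As an aside (an alternative sketch, not part of the original fix), compilers such as GCC and Clang also let you request cache-line alignment for each element directly, which achieves the same one-element-per-cache-line placement without computing the pad size by hand. The thread count N used here is an assumed value for illustration only:

//Alternative sketch (GCC/Clang-specific): align each per-thread counter
//to a cache line boundary so that every array element occupies its own
//cache line; sizeof(struct ticket) is padded up to 64 bytes by the compiler.
#define CACHE_LINE_SIZE 64
#define N 16 //assumed thread count, for illustration only

struct ticket {
    int number;
} __attribute__((aligned(CACHE_LINE_SIZE)));

struct ticket num_tickets_per_thread[N];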
Issues to be aware of in lock-based programming
We’ll now look at some of the issues in multi-threaded programming that arise from the use of locks, and how they can be overcome. Locks are used to protect access to shared resources in a multi-threaded application. While locks are needed to ensure a consistent application state, they also bring certain problems of their own. We have already discussed a couple of these, namely deadlock and data races. In this column, we will look at another potential problem associated with locks, called priority inversion.
Consider an application that has two threads: a higher priority thread T1 and a lower priority thread T2. Both T1 and T2 need to update a shared global counter that counts the number of operations performed in the application, and this counter is protected by a lock, L1. Suppose T2 acquires L1 first and, while it still holds the lock, the higher priority thread T1 becomes ready to run. T1 now has to wait for the lower priority thread T2 to finish updating the counter and release L1 before it can make progress. Such a situation is known as priority inversion: even though T1 has a higher priority than T2, it ends up waiting for T2. In an application with a large number of threads, this situation can result in a higher priority thread waiting for a long time on lower priority threads because of access to common shared resources.
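To make the scenario concrete, here is a minimal sketch (not from the original column) of the structure in which such an inversion can occur: a high-priority thread T1 and a low-priority thread T2 both update a shared counter protected by the mutex L1. The SCHED_FIFO priority values and the helper function are illustrative assumptions; on Linux, setting real-time priorities usually requires root privileges, so the sketch falls back to default scheduling if that fails.

//Sketch: two threads of different priorities contending on the same mutex.
//If T2 holds L1 when T1 becomes runnable, T1 must wait for T2 -- the
//priority inversion described above. Compile with -pthread.
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
static long num_operations = 0; //shared global counter protected by L1

static void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&L1);   //T1 can block here while T2 holds L1
        num_operations++;
        pthread_mutex_unlock(&L1);
    }
    return NULL;
}

static void create_thread_with_priority(pthread_t *tid, int prio)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = prio };

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);
    if (pthread_create(tid, &attr, worker, NULL) != 0) {
        //SCHED_FIFO usually needs root/CAP_SYS_NICE; fall back to defaults
        pthread_create(tid, NULL, worker, NULL);
    }
    pthread_attr_destroy(&attr);
}

int main(void)
{
    pthread_t T1, T2;

    create_thread_with_priority(&T2, 10); //lower priority thread
    create_thread_with_priority(&T1, 20); //higher priority thread
    pthread_join(T1, NULL);
    pthread_join(T2, NULL);
    printf("%ld\n", num_operations);
    return 0;
}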
A famous case of priority inversion
In many cases, priority inversion happens without the programmer being aware of it and does not have a major impact on system performance. But there have been cases where priority inversion has caused poor responsiveness, especially in real-time systems. A famous case of priority inversion occurred in the software used in the Mars Pathfinder mission in July 1997. The Pathfinder mission took high-resolution colour pictures of the Martian surface and relayed them back to Earth. After the software was deployed on Mars, the programmers back on Earth were puzzled by frequent software resets, which led to poor system performance. On investigation, it was found that the issue was due to priority inversion.
In the Martian spacecraft, various devices communicated over a data bus. Activity on this bus was managed by a pair of high-priority bus manager tasks, BM1 and BM2. One of the tasks that the bus manager BM1 communicated with through a pipe was a low-priority meteorological science task, MS1. The communication pipe was protected by a mutex, L1; both BM1 and MS1 needed to acquire L1 before they could send data over the pipe.
The sequence of events leading to each reset began when the low-priority task MS1 was pre-empted by a couple of medium-priority tasks while it held the mutex L1. While MS1 was pre-empted, the high-priority bus manager BM1 tried to send more data to it over the same pipe. Because the mutex L1 was still held by MS1, BM1 was made to wait. Shortly thereafter, the other bus manager task, BM2, became active; it noticed that BM1 had not completed its work for that bus cycle and forced a system reset. VxWorks, the RTOS used on Pathfinder, supported priority inheritance on its mutexes, and by remotely enabling that option NASA engineers were able to resolve the issue. More details on the priority inversion issue encountered in the Pathfinder mission can be found in David Wilner’s keynote talk at the 1997 IEEE Real-Time Systems Symposium.
Dealing with priority inversion
One way of dealing with priority inversion is to use a technique known as priority inheritance. With priority inheritance enforced by the operating system, a lower priority task inherits the priority of any higher priority task that is waiting on a shared resource currently owned by the lower priority task. Consider a simple example of two tasks T1 and T2 with priorities P1 and P2, respectively, with P1 higher than P2. Hence T1 is the higher priority task compared to T2. Both access a shared resource protected by a mutex L1.
Assume that T2 currently holds the mutex L1. Now, if T1 becomes ready and needs to wait for mutex L1, then T2’s priority is boosted to P1. This is done so that T2 can finish quickly and release the mutex, allowing the higher priority task T1 to run. T2 thus temporarily runs at a higher priority. Since T2 inherits its priority from T1, this technique is known as priority inheritance. The priority change takes place as soon as the higher priority task begins waiting on the resource, and it ends when the lower priority task releases the resource.
Another technique to deal with priority inversion is known as priority ceiling. Priority ceiling associates a priority with each resource; the scheduler then transfers that priority to any task that accesses the resource. The priority assigned to the resource is the priority of its highest-priority user, plus one. Once a task finishes with the resource, its priority returns to normal. Both priority inheritance and priority ceiling require operating system support. Linux supports priority ceiling via the PTHREAD_PRIO_PROTECT protocol and priority inheritance via the PTHREAD_PRIO_INHERIT protocol. For more details on techniques to avoid priority inversion, read this paper.
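For illustration, here is a minimal sketch (not from the original column) of how these two protocols are selected when creating a pthread mutex. The wrapper function names are my own, but pthread_mutexattr_setprotocol() and pthread_mutexattr_setprioceiling() are standard POSIX calls; error handling is omitted for brevity.

//Sketch: creating mutexes that use the priority inheritance and
//priority ceiling protocols on Linux.
#include <pthread.h>

//A thread holding this mutex inherits the priority of its
//highest-priority waiter (priority inheritance).
void init_mutex_with_priority_inheritance(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}

//A thread holding this mutex runs at least at the given ceiling
//priority (priority ceiling).
void init_mutex_with_priority_ceiling(pthread_mutex_t *m, int ceiling)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
    pthread_mutexattr_setprioceiling(&attr, ceiling);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}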
Takeaway problem for this month
This month’s takeaway problem comes from Nilesh Govande. Thank you Nilesh, for sending in the question. Given a two-dimensional N×N array of positive and negative integers, find the sub-rectangle with the largest sum. The sum of a rectangle is the sum of all the elements in that rectangle. In this problem, the sub-rectangle with the largest sum is referred to as the maximal sub-rectangle. A sub-rectangle is any contiguous sub-array of size 1×1 or greater, located within the whole array. As an example, the maximal sub-rectangle of the array:
 0 -2 -7  0
 9  2 -6  2
-4  1 -4  1
-1  8  0 -2
…is in the lower-left-hand corner:
 9  2
-4  1
-1  8
…and has a sum of 15.
If you have any favourite programming puzzles that you would like to discuss on this forum, please send them to me. Feel free to send your solutions and feedback to sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming!