Threads are the core element of a multi-tasking programming environment. By definition, a thread is an execution context in a process; hence, every process has at least one thread. Multi-threading means that a process has multiple execution contexts, which run concurrently (and truly in parallel on multi-processor systems) and are often synchronised with one another.
Threads have their own identity (thread ID), and can function independently. They share the address space of their process, so they can communicate without any IPC (Inter-Process Communication) channel such as shared memory or pipes. Threads of a process can communicate directly with each other; for example, independent threads can read and update a global variable. This avoids the IPC overhead that the kernel would otherwise incur. As threads are in the same address space, a context switch between threads of the same process is inexpensive and fast, because the address space does not have to be switched.
A thread can be scheduled independently; hence, multi-threaded applications are well-suited to exploit parallelism in a multi-processor environment. Threads are also quick to create and destroy. Unlike fork(), creating a thread does not make a new copy of the parent process: the thread uses the same address space and shares the process's resources, including file descriptors and signal handlers.
A well-designed multi-threaded application uses resources efficiently. In such an application, threads are given different categories of work, so that the system is used optimally: one thread may be reading a file from the disk while another writes it to a socket. Both work in tandem, yet independently. This improves system utilisation and hence throughput.
A few concerns
The most prominent concern with threads is synchronisation, especially when they use a shared resource. A critical section is a piece of code that accesses a shared resource and must not be executed concurrently by more than one thread. Since each thread executes independently, access to a shared resource is not naturally moderated; it has to be controlled with synchronisation primitives, including mutexes (mutual exclusion locks), semaphores, read/write locks and so on.
These primitives allow programmers to control access to a shared resource. In addition, like processes, threads can suffer from deadlock or starvation if they are not designed carefully. Debugging and analysing a threaded application can also be cumbersome.
How does Linux implement threads?
Linux supports the development and execution of multi-threaded applications. User-level threads in Linux follow the open POSIX (Portable Operating System Interface for uniX) standard, designated as IEEE 1003; the thread interfaces were standardised in IEEE Std 1003.1c-1995. The user-level C library (on Ubuntu, the GNU C Library, glibc) implements the POSIX API for threads.
Threads exist in two separate execution spaces in Linux: in user space and in the kernel. User-space threads are created with the POSIX-compliant pthread library API. Each user-space thread is mapped to an execution context scheduled by the kernel; in Linux, these are regarded as "light-weight processes" (LWPs). An LWP is the basic unit of execution that the kernel schedules. Unlike other UNIX variants, including HP-UX and SunOS, Linux gives threads no special treatment: a process and a thread are both treated as a "task", and both are represented by the same structure, struct task_struct, kept in the kernel's list of tasks.
For a set of user threads created in a user process, there is a set of corresponding LWPs in the kernel. The following example illustrates this point:
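#include <stdio.h>
#include <unistd.h>       // syscall()
#include <sys/syscall.h>  // SYS_gettid
#include <pthread.h>

int main(void)
{
    pthread_t tid = pthread_self();
    long sid = syscall(SYS_gettid);   // kernel ID of this LWP

    printf("LWP id is %ld\n", sid);
    // pthread_t is opaque; on glibc it is an unsigned long
    printf("POSIX thread id is %lu\n", (unsigned long)tid);
    return 0;
}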
The ps command can also list processes along with their LWP (thread) information:
kanaujia@ubuntu:~/Desktop$ ps -fL
UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
kanaujia 17281  5191 17281  0    1 Jun11 pts/2    00:00:02 bash
kanaujia 22838 17281 22838  0    1 08:47 pts/2    00:00:00 ps -fL
kanaujia 17647 14111 17647  0    2 00:06 pts/0    00:00:00 vi clone.s
What is a Light-Weight Process?
An LWP is a process created to back a user-space thread. Each user thread has a one-to-one (1:1) mapping to an LWP. LWPs are created differently from an ordinary process: for a user process "P", all its LWPs share the same thread group ID (TGID), which is what getpid() returns in each of them. Grouping them allows the kernel to share resources among them, including the address space (virtual memory and its physical pages), signal handlers and open files. This also makes switching between these processes cheaper, because the kernel does not have to switch address spaces. This extensive resource sharing is why they are called light-weight processes.
How does Linux create LWPs?
Linux creates LWPs with the non-standard (Linux-specific) clone() system call. It is similar to fork(), but more general: clone() lets programmers choose which resources the child shares with its parent. In fact, fork() can be viewed as a special case of clone() in which the child receives copies of the parent's resources instead of sharing them. With the appropriate flags, clone() creates a child process that shares its execution context with the parent, including the memory, file descriptors and signal handlers. The pthread library also uses clone() to implement threads; refer to ./nptl/sysdeps/pthread/createthread.c in the glibc 2.11.2 sources.
Create your own LWP
I will demonstrate a sample use of the clone() call. Have a look at the code in demo.c below:
#define _GNU_SOURCE   // needed for clone() and the CLONE_* flags
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// 64 KB stack
#define STACK (1024 * 64)

// The child thread will execute this function
int threadFunction(void *argument)
{
    printf("child thread entering\n");
    close((int)(intptr_t)argument);   // the argument carries the file descriptor
    printf("child thread exiting\n");
    return 0;
}

int main(void)
{
    void *stack;
    pid_t pid;
    int fd;

    fd = open("/dev/null", O_RDWR);
    if (fd < 0) {
        perror("/dev/null");
        exit(1);
    }

    // Allocate the stack
    stack = malloc(STACK);
    if (stack == NULL) {
        perror("malloc: could not allocate stack");
        exit(1);
    }

    printf("Creating child thread\n");

    // Call the clone system call to create the child thread
    pid = clone(&threadFunction, (char *)stack + STACK,
                SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM,
                (void *)(intptr_t)fd);
    if (pid == -1) {
        perror("clone");
        exit(2);
    }

    // Wait for the child thread to exit
    pid = waitpid(pid, NULL, 0);
    if (pid == -1) {
        perror("waitpid");
        exit(3);
    }

    // Attempt to write to file should fail, since our thread has
    // closed the file.
    if (write(fd, "c", 1) < 0) {
        printf("Parent:\t child closed our file descriptor\n");
    }

    // Free the stack
    free(stack);
    return 0;
}
The program in demo.c creates a thread in a way that is fundamentally similar to what the pthread library does. However, calling clone() directly is discouraged in application code: it is easy to misuse (for example, by giving the child a stack that is too small), and such mistakes can crash the application. The syntax for calling clone() in a Linux program is as follows:
#define _GNU_SOURCE
#include <sched.h>

int clone(int (*fn)(void *), void *child_stack, int flags, void *arg,
          ... /* pid_t *parent_tid, void *tls, pid_t *child_tid */);
The first argument, fn, is the thread function; the new thread starts by executing it. When clone() completes successfully, fn(arg) runs concurrently with the calling process. When fn returns, the child terminates, and the integer that fn returns becomes the child's exit status.
The next argument, child_stack, points to the memory the child will use as its stack. Unlike fork(), clone() requires the programmer to allocate and set up the child's stack, because the parent and child share memory pages (when CLONE_VM is used), and that includes the parent's stack. The child runs a different function from the parent, so it needs a separate stack. In our program, we allocate this memory chunk on the heap with the malloc() routine, and the stack size has been set to 64 KB. Since the stack on the x86 architecture grows downwards, the child must start at the far (high) end of the allocated block. Hence, we pass the following address to clone():
(char*) stack + STACK
The next argument, flags, is the most critical. It lets you choose the resources you want to share with the newly created process. We have chosen SIGCHLD | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_VM, which are explained below:
- SIGCHLD: The low byte of flags specifies the signal sent to the parent when the child terminates. With SIGCHLD, the thread signals the parent on completion, which lets the parent wait() (or waitpid()) for it like an ordinary child.
- CLONE_FS: Shares the parent's filesystem information with the thread. This includes the root of the filesystem, the current working directory and the umask; a chdir(), chroot() or umask() call made by either one also affects the other.
- CLONE_FILES: The calling process and the child share the same file descriptor table. A descriptor opened or closed by either one is opened or closed for the other too.
- CLONE_SIGHAND: The parent and the child share the same table of signal handlers. If either one changes a signal action (for example, with sigaction()), the change applies to both. Since Linux 2.6.0, CLONE_SIGHAND also requires CLONE_VM.
- CLONE_VM: The parent and the child run in the same memory space. Memory writes and mappings (mmap(), munmap()) performed by either one are visible to the other.
The last parameter, arg, is the argument passed to the thread function (threadFunction); in our case, it is a file descriptor.
Now look again at the sample LWP implementation in demo.c, presented earlier.
The thread closes the file (/dev/null) opened by the parent. As the parent and this thread share the file descriptor table, the close operation also takes effect in the parent, so a subsequent write() in the parent fails (with EBADF). The parent waits with waitpid() until the thread terminates (at which point the parent is sent SIGCHLD). Then it frees the stack memory and returns.
Compile and run the code as usual; the output should be similar to what is shown below:

$ gcc demo.c
$ ./a.out
Creating child thread
child thread entering
child thread exiting
Parent: child closed our file descriptor
$
Linux provides an efficient, simple and scalable infrastructure for threads, and encourages programmers to experiment with and develop thread libraries using clone() as the core component.
Please share your suggestions/feedback in the comments section below.
References and suggested reading
- Wikipedia article on clone()
- clone() man page
- Using the clone() System Call by Joey Bernard
- Implementing a Thread Library on Linux
- IEEE Standards Interpretations for IEEE Std 1003.1c-1995
- Sources of pthread implementation in glibc.so
- The Fibers of Threads by Benjamin Chelf