Getting Started with SystemTap

If programming is an art, then debugging is even more so. To be a good programmer, one must master debugging. There are several good tools for kernel debugging, such as gdb, kgdb and kprobes, but none is as dynamic as SystemTap, a probing and tracing tool that lets you analyse a Linux kernel’s activity in depth, at runtime.

SystemTap can probe system calls and kernel functions at runtime, and can examine variables in functions. There is no need to change the kernel source code to insert instrumentation, and then recompile and install it. Simple scripts can be developed to probe the kernel at runtime.

Installation

SystemTap can be installed directly via the Synaptic package manager for Ubuntu/Debian, or Yum for CentOS, Fedora and Red Hat. However, I always like to download the source code and compile it, with the following steps.

  1. Download the source code from its FTP server. I fetched the latest release at the time, systemtap-1.6.tar.gz.
  2. Extract the tarball with tar -zxvf systemtap-1.6.tar.gz, and change to the extracted directory (cd systemtap-1.6).
  3. Run ./configure. If this fails at any point (an error for a missing package) then you have to install that package (usually with yum/apt-get).
  4. Once configure is done, compile it with make. It will take only a couple of minutes; after it’s done, look for the output binary, stap.
  5. Now run make install to install SystemTap. You could also add your SystemTap directory to your PATH variable, to directly run stap without installing.

Let’s test SystemTap

This is a sample SystemTap script, which I will describe later on. First run it (as the root user, else you will get the error “Warning: /usr/local/bin/staprun is not executable (Permission denied) …”).

# cat hello.stp
probe begin
{
    printf ("hello world\n");
    exit ();
}
 
# stap hello.stp
hello world

Now I will cover some script syntax and background details. Fundamentally, SystemTap is built around events and handlers, the same model that frameworks such as Qt and .NET use to react to user input and other occurrences. When an event occurs, its corresponding handler is executed. In SystemTap, when a specified event occurs, the Linux kernel runs the handler, and then resumes normal execution.

Events are of two types, synchronous and asynchronous. Typical synchronous events are system call execution, entering and exiting a function, functions in a kernel file, etc. Asynchronous events are timers, jiffies, etc.

Handlers are written in SystemTap’s script language. The format of a SystemTap script is as follows:

probe event { statements}

A single SystemTap script can contain multiple probes. The statements to be executed for each event are enclosed in {} braces.

When a SystemTap script is executed, it is translated into C, and the C compiler builds that code into a loadable kernel module (.ko). When the module is inserted, all the probes are registered with the kernel, so that when an event occurs, the corresponding handler runs. When exit() is executed, the module is unloaded. To confirm this, you can omit exit() from the sample script and run stap hello.stp.

Next, run lsmod in another terminal to list the loaded modules. Near the top of the list you will find the SystemTap module, whose name begins with stap_. To stop the script, press Ctrl+C, which also unloads the module.
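
As a small sketch of this lifecycle, the script below combines three probes in one file. The file name lifecycle.stp and the 3-second delay are my own choices for illustration; begin, timer.s and end are the same probe points used elsewhere in this article.

```systemtap
// lifecycle.stp (hypothetical file name): three probes in one script
probe begin
{
    // runs once, right after the module has been loaded
    printf ("probing started\n")
}
probe timer.s(3)
{
    // fires after 3 seconds and asks SystemTap to shut the script down
    exit ()
}
probe end
{
    // runs once during shutdown, before the module is unloaded
    printf ("probing finished\n")
}
```

While the script is waiting on the timer, lsmod in another terminal should show the loaded module; after the end handler has run, the entry should be gone.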

Let us now try different events that we can use in SystemTap.

Probing a system call

# cat exec.stp
probe syscall.execve
{
    printf ("%s(%d) execve (%s)\n", execname(), pid(), argstr)
}
probe syscall.exit
{
    printf ("%s(%d) exit (%s)\n", execname(), pid(), argstr)
}

The syntax for system-call probing is probe syscall.syscall-name. In the above script, we probed the execve system call, which is used to start a new process, and exit, which is used to exit from a process.

Instead of a simple print statement, we have used arguments with printf. The function execname() returns the name of the current process and pid() its process ID, while argstr is a variable (not a function) that holds the system call’s arguments as a string. Run the script with stap, as follows:

# stap exec.stp

After this, I started gnome-system-monitor and closed it, causing the following output:

gnome-panel(12515) exit (0)
gnome-panel(12516) execve (/usr/lib/qt-3.3/bin/gnome-system-monitor )
gnome-panel(12516) execve (/usr/local/bin/gnome-system-monitor )
gnome-panel(12516) execve (/usr/bin/gnome-system-monitor )
gnome-system-mo(12516) exit (0)

Probing a kernel function

The most important thing SystemTap can do is probe a kernel function by name. From syscall probing, we get only limited details; there are thousands of other kernel functions too, that we may need to probe — like network stack functions. The syntax for a kernel function probe is given below:

probe kernel.function("function-name") {}

I have chosen to probe ‘ip_rcv()’ as it is called very frequently, whenever an IP packet is received:

# cat ip_rcv.stp
probe kernel.function("ip_rcv")
{
    printf ("packet rcvd %s\n",$$parms);
}
probe timer.ms(4000)
{
    exit()
}

In the kernel function probe handler, we have used $$parms, which expands to a string containing the function’s parameters and their values. Other options are $$locals (local variables) and $$vars (parameters and local variables together). The second handler in this script is a timer probe, whose syntax is as follows:

probe timer.ms(milliseconds)

The timer handler runs once 4,000 milliseconds have elapsed, so the script calls exit() after 4 seconds. Now run the script:

# stap ip_rcv.stp
packet rcvd skb=0xf50e9f00 dev=0xf412d000 pt=0xc0a4e760 orig_dev=0xf412d000
packet rcvd skb=0xf272ae00 dev=0xf412d000 pt=0xc0a4e760 orig_dev=0xf412d000
packet rcvd skb=0xf6cefa80 dev=0xf412d000 pt=0xc0a4e760 orig_dev=0xf412d000
packet rcvd skb=0xf436a240 dev=0xf412d000 pt=0xc0a4e760 orig_dev=0xf412d000
.......
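
The other context variables, $$locals and $$vars, can be tried in the same way. The following is only a sketch: the file name is hypothetical, and which local variables are visible (and whether their values can be read at function entry) depends on your kernel build and its debug information.

```systemtap
// ip_rcv_vars.stp (hypothetical file name)
probe kernel.function("ip_rcv")
{
    // $$parms: the function's parameters; $$locals: its local variables
    printf ("parms:  %s\n", $$parms)
    printf ("locals: %s\n", $$locals)
    // stop after the first packet, so the output stays short
    exit ()
}
```

Values that cannot be read at the probe point are typically printed as a question mark rather than causing an error.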

Using jiffies

If you are a kernel programmer, you will be familiar with jiffies, a global kernel variable representing the number of ticks since the machine has booted. This can be used instead of milliseconds, in the timer. The syntax is given below:

probe timer.jiffies(jiffies){}

Here’s a test script and its execution:

# cat variable.stp
global counter
probe timer.jiffies(100) {
    printf("count = %d\n", counter++);
}
# stap variable.stp
count = 0
count = 1
count = 2
count = 3

The handler will be called every 100 ticks. As in C, we can declare global variables in the script, as we have done with ‘counter’, which is used in the handler.
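
The script above keeps running until you press Ctrl+C. A minimal variation, assuming you want it to stop on its own, adds a second timer probe; the 5-second limit and the file name are arbitrary choices of mine. How many lines get printed depends on the kernel’s tick rate, so no output is shown here.

```systemtap
// variable_stop.stp (hypothetical file name)
global counter
probe timer.jiffies(100)
{
    // runs every 100 kernel ticks
    printf ("count = %d\n", counter++)
}
probe timer.s(5)
{
    // end the script after roughly 5 seconds
    exit ()
}
```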

Probing kernel functions in a C file

Sometimes we may need to probe all the functions present in a kernel C file. For example, we may want to know which functions are called in the IP input layer. The syntax is probe kernel.function("function-name@filename") and the test script is as follows:

# cat ip_input.stp
probe kernel.function("*@net/ipv4/ip_input.c")
{
    printf ("ip_input-> time=%u function = %s\n", gettimeofday_s(), probefunc());
}
probe timer.ms(10000)
{
    exit()
}

Here we have used the file net/ipv4/ip_input.c and * instead of a function name, so the handler is invoked whenever any function in this file is called. Now run the script:

# stap ip_input.stp
ip_input-> time=1320311892 function = ip_rcv
ip_input-> time=1320311892 function = ip_rcv_finish
ip_input-> time=1320311892 function = ip_local_deliver
ip_input-> time=1320311892 function = ip_local_deliver_finish
ip_input-> time=1320311892 function = ip_rcv
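
To see how these functions nest, the same wildcard can be probed at both entry and return. This is a sketch rather than part of the original test: the file name is hypothetical, .call and .return are probe-point suffixes, and thread_indent() is a helper from SystemTap’s standard tapset library that returns a prefix whose indentation grows or shrinks by the given amount. Newer SystemTap releases also offer ppfunc() as an alternative to probefunc().

```systemtap
// ip_input_trace.stp (hypothetical file name)
probe kernel.function("*@net/ipv4/ip_input.c").call
{
    // indent one level deeper on every function entry
    printf ("%s -> %s\n", thread_indent (1), probefunc ())
}
probe kernel.function("*@net/ipv4/ip_input.c").return
{
    // step back one level on every function return
    printf ("%s <- %s\n", thread_indent (-1), probefunc ())
}
probe timer.s(5)
{
    exit ()
}
```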

Probing function return value

To probe the return value of a function, $$return is used. It expands to a string of the form return=value, and it is available only in a return probe. The syntax is given below:

probe kernel.function("function-name").return {}
probe syscall.syscall-name.return {}

Test script:

# cat return.stp
probe syscall.mkdir.return
{
    printf ("mkdir() %s\n",$$return);
}

We have chosen the mkdir() system call (called whenever a directory is created). Execute the script, and create a directory (run mkdir test_dir in another terminal):

# stap return.stp
...
mkdir() return=0x0

The mkdir system call returns 0 on success, which has been printed by the script.
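
For system calls, the syscall tapset also provides a retstr variable in return probes, holding the return value as a string. The sketch below assumes your SystemTap version defines it for syscall.mkdir.return; the file name is hypothetical.

```systemtap
// return_retstr.stp (hypothetical file name)
probe syscall.mkdir.return
{
    // retstr is set by the syscall tapset, not by the kernel function itself
    printf ("%s(%d) mkdir -> %s\n", execname (), pid (), retstr)
}
```

Combined with execname() and pid(), this shows which process made the call as well as what it returned.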

Conditional statements

In SystemTap scripts, as in other programming languages, we can use if and if-else statements to branch on conditions. The syntax is simple:

if(condition)
    statement1
else
    statement2

Example:

# cat ifelse.stp
global counter
probe kernel.function("*@net/ipv4/ip_input.c")
{
    if (probefunc() == "ip_rcv")
        counter++;
}
probe timer.s(5)
{
    exit();
}
probe end
{
    printf ("ip_rcv() has been called %d times\n", counter);
}

In this script, we declared a global variable and incremented it every time ip_rcv() was called. After 5 seconds, the timer handler is called and executes exit(), which ends the script. On the way out, the end handler prints the counter:

# stap ifelse.stp
ip_rcv() has been called 12 times

This type of script can also be used by application developers who want to monitor kernel work when an application runs in user space.
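
The example above uses only the if branch. A sketch that also exercises else could count calls to every other function in the same file; the second counter, other, and the file name are introduced here purely for illustration.

```systemtap
// ifelse2.stp (hypothetical file name)
global rcv, other
probe kernel.function("*@net/ipv4/ip_input.c")
{
    if (probefunc () == "ip_rcv")
        rcv++
    else
        other++
}
probe timer.s(5)
{
    exit ()
}
probe end
{
    printf ("ip_rcv: %d, other ip_input.c functions: %d\n", rcv, other)
}
```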

Creating functions

Code can also be reused in SystemTap by creating functions, which make scripts easier to maintain and read. The syntax to create a function is function function_name(arguments) {statements}, and to use a function within a handler, simply call function_name(arguments):

# cat func.stp
function myfunc(){
    printf("my function\n");
}
probe begin
{
    myfunc();
    printf ("hello world\n");
}

Here our simple function myfunc() is called from the begin handler. The output is as expected (the script has no exit(), so press Ctrl+C to stop it):

# stap func.stp
my function
hello world
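
Functions can also take arguments and return values. The sketch below moves the ip_rcv comparison from the conditional example into a helper; the name is_ip_rcv and the file name are hypothetical, and SystemTap infers the argument and return types from how they are used.

```systemtap
// func_args.stp (hypothetical file name)
global counter

// returns 1 if the given function name is ip_rcv, otherwise 0
function is_ip_rcv (name)
{
    return name == "ip_rcv"
}
probe kernel.function("*@net/ipv4/ip_input.c")
{
    if (is_ip_rcv (probefunc ()))
        counter++
}
probe timer.s(5)
{
    exit ()
}
probe end
{
    printf ("ip_rcv() has been called %d times\n", counter)
}
```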

There are several other options in SystemTap which you can explore. For more information, refer to its official documentation.