Dynamic Program Analysis Using Valgrind: A Jump-start Guide


This article introduces Valgrind, a dynamic instrumentation framework to detect memory errors. The MemCheck tool, which comes as a part of the Valgrind framework, is used for this purpose. Throughout this article, the use of the term Valgrind implies the Valgrind MemCheck tool.

Memory errors, which often surface as segmentation faults, are very common when working with pointers in C/C++. Compilation errors are usually easy to identify and fix, but tracking down a segmentation fault is tedious without tool support. The GNU Project Debugger (GDB) and Valgrind are two open source tools that help with such errors. GDB is a debugger, while Valgrind is a memory checker. Unlike GDB, Valgrind will not let you step interactively through a program; instead, it checks for the use of uninitialised values and for reads or writes beyond the bounds of dynamically allocated memory, and it reports where the offending access occurred.

Segmentation faults
What is a segmentation fault, and how is it generated? A program uses the memory space allocated to it, which includes the stack, where local variables are stored, and the heap, from which memory is allocated at runtime. Dynamic allocation is requested with functions such as malloc in C or with the new operator in C++. A segmentation fault is raised when the program tries to access memory that it is not permitted to access. Now consider the following scenarios:

  • Not releasing acquired memory using delete/free.
  • Writing into an array with an index that’s out of bounds.
  • Trying to dereference a pointer that is not yet initialised.
  • Passing system call parameters with inadequate buffers for read/write; i.e., if your program makes a system call passing an invalid buffer (a buffer that cannot be addressed) for either reading or writing.
  • Attempting to write to read-only memory.
  • Trying to dereference a pointer whose memory has already been freed.

All these situations can give rise to memory errors, causing the program to terminate abruptly. This is particularly dangerous in safety- and mission-critical systems, where such abrupt program termination can have catastrophic consequences. Hence, it is necessary to detect and resolve such errors that can lead to segmentation faults. The Valgrind open source tool can be used to detect some of these errors by dynamically executing the program.

Valgrind reports an error for each of these scenarios that it observes while the program runs. To do so, it instruments the program and checks every memory access before the instruction that performs it is executed. The check relies on meta-data about the state of each memory location (for example, whether the location is addressable and whether it holds an initialised value), which is stored in what is called shadow memory. Maintaining and checking this meta-data for every instruction adds overhead, so a program runs considerably slower under Valgrind than it does natively.
Memory faults may not cause significant damage in small programs, but they can be extremely dangerous in safety-critical applications; for instance, a segmentation fault in a medical application may lead to loss of life. Hence, one must be extremely careful about memory errors.

Detecting memory errors using Valgrind
In this section, let us explore how to use Valgrind to detect memory errors in a program written in C/C++. Apart from the MemCheck tool, the Valgrind distribution also includes thread error detectors (Helgrind and DRD), a cache and branch-prediction profiler (Cachegrind), a call-graph generating cache and branch-prediction profiler (Callgrind), a heap profiler (Massif) and three experimental tools: a heap/stack/global array overrun detector (SGCheck), a second heap profiler that examines how heap blocks are used (DHAT), and a SimPoint basic block vector generator (BBV). Valgrind-3.8.1 was the latest stable version at the time of writing and is the version used in this article. The following platforms support Valgrind: X86/Linux, AMD64/Linux, ARM/Linux, PPC32/Linux, PPC64/Linux, S390X/Linux, MIPS/Linux, ARM/Android (2.3.x and later), X86/Android (4.0 and later), X86/Darwin and AMD64/Darwin.

Debugging with Valgrind using MemCheck
Unlike Java, languages like C or C++ do not have a garbage collector, which is an automatic memory manager for collecting memory occupied by unused objects in the program. Hence, there exists a significantly higher chance for memory faults to occur. One major issue with memory faults is that the error leads to a failure only during runtime. Thus, tools like Valgrind play a major role in detecting memory faults, without which debugging such errors becomes troublesome.
The following code demonstrates how to use Valgrind, starting with the first of the scenarios listed earlier:

#include<iostream>
#include<stdlib.h>
using namespace std;

int main()
{
    int *x;
    x = new int(20); // allocates a single int initialised to 20; never released with delete
    return 0;
}
Let us compile the above code using the following command:
$ g++ -g eg1.cpp -o eg1

To analyse this program using Valgrind, run the following command:

$ valgrind --tool=memcheck --leak-check=yes ./eg1

You will get the following output:

==5215== Memcheck, a memory error detector
==5215== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==5215== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==5215== Command: ./eg1
==5215==
==5215==
==5215== HEAP SUMMARY:
==5215== 	in use at exit: 4 bytes in 1 blocks
==5215==   total heap usage: 1 allocs, 0 frees, 4 bytes allocated
==5215==
==5215== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==5215==	at 0x402B87E: operator new(unsigned int) (vg_replace_malloc.c:292)
==5215==	by 0x8048528: main (eg1.cpp:7)
==5215==
==5215== LEAK SUMMARY:
==5215==	definitely lost: 4 bytes in 1 blocks
==5215==	indirectly lost: 0 bytes in 0 blocks
==5215==  	possibly lost: 0 bytes in 0 blocks
==5215==	still reachable: 0 bytes in 0 blocks
==5215==     	suppressed: 0 bytes in 0 blocks
==5215==
==5215== For counts of detected and suppressed errors, rerun with: -v
==5215== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The number 5215 is the process ID of the program. Additionally, the tool provides information about various other properties of the program, such as the heap summary, leak summary and error summary. The heap summary provides details regarding calls to malloc/new/free/delete. In the above output, it lists the number of allocations and frees; if the two counts differ, some allocated memory was never released. The leak summary gives the amount of memory leaked; in the above example, this is 4 bytes. The error summary gives the total number of errors.
In this example, the memory allocated with new is never released with delete. Thus, the 4 bytes are reported as ‘definitely lost’ in the leak summary, which indicates that your program is leaking memory.
Let us complicate our example code by adding the following, in order to consider Scenario 2:
x[20] = 1; // Invalid write: x points to a single int, so x[20] lies outside the allocated block

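Remember that assigning a value to x[20] is not a valid operation here. The expression new int(20) allocates a single int initialised to 20, not an array of 20 ints, so x[0] is the only valid element (an array of 20 elements would be requested with new int[20]). The address of x[20] lies outside the allocated block, so writing to it is an invalid write; the behaviour is undefined and may or may not end in a segmentation fault.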
On compiling the code and running it in Valgrind, you will get the following response:
==5235== Invalid write of size 4
==5235==	at 0x804853A: main (eg1.cpp:8)
==5235==  Address 0x4328078 is not stack'd, malloc'd or (recently) free'd

The output indicates there is an invalid write of size 4 happening at location 0x804853A, i.e., at Line No 8 in the main function of the program. In other words, it gives you the type of error and the stack trace, which gives you the location of the error. 
Let us now add the following lines to our example code, to consider Scenario 3.
if(x[1] == 0) // x[1] lies just past the 4-byte block allocated for x. Hence invalid read.
    cout<<"Hello";

There is an invalid read happening in the new line. Since x points to a single int, x[1] refers to the address immediately after the allocated block. Reading from that address is an invalid read, which is a memory fault. On compiling and running the program in Valgrind, you will get the following output:
==5244== Invalid read of size 4
==5244==	at 0x80485D7: main (eg1.cpp:9)
==5244==  Address 0x432802c is 0 bytes after a block of size 4 alloc'd
==5244==	at 0x402B87E: operator new(unsigned int) (vg_replace_malloc.c:292)
==5244==	by 0x80485B8: main (eg1.cpp:7)

Note that MemCheck does not report an error merely because memory holds uninitialised data; it reports one only when the program's behaviour depends on that data, for example when an uninitialised value decides a conditional branch.
You can explore the other scenarios in a similar way, if you are interested. Valgrind effectively finds unpaired calls to new/malloc and delete/free, invalid memory operations like read and write, and detects system calls with inadequate read-write parameters. 

Installing the Valgrind framework
Valgrind is available from the Ubuntu repositories. For other distributions, check the respective repositories, or build the latest version from the source available at http://valgrind.org/downloads/ by following the steps given below (the last step typically needs root privileges when installing to the default location):
$ bzip2 -d valgrind-XYZ.tar.bz2
$ tar -xf valgrind-XYZ.tar
$ cd valgrind-XYZ
$ ./configure
$ make
$ make install

Using Valgrind during software development can improve the quality of the software being developed. To get more information about Valgrind, please refer to the home page, http://valgrind.org/

