CRASH Your System (and Debug a Kernel Panic)

This article is aimed at Linux kernel developers and anyone else who would like to debug a kernel panic with the Crash utility. It assumes that readers understand the basics of the Linux kernel; some exposure to a debugger such as GDB is also essential.

Crash is a tool for analysing the core dump file created by a tool such as kdump. Crash depends on the kdump/kexec utilities to obtain its input file. A standard Linux kernel booted with the crashkernel argument reserves a small amount of memory for a standby dump-capture kernel.

When the kernel panics, it uses kexec to perform a warm reboot into the dump-capture kernel, which was loaded into the reserved memory beforehand. A warm reboot does not erase the contents of memory, so the memory of the panicked kernel is still accessible after the reboot. Once the memory contents are dumped to a preconfigured location, the system reboots into the standard kernel. The dump can later be used to analyse the panic.
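On a running system you can check whether a dump-capture kernel has actually been loaded. Kernels with kexec support expose the flag file /sys/kernel/kexec_crash_loaded, which reads 1 when a panic kernel is loaded. The following bash function is a minimal sketch; its name, crash_kernel_loaded, is hypothetical, and the file path is a parameter only so that it can be tested against an ordinary file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (status 0) if a dump-capture kernel is loaded.
# Returns 1 if none is loaded, 2 if the flag file is missing (for example,
# on a kernel built without kexec support).
crash_kernel_loaded() {
    local flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [[ -r "$flag_file" ]] || return 2
    # The file contains "1" when a panic kernel is loaded, "0" otherwise.
    [[ "$(<"$flag_file")" == "1" ]]
}
```

For example, crash_kernel_loaded && echo ready prints ready only when a dump-capture kernel is in place. If it fails with status 1, nothing has loaded one yet (normally the distribution's kdump service does this with kexec -p).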

Installing and configuring Crash

To install the Crash tool, you can either install a distribution-specific RPM/deb package, or compile it from source with the following steps (as root):

wget -c http://people.redhat.com/anderson/crash-5.1.1.tar.gz   # the current version at the time of writing
tar -zxvf crash-5.1.1.tar.gz
cd crash-5.1.1
make && make install

Apart from this, you need to prepare the target machine for dump capture. Make sure that the kernel running on this machine is compiled with the options CONFIG_KEXEC, CONFIG_DEBUG_INFO, CONFIG_CRASH_DUMP and CONFIG_PROC_VMCORE. You also need to install the kexec-tools package, which most distributions provide.
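A quick way to confirm these options is to search the kernel's .config file. The bash sketch below assumes the standard Kconfig format (CONFIG_FOO=y); the function name is hypothetical, and the config path is whatever you pass in, commonly /boot/config-$(uname -r) on distributions that ship one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which kdump-related options are not set to =y.
# Returns 0 if all are enabled, 1 if any is missing, 2 if the file is unreadable.
check_kdump_config() {
    local config="$1" missing=0 opt
    [[ -r "$config" ]] || { echo "cannot read: $config" >&2; return 2; }
    for opt in CONFIG_KEXEC CONFIG_DEBUG_INFO CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE; do
        # "# CONFIG_FOO is not set" and "CONFIG_FOO=m" do not match this pattern.
        if ! grep -q "^${opt}=y\$" "$config"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it as check_kdump_config /boot/config-$(uname -r); every line it prints names an option that still needs to be enabled before rebuilding the kernel.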

Once you compile and install the kexec-tools package, you are provided with the kdump, kexec, makedumpfile and makedumprd binaries, which are used during the various phases of panic handling and dump capture. For the machine to be able to boot into the dump kernel, a crashkernel argument must be appended to the kernel line in the bootloader configuration. On my Ubuntu system, the kernel line looks like this:

linux /boot/vmlinuz-2.6.35-24-generic crashkernel=384M-2G:64M,2G-:128M

Here, crashkernel is the required keyword, and its value is a comma-separated list of RAM ranges, each followed by the amount of memory to reserve. 384M-2G:64M means that if installed RAM is between 384 MB and 2 GB, 64 MB is reserved; 2G-:128M means that if RAM is 2 GB or more, 128 MB is reserved. If RAM is less than 384 MB, no memory is reserved. Adjust these values to suit your system's configuration.
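The range rules can be made concrete with a small bash sketch. crashkernel_reservation (a hypothetical name) prints the size that a range-style crashkernel value selects for a given amount of RAM, following the kernel's documented convention that a range's start is inclusive and its end exclusive. It handles only M and G suffixes and ignores any trailing @offset. After rebooting, the reservation actually made appears in /proc/iomem as a "Crash kernel" entry, and crash_region_mb (also hypothetical) converts that entry into megabytes; on recent kernels, /proc/iomem shows real addresses only to root.

```shell
#!/usr/bin/env bash
# Convert a size such as 384M or 2G to megabytes (only M and G are handled).
size_to_mb() {
    local s="$1"
    case "$s" in
        *G) echo "$(( ${s%G} * 1024 ))" ;;
        *M) echo "${s%M}" ;;
        *)  return 1 ;;
    esac
}

# Print the reservation selected by a range-style crashkernel value for
# RAM_MB megabytes of RAM; print nothing if no range matches.
crashkernel_reservation() {
    local spec="${1%@*}" ram_mb="$2"   # drop an optional @offset suffix
    local -a entries
    local entry range size start end start_mb end_mb
    IFS=',' read -r -a entries <<< "$spec"
    for entry in "${entries[@]}"; do
        range="${entry%%:*}"           # e.g. 384M-2G
        size="${entry#*:}"             # e.g. 64M
        start="${range%%-*}"           # e.g. 384M
        end="${range#*-}"              # e.g. 2G, or empty for "2G-"
        start_mb=$(size_to_mb "$start") || return 1
        (( ram_mb >= start_mb )) || continue      # start is inclusive
        if [[ -n "$end" ]]; then
            end_mb=$(size_to_mb "$end") || return 1
            (( ram_mb < end_mb )) || continue     # end is exclusive
        fi
        echo "$size"
        return 0
    done
    return 0                           # no range matched: print nothing
}

# Print the size in MB of the "Crash kernel" region listed in an iomem file.
crash_region_mb() {
    local iomem="${1:-/proc/iomem}" range start end
    range=$(awk -F' : ' '/Crash kernel/ { gsub(/ /, "", $1); print $1; exit }' "$iomem")
    [[ -n "$range" ]] || return 1
    start=$(( 16#${range%-*} ))
    end=$(( 16#${range#*-} ))
    echo "$(( (end - start + 1) / 1024 / 1024 ))"
}
```

For instance, crashkernel_reservation '384M-2G:64M,2G-:128M' 2048 prints 128M, because exactly 2 GB falls in the open-ended second range. Likewise, a region listed as 01000000-08ffffff is 128 MB starting at the 16 MB mark.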

On some Fedora and Red Hat-based distributions, you will see syntax like crashkernel=128M@16M, which means reserve 128 MB of memory starting at the 16 MB physical address. After appending the argument to the bootloader's kernel line and saving the configuration, reboot the system; it is then ready to capture a dump when a panic occurs. Once a panic happens, the following files are fed to the Crash utility to perform dump analysis:

  • Kernel (namelist): This is the uncompressed kernel binary with debug information (vmlinux), not the compressed vmlinuz file in the /boot directory. If you built the kernel yourself, vmlinux is in the top-level kernel build directory; if you are running a stock kernel, you need to obtain vmlinux from your vendor, usually as a debug-info package.
  • Dump Image (dumpfile): This is the vmcore file, or /dev/mem when analysing a live system.
  • Map file: This is typically the System.map file, which is found in the kernel build directory after compilation. This file is passed to the Crash tool with the -S parameter.
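Before starting Crash, it can save time to check these inputs, in particular that the kernel file is the uncompressed ELF vmlinux rather than a compressed vmlinuz (bzImage), which is not an ELF file. The bash sketch below uses hypothetical function names; it checks the four-byte ELF magic number and then invokes crash in the same form as the example invocation in this article.

```shell
#!/usr/bin/env bash
# Succeed if the file starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
is_elf() {
    [[ "$(head -c 4 "$1" 2>/dev/null | od -An -c | tr -d ' ')" == '177ELF' ]]
}

# Hypothetical wrapper: check the map file, kernel and dump, then start Crash.
run_crash() {
    local map="$1" vmlinux="$2" vmcore="$3" f
    for f in "$map" "$vmlinux" "$vmcore"; do
        [[ -r "$f" ]] || { echo "cannot read: $f" >&2; return 1; }
    done
    if ! is_elf "$vmlinux"; then
        echo "$vmlinux is not an ELF file; use vmlinux, not vmlinuz" >&2
        return 1
    fi
    crash -S "$map" "$vmlinux" "$vmcore"
}
```

The ELF check catches the most common mistake, passing the vmlinuz from /boot; it does not verify that vmlinux contains debug information or matches the dumped kernel.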

Once the above files are obtained from the panicked system, we are ready to perform dump analysis.

Exploring Crash with a sample dump

Let’s trigger a crash and use the resulting dump to explore the Crash utility. As root, run the following command:

echo c > /proc/sysrq-trigger

This triggers a panic. The system boots into the dump-capture kernel, which saves the system memory as a file named vmcore in the directory /var/crash/<date-time>/, and then boots back into the normal kernel.
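Because each crash gets its own time-stamped directory, it is handy to locate the newest dump automatically. The hypothetical bash function below assumes the /var/crash/<date-time>/vmcore layout described above; other distributions may use a different directory or file name, so treat the default path as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified <dir>/*/vmcore.
# Returns 1 if no vmcore file is found.
latest_vmcore() {
    local dir="${1:-/var/crash}" newest="" f
    for f in "$dir"/*/vmcore; do
        [[ -f "$f" ]] || continue      # also skips the unexpanded glob
        if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
            newest="$f"
        fi
    done
    [[ -n "$newest" ]] || return 1
    echo "$newest"
}
```

Comparing modification times with -nt avoids depending on how the directory names are formatted.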

Now invoke the Crash tool with the System.map, vmlinux and vmcore files. Here is sample output:

[root@DELL-RnD-India linux-2.6]# crash -S System.map vmlinux /var/crash/2011-01-10-12\:23/vmcore

crash 5.1.1 
---snip---
crash: overriding /boot/System.map with System.map 
GNU gdb (GDB) 7.0 
This GDB was configured as "x86_64-unknown-linux-gnu"... 
---snip------ 
  SYSTEM MAP: System.map                 
DEBUG KERNEL: vmlinux (2.6.36-rc6-ftrace+) 
    DUMPFILE: /var/crash/2011-01-10-12:23/vmcore 
        CPUS: 4 
        DATE: Mon Jan 10 12:21:33 2011 
      UPTIME: 00:06:56 
LOAD AVERAGE: 0.80, 0.65, 0.31 
       TASKS: 278 
    NODENAME: DELL-RnD-India 
     RELEASE: 2.6.36-rc6-ftrace+ 
     VERSION: #2 SMP Wed Sep 29 16:43:59 IST 2010 
     MACHINE: x86_64  (2666 Mhz) 
      MEMORY: 2 GB 
       PANIC: "Oops: 0002 [#1] SMP " (check log for details) 
         PID: 7203 
     COMMAND: "bash" 
        TASK: ffff88007b0d0000  [THREAD_INFO: ffff88007a6ba000] 
         CPU: 0 
       STATE: TASK_RUNNING (PANIC) 

crash>

This summary shows the kernel release, the number of CPUs on the target machine, the memory size, the panic message, and the command that was running when the panic occurred (here, the bash shell that wrote to /proc/sysrq-trigger).

Note: Crash can also be run on a live system, using /dev/mem instead of a vmcore file. For this to work, the kernel must be compiled with the CONFIG_STRICT_DEVMEM option disabled. Stock kernels typically enable this option, which prevents Crash from using /dev/mem.
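Whether a kernel restricts /dev/mem can be read from its configuration. This hypothetical bash helper succeeds only when CONFIG_STRICT_DEVMEM is not enabled in the config file you pass in (an assumption, as before, is that the file uses the standard Kconfig format).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the config does NOT enable CONFIG_STRICT_DEVMEM.
# Returns 1 if the option is enabled, 2 if the file cannot be read.
devmem_unrestricted() {
    local config="$1"
    [[ -r "$config" ]] || return 2
    ! grep -q '^CONFIG_STRICT_DEVMEM=y$' "$config"
}
```

For the running kernel, devmem_unrestricted /boot/config-$(uname -r) && echo "/dev/mem is usable" reports whether live analysis through /dev/mem is possible.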

The help command

The most useful command is help, which lists all the commands available within Crash:

bt             gdb            p              sig            waitq          
btop           help           ps             struct         whatis         
dev            irq            pte            swap           wr             
dis            kmem           ptob           sym            q              
eval           list           ptov           sys            
exit           log            rd             task           
extend         mach           repeat         timer          

crash version: 5.1.1 gdb version: 7.0

To get help on a specific command, run help followed by the command name, for example help vm.

The bt command

The bt (backtrace) command displays the kernel stack trace of the task in the current context, and bt -a displays the stack traces of the active tasks on all CPUs. When Crash starts, it sets the current context to the task that panicked, so a plain bt shows the path that led to the panic. Here is sample output:

crash> bt 
PID: 7203 TASK: ffff88007b0d0000  CPU: 0 COMMAND: "bash" 
#0 [ffff88007a6bbb00] machine_kexec at ffffffff81027ac7 
#1 [ffff88007a6bbb80] crash_kexec at ffffffff810888c9 
#2 [ffff88007a6bbc50] oops_end at ffffffff814570c4 
#3 [ffff88007a6bbc80] no_context at ffffffff81032ee7 
<snipped>
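Crash can also run a fixed list of commands without interaction, which is useful for collecting the same report from many dumps. The sketch below relies on two crash options, -i (run the commands in a file before showing the prompt) and -s (start silently, without the banner or output paging); ending the command file with exit makes Crash quit instead of waiting at the prompt. The function names and the command list are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Write an example command file: backtraces on all CPUs, kernel log, process list.
write_crash_commands() {
    printf '%s\n' 'bt -a' 'log' 'ps' 'exit' > "$1"
}

# Hypothetical wrapper: run the commands against a dump and save the report.
crash_report() {
    local vmlinux="$1" vmcore="$2" report="$3" cmds status
    cmds=$(mktemp) || return 1
    write_crash_commands "$cmds"
    crash -s -i "$cmds" "$vmlinux" "$vmcore" > "$report"
    status=$?
    rm -f "$cmds"
    return "$status"
}
```

For example, crash_report vmlinux /var/crash/2011-01-10-12:23/vmcore report.txt would leave the backtraces, kernel log and process list in report.txt.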

The ps command

The ps command displays status information for all processes or for selected ones. It has a large number of options that provide a great deal of information during dump analysis; run help ps for details. For example, ps -a shows the command-line arguments and environment of a task:

crash> ps -a 5390 
PID: 5390 TASK: ffff8800799ac650  CPU: 2 COMMAND: "httpd" 
ARG: /usr/sbin/httpd 
ENV: TERM=linux 
     PATH=/sbin:/usr/sbin:/bin:/usr/bin 
     runlevel=5
<snipped>

The set command

You can change the current context with the set command, which takes a PID (obtainable from the ps command) or a task address. It accepts several other arguments as well, which are described by help set. Run without arguments, set displays the current context. For example:

crash> set ffff88007d7c0000 
    PID: 1 
COMMAND: "init" 
   TASK: ffff88007d7c0000  [THREAD_INFO: ffff88007d7ba000] 
    CPU: 0 
  STATE: TASK_INTERRUPTIBLE

Here, the address is the task pointer of the init process.

The files command

The files command lists the open files of the task in the current context, so its output depends on the context that was set beforehand:

crash> set 1 
    PID: 1 
COMMAND: "init" 
   TASK: ffff88007d7c0000  [THREAD_INFO: ffff88007d7ba000] 
    CPU: 0 
  STATE: TASK_INTERRUPTIBLE 
crash> files 
PID: 1      TASK: ffff88007d7c0000  CPU: 0   COMMAND: "init" 
ROOT: /    CWD: / 
 FD       FILE            DENTRY           INODE       TYPE PATH 
  0 ffff880037a58f00 ffff88007cd5be40 ffff88007d090c90 CHR  /dev/null 
  1 ffff880037a58f00 ffff88007cd5be40 ffff88007d090c90 CHR  /dev/null 
  2 ffff880037a58f00 ffff88007cd5be40 ffff88007d090c90 CHR  /dev/null 
  3 ffff880037a58a80 ffff88003747b000 ffff88003750d540 FIFO 
  4 ffff880037a586c0 ffff88003747b000 ffff88003750d540 FIFO 
  5 ffff880037a58c00 ffff880037493240 ffff88007cdc2ca0 UNKN anon_inode:/inotify 
  6 ffff880037a58180 ffff8800374936c0 ffff88007cdc2ca0 UNKN anon_inode:/inotify 
  7 ffff880076087a80 ffff8800376d8540 ffff88007ceb87b0 SOCK 
  8 ffff880079a25d80 ffff88007a205e40 ffff880079eabc30 SOCK 
  9 ffff88007688b6c0 ffff88007a8f0480 ffff88003752e830 SOCK
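Because Crash prints plain text, output saved from a session (for example, by redirecting a command's output to a file at the crash> prompt) can be post-processed with standard tools. As a small illustration, the hypothetical function below counts open descriptors by TYPE in saved files output, assuming the column layout shown above (FD, FILE, DENTRY, INODE, TYPE, PATH).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count open file descriptors by TYPE in saved output of
# Crash's files command. Reads the named files, or standard input if none.
count_fd_types() {
    # Only rows whose first field is a numeric FD are descriptor rows.
    awk '$1 ~ /^[0-9]+$/ && NF >= 5 { count[$5]++ }
         END { for (t in count) print t, count[t] }' "$@" | sort
}
```

For the init task above, this gives CHR 3, FIFO 2, SOCK 3 and UNKN 2, one type per line.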

We have looked at some commonly used commands. For the others, refer to the output of the help command.

Acknowledgement

I referred to the Documentation/kdump/kdump.txt file in the kernel source while writing this article, and occasionally to numerous other articles available on the Web.

