Device Drivers, Part 16: Kernel Window — Peeping through /proc

This article, which is part of the series on Linux device drivers, demonstrates the creation and usage of files under the /proc virtual filesystem.

After many months, Shweta and Pugs got together for some peaceful technical romancing. All through, they had been using all kinds of kernel windows, especially through the /proc virtual filesystem (using cat), to help them decode various details of Linux device drivers. Here’s a non-exhaustive summary listing:

  • /proc/modules — dynamically loaded modules
  • /proc/devices — registered character and block major numbers
  • /proc/iomem — on-system physical RAM and bus device addresses
  • /proc/ioports — on-system I/O port addresses (especially for x86 systems)
  • /proc/interrupts — registered interrupt request numbers
  • /proc/softirqs — registered soft IRQs
  • /proc/kallsyms — running kernel symbols, including from loaded modules
  • /proc/partitions — currently connected block devices and their partitions
  • /proc/filesystems — currently active filesystem drivers
  • /proc/swaps — currently active swaps
  • /proc/cpuinfo — information about the CPU(s) on the system
  • /proc/meminfo — information about the memory on the system, viz., RAM, swap, …
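
From user space, these windows behave like ordinary read-only text files, so a program can read them the same way cat does. Here is a minimal user-space sketch, assuming a Linux system with /proc mounted; dump_file() is a hypothetical helper name used only for illustration:

```c
#include <stdio.h>

/*
 * Copies the contents of the file at path to out.
 * Returns the number of bytes copied, or -1 if the file cannot be opened.
 * (dump_file is a hypothetical helper, not part of any library.)
 */
long dump_file(const char *path, FILE *out)
{
    char buf[256];
    size_t n;
    long total = 0;
    FILE *in = fopen(path, "r");

    if (in == NULL)
        return -1;
    /* /proc files are generated on the fly, so read until EOF
     * instead of relying on a file size. */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        fwrite(buf, 1, n, out);
        total += (long)n;
    }
    fclose(in);
    return total;
}
```

Calling dump_file("/proc/modules", stdout) should print the same listing as cat /proc/modules.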

Custom kernel windows

“Yes, these have been really helpful in understanding and debugging Linux device drivers. But is it possible for us to also provide some help? Yes, I mean can we create one such kernel window through /proc?” asked Shweta.

“Why just one? You can have as many as you want. And it’s simple — just use the right set of APIs, and there you go.”

“For you, everything is simple,” Shweta grumbled.

“No yaar, this is seriously simple,” smiled Pugs. “Just watch me creating one for you,” he added.
And in a jiffy, Pugs created the proc_window.c file below:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/jiffies.h>
#include <linux/uaccess.h> /* copy_from_user() */

static struct proc_dir_entry *parent, *file, *link;
static int state = 0;

/* Fills page with the current state and the time since boot-up */
static int time_read(char *page, char **start, off_t off, int count, int *eof, void *data) {
    int len, val;
    unsigned long act_jiffies;

    len = sprintf(page, "state = %d\n", state);
    /* jiffies starts at INITIAL_JIFFIES, not at zero */
    act_jiffies = jiffies - INITIAL_JIFFIES;
    val = jiffies_to_msecs(act_jiffies);
    switch (state) {
        case 0:
            len += sprintf(page + len, "time = %lu jiffies\n", act_jiffies);
            break;
        case 1:
            len += sprintf(page + len, "time = %d msecs\n", val);
            break;
        case 2:
            len += sprintf(page + len, "time = %ds %dms\n",
                    val / 1000, val % 1000);
            break;
        case 3:
            val /= 1000; /* milliseconds to seconds */
            len += sprintf(page + len, "time = %02d:%02d:%02d\n",
                    val / 3600, (val / 60) % 60, val % 60);
            break;
        default:
            len += sprintf(page + len, "<not implemented>\n");
            break;
    }
    len += sprintf(page + len, "{offset = %ld; count = %d;}\n", off, count);

    return len;
}

/* Accepts a single digit, optionally followed by a newline, as the new state */
static int time_write(struct file *file, const char __user *buffer, unsigned long count, void *data) {
    char kbuf[2];

    if ((count < 1) || (count > 2))
        return count;
    /* buffer points into user space; copy it in before examining it */
    if (copy_from_user(kbuf, buffer, count))
        return -EFAULT;
    if ((count == 2) && (kbuf[1] != '\n'))
        return count;
    if ((kbuf[0] < '0') || ('9' < kbuf[0]))
        return count;
    state = kbuf[0] - '0';
    return count;
}

static int __init proc_win_init(void) {
    /* Directory /proc/anil */
    if ((parent = proc_mkdir("anil", NULL)) == NULL) {
        return -ENOMEM;
    }
    /* Regular file /proc/anil/rel_time */
    if ((file = create_proc_entry("rel_time", 0666, parent)) == NULL) {
        remove_proc_entry("anil", NULL);
        return -ENOMEM;
    }
    file->read_proc = time_read;
    file->write_proc = time_write;
    /* Soft link /proc/anil/rel_time_l pointing to rel_time */
    if ((link = proc_symlink("rel_time_l", parent, "rel_time")) == NULL) {
        remove_proc_entry("rel_time", parent);
        remove_proc_entry("anil", NULL);
        return -ENOMEM;
    }
    link->uid = 0;
    link->gid = 100;
    return 0;
}

static void __exit proc_win_exit(void) {
    remove_proc_entry("rel_time_l", parent);
    remove_proc_entry("rel_time", parent);
    remove_proc_entry("anil", NULL);
}

module_init(proc_win_init);
module_exit(proc_win_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Anil Kumar Pugalia <email_at_sarika-pugs_dot_com>");
MODULE_DESCRIPTION("Kernel window /proc Demonstration Driver");

And then Pugs did the following:

  • Built the driver file (proc_window.ko) using the usual driver’s Makefile.
  • Loaded the driver using insmod.
  • Showed various experiments using the newly created proc windows. (Refer to Figure 1.)
  • And finally, unloaded the driver using rmmod.
Figure 1: Peeping through /proc

Demystifying the details

Starting from the constructor proc_win_init(), three proc entries have been created:

  • Directory anil under /proc (i.e., NULL parent) with default permissions 0755, using proc_mkdir()
  • Regular file rel_time in the above directory, with permissions 0666, using create_proc_entry()
  • Soft link rel_time_l to the file rel_time, in the same directory, using proc_symlink()

These are removed with remove_proc_entry() in the destructor, proc_win_exit(), in the reverse order of their creation.

For every entry created under /proc, a corresponding struct proc_dir_entry is created. For each, many of its fields could be further updated as needed:

  • mode — Permissions of the file
  • uid — User ID of the file
  • gid — Group ID of the file

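Additionally, for a regular file, the following two function pointers for reading and writing over the file could be provided, respectively. (This interface applies to kernels before 3.10, which replaced create_proc_entry() and these callbacks with proc_create() and struct file_operations.)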

  • int (*read_proc)(char *page, char **start, off_t off, int count, int *eof, void *data)
  • int (*write_proc)(struct file *file, const char __user *buffer, unsigned long count, void *data)

write_proc() is very similar to the character driver’s file operation write(). As there, buffer is a user-space pointer, so its contents should be fetched with copy_from_user() rather than dereferenced directly. The above implementation lets the user write a digit from 0 to 9, and accordingly sets the internal state. read_proc() in the above implementation provides the current state and the time since the system was booted up, in different units based on the current state: jiffies in state 0; milliseconds in state 1; seconds and milliseconds in state 2; hours, minutes and seconds in state 3; and <not implemented> in other states.
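
The state handling described above can be tried out in user space, without loading a module. The following is only a sketch of the same parsing and formatting rules; parse_state() and format_time() are hypothetical names, and the jiffies and millisecond values are passed in by the caller instead of being read from the kernel:

```c
#include <stdio.h>

/*
 * Mirrors the write-side rule: accept one digit, optionally followed
 * by a newline. Returns the new state (0-9), or -1 if the input is rejected.
 */
int parse_state(const char *buf, unsigned long count)
{
    if (count < 1 || count > 2)
        return -1;
    if (count == 2 && buf[1] != '\n')
        return -1;
    if (buf[0] < '0' || buf[0] > '9')
        return -1;
    return buf[0] - '0';
}

/*
 * Mirrors the read-side switch: formats the elapsed time according to state.
 * ticks is the elapsed time in jiffies, msecs the same time in milliseconds.
 * Returns the number of characters written (as snprintf does).
 */
int format_time(char *out, size_t size, int state,
                unsigned long ticks, unsigned int msecs)
{
    unsigned int secs = msecs / 1000;

    switch (state) {
    case 0:
        return snprintf(out, size, "time = %lu jiffies\n", ticks);
    case 1:
        return snprintf(out, size, "time = %u msecs\n", msecs);
    case 2:
        return snprintf(out, size, "time = %us %ums\n", secs, msecs % 1000);
    case 3:
        return snprintf(out, size, "time = %02u:%02u:%02u\n",
                        secs / 3600, (secs / 60) % 60, secs % 60);
    default:
        return snprintf(out, size, "<not implemented>\n");
    }
}
```

For instance, 3723456 milliseconds in state 3 formats as 01:02:03, while any state from 4 to 9 yields <not implemented>.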

To check the accuracy of the computation, Figure 2 highlights the system uptime in the output of top.

read_proc’s page parameter is a page-sized buffer, typically to be filled with count bytes starting from offset off. But more often than not, when the content is much smaller than a page, the whole content is simply written to the start of page and the other parameters are ignored.
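
For content that might be read in pieces, a handler can honour off and count instead of ignoring them. The following user-space sketch only simulates that idea: window_read() is a hypothetical stand-in with a read_proc-like parameter list, the content is a fixed sample string, and setting *start to page to mark where the returned bytes begin is an assumption about how such a handler would typically report them.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for a read_proc-style handler.
 * Generates the full content, then copies at most count bytes of it,
 * starting at offset off, to the beginning of page.
 * Sets *eof once the end of the content has been delivered.
 * Returns the number of bytes placed in page.
 */
int window_read(char *page, char **start, long off, int count, int *eof)
{
    char full[128];
    int total, n;

    /* Sample content; a real handler would format live values here. */
    total = snprintf(full, sizeof(full),
                     "state = %d\ntime = %d msecs\n", 1, 12345);

    if (off >= total) {          /* nothing left to deliver */
        *eof = 1;
        return 0;
    }
    n = total - (int)off;        /* bytes remaining from off */
    if (n > count)
        n = count;               /* caller wants less than what remains */
    else
        *eof = 1;                /* this call delivers the tail */

    memcpy(page, full + off, (size_t)n);
    *start = page;               /* data for offset off begins at page */
    return n;
}
```

Reading the 29-byte sample with count set to 10 returns the first line only; a second call with off set to 10 returns the remaining 19 bytes and sets *eof.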

Figure 2: Comparison with top’s output

All the /proc-related structure definitions and function declarations are available through <linux/proc_fs.h>. The jiffies-related function declarations and macro definitions are in <linux/jiffies.h>. As a special note, the actual jiffies are calculated by subtracting INITIAL_JIFFIES, since on boot-up, jiffies is initialised to INITIAL_JIFFIES instead of zero.
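
The subtraction works even after the counter wraps around, because unsigned arithmetic is modular. The sketch below illustrates this in user space with a 32-bit counter; the DEMO_ names are hypothetical, and an HZ of 1000 is an assumption. The initial value mirrors the kernel's choice of starting the 32-bit count 300 seconds (5 minutes) before its wrap-around point.

```c
#include <stdint.h>

#define DEMO_HZ 1000  /* assumed ticks per second */
/* Start 300 seconds short of the 32-bit wrap-around point */
#define DEMO_INITIAL_JIFFIES ((uint32_t)(-300 * DEMO_HZ))

/* Value of the simulated 32-bit jiffies counter, given seconds since boot */
uint32_t demo_jiffies_after(uint32_t seconds)
{
    return DEMO_INITIAL_JIFFIES + seconds * DEMO_HZ;
}

/* Ticks elapsed since boot; correct across the wrap-around as well */
uint32_t demo_elapsed(uint32_t now)
{
    return now - DEMO_INITIAL_JIFFIES;
}
```

With these assumptions, 400 seconds after boot the counter has already wrapped and holds 100000, which is smaller than its initial value, yet demo_elapsed() still gives 400000 ticks.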

Summing up

“Hey Pugs! Why did you set the folder name to anil? Who is this Anil? You could have used my name, or maybe yours,” suggested Shweta. “Ha! That’s a surprise. My real name is Anil; it’s just that everyone in college knows me as Pugs,” smiled Pugs.

Watch out for further technical romancing from Pugs, a.k.a. Anil.

8 COMMENTS

  1. this can be used to modify the hardware register contents or read the register contents that can be really helpful in debugging drivers….

    • Yes, you are right. That could be one of its powerful uses, and it is in fact one of the techniques for “debugging by querying”.

  2. @anil_pugalia sir, what do jiffies and INITIAL_JIFFIES stand for, and what is the meaning of HZ? Actually, I am trying to calculate the jiffies taken by the write and read operations in a character driver, but when I print the jiffies value, it never changes between the start and the end of the operation. Can you clarify the concept behind this?

    • jiffies is the unit of resolution of kernel time, and HZ is the number of jiffies per second. Nowadays, on a typical PC, a jiffy is 1 msec, which could be around a million instructions. So, your read or write is finishing before that, and hence you do not see any change in jiffies.

  3. @anil_pugalia if INITIAL_JIFFIES is the value of jiffies at boot time, then how can INITIAL_JIFFIES be greater than jiffies? Actually, when I print INITIAL_JIFFIES and jiffies, INITIAL_JIFFIES is the greater one. Can you clarify, sir?

    • On a 32-bit system, jiffies is a 32-bit variable, which overflows back to 0 after reaching its maximum. Moreover, INITIAL_JIFFIES is set to that maximum minus the number of jiffies in 5 minutes, so jiffies wraps around about 5 minutes after boot-up. From then on, you’d find INITIAL_JIFFIES to be the greater of the two.
