Let's start with the ar utility, which is used to create, extract and modify archives. Its main use is in building static libraries, which we will examine with an example. We will create two .c files, with a function defined in each. The first is foo.c:
void foo(int *i){ *i=*i+5; }
The second is bar.c:
void bar(int *i){ *i=*i*5; }
Now, let's compile these two files, using GCC's -c option, which only creates object files and does not link them. After compilation, we have the object files, as shown:
$ gcc -c foo.c && gcc -c bar.c
$ ls *.o
bar.o  foo.o
Most of the time, we do not write code for a program from scratch, but use already available source or compiled code. For example, in C we use the printf() function from a library of precompiled functions.
Libraries are of two types, static and dynamic. Dynamic (shared) libraries are linked at run time; if one is missing, the application won't run. When we want a program to be "independent", not requiring any other libraries to be installed on the system before it can run, we need to resolve external functions and variables when the program is built (at link time) and copy them into the program binary. This removes the run-time dependency on the library. For this purpose, we create static libraries: archive files containing one or more object files.
The ar tool can archive binary files and is used to create such static libraries; more generally, it creates, modifies and extracts archive files. You may ask how it differs from the tar (tape archive) tool, which can also archive binary and other types of files, and can even compress them (with the help of gzip and similar tools). The answer is that ar can store a symbol table (an index) inside the archive, mapping each global symbol to the object file that defines it, so the linker can quickly find and pull in only the members it needs. tar keeps no such index; it simply stores a collection of files. You can get more details from the ar man page.
So let's create our static library. Its name should follow the format lib<name>.a. The lib prefix is necessary if the library is to be found with the linker's -l option, and the .a extension is the UNIX convention for static libraries (.so is used for shared libraries):
$ ar -cvq libfoobar.a foo.o bar.o
a - foo.o
a - bar.o
$ file libfoobar.a
libfoobar.a: current ar archive
That's it! Now, let's use this library in a sample program. It's better to create a header file, foobar.h, declaring the functions defined in the library:
#ifndef FOOBAR_H
#define FOOBAR_H

/* Functions provided by libfoobar.a */
void foo(int *i);   /* adds 5 to *i */
void bar(int *i);   /* multiplies *i by 5 */

#endif /* FOOBAR_H */
Here is a sample program (test.c) that uses the static library:
#include <stdio.h>
#include "foobar.h"

int main(int argc, char **argv)
{
    int i = 5;
    int j = 10;

    foo(&i);   /* i = 5 + 5 = 10 */
    bar(&j);   /* j = 10 * 5 = 50 */
    printf("i = %d\n", i);
    printf("j = %d\n", j);
    return 0;
}
Let's compile the sample program with gcc test.c -L./ -lfoobar; the -L option adds a directory to the library search path, and -l gives the name of the library to link.
GCC does not need the complete name of the library file: with -l we omit the lib prefix and the .a extension (or .so, in the case of a shared library), so -lfoobar refers to libfoobar.a. This command produces the output file a.out, into which the linker has copied the object code it needs from the static library. You can even delete the libfoobar.a file now, and the program still runs:
$ ./a.out
i = 10
j = 50
So we have now built a static library with ar and linked it statically into a program.
The ld tool is important
Linking is the process of combining various pieces of code and data into a single executable image that can be loaded into memory. Linking can happen when the program is built (static linking) or when it is loaded and run (dynamic linking). GCC runs the linker behind the scenes as the last step of compilation; compile with the -v option, and you will see many of these background details, including the linker invocation.
To understand what happens during compilation, it helps to run the linker by hand. So let's compile a simple "Hello World" C program without linking it, using -c as before: gcc -c hello.c. We will then link the resulting object file, hello.o, manually.
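First, note that although main() is where our own code begins, it is not the program's real entry point: that is _start, provided by the C runtime startup file crt1.o, which sets things up and then calls main() through the C library. So, to make an executable, we must add these startup object files and the C library to hello.o, with the following command: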
$ ld -o hello -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o /usr/lib/crti.o hello.o -lc /usr/lib/crtn.o
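Here, -o names the output file hello; -dynamic-linker sets /lib/ld-linux.so.2 as the program's interpreter, the dynamic loader that runs first when hello starts and loads the shared libraries it needs; the crt*.o startup files and hello.o are the object files to be linked; and -lc links the C library. Because ld prefers a shared library to a static archive when both are available, -lc normally selects the shared C library (libc.so) rather than libc.a. (These paths come from a 32-bit x86 system; on 64-bit distributions the loader is usually /lib64/ld-linux-x86-64.so.2, and the crt files often live in a directory such as /usr/lib/x86_64-linux-gnu.)
In the output executable hello, the code from the crt*.o files and hello.o is copied in (statically linked). The C library, on the other hand, is linked dynamically: hello only records that it needs libc.so.6, which /lib/ld-linux.so.2 loads when the program starts.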
How are ar and ld different?
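The ar tool only bundles object files into an archive (optionally with a symbol index); it does not link anything. ld, on the other hand, links object files with both static and shared libraries, resolving symbol references, which ar doesn't do, and producing an executable or shared library. We don't normally need to run ld by hand, since GCC invokes it for us; we tried the steps above only to learn what happens in the background.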
nm
The following diagram represents the memory segments of a C program (heap, data, code and stack) that C programmers need to keep in mind.
Different variables and symbols end up in different sections: dynamically allocated memory on the heap; static and global variables in the data section; code, and often read-only constants, in the text section; and local variables on the stack.
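Working out by hand which symbol goes to which section is hard. Thanks to the open source developers behind GNU binutils, we have nm, which can dissect a.out (or any object file or archive) and list the symbols present in it, along with their addresses and the kind of section each belongs to.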
Let's try this with a sample program, test.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Declare some global variables */
int global_int1;
int global_int2 = 10;
char global_string[10];
const int const_int = 10;

void test1(void)
{
    global_int1 = 20;
    printf("[test1] global_int2 = %d\n", global_int2);
}

void test2(void)
{
    strcpy(global_string, "Hello");
    printf("[test2] global_int1 = %d\n", global_int1);
}

void test3(void)
{
    printf("[test3] global_string = %s\n", global_string);
}

int main(int argc, char **argv)
{
    test1();
    test2();
    test3();
    return 0;
}
Let's compile the program with gcc test.c -o test and use nm to dissect it:
$ nm ./test
08049674 d _DYNAMIC
08049740 d _GLOBAL_OFFSET_TABLE_
0804855c R _IO_stdin_used
         w _Jv_RegisterClasses
08049664 d __CTOR_END__
08049660 d __CTOR_LIST__
0804966c D __DTOR_END__
08049668 d __DTOR_LIST__
0804865c r __FRAME_END__
08049670 d __JCR_END__
08049670 d __JCR_LIST__
08049764 A __bss_start
0804975c D __data_start
08048510 t __do_global_ctors_aux
08048370 t __do_global_dtors_aux
08048560 R __dso_handle
         w __gmon_start__
0804850a T __i686.get_pc_thunk.bx
08049660 d __init_array_end
08049660 d __init_array_start
080484a0 T __libc_csu_fini
080484b0 T __libc_csu_init
         U __libc_start_main@@GLIBC_2.0
08049764 A _edata
0804977c A _end
0804853c T _fini
08048558 R _fp_hw
080482b4 T _init
08048340 T _start
08049764 b completed.5963
08048564 R const_int
0804975c W data_start
08049768 b dtor_idx.5965
080483d0 t frame_dummy
08049778 B global_int1
08049760 D global_int2
0804976c B global_string
08048476 T main
         U memcpy@@GLIBC_2.0
         U printf@@GLIBC_2.0
080483f4 T test1
0804841d T test2
08048459 T test3
Let's examine the output. First, where do the program functions go? Near the end of the output, you see the names main, test1, test2 and test3 with a T preceding them; T stands for the text section, which is where all these functions are.
Next, the global variables: global_string and global_int1 are preceded by B, while we have R for const_int and D for global_int2. This is because the data section is divided into two further sections: uninitialised data, or BSS (Block Started by Symbol), and initialised data. Both global_string and global_int1 are defined without an initialiser, so they are in BSS (B); global_int2 has been initialised, and is in D, the initialised data section.
The storage of const_int is interesting. We declared it as a const variable, whose value won't change throughout the program: a read-only variable. Thus, it is stored in R, the read-only data section.
nm has many more options; see its man page and experiment with them to understand object files better. For example, -S also shows the size of each defined symbol, in hexadecimal, as follows (a snippet of the output):
$ nm -S ./test
08049778 00000004 B global_int1
0804976c 0000000a B global_string
08048476 0000001e T main