Having Fun with gcc, the C Compiler


In this article, we will discuss a few important options from among the hundreds provided by the gcc compiler to serve the needs of different types of programmers, students as well as professionals.

First of all, we need to distinguish between GCC and gcc. GCC (GNU Compiler Collection) is a compiler system that provides compilers for the following programming languages: C, C++, Objective-C, Fortran, Ada, Go and D. The C compiler provided by GCC is called gcc. As the title suggests, this article is about gcc, the C compiler, and not about GCC, the compiler system. There's also one question that needs to be answered before we proceed any further, and that is, "Do I really believe that the use of a compiler can ever be considered fun?" The answer is yes. Exploring the features of gcc is not only informative, it can also be fun.

Let us begin by discussing two of the most commonly used options of gcc. Consider the simple C program pgm1.c given below. This and all the other programs along with their associated files discussed in this article can be downloaded from opensourceforu.com/article_source_code/Nov19funwithgcc.zip.

#include <stdio.h>
#define NUMBER 5
int main()
{
    int a = NUMBER;   /* the preprocessor replaces NUMBER with 5 */
    printf("Square of %d is %d\n\n", a, a*a);
    return 0;
}
Figure 1: Output of the program pgm1.c

The gcc command gcc pgm1.c will compile the program pgm1.c to produce the Linux-executable file with the default name a.out. The executable file a.out can be executed with the command ./a.out. The executable file produced by gcc can be given a specific name by using the option -o. The command gcc pgm1.c -o exfile will create an executable file called exfile. This executable file can be executed with the command ./exfile. Figure 1 shows the execution of the program pgm1.c.

The compilation process of a program
The compilation of a C program involves a number of steps like preprocessing, generating assembly code, object code, executable code, etc. If required, gcc allows us to do this process step by step till we get an executable file. This is especially useful to computer science teachers because such an explanation will give students a clear understanding of the compilation process.

First, from the C file pgm1.c, let us create the preprocessed code in the file pgm1.i with the command gcc -E pgm1.c -o pgm1.i. Next, we will use the preprocessed code in pgm1.i to obtain the assembly code in the file pgm1.s with the command gcc -S pgm1.i. Next, the assembly code in the file pgm1.s is used to obtain the object code (relocatable machine code) in the file pgm1.o with the command gcc -c pgm1.s. Finally, the object code in the file pgm1.o is used to produce the executable (absolute machine code) file exfile with the command gcc pgm1.o -o exfile.

Figure 2 shows this step-by-step compilation process of the C program pgm1.c. In the figure, we can see that the macro NUMBER, defined by the line of code #define NUMBER 5, has been expanded by the preprocessor: the line of code int a = NUMBER; in pgm1.c appears in pgm1.i as int a = 5;. Since the preprocessed file pgm1.i is very large, only the last six lines are printed with the command tail -6 pgm1.i.

Similarly, for the assembly code file, only the last six lines are printed with the command tail -6 pgm1.s. The command ls shows all the files produced during the step-by-step compilation process.

Finally, the figure shows the output by executing the absolute machine code in file exfile.

Figure 2: Step by step compilation of the C program pgm1.c
Figure 3: Differences between GCC C and ANSI C

Options for C versions and warnings
The C programming language has the following versions: K&R C, ANSI C (ISO C), C99, C11 and C18. Programmers can specify the version of C so that the gcc compiler treats the code as being written to that particular standard. For example, the options -ansi, -std=c99 and -std=c11 can be used to specify the ANSI C, C99 and C11 versions of C, respectively. For a better understanding of this, consider the small C program pgm2.c.

#include <stdio.h>
int main()
{
    int unix;
    return 0;
}

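Let us compile the C program pgm2.c with the commands gcc pgm2.c and gcc -ansi pgm2.c. Figure 3 shows the terminal after the commands have been executed. We get an error with the first command whereas the second command produces the executable file a.out. So why do we get an error with the first command? By default, gcc compiles in its own GNU dialect of C, in which it predefines macros like unix as an extension on Unix-like systems. These macros are not part of standard C. The preprocessor therefore replaces the word unix in the line int unix; before the compiler proper sees it, and the declaration is no longer valid. With the option -ansi, the macro is not predefined. So, the identifier unix is valid in ANSI C, whereas in the default GNU dialect of gcc it is not.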

Figure 4: The options -Wall and -Werror

Now let us familiarise ourselves with a few options like -Wall and -Werror, which are used for customising warnings during compilation. Previously, when we compiled the C program pgm2.c with the option -ansi, we didn't get any warnings (see Figure 3). Now, let us compile pgm2.c with the option -Wall added, using the command gcc -ansi -Wall pgm2.c.

Figure 4 shows us that on compilation, we get a warning about an unused variable for the line of code int unix;. With the option -Wall alone, the executable file a.out is still produced, whereas if the option -Werror is also added, the warning is treated as an error and no executable file is produced. This example also tells us that multiple gcc options can be used in a single compilation. The options -Wall and -Werror help us catch potential bugs early, rather than in the later stages of development. I believe these options, when enabled, will help students of C programming to write good quality code.

A C program without main() function
The gcc compiler allows us even to compile a C program without the main() function. This is one of the several solutions to the popular C programming puzzle, “Write a C program without the main() function.” This technique can also save at least three characters in C code golfing competitions, which are programming competitions in which the competitors try to write a program with the shortest source code possible to solve the given task. Consider the C program pgm3.c without the main() function, given below.

#include <stdio.h>
#include <stdlib.h>
int X( )
{
    printf("\nHello World\n");
    exit(0);   /* terminate here; there is no caller to return to */
}

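We must be careful to use the exit(0) function instead of the return 0; statement to terminate the program. The option -nostartfiles leaves out the standard startup code that normally calls main() and handles its return value, so the function X() has no caller to return to. Figure 5 shows the output of the program pgm3.c when compiled with the command gcc -nostartfiles pgm3.c. In the figure, we can see that a warning is issued during linking because the usual entry point, which the startup files provide and which in turn calls main(), is absent.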

Options for optimisation
There are a number of options provided by gcc for optimisation of the code. These include -O0, -O1, -O2, -O3, -Ofast and -Os. Consider the program pgm4.c to better understand optimisation with gcc. The program pgm4.c measures and prints the time taken to initialise an integer array a[1000][1000] with the value 1. The program is a good candidate for optimisation because of a property of memory access called spatial locality, which refers to the use of data elements within relatively close storage locations. Elements stored close together can typically be accessed faster in succession than elements that are far apart. The C programming language uses row-major ordering, where two-dimensional arrays are stored row by row in memory. However, in the program pgm4.c, the two nested for loops and the line of code a[j][i]=1; store the number 1 in the array column by column. Therefore, the time taken to access the array elements is more than the time taken to access the same number of array elements row by row.

We will use different optimisation options to achieve faster and faster execution. Incidentally, even without any compiler optimisation, we can reduce the execution time by replacing the line of code a[j][i]=1; with the line a[i][j]=1;, where the initialisation will be done row by row.

#include <stdio.h>
#include <time.h>
int main( )
{
    clock_t t1, t2;          /* clock() returns a clock_t value */
    double t3;
    int i, j, a[1000][1000];
    t1 = clock( );
    for (i = 0; i < 1000; i++)
    {
        for (j = 0; j < 1000; j++)
        {
            a[j][i] = 1;     /* column by column: poor spatial locality */
        }
    }
    t2 = clock( );
    t3 = (double)(t2 - t1) * 1000.0 / CLOCKS_PER_SEC;
    printf("\nTime taken = %f milliseconds\n", t3);
    return 0;
}
Figure 5: Output of the program pgm3.c

The GCC manual tells us that the options -O1, -O2 and -O3 enable progressively more optimisations, with each level aiming at faster execution than the one before it. Figure 6 shows the output of the program with and without optimisation. We can see that the execution becomes faster at each level, as expected, with the unoptimised code being the slowest and the code optimised with the option -O3 being the fastest.

Options for conditional execution
There are many options provided by gcc for efficient debugging of code. One technique is to compile and execute a certain piece of code only if a particular macro is defined. Such code can be used for testing or debugging alone and will not be executed during actual runs. The option -D defines a macro on the command line, just as a #define line in the source file would, so that the preprocessor can test for it. The program pgm5.c illustrates the use of the option -D for the conditional execution of code.

#include <stdio.h>
int main()
{
    printf("I am always printed\n");
#ifdef DB
    /* compiled in only when the macro DB is defined, e.g. with -D DB */
    printf("Why I am printed now?\n");
#endif
    return 0;
}
Figure 6: Execution time with different levels of optimisation
Figure 7: Conditional execution of code

The command gcc pgm5.c compiles the program without the macro DB defined. The command gcc -D DB pgm5.c compiles the program with the macro DB defined. Figure 7 shows the output of the program pgm5.c with and without defining the macro DB. It can be seen from the figure that the conditionally executed line of code printf("Why I am printed now?\n"); gets executed only when the macro DB is defined.

Please note that what we have seen in this article is just the tip of the iceberg. There are still hundreds of options in gcc remaining to be explored. Programmers can achieve different goals like platform-independent software development, efficient software development cycles, better debugging techniques, better training methods, etc, with these options. So, I am sure it will be immensely rewarding if you dig deeper into what gcc offers.
