Hobbies for Geeks: Code Golfing and Code Obfuscation

0
3864

This article introduces the reader to two activities that are considered hobby programming — code golfing and code obfuscation.

People sometimes wonder about the hobbies of computer geeks who are often introverts and indoor people. To many computer programmers, programming is just another profession to make a living, often considered a tiresome task. But there are a few who like to write programs for enjoyment and as a hobby.

What is code golfing?
First, let us try to understand the rules of code golfing by looking at how the game of golf works. To win a game of golf you have to use the minimum number of strokes to hit the ball into all the 18 holes on the course, sequentially. Ironically, the aim of golf is to play for the least amount of time. With that knowledge about the game, let’s try to understand what code golfing is all about. In code golfing, a programmer tries to solve a problem with the shortest possible source code. But do remember that a ‘code golfed’ program may not be the best in terms of parameters like time complexity, space complexity (different from source code size), readability, etc, which are absolutely important when you are developing software.

Figure 1: The warnings and output of the program fib2.c

Now let us consider an example for code golfing. Let us write a program to print the first 31 Fibonacci numbers from 0 to 832040, each on a separate line. This is a question posed by a code golfing puzzles website (https://code-golf.io). I have decided to stick with C, one of my favourite programming languages, for the solution. The program given below, called fib1.c, is my first attempt. This and all the other programs discussed in this article can be downloaded from opensourceforu.com/article_source_code/May20HobbiesforGeeks.zip.

#include<stdio.h>
int main()
{
int f1=0,f2=1,f3,ctr=3;
printf("%d\n%d\n",f1,f2);
while(ctr < 32)
{
f3=f1+f2;
f1=f2;
f2=f3;
printf("%d\n",f3);
ctr++;
}
return 0;
}

This program successfully completed the challenge and the source code has 194 characters (including white space characters). Of course, this size was quite expected, as my code did not use any golfing techniques and was just plain enough to print the Fibonacci numbers. I have worked on this problem for some time and finally came up with the program called fib2.c given below. Notice that the program fib2.c is written in a single line to save some white space characters.

a,b=1;main(c){printf("0\1\n"); while(c^832040)c=a+b,a=b,b=c,printf("%d\n",c);}

The program fib2.c (78 characters) placed me at the 49th position among those who got the solutions. But I was amazed to learn that the current best solution has just 46 characters. Knowing that limit, I didn’t try to improve my solution any further. In all likelihood, the best solution might have used recursion to solve this puzzle because generating the Fibonacci series is an inherently recursive problem. To further illustrate the fact that in code golfing the only requirement is to write a program with minimum source code, take a look at Figure 1, which shows the warnings during the compilation of fib2.c and the first few lines of the output.

Figure 2: Output of the program ob1.c
Figure 3: Output of the program ob2.c

Now let us go through the program fib2.c to understand the code golfing techniques used and the reason for the many warnings. First, the three variables used are named a, b and c — single character variable names. I am sure software companies following good programming practices will fire you immediately for such variable names but in code golfing, source code size is the only thing that matters. Further, we haven’t declared the data type of these variables. Due to the implicit int rule of C, the three variables are of type integer, by default, even though you get a warning during compilation.

Since the variable a is declared as a global variable, it is initialised to 0 automatically. Thus we can save a few more characters. Notice that the variable c is declared inside the main() function because this function can also have arguments. This helps you save one more character, a comma separating multiple variables. The test condition of the while loop is while(c^832040) instead of while(c != 832040), thus saving one more character. The four statements c=a+b, a=b, b=c and printf(“%d\n”,c) are separated by commas, making it a single statement. Hence the while loop does not require opening and closing brackets. Also notice that I haven’t included any header files, resulting in even more warnings during compilation. Thus, from the above example, it is clear that you need a clear understanding about the different features of the programming language you are using for code golfing. But to win code golfing competitions you also need to develop better algorithms to solve the given problem, and out-of-the-box thinking.

The website mentioned earlier, which offers code golfing puzzles, does not allow users to inspect winning solutions by other programmers — in spite of that the puzzles and the competition format of the website is quite interesting. But if you really want to inspect the solutions provided by other programmers for different code golfing challenges, I urge you to visit the Code Golf website of the Stack Exchange Network. The link https://codegolf.stackexchange.com/ will take you there. You may not be familiar with this website, but I am sure you have all heard about Stack Overflow, also from the Stack Exchange Network.

Figure 4: Output of the program kerala.c

Another important question that needs to be answered about code golfing is regarding the most suitable programming language for participating in such challenges. Notice that the size of the source code often depends on the programming language being used. To understand this point better, consider a Python script and a C program that prints the message ‘Hello World’. The Python script needs just one line whereas the C program needs additional constructs like the declaration of the main() function, opening and closing brackets, etc, other than the printf() statement. Due to this reason, separate code golfing competitions are often conducted for different programming languages. But on the rare occasion where the competition is programming language agnostic, which language is preferred? The choice definitely depends on the programming language you are most familiar with. I usually attempt solutions in C or Python, depending on the specific problem at hand. But let’s suppose someone really wants to win code golfing competitions, to the extent of being willing to learn a new programming language. In that case, which programming language should be chosen? One choice is Perl, a general-purpose programming language that allows you to write very crisp programs. In fact, code golfing competitions were initially held by the Perl programming community and later adopted by other programming language communities. But Perl’s reputation as the best programming language for code golfing has been challenged in recent years.

Many programmers became so immersed in code golfing puzzles, some of them even developed programming languages that are suitable for code golfing challenges alone. One such option is the aptly named GolfScript, an esoteric programming language solely developed to win such competitions. But do remember that programs written in such programming languages are not at all readable and, after a few years, even the person who wrote the program will have a hard time understanding it. But, as mentioned earlier, the only requirement for code golfing competitions is to write programs with the least source code and nothing more.

A quick look at code obfuscation
Now let us discuss code obfuscation, yet another recreational programming activity. Code obfuscation involves writing programs that are difficult for human beings to understand. To illustrate this point further, consider the program ob1.c given below:

#include <stdio.h>
#include <string.h>
#define _______ ;
#define What p##r##i##n##t##f(
#define is "%s\n"
#define This ,_____);}
#define ______ m##a############i##n
#define __o__ int
#define __0__ char
#define __O__ strcpy
#define ooo ++
#define oo /2
__o__ ______(__o__ __, __0__
*___[]){__o__
_=0;__0__ ____[32]_______ __0__ _____
[32] _______ __O__
(____,"aHbeglhl1oj oWuotrklmdq!b"
) _______
for (_=0 _______ _<26 _______ _ ooo )
_____[_ oo ]
=____[++_] _______
What
is
This

I had written this program many years ago for my blog. Can you guess the output of the program? It prints the message ‘Hello World!’ Though the program looks like a mess, there is nothing clever in it. It uses the C preprocessor to define alternative symbols for many commonly occurring elements in C. The program also uses the fact that variable names in C can begin with an underscore symbol. Thus, many of the lines you see in the program are actually predefined macro variables. Similarly, if you observe carefully, the letters in the even positions of the string aHbeglhl1oj oWuotrklmdq!b form the string Hello World!. The program also uses the token pasting operator (##) of the C preprocessor to combine multiple tokens into a single token. For example, m##a############i##n is the same as main. Figure 2 shows the output of the program ob1.c, which prints the message ‘Hello World!’.

Now consider the program ob2.c given below. This is another program I had written for my blog a long time back.

#include<stdio.h>
#define _ ;
#define __ ,
#define ___ (
#define ____ )
#define _____ {
#define ______ }
#define _______ +
#define ________ printf
#define _________ scanf
#define __________ while
#define ___________ int
#define ____________ main
#define OO =
#define OOO 1
#define o "
#define ooo "%d"
#define oooo <=
#define o00o *
#define o0o &
#define o000o [
#define oo0oo ]
#define o0 char
o0 __0__ o000o 20 oo0oo OO _____ 69 __ 78 __ 84 __ 69 __ 82 __
32 __ 78 __ 79 __ 58 ______ _
o0 __00__ o000o 30 oo0oo OO _____ 70 __ 65 __ 67 __ 84 __
79 __ 82 __ 73 __ 65 __ 76 __ 32 __ 73 __ 83 __ 58 ______ _
___________ ____________ ___ ____
_____
___________ O OO OOO __ OOOO __ OOOOO OO OOO _
________ ___ __0__ ____ _
_________ ___ ooo __ o0o OOOO ____ _
__________ ___ O oooo OOOO ____
_____
OOOOO OO OOOOO o00o O _
O OO O _______ OOO _
______
________ ___ __00__ ____ _
________ ___ ooo __ OOOOO ____ _

Again, it is nothing to be proud of nor is it too clever; it is just about abusing the C preprocessor and variable naming conventions and, maybe, required a bit of patience. Now can you guess what the program is doing? Figure 3 shows the output of the program ob2.c, which reads a number from the keyboard and finds its factorial. Notice that the command gcc -w ob2.c is used to compile the program ob2.c, so that all the warnings are ignored. Further, notice that the strings ENTER NO: and FACTORIAL IS: are printed using the numeric ASCII value of the characters. For example, the ASCII value 69 corresponds to the letter E.

Now consider the C program called kerala.c given below:

int main()
{
int Oo,O,o;
for(O=0, o=0; Oo="DEEPU-OSFY-=4z6{7y9v;u:v:v;t=s?r?qElFjGjHhGiB"
"nBoDkFjC64gHhHg FiFjIgJeIgGiFj DlDkD76c33C48cLc"
"Kd33Jd33Hf33Gf33Gf 43Eg43Fg35HcLcKeIhFjDkCmAnC"
"nBn?r=s>s<t9x6z3?"[O+++'d'-'Y'];)
while(Oo-- > 'n'-'<') putchar(++o == 'K'?(o=0,10):33^O&1);
return 0;
}

The program is definitely not self-explanatory. Figure 4 shows the output of the C program kerala.c. You may not be familiar with the map of Kerala, an Indian state. But a cursory search on the Internet will confirm that the ASCII art map drawn by the program is indeed that of Kerala. ASCII art is the graphic design technique that uses a computer for presentation and consists of images drawn with characters defined by ASCII. I was inspired by a similar program, which prints the ASCII art map of India. I really wanted to include that program in this article. But that program is so popular that it is used by a lot of people, and I was unable to identify the original source. I believe using that program without proper citations in this article is an injustice to the ingenious programmer who wrote it originally. But if you search the Internet with the phrase ‘C program to print the map of India’, you will get the program and some nice explanations. Now, coming back to our program kerala.c, how does it work?

Figure 5: Output of the modified program kerala2.c

Let us try to understand the working of the program. The line of code int Oo,O,o; declares three variables, Oo, O and o. The use of such variable names will confuse casual observers a lot. The core of the program is the string:

DEEPU-OSFY-=4z6{7y9v;u:v:v;t=s?qElFjGjHhGiBnBoDkFjC64gHhHgFiFjIgJeIgGiFjDlDkD76c33C48cLcKd33Jd33Hf33Gf33Gf43Eg43Fg35HcLcKeIhFjDkCmAnCnBn?r=s>s<t9x6z3?

The string is divided into a number of lines for readability. This string is an example of run length encoding, a lossless data compression technique. For example, the string AAAAAAAABBBB can be written as A8B4 in run length encoding because there are eight As and four Bs in the string. The characters in the string in our program tell the compiler to print spaces and exclamation marks alternatively. The characters of the string from the 0th position to the tenth position are not part of the image being drawn and hence ignored. The 11th character is =, whose ASCII value is 61. The program is written in such a way that 11 (61-50) spaces will be printed. The 12th character is 4, whose ASCII value is 52 and, hence, the program prints 2 (52-50) exclamation marks. The ASCII value of the character in the odd position specifies the number of spaces to be printed, and the character in the even position specifies the number of exclamation marks to be printed. The while loop in the program iterates through the string, and prints a newline character or a space or an exclamation mark with the line of code putchar(++o == ‘K’?(o=0,10):33^O&1)’. The variable O contains the index of the character in the string being processed. If O contains an odd number, O&1 becomes 1 and 33^O&1 becomes 32 (refer to the truth table for XOR operation), the ASCII value of space. On the other hand, if O contains an even number, O&1 becomes 0 and 33^O&1 becomes 33, the ASCII value of exclamation mark. Thus, spaces and exclamation marks are printed in an alternating fashion. Replace the code fragment putchar(++o == ‘K’?(o=0,10):33^O&1); with putchar(++o == ‘K’? (o=0,10):32^O&1); to obtain the program kerala2.c, whose output is shown in Figure 5. Can you explain the reason for the output of the program kerala2.c. If you haven’t got the answer please download the program kerala2.c from the link mentioned earlier. I have added the answer as a comment. What happens if the code fragment putchar(++o == ‘K’?(o=0,10):34^O&1); is used to replace the original code?

Yet another question that needs to be answered is, how did I get this string? Well, this is the part where my patience helped me. First, I drew the ASCII art map of Kerala by hand, referring to a map, and then wrote another program to count the number of lines, spaces and exclamation marks, and then produced the string used in the program kerala.c.
Now the next important question to be answered is: which programming language is suitable for code obfuscation? Here again the choice of language often depends on personal priorities. But very often, the preferred language for code obfuscation is Perl, which is famous for its useful one-liners and notorious for producing unreadable code. Code obfuscation competitions are regularly held for Perl and C programmers. The most famous code obfuscation competition called International Obfuscated C Code Contest (IOCCC) is held for C programmers. You can find the official website of IOCCC at http://www.de.ioccc.org/. I urge you to go through the winning entries of IOCCC. I was really fascinated and felt humbled by the amazing code produced by the winners.

Now it is time for us to wind up our discussion. So far, we have discussed code golfing and code obfuscation as the hobbies of probably just a few geeks. Such activities are very important because many popular programming contests have code golfing and code obfuscation rounds. If you participate in these contests regularly, you will have a tighter grasp over the minute features of programming languages. Moreover, code obfuscation makes reverse engineering a lot more difficult. I am sure you are aware that having the ability to write code that is difficult to reverse engineer is a skill that is highly valued in programming circles. So, I assure you, you will be aptly rewarded for the time spent on code golfing and code obfuscation competitions.

LEAVE A REPLY

Please enter your comment!
Please enter your name here