An application needs resources and services to complete a well-defined task, and these services are provided by the OS (let’s call it the “execution environment”). For example, the printf function is provided by the C runtime library, LIBC. This is a classic example of code reuse and of a modular approach to solving a problem.
To make an application secure, we have to ensure that:
1. The application logic/design, in itself, is secure.
2. The execution environment (we limit ourselves to LIBC here) is secure.
3. The interface between the application and LIBC (the execution environment that is providing services) is secure.
We already discussed (1) in the previous article, and I will discuss some further techniques here. We assume that (2) is secure enough. So, as mentioned in the “What Next” section of the previous article, (3), the application/LIBC interface, is the subject of this article. The focus is on GLIBC (GNU LIBC), although the concepts are generic. The tools used are: gdb v7.0-3.fc12, gcc v4.4.2 20091027, glibc v2.11.
Basic concepts
A C program relies on several functions like printf, strcpy, strcat, strcmp, scanf, etc. All these functions are defined in the C runtime library provided by the system (normally, in the /lib/ directory). The common convention is to name this library libc.so.X, where X is the version number. This file is a link to the actual library.
Let’s suppose that a call to a function F, which is defined in GLIBC, originates from the application and is intercepted and tampered with! The level of tampering can vary; e.g., a hacker can completely replace the GLIBC function with a custom implementation (using the library interposition feature), or can alter the flow of the application without providing a new function definition. The application should not be blamed here; it is the interface that has been manipulated.
We will also see how ltrace can be used for application tracing, and discuss how functions are defined in a gdb script.
Modifying hackMe.c: hackMe2.c
Let’s rewrite the earlier program hackMe.c, naming it hackMe2.c. The following snippet shows the new version.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

/* This block of defines obfuscates all the functions
   e.g., authenticate will appear as FUNC4 in compiled binary */
#define fake1 FUNC1
#define fake2 FUNC2
#define fake3 FUNC3
#define authenticate FUNC4
#define fake5 FUNC5
#define fake_str TEST

char fake_str[100];
int i;

#define DEFINE_FUNC(FNAME, RETTYPE, ARG, RETVAL, STRVAL) \
RETTYPE FNAME (ARG) \
{ \
    strcpy(fake_str, STRVAL); \
    i=strcmp(fake_str, "123"); \
    return (RETVAL); \
}

#define CALL_FUNC(FNAME, ARG) FNAME(ARG)

DEFINE_FUNC( fake1, int, int i, 1,"fake1_str")
DEFINE_FUNC( fake2, float, float i, 2, "fake2_str")
DEFINE_FUNC( fake3, long, long i, 3, "fake3_str")
DEFINE_FUNC( fake5, double, double i, 5, "fake5_str")

/* Returns 0 on success, and Returns 1 on failure */
int authenticate (char *test)
{
    CALL_FUNC(fake1,1);
    if(strcmp(test,"PASS") == 0 )
        return 0; /* success*/
    else
        return 1; /* fail */
}

/* USAGE: ./hackMe2 FAIL */
int main(int argc, char *argv[])
{
    int retVal = -1;
    if (argc< 2)
    {
        printf ("\n USAGE: %s <PASS|FAIL>", argv[0]);
        exit (-2);
    }
    /* skipping any checks on argv to keep it simple*/
    CALL_FUNC(fake2,2);
    retVal = authenticate(argv[1]);
    CALL_FUNC(fake3,3);
    if( retVal == 0)
        printf("\n Authenticated ... program continuing...\n");
    else
    {
        printf("\n Wrong Input, exiting...\n");
        CALL_FUNC(fake5,4);
        exit (retVal);
    }
    /*
    .
    . Rest of the program
    .
    .
    */
}
Compile and link the program (to produce a 32-bit binary); then strip the binary, as follows:
gcc hackMe2.c -o hackMe2
strip hackMe2
Fake code has been included (not in the best way, but good enough for demonstration purposes): the functions fake1, fake2, fake3 and fake5, and a variable fake_str. Two macros (DEFINE_FUNC and CALL_FUNC) are responsible for defining and calling these functions. Every fake function calls strcpy and strcmp. Overall, hackMe2.c has these modifications from hackMe.c:
- Obfuscation — Guessing a symbol name, in the application, is harder for a hacker.
- Fake code — A little harder for a hacker to find the code to be targeted.
- Stripped binary — The hacker can’t see any symbols inside the application; consequently, debugging is even more difficult. Note that this is the second obstacle level that the hacker faces — the first being the binary in release mode. But, as discussed in the previous article, this may invite debugging challenges in the field.
With these modifications, hackMe2.c is better than hackMe.c, if evaluated on a security basis (let’s not bother about the data structures book that taught us code size and runtime efficiency; that book didn’t tell us anything about hackers, by the way).
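To see what the block of defines actually does to a symbol name, here is a minimal sketch kept separate from hackMe2.c. The STR and STR_ macros and the helper compiled_name_of_authenticate are hypothetical names introduced only for this illustration; only the authenticate-to-FUNC4 rename is taken from the listing above.

```c
#include <string.h>

/* Same renaming trick as in hackMe2.c: the preprocessor rewrites every
   use of the identifier "authenticate" to FUNC4 before compilation. */
#define authenticate FUNC4

/* Two-level stringification (illustration only): the outer macro expands
   its argument first, the inner one turns the result into a string. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Returns 0 on success and 1 on failure, as in the article. */
int authenticate(const char *test)
{
    return strcmp(test, "PASS") == 0 ? 0 : 1;
}

/* Hypothetical helper: reports the identifier the compiler really sees. */
const char *compiled_name_of_authenticate(void)
{
    return STR(authenticate);
}
```

The helper returns the string "FUNC4", which is the name a tool such as nm would list for this function in an unstripped binary. The source stays readable for the developer, while the name in the binary says nothing about what the function does.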
Analysing the victim binary
Let’s use nm and see if anything can be found in hackMe2:
[raman@localhost article]$ nm hackMe2
nm: hackMe2: no symbols
[raman@localhost article]$
This is what we expect after stripping the binary. Note that even main is not known to nm. You can run the objdump command with the options -T and/or -R on hackMe2 to see which symbols hackMe2 needs from GLIBC.
Can we use gdb to debug hackMe2? The answer is no, not directly: because there is no visible symbol in the application, gdb can’t set a breakpoint on any of its functions by name.
Peeking inside GLIBC
Hackers haven’t had much luck with this application so far, so it’s time to look into the GLIBC functions that it calls. Every application calls them, and hackMe2 is no exception.
We need a tool that can look into the GLIBC calls; fortunately, ltrace is a tool that excels at this. ltrace, a library call tracer tool, is shipped with (almost) every Linux distribution. Here is the output of running hackMe2 under ltrace.
[raman@localhost article]$ ltrace ./hackMe2 WRONGPASSWORD
__libc_start_main(0x80485de, 2, 0xbfaf1054, 0x80486a0, 0x8048690 <unfinished ...>
memcpy(0x80499c0, "fake2_str", 10) = 0x80499c0
strcmp("fake2_str", "123") = 1
memcpy(0x80499c0, "fake1_str", 10) = 0x80499c0
strcmp("fake1_str", "123") = 1
strcmp("WRONGPASSWORD", "PASS") = 1
memcpy(0x80499c0, "fake3_str", 10) = 0x80499c0
strcmp("fake3_str", "123") = 1
puts("\n Wrong Input, exiting..."
 Wrong Input, exiting...
) = 26
memcpy(0x80499c0, "fake5_str", 10) = 0x80499c0
strcmp("fake5_str", "123") = 1
exit(1 <unfinished ...>
+++ exited (status 1) +++
[raman@localhost article]$
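Looking at the output, we find there are several library calls to functions defined in GLIBC. (The strcpy calls from the fake functions show up as memcpy, presumably because the compiler replaced copies of constant strings with memcpy.) However, we are interested only in calls that refer to the input criteria (i.e., the password string WRONGPASSWORD). There is only one such call: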
strcmp("WRONGPASSWORD", "PASS") =1 (#T)
Here, 1 is the return value of strcmp; any non-zero value means that the two strings differ, which the application treats as a failure.
By this time, you may have realised that a hacker will try to apply the modify-function-return-value hack, discussed in the previous article, to the strcmp function defined in GLIBC! This is an important point: applying this hack to a function defined in an application is different from applying it to a function defined in GLIBC. That’s because the application is under the control of the application developer, but GLIBC is not. We (the developers) have not left any hole in the application as far as this hack is concerned, but we can’t stop hackers from applying it to GLIBC functions!
Hacking ‘strcmp’ in GLIBC
The hacker’s goal is clear: override strcmp and return 0 irrespective of the arguments passed, but only for the call #T. Other calls to strcmp should be left unchanged. So, here is the gdb script (arg_strcmp.gdb) that does the magic. It runs the victim application with a wrong password, and then forcibly applies the modify-function-return-value hack to the desired strcmp call.
# Raman Deep: rd.golinux@gmail.com
file ./hackMe2
################################# DEFINITIONS:START
set var $_isEq=0
# Yes! GDB_STRCMP, below, is a gdb function.
# Function that provides strcmp-like functionality for gdb script;
# this function will be used to match the password string provided in command line argument
# with the string argument of strcmp in program
define GDB_STRCMP
    set var $_i=0
    set var $_c1= *(unsigned char *) ($arg0 + $_i)
    set var $_c2= *(unsigned char *) ($arg1 + $_i)
    while ( ($_c1 != 0x0) && ($_c2 != 0x0) && ($_c1 == $_c2) )
        #printf "\n i=%d, addr1=%x(%d,%c), addr2=%x(%d,%c)", $_i, ($arg0 + $_i),$_c1, $_c1, ($arg1 + $_i), $_c2,$_c2
        set $_i++
        set $_c1= *(unsigned char *) ($arg0 + $_i)
        set $_c2= *(unsigned char *) ($arg1 + $_i)
    #while end
    end
    if( $_c1 == $_c2)
        set $_isEq=1
    else
        set $_isEq=0
    end
#GDB_STRCMP end
end
################################# DEFINITIONS:ENDS
br __libc_start_main
r WRONGPASSWORD
br strcmp
c
while 1
    up
    set var $argOne=*(int)($esp)
    set var $argTwo=*(int)($esp+4)
    set var $_myStr="WRONGPASSWORD"
    printf "\n strcmp((0x%x)\"%s\" , (0x%x)\"%s\") \n\n",$argOne, $argOne, $argTwo, $argTwo
    GDB_STRCMP $argOne $_myStr
    if ( $_isEq == 1)
        printf "\n\t--> THIS IS OF MY INTEREST -> I AM GOING TO MAKE IT PASS <--\n"
        stepi
        step
        printf "\n\t--> EAX=%d before HACK!!, setting this to 0 <--\n", $eax
        set $eax=0
        printf "\n\t--> Set...EAX=%d <--\n", $eax
        set $_isEq=0
    end
    c
#while end
end
This script defines a function named GDB_STRCMP, which takes two arguments (each being the address of the start of a C-style string); GDB_STRCMP sets a variable $_isEq to 1 or 0, depending on whether the two strings match or differ, respectively. A breakpoint is set on strcmp; every time this breakpoint is hit, the first argument of strcmp is compared with our input criteria (WRONGPASSWORD). If there is a match, we have found #T, so the script lets strcmp run to completion and then changes its return value (held in the EAX register) to 0, which means success. Every strcmp call is printed along with the address and contents of each argument. The following session shows the output of running hackMe2 in gdb using the arg_strcmp.gdb script. One unusual thing: the stack trace shows strcmp being called from exit. This is wrong; it happens because the binary is stripped, so gdb guesses the name of the calling function incorrectly.
[raman@localhost article]$ gdb -x ../gdbScripts/arg_strcmp.gdb -quiet -batch
Breakpoint 1 at 0x8048364
Breakpoint 1, 0x009faad6 in __libc_start_main () from /lib/libc.so.6
Breakpoint 2 at 0xa5a200
Breakpoint 2, 0x00a5a200 in strcmp () from /lib/libc.so.6
#1 0x080484fc in exit ()
 strcmp((0x80499c0)"fake2_str" , (0x8048762)"123")
Breakpoint 2, 0x00a5a200 in strcmp () from /lib/libc.so.6
#1 0x080484bb in exit ()
 strcmp((0x80499c0)"fake1_str" , (0x8048762)"123")
Breakpoint 2, 0x00a5a200 in strcmp () from /lib/libc.so.6
#1 0x080485cc in exit ()
 strcmp((0xbffff566)"WRONGPASSWORD" , (0x8048784)"PASS")
        --> THIS IS OF MY INTEREST -> I AM GOING TO MAKE IT PASS <--
0x00a5a204 in strcmp () from /lib/libc.so.6
Single stepping until exit from function strcmp,
which has no line number information.
0x080485cc in exit ()
        --> EAX=1 before HACK!!, setting this to 0 <--
        --> Set...EAX=0 <--
Breakpoint 2, 0x00a5a200 in strcmp () from /lib/libc.so.6
#1 0x08048549 in exit ()
 strcmp((0x80499c0)"fake3_str" , (0x8048762)"123")
 Authenticated ...program continuing...
Program exited with code 051.
/home/raman/gdbScripts/arg_strcmp.gdb:59: Error in sourced command file:
No stack.
[raman@localhost article]$
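For readers more comfortable with C than with gdb’s scripting language, the comparison performed by GDB_STRCMP can be sketched in C roughly as follows. The function name gdb_strcmp_is_equal is made up for this illustration, and the script itself sets $_isEq rather than returning a value; otherwise the loop mirrors the script step by step.

```c
/* C rendering of the GDB_STRCMP loop (illustration only).
   Returns 1 when the two NUL-terminated strings are identical and
   0 otherwise, which is the value the script stores in $_isEq. */
int gdb_strcmp_is_equal(const char *arg0, const char *arg1)
{
    unsigned int i = 0;
    unsigned char c1 = (unsigned char)arg0[i];
    unsigned char c2 = (unsigned char)arg1[i];

    /* Walk both strings while neither has ended and the bytes agree. */
    while (c1 != 0 && c2 != 0 && c1 == c2) {
        i++;
        c1 = (unsigned char)arg0[i];
        c2 = (unsigned char)arg1[i];
    }

    /* The strings match only if both stopped on the terminating NUL. */
    return c1 == c2;
}
```

Note that this is an equality test, not a full strcmp: it does not say which string sorts first, because the script only needs to recognise the one call whose first argument is the password typed on the command line.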
Countermeasures
One solution to this problem is to inline the calls to strcmp. A compiler may do this implicitly, but that is compiler- and system-dependent. So, it’s better to achieve the same effect by explicitly avoiding the call to strcmp. This can be done by defining a custom function, say my_strcmp (which should itself be obfuscated), with the same functionality as GLIBC’s strcmp. However, my_strcmp would not call any function in GLIBC; it would implement its own logic using plain C statements. The authenticate function (or any other sensitive function) would then call my_strcmp instead of strcmp.
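A minimal sketch of this countermeasure might look like the following. It is one possible implementation, not the only one: the obfuscated name FUNC6 is an assumption made for this example, and the fake-function calls from hackMe2.c are left out to keep it short.

```c
/* Obfuscate the names in the same style as hackMe2.c.
   FUNC6 is a hypothetical name chosen for this sketch. */
#define my_strcmp FUNC6
#define authenticate FUNC4

/* strcmp-like comparison written in plain C with no library calls.
   Returns 0 if the strings are equal, a negative value if s1 sorts
   before s2, and a positive value otherwise. */
static int my_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while s1 has not ended and the bytes are equal. */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    return (int)*p1 - (int)*p2;
}

/* Returns 0 on success and 1 on failure, as in hackMe2.c, but
   without going through the application/GLIBC interface. */
int authenticate(const char *test)
{
    return my_strcmp(test, "PASS") == 0 ? 0 : 1;
}
```

Because my_strcmp is static and lives inside the application, a breakpoint on GLIBC’s strcmp should no longer see the password comparison. It is worth re-running ltrace on the rebuilt binary to confirm that the password no longer appears in any library call.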
We have now learned how gdb can be used to play with functions defined in GLIBC to alter the behaviour of an application. This article does not address all the security concerns that an application has to take care of in the real world, but it should increase awareness among software developers.
What next…
I hope readers find this helpful; I have found the techniques discussed very useful in application debugging. I have tried to explain, in detail, everything used in this article. However, I believe gdb scripts are complex beasts, so I will probably try to write an article on gdb scripts.
References
- man gdb
- man nm
- man ltrace
- man objdump
- My previous article.