Tuesday 21 January 2014

Examining compiled C code

After writing a very simple C program, we compile it with a few different options, look at the compiled code, and compare the differences those options make.

C code looks like this:
 #include <stdio.h>  
 int main() {  
   printf("Hello World!\n");  
 }

Initially it is compiled with GCC using these options:
 -g                    # for debugging  
 -O0 -fno-builtin      # to make sure the code isn’t optimized

Then, using the objdump command with the --source option (or -d, which doesn’t include the original C code), we disassemble the compiled output, more specifically the sections containing the compiled code, and can examine how the source code has been converted into machine code.
We could also use the -f option to display the overall header information for the entire file.

Next we add to or remove from the initial set of GCC compiler options, as well as alter the source code, and look at how that affects the output:

1. Adding compiler option -static: 
The output this produces is much larger. This is because the code of all libraries used by the program is linked into the executable itself instead of being loaded at run time. The benefit is that the programmer doesn’t have to worry about whether the user has the libraries installed; the drawback, of course, is the significant increase in the size of the output.

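2. Removing the -fno-builtin option, which was used to stop GCC from optimizing calls to built-in functions such as printf(): 
Looking at the disassembly, we immediately notice that the compiler has replaced the printf() call with a call to the much simpler puts() function, thus optimizing the compiled code. This is possible because the format string contains no format specifiers and ends with a newline, which puts() appends on its own.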

3. Removing the -g option, which is used for debugging purposes: 
The size of the compiled output shrinks. The section headers and disassembly contain significantly less information, because the data that would have been used for debugging is no longer included.

4. Adding additional arguments to the printf() function: 
After adding additional arguments to the printf() function
 printf("Hello World! %d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d\n", 0,1,2,3,4,5,6,7,8,9,10);
and recompiling the program, the compiled code runs out of registers for the arguments and stores the values of the rest of the arguments on the stack.

5. Moving the printf() call to a separate function (I called it void PrintTheF()) outside of main() and then calling that function from main(): 
Examining the disassembly, it looks like main() calls the new function PrintTheF(), which then moves the address of the string into a register and calls printf(). In essence, what was previously done in main() is now done in the separate function PrintTheF(), which contains the printf() call.

6. Changing the -O0 option to -O3: 
After recompiling the code, there is a noticeable difference in <main>: with the -O3 option, the compiler has optimized the code, reducing it by 5 instructions.
EDIT: The option also removes protection from the stack, because the stack is simply not being used in this case.

2 comments:

  1. The program is pretty simple, so there's not much to optimize. What was the compiler able to remove with -O3 and why?

    1. Sorry for the delayed response, Chris. I've updated the post based on today's review part of the lecture, just for future reference.
