Friday 31 January 2014

Coding in Assembly

I thought I had a fairly good understanding of the very basics of assembly language, then I tried writing my own simple program and was proven wrong. The program I initially set out to write in x86_64 (and later also had to write in aarch64) was a simple for loop that iterates from 0 to 9 and prints 'Loop: n', where n is the loop's index. The output looks something like:


Loop: 0
Loop: 1
Loop: 2
...
Loop: 7
Loop: 8
Loop: 9

I was already given code for a program that prints 'Hello World' and code for a program with a loop that produces no output. Combining the two, I changed the output string from "Hello World\n" to "Loop:  \n", leaving two spaces after the colon to accommodate the loop index. Then, in each loop iteration, I copied the loop index from register %r15 into another unused register, %r14. Next I converted the contents of %r14 to an ASCII character by adding ASCII '0' to it. As the last step, I stored the low byte of that register into my output string at the position of the second space; in other words, I replaced the space with the digit I wanted to output.
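
As a rough illustration in C rather than assembly, the conversion and store described above could look like the sketch below. The function name put_digit and its parameters are hypothetical and not part of the original program; position 6 is the second space in "Loop:  \n".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: overwrite one character of msg with the ASCII
   digit for index. Mirrors "add '0', then store a single byte". */
void put_digit(char *msg, size_t pos, unsigned index)
{
    assert(index <= 9);              /* only a single digit fits */
    msg[pos] = (char)('0' + index);  /* 0..9 becomes '0'..'9' */
}
```

Calling put_digit(msg, 6, i) on a writable copy of "Loop:  \n" before each write of the string would give one line of the output for loop index i.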

After successfully building the program and getting my desired output, I had to modify it again to produce similar output, but now going from 0 to 30. In this case I had to add another space to the string, since the output would need two digits. Using the divide instruction with 10 as the divisor and the loop index as the dividend, I took the quotient as the first digit and the remainder as the second digit. I moved both values from the registers in which the division leaves the quotient and remainder into registers that are expected to be preserved, converted them to ASCII, and replaced the spaces in the output string with the resulting digits during each iteration. The output looked like:

Loop: 00
Loop: 01
Loop: 02
...
Loop: 28
Loop: 29
Loop: 30
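
The same idea in C, as a hedged sketch: the / and % operators stand in for the quotient and remainder that the divide instruction produces. The function name put_two_digits and the pos parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write index (0..99) as two ASCII digits
   into msg[pos] and msg[pos + 1]. */
void put_two_digits(char *msg, size_t pos, unsigned index)
{
    assert(index <= 99);
    unsigned quotient  = index / 10;   /* first digit  */
    unsigned remainder = index % 10;   /* second digit */
    msg[pos]     = (char)('0' + quotient);
    msg[pos + 1] = (char)('0' + remainder);
}
```

With a writable copy of "Loop:   \n" (three spaces after the colon) and pos set to 6, index 7 yields "Loop: 07" and index 30 yields "Loop: 30".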

I wasn't quite done there. For the last part, I had to remove the leading zeros from the output. To do that, I loaded a register with ASCII '0' before the loop, to be used for comparison inside it. During each iteration, after the division step, the code compares the register containing the quotient (converted to ASCII) with the register that holds '0'. If they are equal, it jumps over the instruction that stores the quotient byte into the string (in place of the first space) and continues the loop, leaving the first digit position blank and thus eliminating the leading zero from the output.
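
A C sketch of that compare-and-skip logic follows. It is an illustration under assumptions, not the assembly itself: the name put_index_no_leading_zero is hypothetical, and the if statement plays the role of the compare followed by a conditional jump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: like a two-digit store, but the tens digit is
   skipped when it is '0', so the space already in msg stays in place. */
void put_index_no_leading_zero(char *msg, size_t pos, unsigned index)
{
    assert(index <= 99);
    char tens = (char)('0' + index / 10);
    char ones = (char)('0' + index % 10);
    if (tens != '0') {        /* equal to '0': jump over the store */
        msg[pos] = tens;
    }
    msg[pos + 1] = ones;      /* the ones digit is always written */
}
```

This relies on the loop counting upward: once the tens digit becomes non-zero it never returns to zero, so no stale digit is left in the first position.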

After completing the task for x86_64, I was then instructed to write the same program in aarch64. It didn't turn out to be as simple as it sounded. Even though most lines of code were a mindless rewrite to satisfy aarch64 assembly syntax, certain operations didn't work the same way as they did in x86_64. The first issue was that the division instruction only gave me the quotient, meaning I had to calculate the remainder separately. To do that I had to use msub, which essentially subtracts the product of the quotient and the divisor (10) from the dividend (the loop index). Once that was settled, I quickly found that I couldn't move the byte into the string by using mov. After some looking around, I found that there is a separate instruction for that called strb, which stores a byte to memory (at an address I had to offset in order to store it in the right position in the string). The last problem I had was figuring out why I couldn't convert the register to ASCII. The solution was to use the 32-bit 'w' prefix for the register storing the value I wanted to convert.
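
The remainder calculation can be written out in C as a small sketch. The function name remainder_via_msub is hypothetical; the comments note which aarch64 step each line corresponds to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute dividend % divisor without a remainder
   operation, the way a divide followed by msub does it. */
uint64_t remainder_via_msub(uint64_t dividend, uint64_t divisor)
{
    assert(divisor != 0);
    uint64_t quotient = dividend / divisor;   /* division: quotient only */
    return dividend - quotient * divisor;     /* msub: a - (n * m)       */
}
```

For example, with a loop index of 27 and a divisor of 10, the quotient is 2 and the result is 27 - 2 * 10 = 7.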

The overall experience definitely came with a learning curve, which started as mild frustration at not understanding what was going on, moved on to a fairly humbling feeling, and concluded in newly acquired knowledge. While coding in assembly, you have to be prepared to change your logical view in order to achieve the results you are looking for. When it comes to debugging, both architectures gave only brief error messages during compilation, or sometimes compiled successfully but did not give the output I was looking for (or did not give any output at all). With that being said, when something went wrong, debugging the code wasn't a fun experience.

Programming in assembly for both architectures is fairly similar (at least at such a simple level). However, in this example aarch64 seemed a little less organized, and I had to include a few extra operations to achieve the same output as the program written for x86_64.

Code:
aarch64
x86_64

Tuesday 21 January 2014

Examining compiled C code

After writing a very simple C program, we compile it with a few different options and take a look at what the compiled code looks like, as well as compare the differences those options make.

C code looks like this:
 #include <stdio.h>  
 int main() {  
   printf("Hello World!\n");  
 }

Initially it is compiled by using GCC and options:
 -g                    # for debugging  
 -O0 -fno-builtin      # to make sure the code isn’t optimized

Then by using the objdump command and the option --source (or -d, which doesn’t include the original C code) we disassemble the compiled output, more specifically the sections containing the C code, and can examine how the source code has been converted into machine code.
We could also use the -f option to display the header information for the entire file.

Next we add to or remove from the initial set of GCC compiler options, as well as alter the source code, and have a look at how that affects the output:

1. Adding compiler option -static: 
The output this produces is much larger in size. This is due to the fact that all libraries used by the program are included within the output. The benefit of that is the programmer doesn’t have to worry whether the user will have the libraries installed or not, but the drawback, of course, is the significant size gain of the output.

2. Removing the -fno-builtin option, which was used to exclude any function optimizations: 
Looking at the disassembly, we immediately notice that the compiler has replaced the printf() function with a much simpler puts() function, thus optimizing the compiled code.

3. Removing the -g option, which is used for debugging purposes: 
The size of the compiled output shrinks. Section headers and disassembly contain significantly less information that would have been used for debugging.

4. Adding additional arguments to the printf() function: 
After adding additional arguments to the printf() function
 printf("Hello World! %d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d\n", 0,1,2,3,4,5,6,7,8,9,10);
and recompiling the program, the compiled code seems to run out of registers for the arguments and places the rest of the arguments on the stack.

5. Moving the printf() to a separate function (I called it void PrintTheF()) outside of main() and then calling that function from main(): 
Examining the disassembly, it looks like main() calls the new function PrintTheF(), which then moves the address of the string into a register and calls printf(). In essence, what was previously done in main() is now being done in the separate function PrintTheF() that contains printf().

6. Changing the -O0 option to -O3: 
After recompiling the code, there is a noticeable difference in <main>: with the -O3 option, the compiler has optimized the code, reducing it by 5 instructions.
EDIT: The option also removes protection from the stack, because the stack is simply not being used in this case.

Thursday 16 January 2014

Understanding open source communities and code review processes


(moving all my stuff from Wordpress to Blogger)

For a better understanding of how open source projects work, I took a look at a couple of freely licensed packages available for Linux. In general I had a look at how the code is made available and how to make your own contribution. Furthermore, to grasp how bug fixes work, I picked a solved issue from each project as an example.


rawtherapee
License: GPLv3
Raw Therapee is a raw image editing tool, packed with many useful features. The project has a forum with a sizable number of active users who seem to be contributing constantly. Their source code, as well as the bug tracker, is available on Google Code.
The one issue I looked at in particular was submitted Dec 2, 2013. It was due to a raw image from a Nikon D610 not being processed properly. The submitter appeared to be mostly a user, as they did not participate in the discussion. One of the project owners commented 3 days later, stating that raw images from that particular camera (in this case the Nikon D610) were not supported yet. However, a project contributor jumped into the conversation proposing a solution. After a few back-and-forths between the two, the solution was implemented that same day. The entire process was well organized and easy to follow.

qtractor
License: GPL
Essentially, Qtractor is an audio/MIDI multi-track sequencer. The project code can be found on SourceForge, where you can make contributions to the project, whether with enhancements or bug fixes.
The issue I observed involved an error with copying and renaming a clip: when pasting the renamed clip, the program didn't seem to register the name change. A solution followed the same day, but it created another slight problem, with the names appearing dim on some of the clips. The discussion was between the creator of the ticket and one of the contributors, who worked on fixing both issues. The exchange took place over the course of 8 days, at the end of which the ticket was closed and the bugs were fixed. In this case the entire process was also very organized and easy to understand.

In both examples I chose to look at, the issues were attended to almost immediately and the communication between the users was active and clear. The only drawback I found with the Raw Therapee bug tracking process is that there were no links to the changes being made during the discussion.