Brewing Thoughts: 2014

Wednesday 16 April 2014

Project Updates (Final)

Well, with the little time that I had, I dabbled with the Eina package, trying to get it to build for AArch64, but I can't say I have had any luck with it. There doesn't seem to be any AArch64 support for it, and I did not have enough time to get to the bottom of why that is.

Seeing how this is my last post, for this course at least, I must leave it with a conclusion and my overall thoughts. Well, it's been a ride, for sure. It seemed like a surprisingly long 4 months, but I definitely feel like I've come quite a long way. Having started out knowing nothing about assembly and very little about architectures, I must say this course has left me with a solid foundation in those topics that I can easily build on. And even though I haven't put together any patches for the open source projects I was looking at, I've learned a lot about how open source communities work and what their contribution procedures look like. Last but not least, as a previously very minimal Linux user, this course has forced me to quickly adapt and learn my way around the OS, and taught me to appreciate it a whole lot more. Looks like I'll be leaving it installed on my PC (as a dual boot).

If anyone at Seneca is looking to get their feet wet in open source contribution and learn their way around the process itself, I must say, SPO600 is definitely the way to go. Best part about the course, of course, is the lack of an exam, ha-ha.

As I said, I will be taking a lot of useful knowledge away with me from this semester and definitely looking at expanding on that knowledge in the near future!

Tuesday 15 April 2014

Projects Update #4

GCC-XML
Let's get this one out of the way first. As a fellow by the name of David commented on my previous post (Projects Update #2), GCC-XML uses a 7-year-old version of the GCC compiler, which anyone might guess does not necessarily work with ARMv8 architecture.
With that being said, I will leave this project as is, since in order to get it to work with AArch64, I would have to backport the aforementioned 7-year-old GCC compiler. I could only wish for the ability to learn enough about compilers and port it over in just a few days.

Qlandkarte GT
Well, there is some good news. After being stumped by trying to build qtwebkit on AArch64, I am still stuck at trying to build qtwebkit on AArch64, but with some progress. After trying to get the source rpm by cloning from the Fedora repository, switching between different Fedora branches and using the

fedpkg srpm

command, then attempting to build the source rpm by using

rpmbuild --rebuild pkgname.src.rpm

I would get multiple errors usually involving the build instructions that would cascade deeper and deeper, stating that aarch64 is not supported, and instructions to build on aarch64 do not exist.
Then I tried getting the source rpm from the Fedora ARM repository and rebuilding it into an RPM file. Even though those should be the same repository, I was getting different errors, but the message they conveyed was very familiar. Still "no dice".
So I thought I would try a slightly different approach. This time, after cloning qtwebkit from the repository, I used the

fedpkg prep

command and got the source code with CMake files to build it. After making a separate build directory and running the CMake command on the source directory, I received another plethora of errors telling me about the lack of aarch64 support. So I went looking to see if anyone had made changes to the CMake instructions to include aarch64, and after some time I came across a patch. With nothing to lose, I went ahead and applied it.
Before running CMake again, I had to change the following line in CMakeLists.txt

-- set(PORT "NOPORT" CACHE STRING "choose which WebKit port to build (one of ${ALL_PORTS})")

++ set(PORT Efl)

Now that I had those changes saved, it was time to try and run CMake again, and voilà (sort of)... a different error message, but this one looking more promising:

CMake Error at /usr/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:97 (message):
  Could NOT find Eina: Found unsuitable version ".", but required is at least
  "1.7" (found Eina.h_INCLUDE_DIR-NOTFOUND;eina_main.h_INCLUDE_DIR-NOTFOUND)

Looks like a bit of progress, with another dependency missing. Next step is to see if the package Eina builds and works properly on AArch64, then we can take another step forward. I will work on doing just that tonight and post my results tomorrow, along with my final submission for the SPO600 course.

Cheers!

Sunday 6 April 2014

Projects Update #3

Finally got around to posting one more update after a few days of trying to figure out my last problem with the projects.

Qlandkarte GT
As stated previously, I have been completely blocked by not being able to build qtwebkit on aarch64. Unfortunately I can't say there has been very much progress in this area, and it has almost turned into a project in and of itself. I'm afraid I am still battling the same wall of errors I was getting last time, specifically with the MacroAssembler that is generated at build time of the package. At the moment I cannot post much more, as I am still looking at the MacroAssembler.h file that was produced in the /root/rpmbuild/BUILD/webkit-qtwebkit-23/Source/JavaScriptCore/assembler/ directory. I hope I can take a step in the right direction very soon.

GCC-XML
As I dug deeper into my problem while building the project on aarch64 using cmake, I discovered a plethora of files specific to each individual architecture, one of which is ARM. ARM being the closest they have to aarch64, I had a glance there and found that there is specific code for various ARM cores, as well as code for floating-point arithmetic and much more... Oh boy. If I get a break from qtwebkit, I will try to just blatantly change all arm to aarch64 and see if that will fly. Otherwise, I might be neck deep in trying to figure out how to adapt this for aarch64.

Stay tuned.

Monday 24 March 2014

Projects Update #2

Qlandkarte GT
After getting past the minor issue of having no access to the internet on QEMU, I have managed to get most of the dependencies, including manually building GDAL for AArch64. However, there is one that keeps giving me issues, and it's QtWebKit. The only binary package of QtWebKit for AArch64 available at the moment is for Fedora 21 (Rawhide), which would probably have been fine if there weren't a conflict of libpng versions between the new QtWebKit and the slightly older QT4-devel. QT4 uses libpng15, whereas QtWebKit (F21) uses libpng16, and they can't seem to agree. I have been offered a good solution for that issue, which was to try and build QtWebKit for AArch64 from source, using a fedpkg branch of an older Fedora release, since the lib dependencies shouldn't make a difference, and this way they would be compatible. Unfortunately I haven't been successful in rebuilding from *.src.rpm to a binary rpm (getting this wall of errors), and this is once again where I am stuck on trying to get it to build on AArch64.

I have also contacted upstream about testing and benchmarking the program on x86_64. I was told by project administrator Oliver Eichler that "as QLandkarte is an event driven application with no permanent need to compute data there is no general benchmark system to supervise performance", but he also told me that the area that always needs optimization is the rendering of the map, especially with a very large amount of way points or trackdata. I have yet to take a look into that, and hopefully I'm not in over my head on this part. So even if I can't build it on AArch64, I can at least try to make some adjustments that would benefit the map rendering performance of the application.

GCC-XML
After looking at Traverso, I got a feeling that I might have bitten off more than I could chew and decided to change direction for my second project. I chose a command line tool, GCC-XML, which produces an XML description of a C++ program from the GCC compiler's internal representation. This eases the task of other development tools that work with C++ programs by sparing them the need for their own C++ parser.

So far attempts at building it for AArch64 haven't been successful, but I am in the process of figuring out the details. Figuring out which area to focus my efforts on for optimization is to follow shortly after.

Thursday 20 March 2014

Main Project (cont'd)

I have contacted Qlandkarte GT upstream about AArch64 porting and was told that since that package is strictly built in C and C++, it should compile just fine, as long as all the dependencies are compatible and present. So no major porting effort is needed. However I am still curious if I can build it on AArch64, so I will at least continue with that effort.

Going forward with my attempt to build Qlandkarte GT on AArch64, I ran into some issues trying to install the dependencies. After trying to yum install qt4-devel I got a list of all the dependencies that would be installed and a prompt asking me to proceed, so I did, only to get an error about there being no download mirrors for those dependencies. I was advised to yum clean all, but that only gave me an error about not being able to connect to the repositories...

Well after receiving another hint to look at my resolver, I did realize I had the wrong nameserver, which I promptly changed to Google's 8.8.8.8 and finally got the ball rolling. This made me realize that yum's cached data tricked me into thinking I had an Internet connection by listing the files and asking me to install.

With that resolved I have managed to install most of the dependencies except for GDAL, which I have to try and build manually. This is where I have left off. I'll post an update on how that goes.

Monday 10 March 2014

Main Project

For the main part of this course, we were to choose two open source projects that haven't yet been ported to AArch64 and do so. The two projects I chose are Qlandkarte GT and Traverso.

Qlandkarte GT

The great majority of the work I have done so far involves this project. While searching for assembly code within the source files, I only found a few lines of bit shift operations, which had a C fallback. Building the package was a lot more time consuming than anticipated: I acquired all the dependencies and proceeded to auto-configure the make file by using CMake, as instructed. However, I ran into one issue where the error message stated that I was missing one of the dependencies. This turned out to be an extension to one of the major dependencies (the ones mentioned in the installation instructions). After reaching out to the community I was told I needed a development version of that extension, and with that in mind I set off searching for it. I spent countless hours installing and reinstalling different versions of that extension, but still kept getting the same message. Eventually I got it to configure and create the make file, and built the package.

Before I begin to create a road map for how the project should be ported, I want to set up an emulated AArch64 environment and try to build the package as is on it. The project has a fair amount of dependencies, and I worry that some of them might not be compatible with the ARMv8 architecture, which would open up a can of worms. I'll post an update on how that goes in the very near future, as soon as the AArch64 environment is set up.

Traverso

Unfortunately I have only had a very quick look for assembly in this project, and the search turned up quite a few lines of it, including atomics. I will also update on this project as soon as I have a closer look at the source files.

Wednesday 5 February 2014

Use of Assembly in packages

Today we're looking at two packages, lua and samba, and determining whether assembly has been used in them, and if so, what is its purpose.

Samba
As stated on their website "Samba is the standard Windows interoperability suite of programs for Linux and Unix.", which essentially allows seamless network communications between machines with different operating systems (Windows, Unix and Linux in particular).

While searching for assembly code, I found quite a bit in the file called inffas86.c, located in /lib/zlib/contrib/inflate86/. The assembly is architecture specific and is separated by ifdefs, and each architecture specific chunk of code is about 300 lines. It loops, decoding codes and writing them out as literal and match bytes. Here is a brief explanation, found in the file, and a snippet of that code:

/*
   Decode literal, length, and distance codes and write out the
   resulting literal and match bytes until either not enough input
   or output is available, an end-of-block is encountered, or a data
   error is encountered. When large enough input and output buffers
   are supplied to inflate(), for example, a 16K input buffer and a
   64K output buffer, more than 95% of the inflate execution time is
   spent in this routine.
*/

#if defined( __GNUC__ ) && defined( __amd64__ ) && ! defined( __i386 )
    __asm__ __volatile__ (
"        leaq    %0, %%rax\n"
"        movq    %%rbp, 8(%%rax)\n"       /* save regs rbp and rsp */
"        movq    %%rsp, (%%rax)\n"
"        movq    %%rax, %%rsp\n"          /* make rsp point to &ar */
"        movq    16(%%rsp), %%rsi\n"      /* rsi  = in */
"        movq    32(%%rsp), %%rdi\n"      /* rdi  = out */
"        movq    24(%%rsp), %%r9\n"       /* r9   = last */
"        movq    48(%%rsp), %%r10\n"      /* r10  = end */
"        movq    64(%%rsp), %%rbp\n"      /* rbp  = lcode */
"        movq    72(%%rsp), %%r11\n"      /* r11  = dcode */
"        movq    80(%%rsp), %%rdx\n"      /* rdx  = hold */
"        movl    88(%%rsp), %%ebx\n"      /* ebx  = bits */
"        movl    100(%%rsp), %%r12d\n"    /* r12d = lmask */
"        movl    104(%%rsp), %%r13d\n"    /* r13d = dmask */
                                          /* r14d = len */
                                          /* r15d = dist */
"        cld\n"
"        cmpq    %%rdi, %%r10\n"
"        je      .L_one_time\n"           /* if only one decode left */
"        cmpq    %%rsi, %%r9\n"
"        je      .L_one_time\n"
"        jmp     .L_do_loop\n"

".L_one_time:\n"
"        movq    %%r12, %%r8\n"           /* r8 = lmask */
"        cmpb    $32, %%bl\n"
"        ja      .L_get_length_code_one_time\n"

"        lodsl\n"                         /* eax = *(uint *)in++ */
"        movb    %%bl, %%cl\n"            /* cl = bits, needs it for shifting */
"        addb    $32, %%bl\n"             /* bits += 32 */
"        shlq    %%cl, %%rax\n"
"        orq     %%rax, %%rdx\n"          /* hold |= *((uint *)in)++ << bits */
"        jmp     .L_get_length_code_one_time\n"

No fall-backs have been provided for this file and if the architecture is not supported, an error is thrown:

#else
#error "x86 architecture not defined"

Another file that contains assembly is byteorder.h, located in /lib/util/. In here we see assembly defined specifically for PowerPC that uses load/store instructions for a short or int conversion.

#if (defined(__powerpc__) && defined(__GNUC__))
static __inline__ uint16_t ld_le16(const uint16_t *addr)
{
 uint16_t val;
 __asm__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (addr), "m" (*addr));
 return val;
}

static __inline__ void st_le16(uint16_t *addr, const uint16_t val)
{
 __asm__ ("sthbrx %1,0,%2" : "=m" (*addr) : "r" (val), "r" (addr));
}

Lua
Lua is a lightweight, powerful and embeddable scripting language used in a vast variety of applications and several well known games.

A quick search of the Lua files for anything that resembles assembly returns just one block of code in a single file. On a closer look, its purpose is very similar to what we saw in byteorder.h from Samba. In this case it is Microsoft specific and is used for an integer conversion. For any other compiler or architecture a line of C code is used instead. Here is what the assembly looks like:

#if defined(MS_ASMTRICK) || defined(LUA_MSASMTRICK) /* { */
/* trick with Microsoft assembler for X86 */

#define lua_number2int(i,n)  __asm {__asm fld n   __asm fistp i}
#define lua_number2integer(i,n)  lua_number2int(i, n)
#define lua_number2unsigned(i,n)  \
  {__int64 l; __asm {__asm fld n   __asm fistp l} i = (unsigned int)l;}

Friday 31 January 2014

Coding in Assembly

I thought I had a fairly good understanding of the very basics of the assembly language, then I tried writing my own simple program and was proven wrong. The program I initially set off writing in x86_64 (which later I had to also write in aarch64) consisted of writing a simple for loop that would iterate from 0 to 9 and display an output that consists of 'Loop: n' where n is a number corresponding to the loop's index. Looks something like:

Loop: 0
Loop: 1
Loop: 2
...
Loop: 7
Loop: 8
Loop: 9

I was already given code for a program that would print out 'Hello World' and code for a program with a loop that didn't have any output. Combining the two, I changed the output string from "Hello World\n" to "Loop: \n", leaving two spaces after the colon to accommodate the loop index integer. Then in the loop iteration I took the register %r15, in which the loop index was stored, and moved it into another unused register, %r14. After that I converted the contents of %r14 to an ascii character by adding an ascii '0' to it. For the last step I stored the byte of the register containing the ascii value of the loop index into my output string in the position where I left the second space, so in other words I replaced the space with the number I wanted to output.

After successfully compiling the program and running it with my desired output, I had to modify it yet again to produce a similar output, but now going from 0 to 30. In this case I had to modify the string to include another space, since the output would have to be in double digits. Using the divide instruction, in which the divisor was 10 and the dividend was the loop index, I took the quotient as the first digit and the remainder as the second digit. I moved them from the registers that are designed to hold the quotient and remainder into registers that are expected to be saved, converted them to ascii and replaced the spaces in the output string with the resulting digits during each iteration. The output looked like:

Loop: 00
Loop: 01
Loop: 02
...
Loop: 28
Loop: 29
Loop: 30

I wasn't quite done just there. For the last part, I had to remove the leading zeros from the output. To do that I loaded a register with an ascii '0' at the start, to be used for comparison in the loop. During each iteration, after the division step, the code would compare the register containing the quotient (converted to ascii) to the register that stores '0', and if they were equal, it would jump over the operation that stores the byte of the quotient into the string (in place of the first space), continuing the loop and leaving the first digit in the string blank, thus eliminating the leading zero in the output.

After completing the task for x86_64, I was then instructed to write the same program in aarch64. It didn't turn out to be as simple as it sounded, and even though most lines of code were a mindless rewrite to satisfy aarch64 assembly syntax, certain operations didn't work the same way as they did in x86_64. The first issue was that the division operation would only give me the quotient, meaning I would have to calculate the remainder separately. To do that I had to use msub, which essentially subtracts the product of the quotient and the divisor (10) from the dividend (loop index). Once that was settled, I quickly found that I couldn't move the byte into a string by using mov. After some looking around, I found that there is a separate operation for that called strb, which stores the byte at a memory address (which I have to offset in order to store it in the right position of the string). The last problem I had was trying to figure out why I couldn't convert the register to ascii. The solution was to use the 32-bit 'w' name for the register storing the value I wanted to convert.

The overall experience definitely came with a learning curve, which started as mild frustration at not understanding what was going on, turned into a fairly humbling feeling and concluded in newly acquired knowledge. While coding assembly, you have to be prepared to change your logical view in order to achieve the results you are looking for. When it comes to debugging, both architectures provided only brief error messages during compilation, or sometimes compiled successfully but did not give the output I was looking for (or did not give any output at all). With that being said, when something went wrong, debugging the code wasn't a fun experience.

Programming in assembly for both architectures is fairly similar (at least at such a simple level). However in this example aarch64 seemed a little less organized and I had to include a few extra operations to achieve the same output as the program written for x86_64.

Code:
aarch64
x86_64

Tuesday 21 January 2014

Examining compiled C code

After writing a very simple C program, we compile it with a few different options and take a look at what the compiled code looks like, as well as compare the differences those options make.

C code looks like this:

 #include <stdio.h>  
 int main() {  
   printf("Hello World!\n");  
 }

Initially it is compiled by using GCC and options:

 -g                    # for debugging  
 -O0 -fno-builtin      # to make sure the code isn’t optimized

Then by using the objdump command and the option --source (or -d, which doesn’t include the original C code) we disassemble the compiled output, more specifically the sections containing the C code, and can examine how the source code has been converted into machine code.
We could also use the -f option to display the header information for the entire file.

Next we add to or remove from the initial set of options for the GCC compiler, as well as alter the source code, and look at how that affects the output:

1. Adding compiler option -static:
The output this produces is much larger in size. This is due to the fact that all libraries used by the program are included within the output. The benefit of that is the programmer doesn’t have to worry whether the user will have the libraries installed or not, but the drawback, of course, is the significant size gain of the output.

2. Removing the -fno-builtin option, which was used to exclude any function optimizations:
Looking at the disassembly, we immediately notice that the compiler has replaced the printf() function with a much simpler puts() function, thus optimizing the compiled code.

3. Removing the -g option, which is used for debugging purposes:
The size of the compiled output shrinks. Section headers and disassembly contain significantly less information that would have been used for debugging.

4. Adding additional arguments to the printf() function:
After adding additional arguments to the printf() function

 printf("Hello World! %d,%d,%d,%d,%d,%d,%d,%d,%d,%d,%d\n", 0,1,2,3,4,5,6,7,8,9,10);

and recompiling the program, the compiled code seems to run out of registers for the arguments and places the rest of the arguments on the stack.

5. Moving the printf() to a separate function (I called it void PrintTheF()) outside of main() and then calling that function from main():
Examining the disassembly it looks like the main() calls the new function PrintTheF(), which then moves the address of the string into the register and then calls printf(). In essence, what was previously done in main(), is now being done in the separate function PrintTheF() that contains printf().

6. Changing the -O0 option to -O3:
After recompiling the code, there is a noticeable difference in <main>: with the option -O3, the compiler has optimized the code, reducing it by 5 operations.
EDIT: The option also removes protection from the stack, because the stack is simply not being used in this case.

Thursday 16 January 2014

Understanding open source communities and code review processes

(moving all my stuff from Wordpress to Blogger)

For a better understanding how open source projects work, I took a look at a couple of free licensed packages available for Linux. In general I had a look at how the code is made available and how to make your own contribution. Furthermore, to grasp how bug fixes work, I picked a solved issue from each project as an example.

rawtherapee

http://rawtherapee.com/

License: GPLv3

Raw Therapee is a raw image editing tool, packed with many useful features. The project has a forum with a sizable number of active users who seem to be contributing constantly. Their source code, as well as the bug tracker, is available on Google Code.

The one issue I looked at in particular was submitted on Dec 2, 2013. It was about a raw image from a Nikon D610 not being processed properly. The submitter appeared to be mostly a user, as they did not participate in the discussion. One of the project owners commented 3 days later, stating that raw images from that particular camera were not supported yet. However, a project contributor jumped into the conversation proposing a solution. After some back and forth between the two, the solution was implemented that same day. The entire process was well organized and easy to follow.

qtractor

http://qtractor.sourceforge.net/qtractor-index.html

License: GPL

Essentially Qtractor is an audio/MIDI multi-track sequencer. The project code can be found on SourceForge, where you can make contributions to the project, whether it’s with enhancements or bug fixes.

The issue I observed involved an error with copying and renaming a clip: when pasting the renamed clip, the program didn't seem to register the name change. A solution followed the same day, but created another slight problem with the names appearing dim on some of the clips. The discussion was between the creator of the ticket and one of the contributors who worked on fixing both issues. The exchange took place over the course of 8 days, at the end of which the ticket was closed and the bugs were fixed. In this case the entire process was also very organized and easy to understand.

In the two examples I chose to look at, both issues were attended to almost immediately and the communication between the users was active and clear. The only drawback I found with Raw Therapee's bug tracking process is that there were no links to the changes being made during the discussion.