Samba
As stated on their website, "Samba is the standard Windows interoperability suite of programs for Linux and Unix." In essence, it allows seamless network communication between machines running different operating systems (Windows, Unix and Linux in particular).
While searching for assembly code, I found quite a bit in the file inffas86.c, located in /lib/zlib/contrib/inflate86/. The assembly is architecture-specific and separated by ifdefs; each architecture-specific chunk of code is about 300 lines. The code loops, decoding literal, length and distance codes and writing out the resulting bytes. Here is a brief explanation, found in the file, and a snippet of that code:
/*
Decode literal, length, and distance codes and write out the
resulting literal and match bytes until either not enough input
or output is available, an end-of-block is encountered, or a data
error is encountered. When large enough input and output buffers
are supplied to inflate(), for example, a 16K input buffer and a
64K output buffer, more than 95% of the inflate execution time is
spent in this routine.
*/
#if defined( __GNUC__ ) && defined( __amd64__ ) && ! defined( __i386 )
__asm__ __volatile__ (
" leaq %0, %%rax\n"
" movq %%rbp, 8(%%rax)\n" /* save regs rbp and rsp */
" movq %%rsp, (%%rax)\n"
" movq %%rax, %%rsp\n" /* make rsp point to &ar */
" movq 16(%%rsp), %%rsi\n" /* rsi = in */
" movq 32(%%rsp), %%rdi\n" /* rdi = out */
" movq 24(%%rsp), %%r9\n" /* r9 = last */
" movq 48(%%rsp), %%r10\n" /* r10 = end */
" movq 64(%%rsp), %%rbp\n" /* rbp = lcode */
" movq 72(%%rsp), %%r11\n" /* r11 = dcode */
" movq 80(%%rsp), %%rdx\n" /* rdx = hold */
" movl 88(%%rsp), %%ebx\n" /* ebx = bits */
" movl 100(%%rsp), %%r12d\n" /* r12d = lmask */
" movl 104(%%rsp), %%r13d\n" /* r13d = dmask */
/* r14d = len */
/* r15d = dist */
" cld\n"
" cmpq %%rdi, %%r10\n"
" je .L_one_time\n" /* if only one decode left */
" cmpq %%rsi, %%r9\n"
" je .L_one_time\n"
" jmp .L_do_loop\n"
".L_one_time:\n"
" movq %%r12, %%r8\n" /* r8 = lmask */
" cmpb $32, %%bl\n"
" ja .L_get_length_code_one_time\n"
" lodsl\n" /* eax = *(uint *)in++ */
" movb %%bl, %%cl\n" /* cl = bits, needs it for shifting */
" addb $32, %%bl\n" /* bits += 32 */
" shlq %%cl, %%rax\n"
" orq %%rax, %%rdx\n" /* hold |= *((uint *)in)++ << bits */
" jmp .L_get_length_code_one_time\n"
No fallback is provided within this file: if the architecture is not supported, compilation stops with an error:
#else
#error "x86 architecture not defined"
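To make the assembly easier to follow, here is a rough C rendering of the refill step under the .L_one_time label in the snippet above (load 32 bits of input, shift them by the current bit count, and OR them into hold). This is only a sketch: the struct and the name refill32 are hypothetical and do not appear in zlib, and the bytes are assembled explicitly in little-endian order to mimic what lodsl does on x86.

```c
#include <stdint.h>

/* Hypothetical bit-buffer state; field names follow the asm comments. */
struct bitbuf {
    const unsigned char *in;  /* next input byte */
    uint64_t hold;            /* bit accumulator */
    unsigned bits;            /* number of valid bits in hold */
};

/* Mirrors: cmpb $32,%bl / ja skip / lodsl / shlq %cl,%rax / orq %rax,%rdx */
static void refill32(struct bitbuf *b)
{
    if (b->bits <= 32) {                     /* skip when more than 32 bits are held */
        uint32_t word = (uint32_t)b->in[0]   /* little-endian 32-bit load */
                      | ((uint32_t)b->in[1] << 8)
                      | ((uint32_t)b->in[2] << 16)
                      | ((uint32_t)b->in[3] << 24);
        b->in += 4;                          /* in++ (as a uint pointer) */
        b->hold |= (uint64_t)word << b->bits; /* hold |= word << bits */
        b->bits += 32;                       /* bits += 32 */
    }
}
```

The caller is assumed to guarantee that at least four input bytes remain, which is what the checks against last before the loop are for.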
Another file that contains assembly is byteorder.h, located in /lib/util/. Here we see inline assembly defined specifically for PowerPC that uses byte-reversing load/store instructions (lhbrx and sthbrx in the snippet below) for a short or int conversion.
#if (defined(__powerpc__) && defined(__GNUC__))
static __inline__ uint16_t ld_le16(const uint16_t *addr)
{
uint16_t val;
__asm__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (addr), "m" (*addr));
return val;
}
static __inline__ void st_le16(uint16_t *addr, const uint16_t val)
{
__asm__ ("sthbrx %1,0,%2" : "=m" (*addr) : "r" (val), "r" (addr));
}
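For comparison, the same little-endian load and store can be written in plain C without any assembly. The sketch below is an illustration, not code taken from byteorder.h; the _portable names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical portable counterpart of ld_le16: read a 16-bit
   little-endian value one byte at a time, independent of host byte order. */
static inline uint16_t ld_le16_portable(const void *addr)
{
    const unsigned char *p = addr;
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical portable counterpart of st_le16: write the low byte first. */
static inline void st_le16_portable(void *addr, uint16_t val)
{
    unsigned char *p = addr;
    p[0] = (unsigned char)(val & 0xff);
    p[1] = (unsigned char)(val >> 8);
}
```

Because it works on individual bytes, this version also tolerates unaligned addresses; the PowerPC version gets the byte swap for free as part of a single load or store instruction.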
Lua
Lua is a lightweight, powerful and embeddable scripting language used in a wide variety of applications and several well-known games.
A quick search of the Lua files for anything that resembles assembly returns just one line of code in a single file. On closer inspection, it resembles what we saw in byteorder.h from Samba: a very short inline sequence used for a conversion. In this case it is specific to the Microsoft compiler on x86 and converts a floating-point number to an integer (fld pushes the number onto the x87 stack, fistp stores it as an integer and pops it). Everywhere else a line of C code is used instead. Here is what the assembly looks like:
#if defined(MS_ASMTRICK) || defined(LUA_MSASMTRICK) /* { */
/* trick with Microsoft assembler for X86 */
#define lua_number2int(i,n) __asm {__asm fld n __asm fistp i}
#define lua_number2integer(i,n) lua_number2int(i, n)
#define lua_number2unsigned(i,n) \
{__int64 l; __asm {__asm fld n __asm fistp l} i = (unsigned int)l;}
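One difference worth noting is that fistp rounds according to the current x87 rounding mode (round to nearest by default), whereas a plain C cast truncates toward zero. The sketch below contrasts the two in portable C. The rounding version uses the well-known trick of adding a large constant to a double; the function names are hypothetical, and the code assumes IEEE 754 doubles, the default rounding mode, and an input whose magnitude is below 2^31.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: truncating conversion, as a plain C cast does. */
static int32_t number2int_trunc(double n)
{
    return (int32_t)n;
}

/* Hypothetical helper: round to nearest (ties to even), similar to fistp
   in its default mode. Adding 2^52 + 2^51 leaves the rounded integer in
   the low 32 bits of the double's bit pattern. */
static int32_t number2int_round(double n)
{
    volatile double biased = n + 6755399441055744.0; /* 2^52 + 2^51 */
    double tmp = biased;   /* volatile forces rounding to double precision */
    uint64_t bits;
    uint32_t low;
    int32_t out;

    memcpy(&bits, &tmp, sizeof bits);  /* reinterpret without aliasing issues */
    low = (uint32_t)bits;              /* keep the low 32 bits */
    memcpy(&out, &low, sizeof out);    /* two's-complement reinterpretation */
    return out;
}
```

With these definitions, 2.7 converts to 2 with the cast but to 3 with the rounding version, which is the kind of discrepancy to keep in mind when swapping an assembly trick for a C fallback.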
Eugen - Good post. I'm not sure I agree with your comment about no fallbacks for the Samba file inffas86.c though -- it looks like it's a replacement for inffast.c for x86 only (i.e., inffast.c itself would be used in most cases). Also, quickly grepping through the Makefiles etc. it looks like it might not even be built on x86 unless you take some manual steps (I'd want to check that more carefully before saying for certain). It would be worthwhile looking at how some of the distros build this.