technovelty

weblog of Ian Wienand

RSS  |  technovelty home  |  page of ian  |  ian@wienand.org

How much slower is cross compiling to your own host?

The usual case for cross-compiling is that your target is so woefully slow and under-powered that you would be insane to do anything else.

However, sometimes for one of the best reasons of all, "historical reasons", you might ship a 64-bit product but support building on 32-bit hosts, and thus cross-compile even on a very fast architecture like x86. How much does this cost, even though almost everyone is running the 32-bit cross-compiler on a modern 64-bit machine?

To test, I got a a 32-bit cross and a 64-bit native x86_64 compiler and toolchain; in this case based on gcc-4.1.2 and binutils 2.17. I then did a allyesconfig build of Linux 2.6.33 x86_64 kernel 3 times using the cross compilier toolchain and then native one. The results (in seconds):

32-bit64-bit
60905684
60505616
60635652
average
60675650

So, all up, ~7% less by building your 64-bit code on a 64-bit machine with a 32-bit cross-compiler.

posted at: Thu, 06 May 2010 16:50 | in /code/c | permalink | add comment (2 others)

What exactly does -Bsymblic do? -- update

Some time ago I wrote a description of the -Bsymbolic linker flag which could do with some further explanation. The original article is a good starting place.

One interesting point that I didn't go into was the potential for code optimisation -Bsymbolic brings about. I'm not sure if I missed that at the time, or the toolchain changed, both are probably equally likely!

Let me recap the example...

ianw@jj:/tmp/bsymbolic$ cat Makefile 
all: test test-bsymbolic

clean:
	rm -f *.so test testsym

liboverride.so : liboverride.c
	       $(CC) -Wall -O2 -shared -fPIC -o liboverride.so $<

libtest.so : libtest.c
	   $(CC) -Wall -O2 -shared -fPIC -o libtest.so $<

libtest-bsymbolic.so : libtest.c
		     $(CC) -Wall -O2 -shared -fPIC -Wl,-Bsymbolic -o $@ $<

test : test.c libtest.so liboverride.so
     $(CC) -Wall -O2 -L. -Wl,-rpath=. -ltest -o $@ $<

test-bsymbolic : test.c libtest-bsymbolic.so liboverride.so
	$(CC) -Wall -O2 -L. -Wl,-rpath=. -ltest-bsymbolic -o $@ $<
$ cat liboverride.c 
#include <stdio.h>

int foo(void)
{
	printf("override foo called\n");
	return 0;
}
$ cat libtest.c 
#include <stdio.h>

int foo(void) {
    printf("libtest foo called\n");
    return 1;
}

int test_foo(void) {
    return foo();
}
$ cat test.c
#include <stdio.h>

int test_foo(void);

int main(void)
{
	printf("%d\n", test_foo());
	return 0;
}

In words; libtest.so provides test_foo(), which calls foo() to do the actual work. libtest-bsymbolic.so is simply built with the flag in question, -Bsymbolic. liboverride.so provides the alternative version of foo() designed to override the original via a LD_PRELOAD of the library.

test is built against libtest.so, test-bsymbolic against libtest-bsymbolic.so.

Running the examples, we can see that the LD_PRELOAD does not override the symbol in the library built with -Bsymbolic.

$ ./test
libtest foo called
1

$ ./test-bsymbolic 
libtest foo called
1

$ LD_PRELOAD=liboverride.so ./test
override foo called
0

$ LD_PRELOAD=liboverride.so ./test-bsymbolic 
libtest foo called
1

There are a couple of things going on here. Firstly, you can see that the SYMBOLIC flag is set in the dynamic section, leading to the dynamic linker behaviour I explained in the original article:

ianw@jj:/tmp/bsymbolic$ readelf --dynamic ./libtest-bsymbolic.so 

Dynamic section at offset 0x550 contains 22 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x00000010 (SYMBOLIC)                   0x0
...

However, there is also an effect on generated code. Have a look at the PLTs:

$ objdump --disassemble-all ./libtest.so
Disassembly of section .plt:

[... blah ...]

0000039c &lt;foo@plt&gt;:
 39c:   ff a3 10 00 00 00       jmp    *0x10(%ebx)
 3a2:   68 08 00 00 00          push   $0x8
 3a7:   e9 d0 ff ff ff          jmp    37c &lt;_init+0x30&gt;
</pre>
</div>

<div class="codebox">
<pre>
$ objdump --disassemble-all ./libtest-bsymbolic.so
Disassembly of section .plt:

00000374 <__gmon_start__@plt-0x10>:
 374:   ff b3 04 00 00 00       pushl  0x4(%ebx)
 37a:   ff a3 08 00 00 00       jmp    *0x8(%ebx)
 380:   00 00                   add    %al,(%eax)
        ...

00000384 <__gmon_start__@plt>:
 384:   ff a3 0c 00 00 00       jmp    *0xc(%ebx)
 38a:   68 00 00 00 00          push   $0x0
 38f:   e9 e0 ff ff ff          jmp    374 <_init+0x30>

00000394 <puts@plt>:
 394:   ff a3 10 00 00 00       jmp    *0x10(%ebx)
 39a:   68 08 00 00 00          push   $0x8
 39f:   e9 d0 ff ff ff          jmp    374 <_init+0x30>

000003a4 <__cxa_finalize@plt>:
 3a4:   ff a3 14 00 00 00       jmp    *0x14(%ebx)
 3aa:   68 10 00 00 00          push   $0x10
 3af:   e9 c0 ff ff ff          jmp    374 <_init+0x30>

Notice the difference? There is no PLT entry for foo() when -Bsymbolic is used.

Effectively, the toolchain has noticed that foo() can never be overridden and optimised out the PLT call for it. This is analogous to using "hidden" attributes for symbols, which I have detailed in another article on symbol visiblity attributes (which also goes into PLT's, if the above meant nothing to you).

So -Bsymbolic does have some more side-effects than just setting a flag to tell the dynamic linker about binding rules -- it can actually result in optimised code. However, I'm still struggling to find good use-cases for -Bsymbolic that can't be better done with Version scripts and visibility attributes. I would certainly recommend using these methods if at all possible.

Thanks to Ryan Lortie for comments on the original article.

posted at: Mon, 22 Mar 2010 15:40 | in /code/c | permalink | add comment (0 others)

Relocation truncated to fit - WTF?

If you code for long enough on x86-64, you'll eventually hit an error such as:

(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `array' defined in foo section in ./pcrel8.o

Here's a little example that might help you figure out what you've done wrong.

Consider the following code:

$ cat foo.s
.globl foovar
  .section   foo, "aw",@progbits
  .type foovar, @object
  .size foovar, 4
foovar:
   .long 0

.text
.globl _start
 .type function, @function
_start:
  movq $foovar, %rax

In case it's not clear, that would look something like:

int foovar = 0;

void function(void) {
  int *bar = &foovar;
}

Let's build that code, and see what it looks like

$ gcc -c foo.s

$ objdump --disassemble-all ./foo.o

./foo.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_start>:
   0:		 48 c7 c0 00 00 00 00	mov    $0x0,%rax

Disassembly of section foo:

0000000000000000 <foovar>:
   0:		 00 00			add    %al,(%rax)
   ...

We can see that the mov instruction has only allocated 4 bytes (00 00 00 00) for the linker to put in the address of foovar. If we check the relocations:

$ readelf --relocs ./foo.o

Relocation section '.rela.text' at offset 0x3a0 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000003  00050000000b R_X86_64_32S      0000000000000000 foovar + 0

The R_X86_64_32S relocation is indeed only a 32-bit relocation. Now we can tickle this error. Consider the following linker script, which puts the foo section about 5 gigabytes away from the code.

$ cat test.lds
SECTIONS
{
 . = 10000;
 .text : { *(.text) }
 . = 5368709120;
 .data : { *(.foo) }
}

This now means that we can not fit the address of foovar inside the space allocated by the relocation. When we try it:

$ ld -Ttest.lds ./foo.o
./foo.o: In function `_start':
(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `foovar' defined in foo section in ./foo.o

What this means is that the full 64-bit address of foovar, which now lives somewhere above 5 gigabytes, can't be represented within the 32-bit space allocated for it.

For code optimisation purposes, the default immediate size to the mov instructions is a 32-bit value. This makes sense because, for the most part, programs can happily live within a 32-bit address space, and people don't do things like keep their data so far away from their code it requires more than a 32-bit address to represent it. Defaulting to using 32-bit immediates therefore cuts the code size considerably, because you don't have to make room for a possible 64-bit immediate for every mov.

So, if you want to really move a full 64-bit immediate into a register, you want the movabs instruction. Try it out with the code above - with movabs you should get a R_X86_64_64 relocation and 64-bits worth of room to patch up the address, too.

If you're seeing this and you're not hand-coding, you probably want to check out the -mmodel argument to gcc.

posted at: Thu, 12 Mar 2009 23:20 | in /code/c | permalink | add comment (2 others)

Position Independent Code and x86-64 libraries

If you've ever tried to link non-position independent code into a shared library on x86-64, you should have seen a fairly cryptic error about invalid relocations and missing symbols. Hopefully this will clear it up a little!

Let's start with a small program to illustrate.

$ cat function.c
int global = 100;

int function(int i) {
	return i + global;
}
$ gcc -c function.c

Firstly, inspect the disassembley of this function:

0000000000000000 <function>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	89 7d fc             	mov    %edi,-0x4(%rbp)
   7:	8b 05 00 00 00 00    	mov    0x0(%rip),%eax        # d <function+0xd>
   d:	03 45 fc             	add    -0x4(%rbp),%eax
  10:	c9                   	leaveq
  11:	c3                   	retq

Lets just go through that for clarity:

The IP relative move is really the trick here. We know from the code that it has to move the value of the global variable here. The zero value is simply a place holder - the compiler currently does not determine the required address (i.e. how far away from the instruction pointer the memory holding the global variable is). It leaves behind a relocation -- a note that says to the linker "you should determine the correct address of foo (global in our case), and then patch this bit of the code to point to that addresss (i.e. foo)."

Relocations with addend

The top portion of the image above gives some idea of how it works. We can examine relocations in binaries with the readelf tool.

$ readelf --relocs ./function.o

Relocation section '.rela.text' at offset 0x518 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000009  000800000002 R_X86_64_PC32     0000000000000000 global + fffffffffffffffc

There are many different types of relocations for different situations; the exact rules for different relocation types are described in the ABI documentation for the architecture. The R_X86_64_PC32 relocation is defined as "the base of the section the symbol is within, plus the symbol value, plus the addend". The addend makes it look more tricky than it is; remember that when an instruction is executing the instruction pointer points to the next instruction to be executed. Therefore, to correctly find the data relative to the instruction pointer, we need to subtract the extra. This can be seen more clearly when layed out in a linear fashion (as in the bottom of the above diagram).

If you try and build a shared object (dynamic library) with an object file with this type of relocation, you should get something like:

$ gcc -shared function.c
/usr/bin/ld: /tmp/ccQ2ttcT.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/tmp/ccQ2ttcT.o: could not read symbols: Bad value
collect2: ld returned 1 exit status

The specific problem is how this relocation interacts with Position Independent Code (PIC, enabled with -fPIC). PIC just means that the output binary does not expect to be loaded at a particular base address, but is happy being put anywhere in memory (compare the output of readelf --segments on a binary such as /bin/ls to that of any shared library). This is obviously critical for implementing lazy-loading (i.e. only loaded when required) shared-libraries, where you may have many libraries loaded in essentially any order. Trying to pre-allocate where in memory they would all live is completely impractical and just does not work (not to mention every single library that might ever be used would be competing for a spot in the limited address space of a 32-bit process!).

What's the specific problem with this relocation in a shared library? In a shared library situation, we can not depend on the local value of global actually being the one we want. Consider the following example, where we override the value of global with a LD_PRELOAD library.

$ cat function.c
int global = 100;

int function(int i) {
	return i + global;
}
$ gcc -fPIC -shared -o libfunction.so function.c

$ cat preload.c
int global = 200;
$ gcc -shared preload.c -o libpreload.so

$ cat program.c
#include <stdio.h>

int function(int i);

int main(void) {
   printf("%d\n", function(10));
}
$ gcc -L. -lfunction program.c -o program

$ LD_LIBRARY_PATH=. ./program
110
$ LD_PRELOAD=libpreload.so LD_LIBRARY_PATH=. ./program
210

If the code in libfunction.so has a fixed offset into its own data section, it will not be able to see the overridden value provided by libpreload.so. This is not the case when building a stand-alone executable, where references are satisfied internally.

Of course, any problem in computer science can be solved with a layer of abstraction, and that is what is done when compiling with -fPIC. To examine this case, let's see what happens with PIC turned on.

$ gcc -fPIC -shared -c  function.c
$ objdump --disassemble ./function.o

./function.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <function>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	89 7d fc             	mov    %edi,-0x4(%rbp)
   7:	48 8b 05 00 00 00 00 	mov    0x0(%rip),%rax        # e <function+0xe>
   e:	8b 00                	mov    (%rax),%eax
  10:	03 45 fc             	add    -0x4(%rbp),%eax
  13:	c9                   	leaveq
  14:	c3                   	retq

It's almost the same! We setup the frame pointer with the first two instructions as before. We push the first argument into memory in the pre-allocated "red-zone" as before. Then, however, we do an IP relative load of an address into rax. Next we de-reference this into eax (e.g. eax = *rax in C) before adding the incoming argument to it and returning.

$ readelf --relocs ./function.o

Relocation section '.rela.text' at offset 0x550 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000800000009 R_X86_64_GOTPCREL 0000000000000000 global + fffffffffffffffc

The magic here is again in the relocations. Notice this time we have a P_X86_64_GOTPCREL relocation. This says "replace the data at offset 0xa with the global offset table (GOT) entry of global.

Global Offset Table operation with data variables

As shown above, the GOT ensures the abstraction required so symbols can be diverted as expected. Each entry is essentially a pointer to the real data (hence the extra dereference in the code above). Since the GOT is at a fixed offset from the program code, it can use an IP relative address to gain access to the table entries.

This extra reference is obviously slower; however for the most part I imagine the overhead would be essentially immeasurable and is required for "generic" operation. If you have figured the cost of indirection through the GOT is the major bottleneck of your program, I imagine you wouldn't be reading this and would already be considering strategies to remove it!

The next question is why this works on plain old x86-32. Inspecting the code reveals why:

$ objdump --disassemble ./function.o
00000000 <function>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	a1 00 00 00 00       	mov    0x0,%eax
   8:	03 45 08             	add    0x8(%ebp),%eax
   b:	5d                   	pop    %ebp
   c:	c3                   	ret
$ readelf --relocs ./function.o
Relocation section '.rel.text' at offset 0x2ec contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000004  00000701 R_386_32          00000000   global

We start out the same, with the first two instructions setting up the frame pointer. However, next we load a memory value into eax -- as we can see from the relocation information, the address of global. Next we add the incoming argument from the stack (0x8(%ebp)) to the value in this memory location; implicitly dereferencing it. This provides the abstraction we need -- if the relocation makes the patched address at 0x4 the address of the GOT entry, it will be correctly dereferenced. It is the inability of the x86-32 architecture to try and optimise by doing instruction-pointer relative offseting which means it always needs to do slower memory references, which turns out to be just what you want when you're making a shared library!

So, the executive summary: the ability of x86-64 to use instruction-pointer relative offsetting to data addresses is a nice optimisation, but in a shared-library situation assumptions about the relative location of data are invalid and can not be used. In this case, access to global data (i.e. anything that might be changed around on you) must go through a layer of abstraction, namely the global offset table.

posted at: Wed, 26 Nov 2008 13:53 | in /code/c | permalink | add comment (4 others)

Conditional Operator Harmful ? yes : no

It seems existing versions of gcc don't warn when you use an assignment as a truth value as the first operand to the conditional operator. For example:

int fn(int i)
{
	return (i = 0x20) ? 0x40 : 0x80;
}

Presumably you meant ==, but unfortunately that does not warn with gcc (maybe it will one day).

We can use the tree dumping mechanisms (previously mentioned here) to confirm our problem. For the following function gcc creates:

fn (i)
{
  int D.1541;
  int iftmp.0;

  i = 32;
  if (1)
    {
      iftmp.0 = 64;
    }
  else
    {
      iftmp.0 = 128;
    }
  D.1541 = iftmp.0;
  return D.1541;
}

As you can see, this isn't the result we wanted. This can be quite nasty if you use the conditional operator in a hidden fashion; for example behind an assert. A common idiom is:

#define ASSERT(x) (x) ? : fail()

/* programmer now subsitutes a == for */
ASSERT(x = y);

You'll currently get no warning you've slipped up and used the assert wrong. Moral of the story: a quick audit of your code might turn up some surprising uses! However, in this case, the Intel compiler picks this one up:

$ icc -c test.c
test.c(3): warning #187: use of "=" where "==" may have been intended
  	   return (i = 0x20) ? 0x40 : 0x80;

Although it's often not trivial to move to another compiler, as a rule I would recommend trying alternative compilers for your code as it's a great sanity check.

posted at: Sat, 26 Apr 2008 08:29 | in /code/c | permalink | add comment (1 others)

Checking users of typedefs

If you poke around the Linux VM layers you might wonder why there is a STRICT_MM_TYPECHECKS option. It is there to stop programmers touching opaque types. Linus has said many times that a typedef should only be made for a completely opaque type. This therefore implies that rather than operate on it directly, there is some sort of "API" to get at the values.

A simple example is below. my_long_t is opaque; the only way you should use its value is via the my_val accessor.

But how do we enforce this, when the type is really just an unsigned long? We wrap it up in something else, in this case a struct.

#ifndef STRICT
  typedef unsigned long my_long_t;
# define my_val(x) (x)

#else
  typedef struct { unsigned long t; } my_long_t;
# define my_val(x) ((x).t)

#endif

my_long_t my_naughty_function(my_long_t input)
{
        return input * 1000;
}

We can now see the outcome

$ gcc -c strict.c
$ gcc -DSTRICT -c strict.c
strict.c: In function 'my_naughty_function':
strict.c:15: error: invalid operands to binary *

So now we have an error where we do the wrong thing, rather than no notification at all! This niftily moves you up from "bug via visual inspection" to "won't compile" (Rusty Russell talked about this scale at linux.conf.au once -- does anyone have a reference to that talk?). Why not just leave it like this then? Well it might confuse the compiler; if you have rules about always passing structures on the stack, for example, the above will suck. In general, it's best to hide as little from the compiler as possible, to give it the best chance of doing a good job for you.

posted at: Fri, 30 Jun 2006 13:35 | in /code/c | permalink | add comment (1 others)

GCC Trampolines

Paul Wayper writes about trampolines. This is something I've come across before, when I was learning about function pointers with IA64.

Nested functions are a strange contraption, and probably best avoided (people who should know agree). For example, here is a completely obfuscated way to double a number.

#include <stdio.h>

int function(int arg) {

        int nested_function(int nested_arg) {
                return arg + nested_arg;
        }

        return nested_function(arg);
}


int main(void)
{
        printf("%d\n", function(10));
}

Let's disassemble that on IA64 (because it is so much easier to understand).

0000000000000000 :
   0:   00 10 15 08 80 05       [MII]       alloc r34=ar.pfs,5,4,0
   6:   c0 80 33 7e 46 60                   adds r12=-16,r12
   c:   04 08 00 84                         mov r35=r1
  10:   01 00 00 00 01 00       [MII]       nop.m 0x0
  16:   10 02 00 62 00 80                   mov r33=b0
  1c:   04 00 01 84                         mov r36=r32;;
  20:   0a 70 40 18 00 21       [MMI]       adds r14=16,r12;;
  26:   00 00 39 20 23 e0                   st4 [r14]=r32
  2c:   01 70 00 84                         mov r15=r14
  30:   10 00 00 00 01 00       [MIB]       nop.m 0x0
  36:   00 00 00 02 00 00                   nop.i 0x0
  3c:   48 00 00 50                         br.call.sptk.many b0=70 
  40:   0a 08 00 46 00 21       [MMI]       mov r1=r35;;
  46:   00 00 00 02 00 00                   nop.m 0x0
  4c:   20 02 aa 00                         mov.i ar.pfs=r34
  50:   00 00 00 00 01 00       [MII]       nop.m 0x0
  56:   00 08 05 80 03 80                   mov b0=r33
  5c:   01 61 00 84                         adds r12=16,r12
  60:   11 00 00 00 01 00       [MIB]       nop.m 0x0
  66:   00 00 00 02 00 80                   nop.i 0x0
  6c:   08 00 84 00                         br.ret.sptk.many b0;;

0000000000000070 :
  70:   0a 40 00 1e 10 10       [MMI]       ld4 r8=[r15];;
  76:   80 00 21 00 40 00                   add r8=r32,r8
  7c:   00 00 04 00                         nop.i 0x0
  80:   11 00 00 00 01 00       [MIB]       nop.m 0x0
  86:   00 00 00 02 00 80                   nop.i 0x0
  8c:   08 00 84 00                         br.ret.sptk.many b0;;

Note that nothing funny happens there at all. The nested function looks exactly like a real function, but we don't do any of the alloc or anything to get us a new stack frame -- we're using the stack frame of the old function. No trampoline involved, just a straight branch.

So why do we need a trampoline? Because a nested function has an extra property over a normal function; the stack pointer must be set to be the stack pointer of the function it is nested within. This means that if we were to pass around pointers to the nested function, we would have to keep track that this isn't a pointer to just any old function, but a pointer to special function that needs its stack setup to come from the nested function. C doesn't work like this, there is only one type "pointer to function".

The easiest way is therefore to create a little function that sets the stack to the right place and calls the code. This "trampoline" is a normal function, so no need for trickery. For example, we can modify this to use a function pointer to the nested function:

#include <stdio.h>

int function(int arg) {

        int nested_function(int nested_arg) {
                return arg + nested_arg;
        }

        int (*function_pointer)(int arg) = nested_function;

        return function_pointer(arg);
}


int main(void)
{
        printf("%d\n", function(10));
}

We can see that now we are calling through a function pointer, we go through __ia64_trampoline, which is a little code stub in libgcc.a which loads up the right stack pointer and calls the "nested" function.

function:
	.prologue 12, 33
	.mmb
	.save ar.pfs, r34
	alloc r34 = ar.pfs, 1, 3, 1, 0
	.fframe 48
	adds r12 = -48, r12
	nop 0
	.mmi
	addl r15 = @ltoffx(__ia64_trampoline#), r1
	mov r35 = r1
	.save rp, r33
	mov r33 = b0
	.body
	;;
	.mmi
	nop 0
	adds r8 = 24, r12
	adds r14 = 16, r12
	.mmi
	ld8.mov r15 = [r15], __ia64_trampoline#
	;;
	st4 [r14] = r32
	nop 0
	.mmi
	mov r14 = r8
	;;
	st8 [r14] = r15, 8
	adds r15 = 40, r12
	;;
	.mfi
	st8 [r14] = r15, 8
	nop 0
	addl r15 = @ltoff(@fptr(nested_function.1814#)), gp
	;;
	.mmi
	ld8 r15 = [r15]
	;;
	st8 [r14] = r15, 8
	adds r15 = 16, r12
	;;
	.mmb
	st8 [r14] = r15
	ld8 r14 = [r8], 8
	nop 0
	.mmi
	ld4 r36 = [r15]
	;;
	nop 0
	mov b6 = r14
	.mbb
	ld8 r1 = [r8]
	nop 0
	br.call.sptk.many b0 = b6
	;;
	.mii
	mov r1 = r35
	mov ar.pfs = r34
	mov b0 = r33
	.mib
	nop 0
	.restore sp
	adds r12 = 48, r12
	br.ret.sptk.many b0
	.endp function#
	.section	.rodata.str1.8,"aMS",@progbits,1
	.align 8

So, the summary is that if you're taking the address of a nested function, to avoid having different types of pointers to functions if they are nested or not gcc puts in the stub "trampoline" code.

posted at: Mon, 15 May 2006 23:53 | in /code/c | permalink | add comment (0 others)

Unrolling and Duff's Device

Loop unrolling can be a fascinating art.

The canonical example of loop unrolling is something like

for (i = 0; i < 100; i++)
    x[i] = i;

becoming

for (i = 0; i < 25; i+=4) {
    x[i] = i;
    x[i+1] = i+1;
    x[i+2] = i+2;
    x[i+3] = i+3;
}

Longer runs of instructions afford you a better chance of managing to find instruction level parallelism; that is instructions that can all happen together. You can keep the pipeline fuller and hence run faster, at the cost of some expanded code size.

But unlike that example, you generally do not know where your loop ends. In the above example, we made four copies of the loop body, reducing the iterations by 4. For an indeterminate number of iterations, we would need to do the original loop n % 4 times and then the unrolled body floor(n / 4) times. So to spell it out, if we want to do the loop 7 times, then we execute the original loop 3 times and the unrolled portion once.

Our new code might look something like

int extras = n % 4;
for (i = 0; i < extras; i++)
    x[i] += i;

for ( ; i < n / 4 ; i+=4) {
    x[i] = i;
    x[i+1] = i+1;
    x[i+2] = i+2;
    x[i+3] = i+3;
}

That leads to a little trick called "Duff's Device", references to which can be found most anywhere (however I found most of them a little unclear, hence this post).

Duff's device looks like this

send(to, from, count)
register short *to, *from;
register count;
{
        register n=(count+7)/8;

        switch(count%8) {
        case 0: do {    *to = *from++;
        case 7:         *to = *from++;
        case 6:         *to = *from++;
        case 5:         *to = *from++;
        case 4:         *to = *from++;
        case 3:         *to = *from++;
        case 2:         *to = *from++;
        case 1:         *to = *from++;
                } while(--n>0);
        }
}

Now that doesn't look like loop unrolling, but it really is. In this case, we have unwound a loop doing simply *to = *from 8 times. But we still have that extra bit problem, since we have an indeterminate count.

To understand it, let's just work through what happens for possible inputs. For values of count from 1-7, n is one and count % 8 is just count so we fall right into the case statements, which fall through to each other without break and never repeat. The nifty bit comes when count is greater than 7. In this case, our first iteration comes from the switch where we fall through to do the extra count % 8 iterations. After that, we loop back to the start of the unrolling and do our 8 step loop as many times as we require.

Now by post-incrementing *to you can use this to implement memcpy(). An interesting exercise might be to see how much slower this is than the highly optimised memcpy that comes with glibc.

The code is most certainly ingenious and this page explains some of the history of it. Today I think it is mostly good as an exercise on "why it works".

If you're wondering how gcc handles unrolling, it creates the unrolled loop and the does a sort of fall through test which evaluates how many extra loops are required and branches into the unrolled portion at the correct point. A little extract is below, you can see r14 holds the number of extra loops required and branches higher up the unrolled code for each extra loop required.

...
  70: [MFB]       cmp4.eq p6,p7=0,r14
  76:             nop.f 0x0
  7c:       (p06) br.cond.dpnt.few 150 ;;
  80: [MFB]       cmp4.eq p6,p7=1,r14
  86:             nop.f 0x0
  8c:       (p06) br.cond.dpnt.few 130 ;;
  90: [MFB]       cmp4.eq p6,p7=2,r14
  96:             nop.f 0x0
  9c:       (p06) br.cond.dpnt.few 120 ;;
  a0: [MFB]       cmp4.eq p6,p7=3,r14
  a6:             nop.f 0x0
  ac:       (p06) br.cond.dpnt.few 110 ;;
  b0: [MFB]       cmp4.eq p6,p7=4,r14
  b6:             nop.f 0x0
  bc:       (p06) br.cond.dpnt.few 100 ;;
  c0: [MFB]       cmp4.eq p6,p7=5,r14
  c6:             nop.f 0x0
  cc:       (p06) br.cond.dpnt.few f0 ;;
  d0: [MMI]       cmp4.eq p6,p7=6,r14;;
  d6:       (p07) st4 [r16]=r17,4
  dc:             nop.i 0x0
...

posted at: Fri, 24 Mar 2006 09:32 | in /code/c | permalink | add comment (0 others)

Poking around inside the compiler

Shehjar, one of our students, came up with the interesting little

int i = 5;
printf("%d %d %d", i, i--, ++i);

and wondered exactly what it should do. Well, the C standard says that arguments to functions are evaluated first, but does not specify in what order they are evaluated. So this is a classic example of undefined behaviour (in fact it's even listed in the common list of not a bug not a bug by the GCC manual). In fact, ++ and friends don't actually create sequence points within the code (points where things are known to be evaluated), so they can play host to all sorts of undefined behaviour like above.

Of course, gcc doesn't leave you blind to this. If you turn on -Wall you get warning: operation on 'i' may be undefined. You do compile with -Wall right?

But this doesn't change the fact that that code prints 5,6,5 on x86 and on Itanium it prints 5,5,5. This suggested some investigation.

Firstly, lets simplify this down to

extern blah(int, int, int);

void function(void)
{
        int i = 5;
        blah(i, i--, ++i);
}

The problem is, looking at the output of the assembly doesn't tell us that much, because GCC is smart and by the time it has got to assembly, it is simply moving constant values into registers. So we need some way to peek into what GCC thinks it's doing to come up with those values.

Hence the magic -fdump flag. Building with this gives us dumps of the stages GCC is going through as it generates it's code.

$ gcc -fdump-tree-all -S -O ./test.c

This gives us lots of dumps, in numerical order as gcc is moving along. With modern GCC's, a good place to pick things up is to look is in the .gimple output file. Gimplification is the process GCC goes through optimising its internal tree structure (called GIMPLE after SIMPLE, apparently).

The .gimple file is pretty-printed from the tree representation back into C code. So, on i386 it looks like.

x86 $ cat test.c.t08.gimple

;; Function function (function)

function ()
{
  int i.0;
  int i;

  i = 5;
  i = i + 1;
  i.0 = i;
  i = i - 1;
  blah (i, i.0, i);
}

And on Itanium it looks like

ia64 $ cat test.c.t08.gimple
;; Function function (function)

function ()
{
  int i.0;
  int i;

  i = 5;
  i.0 = i;
  i = i - 1;
  i = i + 1;
  blah (i, i.0, i);
}

At this point, we can clearly see how gcc ends up with the idea that the code translates into 5,6,5 on x86 and 5,5,5 on Itanium. We also know that it is not the fault of optimisation, since we can see just what it is about to optimise matches the output we get. Why exactly it reaches this conclusion on the different architectures escapes me at the moment, but I'm slowing reading through and understanding the GCC code base hoping to one day find enlightenment :) Certainly passing arguments is very different on x86 compared to IA64 (on the stack rather than via registers) so no doubt some small piece of the backend config tweaks this into happening.

Another interesting thing to look at is the raw output of the tree gcc is using. You can see that via something like

$ gcc -S -fdump-tree-all-raw  -o test test.c

Which gives you an output like


;; Function function (function)

function ()
@1      function_decl    name: @2       type: @3       srcp: test.c:4
                         extern         body: @4
@2      identifier_node  strg: function lngt: 8
@3      function_type    size: @5       algn: 8        retn: @6
                         prms: @7
@4      bind_expr        type: @6       vars: @8       body: @9
@5      integer_cst      type: @10      low : 8
@6      void_type        name: @11      algn: 8
@7      tree_list        valu: @6
@8      var_decl         name: @12      type: @13      scpe: @1
                         srcp: test.c:6                artificial
                         size: @14      algn: 32       used: 1
@9      statement_list   0   : @15      1   : @16      2   : @17
                         3   : @18      4   : @19
@10     integer_type     name: @20      unql: @21      size: @22
                         algn: 64       prec: 36       unsigned
                         min : @23      max : @24
@11     type_decl        name: @25      type: @6       srcp: <built-in>:0
@12     identifier_node  strg: i.0      lngt: 3
@13     integer_type     name: @26      size: @14      algn: 32
                         prec: 32       min : @27      max : @28
@14     integer_cst      type: @10      low : 32
@15     modify_expr      type: @6       op 0: @29      op 1: @30
@16     modify_expr      type: @13      op 0: @29      op 1: @31
@17     modify_expr      type: @13      op 0: @8       op 1: @29
@18     modify_expr      type: @13      op 0: @29      op 1: @32
@19     call_expr        type: @13      fn  : @33      args: @34
@20     identifier_node  strg: bit_size_type           lngt: 13
@21     integer_type     name: @20      size: @22      algn: 64
                         prec: 36
@22     integer_cst      type: @10      low : 64
@23     integer_cst      type: @10      low : 0
@24     integer_cst      type: @10      low : -1
@25     identifier_node  strg: void     lngt: 4
@26     type_decl        name: @35      type: @13      srcp: <built-in>:0
@27     integer_cst      type: @13      high: -1       low : -2147483648
@28     integer_cst      type: @13      low : 2147483647
@29     var_decl         name: @36      type: @13      scpe: @1
                         srcp: test.c:5                size: @14
                         algn: 32       used: 1
@30     integer_cst      type: @13      low : 5
@31     plus_expr        type: @13      op 0: @29      op 1: @37
@32     minus_expr       type: @13      op 0: @29      op 1: @37
@33     addr_expr        type: @38      op 0: @39
@34     tree_list        valu: @29      chan: @40
@35     identifier_node  strg: int      lngt: 3
@36     identifier_node  strg: i        lngt: 1
@37     integer_cst      type: @13      low : 1
@38     pointer_type     size: @14      algn: 32       ptd : @41
@39     function_decl    name: @42      type: @41      srcp: test.c:1
                         undefined      extern
@40     tree_list        valu: @8       chan: @43
@41     function_type    size: @5       algn: 8        retn: @13
                         prms: @44
@42     identifier_node  strg: blah     lngt: 4
@43     tree_list        valu: @29
@44     tree_list        valu: @13      chan: @45
@45     tree_list        valu: @13      chan: @46
@46     tree_list        valu: @13      chan: @7

You can see the @X points to other nodes, which builds up to an entire tree structure of different nodes which represent the program. You can read more about the internal representation here.

Take home point? Using these dumps is a good way to start peering into what GCC is actually doing underneath its fairly extensive hood.

posted at: Tue, 07 Feb 2006 15:45 | in /code/c | permalink | add comment (3 others)

inline functions

Chatting with Hal, one of our summer students, we came across a bevy of interesting things to do with inlining code.

posted at: Thu, 15 Dec 2005 15:44 | in /code/c | permalink | add comment (0 others)

how to do max and min properly

The canonical definition of max is something like

#define max(a,b) ( ( (a) >= (b) ) ? (a) : (b))

But this raises many issues with types in C. What if you pass in an unsigned int and a signed int all at the same time? Under what circumstances is the comparison valid? What type should be returned?

The kernel gets around this by being tricky

/*
 * min()/max() macros that also do
 * strict type-checking.. See the
 * "unnecessary" pointer comparison.
 */
#define min(x,y) ({ \
        typeof(x) _x = (x);     \
        typeof(y) _y = (y);     \
        (void) (&_x == &_y);            \
        _x < _y ? _x : _y; })

The give you a clue to how this works by talking about the pointer comparison. C99 6.5.9.2 says that for the == comparison

Both operands are pointers to qualified or unqualified version of compatible types

Where compatible types are defined in 6.7.2.1

Two types have compatible type if their types are the same.

There are additional rules, but gcc is right to warn you when you compare pointers of incompatible types. Note that if the inputs are not pointers, but arithmetic types, the C99 standard says that normal arithmetic conversions apply. You can see that the outcome of this is really probably not what you want, but the compiler has no reason to warn you.

Thus the above code makes a warning spew out if you try to compare values of incompatible types. You can try it yourself

$ cat test.c
#include <stdio.h>

int main(void)
{
        int i = -100;
        unsigned int j = 100;

        if (&i == &j);

        if (i < j)
                printf("OK\n");
        if (i > j)
                printf("i is -100, why is that > 100?\n");

        return 0;
}

$ gcc -Wall -o test test.c
test.c: In function 'main':
test.c:8: warning: comparison of distinct pointer types lacks a cast

$ ./test
i is -100, why is that > 100?

Not the most straight forward warning in the world, but sufficient to let you know you're doing something wrong. Note we don't get any warning where the problem really is, except we get a value we don't expect. C certainly can be a very tricky language to get right!

posted at: Thu, 08 Dec 2005 17:19 | in /code/c | permalink | add comment (0 others)

What exactly does -Bsymblic do?

-Bsymbolic is described by the ld man page as

When creating a shared library, bind references to global symbols to the definition within the shared library, if any. Normally, it is possible for a program linked against a shared library to override the definition within the shared library. This option is only meaningful on ELF platforms which support shared libraries.

Indeed, it is interesting to peer under the hood a little. In fact, the only thing the -Bsymbolic flag does when building a shared library is add a flag in the dynamic section of the binary called DT_SYMBOLIC. Having a look at the ABI description of the flag is a little more instructive.

This element's presence in a shared object library alters the dynamic linker's symbol resolution algorithm for references within the library. Instead of starting a symbol search with the executable file, the dynamic linker starts from the shared object itself. If the shared object fails to supply the referenced symbol, the dynamic linker then searches the executable file and other shared objects as usual.

Ahh, now we can have a peek into the dynamic loader to see what exactly it does with this flag. Pull open the glibc source and have a look at elf/dl-load.c, particularly at the bit in _dl_map_obj_from_fd where we have

 /* If this object has DT_SYMBOLIC set modify now its scope.  We don't
     have to do this for the main map.  */
  if ((mode & RTLD_DEEPBIND) == 0
      && __builtin_expect (l->l_info[DT_SYMBOLIC] != NULL, 0)
      && &l->l_searchlist != l->l_scope[0])
    {
      /* Create an appropriate searchlist.  It contains only this map.
         This is the definition of DT_SYMBOLIC in SysVr4.  */
      l->l_symbolic_searchlist.r_list =
        (struct link_map **) malloc (sizeof (struct link_map *));

      if (l->l_symbolic_searchlist.r_list == NULL)
        {
          errstring = N_("cannot create searchlist");
          goto call_lose_errno;
        }

      l->l_symbolic_searchlist.r_list[0] = l;
      l->l_symbolic_searchlist.r_nlist = 1;

      /* Now move the existing entries one back.  */
      memmove (&l->l_scope[1], &l->l_scope[0],
               (l->l_scope_max - 1) * sizeof (l->l_scope[0]));

      /* Now add the new entry.  */
      l->l_scope[0] = &l->l_symbolic_searchlist;
    }

So we can see (in typical glibcistic code) that we insert ourselves at the beginning of the link map, and move everything else along further down the chain.

Finally, an example to illustrate. It's a bit long, but worth understanding.

ianw@lime:/tmp/symbolic$ cat Makefile
all: test testsym

clean:
        rm -f *.so test testsym

liboverride.so : override.c
        $(CC) -shared -fPIC -o liboverride.so override.c

libtest.so : libtest.c
        $(CC) -shared -fPIC -o libtest.so libtest.c

libtestsym.so : libtest.c
        $(CC) -shared -fPIC -Wl,-Bsymbolic -o libtestsym.so libtest.c

test : test.c libtest.so liboverride.so
        $(CC) -L. -ltest -o test test.c

testsym : test.c libtestsym.so liboverride.so
        $(CC) -L. -ltestsym -o testsym test.c

ianw@lime:/tmp/symbolic$ cat libtest.c
#include <stdio.h>

int foo(void) {
        printf("libtest foo called\n");
        return 1;
}

int test_foo(void)
{
        return foo();
}
ianw@lime:/tmp/symbolic$ cat override.c
#include <stdio.h>

int foo(void)
{
        printf("override foo called\n");
        return 0;
}
ianw@lime:/tmp/symbolic$ cat test.c
#include <stdio.h>

int main(void)
{
        printf("%d\n", test_foo());
}


ianw@lime:/tmp/symbolic$ make
cc -shared -fPIC -o libtest.so libtest.c
cc -shared -fPIC -o liboverride.so override.c
cc -L. -ltest -o test test.c
cc -shared -fPIC -Wl,-Bsymbolic -o libtestsym.so libtest.c
cc -L. -ltestsym -o testsym test.c

So we have an application, a library (compiled once with -Bsymbolic and once without) and another library with a symbol foo that will override our original definition.

ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. ./test
libtest foo called
1
ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. ./testsym
libtest foo called
1

No surprises here ... both libraries call the "builtin" foo and everything is fine.

ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. LD_PRELOAD=./liboverride.so  ./test
override foo called
0
ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. LD_PRELOAD=./liboverride.so  ./testsym
libtest foo called
1

Now we see the difference. The preloaded foo was ignored in the second case, because the libtest was told to look within itself for symbols before looking outside.

-Bsymbolic is a rather big hammer to maintain symbol visiblity with however; it makes it all or nothing. Chances are you'd be better of with a version script. For example, with the following version script we can achieve the same thing.

ianw@lime:/tmp/symbolic$ cat Versions
{global: test_foo;  local: *; };
ianw@lime:/tmp/symbolic$ make
cc -shared -fPIC -Wl,-version-script=Versions -o libtestver.so libtest.c
cc -L. -ltestver -o testver test.c
ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. LD_PRELOAD=./liboverride.so ./testver
libtest foo called
1

You can see that since the version of foo in libtestver.so was local it got bound before the LD_PRELOAD, so has the same effect as -Bsymbolic.

Moral of the story? Version your symbols!

Update 2010-03-22 : see the update for more information on code optimisation with -Bsymbolic.

posted at: Thu, 10 Nov 2005 16:40 | in /code/c | permalink | add comment (2 others)

Stripping unused functions

benno raised an interesting question on getting rid of unused functions from a binary. If you link a group of object files together, the linker will be smart enough to leave out any files that it know doesn't get used. But what about functions that don't get used?

The best solution I could think of was to use -ffunction-sections and then --gc-sections to remove those sections that aren't used.

$ cat foo.c
int foo(void) {};
int bar(void) {};

$ cat main.c
int
main(void)
{
  foo();
  return 0;
};

$ gcc -c main.c
$ gcc -c -ffunction-sections foo.c
$ gcc -Wl,--gc-sections foo.o main.o -o program
$ objdump -d ./program | grep bar

This seems well out of scope for the linker to resolve, as it's job isn't to splice and dice text sections. The compiler doesn't really have enough information to see what is going on either. I think it would be possible to analyse the relocations in a group of pre-linked object files, and then compare that against a list of global functions in those same object files and if there isn't a relocation against it, prune the function out of the object.

I think the nicest way to do that would be with libelf wrappers for Python. But surely someone has looked at doing that before?

posted at: Wed, 09 Nov 2005 15:02 | in /code/c | permalink | add comment (0 others)

cdecl

cdecl is another tool that has been around since the days when Trumpet Winsock and SLIP were cutting edge (the README says 1996) which I just found a use for.

I find it handy for debugging things like

#include <stdio.h>
#include <setjmp.h>

jmp_buf j;

extern int* fun1(void);
extern int* fun2(void);
extern int* fun3(void);
int* foo(int *p);

int* foo (int *p)
{
	p = fun1();
	if (setjmp (j))
	   return p;

	   /* longjmp (j) may occur in fun3. */
	   p = fun3();
	   return p;
}
ianw@lime:/tmp$ gcc -g -c  -O2 -W -Wall -Wstrict-prototypes -Wmissing-prototypes -Werror -g -O3 jmp.c
cc1: warnings being treated as errors
jmp.c: In function 'foo':
jmp.c:11: warning: argument 'p' might be clobbered by 'longjmp' or 'vfork'

I can never remember the correct volatile for this situation

ianw@lime:~$ cdecl
cdecl> explain volatile int *p
declare p as pointer to volatile int
cdecl> explain int *volatile v
declare p as volatile pointer to int

So clearly the second version is the one I want, since I need to indicate that the pointer value might change underneath us. Nifty!

posted at: Wed, 02 Nov 2005 14:28 | in /code/c | permalink | add comment (0 others)

Investigating floats

float variables are, on most current hardware, instrumented as as single precision IEEE 754 value. We know that floating point values only approximate real values, but it is often hard to visualise. When we have the binary decimal 0.12345 we are really saying something like 1*10-1 + 2*10-2 + 3*10-3 .... Thus a binary decimal 0.10110 describes 1*2-1 + 0*2-2 + 1*2-3 ...

So clearly, with one decimal digit we can describe the range 0.0 - 0.9, but with one binary digit we can describe only 0 and 1/2 = 0.5. The fractions we can represent exactly are the dyadic rationals. Other rationals repeat (just like in base 10) and others are irrational. So for example, 0.1 can not be repesented exactly without infinite precision.

The program below illustrates how a single precision floating point value works by showing what bits are set and adding them up.

For example

$ ./float .5
0.500000 = 1 * (1/2^0) * 2^-1
0.500000 = 1 * (1/1) * 2^-1
0.500000 = 1 * 1 * 0.500000
0.500000 = 0.5
$ ./float .1
0.100000 = 1 * (1/2^0 + 1/2^1 + 1/2^4 + 1/2^5 + 1/2^8 + 1/2^9 + 1/2^12 + 1/2^13 + 1/2^16 + 1/2^17 + 1/2^20 + 1/2^21 + 1/2^23) * 2^-4
0.100000 = 1 * (1/1 + 1/2 + 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + 1/65536 + 1/131072 + 1/1048576 + 1/2097152 + 1/8388608) * 2^-4
0.100000 = 1 * 1.60000002384 * 0.062500
0.100000 = 0.10000000149

I use doubles to show how the errors creep in.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* return 2^n */
int two_to_pos(int n)
{
	if (n == 0)
		return 1;
	return 2 * two_to_pos(n - 1);
}

double two_to_neg(int n)
{
	if (n == 0)
		return 1;
	return 1.0 / (two_to_pos(abs(n)));
}

double two_to(int n)
{
	if (n >= 0)
		return two_to_pos(n);
	if (n < 0)
		return two_to_neg(n);
	return 0;
}

/* Go through some memory "m" which is the 24 bit significand of a
   floating point number.  We work "backwards" from the bits
   furthest on the right, for no particular reason. */
double calc_float(int m, int bit)
{
	/* 23 bits; this terminates recursion */
	if (bit > 23)
		return 0;

	/* if the bit is set, it represents the value 1/2^bit */
	if ((m >> bit) & 1)
		return 1.0L/two_to(23 - bit) + calc_float(m, bit + 1);

	/* otherwise go to the next bit */
	return calc_float(m, bit + 1);
}

int main(int argc, char *argv[])
{
	float f;
	int m,i,sign,exponent,significand;

	if (argc != 2)
	{
		printf("usage: float 123.456\n");
		exit(1);
	}

	if (sscanf(argv[1], "%f", &f) != 1)
	{
		printf("invalid input\n");
		exit(1);
	}

	/* We need to "fool" the compiler, as if we start to use casts
	   (e.g. (int)f) it will actually do a conversion for us.  We
	   want access to the raw bits, so we just copy it into a same
	   sized variable. */
	memcpy(&m, &f, 4);

	/* The sign bit is the first bit */
	sign = (m >> 31) & 0x1;

	/* Exponent is 8 bits following the sign bit */
	exponent = ((m >> 23) & 0xFF) - 127;

	/* Significand fills out the float, the first bit is implied
	   to be 1, hence the 24 bit OR value below. */
	significand = (m & 0x7FFFFF) | 0x800000;

	/* print out a power representation */
	printf("%f = %d * (", f, sign ? -1 : 1);
	for(i = 23 ; i >= 0 ; i--)
	{
		if ((significand >> i) & 1)
			printf("%s1/2^%d", (i == 23) ? "" : " + ",
			       23-i);
	}
	printf(") * 2^%d\n", exponent);

	/* print out a fractional representation */
	printf("%f = %d * (", f, sign ? -1 : 1);
	for(i = 23 ; i >= 0 ; i--)
	{
		if ((significand >> i) & 1)
			printf("%s1/%d", (i == 23) ? "" : " + ",
			       (int)two_to(23-i));
	}
	printf(") * 2^%d\n", exponent);

	/* convert this into decimal and print it out */
	printf("%f = %d * %.12g * %f\n",
	       f,
	       (sign ? -1 : 1),
	       calc_float(significand, 0),
	       two_to(exponent));

	/* do the math this time */
	printf("%f = %.12g\n",
	       f,
	       (sign ? -1 : 1) *
	       calc_float(significand, 0) *
	       two_to(exponent)
		);

	return 0;
}

posted at: Thu, 20 Oct 2005 16:20 | in /code/c | permalink | add comment (0 others)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.