RSS | technovelty home | page of ian | ianw@ieee.org
It seems existing versions of gcc don't warn when you use an assignment as a truth value as the first operand to the conditional operator. For example:
int fn(int i)
{
return (i = 0x20) ? 0x40 : 0x80;
}
Presumably you meant ==, but unfortunately that does not warn with gcc (maybe it will one day).
We can use the tree dumping mechanisms (previously mentioned here) to confirm our problem. For the following function gcc creates:
fn (i)
{
int D.1541;
int iftmp.0;
i = 32;
if (1)
{
iftmp.0 = 64;
}
else
{
iftmp.0 = 128;
}
D.1541 = iftmp.0;
return D.1541;
}
As you can see, this isn't the result we wanted. This can be quite nasty if you use the conditional operator in a hidden fashion; for example behind an assert. A common idiom is:
#define ASSERT(x) (x) ? : fail() /* programmer now subsitutes a == for */ ASSERT(x = y);
You'll currently get no warning you've slipped up and used the assert wrong. Moral of the story: a quick audit of your code might turn up some surprising uses! However, in this case, the Intel compiler picks this one up:
$ icc -c test.c test.c(3): warning #187: use of "=" where "==" may have been intended return (i = 0x20) ? 0x40 : 0x80;
Although it's often not trivial to move to another compiler, as a rule I would recommend trying alternative compilers for your code as it's a great sanity check.
posted at: Sat, 26 Apr 2008 08:29 | in /code/c | permalink | add comment (1 others)
If you poke around the Linux VM layers you might wonder why there
is a STRICT_MM_TYPECHECKS option. It is there to stop
programmers touching opaque types. Linus has said
many times that a typedef should only be made for a
completely opaque type. This therefore implies that rather than
operate on it directly, there is some sort of "API" to get at the
values.
A simple example is below. my_long_t is opaque; the
only way you should use its value is via the my_val
accessor.
But how do we enforce this, when the type is really just an
unsigned long? We wrap it up in something else, in this case a
struct.
#ifndef STRICT
typedef unsigned long my_long_t;
# define my_val(x) (x)
#else
typedef struct { unsigned long t; } my_long_t;
# define my_val(x) ((x).t)
#endif
my_long_t my_naughty_function(my_long_t input)
{
return input * 1000;
}
We can now see the outcome
$ gcc -c strict.c $ gcc -DSTRICT -c strict.c strict.c: In function 'my_naughty_function': strict.c:15: error: invalid operands to binary *
So now we have an error where we do the wrong thing, rather than no notification at all! This niftily moves you up from "bug via visual inspection" to "won't compile" (Rusty Russell talked about this scale at linux.conf.au once -- does anyone have a reference to that talk?). Why not just leave it like this then? Well it might confuse the compiler; if you have rules about always passing structures on the stack, for example, the above will suck. In general, it's best to hide as little from the compiler as possible, to give it the best chance of doing a good job for you.
posted at: Fri, 30 Jun 2006 13:35 | in /code/c | permalink | add comment (1 others)
Paul Wayper writes about trampolines. This is something I've come across before, when I was learning about function pointers with IA64.
Nested functions are a strange contraption, and probably best avoided (people who should know agree). For example, here is a completely obfuscated way to double a number.
#include <stdio.h>
int function(int arg) {
int nested_function(int nested_arg) {
return arg + nested_arg;
}
return nested_function(arg);
}
int main(void)
{
printf("%d\n", function(10));
}
Let's disassemble that on IA64 (because it is so much easier to understand).
0000000000000000: 0: 00 10 15 08 80 05 [MII] alloc r34=ar.pfs,5,4,0 6: c0 80 33 7e 46 60 adds r12=-16,r12 c: 04 08 00 84 mov r35=r1 10: 01 00 00 00 01 00 [MII] nop.m 0x0 16: 10 02 00 62 00 80 mov r33=b0 1c: 04 00 01 84 mov r36=r32;; 20: 0a 70 40 18 00 21 [MMI] adds r14=16,r12;; 26: 00 00 39 20 23 e0 st4 [r14]=r32 2c: 01 70 00 84 mov r15=r14 30: 10 00 00 00 01 00 [MIB] nop.m 0x0 36: 00 00 00 02 00 00 nop.i 0x0 3c: 48 00 00 50 br.call.sptk.many b0=70 40: 0a 08 00 46 00 21 [MMI] mov r1=r35;; 46: 00 00 00 02 00 00 nop.m 0x0 4c: 20 02 aa 00 mov.i ar.pfs=r34 50: 00 00 00 00 01 00 [MII] nop.m 0x0 56: 00 08 05 80 03 80 mov b0=r33 5c: 01 61 00 84 adds r12=16,r12 60: 11 00 00 00 01 00 [MIB] nop.m 0x0 66: 00 00 00 02 00 80 nop.i 0x0 6c: 08 00 84 00 br.ret.sptk.many b0;; 0000000000000070 : 70: 0a 40 00 1e 10 10 [MMI] ld4 r8=[r15];; 76: 80 00 21 00 40 00 add r8=r32,r8 7c: 00 00 04 00 nop.i 0x0 80: 11 00 00 00 01 00 [MIB] nop.m 0x0 86: 00 00 00 02 00 80 nop.i 0x0 8c: 08 00 84 00 br.ret.sptk.many b0;;
Note that nothing funny happens there at all. The nested function
looks exactly like a real function, but we don't do any of the
alloc or anything to get us a new stack frame -- we're
using the stack frame of the old function. No trampoline involved,
just a straight branch.
So why do we need a trampoline? Because a nested function has an extra property over a normal function; the stack pointer must be set to be the stack pointer of the function it is nested within. This means that if we were to pass around pointers to the nested function, we would have to keep track that this isn't a pointer to just any old function, but a pointer to special function that needs its stack setup to come from the nested function. C doesn't work like this, there is only one type "pointer to function".
The easiest way is therefore to create a little function that sets the stack to the right place and calls the code. This "trampoline" is a normal function, so no need for trickery. For example, we can modify this to use a function pointer to the nested function:
#include <stdio.h>
int function(int arg) {
int nested_function(int nested_arg) {
return arg + nested_arg;
}
int (*function_pointer)(int arg) = nested_function;
return function_pointer(arg);
}
int main(void)
{
printf("%d\n", function(10));
}
We can see that now we are calling through a function pointer, we
go through __ia64_trampoline, which is a little code stub
in libgcc.a which loads up the right stack pointer and
calls the "nested" function.
function: .prologue 12, 33 .mmb .save ar.pfs, r34 alloc r34 = ar.pfs, 1, 3, 1, 0 .fframe 48 adds r12 = -48, r12 nop 0 .mmi addl r15 = @ltoffx(__ia64_trampoline#), r1 mov r35 = r1 .save rp, r33 mov r33 = b0 .body ;; .mmi nop 0 adds r8 = 24, r12 adds r14 = 16, r12 .mmi ld8.mov r15 = [r15], __ia64_trampoline# ;; st4 [r14] = r32 nop 0 .mmi mov r14 = r8 ;; st8 [r14] = r15, 8 adds r15 = 40, r12 ;; .mfi st8 [r14] = r15, 8 nop 0 addl r15 = @ltoff(@fptr(nested_function.1814#)), gp ;; .mmi ld8 r15 = [r15] ;; st8 [r14] = r15, 8 adds r15 = 16, r12 ;; .mmb st8 [r14] = r15 ld8 r14 = [r8], 8 nop 0 .mmi ld4 r36 = [r15] ;; nop 0 mov b6 = r14 .mbb ld8 r1 = [r8] nop 0 br.call.sptk.many b0 = b6 ;; .mii mov r1 = r35 mov ar.pfs = r34 mov b0 = r33 .mib nop 0 .restore sp adds r12 = 48, r12 br.ret.sptk.many b0 .endp function# .section .rodata.str1.8,"aMS",@progbits,1 .align 8
So, the summary is that if you're taking the address of a nested function, to avoid having different types of pointers to functions if they are nested or not gcc puts in the stub "trampoline" code.
posted at: Mon, 15 May 2006 23:53 | in /code/c | permalink | add comment (0 others)
Loop unrolling can be a fascinating art.
The canonical example of loop unrolling is something like
for (i = 0; i < 100; i++)
x[i] = i;
becoming
for (i = 0; i < 25; i+=4) {
x[i] = i;
x[i+1] = i+1;
x[i+2] = i+2;
x[i+3] = i+3;
}
Longer runs of instructions afford you a better chance of managing to find instruction level parallelism; that is instructions that can all happen together. You can keep the pipeline fuller and hence run faster, at the cost of some expanded code size.
But unlike that example, you generally do not know where your loop
ends. In the above example, we made four copies of the loop body,
reducing the iterations by 4. For an indeterminate number of
iterations, we would need to do the original loop n % 4
times and then the unrolled body floor(n / 4) times. So
to spell it out, if we want to do the loop 7 times, then we execute
the original loop 3 times and the unrolled portion once.
Our new code might look something like
int extras = n % 4;
for (i = 0; i < extras; i++)
x[i] += i;
for ( ; i < n / 4 ; i+=4) {
x[i] = i;
x[i+1] = i+1;
x[i+2] = i+2;
x[i+3] = i+3;
}
That leads to a little trick called "Duff's Device", references to which can be found most anywhere (however I found most of them a little unclear, hence this post).
Duff's device looks like this
send(to, from, count)
register short *to, *from;
register count;
{
register n=(count+7)/8;
switch(count%8) {
case 0: do { *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while(--n>0);
}
}
Now that doesn't look like loop unrolling, but it really is. In
this case, we have unwound a loop doing simply *to =
*from 8 times. But we still have that extra bit problem, since
we have an indeterminate count.
To understand it, let's just work through what happens for possible
inputs. For values of count from 1-7, n is
one and count % 8 is just count so we fall
right into the case statements, which fall through to each other
without break and never repeat. The nifty bit comes when
count is greater than 7. In this case, our first iteration comes from
the switch where we fall through to do the extra
count % 8 iterations. After that, we loop back to the
start of the unrolling and do our 8 step loop as many times as we
require.
Now by post-incrementing *to you can use this to
implement memcpy(). An interesting exercise might be to
see how much slower this is than the highly optimised
memcpy that comes with glibc.
The code is most certainly ingenious and this page explains some of the history of it. Today I think it is mostly good as an exercise on "why it works".
If you're wondering how gcc handles unrolling, it creates the
unrolled loop and the does a sort of fall through test which evaluates
how many extra loops are required and branches into the unrolled
portion at the correct point. A little extract is below, you can see
r14 holds the number of extra loops required and branches
higher up the unrolled code for each extra loop required.
... 70: [MFB] cmp4.eq p6,p7=0,r14 76: nop.f 0x0 7c: (p06) br.cond.dpnt.few 150;; 80: [MFB] cmp4.eq p6,p7=1,r14 86: nop.f 0x0 8c: (p06) br.cond.dpnt.few 130 ;; 90: [MFB] cmp4.eq p6,p7=2,r14 96: nop.f 0x0 9c: (p06) br.cond.dpnt.few 120 ;; a0: [MFB] cmp4.eq p6,p7=3,r14 a6: nop.f 0x0 ac: (p06) br.cond.dpnt.few 110 ;; b0: [MFB] cmp4.eq p6,p7=4,r14 b6: nop.f 0x0 bc: (p06) br.cond.dpnt.few 100 ;; c0: [MFB] cmp4.eq p6,p7=5,r14 c6: nop.f 0x0 cc: (p06) br.cond.dpnt.few f0 ;; d0: [MMI] cmp4.eq p6,p7=6,r14;; d6: (p07) st4 [r16]=r17,4 dc: nop.i 0x0 ...
posted at: Fri, 24 Mar 2006 09:32 | in /code/c | permalink | add comment (0 others)
Shehjar, one of our students, came up with the interesting little
int i = 5;
printf("%d %d %d", i, i--, ++i);
and wondered exactly what it should do. Well, the C standard says
that arguments to functions are evaluated first, but does not specify
in what order they are evaluated. So this is a classic example
of undefined behaviour (in fact it's even listed in the common list of
not a bug not
a bug by the GCC manual). In fact, ++ and friends
don't actually create sequence points within the code (points where
things are known to be evaluated), so they can play host to all
sorts of undefined behaviour like above.
Of course, gcc doesn't leave you blind to this. If you turn on
-Wall you get warning: operation on 'i' may be
undefined. You do compile with -Wall right?
But this doesn't change the fact that that code prints
5,6,5 on x86 and on Itanium it prints 5,5,5.
This suggested some investigation.
Firstly, lets simplify this down to
extern blah(int, int, int);
void function(void)
{
int i = 5;
blah(i, i--, ++i);
}
The problem is, looking at the output of the assembly doesn't tell us that much, because GCC is smart and by the time it has got to assembly, it is simply moving constant values into registers. So we need some way to peek into what GCC thinks it's doing to come up with those values.
Hence the magic -fdump flag. Building with this gives
us dumps of the stages GCC is going through as it generates it's code.
$ gcc -fdump-tree-all -S -O ./test.c
This gives us lots of dumps, in numerical order as gcc is moving
along. With modern GCC's, a good place to pick things up is to look
is in the .gimple output file. Gimplification is
the process GCC goes through optimising its internal tree structure
(called GIMPLE after SIMPLE,
apparently).
The .gimple file is pretty-printed from the tree
representation back into C code. So, on i386 it looks like.
x86 $ cat test.c.t08.gimple
;; Function function (function)
function ()
{
int i.0;
int i;
i = 5;
i = i + 1;
i.0 = i;
i = i - 1;
blah (i, i.0, i);
}
And on Itanium it looks like
ia64 $ cat test.c.t08.gimple
;; Function function (function)
function ()
{
int i.0;
int i;
i = 5;
i.0 = i;
i = i - 1;
i = i + 1;
blah (i, i.0, i);
}
At this point, we can clearly see how gcc ends up with the idea
that the code translates into 5,6,5 on x86 and
5,5,5 on Itanium. We also know that it is not the fault
of optimisation, since we can see just what it is about to
optimise matches the output we get. Why exactly it reaches this
conclusion on the different architectures escapes me at the moment,
but I'm slowing reading through and understanding the GCC code base
hoping to one day find enlightenment :) Certainly passing arguments is
very different on x86 compared to IA64 (on the stack rather than via
registers) so no doubt some small piece of the backend config tweaks
this into happening.
Another interesting thing to look at is the raw output of the tree gcc is using. You can see that via something like
$ gcc -S -fdump-tree-all-raw -o test test.c
Which gives you an output like
;; Function function (function)
function ()
@1 function_decl name: @2 type: @3 srcp: test.c:4
extern body: @4
@2 identifier_node strg: function lngt: 8
@3 function_type size: @5 algn: 8 retn: @6
prms: @7
@4 bind_expr type: @6 vars: @8 body: @9
@5 integer_cst type: @10 low : 8
@6 void_type name: @11 algn: 8
@7 tree_list valu: @6
@8 var_decl name: @12 type: @13 scpe: @1
srcp: test.c:6 artificial
size: @14 algn: 32 used: 1
@9 statement_list 0 : @15 1 : @16 2 : @17
3 : @18 4 : @19
@10 integer_type name: @20 unql: @21 size: @22
algn: 64 prec: 36 unsigned
min : @23 max : @24
@11 type_decl name: @25 type: @6 srcp: <built-in>:0
@12 identifier_node strg: i.0 lngt: 3
@13 integer_type name: @26 size: @14 algn: 32
prec: 32 min : @27 max : @28
@14 integer_cst type: @10 low : 32
@15 modify_expr type: @6 op 0: @29 op 1: @30
@16 modify_expr type: @13 op 0: @29 op 1: @31
@17 modify_expr type: @13 op 0: @8 op 1: @29
@18 modify_expr type: @13 op 0: @29 op 1: @32
@19 call_expr type: @13 fn : @33 args: @34
@20 identifier_node strg: bit_size_type lngt: 13
@21 integer_type name: @20 size: @22 algn: 64
prec: 36
@22 integer_cst type: @10 low : 64
@23 integer_cst type: @10 low : 0
@24 integer_cst type: @10 low : -1
@25 identifier_node strg: void lngt: 4
@26 type_decl name: @35 type: @13 srcp: <built-in>:0
@27 integer_cst type: @13 high: -1 low : -2147483648
@28 integer_cst type: @13 low : 2147483647
@29 var_decl name: @36 type: @13 scpe: @1
srcp: test.c:5 size: @14
algn: 32 used: 1
@30 integer_cst type: @13 low : 5
@31 plus_expr type: @13 op 0: @29 op 1: @37
@32 minus_expr type: @13 op 0: @29 op 1: @37
@33 addr_expr type: @38 op 0: @39
@34 tree_list valu: @29 chan: @40
@35 identifier_node strg: int lngt: 3
@36 identifier_node strg: i lngt: 1
@37 integer_cst type: @13 low : 1
@38 pointer_type size: @14 algn: 32 ptd : @41
@39 function_decl name: @42 type: @41 srcp: test.c:1
undefined extern
@40 tree_list valu: @8 chan: @43
@41 function_type size: @5 algn: 8 retn: @13
prms: @44
@42 identifier_node strg: blah lngt: 4
@43 tree_list valu: @29
@44 tree_list valu: @13 chan: @45
@45 tree_list valu: @13 chan: @46
@46 tree_list valu: @13 chan: @7
You can see the @X points to other nodes, which builds
up to an entire tree structure of different nodes which represent the
program. You can read more about the internal representation here.
Take home point? Using these dumps is a good way to start peering into what GCC is actually doing underneath its fairly extensive hood.
posted at: Tue, 07 Feb 2006 15:45 | in /code/c | permalink | add comment (2 others)
Chatting with Hal, one of our summer students, we came across a bevy of interesting things to do with inlining code.
inline can only be considered
a hint, by the standard. gcc may choose not to inline
things marked inline for a number of reasons.always_inline
only applies to functions that are already declared with the
inline function specifier.always_inline will inline even with optimisation
turned off. The kernel wants this on all the time so has
#define inline inline __attribute__((always_inline))
-O)
-finline-functions does nothing. This is because gcc
doesn't have enough information to analyse the functions and decide
whether to put things inline or not.
static inline void stack_user1(void)
{
int blah[1000];
}
static inline void stack_user2(void)
{
int blah[1000];
}
/* we use a lot of stack without realising it */
int caller(void)
{
stack_user1();
stack_user2();
}
inline as a macro with
the advantage of type checking.posted at: Thu, 15 Dec 2005 15:44 | in /code/c | permalink | add comment (0 others)
The canonical definition of max is something like
#define max(a,b) ( ( (a) >= (b) ) ? (a) : (b))
But this raises many issues with types in C. What if you pass in
an unsigned int and a signed int all at the
same time? Under what circumstances is the comparison valid? What
type should be returned?
The kernel gets around this by being tricky
/*
* min()/max() macros that also do
* strict type-checking.. See the
* "unnecessary" pointer comparison.
*/
#define min(x,y) ({ \
typeof(x) _x = (x); \
typeof(y) _y = (y); \
(void) (&_x == &_y); \
_x < _y ? _x : _y; })
The give you a clue to how this works by talking about the pointer
comparison. C99 6.5.9.2 says that for the == comparison
Both operands are pointers to qualified or unqualified version of compatible types
Where compatible types are defined in 6.7.2.1
Two types have compatible type if their types are the same.
There are additional rules, but gcc is right to warn
you when you compare pointers of incompatible types. Note that if the
inputs are not pointers, but arithmetic types, the C99 standard
says that normal arithmetic conversions apply. You can see that the
outcome of this is really probably not what you want, but the compiler
has no reason to warn you.
Thus the above code makes a warning spew out if you try to compare values of incompatible types. You can try it yourself
$ cat test.c
#include <stdio.h>
int main(void)
{
int i = -100;
unsigned int j = 100;
if (&i == &j);
if (i < j)
printf("OK\n");
if (i > j)
printf("i is -100, why is that > 100?\n");
return 0;
}
$ gcc -Wall -o test test.c
test.c: In function 'main':
test.c:8: warning: comparison of distinct pointer types lacks a cast
$ ./test
i is -100, why is that > 100?
Not the most straight forward warning in the world, but sufficient to let you know you're doing something wrong. Note we don't get any warning where the problem really is, except we get a value we don't expect. C certainly can be a very tricky language to get right!
posted at: Thu, 08 Dec 2005 17:19 | in /code/c | permalink | add comment (0 others)
-Bsymbolic is described by the ld man
page as
When creating a shared library, bind references to global symbols to the definition within the shared library, if any. Normally, it is possible for a program linked against a shared library to override the definition within the shared library. This option is only meaningful on ELF platforms which support shared libraries.
Indeed, it is interesting to peer under the hood a little. In
fact, the only thing the -Bsymbolic flag does when
building a shared library is add a flag in the dynamic section of the
binary called DT_SYMBOLIC. Having a look at the ABI
description of the flag is a little more instructive.
This element's presence in a shared object library alters the dynamic linker's symbol resolution algorithm for references within the library. Instead of starting a symbol search with the executable file, the dynamic linker starts from the shared object itself. If the shared object fails to supply the referenced symbol, the dynamic linker then searches the executable file and other shared objects as usual.
Ahh, now we can have a peek into the dynamic loader to see what
exactly it does with this flag. Pull open the glibc source and have a
look at elf/dl-load.c,
particularly at the bit in _dl_map_obj_from_fd where we have
/* If this object has DT_SYMBOLIC set modify now its scope. We don't
have to do this for the main map. */
if ((mode & RTLD_DEEPBIND) == 0
&& __builtin_expect (l->l_info[DT_SYMBOLIC] != NULL, 0)
&& &l->l_searchlist != l->l_scope[0])
{
/* Create an appropriate searchlist. It contains only this map.
This is the definition of DT_SYMBOLIC in SysVr4. */
l->l_symbolic_searchlist.r_list =
(struct link_map **) malloc (sizeof (struct link_map *));
if (l->l_symbolic_searchlist.r_list == NULL)
{
errstring = N_("cannot create searchlist");
goto call_lose_errno;
}
l->l_symbolic_searchlist.r_list[0] = l;
l->l_symbolic_searchlist.r_nlist = 1;
/* Now move the existing entries one back. */
memmove (&l->l_scope[1], &l->l_scope[0],
(l->l_scope_max - 1) * sizeof (l->l_scope[0]));
/* Now add the new entry. */
l->l_scope[0] = &l->l_symbolic_searchlist;
}
So we can see (in typical glibcistic code) that we insert ourselves at the beginning of the link map, and move everything else along further down the chain.
Finally, an example to illustrate. It's a bit long, but worth understanding.
ianw@lime:/tmp/symbolic$ cat Makefile
all: test testsym
clean:
rm -f *.so test testsym
liboverride.so : override.c
$(CC) -shared -fPIC -o liboverride.so override.c
libtest.so : libtest.c
$(CC) -shared -fPIC -o libtest.so libtest.c
libtestsym.so : libtest.c
$(CC) -shared -fPIC -Wl,-Bsymbolic -o libtestsym.so libtest.c
test : test.c libtest.so liboverride.so
$(CC) -L. -ltest -o test test.c
testsym : test.c libtestsym.so liboverride.so
$(CC) -L. -ltestsym -o testsym test.c
ianw@lime:/tmp/symbolic$ cat libtest.c
#include
int foo(void) {
printf("libtest foo called\n");
return 1;
}
int test_foo(void)
{
return foo();
}
ianw@lime:/tmp/symbolic$ cat override.c
#include
int foo(void)
{
printf("override foo called\n");
return 0;
}
ianw@lime:/tmp/symbolic$ cat test.c
#include
int main(void)
{
printf("%d\n", test_foo());
}
ianw@lime:/tmp/symbolic$ make
cc -shared -fPIC -o libtest.so libtest.c
cc -shared -fPIC -o liboverride.so override.c
cc -L. -ltest -o test test.c
cc -shared -fPIC -Wl,-Bsymbolic -o libtestsym.so libtest.c
cc -L. -ltestsym -o testsym test.c
So we have an application, a library (compiled once with
-Bsymbolic and once without) and another library with a
symbol foo that will override our original definition.
ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. ./test libtest foo called 1 ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. ./testsym libtest foo called 1
No surprises here ... both libraries call the "builtin" foo and everything is fine.
ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. LD_PRELOAD=./liboverride.so ./test override foo called 0 ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. LD_PRELOAD=./liboverride.so ./testsym libtest foo called 1
Now we see the difference. The preloaded foo was
ignored in the second case, because the libtest was told
to look within itself for symbols before looking outside.
-Bsymbolic is a rather big hammer to maintain symbol
visiblity with however; it makes it all or nothing. Chances are you'd
be better of with a version script. For example, with the following
version script we can achieve the same thing.
ianw@lime:/tmp/symbolic$ cat Versions
{global: test_foo; local: *; };
ianw@lime:/tmp/symbolic$ make
cc -shared -fPIC -Wl,-version-script=Versions -o libtestver.so libtest.c
cc -L. -ltestver -o testver test.c
ianw@lime:/tmp/symbolic$ LD_LIBRARY_PATH=. LD_PRELOAD=./liboverride.so ./testver
libtest foo called
1
You can see that since the version of foo in
libtestver.so was local it got bound before the
LD_PRELOAD, so has the same effect as
-Bsymbolic.
Moral of the story? Version your symbols!
posted at: Thu, 10 Nov 2005 16:40 | in /code/c | permalink | add comment (0 others)
benno raised an interesting question on getting rid of unused functions from a binary. If you link a group of object files together, the linker will be smart enough to leave out any files that it know doesn't get used. But what about functions that don't get used?
The best solution I could think of was to use
-ffunction-sections and then --gc-sections
to remove those sections that aren't used.
$ cat foo.c
int foo(void) {};
int bar(void) {};
$ cat main.c
int
main(void)
{
foo();
return 0;
};
$ gcc -c main.c
$ gcc -c -ffunction-sections foo.c
$ gcc -Wl,--gc-sections foo.o main.o -o program
$ objdump -d ./program | grep bar
This seems well out of scope for the linker to resolve, as it's job isn't to splice and dice text sections. The compiler doesn't really have enough information to see what is going on either. I think it would be possible to analyse the relocations in a group of pre-linked object files, and then compare that against a list of global functions in those same object files and if there isn't a relocation against it, prune the function out of the object.
I think the nicest way to do that would be with libelf
wrappers for Python. But surely someone has looked at doing that
before?
posted at: Wed, 09 Nov 2005 15:02 | in /code/c | permalink | add comment (0 others)
cdecl is another tool that has been around since the
days when Trumpet Winsock and SLIP were cutting edge (the README says
1996) which I just found a use for.
I find it handy for debugging things like
#include <stdio.h>
#include <setjmp.h>
jmp_buf j;
extern int* fun1(void);
extern int* fun2(void);
extern int* fun3(void);
int* foo(int *p);
int* foo (int *p)
{
p = fun1();
if (setjmp (j))
return p;
/* longjmp (j) may occur in fun3. */
p = fun3();
return p;
}
ianw@lime:/tmp$ gcc -g -c -O2 -W -Wall -Wstrict-prototypes -Wmissing-prototypes -Werror -g -O3 jmp.c
cc1: warnings being treated as errors
jmp.c: In function 'foo':
jmp.c:11: warning: argument 'p' might be clobbered by 'longjmp' or 'vfork'
I can never remember the correct volatile for this situation
ianw@lime:~$ cdecl cdecl> explain volatile int *p declare p as pointer to volatile int cdecl> explain int *volatile v declare p as volatile pointer to int
So clearly the second version is the one I want, since I need to indicate that the pointer value might change underneath us. Nifty!
posted at: Wed, 02 Nov 2005 14:28 | in /code/c | permalink | add comment (0 others)
float variables are, on most current hardware,
instrumented as as single precision IEEE 754 value. We know that
floating point values only approximate real values, but it is often
hard to visualise. When we have the binary decimal 0.12345 we are
really saying something like 1*10-1 +
2*10-2 + 3*10-3 .... Thus a binary
decimal 0.10110 describes 1*2-1 + 0*2-2 +
1*2-3 ...
So clearly, with one decimal digit we can describe the range
0.0 - 0.9, but with one binary digit we can describe only
0 and 1/2 = 0.5. The fractions we can
represent exactly are the dyadic
rationals. Other rationals repeat (just like in base 10) and
others are irrational. So for example, 0.1 can not be
repesented exactly without infinite precision.
The program below illustrates how a single precision floating point value works by showing what bits are set and adding them up.
For example
$ ./float .5 0.500000 = 1 * (1/2^0) * 2^-1 0.500000 = 1 * (1/1) * 2^-1 0.500000 = 1 * 1 * 0.500000 0.500000 = 0.5 $ ./float .1 0.100000 = 1 * (1/2^0 + 1/2^1 + 1/2^4 + 1/2^5 + 1/2^8 + 1/2^9 + 1/2^12 + 1/2^13 + 1/2^16 + 1/2^17 + 1/2^20 + 1/2^21 + 1/2^23) * 2^-4 0.100000 = 1 * (1/1 + 1/2 + 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + 1/65536 + 1/131072 + 1/1048576 + 1/2097152 + 1/8388608) * 2^-4 0.100000 = 1 * 1.60000002384 * 0.062500 0.100000 = 0.10000000149
I use doubles to show how the errors creep in.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* return 2^n */
int two_to_pos(int n)
{
if (n == 0)
return 1;
return 2 * two_to_pos(n - 1);
}
double two_to_neg(int n)
{
if (n == 0)
return 1;
return 1.0 / (two_to_pos(abs(n)));
}
double two_to(int n)
{
if (n >= 0)
return two_to_pos(n);
if (n < 0)
return two_to_neg(n);
return 0;
}
/* Go through some memory "m" which is the 24 bit significand of a
floating point number. We work "backwards" from the bits
furthest on the right, for no particular reason. */
double calc_float(int m, int bit)
{
/* 23 bits; this terminates recursion */
if (bit > 23)
return 0;
/* if the bit is set, it represents the value 1/2^bit */
if ((m >> bit) & 1)
return 1.0L/two_to(23 - bit) + calc_float(m, bit + 1);
/* otherwise go to the next bit */
return calc_float(m, bit + 1);
}
int main(int argc, char *argv[])
{
float f;
int m,i,sign,exponent,significand;
if (argc != 2)
{
printf("usage: float 123.456\n");
exit(1);
}
if (sscanf(argv[1], "%f", &f) != 1)
{
printf("invalid input\n");
exit(1);
}
/* We need to "fool" the compiler, as if we start to use casts
(e.g. (int)f) it will actually do a conversion for us. We
want access to the raw bits, so we just copy it into a same
sized variable. */
memcpy(&m, &f, 4);
/* The sign bit is the first bit */
sign = (m >> 31) & 0x1;
/* Exponent is 8 bits following the sign bit */
exponent = ((m >> 23) & 0xFF) - 127;
/* Significand fills out the float, the first bit is implied
to be 1, hence the 24 bit OR value below. */
significand = (m & 0x7FFFFF) | 0x800000;
/* print out a power representation */
printf("%f = %d * (", f, sign ? -1 : 1);
for(i = 23 ; i >= 0 ; i--)
{
if ((significand >> i) & 1)
printf("%s1/2^%d", (i == 23) ? "" : " + ",
23-i);
}
printf(") * 2^%d\n", exponent);
/* print out a fractional representation */
printf("%f = %d * (", f, sign ? -1 : 1);
for(i = 23 ; i >= 0 ; i--)
{
if ((significand >> i) & 1)
printf("%s1/%d", (i == 23) ? "" : " + ",
(int)two_to(23-i));
}
printf(") * 2^%d\n", exponent);
/* convert this into decimal and print it out */
printf("%f = %d * %.12g * %f\n",
f,
(sign ? -1 : 1),
calc_float(significand, 0),
two_to(exponent));
/* do the math this time */
printf("%f = %.12g\n",
f,
(sign ? -1 : 1) *
calc_float(significand, 0) *
two_to(exponent)
);
return 0;
}
posted at: Thu, 20 Oct 2005 16:20 | in /code/c | permalink | add comment (0 others)
A sometimes handy trick is registering your own specifiers for
printf. Although it is GNU only, it is still a handy
trick. For example, glibc comes with a built-in
extension to reduce a float for you, e.g.
$ cat test.c #include#include int main(void) { float bytes = 1024.0 * 1024.0 * 1024.0; register_printf_function ('b', printf_size, printf_size_info); printf("%.0bB\n", bytes); } $ gcc -o test test.c $ ./test 1gB
If we register the handler with a capital letter, the base will be 1000, otherwise it is 1024. It is unfortunate that this doesn't appear to support SI units, but it is still handy (and may one day).
The glibc manual has full details on customising printf further. You can easily add a specifier for your own structures, etc.
posted at: Wed, 19 Oct 2005 10:02 | in /code/c | permalink | add comment (0 others)
Is
int even(int n)
{
return (n & 1);
}
a valid way of determining the evenness of a number? For unsigned
integers, yes. For signed positive integers also yes. However,
C99 6.2.6.2-2 states
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. [...] If the sign bit is one, the value shall be modified in one of the following ways
- the corresponding value with the sign bit 0 is negated (sign and magnitude);
- the sign bit has the value -(2N)(two's complement);
- the sign bit has the value -(2N - 1)(one's complement).
In other words, there are three possible representations of a signed integer in C, and one of them (one's complement) has a negative representation that fails the above test. However, I don't think there are actually any one's complement machines, so you're pretty safe.
However, Murphy's law dictates that eventually your code will run
on some bizarre device that uses one's complement. Thus the best bet
is to use a remainder (%) and let the compiler figure it
out for you. Have a look at what the compiler gets up to for the most
straight forward case:
$ cat test.c
int function(int i)
{
return (i % 2);
}
$ gcc -O2 -S test.c
$ cat test.s
.file "test.c"
.text
.p2align 4,,15
.globl function
.type function, @function
function:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
andl $-2147483647, %eax
js .L5
popl %ebp
ret
.p2align 4,,15
.L5:
decl %eax
orl $-2, %eax
incl %eax
popl %ebp
ret
.size function, .-function
.ident "GCC: (GNU) 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)"
.section .note.GNU-stack,"",@progbits
It's a bit tricky because that remainder needs to account for
negative values. So $-2147483657 == -0x7FFFFFFF, which
in two's complement is binary 1000 0000 0000 0000 0000 0000 0000
0001. So the andl checks the last bit just like
we did before, however having the top bit set means that if the top
bit is set in the other operand (i.e. the sign bit, indicating it is a
negative number), the sign flag will be set and the js
(jump on sign bit) will be triggered (that next bit at
.L5 is a consequence of C99 6.5.5 6 where we
have the relation (a/b)*b + a%b == a).
We can reduce the even function down to
$ cat test.c
int function(int i)
{
return ((unsigned int)i % 2);
}
$ gcc -O2 -S test.c
$ cat test.s
.file "test.c"
.text
.p2align 4,,15
.globl function
.type function, @function
function:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
andl $1, %eax
popl %ebp
ret
.size function, .-function
.ident "GCC: (GNU) 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)"
.section .note.GNU-stack,"",@progbits
Which is the same as the original on this two's complement x86. In
this trivial case the cast to unsigned could have been
used with the original & 1 check and would have been OK,
but my feeling is this the % is a little more clear (and
if you need to worry about negative values your hand is forced).
I think the moral is to be worried whenever you pull out a bitwise operator and try to pass it off the work to the compiler (with a quick sanity check).
posted at: Wed, 14 Sep 2005 09:51 | in /code/c | permalink | add comment (0 others)
When you know about buffering it's trivial, but when you don't you can get some very unexpected behaviour from UNIX.
For example, the following program will print nothing.
#includeint main(void) { printf("hello world"); while (1) sleep(1); }
This is because by default your stdout stream (which printf uses)
will be in line buffering mode, where output will not be
printed until a newline (\n) is seen.
You can either add the newline (as was probably intended, but it's
an easy enough thing to leave off!) or manually call
fflush to flush the buffer.
There are three buffer modes available; unbuffered,
line and block buffered. You can access them with the
setbuf call. Unbuffered is the default for
stderr, while your tty by default takes on line
buffering. Anything connected to a block device (e.g. a file on a
disk, or pipes) take on block buffering, where the best amounts of
data for the device are written at once.
But essentially, if your output is missing check for those newlines!
posted at: Fri, 26 Aug 2005 23:51 | in /code/c | permalink | add comment (0 others)
According to recent
discussions sched_yield can cause performance
problems with some code.
I haven't looked at the OpenLDAP code, but a common problem is that
if you call pthread_cond_signal it doesn't actually
relinquish control from the current thread to the thread (if any) that
has been signaled. The thread that has been signalled has to wait for
its next time slice before it will realise it has been woken up, and
the signalling thread loops around the mutex for the conditional doing
nothing. Thus a common idiom is to call
pthread_cond_signal and then sched_yield to
relinquish control from your thread to the waiting thread (I've
certainly done this myself).
This can lead to very bad performance under certain circumstances
since sched_yield on Linux is saying to the scheduler "I
have nothing important to do". This may actually be the opposite of
what is really true, as usually developers see this problem when there
is so much work to do you don't get a chance to relinquish control and
make some forward progress with it. This is especially bad on a SMP
machine where you probably don't want to relinquish control at all,
since you don't need to give up your timeslice for other threads to
start running on other processors; what allows you forward progress on
a uniprocessor might be slowing you down needlessly on a SMP machine.
On some other systems sched_yield will relinquish control
to another thread in your thread group, but this is not specified by
the standard so you can not depend on it.
One solution to such a problem is presented below. This is a fairly
standard situation where are request is made and needs to be handled.
To avoid using sched_yield one needs to queue the
requests for a dispatcher to handle and take an extra lock that it is
possible to sleep on when required. On a uniprocessor, when requests
are flooding in request() will just be looping queueing
work until its timeslice is complete. The typical idiom is to place a
yield after the unlock of dispatcher.mutex to allow the
dispatcher to do some work. However by adding another lock you can
make request sleep, giving the dispatcher time to clear
the queue. You will notice if you include the usleep
that everything happens nice and sequentially on uniprocessor or SMP;
this is because the dispatcher gets a chance to run as the requests go
to sleep. Conversely without the sleep the requests are flooding in
as fast as possible; to keep your request handling latency reasonable
you don't want to be giving up the CPU to every Tom, Dick and Harry
every time you handle a new request!
Compile this once as
$ gcc -Wall -o server server.c -lpthread
to see the problem, then once as
$ gcc -Wall -o server-resched server.c -lpthread -DRESCHED
to see the dispatcher handling the batched requests. I believe
this is a good solution to minimise latency below a CPU timeslice when
requests are flooding, without having to "fool" the scheduler with
sched_yield. Something to keep in mind is that if you
find yourself using sched_yield when you may very well
have more useful work to do, then chances are you're doing the wrong
thing.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <errno.h>
struct request_tag_t {
struct request_tag_t *next;
int operation;
};
struct dispatcher_tag_t {
struct request_tag_t *first;
struct request_tag_t *last;
int running;
pthread_mutex_t mutex;
pthread_cond_t request;
pthread_mutex_t resched;
};
struct dispatcher_tag_t dispatcher = {
.first = NULL,
.last = NULL,
.running = 0,
.mutex = PTHREAD_MUTEX_INITIALIZER,
.request = PTHREAD_COND_INITIALIZER,
.resched = PTHREAD_MUTEX_INITIALIZER,
};
int rand_num(int range)
{
return (int)((range)*1.0*rand()/(RAND_MAX+1.0));
}
void *dispatcher_routine(void *arg)
{
struct request_tag_t *request;
printf("dispatcher starting\n");
while (1)
{
/* take the dispatcher queue mutex */
pthread_mutex_lock(&dispatcher.mutex);
/* if nothing to do, wait on the dispatcher queue
* condition variable until there is */
while (dispatcher.first == NULL)
pthread_cond_wait(&dispatcher.request, &dispatcher.mutex);
/* take a request from the queue */
request = dispatcher.first;
dispatcher.first = request->next;
if (dispatcher.first == NULL)
dispatcher.last = NULL;
/* handle request, probably by handling it in a new thread. */
printf("I'm dispatching request %d\n", request->operation);
/* free request we have just finished with */
free(request);
/* release queue lock */
pthread_mutex_unlock(&dispatcher.mutex);
#ifdef RESCHED
/* release the resched lock to signal we've seen the queue */
pthread_mutex_unlock(&dispatcher.resched);
#endif
}
}
int tries = 1;
void request(int operation)
{
struct request_tag_t *request;
/* take the dispatcher queue mutex */
pthread_mutex_lock(&dispatcher.mutex);
/* start the thread if it isn't */
if (!dispatcher.running)
{
pthread_t thread;
printf("dispatcher not running, starting\n");
dispatcher.running = 1;
pthread_create(&thread, NULL, dispatcher_routine, NULL);
}
/* allocate a new request */
request = malloc(sizeof(struct request_tag_t));
if (request == NULL)
exit(1);
request->next = NULL;
request->operation = operation;
/* add to the queue, maintaing first and last */
if (dispatcher.first == NULL) {
dispatcher.first = request;
dispatcher.last = request;
} else {
(dispatcher.last)->next = request;
dispatcher.last = request;
}
/* signal something to do */
pthread_cond_signal(&dispatcher.request);
/* free the queue lock */
pthread_mutex_unlock(&dispatcher.mutex);
/* temping to put a sched_yield() here */
#ifdef RESCHED
/* try the resched lock. if we can't have it, that means
* we've already got it and the dispatch thread hasn't had a
* chance to run and unlock it. We queue up to 10 requests
* before allowing the dispatcher to run by sleeping on the
* mutex
*/
if (pthread_mutex_trylock(&dispatcher.resched) == EBUSY) {
if (tries++ == 10) {
pthread_mutex_lock(&dispatcher.resched);
tries = 1;
}
} else
tries = 1;
#endif
}
int main(void) {
while(1)
{
int n = rand_num(3);
printf("requesting %d\n", n);
request(n);
/* if you uncomment this, everything should play
* nicely with one dispatch per request, because other
* threads get a chance to run. Otherwise we have
* requests flooding in as fast as possible
*/
/* usleep(500000); */
}
}
posted at: Tue, 23 Aug 2005 10:36 | in /code/c | permalink | add comment (0 others)

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.