Poking around inside the compiler

Shehjar, one of our students, came up with the interesting little

int i = 5;
printf("%d %d %d", i, i--, ++i);

and wondered exactly what it should do. Well, the C standard says that arguments to functions are evaluated first, but does not specify in what order they are evaluated. So this is a classic example of undefined behaviour (in fact it's even listed in the common list of not a bug not a bug by the GCC manual). In fact, ++ and friends don't actually create sequence points within the code (points where things are known to be evaluated), so they can play host to all sorts of undefined behaviour like above.

Of course, gcc doesn't leave you blind to this. If you turn on -Wall you get warning: operation on 'i' may be undefined. You do compile with -Wall right?

But this doesn't change the fact that that code prints 5,6,5 on x86 and on Itanium it prints 5,5,5. This suggested some investigation.

Firstly, lets simplify this down to

extern blah(int, int, int);

void function(void)
{
        int i = 5;
        blah(i, i--, ++i);
}

The problem is, looking at the output of the assembly doesn't tell us that much, because GCC is smart and by the time it has got to assembly, it is simply moving constant values into registers. So we need some way to peek into what GCC thinks it's doing to come up with those values.

Hence the magic -fdump flag. Building with this gives us dumps of the stages GCC is going through as it generates it's code.

$ gcc -fdump-tree-all -S -O ./test.c

This gives us lots of dumps, in numerical order as gcc is moving along. With modern GCC's, a good place to pick things up is to look is in the .gimple output file. Gimplification is the process GCC goes through optimising its internal tree structure (called GIMPLE after SIMPLE, apparently).

The .gimple file is pretty-printed from the tree representation back into C code. So, on i386 it looks like.

x86 $ cat test.c.t08.gimple

;; Function function (function)

function ()
{
  int i.0;
  int i;

  i = 5;
  i = i + 1;
  i.0 = i;
  i = i - 1;
  blah (i, i.0, i);
}

And on Itanium it looks like

ia64 $ cat test.c.t08.gimple
;; Function function (function)

function ()
{
  int i.0;
  int i;

  i = 5;
  i.0 = i;
  i = i - 1;
  i = i + 1;
  blah (i, i.0, i);
}

At this point, we can clearly see how gcc ends up with the idea that the code translates into 5,6,5 on x86 and 5,5,5 on Itanium. We also know that it is not the fault of optimisation, since we can see just what it is about to optimise matches the output we get. Why exactly it reaches this conclusion on the different architectures escapes me at the moment, but I'm slowing reading through and understanding the GCC code base hoping to one day find enlightenment :) Certainly passing arguments is very different on x86 compared to IA64 (on the stack rather than via registers) so no doubt some small piece of the backend config tweaks this into happening.

Another interesting thing to look at is the raw output of the tree gcc is using. You can see that via something like

$ gcc -S -fdump-tree-all-raw  -o test test.c

Which gives you an output like

;; Function function (function)

function ()
@1      function_decl    name: @2       type: @3       srcp: test.c:4
                         extern         body: @4
@2      identifier_node  strg: function lngt: 8
@3      function_type    size: @5       algn: 8        retn: @6
                         prms: @7
@4      bind_expr        type: @6       vars: @8       body: @9
@5      integer_cst      type: @10      low : 8
@6      void_type        name: @11      algn: 8
@7      tree_list        valu: @6
@8      var_decl         name: @12      type: @13      scpe: @1
                         srcp: test.c:6                artificial
                         size: @14      algn: 32       used: 1
@9      statement_list   0   : @15      1   : @16      2   : @17
                         3   : @18      4   : @19
@10     integer_type     name: @20      unql: @21      size: @22
                         algn: 64       prec: 36       unsigned
                         min : @23      max : @24
@11     type_decl        name: @25      type: @6       srcp: <built-in>:0
@12     identifier_node  strg: i.0      lngt: 3
@13     integer_type     name: @26      size: @14      algn: 32
                         prec: 32       min : @27      max : @28
@14     integer_cst      type: @10      low : 32
@15     modify_expr      type: @6       op 0: @29      op 1: @30
@16     modify_expr      type: @13      op 0: @29      op 1: @31
@17     modify_expr      type: @13      op 0: @8       op 1: @29
@18     modify_expr      type: @13      op 0: @29      op 1: @32
@19     call_expr        type: @13      fn  : @33      args: @34
@20     identifier_node  strg: bit_size_type           lngt: 13
@21     integer_type     name: @20      size: @22      algn: 64
                         prec: 36
@22     integer_cst      type: @10      low : 64
@23     integer_cst      type: @10      low : 0
@24     integer_cst      type: @10      low : -1
@25     identifier_node  strg: void     lngt: 4
@26     type_decl        name: @35      type: @13      srcp: <built-in>:0
@27     integer_cst      type: @13      high: -1       low : -2147483648
@28     integer_cst      type: @13      low : 2147483647
@29     var_decl         name: @36      type: @13      scpe: @1
                         srcp: test.c:5                size: @14
                         algn: 32       used: 1
@30     integer_cst      type: @13      low : 5
@31     plus_expr        type: @13      op 0: @29      op 1: @37
@32     minus_expr       type: @13      op 0: @29      op 1: @37
@33     addr_expr        type: @38      op 0: @39
@34     tree_list        valu: @29      chan: @40
@35     identifier_node  strg: int      lngt: 3
@36     identifier_node  strg: i        lngt: 1
@37     integer_cst      type: @13      low : 1
@38     pointer_type     size: @14      algn: 32       ptd : @41
@39     function_decl    name: @42      type: @41      srcp: test.c:1
                         undefined      extern
@40     tree_list        valu: @8       chan: @43
@41     function_type    size: @5       algn: 8        retn: @13
                         prms: @44
@42     identifier_node  strg: blah     lngt: 4
@43     tree_list        valu: @29
@44     tree_list        valu: @13      chan: @45
@45     tree_list        valu: @13      chan: @46
@46     tree_list        valu: @13      chan: @7

You can see the @X points to other nodes, which builds up to an entire tree structure of different nodes which represent the program. You can read more about the internal representation here.

Take home point? Using these dumps is a good way to start peering into what GCC is actually doing underneath its fairly extensive hood.