Superblock scheduling and tail duplication

I hate it when compiler people start talking and I have no idea what they are on about. It turns out if you take the time to read their papers, you can glean a little of the black art that is compilers.

Superblocks and tail duplication are quite important on an architecture like IA64, because the compiler has to find as much instruction level parallelism (ILP) as possible. What this means is that the longer the run of instructions you can queue up that don't depend on each other, the better. As soon as one instruction depends on the result of another, you have to introduce a stop, which slows things down and leaves the processor waiting around doing nothing.
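
As a rough illustration (a made-up C fragment, not anything IA64-specific; the function and variable names are just placeholders), the two loads below don't depend on each other, so the compiler is free to schedule them together, but the add needs both results, so a stop has to go in front of it:

int sum_pair(const int *a, const int *b, int i)
{
    int x = a[i];   /* independent load                                  */
    int y = b[i];   /* independent load: can be scheduled with the first */
    return x + y;   /* depends on both loads: a stop is needed before it */
}

The more work like x and y the compiler can pile up before it is forced to place a stop, the less time the processor spends idle.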

So a superblock is an abstraction the compiler builds internally by finding blocks of instructions that are likely to execute sequentially. A superblock has one entry point, but may have multiple exit points. The compiler can create superblocks either by static analysis of the code (guessing at compile time which code paths are likely; this is an interesting problem worthy of hours on its own) or by looking at profile information gathered from running the code (i.e. which code paths are followed under real conditions). Tail duplication is an extension that makes even bigger superblocks by essentially copying code. An example might illustrate:

function()
{
  do {
      /* entry */
      if (something_likely)
         /* part 1 */
      else
         /* part 2 */
      /* both */
  } while (something);
}

This could be changed to something like:

function()
{
a:
   /* entry */                |-> superblock 1
   if (something_likely) {    |
      /* part 1 */            |
      /* both   */            |
      if (something)          |
         goto a;              |
      else                    |
         return;              |
   }
   /* part 2 */               |-> superblock 2
   /* both   */               |
   if (something)             |
      goto a;                 |
   else                       |
      return;                 |
}

You can see how it duplicates the common part of the code (both) to make larger blocks; this is tail duplication. Notice how the first superblock takes into account the entry block as well -- since the compiler knows the if statement is likely to be taken (from some sort of static analysis or profiling information), it should try to keep the whole thing together.

Superblocks let the compiler make better decisions about code scheduling. Imagine that in entry you do a load from some known memory address. You only read this value in superblock 1, never write it. However, say in superblock 2 you do a store to an address that is unknown at compile time (this is called a hazard). This means you can't optimise away the load in entry, because if you end up taking superblock 2 the store may change the value (the compiler can't know what is being changed, so it has to assume the worst). Thus you end up loading the same thing every time around the loop, even though it probably hasn't changed. But you want your superblock as big as possible, so you really want to include the entry block ... you can't have your cake and eat it too!
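
To make that concrete, here is roughly the kind of code that causes the problem. This is a made-up sketch; the names (counter, example, p, sum) and the branch condition are just placeholders standing in for entry, superblock 1 and superblock 2 above:

int counter;                    /* "entry" loads from this known address    */

int example(int *p, int n)      /* p is an address the compiler can't see   */
{
    int i = 0, sum = 0;
    do {
        int c = counter;        /* entry: reloaded every single iteration   */
        if (i % 8 != 0)         /* likely path -> superblock 1              */
            sum += c;           /* only ever reads the loaded value         */
        else                    /* unlikely path -> superblock 2            */
            *p = sum;           /* store to an unknown address: p might     */
                                /* point at counter, so the compiler has    */
                                /* to keep the load above (the hazard)      */
        i++;                    /* both */
    } while (i < n);
    return sum;
}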

Luckily, smart people have come up with ways to analyse a superblock and remove what they call loop invariant code from it. This means the compiler can realise it can move the load in entry out of superblock 1 into another block, since superblock 1 will never change the value. The compiler can then make superblock 2 branch back to this "new" block. This means the value only gets loaded once on the first entry to superblock 1, and then again only if it has (possibly) been changed by superblock 2. Enjoy your cake!
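
Carrying on with the made-up example from above, the transformed control flow might look something like this (again just a sketch of the idea, not real compiler output): the load gets its own little block, superblock 1 loops straight back to itself, and only superblock 2, the path that might have clobbered the value, goes back through the reload.

int counter;

int example(int *p, int n)
{
    int i = 0, sum = 0;
    int c;
reload:
    c = counter;                /* the "new" block: load once, and again    */
                                /* only after superblock 2 may have         */
                                /* changed counter                          */
a:
    if (i % 8 != 0) {           /* superblock 1                             */
        sum += c;
        i++;
        if (i < n)
            goto a;             /* never wrote counter, so skip the reload  */
        else
            return sum;
    }
    *p = sum;                   /* superblock 2: the possible hazard        */
    i++;
    if (i < n)
        goto reload;            /* counter might have changed: load again   */
    else
        return sum;
}

The load in entry has effectively become loop invariant with respect to superblock 1, so the likely path never pays for it again and only the unlikely path through superblock 2 pays for the reload.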

So next time a compiler geek starts talking about superblocks, maybe we'll almost be able to keep up!