PowerPC char nuances

Mon 14 February 2005

On PowerPC, a plain char is an unsigned char by default, unlike on most other architectures (3-8 of the ABI).
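If you want to see which way your own compiler and target go, CHAR_MIN from <limits.h> tells you. The following is a minimal sketch; the function names are made up for illustration and are not part of any ABI or library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when plain char is signed on this target.
 * CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
static bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: widen a plain char to int, exactly as the
 * assignment "i = a" does. The result for bytes above 0x7f depends on
 * the signedness of char. */
static int widen_plain_char(char c)
{
    return c;
}
```

With GCC, widen_plain_char((char)0xFF) should come back as -1 where char is signed and as 255 where it is unsigned, which is the whole difference in a nutshell.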

All EBCDIC machines seem to have char defined as unsigned. I wouldn't know EBCDIC if it hit me in the face, and I doubt many people born in the 1980s would either. What seems more likely is that PowerPC chose this in its ABI due to architectural limitations around type promotion of chars to integers. Unlike other architectures, PowerPC doesn't have a single instruction that loads a byte and sign-extends it at the same time. For example, the following code

void f(void)
{
    char a = 'a';
    int i;

    i = a;
}

produces the following assembly on 386

f:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movb    $97, -1(%ebp)
        movsbl  -1(%ebp),%eax
        movl    %eax, -8(%ebp)
        leave
        ret

Note the movsbl instruction, which in AT&T syntax means "move, sign-extending the byte to a long". Now watch the same thing on PowerPC with -fsigned-char

f:
        stwu 1,-32(1)
        stw 31,28(1)
        mr 31,1
        li 0,97
        stb 0,8(31)
        lbz 0,8(31)
        extsb 0,0
        stw 0,12(31)
        lwz 11,0(1)
        lwz 31,-4(11)
        mr 1,11
        blr

Note that here you have to do a load that clears the top bits (lbz) and then sign-extend the result in a separate operation (extsb). Of course, if you compile without -fsigned-char it just does the load, without the extra sign extension.
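To make the two steps concrete, here is a rough C model of what each instruction does to the value. This is only a sketch of the semantics, not what the compiler emits, and the function names are invented for the example.

```c
#include <stdint.h>

/* Models lbz: load one byte and clear the upper bits (zero extension). */
static uint32_t load_byte_zero_extended(const uint8_t *p)
{
    return (uint32_t)*p;
}

/* Models extsb: sign-extend the low 8 bits of a register value. */
static int32_t sign_extend_byte(uint32_t reg)
{
    uint32_t low = reg & 0xFFu;

    /* Flip the byte's sign bit, then subtract its weight: values
     * 0x80..0xff map to -128..-1, values 0x00..0x7f are unchanged. */
    return (int32_t)(low ^ 0x80u) - 0x80;
}
```

For 'a' (97) both steps leave the value alone; for a byte such as 0xF0 the first step gives 240 and the second turns that into -16. An unsigned char only ever needs the first step.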

So, without knowing too much about the history, my guess is that the people at IBM/Motorola/wherever were thinking of EBCDIC when they designed the PowerPC architecture, where an instruction for type promotion with sign extension was probably going to be largely superfluous. Then the world of Linux (and with it an operating system that ran on more than one or two architectures) appeared, and they defined the ABI to have char as unsigned because that's what they had before. Now we are stuck with this little nuance.

Moral of the story: don't assume anything, be that sizeof(long) or the signedness of a char.
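In practice that means saying which one you want whenever the value of a byte matters. A small sketch, again with made-up helper names, that behaves the same whether plain char is signed or unsigned (the conversion to signed char is implementation-defined for out-of-range values; GCC wraps modulo 256):

```c
#include <stdbool.h>

/* Portable test for "is the top bit set?". Writing "c < 0" instead
 * would never be true on a target where plain char is unsigned. */
static bool high_bit_set(char c)
{
    return (unsigned char)c >= 0x80;
}

/* The byte as a value in 0..255, regardless of char's signedness. */
static int byte_as_unsigned(char c)
{
    return (unsigned char)c;
}

/* The byte as a value in -128..127, regardless of char's signedness. */
static int byte_as_signed(char c)
{
    return (signed char)c;
}
```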