Fun with floating point

You probably already know that a floating point number is represented in binary as sign * significand * radix^exponent. Thus you can represent the number 20.23 with radix 10 (i.e. base 10) as 1 * .2023 * 10^2 or as 1 * .02023 * 10^3.

Since one number can be represented in different ways, we define the normalised version as the one that satisfies `1/radix <= significand < 1`. You can read that as saying "the leftmost digit of the significand should not be zero".

So when we convert into binary (base 2) rather than base 10, the rule that the "leftmost digit should not be zero" means it can only be a one. In fact, the IEEE standard "hides" that 1 because it is implied by a normalised number, giving you an extra bit for more precision in the significand.

So to normalise a floating point number you have to shift the significand left until the leading digit is a one. This is something that the hardware can probably do very fast, since it has to do it a lot. Combine this with an architecture like IA64, which has a 64 bit significand, and you've just found a way to do a really cool implementation of "find the first bit that is not zero in a 64 bit value", a common operation when working with bitfields (it was really David Mosberger who originally came up with that idea in the kernel).

#include <stdio.h>

#define ia64_getf_exp(x)                                        \
({                                                              \
        long ia64_intri_res;                                    \
                                                                \
        asm ("getf.exp %0=%1" : "=r"(ia64_intri_res) : "f"(x)); \
                                                                \
        ia64_intri_res;                                         \
})


int main(void)
{
    long double d = 0x1UL;
    long exp;

    exp = ia64_getf_exp(d);

    /* The bias is 65535, so subtracting it gives the true exponent,
       i.e. the index of the highest set bit. */
    printf("The first non-zero bit is bit %ld\n", exp - 65535);
    return 0;
}

Note the processor is using an 82 bit floating point implementation, with a 17 bit exponent component. Thus we use a 16 bit (0xFFFF, or 65535) bias so we can represent positive and negative exponents (i.e. an exponent of zero is represented by 65535, 1 by 65536 and -1 by 65534) without an explicit sign bit.

IA64 uses the floating point registers in other interesting ways too. For example, the clear_page() implementation in the kernel spills zeroed floating point registers into memory because that provides you with the maximum memory bandwidth. The libc bzero() implementation does a similar thing.