Tracking an ABI change

Unfortunately this morning I got hit by a bug where an updated library broke an existing program.

The first thing I noticed was that if I rebuilt the program in question against the new library, everything worked again. This sort of thing points to a (probably unintentional) ABI change.

The source code was large, so I needed to try and zero in on what was happening a bit more. I figured that if this was an ABI change, it should show up in the assembly, so I created a dump of both binary images with objdump --disassemble.

I then ran the two dumps through tkdiff to see where I stood. This showed up about 1500 differences, but looking at them, they were mostly like this:

@@ -2138 +2138 @@
-4000000000006dd0:      01 98 81 03 38 24       [MII]       addl r51=7264,r1
+4000000000006dd0:      01 98 01 03 28 24       [MII]       addl r51=5184,r1

As you may know, on IA64 r1 is defined as the gp or global pointer register. Functions aren't just functions on IA64; each one has a function descriptor which contains both the function address and a value for the global pointer. The add instruction can take up to a 22-bit operand, so by adding to the global pointer you can offset into a region of 4MB of memory (2^22 bytes = 4MB) directly. When gcc builds your program, it sets r1 to point to the .got section of your binary. Now between the start of the binary and the GOT there is a whole bunch of stuff, notably unwind info, which might push the offsets out. So we can pretty much ignore all of these when looking for the root of our problem.

So a bit more sed and grep gives you a much reduced list of changes, and one in particular jumps out ...

-4000000000051a2c:      04 00 10 90                         mov r38=512
+4000000000051a2c:      24 00 08 90                         mov r38=258

This is where the very handy addr2line comes into play. Running that over the binary gives us

ianw@lime:~/tmp/openssh-3.8.1p1/build-deb$ addr2line --exe ./ssh 4000000000051a2c
../../openbsd-compat/bsd-arc4random.c:60

Peeking at that code

static RC4_KEY rc4;

void arc4random_stir(void)
{
        unsigned char rand_buf[SEED_SIZE];

60-->memset(&rc4, 0, sizeof(rc4));
        if (RAND_bytes(rand_buf, sizeof(rand_buf)) <= 0

 ... blah blah ...

This looks a lot like the sizeof(RC4_KEY) has changed on us. If our library has a different idea about the size of things than we do, it's sure to be a recipe for disaster. A little test program confirms the hypothesis.

#include <stdio.h>
#include <openssl/rc4.h>

int main(void)
{
        /* sizeof yields a size_t, so print it with %zu */
        printf("%zu\n", sizeof(RC4_KEY));
        return 0;
}

-- 0.9.7e-3 --
ianw@lime:~/tmp$ ./test
258

-- 0.9.7g-1 --
ianw@lime:~/tmp$ ./test
512

Of course, the "what" is the easy bit. Finding out why the size is different is left as an exercise, and a reason why your projects should always keep a ChangeLog in excruciating detail.