Compare and Swap with PIC

Our Dear Leader Sam Hocevar has previously blogged about PIC and inline ASM. Today I came across a sort of extension to this problem.

Consider the following code, which implements a double word compare and swap using the x86 cmpxchg8b instruction (for a bonus you can lock it to make it atomic).

#include <stdio.h>

typedef struct double_word_t {
    int a;
    int b;
} double_word;

/* atomically compare old and mem, if they are the same then copy new
   back to mem */
int compare_and_swap(double_word *mem,
             double_word old,
             double_word new) {

    char result;
    __asm__ __volatile__("lock; cmpxchg8b %0; setz %1;"
                 : "=m"(*mem), "=q"(result)
                 : "m"(*mem), "d" (old.b), "a" (old.a),
                   "c" (new.b), "b" (new.a)
                 : "memory");
    return (int)result;
}

int main(void)
{

    double_word w = {.a = 0, .b = 0};
    double_word old = {.a = 17, .b = 42};
    double_word new = {.a = 12, .b = 13};

    /* old != w, therefore nothing happens */
    compare_and_swap(&w, old, new);
    printf("Should fail -> (%d,%d)\n", w.a, w.b);

    /* old == w, therefore w = new */
    old.a = 0; old.b = 0;
    compare_and_swap(&w, old, new);
    printf("Should work  -> (%d,%d)\n", w.a, w.b);

    return 0;
}

This type of CAS can be used to implement lock-free algorithms (I've previously blogged about that sort of thing).

The problem is that the cmpxchg8b uses the ebx register, i.e. pseudo code looks like:

if(EDX:EAX == Destination) {
    ZF = 1;
    Destination = ECX:EBX;
}
else {
    ZF = 0;
    EDX:EAX = Destination;
}

PIC code reserves ebx for internal use, so if you try to compile that with -fPIC you will get an error about not being able to allocate ebx.

A first attempt to create a PIC friendly version would simply save and restore ebx and not gcc anything about it, something like:

__asm__ __volatile__("pushl %%ebx;"   /* save ebx used for PIC GOT ptr */
             "movl %6,%%ebx;" /* move new_val2 to %ebx */
             "lock; cmpxchg8b %0; setz %1;"
             "pop %%ebx;"     /* restore %ebx */
                 : "=m"(*mem), "=q"(result)
             : "m"(*mem), "d" (old.b), "a" (old.a),
               "c" (new.b), "m" (new.a) : "memory");

Unfortunately, this isn't a generic solution. It works fine with the PIC case, because gcc will not allocate ebx for anything else. But in the non-PIC case, there is a chance that ebx will be used for addr. This would cause a probably fairly tricky bug to track down!

The solution is to use the #if __PIC__ directive to either tell gcc you're clobbering ebx in the non-PIC case, or just keep two versions around; one that saves and restores ebx for PIC and one that doesn't.