Our Dear Leader Sam Hocevar has previously blogged about PIC and inline ASM. Today I came across a sort of extension to this problem.
Consider the following code, which implements a double-word compare-and-swap (CAS) using the x86 cmpxchg8b instruction (the lock prefix makes it atomic).
#include <stdio.h>

typedef struct double_word_t {
    int a;
    int b;
} double_word;

/* Atomically compare *mem with old; if they are the same, copy new
   into *mem. Returns 1 if the swap happened, 0 otherwise. */
int compare_and_swap(double_word *mem,
                     double_word old,
                     double_word new)
{
    char result;
    /* cmpxchg8b compares EDX:EAX with *mem. On success it stores ECX:EBX
       into *mem; on failure it loads *mem into EDX:EAX, so EDX and EAX
       must be in/out operands. It also writes the flags ("cc"). */
    __asm__ __volatile__("lock; cmpxchg8b %0; setz %1;"
                         : "+m"(*mem), "=q"(result), "+d"(old.b), "+a"(old.a)
                         : "c"(new.b), "b"(new.a)
                         : "memory", "cc");
    return (int)result;
}

int main(void)
{
    double_word w = {.a = 0, .b = 0};
    double_word old = {.a = 17, .b = 42};
    double_word new = {.a = 12, .b = 13};
    /* old != w, therefore nothing happens */
    compare_and_swap(&w, old, new);
    printf("Should fail -> (%d,%d)\n", w.a, w.b);
    /* old == w, therefore w = new */
    old.a = 0; old.b = 0;
    compare_and_swap(&w, old, new);
    printf("Should work -> (%d,%d)\n", w.a, w.b);
    return 0;
}
This type of CAS can be used to implement lock-free algorithms (I've previously blogged about that sort of thing).
The problem is that cmpxchg8b hard-codes its operands to specific registers, including ebx. In pseudocode:
if(EDX:EAX == Destination) {
ZF = 1;
Destination = ECX:EBX;
}
else {
ZF = 0;
EDX:EAX = Destination;
}
When compiling 32-bit x86 code with -fPIC, gcc reserves ebx to hold the GOT (global offset table) pointer, so compiling the code above that way fails with an error about not being able to allocate ebx.
A first attempt at a PIC-friendly version would simply save and restore ebx without telling gcc anything about it, something like:
__asm__ __volatile__("pushl %%ebx;"             /* save ebx, the PIC GOT pointer */
                     "movl %5, %%ebx;"          /* move new.a into ebx */
                     "lock; cmpxchg8b (%4);"
                     "popl %%ebx;"              /* restore ebx (popl leaves ZF alone) */
                     "setz %0;"
                     : "=qm"(result), "+d"(old.b), "+a"(old.a)
                     /* mem and new.a go in registers: a %esp-relative memory
                        operand would be off by 4 once pushl has run */
                     : "c"(new.b), "r"(mem), "r"(new.a)
                     : "memory", "cc");
Unfortunately, this isn't a generic solution. It works in the PIC case because gcc will not allocate ebx for anything else. In the non-PIC case, however, nothing stops gcc from choosing ebx to hold the address in mem; the movl then overwrites that address with new.a before cmpxchg8b uses it. That would be a tricky bug to track down!
The solution is to test the __PIC__ macro, which gcc defines when compiling with -fpic or -fPIC, in an #if: either tell gcc you're clobbering ebx in the non-PIC case, or just keep two versions around, one that saves and restores ebx for PIC and one that passes new.a in ebx directly.