Exploring cmp8xchg16

The latest version of Itanium comes with a new instruction, cmp8xchg16. This does a compare and exchange on 8 bytes (64 bits) but exchanges 16 bytes (128 bits). Seems like a strange instruction?

Firstly, a bit of an overview of compare-and-exchange (CAS). As a function, it would be expressed compare_and_exchange(memory, new_value, expected_value). If the memory location contains the expected value, the new value is swapped in, atomically. This means nothing can interrupt us between the compare and the exchange.

This is useful for implementing lock-free algorithms, particularly the case of updating a linked-list stack. For example, below we show how you can safely insert an element into the linked list using compare and exchange.

compare and exchange push

Note that nothing can interrupt the update of the head of list pointer; if the value is incorrect we just loop around, update and try again. Eventually we hope we can do it uninterrupted.

However, removing an element from the list is not as simple.

compare and exchange pop

We could potentially be interrupted, and have the list change underneath us. As long as the elements move around in memory, the CAS will fail and we will be OK. But if the next element is in the same place as the "old" element (the one we are trying to pop) we have a problem. This is known as the "ABA" problem (or the "A-B-A" problem, or something else; if this still doesn't make sense try this page).

To avoid this, one common scheme is to have a version number on the head of list pointer. You do the swap, and compare the version number as well as the next pointer in memory. If they both match, you know that nothing has been updated and you are safe to pop the entry. This requires either a double wide compare and exchange instruction (so you can swap both a pointer and a version number) or tricky schemes like aligning the variables and using the base bits as version numbers (see An Almost Non-Blocking Stack, Bohem 2004).

Now, we can't just check the version with a normal width compare and swap and then update the pointer later, because we've just created our race condition again. But we don't actually need to check the pointer, just swap it in if the version is OK. Hence (I think) a cmp8xchg16 instruction. It allows you to check a version number, swap in a new pointer if it is all OK, but without (what I presume is significant) overhead of a full double wide compare and swap implementation. Just when to update the version I'm still thinking about; I think you could still use a blacklist scheme like the Bohem approach, but it needs further thought.