technovelty

weblog of Ian Wienand

RSS  |  technovelty home  |  page of ian  |  ianw@ieee.org

Non-thread safe C

Recently it has come up on the LKML and other blogs about how C is not thread safe. The nice piece of example code given is

int trylock()
{
    int res;

    res = pthread_mutex_trylock(&mutex);
    if (res == 0)
            ++acquires_count;

    return res;
}

At first glance it seems OK, and I wasn't exactly sure what the problem was until I pulled it apart. GCC is well within its rights to produce the following code (annotated for clarity).

push   %ebp                     ; setting up stack frame
mov    %esp,%ebp                ; save esp to ebp
sub    $0x8,%esp                ; make some room on stack
movl   <mutex>,(%esp)           ; copy mutex to stack
call   <pthread_mutex_trylock>  ; call pthread_mutex_trylock
mov    <acquires_count>,%edx    ; copy acquires_count to %edx
cmp    $0x1,%eax                ; set carry flag if result == 0
adc    $0x0,%edx                ; add with carry
mov    %edx,<acquires_count>    ; write back acquires_count
leave                           ; release stack frame
ret                             ; done!

This code uses a bit of a trick to avoid a branch around code. It uses the carry flag, which is simple enough to understand when adding. Consider if you are adding 0001 + 1111 you end up with 11111, where the top bit is the carry bit. You can then feed this carry bit into another part of the addition operation if you were, say, using two 32 bit numbers to represent a 64 bit number on a 32 bit system.

The trick uses the dual nature of the carry flag as a "borrow" bit; if a < b when a - b then the bit is set. Consider the simplest case of 0 - 1 -- the carry bit is set to indicate to the next operation that a bit needs to be borrowed. Imagine for a moment we have a four bit machine and want to do the 8-bit unsigned operation

1111 0000 -   (240) -
0000 0001     (  1)
==== ====

First we do 0000 - 0001 which leaves us 1111 with the borrow bit set. Then we do 1111 - 0000 but take note that the borrow bit is set, so get rid of the borrowed bit, leaving us 1110. Putting the two together gives us 1110 1111 or 239.

Therefore the cmp $0x1,<result> instruction (which is really just result - 1 with no output but setting flags) can only set the carry flag (i.e. borrow a bit) if the value is 0; any other value in result means there will be no need to borrow and the carry flag is not set.

Consequently the add with carry (adc) will add one to the result only if the carry bit was set. A neat trick which avoids a jump, but not so much thread safe: if you are interrupted after the cmp but before the mov store you can write back the wrong value. Most people (including me till 5 minutes ago) looking at the code would assume that it was safe on the theory that acquires_count is not touched if you don't have the lock because it is inside the if statement.

Interestingly, the Intel compiler does the simpler jump method

push   <mutex>                     ; push argument
call   <pthread_mutex_trylock>
pop    %ecx                        ; get return addr on top
test   %eax,%eax                   ; AND %eax,%eax
jne    <trylock+0x16>  --------+   ; if that wasn't 0, jump out
addl   $0x1,<acquires_count>   |
ret                    <-------+   ; done!

Frankly I have no idea which one is faster/better (I would guess the second one because you don't always dirty a cache line, but you'd have to measure that against the cost of the branch) but obviously GCC thinks one thing and Intel thinks another and I bet if you put them in a room together you would end up with 3 opinions! I think the major point is that once upon a time it was mostly people writing kernels doing things like accessing I/O ports or other memory mapped areas that needed to be very careful the compiler didn't optimise out (or in) something incorrectly. With multi-threaded C code that problem moves up to the application developer too.

A different instruction set can help here. The Itanium version is possibly less prone to this sort of thing because predication makes it faster to implement conditional branches -- any instruction with a predicate that is not set is skipped. The load and add is safe because it will only be done with the lock held. However, if anyone is wondering why Itanium come with such big caches, the difference between the 6 line version and the Itanium version below should answer that.

 [MII]       alloc r32=ar.pfs,4,3,0               ; alloc stack frame
             mov r33=b0                           ; return pointer
             mov r34=r1
 [MMI]       addl r14=72,r1;;                     ; r11 = &mutex from GOT
             ld8 r11=[r14]                        ;
             nop.i 0x0;;
 [MIB]       mov r35=r11                          ; put into argument 1
             nop.i 0x0
             br.call.sptk.few b0=<pthread_mutex_trylock>
 [MII]       cmp4.eq.unc p0,p6=r8,r0              ; if result !0, set predicate p6
             mov.i ar.pfs=r32
             mov r1=r34;;                         ; load acquires_count from GOT
 [MIB]       addl r10=144,r1                      ;
             nop.i 0x0
       (p06) br.cond.dptk.few <trylock+0x80>;;    ; jump to [1] if p6 set
 [MII]       ld4 r9=[r10]
             mov b0=r33
             addl r3=144,r1;;
 [MMI]       adds r2=1,r9;;                       ; acquires_count++
             st4 [r3]=r2                          ; save updated value
             nop.i 0x0
 [MIB]       nop.m 0x0                            ; [1]
             nop.i 0x0
             br.few <trylock+0x90>;;              ; jump to [2]
 [MII]       nop.m 0x0
             mov b0=r33
             nop.i 0x0;;
 [MIB]       nop.m 0x0                            ; [2]
             nop.i 0x0
             br.ret.sptk.many b0;;

I guess the moral of the story is that multi-threading is hard, but we knew that anyway.

posted at: Mon, 29 Oct 2007 12:13 | in /code/badcode | permalink | add comment (7 others)

On incorrectly utilising "utilising"

My supervisor pointed out a very annoying habit I had formed when writing technical documents, best summed up below:

"It would be difficult to find a page of governmental, military, or academic writing that doesn't have on it the word utilize. It must be one of the most over-utilized words in the world. It seems as though people out to impress people with the significance of what they're doing use utilize when they should use use.

Utilize is not an elegant variation of the word use; it has its own distinct meaning. When you utilize something, you make do with something not normally used for the purpose, e.g., you utilize a dime when the bloody screwdriver is nowhere to be found. If the screwdriver were there, you'd use it, not utilize a stupid dime for the purpose. Use use when you mean use, and utilize only when it's properly used to mean--to use something not normally used. The computer went off-line, so they utilized Mr. Wang's abacus, the one he liked to use. Despite the temporary breakdown, the computer's use-rate was up (not its utilization-rate)."

Cheney, "Getting the words right" (1983)

You will now probably notice this subtle verbiage in most everything you read!

posted at: Tue, 21 Aug 2007 10:19 | in /code/badcode | permalink | add comment (5 others)

Software Engineering via Balkanisation

We were talking today about the relative ease of moving Darwin (the core of OS X) to L4 the Darbat guys have experienced. Chuck noted that major parts of IOKit, Mach and BSD pull apart more easily than one might expect.

Luckily we had an ex Apple IOKit architect there, who noted that one large factor behind the good software engineering arose because none of those three groups ever talked to each other, sometimes even to the point of "I'm a BSD guy, I don't talk to the IOKit guys".

It got me to thinking that high impedance between groups working on tightly integrated components is probably a good thing. A good example might be how none of the projects to port Linux on top of L4 have ever really stuck and fallen to bit-rot from non-maintenance (of course Wombat does not fall into this category! :).

Maybe the problem is that the Linux community talks too much, and this leads developers to becoming too familiar and too willing to cross domain boundaries (this has been studied and shown to be a problem in papers such as Maintainability of the Linux Kernel, Scach et al.).

Of course prominent kernel developers are not keen on committing to APIs for various (good) reasons. Clearly a stable ABI is unworkable, and perhaps the impedance levels for making core changes are already quite high because you've got to get it past a raft of maintainers before it goes anywhere. But alternatively perhaps everyone giving their 2c's worth and trying to work together to couple their work as closely as possible causes more boundary crossings than otherwise might happen. "Don't do that there, we'll tweak this subsystem to do that and move that bit up a level behind a new interface" etc. -- you want systems to be close, but not so close that they can never be untangled again.

So, is making your teams hate each other a valid software engineering approach?

posted at: Thu, 05 Oct 2006 22:17 | in /code/badcode | permalink | add comment (0 others)

Rusty Russell's "Hard to Misuse" rules

A helpful comment pointed me to Rusty Russell's Hard to Misuse Interface Levels rules (a play on "easy to use").

I've reproduced them here in one place, since I always seem to want to refer to it. See the original for fun examples. Higher is better.

  1. Impossible to get wrong
  2. Compiler/linker won't let you get it wrong
  3. Compiler/linker warns if you get it wrong
  4. Simplest use is correct
  5. The name tells you how to use it
  6. Do it right or breaks at runtime
  7. Follow the convention and you will get it right
  8. Read the documentation and you will get it right
  9. Read the implementation and you will get it right
  10. Read a mail list thread and you will get it right
  11. Read the documentation and you will get it wrong
  12. Follow the convention and you will get it wrong
  13. Do it right and it will break at runtime
  14. The name tells you how not to use it
  15. The obvious use is wrong
  16. Compiler/linker will warn you if you get it right
  17. Compiler/linker won't let you get it right
  18. Impossible to get right

posted at: Mon, 24 Jul 2006 16:46 | in /code/badcode | permalink | add comment (0 others)

The Science of Scientific Writing

The Science of Scientific Writing (Gopen and Swan, 1990) is a nice paper for anyone who has to write anything technical.

Since we read English left to right, we come to expect context at the start of our informational blocks (sentence) and data at the end. Messing with this scheme tends to confuse the reader, making them work very hard to understand your point. Some general rules from the paper I thought I might write down:

I'm interested in any other guides to writing people have come across.

posted at: Wed, 14 Jun 2006 16:54 | in /code/badcode | permalink | add comment (0 others)

The Datapoint 2200 - why we're stuck with little endian?

nfd made an interesting point in an architecture seminar recently, which suggested that the reason we are stuck with little endian on most of the world's desktop processors is due to an obscure computer (calculator?) called the Datapoint 2200. It seems Intel was contracted to replace the TTL parts in the Datapoint with a microprocessor, which turned into the 4004 processor. That turned into the 8008 (twice as good), which turned into the 8086, which pretty much brings us to today.

So why little endian? Well, the Datapoint used a shift-register memory, because it was cheap (<grandpa_simpson>as was the style at the time</grandpa_simpson>). AFAICS a shift register memory is just a big FIFO queue; you chain lots of bits together and you've got "memory". You put stuff in one end, take it out the other, and spend lots of time waiting in between!

So the Datapoint can use up to a 16 bit (2 byte) address. Therefore very often you'd want to do an add on a two byte value, and you need to make sure that the carry bit propagated. Now, imagine you use big endian, you have a situation that looks something like

cycling in a shift-register memory

You waste all those cycles shifting through the memory to propagate the carry bit. If you used little endian in this case, you're carry bit would propagate naturally upwards after your first shift out.

So for this reason little endian was the default for the Datapoint; Intel saw fit to copy it for the 4004, and the rest is history. Little endian has some other advantages, but I like this version of the story!

posted at: Fri, 19 May 2006 16:57 | in /code/badcode | permalink | add comment (0 others)

How to annoy customers looking to spend multiple thousands of dollars

Allow them to complete the entire order process, and then present them with the following screen at the last step. Continue doing this for a few hours.

We're sorry - We are experiencing
some technical problem

Clearly you shouldn't run any business critical systems on Dell and certainly don't employ anyone who lists "Dell Website Developer" on their resume.

posted at: Wed, 14 Dec 2005 22:20 | in /code/badcode | permalink | add comment (0 others)

On Changes

Debian bug 342658 has illustrated some ways to not run an Open Source project.

Firstly, and by far most importantly, keep a detailed change log. An entry like "Support for Blah has been added" is essentially useless, because it doesn't give someone looking in a detailed view of what has happened. The only sane way to do this is to keep a proper ChangeLog following (something like) the GNU ChangeLog format. For an example of how to do this without fail, see the glibc ChangeLog. I can attest that single ChangeLog file has helped chase down more bugs than anything else, including revision histories. Other good examples are gcc, binutils and emacs. If I can't figure out why something changed from that, I can always go back to the person who did the change to find it, or cross reference it with mailing list postings.

Secondly, if you're writing a library then test cases to stop regressions are of fundamental importance. Again glibc gets this right, where fixes to major bugs must be accompanied by a test case. You should be aiming for correctness and portability, and there's just no way to confirm this unless you actually test your library.

Without these two basic facilities, anyone fresh coming into your code is left treading water without anything to hold on to. And since Open Source is all about getting as many eyes as possible on the code, you need to offer people something to grab on to to avoid drowning in your sea of code.

posted at: Wed, 14 Dec 2005 10:05 | in /code/badcode | permalink | add comment (0 others)

how not to print a newline

fputc((int)"\n",fp);

posted at: Mon, 24 Oct 2005 21:07 | in /code/badcode | permalink | add comment (0 others)

IEC 60027-2 Units

It has been pointed out to me that in a previous post I incorrectly suggested the Ki/Mi/Gi style of units for describing binary multiples has something to do with SI units. These style of units are officially described in IEC 60027-2: Letter symbols to be used in electrical technology - Part 2: Telecommunications and electronics, thus you might like to refer to them as "IEC Units" or similar.

The NIST reference has this to say:

It is important to recognise that the new prefixes for binary multiples are not part of the International System of Units (SI), the modern metric system. However, for ease of understanding and recall, they were derived from the SI prefixes for positive powers of ten.

It is a considerable faux pas in many circles to get these things mixed up.

posted at: Thu, 20 Oct 2005 09:56 | in /code/badcode | permalink | add comment (0 others)

Detecting SEGV read/write on IA64

Today someone asked me why the following code works

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>
#include <limits.h>
#include <features.h>


void segv_handler(int sig, siginfo_t *sip, void *c)
{
        struct sigcontext *scp = (struct sigcontext *)c;
        unsigned long i = *((unsigned long*)scp->sc_ip);
        printf("%s fault\n", ((i >> 21) & 1) ? "write" : "read");
        exit(1);
}


int main(void)
{
        struct sigaction sa;
        int *x;
        int m;

        sa.sa_handler = SIG_DFL;
        sa.sa_sigaction = segv_handler;
        sigemptyset(&sa.sa_mask);
        sigaddset(&sa.sa_mask,SIGIO);
        sigaddset(&sa.sa_mask,SIGALRM);

        sa.sa_flags = SA_SIGINFO;

        if (sigaction(SIGSEGV,&sa,NULL)) {
                printf(" Error assigning signal!\n");
        }

        x = valloc(1024);

        mprotect(x,1024,PROT_NONE);  //make it none access

        /* read fault */
        m = x[0];
        /* write fault */
        /* x[0] = 1;   */

        return 0;
}

This appears to be an extremley bad way to find out if you have taken a read or write fault with a SEGV on IA64.

It's actually a fluke that this works at all. Its trying to match part of the instruction opcode to see if it is a load or store but gets it completely wrong. Compile it with ICC and it will fail.

On IA64 we are provided with si_isr, the interrupt status register, which can tell us exactly what has happened. Below is the right way to do it.

#define _GNU_SOURCE 1
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>
#include <limits.h>
#include <features.h>


void segv_handler(int sig, siginfo_t *sip, void *c)
{
        struct sigcontext *scp = (struct sigcontext *)c;
        unsigned long i = *((unsigned long*)scp->sc_ip);

        int x = (sip->si_isr >> 32) & 1; // bit 32
        int w = (sip->si_isr >> 33) & 1; // bit 33
        int r = (sip->si_isr >> 34) & 1; // bit 34
        printf("%s|%s|%s fault\n",
               r ? "r" : " ",
               w ? "w" : " ",
               x ? "x" : " ");
        exit(1);
}


int main(void)
{
        struct sigaction sa;
        volatile int *x;
        int m = 100;

        sa.sa_handler = SIG_DFL;
        sa.sa_sigaction = segv_handler;
        sigemptyset(&sa.sa_mask);
        sigaddset(&sa.sa_mask,SIGIO);
        sigaddset(&sa.sa_mask,SIGALRM);

        sa.sa_flags = SA_SIGINFO;

        if (sigaction(SIGSEGV,&sa,NULL)) {
                printf(" Error assigning signal!\n");
        }

        x = valloc(1024);

        mprotect(x,1024,PROT_NONE);  //make it none access

        /* read fault */
        m = x[0];
        /* write fault */
        //x[0] = 100;
        return 0;
}

posted at: Mon, 23 May 2005 14:48 | in /code/badcode | permalink | add comment (0 others)

Pthreads mutex subtleties

Is the following code valid?

#include <stdio.h>
#include <signal.h>
#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void sigint_handler(int sig)
{
        pthread_mutex_unlock(&mutex);
        return;
}


int main(void)
{

        signal(SIGINT, sigint_handler);

        pthread_mutex_lock(&mutex);

        printf("About to wait\n");

        pthread_mutex_lock(&mutex);

        printf("Ok, done!\n");
}

It certainly works on Linux, but on at least FreeBSD it will not as deadlock detection will kick in when the thread tries to lock the mutex it already has locked. Using signals in threaded code is probably a bad idea but I believe the deadlock detection is broken in this case.

Of course, the right way to do this is to use a condition variable and use a pthread_cond_wait() call in main() and wake it up from the signal handler. This works fine.

Moral of the story -- check the return value of pthread_mutex_lock() because sometimes it may not be what you expect! It's much easier to debug a error message about pthread_mutex_lock returning some strange EDEADLCK code than to try and find why your multithreaded application seems to be randomly crashing on another operating system.

posted at: Thu, 24 Feb 2005 09:56 | in /code/badcode | permalink | add comment (0 others)

hnb - bad code

for an example of how not to do things, hnb is pretty good. it's a shame, because it looked like a handy, console based todo list with an xml backing. I don't mean to trash opensource work, but here's some constructive critisim

posted at: Wed, 02 Feb 2005 16:51 | in /code/badcode | permalink | add comment (0 others)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.