RSS | technovelty home | page of ian | ian@wienand.org
See if you can spot the bug in this code!
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
union {
unsigned char raw[8];
struct {
char one;
int two;
char three;
char four;
char five;
} formatted __attribute__((packed));
} test;
printf("one : %d\n", (int)&test.formatted.one - (int)&test);
printf("two : %d\n", (int)&test.formatted.two - (int)&test);
printf("three : %d\n", (int)&test.formatted.three - (int)&test);
printf("four : %d\n", (int)&test.formatted.four - (int)&test);
printf("five : %d\n", (int)&test.formatted.five - (int)&test);
return 0;
}
$ gcc -Wall -o packing packing.c $ ./packing one : 0 two : 4 three : 8 four : 9 five : 10
Here's the relevant bit from the gcc manual:
For an enum, struct or union type, you may specify attributes either between the enum, struct or union tag and the name of the type, or just past the closing curly brace of the definition. The former syntax is preferred.
By getting the packed attribute in the wrong place (if it's not clear, it should be before formatted), it is applied to the variable rather than the type. The compiler (both gcc and icc do this) has already laid out the structure, so misses the packing directive, and unfortunately doesn't warn (that may be a bug?).
I can tell you from experience this can be hard to track down!
posted at: Fri, 07 Nov 2008 22:13 | in /code/badcode | permalink | add comment (0 others)
Recently it has come up on the LKML and other blogs about how C is not thread safe. The nice piece of example code given is
int trylock()
{
int res;
res = pthread_mutex_trylock(&mutex);
if (res == 0)
++acquires_count;
return res;
}
At first glance it seems OK, and I wasn't exactly sure what the problem was until I pulled it apart. GCC is well within its rights to produce the following code (annotated for clarity).
push %ebp ; setting up stack frame mov %esp,%ebp ; save esp to ebp sub $0x8,%esp ; make some room on stack movl <mutex>,(%esp) ; copy mutex to stack call <pthread_mutex_trylock> ; call pthread_mutex_trylock mov <acquires_count>,%edx ; copy acquires_count to %edx cmp $0x1,%eax ; set carry flag if result == 0 adc $0x0,%edx ; add with carry mov %edx,<acquires_count> ; write back acquires_count leave ; release stack frame ret ; done!
This code uses a bit of a trick to avoid a branch around code. It uses the carry flag, which is simple enough to understand when adding. Consider if you are adding 0001 + 1111 you end up with 11111, where the top bit is the carry bit. You can then feed this carry bit into another part of the addition operation if you were, say, using two 32 bit numbers to represent a 64 bit number on a 32 bit system.
The trick uses the dual nature of the carry flag as a "borrow" bit; if a < b when a - b then the bit is set. Consider the simplest case of 0 - 1 -- the carry bit is set to indicate to the next operation that a bit needs to be borrowed. Imagine for a moment we have a four bit machine and want to do the 8-bit unsigned operation
1111 0000 - (240) - 0000 0001 ( 1) ==== ====
First we do 0000 - 0001 which leaves us 1111 with the borrow bit set. Then we do 1111 - 0000 but take note that the borrow bit is set, so get rid of the borrowed bit, leaving us 1110. Putting the two together gives us 1110 1111 or 239.
Therefore the cmp $0x1,<result> instruction (which is really just result - 1 with no output but setting flags) can only set the carry flag (i.e. borrow a bit) if the value is 0; any other value in result means there will be no need to borrow and the carry flag is not set.
Consequently the add with carry (adc) will add one to the result only if the carry bit was set. A neat trick which avoids a jump, but not so much thread safe: if you are interrupted after the cmp but before the mov store you can write back the wrong value. Most people (including me till 5 minutes ago) looking at the code would assume that it was safe on the theory that acquires_count is not touched if you don't have the lock because it is inside the if statement.
Interestingly, the Intel compiler does the simpler jump method
push <mutex> ; push argument call <pthread_mutex_trylock> pop %ecx ; get return addr on top test %eax,%eax ; AND %eax,%eax jne <trylock+0x16> --------+ ; if that wasn't 0, jump out addl $0x1,<acquires_count> | ret <-------+ ; done!
Frankly I have no idea which one is faster/better (I would guess the second one because you don't always dirty a cache line, but you'd have to measure that against the cost of the branch) but obviously GCC thinks one thing and Intel thinks another and I bet if you put them in a room together you would end up with 3 opinions! I think the major point is that once upon a time it was mostly people writing kernels doing things like accessing I/O ports or other memory mapped areas that needed to be very careful the compiler didn't optimise out (or in) something incorrectly. With multi-threaded C code that problem moves up to the application developer too.
A different instruction set can help here. The Itanium version is possibly less prone to this sort of thing because predication makes it faster to implement conditional branches -- any instruction with a predicate that is not set is skipped. The load and add is safe because it will only be done with the lock held. However, if anyone is wondering why Itanium come with such big caches, the difference between the 6 line version and the Itanium version below should answer that.
[MII] alloc r32=ar.pfs,4,3,0 ; alloc stack frame
mov r33=b0 ; return pointer
mov r34=r1
[MMI] addl r14=72,r1;; ; r11 = &mutex from GOT
ld8 r11=[r14] ;
nop.i 0x0;;
[MIB] mov r35=r11 ; put into argument 1
nop.i 0x0
br.call.sptk.few b0=<pthread_mutex_trylock>
[MII] cmp4.eq.unc p0,p6=r8,r0 ; if result !0, set predicate p6
mov.i ar.pfs=r32
mov r1=r34;; ; load acquires_count from GOT
[MIB] addl r10=144,r1 ;
nop.i 0x0
(p06) br.cond.dptk.few <trylock+0x80>;; ; jump to [1] if p6 set
[MII] ld4 r9=[r10]
mov b0=r33
addl r3=144,r1;;
[MMI] adds r2=1,r9;; ; acquires_count++
st4 [r3]=r2 ; save updated value
nop.i 0x0
[MIB] nop.m 0x0 ; [1]
nop.i 0x0
br.few <trylock+0x90>;; ; jump to [2]
[MII] nop.m 0x0
mov b0=r33
nop.i 0x0;;
[MIB] nop.m 0x0 ; [2]
nop.i 0x0
br.ret.sptk.many b0;;
I guess the moral of the story is that multi-threading is hard, but we knew that anyway.
posted at: Mon, 29 Oct 2007 12:13 | in /code/badcode | permalink | add comment (8 others)
My supervisor pointed out a very annoying habit I had formed when writing technical documents, best summed up below:
"It would be difficult to find a page of governmental, military, or academic writing that doesn't have on it the word utilize. It must be one of the most over-utilized words in the world. It seems as though people out to impress people with the significance of what they're doing use utilize when they should use use.
Utilize is not an elegant variation of the word use; it has its own distinct meaning. When you utilize something, you make do with something not normally used for the purpose, e.g., you utilize a dime when the bloody screwdriver is nowhere to be found. If the screwdriver were there, you'd use it, not utilize a stupid dime for the purpose. Use use when you mean use, and utilize only when it's properly used to mean--to use something not normally used. The computer went off-line, so they utilized Mr. Wang's abacus, the one he liked to use. Despite the temporary breakdown, the computer's use-rate was up (not its utilization-rate)."
Cheney, "Getting the words right" (1983)
You will now probably notice this subtle verbiage in most everything you read!
posted at: Tue, 21 Aug 2007 10:19 | in /code/badcode | permalink | add comment (5 others)
We were talking today about the relative ease of moving Darwin (the core of OS X) to L4 the Darbat guys have experienced. Chuck noted that major parts of IOKit, Mach and BSD pull apart more easily than one might expect.
Luckily we had an ex Apple IOKit architect there, who noted that one large factor behind the good software engineering arose because none of those three groups ever talked to each other, sometimes even to the point of "I'm a BSD guy, I don't talk to the IOKit guys".
It got me to thinking that high impedance between groups working on tightly integrated components is probably a good thing. A good example might be how none of the projects to port Linux on top of L4 have ever really stuck and fallen to bit-rot from non-maintenance (of course Wombat does not fall into this category! :).
Maybe the problem is that the Linux community talks too much, and this leads developers to becoming too familiar and too willing to cross domain boundaries (this has been studied and shown to be a problem in papers such as Maintainability of the Linux Kernel, Scach et al.).
Of course prominent kernel developers are not keen on committing to APIs for various (good) reasons. Clearly a stable ABI is unworkable, and perhaps the impedance levels for making core changes are already quite high because you've got to get it past a raft of maintainers before it goes anywhere. But alternatively perhaps everyone giving their 2c's worth and trying to work together to couple their work as closely as possible causes more boundary crossings than otherwise might happen. "Don't do that there, we'll tweak this subsystem to do that and move that bit up a level behind a new interface" etc. -- you want systems to be close, but not so close that they can never be untangled again.
So, is making your teams hate each other a valid software engineering approach?
posted at: Thu, 05 Oct 2006 22:17 | in /code/badcode | permalink | add comment (0 others)
A helpful comment pointed me to Rusty Russell's Hard to Misuse Interface Levels rules (a play on "easy to use").
I've reproduced them here in one place, since I always seem to want to refer to it. See the original for fun examples. Higher is better.
posted at: Mon, 24 Jul 2006 16:46 | in /code/badcode | permalink | add comment (0 others)
The Science of Scientific Writing (Gopen and Swan, 1990) is a nice paper for anyone who has to write anything technical.
Since we read English left to right, we come to expect context at the start of our informational blocks (sentence) and data at the end. Messing with this scheme tends to confuse the reader, making them work very hard to understand your point. Some general rules from the paper I thought I might write down:
Subjects and verbs should be together. Don't stretch out the sentence with extraneous information between what you are talking about and what you are saying about it. e.g.
The cache, a 128 entry, dual ported, 4-way set associative design with on core tags but off core data, was found to be the major bottleneck of the system.
The cache is the subject, and was found to be is the verb. They should not be separated by all the other stuff.
The cache was a 128 entry, dual ported, 4-way set associative design, with on core tags but off core data. It was found to be the major bottleneck of the system.
Similarly, discard material which does not discern significant connections. As in the above example, it is quite easy to get into a habit of putting largely irrelevant information right in the middle of a sentence.
The stress position is the area around the end of a sentence. When reading, we tend to want the opening of the sentence to lead us in gently, reaching the big conclusion at the end. We then pause briefly, think about it, and move onto the next sentence. By making sure the stress position stresses the right information, we help understanding.
10% of time was spent handling TLB refills in our flablah benchmark, which simulates a transaction processing workload.
The above sentence hits us with the important information first, leaving us to finish the sentence and then wonder "what was that figure again". It could become
Our flablah benchmark simulates a transaction processing workload. For this benchmark, we found that 10% of time was spent handling TLB refills.
By keeping worthy material in stress positions we can make things much easier to read.
The first part of the sentence, the setup component, is the topic position. One option is to have the topic position linking backwards, so we know what the upcoming stress position is referring to. This is more or less what we did in the example above ("For this benchmark..."). Otherwise we want the topic to introduce the person, thing or concept which we are about to talk about; to quote the paper "Bees disperse pollen" and "pollen is dispersed by bees" are both very similar, but one suggests we are talking about bees and one pollen. You need to make sure you are setting up the sentence so that the emphasis of the stress position relates the foremost concept in the readers mind.
Be careful that topic positions keep telling a consistent story. Seems logical, but it is easy to get into situations where you introduce too many ideas without relating them to what has come before.
I'm interested in any other guides to writing people have come across.
posted at: Wed, 14 Jun 2006 16:54 | in /code/badcode | permalink | add comment (0 others)
nfd made an interesting point in an architecture seminar recently, which suggested that the reason we are stuck with little endian on most of the world's desktop processors is due to an obscure computer (calculator?) called the Datapoint 2200. It seems Intel was contracted to replace the TTL parts in the Datapoint with a microprocessor, which turned into the 4004 processor. That turned into the 8008 (twice as good), which turned into the 8086, which pretty much brings us to today.
So why little endian? Well, the Datapoint used a shift-register memory, because it was cheap (<grandpa_simpson>as was the style at the time</grandpa_simpson>). AFAICS a shift register memory is just a big FIFO queue; you chain lots of bits together and you've got "memory". You put stuff in one end, take it out the other, and spend lots of time waiting in between!
So the Datapoint can use up to a 16 bit (2 byte) address. Therefore very often you'd want to do an add on a two byte value, and you need to make sure that the carry bit propagated. Now, imagine you use big endian, you have a situation that looks something like
You waste all those cycles shifting through the memory to propagate the carry bit. If you used little endian in this case, you're carry bit would propagate naturally upwards after your first shift out.
So for this reason little endian was the default for the Datapoint; Intel saw fit to copy it for the 4004, and the rest is history. Little endian has some other advantages, but I like this version of the story!
posted at: Fri, 19 May 2006 16:57 | in /code/badcode | permalink | add comment (3 others)
Allow them to complete the entire order process, and then present them with the following screen at the last step. Continue doing this for a few hours.
Clearly you shouldn't run any business critical systems on Dell and certainly don't employ anyone who lists "Dell Website Developer" on their resume.
posted at: Wed, 14 Dec 2005 22:20 | in /code/badcode | permalink | add comment (0 others)
Firstly, and by far most importantly, keep a detailed change log. An entry like "Support for Blah has been added" is essentially useless, because it doesn't give someone looking in a detailed view of what has happened. The only sane way to do this is to keep a proper ChangeLog following (something like) the GNU ChangeLog format. For an example of how to do this without fail, see the glibc ChangeLog. I can attest that single ChangeLog file has helped chase down more bugs than anything else, including revision histories. Other good examples are gcc, binutils and emacs. If I can't figure out why something changed from that, I can always go back to the person who did the change to find it, or cross reference it with mailing list postings.
Secondly, if you're writing a library then test cases to stop regressions are of fundamental importance. Again glibc gets this right, where fixes to major bugs must be accompanied by a test case. You should be aiming for correctness and portability, and there's just no way to confirm this unless you actually test your library.
Without these two basic facilities, anyone fresh coming into your code is left treading water without anything to hold on to. And since Open Source is all about getting as many eyes as possible on the code, you need to offer people something to grab on to to avoid drowning in your sea of code.
posted at: Wed, 14 Dec 2005 10:05 | in /code/badcode | permalink | add comment (0 others)
fputc((int)"\n",fp);
posted at: Mon, 24 Oct 2005 21:07 | in /code/badcode | permalink | add comment (0 others)
It has been pointed out to me that in a previous
post I incorrectly suggested the Ki/Mi/Gi style of
units for describing binary multiples has something to do with SI
units. These style of units are officially described in IEC
60027-2: Letter symbols to be used in electrical technology - Part 2:
Telecommunications and electronics, thus you might like to refer
to them as "IEC Units" or similar.
The NIST reference has this to say:
It is important to recognise that the new prefixes for binary multiples are not part of the International System of Units (SI), the modern metric system. However, for ease of understanding and recall, they were derived from the SI prefixes for positive powers of ten.
It is a considerable faux pas in many circles to get these things mixed up.
posted at: Thu, 20 Oct 2005 09:56 | in /code/badcode | permalink | add comment (0 others)
Today someone asked me why the following code works
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>
#include <limits.h>
#include <features.h>
void segv_handler(int sig, siginfo_t *sip, void *c)
{
struct sigcontext *scp = (struct sigcontext *)c;
unsigned long i = *((unsigned long*)scp->sc_ip);
printf("%s fault\n", ((i >> 21) & 1) ? "write" : "read");
exit(1);
}
int main(void)
{
struct sigaction sa;
int *x;
int m;
sa.sa_handler = SIG_DFL;
sa.sa_sigaction = segv_handler;
sigemptyset(&sa.sa_mask);
sigaddset(&sa.sa_mask,SIGIO);
sigaddset(&sa.sa_mask,SIGALRM);
sa.sa_flags = SA_SIGINFO;
if (sigaction(SIGSEGV,&sa,NULL)) {
printf(" Error assigning signal!\n");
}
x = valloc(1024);
mprotect(x,1024,PROT_NONE); //make it none access
/* read fault */
m = x[0];
/* write fault */
/* x[0] = 1; */
return 0;
}
This appears to be an extremley bad way to find out if you have taken a read or write fault with a SEGV on IA64.
It's actually a fluke that this works at all. Its trying to match part of the instruction opcode to see if it is a load or store but gets it completely wrong. Compile it with ICC and it will fail.
On IA64 we are provided with si_isr, the interrupt
status register, which can tell us exactly what has happened.
Below is the right way to do it.
#define _GNU_SOURCE 1
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>
#include <limits.h>
#include <features.h>
void segv_handler(int sig, siginfo_t *sip, void *c)
{
struct sigcontext *scp = (struct sigcontext *)c;
unsigned long i = *((unsigned long*)scp->sc_ip);
int x = (sip->si_isr >> 32) & 1; // bit 32
int w = (sip->si_isr >> 33) & 1; // bit 33
int r = (sip->si_isr >> 34) & 1; // bit 34
printf("%s|%s|%s fault\n",
r ? "r" : " ",
w ? "w" : " ",
x ? "x" : " ");
exit(1);
}
int main(void)
{
struct sigaction sa;
volatile int *x;
int m = 100;
sa.sa_handler = SIG_DFL;
sa.sa_sigaction = segv_handler;
sigemptyset(&sa.sa_mask);
sigaddset(&sa.sa_mask,SIGIO);
sigaddset(&sa.sa_mask,SIGALRM);
sa.sa_flags = SA_SIGINFO;
if (sigaction(SIGSEGV,&sa,NULL)) {
printf(" Error assigning signal!\n");
}
x = valloc(1024);
mprotect(x,1024,PROT_NONE); //make it none access
/* read fault */
m = x[0];
/* write fault */
//x[0] = 100;
return 0;
}
posted at: Mon, 23 May 2005 14:48 | in /code/badcode | permalink | add comment (0 others)
Is the following code valid?
#include <stdio.h>
#include <signal.h>
#include <pthread.h>
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
void sigint_handler(int sig)
{
pthread_mutex_unlock(&mutex);
return;
}
int main(void)
{
signal(SIGINT, sigint_handler);
pthread_mutex_lock(&mutex);
printf("About to wait\n");
pthread_mutex_lock(&mutex);
printf("Ok, done!\n");
}
It certainly works on Linux, but on at least FreeBSD it will not as deadlock detection will kick in when the thread tries to lock the mutex it already has locked. Using signals in threaded code is probably a bad idea but I believe the deadlock detection is broken in this case.
Of course, the right way to do this is to use a condition variable
and use a pthread_cond_wait() call in main()
and wake it up from the signal handler. This works fine.
Moral of the story -- check the return value of
pthread_mutex_lock() because sometimes it may not be what
you expect! It's much easier to debug a error message about
pthread_mutex_lock returning some strange
EDEADLCK code than to try and find why your multithreaded
application seems to be randomly crashing on another operating
system.
posted at: Thu, 24 Feb 2005 09:56 | in /code/badcode | permalink | add comment (0 others)
for an example of how not to do things, hnb is pretty good. it's a shame, because it looked like a handy, console based todo list with an xml backing. I don't mean to trash opensource work, but here's some constructive critisim
indent(1L)can help you there
Node *pos=(Node *)data; ... return (int)pos;
sizeof(int) != sizeof(pointer). This also
says something about abstraction, since it does it so many times this
really hints it should have been broken out. posted at: Wed, 02 Feb 2005 16:51 | in /code/badcode | permalink | add comment (0 others)

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.