RSS | technovelty home | page of ian | ian@wienand.org
Work with Python for a while and you'll become familiar with printing a method and getting:
<bound method Foo.function of <__main__.Foo instance at 0xb736960c>>
I think there is room for one more explanation on the internet, since I've never seen it diagrammed out (maybe for good reason!).
In the above diagram on the left, we have the fairly simple conceptual model of a class with a function. One naturally tends to think of the function as part of the class, with your instance calling into that function. This is conceptually correct, but a little abstracted from what's actually happening.
The right attempts to illustrate the underlying process in some more depth. The first step, on the top right, is building something like the following:
class Foo():
    def function(self):
        print "hi!"
As this illustrates, the above code results in two things happening: firstly, a function object for function is created, and secondly, the class's __dict__ attribute is given a key function that points to this function object.
Now, the thing about this function object is that it implements the descriptor protocol. In short, if an object implements a __get__ method, then when that object is stored as an attribute of a class and accessed through an instance (or the class itself), its __get__ method is called. You can read up on the descriptor protocol, but the important part to remember is that __get__ is passed the context from which it is accessed; that is, the object that is calling the function.
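As a minimal sketch of the protocol, here is a toy descriptor written for Python 3 (the rest of this post uses Python 2); the names Tracer and Host are made up for illustration. Its __get__ simply returns the instance it was accessed through and the owning class, which is the "context" mentioned above:

```python
class Tracer:
    """Toy descriptor: __get__ reports how it was looked up."""

    def __get__(self, instance, owner):
        # instance: the object the attribute was accessed through
        #           (None when accessed on the class itself)
        # owner:    the class the descriptor lives in
        return (instance, owner)


class Host:
    # Stored in Host.__dict__, exactly where a function would be
    attr = Tracer()


h = Host()
print(h.attr)     # a tuple of (h, Host)
print(Host.attr)  # a tuple of (None, Host)
```

Plain functions do the same thing, except their __get__ returns a bound method rather than a tuple.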
So, for example, when we then do the following:
f = Foo()
f.function()
what happens is that we look up the attribute function on f and then call it. f doesn't actually know anything about function as such; what it does know is its class, and so Python goes searching the class's __dict__ (and then those of its base classes) to try and find the function attribute. It finds it there and, as per the descriptor protocol, calls the __get__ method of the underlying function object.
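The lookup can be reproduced by hand. This is a simplified Python 3 sketch (it ignores data descriptors and custom __getattribute__ methods, which the real lookup also considers): walk the class hierarchy for the raw function, then call its __get__ ourselves.

```python
class Foo:
    def function(self):
        return "hi!"


f = Foo()

# f's own __dict__ has no 'function' entry...
assert "function" not in vars(f)

# ...so lookup falls back to the class and its bases, in MRO order
for klass in type(f).__mro__:
    if "function" in klass.__dict__:
        raw = klass.__dict__["function"]  # the plain function object
        break

# Accessing it as an attribute invokes the descriptor protocol
bound = raw.__get__(f, type(f))
print(bound())  # hi!
```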
What happens now is that the function's __get__ method returns essentially a wrapper object that stores the information needed to bind the function to the object. This wrapper object is of type types.MethodType, and it stores some important attributes: im_func, which is the function to call, and im_self, which is the object that called it. Passing the object through as im_self is how function gets its first self argument (the calling object).
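The wrapper can also be built by hand with types.MethodType. In Python 3 the im_func and im_self attributes are spelled __func__ and __self__ (Python 2.6 and later accept both); this sketch assumes Python 3:

```python
import types


class Foo:
    def function(self):
        return self


f = Foo()
func = Foo.__dict__["function"]

# Pair the function with an object by hand, as __get__ does for us
m = types.MethodType(func, f)

print(m.__func__ is func)  # True: the underlying function
print(m.__self__ is f)     # True: the object supplied as self
print(m() is f)            # True: calling m passes f as the first argument
```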
So when you print the value of f.function (without calling it) you see it report itself as a bound method. Hopefully this illustrates that a bound method is just a special object that knows how to call an underlying function with context about the object that's calling it.
To try and make this a little more concrete, consider the following program:
import types
class Foo():
    def function(self):
        print "hi!"
f = Foo()
# this is a function object
print Foo.__dict__['function']
# this is a method as returned by
# Foo.__dict__['function'].__get__()
print f.function
# we can check that this is an instance of MethodType
print type(f.function) == types.MethodType
# the im_func field of the MethodType is the underlying function
print f.function.im_func
print Foo.__dict__['function']
# these are the same object
print f.function.im_self
print f
Running this gives output something like:
$ python ./foo.py
<function function at 0xb73540d4>
<bound method Foo.function of <__main__.Foo instance at 0xb736960c>>
True
<function function at 0xb73540d4>
<function function at 0xb73540d4>
<__main__.Foo instance at 0xb72c060c>
<__main__.Foo instance at 0xb72c060c>
To pull it apart: we can see that Foo.__dict__['function'] is a function object, but f.function is a bound method. The bound method's im_func is the underlying function object, and its im_self is the object f; thus calling im_func(im_self) calls function with the correct object as its first argument, self.
So the main point is to shift from thinking of a function as some intrinsic part of a class to thinking of it as a separate object, abstracted from the class, that gets bound to an instance as required. The class is in some ways a template and namespacing tool that lets you find the right function objects; it doesn't actually implement the functions as such.
There is plenty more information if you search for "descriptor protocol" and Python binding rules and lots of advanced tricks you can play. But hopefully this is a useful introduction to get an initial handle on what's going on!
posted at: Tue, 06 Mar 2012 23:46 | in /code/python
Something interesting I discovered about Python and --prefix that I can't see a lot of documentation on...
When you build Python you can pass the standard --prefix flag to configure to choose where the installation is homed. You might expect that this would hard-code the location where the interpreter looks for its support libraries; in reality it doesn't quite work like that.
Python will only look in the directory specified by --prefix after it first searches relative to the path of the executing binary. Specifically, it looks at argv[0] and works through a few steps. Does argv[0] have any slashes in it? If not, search $PATH for the binary. Is the result a symlink? Then dereference it. After this, it starts searching for dirname(argv[0])/lib/pythonX.Y/os.py, then dirname(argv[0])/../lib/pythonX.Y/os.py and so on, until it reaches the root. Only after these searches fail does the interpreter fall back to the hard-coded path given to --prefix at configure time.
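To make the search order concrete, here is a rough Python model of it. guess_prefix is a hypothetical helper, not part of Python; the real logic lives in C in Modules/getpath.c and also handles the Modules/Setup build-tree check, os.pyc, and the separate exec_prefix (lib-dynload) search, all of which this sketch leaves out.

```python
import os


def guess_prefix(argv0, version="2.7", fallback="/usr/local", env=None):
    """Simplified model of the interpreter's prefix search (hypothetical)."""
    env = os.environ if env is None else env
    # PYTHONHOME short-circuits the whole search
    # (simplification: a "prefix:exec_prefix" value is not split)
    if env.get("PYTHONHOME"):
        return env["PYTHONHOME"]
    # A bare program name is looked up on $PATH
    if os.sep not in argv0:
        for d in env.get("PATH", "").split(os.pathsep):
            candidate = os.path.join(d, argv0)
            if os.path.isfile(candidate):
                argv0 = candidate
                break
    # Dereference symlinks, then start from the binary's directory
    directory = os.path.dirname(os.path.realpath(argv0))
    landmark = os.path.join("lib", "python" + version, "os.py")
    # Walk upwards; as described later in the post, "/" itself is not tried
    while directory != os.path.dirname(directory):
        if os.path.isfile(os.path.join(directory, landmark)):
            return directory
        directory = os.path.dirname(directory)
    # Only now fall back to the configure-time --prefix
    return fallback
```

For a binary at /usr/bin/python this probes /usr/bin/lib/python2.7/os.py and then /usr/lib/python2.7/os.py, the same order as the strace output below.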
What are the practical implications? It means you can move a Python installation tree around and have it all "just work", which is nice. In my situation, I noticed this because we have a completely self-encapsulated build toolchain, but we wish to ship the same interpreter on the thing that we're building. During the build we run the interpreter to create .pyc files for distribution, and we need to be sure that when we do this we don't accidentally pick up any of the build host's Python; only the toolchain Python.
The PYTHONHOME environment variable does override this behaviour; if it is set, the search stops there. Another interesting consequence is that sys.prefix is not necessarily the value passed to --prefix during configure, but the dynamically determined prefix.
If you run strace, you can see this in operation:
readlink("/usr/bin/python", "python2.7", 4096) = 9
readlink("/usr/bin/python2.7", 0xbf8b014c, 4096) = -1 EINVAL (Invalid argument)
stat64("/usr/bin/Modules/Setup", 0xbf8af0a0) = -1 ENOENT (No such file or directory)
stat64("/usr/bin/lib/python2.7/os.py", 0xbf8af090) = -1 ENOENT (No such file or directory)
stat64("/usr/bin/lib/python2.7/os.pyc", 0xbf8af090) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/python2.7/os.py", {st_mode=S_IFREG|0644, st_size=26300, ...}) = 0
stat64("/usr/bin/Modules/Setup", 0xbf8af0a0) = -1 ENOENT (No such file or directory)
stat64("/usr/bin/lib/python2.7/lib-dynload", 0xbf8af0a0) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/python2.7/lib-dynload", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
Firstly it dereferences symlinks. Then it looks for Modules/Setup to see if it is running out of the build tree. Then it starts looking for os.py, walking its way upwards. One interesting thing that may either be a bug or a feature, I haven't decided, is that if you set the prefix to / then the interpreter will not go back to the root and then look in /lib. This is probably pretty obscure usage though!
All this is implemented in Modules/getpath.c which has a nice big comment at the top explaining the rules in detail.
posted at: Wed, 08 Feb 2012 16:16 | in /code/python
I recently came across the pylint error:
E: 3,4:Foo.foo: An attribute affected in foo line 12 hide this method
in code that boiled down to essentially:
class Foo:
    def foo(self):
        return True
    def foo_override(self):
        return False
    def __init__(self, override=False):
        if override:
            self.foo = self.foo_override
Unfortunately that message isn't particularly helpful in figuring out what's going on. I still can't claim to be 100% sure what the message is intended to convey, but I can construct something that maybe it's talking about.
Consider the following, using the above class:
foo = Foo()
moo = Foo(override=True)
print "expect True : %s" % foo.foo()
print "expect False : %s" % moo.foo()
print "expect True : %s" % Foo.foo(foo)
print "expect False : %s" % Foo.foo(moo)
which gives output of:
$ python ./foo.py
expect True : True
expect False : False
expect True : True
expect False : True
Now, if you read just about any Python tutorial, it will say something along the lines of:
... the special thing about methods is that the object is passed as the first argument of the function. In our example, the call x.f() is exactly equivalent to MyClass.f(x). In general, calling a method with a list of n arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method’s object before the first argument. [Official Python Tutorial]
The official tutorial above is careful to say in general; others often don't.
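The important point to remember is how Python internally resolves attribute references, as described by the data model. moo.foo() finds foo in the instance's own __dict__ first, so the call is really moo.__dict__["foo"](); what is stored there is already a method bound to moo, so no extra argument is passed. Examining the __dict__ for the moo object, we can see that foo has been re-assigned: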
>>> print moo.__dict__
{'foo': <bound method Foo.foo_override of <__main__.Foo instance at 0xb72838ac>>}
Our Foo.foo(moo) call, on the other hand, is really Foo.__dict__["foo"](moo) -- the fact that we reassigned foo in moo is never noticed. If we were to do something like Foo.foo = Foo.foo_override we would modify the class __dict__, but that would change foo for every instance, not just moo, so it doesn't give us the original semantics.
So I postulate that the main point of this warning is to tell you that you're creating an instance that now behaves differently to its class. Because the equivalence of calling a method via an instance and via its class is widely relied upon, you might end up with some strange behaviour, especially if you start doing heavy-duty introspection of classes.
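The difference between the two lookups can be checked directly. This is a Python 3 port of the example class (Python 3 prints "object" where Python 2 printed "instance", but the lookup rules shown are the same):

```python
class Foo:
    def foo(self):
        return True

    def foo_override(self):
        return False

    def __init__(self, override=False):
        if override:
            self.foo = self.foo_override


foo = Foo()
moo = Foo(override=True)

# moo carries its own 'foo': a method already bound to moo
shadow = vars(moo)["foo"]
print(shadow.__self__ is moo)  # True
print(shadow())                # False: called with no extra argument
# Going through the class never consults moo.__dict__
print(Foo.foo(moo))            # True
```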
Thinking about various hacks and ways to re-write this construct is kind of interesting. I think I might have found a hook for a decent interview question :)
posted at: Thu, 26 Jan 2012 22:09 | in /code/python
As I was debugging something recently, an instruction popped up that seemed a little incongruous:
lea 0x0(%edi,%eiz,1),%edi
Now this is an interesting instruction on a few levels. Firstly, %eiz is a pseudo-register that simply equates to zero, somewhat like MIPS r0; I don't think it is in common usage. But when you look closer, this instruction is a fancy way of doing nothing. It's a little clearer in Intel syntax mode:
lea edi,[edi+eiz*1+0x0]
So we can see that this is using scaled indexed addressing mode to load into %edi the value in %edi plus 0 * 1 with an offset of 0x0; i.e. put the value of %edi into %edi, i.e. do nothing. So why would this appear?
What we can see from the disassembly is that this single instruction takes up an impressive 7 bytes:
8048489: 8d bc 27 00 00 00 00 lea edi,[edi+eiz*1+0x0]
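To see where those 7 bytes come from, the ModRM and SIB bytes can be pulled apart by hand. decode_lea_sib is a hypothetical helper that only covers the 32-bit lea-with-SIB encodings in this post, not the full x86 addressing matrix. The index field value 0b100 means "no index register", which is what gas lets you spell as %eiz; ModRM mod=0b10 is what adds the 4-byte zero displacement.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_lea_sib(code):
    """Decode 'lea r32, [base + index*scale + disp]' (opcode 0x8d, SIB form)."""
    if code[0] != 0x8D:
        raise ValueError("not an lea opcode")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rm != 0b100 or mod == 0b11:
        raise ValueError("not a SIB memory operand")
    sib = code[2]
    scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
    if mod == 0b00 and base == 0b101:
        raise ValueError("disp32-without-base form not handled")
    # mod selects the displacement size: none, 8-bit or 32-bit
    disp_len = {0b00: 0, 0b01: 1, 0b10: 4}[mod]
    disp = int.from_bytes(bytes(code[3:3 + disp_len]), "little", signed=True)
    return {
        "dest": REGS[reg],
        "base": REGS[base],
        # index 0b100 means "no index"; gas can spell it %eiz
        "index": "eiz" if index == 0b100 else REGS[index],
        "scale": scale,
        "disp": disp,
        "length": 3 + disp_len,
    }


print(decode_lea_sib(bytes.fromhex("8dbc2700000000")))  # the 7-byte padding form
```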
Now, compare that to a standard nop which requires just a single byte to encode. Thus to pad out 7 bytes of space would require 7 nop instructions to be issued, which is a significantly slower way of doing nothing! Let's investigate just how much...
Below is a simple program that does nothing in a tight-loop; firstly using nops and then the lea do-nothing method.
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <time.h>
typedef uint64_t cycle_t;
static inline cycle_t
i386_get_cycles(void)
{
    cycle_t result;
    /* "=A" means the edx:eax pair, which is only right on 32-bit x86 */
    __asm__ __volatile__("rdtsc" : "=A" (result));
    return result;
}
#define get_cycles i386_get_cycles
int main(void) {
    int i = 0;
    uint64_t t1, t2;
    t1 = get_cycles();
    /* nop do nothing */
    while (i < 100000) {
        __asm__ __volatile__("nop;nop;nop");
        i++;
    }
    t2 = get_cycles();
    printf("%" PRIu64 "\n", t2 - t1);
    i = 0;
    t1 = get_cycles();
    /* lea do-nothing */
    while (i < 100000) {
        __asm__ __volatile__("lea 0x0(%edi,%eiz,1),%edi");
        i++;
    }
    t2 = get_cycles();
    printf("%" PRIu64 "\n", t2 - t1);
    return 0;
}
Firstly, you'll notice that rather than the 7 bytes mentioned before, we're comparing 3-byte sequences. That's because the lea instruction ends up encoded as:
8048388: 8d 3c 27 lea (%edi,%eiz,1),%edi
When you hand-code this instruction, you can't actually convince the assembler to pad out those extra zeros for the zero displacement, because it realises it doesn't need them; why would it waste the space? So how did they get there in the original disassembly? When gas is aligning something by padding, it has built-in sequences for the most efficient way of doing that for different sizes (you can see them in i386_align_code in gas/config/tc-i386.c, which adds the extra 4 bytes in directly).
Anyway, we can build and test this out (note you need the special -mindex-reg flag passed to gas to use the %eiz syntax):
$ gcc -O3 -Wa,-mindex-reg -o wait wait.c
$ ./wait
300072
189945
So, if you need 3 bytes of padding in your code for some reason, padding with three nops takes roughly 1.6 times as long as using a single larger instruction (300072 vs 189945 cycles, about 58% slower), at least on my aging Pentium M laptop.
So now you can rest easy knowing that even though your code is doing nothing, it is doing it in the most efficient manner possible!
posted at: Fri, 29 Jul 2011 09:46 | in /code/arch
I think maybe the world has moved on from Pyblosxom and, as my logs can attest, spammers seem to have moved on past Recaptcha (even if it is manual).
However, maybe there are some other holdouts like myself who will find this plugin useful. In reality, this will do nothing to stop spam; for that I am currently having good luck with Akismet. However, I like that displaying the captcha presumably costs the spammers something to crack before they can even attempt to submit a comment.
posted at: Wed, 02 Mar 2011 23:55 | in /code/weblog
The documentation on ld's symbol versioning syntax is a little bit vague on "dependencies", which it talks about but doesn't give many details on.
Let's construct a small example:
#include <stdio.h>
#ifndef VERSION_2
void foo(int f) {
    printf("version 1 called\n");
}
#else
void foo_v1(int f) {
    printf("version 1 called\n");
}
__asm__(".symver foo_v1,foo@VERSION_1");
void foo_v2(int f) {
    printf("version 2 called\n");
}
/* i.e. foo_v2 is really foo@VERSION_2
 * @@ means this is the default version
 */
__asm__(".symver foo_v2,foo@@VERSION_2");
#endif
$ cat 1.ver
VERSION_1 {
global:
foo;
local:
*;
};
$ cat 2.ver
VERSION_1 {
local:
*;
};
VERSION_2 {
foo;
} VERSION_1;
$ cat main.c
#include <stdio.h>
void foo(int);
int main(void) {
foo(100);
return 0;
}
$ cat Makefile
all: v1 v2
libfoo.so.1 : foo.c
	gcc -shared -fPIC -o libfoo.so.1 -Wl,--soname='libfoo.so.1' -Wl,--version-script=1.ver foo.c
libfoo.so.2 : foo.c
	gcc -shared -fPIC -DVERSION_2 -o libfoo.so.2 -Wl,--soname='libfoo.so.2' -Wl,--version-script=2.ver foo.c
v1: main.c libfoo.so.1
	ln -sf libfoo.so.1 libfoo.so
	gcc -Wall -o v1 main.c -L. -Wl,-rpath=. -lfoo
v2: main.c libfoo.so.2
	ln -sf libfoo.so.2 libfoo.so
	gcc -Wall -o v2 main.c -L. -Wl,-rpath=. -lfoo
.PHONY: clean
clean:
	rm -f libfoo* v1 v2
$ ./v1
version 1 called
$ ./v2
version 2 called
In words, we create two libraries, a version 1 and a version 2, where the version 2 library provides a new version of foo. The soname is set in the libraries, so v1 and v2 can each find the correct library to use.
In the updated 2.ver version, we say that VERSION_2 depends on VERSION_1. So, the question is, what does this mean? Does it have any effect?
We can examine the version descriptors in the library and see that there is indeed a relationship recorded there.
[...]
Version definition section '.gnu.version_d' contains 3 entries:
 Addr: 0x0000000000000264  Offset: 0x000264  Link: 5 (.dynstr)
  000000: Rev: 1  Flags: BASE  Index: 1  Cnt: 1  Name: libfoo.so.2
  0x001c: Rev: 1  Flags: none  Index: 2  Cnt: 1  Name: VERSION_1
  0x0038: Rev: 1  Flags: none  Index: 3  Cnt: 2  Name: VERSION_2
  0x0054: Parent 1: VERSION_1
Looking at the specification, we can see that each version definition has a vd_aux field, an offset to a linked list of Verdaux entries that are, essentially, strings giving "the version or dependency name". This is a little vague for a specification; however, it appears to mean that the first entry is the name of the version definition itself, and any following entries are its dependencies. At least, this is how readelf interprets it when it shows you the "Parent" field in the output above.
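Here is a sketch of that interpretation in Python. walk_verdefs and build_example are hypothetical helpers; build_example hand-assembles a section with the same shape as the readelf dump above (hashes are zeroed rather than real ELF hashes, and little-endian is assumed). With the standard 20-byte Verdef and 8-byte Verdaux sizes, the entries land at offsets 0x1c and 0x38, and the "Parent" Verdaux at 0x54, matching readelf.

```python
import struct

# Elf32_Verdef and Elf64_Verdef share this layout:
# vd_version, vd_flags, vd_ndx, vd_cnt, vd_hash, vd_aux, vd_next
VERDEF = struct.Struct("<HHHHIII")
# Elf32/64_Verdaux: vda_name (offset into .dynstr), vda_next
VERDAUX = struct.Struct("<II")


def walk_verdefs(section, dynstr):
    """Yield (name, parents): first Verdaux is the name, the rest are parents."""
    def cstr(off):
        return dynstr[off:dynstr.index(b"\0", off)].decode()

    off = 0
    while True:
        _ver, _flags, _ndx, cnt, _hash, aux, nxt = VERDEF.unpack_from(section, off)
        names, aoff = [], off + aux
        for _ in range(cnt):
            vda_name, vda_next = VERDAUX.unpack_from(section, aoff)
            names.append(cstr(vda_name))
            aoff += vda_next
        yield names[0], names[1:]
        if nxt == 0:
            break
        off += nxt


def build_example():
    """Hand-assemble a .gnu.version_d like the one for libfoo.so.2 (synthetic)."""
    dynstr = b"\0libfoo.so.2\0VERSION_1\0VERSION_2\0"
    defs = [(1, 1, ["libfoo.so.2"]),             # flags 1 == VER_FLG_BASE
            (0, 2, ["VERSION_1"]),
            (0, 3, ["VERSION_2", "VERSION_1"])]  # second name is the dependency
    out = b""
    for i, (flags, ndx, names) in enumerate(defs):
        size = VERDEF.size + VERDAUX.size * len(names)
        nxt = 0 if i == len(defs) - 1 else size
        out += VERDEF.pack(1, flags, ndx, len(names), 0, VERDEF.size, nxt)
        for j, n in enumerate(names):
            last = j == len(names) - 1
            out += VERDAUX.pack(dynstr.index(n.encode() + b"\0"),
                                0 if last else VERDAUX.size)
    return out, dynstr


section, dynstr = build_example()
for name, parents in walk_verdefs(section, dynstr):
    print(name, parents)
```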
This implies something that the ld documentation doesn't mention: you may list multiple dependencies for a version node. That does work, and readelf will just report more parents if you try it.
So the question is, what does this dependency actually do? Well, as far as I can tell, nothing really. The dynamic loader doesn't look at the dependency information, and has no need to: it is looking to resolve something specific, foo@VERSION_2 for example, and doesn't really care that VERSION_1 even exists.
ld does enforce the dependency, in that if you specify a dependent node but leave it out or accidentally erase it, the link will fail. However, it doesn't really convey anything beyond its intrinsic documentation value.
posted at: Fri, 19 Nov 2010 13:30 | in /code/c
In a previous post I mentioned split debugging info.
One addendum to this is how symbols are handled. Symbols are separate from debugging info (i.e. the information about variable names, types, etc. you get when -g is turned on), but are necessary for a good debugging experience.
You have a choice, however, of where you leave the symbol table. You can leave it in your shipping binary/library so that users who don't have the full debugging info available will still get a back-trace that at least has function names. The cost is slightly larger files for everyone, even if the symbols are never used. This appears to be what Red Hat does with its system libraries, for example.
The other option is to keep the symbols in the .debug files along-side the debug info. This results in smaller libraries, but really requires you to have the debug info files available to have workable debugging. This appears to be what Debian does.
So, how do you go about this? Well, it depends on what tools you're using.
For binutils strip, there is some asymmetry between the --strip-debug and --only-keep-debug options: --strip-debug keeps the symbol table in the binary, and --only-keep-debug also keeps the symbol table, so it ends up in both files.
$ gcc -g -o main main.c
$ readelf --sections ./main | grep symtab
[36] .symtab SYMTAB 00000000 000f48 000490 10 37 53 4
$ cp main main.debug
$ strip --only-keep-debug main.debug
$ readelf --sections ./main.debug | grep symtab
[36] .symtab SYMTAB 00000000 000b1c 000490 10 37 53 4
$ strip --strip-debug ./main
$ readelf --sections ./main.debug | grep symtab
[36] .symtab SYMTAB 00000000 000b1c 000490 10 37 53 4
Of course, you can then strip (with no arguments) the final binary to get rid of the symbol table; but other than manually pulling out the .symtab section with objcopy I'm not aware of any way to remove it from the debug info file.
Contrast this with elfutils, which I think is more commonly used on Red Hat-based systems.
eu-strip's --strip-debug does the same thing, leaving the symtab section in the binary. However, it also has a -f option, which puts any sections removed during the strip into a separate file. Therefore, you can create any combination you wish: eu-strip -f results in a fully stripped binary, with both symbols and debug data in the .debug file, while eu-strip -g -f results in debug data only in the .debug file, with symbol data retained in the binary.
The only thing to be careful about is running eu-strip -g -f and then further stripping the binary, which destroys the symbol table while the debug info is retained in the .debug file. This can lead to some strange things in backtraces:
$ gcc -g -o main main.c
$ eu-strip -g -f main.debug main
$ strip ./main
$ gdb ./main
GNU gdb (GDB) 7.1-debian
...
(gdb) break foo
Breakpoint 1 at 0x8048397: file main.c, line 2.
(gdb) r
Starting program: /home/ianw/tmp/symtab/main
Breakpoint 1, foo (i=100) at main.c:2
2	return i + 100;
(gdb) back
#0 foo (i=100) at main.c:2
#1 0x080483b1 in main () at main.c:6
#2 0x423f1c76 in __libc_start_main (main=Could not find the frame base for "__libc_start_main".
) at libc-start.c:228
#3 0x08048301 in ?? ()
Note one difference between strip and eu-strip is that binutils strip will leave the .gnu_debuglink section in, while eu-strip will not:
$ gcc -g -o main main.c
$ eu-strip -g -f main.debug main
$ readelf --sections ./main| grep debuglink
[29] .gnu_debuglink PROGBITS 00000000 000bd8 000010 00 0 0 4
$ eu-strip main
$ readelf --sections ./main| grep debuglink
$ gcc -g -o main main.c
$ eu-strip -g -f main.debug main
$ strip main
$ readelf --sections ./main| grep debuglink
[27] .gnu_debuglink PROGBITS 00000000 0005d8 000010 00 0 0 4
posted at: Tue, 24 Aug 2010 11:39 | in /code
The usual case for cross-compiling is that your target is so woefully slow and under-powered that you would be insane to do anything else.
However, sometimes for one of the best reasons of all, "historical reasons", you might ship a 64-bit product but support building on 32-bit hosts, and thus cross-compile even on a very fast architecture like x86. How much does this cost, even though almost everyone is running the 32-bit cross-compiler on a modern 64-bit machine?
To test, I got a 32-bit cross and a 64-bit native x86_64 compiler and toolchain; in this case based on gcc-4.1.2 and binutils 2.17. I then did an allyesconfig build of a Linux 2.6.33 x86_64 kernel three times using the cross-compiler toolchain, and then three times using the native one. The results (in seconds):
| Run | 32-bit cross (s) | 64-bit native (s) |
|---|---|---|
| 1 | 6090 | 5684 |
| 2 | 6050 | 5616 |
| 3 | 6063 | 5652 |
| Average | 6067 | 5650 |
So, all up, the build takes ~7% less time when your 64-bit code is built with a native 64-bit toolchain on a 64-bit machine rather than with a 32-bit cross-compiler.
posted at: Thu, 06 May 2010 16:50 | in /code/c
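For reference, the ~7% figure falls straight out of the table; a quick check:

```python
cross_32 = [6090, 6050, 6063]   # seconds, 32-bit cross toolchain
native_64 = [5684, 5616, 5652]  # seconds, 64-bit native toolchain

avg_cross = sum(cross_32) / len(cross_32)     # 6067.67
avg_native = sum(native_64) / len(native_64)  # 5650.67

# Fraction of build time saved by going native
saving = 1 - avg_native / avg_cross
print("native saves %.1f%%" % (saving * 100))  # native saves 6.9%
```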
Some time ago I wrote a description of the -Bsymbolic linker flag which could do with some further explanation. The original article is a good starting place.
One interesting point that I didn't go into was the potential for code optimisation -Bsymbolic brings about. I'm not sure if I missed that at the time, or the toolchain changed, both are probably equally likely!
Let me recap the example...
ianw@jj:/tmp/bsymbolic$ cat Makefile
all: test test-bsymbolic
clean:
	rm -f *.so test test-bsymbolic
liboverride.so : liboverride.c
	$(CC) -Wall -O2 -shared -fPIC -o liboverride.so $<
libtest.so : libtest.c
	$(CC) -Wall -O2 -shared -fPIC -o libtest.so $<
libtest-bsymbolic.so : libtest.c
	$(CC) -Wall -O2 -shared -fPIC -Wl,-Bsymbolic -o $@ $<
test : test.c libtest.so liboverride.so
	$(CC) -Wall -O2 -o $@ $< -L. -Wl,-rpath=. -ltest
test-bsymbolic : test.c libtest-bsymbolic.so liboverride.so
	$(CC) -Wall -O2 -o $@ $< -L. -Wl,-rpath=. -ltest-bsymbolic
$ cat liboverride.c
#include <stdio.h>
int foo(void)
{
    printf("override foo called\n");
    return 0;
}
$ cat libtest.c
#include <stdio.h>
int foo(void) {
    printf("libtest foo called\n");
    return 1;
}
int test_foo(void) {
    return foo();
}
$ cat test.c
#include <stdio.h>
int test_foo(void);
int main(void)
{
    printf("%d\n", test_foo());
    return 0;
}
In words; libtest.so provides test_foo(), which calls foo() to do the actual work. libtest-bsymbolic.so is simply built with the flag in question, -Bsymbolic. liboverride.so provides the alternative version of foo() designed to override the original via a LD_PRELOAD of the library.
test is built against libtest.so, test-bsymbolic against libtest-bsymbolic.so.
Running the examples, we can see that the LD_PRELOAD does not override the symbol in the library built with -Bsymbolic.
$ ./test
libtest foo called
1
$ ./test-bsymbolic
libtest foo called
1
$ LD_PRELOAD=liboverride.so ./test
override foo called
0
$ LD_PRELOAD=liboverride.so ./test-bsymbolic
libtest foo called
1
There are a couple of things going on here. Firstly, you can see that the SYMBOLIC flag is set in the dynamic section, leading to the dynamic linker behaviour I explained in the original article:
ianw@jj:/tmp/bsymbolic$ readelf --dynamic ./libtest-bsymbolic.so
Dynamic section at offset 0x550 contains 22 entries:
 Tag Type Name/Value
 0x00000001 (NEEDED) Shared library: [libc.so.6]
 0x00000010 (SYMBOLIC) 0x0
...
However, there is also an effect on generated code. Have a look at the PLTs:
$ objdump --disassemble-all ./libtest.so
Disassembly of section .plt:
[... blah ...]
0000039c <foo@plt>:
39c: ff a3 10 00 00 00 jmp *0x10(%ebx)
3a2: 68 08 00 00 00 push $0x8
3a7: e9 d0 ff ff ff jmp 37c <_init+0x30>
$ objdump --disassemble-all ./libtest-bsymbolic.so
Disassembly of section .plt:
00000374 <__gmon_start__@plt-0x10>:
374: ff b3 04 00 00 00 pushl 0x4(%ebx)
37a: ff a3 08 00 00 00 jmp *0x8(%ebx)
380: 00 00 add %al,(%eax)
...
00000384 <__gmon_start__@plt>:
384: ff a3 0c 00 00 00 jmp *0xc(%ebx)
38a: 68 00 00 00 00 push $0x0
38f: e9 e0 ff ff ff jmp 374 <_init+0x30>
00000394 <puts@plt>:
394: ff a3 10 00 00 00 jmp *0x10(%ebx)
39a: 68 08 00 00 00 push $0x8
39f: e9 d0 ff ff ff jmp 374 <_init+0x30>
000003a4 <__cxa_finalize@plt>:
3a4: ff a3 14 00 00 00 jmp *0x14(%ebx)
3aa: 68 10 00 00 00 push $0x10
3af: e9 c0 ff ff ff jmp 374 <_init+0x30>
Notice the difference? There is no PLT entry for foo() when -Bsymbolic is used.
Effectively, the toolchain has noticed that foo() can never be overridden and optimised out the PLT call for it. This is analogous to using "hidden" attributes for symbols, which I have detailed in another article on symbol visibility attributes (which also goes into PLTs, if the above meant nothing to you).
So -Bsymbolic does have some more side-effects than just setting a flag to tell the dynamic linker about binding rules -- it can actually result in optimised code. However, I'm still struggling to find good use-cases for -Bsymbolic that can't be better done with Version scripts and visibility attributes. I would certainly recommend using these methods if at all possible.
Thanks to Ryan Lortie for comments on the original article.
posted at: Mon, 22 Mar 2010 15:40 | in /code/c
So, you have some application where you want the user to specify a remote host/port, and you want to support IPv4 and IPv6.
For literal addresses, things are fairly simple. IPv4 addresses need no special treatment, and RFC2732 has IPv6 covered by putting the address within square brackets.
It gets more interesting as to what you should do with hostnames. The problem is that getaddrinfo can return you multiple addresses, but without extra disambiguation from the user it is very difficult to know which one to choose. RFC4472 discusses this, but there does not appear to be any good solution.
Possibly you can do something like ping/ping6 and have a separate program name or configuration flag to choose IPv6. This comes at a cost of transparency.
The glibc implementation of getaddrinfo() puts considerable effort into deciding if you have an IPv6 interface up and running before it will return an IPv6 address. It will even recognise link-local addresses and sort addresses more likely to work to the front of the returned list as described here. However, there is still a small possibility that the IPv6 interface doesn't actually work, and so the library will sort the IPv6 address as first in the returned list when maybe it shouldn't be.
If you are using TCP, you can connect to each address in turn to find one that works. With UDP, however, connect() essentially does nothing: no packets are exchanged, it just sets the default destination, so success tells you very little about whether the remote end is reachable.
So I believe probably the best way to handle hostnames for UDP connections, at least on Linux/glibc, is to trust getaddrinfo to return the sanest values first, try a connect on the socket anyway as an extra sanity check, and then essentially hope it works. Below is some example code to do that (the literal address splitter bit is stolen from Python's httplib).
import socket
DEFAULT_PORT = 123
host = '[fe80::21c:a0ff:fb27:7196]:567'
# the port will be anything after the last :
p = host.rfind(":")
# ipv6 literals should have a closing brace
b = host.rfind("]")
# if the last : is outside the [addr] part (or if we don't have []'s)
if (p > b):
    try:
        port = int(host[p+1:])
    except ValueError:
        print "Non-numeric port"
        raise
    host = host[:p]
else:
    port = DEFAULT_PORT
# now strip off ipv6 []'s if there are any
if host and host[0] == '[' and host[-1] == ']':
    host = host[1:-1]
print "host = <%s>, port = <%d>" % (host, port)
the_socket = None
res = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM)
# go through the returned values in order (getaddrinfo has already
# sorted them) and use the first one we can connect() to
for r in res:
    af, socktype, proto, cname, sa = r
    try:
        the_socket = socket.socket(af, socktype, proto)
        the_socket.connect(sa)
    except socket.error, e:
        # connect failed! clean up and try the next one
        if the_socket is not None:
            the_socket.close()
        the_socket = None
        continue
    break
if the_socket is None:
    raise socket.error, "Could not get address!"
# ready to send!
the_socket.send("hi!")
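For comparison, here is a sketch of the same two steps as Python 3 functions. split_host_port and udp_connect are hypothetical names of my own; like the script above, this trusts getaddrinfo's ordering rather than preferring IPv6 explicitly, and it closes any socket whose connect fails. Note that an unbracketed IPv6 literal such as fe80::1 would be mis-split (the trailing 1 is taken as a port), which is exactly why the RFC2732 brackets are needed.

```python
import socket


def split_host_port(spec, default_port):
    """Split 'host', 'host:port', '[v6addr]' or '[v6addr]:port'."""
    p = spec.rfind(":")
    b = spec.rfind("]")
    if p > b:  # the last ':' lies outside any [...] part
        port = int(spec[p + 1:])  # raises ValueError for a non-numeric port
        spec = spec[:p]
    else:
        port = default_port
    if spec.startswith("[") and spec.endswith("]"):
        spec = spec[1:-1]
    return spec, port


def udp_connect(host, port):
    """Try each getaddrinfo() result in order; return the first that connects."""
    for af, socktype, proto, _cname, sa in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM):
        try:
            s = socket.socket(af, socktype, proto)
        except OSError:
            continue
        try:
            s.connect(sa)  # for UDP this only records the default peer
        except OSError:
            s.close()
            continue
        return s
    raise OSError("could not connect to any address for %r" % (host,))
```

Usage would be along the lines of `s = udp_connect(*split_host_port("[::1]:567", 123))` followed by `s.send(b"hi!")`.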
posted at: Tue, 16 Mar 2010 15:54 | in /code/python
The ABC overhauled DIG Jazz (now I think it's just called "ABC Jazz") and upgraded from the oh-so 2008 XML playlist to a much more web-cool JSON one.
Hence Version 3 (source) of the applet. Now with improved HTML escaping and different colors.
Check out The Dilworths while you're there!
posted at: Sat, 27 Feb 2010 00:44 | in /code/gnome
I've recently found out a bit more about separating debug info, and thought a consolidated reference might be handy.
Almost every distribution now provides separate debug packages which contain only the debug info, saving much space for the 99% of people who never want to start gdb.
This is achieved with objcopy and --only-keep-debug/--add-gnu-debuglink and is well explained in the man page.
This adds a .gnu_debuglink section to the binary with the name of debug file to look for.
$ gcc -g -shared -o libtest.so libtest.c
$ objcopy --only-keep-debug libtest.so libtest.debug
$ objcopy --add-gnu-debuglink=libtest.debug libtest.so
$ objdump -s -j .gnu_debuglink libtest.so
libtest.so: file format elf32-i386
Contents of section .gnu_debuglink:
 0000 6c696274 6573742e 64656275 67000000 libtest.debug...
 0010 52a7fd0a R...
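The first part is the NUL-terminated name of the file, padded to a four-byte boundary; the second part is a CRC32 checksum of the debug-info file, so the debugger can later check that it has found the matching file.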
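A sketch of building and parsing those section contents in Python. make_debuglink and parse_debuglink are hypothetical helpers; they assume, as I understand it, that the checksum is the standard CRC-32 that zlib.crc32 computes, and that it is stored in the target's byte order (little-endian here, as in the i386 dump above).

```python
import struct
import zlib


def make_debuglink(debug_filename, debug_contents, byteorder="<"):
    """NUL-terminated name, zero-padded to 4 bytes, then a 4-byte CRC32."""
    name = debug_filename.encode() + b"\0"
    name += b"\0" * (-len(name) % 4)
    crc = zlib.crc32(debug_contents) & 0xFFFFFFFF
    return name + struct.pack(byteorder + "I", crc)


def parse_debuglink(section, byteorder="<"):
    """Return (filename, crc) from .gnu_debuglink contents."""
    name = section[:section.index(b"\0")].decode()
    crc_off = (len(name) + 1 + 3) & ~3  # skip the NUL and the padding
    (crc,) = struct.unpack_from(byteorder + "I", section, crc_off)
    return name, crc


# The 20 bytes from the objdump output above
raw = bytes.fromhex("6c6962746573742e6465627567000000" "52a7fd0a")
name, crc = parse_debuglink(raw)
print(name, hex(crc))  # libtest.debug 0xafda752
```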
Did you know that binaries also get stamped with a unique id when they are built? The ld --build-id flag stamps in a hash near the end of the link.
$ readelf --wide --sections ./libtest.so | grep build
 [ 1] .note.gnu.build-id NOTE 000000d4 0000d4 000024 00 A 0 0 4
$ objdump -s -j .note.gnu.build-id libtest.so
libtest.so: file format elf32-i386
Contents of section .note.gnu.build-id:
 00d4 04000000 14000000 03000000 474e5500 ............GNU.
 00e4 a07ab0e4 7cd54f60 0f5cf66b 5799b05c .z..|.O`.\.kW..\
 00f4 2d43f456 -C.V
In case you're wondering what the format of that is:
uint32 name_size;       /* size of the name */
uint32 hash_size;       /* size of the hash */
uint32 identifier;      /* NT_GNU_BUILD_ID == 0x3 */
char   name[name_size]; /* the name "GNU" */
char   hash[hash_size]; /* the hash */
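As a sketch, here are those fields decoded from the libtest.so bytes shown above. parse_note is a hypothetical helper; per the general ELF note format it also rounds the name up to a 4-byte boundary before the hash starts, which "GNU\0" happens to satisfy already.

```python
import struct

NT_GNU_BUILD_ID = 3


def parse_note(data, byteorder="<"):
    """Parse one ELF note: (name, type, descriptor bytes)."""
    namesz, descsz, ntype = struct.unpack_from(byteorder + "III", data, 0)
    name_off = 12
    desc_off = name_off + ((namesz + 3) & ~3)  # name is 4-byte aligned
    name = data[name_off:name_off + namesz].rstrip(b"\0").decode()
    desc = data[desc_off:desc_off + descsz]
    return name, ntype, desc


# The 0x24 bytes of .note.gnu.build-id from libtest.so above
raw = bytes.fromhex(
    "04000000" "14000000" "03000000" "474e5500"
    "a07ab0e4" "7cd54f60" "0f5cf66b" "5799b05c" "2d43f456")
name, ntype, build_id = parse_note(raw)
print(name, ntype, build_id.hex())
```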
Although the actual file may change (due to prelink or similar), the hash will not be updated and will remain constant.
The last piece of the puzzle is how gdb attempts to find the debug-info files when it is run. The main variable influencing this is debug-file-directory.
(gdb) show debug-file-directory
The directory where separate debug symbols are searched for is "/usr/lib/debug".
The first thing gdb does, which you can verify via an strace, is search for a file called [debug-file-directory]/.build-id/xx/yyyyyy.debug; where xx is the first two hexadecimal digits of the hash, and yyy the rest of it:
$ objdump -s -j .note.gnu.build-id /bin/ls
/bin/ls: file format elf32-i386
Contents of section .note.gnu.build-id:
8048168 04000000 14000000 03000000 474e5500 ............GNU.
8048178 c6fd8024 2a11673c 7c6a5af6 2c65b1b5 ...$*.g<|jZ.,e..
8048188 d7e13fd4 ..?.
... [running gdb /bin/ls] ...
access("/usr/lib/debug/.build-id/c6/fd80242a11673c7c6a5af62c65b1b5d7e13fd4.debug", F_OK) = -1 ENOENT (No such file or directory)
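The path construction is simple enough to sketch in Go. buildIDPath is a hypothetical name, and this only reproduces the naming scheme seen in the strace output above; it does not check whether the file exists.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// buildIDPath returns <debugDir>/.build-id/<first two hex digits>/<rest>.debug
func buildIDPath(debugDir, buildID string) (string, error) {
	if len(buildID) < 3 {
		return "", fmt.Errorf("build-id %q too short", buildID)
	}
	return filepath.Join(debugDir, ".build-id", buildID[:2], buildID[2:]+".debug"), nil
}

func main() {
	p, err := buildIDPath("/usr/lib/debug", "c6fd80242a11673c7c6a5af62c65b1b5d7e13fd4")
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
}
```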
Next it moves on to the debug-link filename. First it looks for that file in the same directory as the object being debugged; after that, it looks in a sub-directory called .debug/ in that same directory.
Finally, it prepends the debug-file-directory to the directory of the object being inspected and looks for the debug-link file there. This is why the /usr/lib/debug directory looks like the root of a file-system; if you're looking for the debug-info of /usr/lib/libfoo.so, its debug-link file will be looked for in /usr/lib/debug/usr/lib/.
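The debug-link search order described above can be sketched as a list of candidate paths. debugLinkCandidates and the link name libfoo.so.debug are hypothetical; the sketch ignores the CRC check, and real gdb versions may probe additional locations. Note that Go's filepath.Join concatenates and cleans, so the absolute /usr/lib is appended under /usr/lib/debug rather than replacing it.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// debugLinkCandidates lists, in search order: next to the object, in a
// .debug/ sub-directory, then under debugDir mirroring the object's directory.
func debugLinkCandidates(debugDir, objPath, linkName string) []string {
	dir := filepath.Dir(objPath)
	return []string{
		filepath.Join(dir, linkName),
		filepath.Join(dir, ".debug", linkName),
		filepath.Join(debugDir, dir, linkName),
	}
}

func main() {
	for _, c := range debugLinkCandidates("/usr/lib/debug", "/usr/lib/libfoo.so", "libfoo.so.debug") {
		fmt.Println(c)
	}
}
```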
Interestingly, the sysroot and solib-search-path don't appear to have anything to do with these lookups. So if you change the sysroot, you also need to change the debug-file-directory to match.
However, most distributions make all this "just work", so hopefully you'll never have to worry about it anyway!
posted at: Fri, 22 Jan 2010 09:11 | in /code
By now everybody has heard about Go, Google's expressive, concurrent, garbage collecting language. One big, glaring thing stuck out at me when I was reading the documentation:
Do not communicate by sharing memory; instead, share memory by communicating.
One of the examples given is a semaphore using a channel, which I'll copy here for posterity.
var sem = make(chan int, MaxOutstanding)
func handle(r *Request) {
	sem <- 1   // Wait for active queue to drain.
	process(r) // May take a long time.
	<-sem      // Done; enable next request to run.
}
func Serve(queue chan *Request) {
	for {
		req := <-queue
		go handle(req) // Don't wait for handle to finish.
	}
}
Here is a little illustration of that in operation.
Serve creates goroutines with the go keyword, each of which tries to take a slot in the channel. In the illustration there are only 3 slots, so the channel acts like a semaphore with a count of 3. When done, each goroutine returns its slot to the channel, which allows any goroutine blocked waiting for a slot to wake up and continue.
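If you want to watch the channel behave like a counting semaphore, the following self-contained Go program is one way to do it. It is my own sketch rather than the Effective Go code: the request queue is replaced by a plain loop, time.Sleep stands in for process, and runWithLimit is a hypothetical helper that records the highest number of handlers seen running at once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runWithLimit runs n handlers with at most limit active at once, using a
// buffered channel as the semaphore, and returns the peak concurrency seen.
func runWithLimit(n, limit int) int32 {
	sem := make(chan int, limit)
	var active, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- 1 // take a slot; blocks while every slot is full
			cur := atomic.AddInt32(&active, 1)
			for { // record a new peak with compare-and-swap
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			time.Sleep(time.Millisecond) // stand-in for process(r)
			atomic.AddInt32(&active, -1)
			<-sem // give the slot back, waking a blocked handler
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println("peak concurrency:", runWithLimit(10, 3))
}
```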
This instantly reminded me of the very first thing you need to do if you ever want to pass Advanced Operating Systems -- write a semaphore server to provide synchronisation within your OS.
In L4, threads communicate with each other via inter-process communication (IPC). IPC messages have a fixed format - you specify a target thread, bundle some data into the available slots in the IPC format and fire it off. By default you block waiting for a reply -- this all happens within a single call for efficiency. On the other side, you can write servers that listen for incoming IPC messages, where everything happens in reverse.
Here's another illustration, of the trivial semaphore server concept Shehjar and I implemented.
Look familiar? Instead of a blocking push of a number into a slot of a channel, you make a blocking IPC call to a remote server.
My point here is that both take the approach of sharing memory via communication. When using IPC, you bundle up all your information into the available slots in the IPC message and send it. When using a channel, you bundle your information into an entry and push it into the channel for the goroutine at the other end to receive. Receiving the IPC is the same as draining a channel - both result in you getting the information that was bundled into it by the caller.
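To push the comparison the other way, here is a Go sketch of an IPC-style semaphore server. It is not our L4 code: channels stand in for IPC, each request carries its own reply channel because an L4 client blocks until the server answers, and request, semServer, call and countWithSem are hypothetical names. Only the server goroutine ever touches the count, so state is shared purely by communicating.

```go
package main

import (
	"fmt"
	"sync"
)

// request models an IPC message: an operation plus a reply channel.
type request struct {
	op    string // "P" (wait) or "V" (signal)
	reply chan struct{}
}

// semServer owns the count; clients only ever send it messages.
func semServer(count int, inbox <-chan request) {
	var waiting []chan struct{} // clients whose P has not been answered
	for req := range inbox {
		switch req.op {
		case "P":
			if count > 0 {
				count--
				req.reply <- struct{}{}
			} else {
				waiting = append(waiting, req.reply) // reply later
			}
		case "V":
			if len(waiting) > 0 {
				waiting[0] <- struct{}{} // hand the unit straight to a waiter
				waiting = waiting[1:]
			} else {
				count++
			}
			req.reply <- struct{}{}
		}
	}
}

// call sends a request and blocks for the reply, like an IPC call.
func call(inbox chan<- request, op string) {
	r := request{op: op, reply: make(chan struct{}, 1)}
	inbox <- r
	<-r.reply
}

// countWithSem has each worker bump a shared counter inside P/V on a
// count-1 semaphore server, and returns the final counter value.
func countWithSem(workers int) int {
	inbox := make(chan request)
	go semServer(1, inbox)
	defer close(inbox) // lets the server goroutine exit
	total := 0
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			call(inbox, "P")
			total++ // protected by the semaphore
			call(inbox, "V")
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println("total:", countWithSem(100))
}
```

A blocked P is simply a reply the server has not sent yet, which mirrors a client sitting in a blocking IPC call.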
| IPC | Go |
|---|---|
| Start thread | Start goroutine |
| New thread blocks listening for IPC message | Goroutine blocks draining empty channel |
| Bundle information into IPC message | Bundle data into type of your channel |
| Send IPC to new thread | Push data into channel |
| Remote thread unbundles IPC | goroutine drains channel and gets data |
Whenever you mention the word "microkernel", people go off the deep end, and one thing they seem to forget is the inherent advantage of sharing state only via communication. As soon as you do that, you've opened up an amazing new tool for concurrency that comes implicitly with the design. When you communicate via messages/channels rather than shared global state, it doesn't matter where you run! One of those threads in the example could be running on another computer in your cloud, marshalling its IPC messages/channel entries and sending them over TCP/IP -- nobody would care!
At any rate, do not communicate by sharing memory; instead, share memory by communicating is certainly an idea whose time has come.
posted at: Fri, 20 Nov 2009 11:37 | in /code
I recently finished The Race for a New Game Machine: Creating the Chips Inside the XBox 360 and the Playstation 3 (David Shippy and Mickie Phipps); an interesting insight into the processor development process from some of the lead architects.
The executive summary is: Sony, Toshiba and IBM (STI) got together to create the core of the Playstation 3, the Cell processor. Sony, with their graphics and gaming experience, would do the Synergistic Processing Elements; extremely fast but limited sub-units specialising in doing 3D graphics and physics work (i.e. great for games). IBM would do a Power based core that handled the general purpose computing requirements.
The twist came when Microsoft approached IBM looking for the Xbox 360 processor, and someone at IBM mentioned the Power core that was being worked on for the Playstation. Unsurprisingly, the features being built for the Playstation also interested Microsoft, and the next thing you know, IBM is working on the same core for Microsoft and Sony at the same time, without telling either side.
This whole chain of events makes for a very interesting story. The book is written for a general audience, but you'll probably get the most out of it if you already have some knowledge of computer architecture; if you're trying to learn the concepts from the two-line descriptions given, you'll get a bit lost (it is not Hennessy & Patterson).
The only small criticism is that it sometimes falls into reading a bit like a long LinkedIn recommendation. However, the book is very well paced, and throws in just enough technical tidbits amongst the corporate and personal dramas to make it a very fun read.
One thing that is talked about a bit is the fan-out of four (FO4) metric used in the designers' quest to push the chip as fast as possible (and, as mentioned many times in the book, faster than what Intel could do!). I thought it might be useful to expand on this interesting metric a bit.
One problem facing chip architects is that, thanks to Moore's Law, it is hard to find a constant to compare design versus implementation. For example, you may design an amazing logic-block to factor large integers into products of prime numbers, but somebody else with better fabrication facilities might be able to essentially brute-force a better solution by producing faster hardware using a much less innovative design.
Some metric is needed that can compare the two designs discounting who has the better fabrication process. This is where the FO4 comes in.
When you change the input to a logic gate, it is not like it magically flips the output to the correct level instantaneously. There is a latency while everything settles to its correct level: the gate delay. The more gates driven by a gate's output, the more current it must supply, which further affects latency. The FO4 latency is defined as the time required to flip an inverter whose output drives (fans out to) four other inverters.
Thus you can describe the latency of other logic blocks in multiples of FO4 latencies. Because this avoids measuring wall-clock time, it is an effective way to describe the relative efficiency of logic designs. For example, you may calculate that your factoriser has a latency of 100 FO4. Even if someone else's 200 FO4 factoriser gets a result a few microseconds faster thanks to their fancy ultra-low-FO4-latency fabrication process, you can still show that your design is, at least a priori, the better one.
The book refers several times to efforts to reduce the FO4 of the processor as much as possible. The reason this is important in this context is that the maximum latency on the critical path determines the fastest clock speed you can run the processor at. For reasons explained in the book, high clock speed was a primary goal, so every effort had to be made to reduce latencies.
All modern processors operate as a production line, with each stage doing some work and passing it on to the next stage. Clearly the slowest stage determines the maximum speed that the production line can run at (weakest link in the chain and all that). For example, if you clock at 1 GHz, each cycle takes 1 nanosecond (1 s / 1,000,000,000 Hz). If you have an FO4 latency of, say, 10 picoseconds, then any given stage can have a latency of no more than 100 FO4; otherwise that stage would not have enough time to settle and actually produce the correct result.
Thus the smaller you can get the FO4 latencies of your various stages, the higher you can safely push the clock speed. One way around long latencies might be to split up your logic into smaller stages, making a much longer pipeline (production line). For example, split your 100 FO4 block into two 50 FO4 stages. You can now clock the processor higher, but this doesn't necessarily mean you'll get actual results out the end of the pipeline any faster (as Intel discovered with the Pentium 4 and its notoriously long pipelines and correspondingly high clock rates).
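The arithmetic behind that comparison is just logic depth in FO4 multiplied by the process's FO4 delay. A minimal Go sketch follows; wallTimePS is a hypothetical helper, and the 20 ps and 8 ps delays are made-up numbers for the two processes.

```go
package main

import "fmt"

// wallTimePS converts a logic depth in FO4 into picoseconds on a
// process whose FO4 inverter delay is fo4PS picoseconds.
func wallTimePS(depthFO4, fo4PS int) int { return depthFO4 * fo4PS }

func main() {
	mine := wallTimePS(100, 20)  // better design, slower process: 2000 ps
	theirs := wallTimePS(200, 8) // worse design, faster process: 1600 ps
	fmt.Printf("mine: %d ps (100 FO4), theirs: %d ps (200 FO4)\n", mine, theirs)
}
```

Theirs wins on the clock, but the FO4 count, which is independent of the process, still favours yours.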
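The same budget calculation in Go, using integer picoseconds to avoid floating-point rounding; maxStageFO4 and maxClockHz are hypothetical helper names.

```go
package main

import "fmt"

const psPerSecond int64 = 1_000_000_000_000

// maxStageFO4 returns how many FO4 delays fit into one clock cycle.
func maxStageFO4(clockHz, fo4PS int64) int64 {
	cyclePS := psPerSecond / clockHz
	return cyclePS / fo4PS
}

// maxClockHz is the inverse: the fastest clock at which a stage of
// stageFO4 delays still settles within one cycle.
func maxClockHz(stageFO4, fo4PS int64) int64 {
	return psPerSecond / (stageFO4 * fo4PS)
}

func main() {
	fmt.Println(maxStageFO4(1_000_000_000, 10)) // prints 100
	fmt.Println(maxClockHz(50, 10))             // prints 2000000000
}
```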
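A toy model of splitting a stage, in Go. It assumes the logic divides into perfectly balanced stages and adds a hypothetical fixed overhead per stage for the pipeline register between stages (an assumption of mine, not a figure from the book). With no overhead, two 50 FO4 stages double the clock rate but still take 100 FO4 end to end; with any overhead, the end-to-end latency actually gets worse.

```go
package main

import "fmt"

// split divides totalFO4 of logic into stages balanced pipeline stages.
// latchFO4 is a hypothetical per-stage register overhead (0 = ideal).
// It returns the cycle time and the end-to-end latency, both in FO4.
func split(totalFO4, stages, latchFO4 int) (cycle, latency int) {
	perStage := (totalFO4 + stages - 1) / stages // slowest stage, rounded up
	cycle = perStage + latchFO4
	return cycle, cycle * stages
}

func main() {
	for _, n := range []int{1, 2} {
		c, l := split(100, n, 3)
		fmt.Printf("%d stage(s): cycle %d FO4, latency %d FO4\n", n, c, l)
	}
}
```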
Of course, this doesn't even begin to describe the issues with superscalar design, instruction level parallelism, cache interaction and the myriad of other things the architects have to consider.
Anyway, after reading this book I guarantee you'll have an interesting new insight the next time you fire-up Guitar Hero.
posted at: Wed, 15 Jul 2009 19:15 | in /code/arch
It seems the ABC updated the DIG Jazz now-playing list format, breaking V1. Some quick Flash disassembly and a bit of hacking, and order is restored. As a bonus, it now shows the upcoming songs.
Source or Debian package.
posted at: Mon, 18 May 2009 23:20 | in /code/gnome

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.