RSS | technovelty home | page of ian | ian@wienand.org
I've recently found out a bit more about separating debug info, and thought a consolidated reference might be handy.
Almost every distribution now provides separate debug packages which contain only the debug info, saving a lot of space for the 99% of people who never want to start gdb.
This is achieved with objcopy and --only-keep-debug/--add-gnu-debuglink and is well explained in the man page.
This adds a .gnu_debuglink section to the binary with the name of the debug file to look for.
$ gcc -g -shared -o libtest.so libtest.c
$ objcopy --only-keep-debug libtest.so libtest.debug
$ objcopy --add-gnu-debuglink=libtest.debug libtest.so
$ objdump -s -j .gnu_debuglink libtest.so
libtest.so: file format elf32-i386
Contents of section .gnu_debuglink:
0000 6c696274 6573742e 64656275 67000000 libtest.debug...
0010 52a7fd0a R...
The first part is the name of the file; the second part is a checksum (a CRC32) of the debug-info file for later reference.
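To make the layout concrete, here is a minimal Python sketch that splits the section dump above into its two parts. The function name `parse_debuglink` is made up for illustration, and little-endian byte order is assumed because the example binary is elf32-i386.

```python
import struct


def parse_debuglink(section: bytes):
    """Split raw .gnu_debuglink contents into (file name, checksum)."""
    # The file name is a NUL-terminated string at the start of the section.
    name_end = section.index(b"\x00")
    name = section[:name_end].decode("ascii")
    # After padding to a 4-byte boundary, the last four bytes hold the checksum.
    # Little-endian is an assumption that matches the elf32-i386 example.
    (crc,) = struct.unpack("<I", section[-4:])
    return name, crc


# Bytes copied from the objdump output above.
raw = bytes.fromhex("6c696274 6573742e 64656275 67000000 52a7fd0a")
name, crc = parse_debuglink(raw)
print(name, hex(crc))
```

On a real binary you would first pull the section bytes out with something like objcopy or an ELF parsing library; that step is left out here.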
Did you know that binaries also get stamped with a unique id when they are built? The ld --build-id flag stamps in a hash near the end of the link.
$ readelf --wide --sections ./libtest.so | grep build
[ 1] .note.gnu.build-id NOTE 000000d4 0000d4 000024 00 A 0 0 4
$ objdump -s -j .note.gnu.build-id libtest.so
libtest.so: file format elf32-i386
Contents of section .note.gnu.build-id:
00d4 04000000 14000000 03000000 474e5500 ............GNU.
00e4 a07ab0e4 7cd54f60 0f5cf66b 5799b05c .z..|.O`.\.kW..\
00f4 2d43f456 -C.V
In case you're wondering what the format of that is...
uint32 name_size;        /* size of the name */
uint32 hash_size;        /* size of the hash */
uint32 identifier;       /* NT_GNU_BUILD_ID == 0x3 */
char   name[name_size];  /* the name "GNU" */
char   hash[hash_size];  /* the hash */
Although the actual file may change (due to prelink or similar), the hash is not updated and remains constant.
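As a rough illustration of that note layout, the following Python sketch decodes the bytes from the libtest.so dump above. The helper name `parse_build_id_note` is hypothetical, and little-endian fields are assumed because the example is elf32-i386.

```python
import struct


def parse_build_id_note(note: bytes):
    """Decode a .note.gnu.build-id section into (name, identifier, hash)."""
    # Three 32-bit header fields: name size, hash size, note type.
    name_size, hash_size, identifier = struct.unpack_from("<III", note, 0)
    offset = 12
    name = note[offset:offset + name_size].rstrip(b"\x00").decode("ascii")
    # The name is padded out to a 4-byte boundary before the hash starts.
    offset += (name_size + 3) & ~3
    build_id = note[offset:offset + hash_size].hex()
    return name, identifier, build_id


# Bytes copied from the objdump output of libtest.so above.
raw = bytes.fromhex(
    "04000000 14000000 03000000 474e5500"
    "a07ab0e4 7cd54f60 0f5cf66b 5799b05c"
    "2d43f456"
)
name, identifier, build_id = parse_build_id_note(raw)
print(name, identifier, build_id)
```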
The last piece of the puzzle is how gdb attempts to find the debug-info files when it is run. The main variable influencing this is debug-file-directory.
(gdb) show debug-file-directory
The directory where separate debug symbols are searched for is "/usr/lib/debug".
The first thing gdb does, which you can verify via strace, is search for a file called [debug-file-directory]/.build-id/xx/yyyyyy.debug, where xx is the first two hexadecimal digits of the hash and yyyyyy is the rest of it:
$ objdump -s -j .note.gnu.build-id /bin/ls
/bin/ls: file format elf32-i386
Contents of section .note.gnu.build-id:
8048168 04000000 14000000 03000000 474e5500 ............GNU.
8048178 c6fd8024 2a11673c 7c6a5af6 2c65b1b5 ...$*.g<|jZ.,e..
8048188 d7e13fd4 ..?.
... [running gdb /bin/ls] ...
access("/usr/lib/debug/.build-id/c6/fd80242a11673c7c6a5af62c65b1b5d7e13fd4.debug", F_OK) = -1 ENOENT (No such file or directory)
Next it moves on to the debug-link filename. First it looks for the file in the same directory as the object being debugged. After that it looks for the file in a sub-directory called .debug/ in that same directory.
Finally, it prepends the debug-file-directory to the path of the object being inspected and looks for the debug info there. This is why the /usr/lib/debug directory looks like the root of a file-system; if you're looking for the debug-info of /usr/lib/libfoo.so it will be looked for in /usr/lib/debug/usr/lib/libfoo.so.
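The search order described above can be sketched in a few lines of Python. This only mirrors the steps as described in this post, not gdb's actual implementation; the function name and the `ls.debug` debug-link name are made up for illustration.

```python
import os.path


def debug_file_candidates(object_path, debuglink, build_id=None,
                          debug_file_directory="/usr/lib/debug"):
    """List the places to look for separate debug info, in order."""
    candidates = []
    if build_id:
        # 1. [debug-file-directory]/.build-id/xx/yyyy.debug
        candidates.append(os.path.join(
            debug_file_directory, ".build-id",
            build_id[:2], build_id[2:] + ".debug"))
    obj_dir = os.path.dirname(object_path)
    # 2. The debug-link name next to the object being debugged.
    candidates.append(os.path.join(obj_dir, debuglink))
    # 3. The same name inside a .debug/ sub-directory.
    candidates.append(os.path.join(obj_dir, ".debug", debuglink))
    # 4. The object's directory re-rooted under debug-file-directory.
    candidates.append(os.path.join(
        debug_file_directory, obj_dir.lstrip("/"), debuglink))
    return candidates


# Build-id taken from the /bin/ls dump above; "ls.debug" is a hypothetical
# debug-link name.
paths = debug_file_candidates(
    "/bin/ls", "ls.debug",
    build_id="c6fd80242a11673c7c6a5af62c65b1b5d7e13fd4")
for p in paths:
    print(p)
```

The first path printed should match the one in the strace output above.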
Interestingly, the sysroot and solib-search-path don't appear to have anything to do with these lookups. So if you change the sysroot, you also need to change the debug-file-directory to match.
However, most distributions make all this "just work", so hopefully you'll never have to worry about it anyway!
posted at: Fri, 22 Jan 2010 09:11 | in /code
I was recently driving through the California desert and came across the Salton Sea. Long story short - it rained a lot and the Colorado River overflowed a bunch of dams and dikes meant to contain it and created a huge inland sea. Oops.
Some enterprising souls must have decided that despite the lack of any natural flushing dooming the sea to a salty, polluted existence, there was ripe opportunity to create a sea-side metropolis.
From the ground, it is a bit of a fun ghost town to explore. The typical "everything just abandoned" type thing. But when I came to geotag some photos I took there, I was quite astonished to see this.
That looks exactly like what I used to do in SimCity. I'd use the F-U-N-D-S cheat at the start to max out my money, then build my little empire with neat roads and school and harbours and whatnot — they've even got an airport! Then I'd press "go" and people would slowly move in to the residential areas, one house on one block at a time.
I guess poor old Salton City never made it past "turtle speed"!
posted at: Mon, 11 Jan 2010 16:26 | in /humor
Mark this one down as another in the long list of "duh" — once you realise what is going on!
A bug report comes in about a long-running daemon that has stopped logging. lsof reports the log file is now named logfile~ and furthermore is deleted! This happens after a system upgrade scenario, so of course I go off digging through a multitude of scripts and what-not to find the culprit...
Have you got it yet?
Try this...
# lsof | grep syslogd | grep messages
syslogd 1376 root 15w REG 3,1 99851 4605625 /var/log/messages
# cd /var/log/
# vi messages (and save the file)
root@jj:/var/log# lsof | grep syslogd | grep messages
syslogd 1376 root 15w REG 3,1 99851 4605625 /var/log/messages~ (deleted)
vi is very careful and renames your existing file, so that if anything goes wrong when writing the new version you can get something back. It's a shame the daemon doesn't know about this! The kernel is happy to deal with the rename, but when the backup file is unlinked you're out of luck. Confusingly, on casual inspection your log file looks like it's there; it's just that nothing is going into it. (Oh, and if you tried that, you might like to restart syslogd now :)
Moral of the story -- overcome that finger-memory and never use vi on a live file; you're asking for trouble!
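The effect is easy to reproduce without touching a real log. Here is a small Python sketch, assuming a POSIX filesystem, where one file handle plays the daemon and a rename, rewrite and unlink sequence stands in for what the editor does on save (the exact steps vi takes depend on its backup settings).

```python
import os
import tempfile


def simulate_editor_save():
    """Show a writer losing its file after a rename-and-replace save."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "messages")

        # The "daemon" opens the log once and keeps the descriptor.
        daemon = open(log_path, "a")
        daemon.write("before edit\n")
        daemon.flush()

        # The "editor": rename the original to a backup, write a new file
        # under the old name, then remove the backup.
        os.rename(log_path, log_path + "~")
        with open(log_path, "w") as new_file:
            new_file.write("edited copy\n")
        os.unlink(log_path + "~")

        # The daemon keeps writing, but to the now-unlinked inode.
        daemon.write("after edit\n")
        daemon.flush()
        links_left = os.fstat(daemon.fileno()).st_nlink
        with open(log_path) as f:
            visible = f.read()
        daemon.close()
        return links_left, visible


links_left, visible = simulate_editor_save()
print(links_left, repr(visible))
```

The link count of the daemon's file drops to zero, and the file visible under the original name never receives the later write.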
posted at: Fri, 08 Jan 2010 16:15 | in /linux/tips
So, how to strip a shared library?
The man page for --strip-unneeded states that it removes all symbols that are not needed for relocation processing. This is a little cryptic, because one might reasonably assume that a shared library can be "relocated", in that it can be loaded anywhere. However, what this really refers to is object files that are usually built and bundled into a .a archive for static linking. For an object in a static library archive to still be useful, global symbols must be kept, although static symbols can be removed. Take the following small example:
$ cat libtest.c
static int static_var = 100;
int global_var = 100;
static int static_function(void) {
return static_var;
}
int global_function(int i) {
return static_function() + global_var + i;
}
Before stripping:
$ gcc -c -fPIC -o libtest.o libtest.c
$ readelf --symbols ./libtest.o
Symbol table '.symtab' contains 18 entries:
Num: Value Size Type Bind Vis Ndx Name
...
5: 00000000 4 OBJECT LOCAL DEFAULT 5 static_var
6: 00000000 22 FUNC LOCAL DEFAULT 3 static_function
13: 00000004 4 OBJECT GLOBAL DEFAULT 5 global_var
16: 00000016 36 FUNC GLOBAL DEFAULT 3 global_function
After stripping:
$ strip --strip-unneeded libtest.o
$ readelf --symbols ./libtest.o
Symbol table '.symtab' contains 15 entries:
Num: Value Size Type Bind Vis Ndx Name
...
10: 00000004 4 OBJECT GLOBAL DEFAULT 5 global_var
13: 00000016 36 FUNC GLOBAL DEFAULT 3 global_function
If you --strip-all this object file, it will remove the entire .symtab section and the object will be useless for further linking, because you'll never be able to find global_function to call it!
Shared libraries are different, however. Shared libraries keep global symbols in a separate ELF section called .dynsym. --strip-all will not touch the dynamic symbol entries, so it is safe to remove all the "standard" symbols from the output file without affecting the usability of the shared library. For example, readelf will still show the .dynsym symbols even after stripping:
$ gcc -shared -fPIC -o libtest.so libtest.c
$ strip --strip-all ./libtest.so
$ readelf --syms ./libtest.so
Symbol table '.dynsym' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
...
6: 00000452 36 FUNC GLOBAL DEFAULT 12 global_function
10: 000015e0 4 OBJECT GLOBAL DEFAULT 21 global_var
However, --strip-unneeded is smart enough to realise that a shared-object library doesn't need the .symtab section either, and removes it.
So, conclusions? --strip-all is safe on shared libraries, because global symbols remain in a separate section, but not on objects for inclusion in static libraries (relocatable objects). --strip-unneeded is safe for both, and automatically understands shared objects do not need any .symtab entries to function and removes them; effectively doing the same work as --strip-all. So, --strip-unneeded is essentially the only tool you need for standard stripping needs!
posted at: Wed, 23 Dec 2009 16:39 | in /linux
I usually find blog rants useless, but sometimes something is just so annoying one is sufficiently inspired. Today I went with my parents to buy them a Tivo at Harvey Norman, Norwest, Castle Hill, NSW, Australia. I am a big Tivo fan; the interface is good and it "just works". I don't mind paying for (or in this case recommending paying for) good products.
After selecting the Tivo model, I asked for a HDMI cable. The salesman asked a series of questions about what sort of HD TV it was being plugged into; I quickly sensed this as a probe to see what sort of suckers we were, and requested just a "normal" cable.
At this point, he insisted on a $130 (you guessed it) Monster cable, and had the audacity to say that we didn't need one of the really expensive cables because our TV wasn't good enough! I openly expressed my concern, but the annoying high-pressure sales pitch had just begun. The amount of, frankly, crap that he spewed about 4-bit this, 10-bit that, legislating of labels, DA signal levels, mythical customers who regretted buying the cheap cables and who knows what else was to the point of being comical if it weren't so insistent and said with such seeming authority.
There is only one thing that matters - if the cable has passed the functional requirements for being certified to have the distinctive HDMI logo plastered on it. From the HDMI FAQ:
Q. What testing is required?
Prior to mass producing or distributing any Licensed Product or component that claims compliance with the HDMI Specification (or allowing someone else to do such activities), each Adopter must test a representative sample for HDMI compliance. First, the Adopter must self test as specified in the then-current HDMI Compliance Test Specification. The HDMI Compliance Test Specification provides a suite of testing procedures, and establishes certain minimum requirements specifying how each HDMI Adopter should test Licensed Products for conformance to the HDMI Specification.
Now, I can understand that if you buy any old HDMI cable off Ebay for $1, it may be a knock-off that uses the HDMI logo illegally. But there is no way that the certified $50 Philips cable (still very over-priced, but at least not insane, and discounted to $35) performs any differently to some overpriced Monster model certified to exactly the same standard.
The thing that annoyed me most was his analogy to buying a tyre. He stated that if I walked up to a tyre salesman and said "I don't want the Pirellis, just put the cheap-o tyres on my Ferrari" I'd be insane, and thus by extension of that logic I was insane for not buying a Monster cable for my great new Tivo.
This analogy is completely flawed and really just dishonest. A Ferrari is much more powerful and goes much faster than a standard car. It is plausible it needs a better engineered tyre to perform adequately given the additional stresses it undergoes. A Tivo doesn't put out any more or any less bits than any other HDMI certified equipment, no matter what you do. If the cable is certified as getting all the bits to the other end under whatever environmental conditions specified by the HDMI people, then it's going to work for the 99% of people with normal requirements.
Nobody wants to make a significant investment in a piece of audio-visual equipment and feel they are getting something that isn't optimal. Harvey Norman's use of this understandable consumer sentiment to sell ridiculously over-priced cables that do nothing is extremely disappointing.
I'm sure the commissions on these things encourage this behaviour, so it is useless expecting the retailer or individual sales assistant to change their policy to recommend reasonably priced cables. However, it is really Tivo and other manufacturers who get the raw end of this deal; a $130 cable is over 20% of the price of the actual Tivo! That is surely affecting people's purchasing decisions.
If Tivo and others included a certified HDMI cable with their device, as they do with component cables, and had "Certified HDMI 1.3 cable included" plastered on the box, it would be a harder sell to explain why the manufacturer would bother shipping a certified cable that is supposedly insufficient, and consumers would hopefully avoid the very distasteful high-pressure theatrics I was subjected to today.
Update: I have removed my description of the individual salesman in the title. Singling someone out invites ad hominem attacks and I have no interest in providing a forum for or perpetuating any such thing.
If it's one salesman, it's a thousand. To reiterate my main point, manufacturers must surely be annoyed that they participate in price wars with each other only to have their margins taken by a gold-plated optical cable company. I believe it is really up to them to get the information into their own market so it can operate efficiently.
posted at: Tue, 22 Dec 2009 20:23 | in /general
On a recent trip up the Oregon coast, a friendly doorman at our hotel in Portland was inquiring about our trip. When we mentioned we passed through Bandon, OR, he quipped that Bandon was the place furthest from a city of one million people in the USA. I guess a normal person would just think "oh, that's interesting" and move on, but it has been plaguing me ever since.
Firstly, I had to find what cities in the USA had more than 1 million people. Luckily Wolfram Alpha gives the answer:
(I certainly wouldn't have guessed that list!) From there my plan was to find the bounding box of the continental USA; luckily Wikipedia has the raw data for that. Combined with the latitude and longitude of the cities above, I had the raw data.
I couldn't figure out any way better than a simple brute-force of testing every degree and minute of latitude and longitude within the bounding box and calculating the distance to the closest large city; the theory being that from one particular point you would have to travel further than any other to reach a city of 1 million people. Luckily, that is short work for a modern processor, and hopefully the result would be a point somewhere around Bandon. I'd already become acquainted with the great circle and measuring distances when I did Tinymap, so a quick python program evolved.
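The brute-force idea can be sketched roughly as follows. This is not the original program; it is a minimal Python version assuming the haversine formula for great-circle distance and a mean Earth radius of 6371 km, with hypothetical function names and a toy one-city data set in place of the real city list and bounding box.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def most_remote_point(cities, south, north, west, east, step_minutes=1):
    """Test every grid point in the box; keep the one whose nearest city
    is furthest away. Returns (distance_km, lat, lon)."""
    lat_steps = round((north - south) * 60 / step_minutes)
    lon_steps = round((east - west) * 60 / step_minutes)
    best = None
    for i in range(lat_steps + 1):
        lat = south + i * step_minutes / 60
        for j in range(lon_steps + 1):
            lon = west + j * step_minutes / 60
            nearest = min(great_circle_km(lat, lon, clat, clon)
                          for clat, clon in cities)
            if best is None or nearest > best[0]:
                best = (nearest, lat, lon)
    return best


# Toy data: one "city" at the origin and a 1x1 degree box, whole-degree steps.
toy_best = most_remote_point([(0.0, 0.0)], 0, 1, 0, 1, step_minutes=60)
print(toy_best)
```

With the real list of cities and the bounding box of the continental USA, the same loop produces the per-point distances that feed the heat-map.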
However, it turns out that the program picks the far south-east corner of the bounding box. Thanks to the shape of the USA, that is way out over the ocean somewhere. I can't figure out a way to get an outline of the USA to test if a given point is inside the border or not, but all is not lost.
I modified the program to output the distance to the closest large city along with the location to a log file, and then imported it into gnuplot to make a heat-map. The hardest part was finding an equirectangular outline of the USA to place the heat-map over, rather than the much more common Mercator projection; Wikimedia to the rescue!
I actually surprised myself at how well the two lined up when, after a little work with Gimp, I overlaid them.
From this, I can see that Bandon, about a third of the way up the Oregon coast, is a pretty good candidate. However, probably not the best; I get the feeling the real point that is the furthest from any city of 1 million people is actually somewhere in the central-middle of Montana.
However, we can also fiddle the program slightly to disprove the point about Bandon. The numbers show the closest large city to Bandon is LA, at ~1141km. Taking another point we suspect to be more remote; the closest large city to Portland (where we met the doorman) is also LA at ~1329km. So to reach the closest large city you have to travel further from Portland than Bandon, so Bandon is not the furthest place in the USA from a city of one million people. Myth busted!
posted at: Sun, 06 Dec 2009 11:28 | in /general
By now everybody has heard about Go, Google's expressive, concurrent, garbage-collecting language. One big, glaring thing stuck out at me when I was reading the documentation:
Do not communicate by sharing memory; instead, share memory by communicating.
One of the examples given is a semaphore using a channel, which I'll copy here for posterity.
var sem = make(chan int, MaxOutstanding)

func handle(r *Request) {
    sem <- 1;    // Wait for active queue to drain.
    process(r);  // May take a long time.
    <-sem;       // Done; enable next request to run.
}

func Serve(queue chan *Request) {
    for {
        req := <-queue;
        go handle(req);  // Don't wait for handle to finish.
    }
}
Here is a little illustration of that in operation.
Serve creates goroutines via the go keyword; each of which tries to get a slot in the channel. In the example there are only 3 slots, so it acts like a semaphore of count 3. When done, each thread returns its slot to the channel, which allows anyone blocked to be woken and continued.
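For readers without a Go toolchain to hand, the same pattern can be approximated in Python, using a bounded `queue.Queue` in place of the buffered channel and threads in place of goroutines. This is only an analogy, not a translation of Go's semantics; `run_requests` and the bookkeeping around it are made up for the demonstration.

```python
import queue
import threading
import time

MAX_OUTSTANDING = 3  # number of slots, as in the illustration


def run_requests(n_requests, work_seconds=0.01):
    """Run n_requests handlers and return the peak number running at once."""
    sem = queue.Queue(maxsize=MAX_OUTSTANDING)  # stands in for the channel
    lock = threading.Lock()
    active = 0
    peak = 0

    def handle(_request):
        nonlocal active, peak
        sem.put(1)                # blocks while all slots are taken
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(work_seconds)  # "process(r)": may take a long time
        with lock:
            active -= 1
        sem.get()                 # give the slot back

    threads = [threading.Thread(target=handle, args=(i,))
               for i in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


print(run_requests(10))
```

However many handlers are started, the peak count should never exceed the number of slots in the queue.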
This instantly reminded me of the very first thing you need to do if you ever want to pass Advanced Operating Systems -- write a semaphore server to provide synchronisation within your OS.
In L4, threads communicate with each other via inter-process communication (IPC). IPC messages have a fixed format: you specify a target thread, bundle some data into the available slots in the IPC format and fire it off. By default you block waiting for a reply -- this all happens within a single call for efficiency. On the other side, you can write servers that listen for remote IPC connections, where everything happens in reverse.
Here's another illustration of the trivial semaphore server concept Shehjar and I implemented.
Look familiar? Instead of a blocking push of a number into a slot into a channel, you make a blocking IPC call to a remote server.
My point here is that both take the approach of sharing memory via communication. When using IPC, you bundle up all your information into the available slots in the IPC message and send it. When using a channel, you bundle your information into an entry in the channel and call your goroutine. Receiving the IPC is the same as draining a channel - both result in you getting the information that was bundled into it by the caller.
| IPC | Go |
|---|---|
| Start thread | Start goroutine |
| New thread blocks listening for IPC message | Goroutine blocks draining empty channel |
| Bundle information into IPC message | Bundle data into type of your channel |
| Send IPC to new thread | Push data into channel |
| Remote thread unbundles IPC | goroutine drains channel and gets data |
Whenever you mention the word "microkernel", people go off the deep end, and one thing they seem to forget about is the inherent advantage of sharing state only via communication. As soon as you do that, you've broken open an amazing new tool for concurrency, which is now implied. By communicating via messages/channels rather than shared global state, it doesn't matter where you run! One of those threads in the example could be running on another computer in your cloud, marshalling up its IPC messages/channel entries and sending them over TCP/IP -- nobody would care!
At any rate, do not communicate by sharing memory; instead, share memory by communicating is certainly an idea whose time has come.
posted at: Fri, 20 Nov 2009 11:37 | in /code
Although Django is well packaged for Debian, I've recently come to the conclusion that the packages are really not what I want. The problem is that my server runs Debian stable, while my development laptop runs unstable, and Django revisions definitely fall into the "unstable" category. There really is no way to use a system Django 1.1 on one side, and a system Django 1.0 on the other.
After a bit of work, I think I've got something together that works, and I post it here in the hope it is useful for someone else. This info has been gleaned from similar references such as this and this.
This is aimed at running a server using Debian stable (5.0) for production and an unstable environment for development. You actually need both to get this running. This is based on a project called "project" that lives in /var/www
$ virtualenv --no-site-packages project
New python executable in project/bin/python
Installing setuptools............done.

/var/www$ cd project
/var/www/project$ . bin/activate
(project) /var/www/project$

(project) /var/www/project$ easy_install pip
Searching for pip
Reading http://pypi.python.org/simple/pip/
Reading http://pip.openplans.org
Best match: pip 0.4
Downloading http://pypi.python.org/packages/source/p/pip/pip-0.4.tar.gz#md5=b45714d04f8fd38fe8e3d4c7600b91a2
Processing pip-0.4.tar.gz
Running pip-0.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Wu9O-U/pip-0.4/egg-dist-tmp-xjSdxq
warning: no previously-included files matching '*.txt' found under directory 'docs/_build'
no previously-included directories found matching 'docs/_build/_sources'
zip_safe flag not set; analyzing archive contents...
pip: module references __file__
Adding pip 0.4 to easy-install.pth file
Installing pip script to /var/www/project/bin
Installed /var/www/project/lib/python2.5/site-packages/pip-0.4-py2.5.egg
Processing dependencies for pip
Finished processing dependencies for pip

(project) /var/www/project$ easy_install setuptools==0.6c9
Searching for setuptools==0.6c9
Reading http://pypi.python.org/simple/setuptools/
Best match: setuptools 0.6c9
Downloading http://pypi.python.org/packages/2.5/s/setuptools/setuptools-0.6c9-py2.5.egg#md5=fe67c3e5a17b12c0e7c541b7ea43a8e6
Processing setuptools-0.6c9-py2.5.egg
Moving setuptools-0.6c9-py2.5.egg to /var/www/project/lib/python2.5/site-packages
Removing setuptools 0.6c8 from easy-install.pth file
Adding setuptools 0.6c9 to easy-install.pth file
Installing easy_install script to /var/www/project/bin
Installing easy_install-2.5 script to /var/www/project/bin
Installed /var/www/project/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg
Processing dependencies for setuptools==0.6c9
Finished processing dependencies for setuptools==0.6c9

(project) /var/www/project$ cat requirements.txt
-e svn+http://code.djangoproject.com/svn/django/tags/releases/1.0.3/#egg=Django
(project) /var/www/project$ pip install -r requirements.txt
Obtaining Django from svn+http://code.djangoproject.com/svn/django/tags/releases/1.0.3/#egg=Django (from -r requirements.txt (line 1))
Checking out http://code.djangoproject.com/svn/django/tags/releases/1.0.3/ to ./src/django
... so on ...
activate_this = "/var/www/project/bin/activate_this.py"
execfile(activate_this, dict(__file__=activate_this))
from django.core.handlers.modpython import handler
(project) /var/www/project$ mkdir project
(project) /var/www/project/project$ mkdir db django media www
(project) /var/www/project/project$ cd django/
(project) /var/www/project/project/django$ django-admin startproject myproject
DocumentRoot /var/www/project
<Location "/">
SetHandler python-program
PythonHandler project-python
PythonPath "['/var/www/project/','/var/www/project/project/django/'] + sys.path"
SetEnv DJANGO_SETTINGS_MODULE myproject.settings
PythonDebug On
</Location>
Alias /media /var/www/project/project/media
<Location "/media">
SetHandler none
</Location>
<Directory "/var/www/project/project/media">
AllowOverride none
Order allow,deny
Allow from all
Options FollowSymLinks Indexes
</Directory>
With all this, you should be up and running in a basic but stable environment. It's easy enough to update packages for security fixes, etc. via pip after activating your virtualenv.
posted at: Fri, 11 Sep 2009 22:49 | in /linux/debian
Here's an interesting behaviour that, as far as I can tell, is completely undocumented, slightly confusing but fairly logical. Your program should receive a SIGTTOU when it is running in the background and attempts to output to the terminal -- the idea being that you shouldn't scramble the output by mixing it in while the shell is trying to operate. Here's what the bash manual has to say:
Background processes are those whose process group ID differs from the terminal's; such processes are immune to keyboard-generated signals. Only foreground processes are allowed to read from or write to the terminal. Background processes which attempt to read from (write to) the terminal are sent a SIGTTIN (SIGTTOU) signal by the terminal driver, which, unless caught, suspends the process.
So, consider the following short program, which writes some output and catches any SIGTTOU's, with an optional flag to switch between canonical and non-canonical mode.
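#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <termios.h>
#include <unistd.h>

static void sig_ttou(int signo) {
    /* printf is not async-signal-safe; tolerable only in a small demo */
    printf("caught SIGTTOU\n");
    /* restore the default action and re-raise, so the process really stops */
    signal(SIGTTOU, SIG_DFL);
    kill(getpid(), SIGTTOU);
}

int main(int argc, char *argv[]) {
    signal(SIGTTOU, sig_ttou);

    /* any argument switches the terminal to non-canonical mode */
    if (argc != 1) {
        struct termios tty;
        printf("setting non-canoncial mode\n");
        tcgetattr(fileno(stdout), &tty);
        tty.c_lflag &= ~(ICANON);
        tcsetattr(fileno(stdout), TCSANOW, &tty);
    }

    /* write a line to the terminal once a second, forever */
    int i = 0;
    while (1) {
        printf(" *** %d ***\n", i++);
        sleep(1);
    }
}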
This program ends up operating in an interesting manner.
$ ./sigttou &
 *** 0 ***
[1] 26171
$  *** 1 ***
 *** 2 ***
 *** 3 ***

$ ./sigttou 1 &
[1] 26494
ianw@jj:/tmp$ setting non-canoncial mode
caught SIGTTOU
[1]+ Stopped ./sigttou 1

$ stty tostop
$ ./sigttou &
[2] 26531
ianw@jj:/tmp$  *** 0 ***
caught SIGTTOU
[2]+ Stopped ./sigttou
You can see a practical example of this by comparing the difference between cat file & and more file &.
The semantics make some sense -- anything switching off canonical mode is likely to really scramble your terminal, so it's good to stop it and let its terminal handling functions run. I'm not sure why canonical background output is considered useful mixed in with your prompt, but someone, somewhere must have decided it was so.
Update: upon further investigation, it is the switching of terminal modes that invokes the SIGTTOU. To follow the logic through more, see the various users of tty_check_change in the tty driver.
posted at: Fri, 21 Aug 2009 11:02 | in /linux/tips
My attempt at answering that most important of questions: where should one place their plate in the microwave to achieve maximal heating?
posted at: Thu, 23 Jul 2009 12:23 | in /humor
I recently finished The Race for a New Game Machine: Creating the Chips Inside the XBox 360 and the Playstation 3 (David Shippy and Mickie Phipps); an interesting insight into the processor development process from some of the lead architects.
The executive summary is: Sony, Toshiba and IBM (STI) decided to get together to create the core of the Playstation 3, the Cell processor. Sony, with their graphics and gaming experience, would do the Synergistic Processing Elements; extremely fast but limited sub-units specialising in doing 3D graphics and physics work (i.e. great for games). IBM would do a Power based core that handled the general purpose computing requirements.
The twist comes when Microsoft came along to IBM looking for the Xbox 360 processor, and someone at IBM mentioned the Power core that was being worked on for the Playstation. Unsurprisingly, the features being built for the Playstation also interested Microsoft, and the next thing you know, IBM is working on the same core for Microsoft and Sony at the same time, without telling either side.
This whole chain of events makes for a very interesting story. The book is written for a general audience, but you'll probably get the most out of it if you already have some knowledge of computer architecture; if you're trying to understand some of the concepts referred to from the two line descriptions you'll get a bit lost (H&P it is not).
The only small criticism is that it sometimes falls into reading a bit like a long LinkedIn recommendation. However, the book is very well paced, and throws in just enough technical tidbits amongst the corporate and personal dramas to make it a very fun read.
One thing that is talked about a bit is the fan-out of four (FO4) metric used in the designers' quest to push the chip as fast as possible (and, as mentioned many times in the book, faster than what Intel could do!). I thought it might be useful to expand on this interesting metric a bit.
One problem facing chip architects is that, thanks to Moore's Law, it is hard to find a constant to compare design versus implementation. For example, you may design an amazing logic-block to factor large integers into products of prime numbers, but somebody else with better fabrication facilities might be able to essentially brute-force a better solution by producing faster hardware using a much less innovative design.
Some metric is needed that can compare the two designs discounting who has the better fabrication process. This is where the FO4 comes in.
When you change the input to a logic gate, it does not magically flip the output to the correct level instantaneously. There is a latency while everything settles to its correct level: the gate delay. The more gates connected to the output of a gate, the more current is required, which has additional effects on latency. The FO4 latency is defined as the time required to flip an inverter gate connected (fanned out) to four other inverter gates.
Thus you can describe the latency of other logic blocks in multiples of FO4 latencies. As this avoids measuring against wall-time it is an effective description of the relative efficiency of logic designs. For example, you may calculate that your factoriser has a latency of 100 FO4. Just because someone else's 200 FO4 factoriser gets a result a few microseconds faster thanks to their fancy ultra-low-FO4-latency fabrication process, you can still show that your design, at least a priori, is better.
The book refers several times to efforts to reduce the FO4 of the processor as much as possible. The reason this is important in this context is that the maximum latency on the critical path will determine the fastest clock speed you can run the processor at. For reasons explained in the book high clock speed was a primary goal, so every effort had to be made to reduce latencies.
All modern processors operate as a production line, with each stage doing some work and passing it on to the next stage. Clearly the slowest stage determines the maximum speed that the production line can run at (weakest link in the chain and all that). For example, if you clock at 1GHz, each cycle takes 1 nanosecond (1s / 1,000,000,000Hz). If you have an FO4 latency of, say, 10 picoseconds, any given stage can have a latency of no more than 100 FO4; otherwise that stage would not have enough time to settle and actually produce the correct result.
Thus the smaller you can get the FO4 latencies of your various stages, the higher you can safely up the clock speed. One way around long latencies might be to split up your logic into smaller stages, making a much longer pipeline (production line). For example, split your 100 FO4 block into two 50 FO4 stages. You can now clock the processor higher, but this doesn't necessarily mean you'll get actual results out the end of the pipeline any faster (as Intel discovered with the Pentium 4 and its notoriously long pipelines and correspondingly high clock rates).
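The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and it assumes an idealised pipeline with no latch or clock-skew overhead, where every stage takes one full cycle set by the slowest stage.

```python
PS_PER_SECOND = 1e12  # picoseconds in one second


def fo4_budget(clock_hz, fo4_ps):
    """How many FO4 delays fit into one clock cycle."""
    cycle_ps = PS_PER_SECOND / clock_hz
    return cycle_ps / fo4_ps


def max_clock_hz(stage_fo4s, fo4_ps):
    """The slowest stage (in FO4) sets the highest safe clock."""
    return PS_PER_SECOND / (max(stage_fo4s) * fo4_ps)


def pipeline_latency_ps(stage_fo4s, fo4_ps):
    """Time for one result to pass through every stage (idealised)."""
    return len(stage_fo4s) * max(stage_fo4s) * fo4_ps


print(fo4_budget(1e9, 10))                 # 1GHz, 10ps FO4 -> 100.0
print(max_clock_hz([100], 10))             # one 100 FO4 stage
print(max_clock_hz([50, 50], 10))          # split into two 50 FO4 stages
print(pipeline_latency_ps([100], 10))      # 1000 (ps)
print(pipeline_latency_ps([50, 50], 10))   # 1000 (ps)
```

Under these assumptions, splitting the stage doubles the clock but leaves the end-to-end latency unchanged, which is the point about longer pipelines not necessarily producing results any sooner.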
Of course, this doesn't even begin to describe the issues with superscalar design, instruction level parallelism, cache interaction and the myriad of other things the architects have to consider.
Anyway, after reading this book I guarantee you'll have an interesting new insight the next time you fire up Guitar Hero.
posted at: Wed, 15 Jul 2009 19:15 | in /code/arch
It seems the ABC updated the DIG Jazz now-playing list format, breaking V1. Some quick flash disassembly and a bit of hacking, and order is restored. As a bonus, it now shows the upcoming songs.
Source or Debian package.
posted at: Mon, 18 May 2009 23:20 | in /code/gnome
I think the most correct way to describe utilisation of a hash-table is using chi-squared distributions and hypothesis and degrees of freedom and a bunch of other things nobody but an actuary remembers. So I was looking for a quick method that was close-enough but didn't require digging out a statistics text-book.
I'm sure I've re-invented some well-known measurement, but I'm not sure what it is. The idea is to add up the total steps required to look-up all elements in the hash-table, and compare that to the theoretical ideal of a uniformly balanced hash-table. You can then get a ratio that tells you if you're in the ball-park, or if you should try something else. A diagram should suffice.
This seems to give quite useful results with a bare minimum of effort, and most importantly no tricky floating point math. For example, on the standard Unix words file with a 2048-entry hash-table, the standard DJB hash came out very well (as expected):
Ideal 2408448
Actual 2473833
----
Ratio 0.973569
To contrast, a simple "add each character" type hash:
Ideal 2408448
Actual 6367489
----
Ratio 0.378241
Example code is hash-ratio.py. I expect this measurement is most useful when you have a largely static bunch of data for which you are attempting to choose an appropriate hash-function. I guess if you are really trying to hash more or less random incoming data and hence only have a random sample to work with, you can't avoid doing the "real" statistics.
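The script itself is not reproduced here, so the following is only a sketch of the measurement as described above, not a copy of hash-ratio.py. It assumes chained buckets where finding the k-th element of a chain costs k steps, and it takes the ideal to be every chain holding len(keys) // buckets elements. That reading is at least consistent with the Ideal figure above, since 2048 * (48 * 49 / 2) = 2408448. The djb_hash and additive_hash functions are illustrative renderings of the two hashes mentioned, and all names are made up.

```python
def djb_hash(key):
    """Assumed form of the DJB hash: h = h * 33 + c, starting from 5381."""
    h = 5381
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def additive_hash(key):
    """The simple 'add each character' hash."""
    return sum(key.encode("utf-8")) & 0xFFFFFFFF


def chain_cost(length):
    """Steps to look up every element of one chain: 1 + 2 + ... + length."""
    return length * (length + 1) // 2


def hash_ratio(keys, hash_fn, buckets):
    """Return (ideal, actual, ideal / actual) for a non-empty list of keys."""
    chains = [0] * buckets
    for key in keys:
        chains[hash_fn(key) % buckets] += 1
    actual = sum(chain_cost(n) for n in chains)
    ideal = buckets * chain_cost(len(keys) // buckets)
    return ideal, actual, ideal / actual


# Tiny example: eight keys in four buckets, perfectly spread vs. all colliding.
keys = [str(i) for i in range(8)]
print(hash_ratio(keys, int, 4))            # (12, 12, 1.0)
print(hash_ratio(keys, lambda k: 0, 4))    # ideal 12, actual 36
```

To try it on real data, read a word list into keys and pass djb_hash or additive_hash with the table size you are considering; a ratio close to 1 suggests the hash is spreading the keys about as evenly as possible.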
posted at: Thu, 07 May 2009 16:37 | in /code
If you code for long enough on x86-64, you'll eventually hit an error such as:
(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `array' defined in foo section in ./pcrel8.o
Here's a little example that might help you figure out what you've done wrong.
Consider the following code:
$ cat foo.s
.globl foovar
  .section foo, "aw",@progbits
  .type foovar, @object
  .size foovar, 4
foovar:
  .long 0
.text
.globl _start
  .type function, @function
_start:
  movq $foovar, %rax
In case it's not clear, that would look something like:
int foovar = 0;
void function(void) {
int *bar = &foovar;
}
Let's build that code, and see what it looks like
$ gcc -c foo.s
$ objdump --disassemble-all ./foo.o
./foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
   0: 48 c7 c0 00 00 00 00    mov $0x0,%rax
Disassembly of section foo:
0000000000000000 <foovar>:
   0: 00 00                   add %al,(%rax)
   ...
We can see that the mov instruction has only allocated 4 bytes (00 00 00 00) for the linker to put in the address of foovar. If we check the relocations:
$ readelf --relocs ./foo.o
Relocation section '.rela.text' at offset 0x3a0 contains 1 entries:
  Offset        Info          Type          Sym. Value        Sym. Name + Addend
000000000003  00050000000b  R_X86_64_32S  0000000000000000  foovar + 0
The R_X86_64_32S relocation is indeed only a 32-bit relocation. Now we can tickle this error. Consider the following linker script, which puts the foo section about 5 gigabytes away from the code.
$ cat test.lds
SECTIONS
{
. = 10000;
.text : { *(.text) }
. = 5368709120;
.data : { *(.foo) }
}
This now means that we cannot fit the address of foovar inside the space allocated by the relocation. When we try it:
$ ld -Ttest.lds ./foo.o
./foo.o: In function `_start':
(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `foovar' defined in foo section in ./foo.o
What this means is that the full 64-bit address of foovar, which now lives somewhere above 5 gigabytes, can't be represented within the 32-bit space allocated for it.
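To make the arithmetic concrete, here is a rough Python sketch of the range check that fails. It is not part of any toolchain, and fits_signed_32 is a made-up helper name. R_X86_64_32S holds a sign-extended 32-bit value, so the symbol's address has to fit in a signed 32-bit integer; the two addresses are simply the location-counter values from test.lds.

```python
# Hypothetical helper mirroring the kind of range check applied to R_X86_64_32S.
S32_MIN = -(1 << 31)
S32_MAX = (1 << 31) - 1


def fits_signed_32(value):
    """True if value survives truncation to 32 bits and sign-extension back."""
    return S32_MIN <= value <= S32_MAX


text_addr = 10000        # ". = 10000" from test.lds
foo_addr = 5368709120    # ". = 5368709120" from test.lds, i.e. 5 * 2**30

print(fits_signed_32(text_addr))   # True
print(fits_signed_32(foo_addr))    # False
print(hex(foo_addr))               # 0x140000000, which needs 33 bits
```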
For code optimisation purposes, the default immediate size to the mov instructions is a 32-bit value. This makes sense because, for the most part, programs can happily live within a 32-bit address space, and people don't do things like keep their data so far away from their code it requires more than a 32-bit address to represent it. Defaulting to using 32-bit immediates therefore cuts the code size considerably, because you don't have to make room for a possible 64-bit immediate for every mov.
So, if you want to really move a full 64-bit immediate into a register, you want the movabs instruction. Try it out with the code above - with movabs you should get a R_X86_64_64 relocation and 64-bits worth of room to patch up the address, too.
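The size difference between the two forms can be seen by building the bytes by hand. The sketch below only covers the %rax destination used in the example, the helper names are made up, and it is an illustration of the encodings rather than anything an assembler actually runs.

```python
import struct


def encode_mov_imm32(imm):
    """mov $imm32, %rax: bytes 48 c7 c0 plus a sign-extended 32-bit immediate."""
    return b"\x48\xc7\xc0" + struct.pack("<i", imm)


def encode_movabs(imm):
    """movabs $imm64, %rax: bytes 48 b8 plus a full 64-bit immediate."""
    return b"\x48\xb8" + struct.pack("<Q", imm)


print(encode_mov_imm32(0).hex())          # 48c7c000000000, as in the objdump output
print(len(encode_mov_imm32(0)))           # 7
print(len(encode_movabs(5368709120)))     # 10

try:
    encode_mov_imm32(5368709120)          # address does not fit in 32 bits
except struct.error as err:
    print("does not fit:", err)
```

The three extra bytes per instruction are the code-size cost that the 32-bit default avoids.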
If you're seeing this and you're not hand-coding, you probably want to check out the -mcmodel argument to gcc.
posted at: Thu, 12 Mar 2009 23:20 | in /code/c
Some tips and things to check if your YUI ButtonGroup isn't behaving as you wish it would.
Double-check your <body> tag has class="yui-skin-sam"
Unlike in the documentation example, you can't just put a call to YAHOO.widget.ButtonGroup pointing to your div anywhere in your HTML and expect it to work. You've got to wait for it to be ready with something like:
<script type="text/javascript">
YAHOO.util.Event.onContentReady("my_button_div", function() {
var oButtonGroup = new YAHOO.widget.ButtonGroup("my_button_div");
});
</script>
You can easily get an image in each button. For example, if your button is defined as:
<span id="my-button-id" class="yui-button yui-radio-button yui-button-checked">
<span class="first-child">
<button type="button" hidefocus="true"></button>
</span>
</span>
Simply add a CSS class something like:
.yui-button#my-button-id button { background:url(http://server/image.jpg) 50% 50% no-repeat; }
Hopefully, this will save someone else a few hours!
posted at: Mon, 02 Mar 2009 23:36 | in /web

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.