technovelty

weblog of Ian Wienand


Separate debug info

I've recently found out a bit more about separating debug info, and thought a consolidated reference might be handy.

The Idea

Almost every distribution now provides separate debug packages that contain only the debug info, saving a lot of space for the 99% of people who never want to start gdb.

This is achieved with objcopy and its --only-keep-debug and --add-gnu-debuglink options, which are well explained in the man page.

What does this do?

This adds a .gnu_debuglink section to the binary containing the name of the debug file to look for.

$ gcc -g -shared -o libtest.so libtest.c
$ objcopy --only-keep-debug libtest.so libtest.debug
$ objcopy --add-gnu-debuglink=libtest.debug libtest.so
$ objdump -s -j .gnu_debuglink libtest.so

libtest.so:     file format elf32-i386

Contents of section .gnu_debuglink:
 0000 6c696274 6573742e 64656275 67000000  libtest.debug...
 0010 52a7fd0a                             R... 

The first part is the name of the file; the second part is a CRC32 checksum of the debug-info file, kept for later reference.
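As a rough sketch of how that section could be decoded, the snippet below splits the bytes from the dump above into the file name and the checksum. The function name parse_debuglink is made up for illustration, and little-endian byte order is assumed because the example is an i386 binary.

```python
import struct

def parse_debuglink(section: bytes):
    """Split a .gnu_debuglink section into (filename, checksum)."""
    # The file name is NUL-terminated ...
    name_end = section.index(b"\0")
    name = section[:name_end].decode("ascii")
    # ... and padded with NULs up to the next 4-byte boundary.
    crc_offset = (name_end + 1 + 3) & ~3
    # The checksum is a 32-bit value; little-endian is assumed here
    # because the dump comes from an elf32-i386 file.
    (crc,) = struct.unpack_from("<I", section, crc_offset)
    return name, crc

# Bytes copied from the objdump output above.
section = bytes.fromhex(
    "6c696274 6573742e 64656275 67000000"
    "52a7fd0a"
)
name, crc = parse_debuglink(section)
print(name, hex(crc))
```

Reading the last four bytes as a little-endian integer is why the checksum looks byte-reversed compared with the hex dump.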

Build ID

Did you know that binaries also get stamped with a unique id when they are built? The ld --build-id flag stamps in a hash near the end of the link.

$ readelf --wide --sections ./libtest.so  | grep build
  [ 1] .note.gnu.build-id NOTE            000000d4 0000d4 000024 00   A  0   0  4
$ objdump -s -j .note.gnu.build-id libtest.so 

libtest.so:     file format elf32-i386

Contents of section .note.gnu.build-id:
 00d4 04000000 14000000 03000000 474e5500  ............GNU.
 00e4 a07ab0e4 7cd54f60 0f5cf66b 5799b05c  .z..|.O`.\.kW..\
 00f4 2d43f456                             -C.V            

In case you're wondering what the format of that is...

uint32 name_size; /* size of the name */
uint32 hash_size; /* size of the hash */
uint32 identifier; /* NT_GNU_BUILD_ID == 0x3 */
char   name[name_size]; /* the name "GNU" */
char   hash[hash_size]; /* the hash */

Although the actual file may change (due to prelink or similar), the hash is not updated and remains constant.
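To make the layout concrete, here is a minimal sketch that walks the note bytes from the libtest.so dump above and pulls out the hash. The helper name parse_build_id is hypothetical, and little-endian byte order is assumed since the example file is elf32-i386.

```python
import struct

NT_GNU_BUILD_ID = 3

def parse_build_id(note: bytes) -> str:
    """Return the build-id hash of a .note.gnu.build-id section as hex."""
    # Three 32-bit words: name size, hash size, note type.
    name_size, hash_size, identifier = struct.unpack_from("<III", note, 0)
    offset = 12
    name = note[offset:offset + name_size]
    # The name is padded to a 4-byte boundary before the hash starts.
    offset += (name_size + 3) & ~3
    if identifier != NT_GNU_BUILD_ID or name != b"GNU\0":
        raise ValueError("not a GNU build-id note")
    return note[offset:offset + hash_size].hex()

# Bytes copied from the objdump output above.
note = bytes.fromhex(
    "04000000 14000000 03000000 474e5500"
    "a07ab0e4 7cd54f60 0f5cf66b 5799b05c"
    "2d43f456"
)
build_id = parse_build_id(note)
print(build_id)
```

The hash size of 0x14 (20 bytes) in the second word matches the 40 hexadecimal digits that come out.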

Finding the debug info files

The last piece of the puzzle is how gdb attempts to find the debug-info files when it is run. The main variable influencing this is debug-file-directory.

(gdb) show debug-file-directory 
The directory where separate debug symbols are searched for is "/usr/lib/debug".

The first thing gdb does, which you can verify via an strace, is search for a file called [debug-file-directory]/.build-id/xx/yyyyyy.debug, where xx is the first two hexadecimal digits of the hash and yyyyyy is the rest of it:

$ objdump -s -j .note.gnu.build-id /bin/ls

/bin/ls:     file format elf32-i386

Contents of section .note.gnu.build-id:
 8048168 04000000 14000000 03000000 474e5500  ............GNU.
 8048178 c6fd8024 2a11673c 7c6a5af6 2c65b1b5  ...$*.g<|jZ.,e..
 8048188 d7e13fd4                             ..?.            

... [running gdb /bin/ls] ...

access("/usr/lib/debug/.build-id/c6/fd80242a11673c7c6a5af62c65b1b5d7e13fd4.debug", F_OK) = -1 ENOENT (No such file or directory)

Next it moves on to the debug-link filename. First it looks for that file in the same directory as the object being debugged. After that it looks for it in a sub-directory called .debug/ in the same directory.

Finally, it prepends the debug-file-directory to the path of the object being inspected and looks for the debug-link file there. This is why the /usr/lib/debug directory looks like the root of a file-system; if you're looking for the debug-info of /usr/lib/libfoo.so, it will be looked for under /usr/lib/debug/usr/lib/.
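The search order described above can be sketched as a small function that lists the candidate paths in the order gdb is said to try them. This is only an illustration of the lookup rules, not gdb's code; debug_candidates and the debug-link name ls.debug are made up for the example.

```python
import posixpath

def debug_candidates(objfile, debuglink=None, build_id=None,
                     debug_dir="/usr/lib/debug"):
    """List candidate debug-info paths for an absolute objfile path."""
    candidates = []
    if build_id:
        # 1. <debug-file-directory>/.build-id/xx/yyyy.debug
        candidates.append(posixpath.join(
            debug_dir, ".build-id", build_id[:2], build_id[2:] + ".debug"))
    if debuglink:
        objdir = posixpath.dirname(objfile)
        # 2. Same directory as the object being debugged.
        candidates.append(posixpath.join(objdir, debuglink))
        # 3. A .debug/ sub-directory next to the object.
        candidates.append(posixpath.join(objdir, ".debug", debuglink))
        # 4. The object's directory, re-rooted under debug-file-directory.
        candidates.append(posixpath.join(
            debug_dir, objdir.lstrip("/"), debuglink))
    return candidates

paths = debug_candidates(
    "/bin/ls",
    debuglink="ls.debug",  # hypothetical debug-link name
    build_id="c6fd80242a11673c7c6a5af62c65b1b5d7e13fd4",
)
for path in paths:
    print(path)
```

The first path printed is the same one that shows up in the access() call in the strace output above.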

Interestingly, the sysroot and solib-search-path don't appear to have anything to do with these lookups. So if you change the sysroot, you also need to change the debug-file-directory to match.

However, most distributions make all this "just work", so hopefully you'll never have to worry about it anyway!

posted at: Fri, 22 Jan 2010 09:11 | in /code

Salton (Sim)City

I was recently driving through the California desert and came across the Salton Sea. Long story short - it rained a lot and the Colorado River overflowed a bunch of dams and dikes meant to contain it and created a huge inland sea. Oops.

Some enterprising souls must have decided that despite the lack of any natural flushing dooming the sea to a salty, polluted existence, there was ripe opportunity to create a sea-side metropolis.

From the ground, it is a bit of a fun ghost town to explore. The typical "everything just abandoned" type thing. But when I came to geotag some photos I took there, I was quite astonished to see this.

[Image: Salton (Sim)City]

That looks exactly like what I used to do in SimCity. I'd use the F-U-N-D-S cheat at the start to max out my money, then build my little empire with neat roads and school and harbours and whatnot — they've even got an airport! Then I'd press "go" and people would slowly move in to the residential areas, one house on one block at a time.

I guess poor old Salton City never made it past "turtle speed"!

posted at: Mon, 11 Jan 2010 16:26 | in /humor

vi backup files considered harmful

Mark this one down as another in the long list of "duh" — once you realise what is going on!

A bug report comes in about a long-running daemon that has stopped logging. lsof reports the log file is now named logfile~ and, furthermore, is deleted! This happens after a system upgrade, so of course I go off digging through a multitude of scripts and what-not to find the culprit...

Have you got it yet?

Try this...

# lsof | grep syslogd | grep messages
syslogd    1376        root   15w      REG        3,1    99851    4605625 /var/log/messages
# cd /var/log/
# vi messages (and save the file)
root@jj:/var/log# lsof | grep syslogd | grep messages
syslogd    1376        root   15w      REG        3,1    99851    4605625 /var/log/messages~ (deleted)

vi is very careful and renames your existing file, so that if anything goes wrong when writing the new version you can get something back. It's a shame the daemon doesn't know about this! The kernel is happy to deal with the rename, but when the backup file is unlinked you're out of luck. Confusingly, on casual inspection your log file looks like it's there ... it's just that nothing is going into it. (Oh, and if you tried that, you might like to restart syslogd now :)
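The same effect can be reproduced without vi or syslogd. The sketch below plays both roles in a temporary directory on a POSIX system: a "daemon" holds a log file open while an "editor" renames it aside, writes a new copy and unlinks the backup, which is the sequence described above. The file names are made up for the demonstration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "messages")

# The "daemon": opens its log once and keeps the descriptor.
daemon = open(log, "a")
daemon.write("before edit\n")
daemon.flush()

# The "editor": move the original aside, write a new copy,
# then delete the backup once the write has succeeded.
os.rename(log, log + "~")
with open(log, "w") as new_copy:
    new_copy.write("edited copy\n")
os.unlink(log + "~")

# The daemon carries on writing to the descriptor it already holds.
daemon.write("after edit\n")
daemon.flush()

fd_stat = os.fstat(daemon.fileno())
path_stat = os.stat(log)
orphaned = fd_stat.st_nlink == 0                # no name points at it any more
same_file = fd_stat.st_ino == path_stat.st_ino  # is the path the daemon's file?
with open(log) as f:
    visible = f.read()
daemon.close()

print("daemon's file orphaned:", orphaned)
print("path is daemon's file:", same_file)
print("contents at path:", repr(visible))
```

The file at the original path exists and looks healthy, but the daemon's writes land in an inode that no longer has a name.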

Moral of the story -- overcome that finger-memory and never use vi on a live file; you're asking for trouble!

posted at: Fri, 08 Jan 2010 16:15 | in /linux/tips

Stripping shared libraries

So, how to strip a shared library?

--strip-unneeded states that it removes all symbols that are not needed for relocation processing. This is a little cryptic, because one might reasonably assume that a shared library can be "relocated", in that it can be loaded anywhere. However, what this really refers to is object files that are usually built and bundled into a .a archive for static linking. For an object in a static library archive to still be useful, global symbols must be kept, although static symbols can be removed. Take the following small example:

$ cat libtest.c
static int static_var = 100;
int global_var = 100;

static int static_function(void) {
       return static_var;
}

int global_function(int i) {
    return static_function() + global_var + i;
}

Before stripping:

$ gcc -c -fPIC -o libtest.o libtest.c
$ readelf --symbols ./libtest.o

Symbol table '.symtab' contains 18 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
...
     5: 00000000     4 OBJECT  LOCAL  DEFAULT    5 static_var
     6: 00000000    22 FUNC    LOCAL  DEFAULT    3 static_function
    13: 00000004     4 OBJECT  GLOBAL DEFAULT    5 global_var
    16: 00000016    36 FUNC    GLOBAL DEFAULT    3 global_function

After stripping:

$ strip --strip-unneeded libtest.o
$ readelf --symbols ./libtest.o 

Symbol table '.symtab' contains 15 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
...
    10: 00000004     4 OBJECT  GLOBAL DEFAULT    5 global_var
    13: 00000016    36 FUNC    GLOBAL DEFAULT    3 global_function

If you --strip-all from this object file, it will remove the entire .symtab section and the object will be useless for further linking, because you'll never be able to find global_function to call it!

Shared libraries are different, however. Shared libraries keep global symbols in a separate ELF section called .dynsym. --strip-all will not touch the dynamic symbol entries, so it is safe to remove all the "standard" symbols from the output file without affecting the usability of the shared library. For example, readelf will still show the .dynsym symbols even after stripping:

$ gcc -shared -fPIC -o libtest.so libtest.c
$ strip --strip-all ./libtest.so 
$ readelf  --syms ./libtest.so 

Symbol table '.dynsym' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
...
     6: 00000452    36 FUNC    GLOBAL DEFAULT   12 global_function
    10: 000015e0     4 OBJECT  GLOBAL DEFAULT   21 global_var

However, --strip-unneeded is smart enough to realise that a shared-object library doesn't need the .symtab section either, and removes it.

So, conclusions? --strip-all is safe on shared libraries, because global symbols remain in a separate section, but not on objects for inclusion in static libraries (relocatable objects). --strip-unneeded is safe for both, and automatically understands shared objects do not need any .symtab entries to function and removes them; effectively doing the same work as --strip-all. So, --strip-unneeded is essentially the only tool you need for standard stripping needs!

posted at: Wed, 23 Dec 2009 16:39 | in /linux

An open letter to Harvey Norman, Norwest, Castle Hill

I usually find blog rants useless, but sometimes something is just so annoying one is sufficiently inspired. Today I went with my parents to buy them a Tivo at Harvey Norman, Norwest, Castle Hill, NSW, Australia. I am a big Tivo fan; the interface is good and it "just works". I don't mind paying for (or in this case recommending paying for) good products.

After selecting the Tivo model, I asked for an HDMI cable. The salesman asked a series of questions about what sort of HD TV it was being plugged into; I quickly sensed this as a probe to see what sort of suckers we were, and requested just a "normal" cable.

At this point, he insisted on a $130 (you guessed it) Monster cable, and had the audacity to say that we didn't need one of the really expensive cables because our TV wasn't good enough! I openly expressed my concern, but the annoying high-pressure sales pitch had just begun. The amount of, frankly, crap that he spewed about 4-bit this, 10-bit that, legislating of labels, DA signal levels, mythical customers who regretted buying the cheap cables and who knows what else was to the point of being comical if it weren't so insistent and said with such seeming authority.

There is only one thing that matters - if the cable has passed the functional requirements for being certified to have the distinctive HDMI logo plastered on it. From the HDMI FAQ:

Q. What testing is required?

Prior to mass producing or distributing any Licensed Product or component that claims compliance with the HDMI Specification (or allowing someone else to do such activities), each Adopter must test a representative sample for HDMI compliance. First, the Adopter must self test as specified in the then-current HDMI Compliance Test Specification. The HDMI Compliance Test Specification provides a suite of testing procedures, and establishes certain minimum requirements specifying how each HDMI Adopter should test Licensed Products for conformance to the HDMI Specification.

Now, I can understand that if you buy any old HDMI cable off Ebay for $1, it may be a knock-off that uses the HDMI logo illegally. But there is no way that the certified $50 Philips cable (still very over-priced, but at least not insane, and discounted to $35) performs any differently to some overpriced Monster model certified to exactly the same standard.

The thing that annoyed me most was his analogy to buying a tyre. He stated that if I walked up to a tyre salesman and said "I don't want the Pirellis, just put the cheap-o tyres on my Ferrari" I'd be insane, and thus by extension of that logic I was insane for not buying a Monster cable for my great new Tivo.

This analogy is completely flawed and really just dishonest. A Ferrari is much more powerful and goes much faster than a standard car. It is plausible it needs a better engineered tyre to perform adequately given the additional stresses it undergoes. A Tivo doesn't put out any more or any less bits than any other HDMI certified equipment, no matter what you do. If the cable is certified as getting all the bits to the other end under whatever environmental conditions specified by the HDMI people, then it's going to work for the 99% of people with normal requirements.

Nobody wants to make a significant investment in a piece of audio-visual equipment and feel they are getting something that isn't optimal. Harvey Norman's use of this understandable consumer sentiment to sell ridiculously over-priced cables that do nothing is extremely disappointing.

I'm sure the commissions on these things encourage this behaviour, so it is useless expecting the retailer or individual sales assistant to change their policy to recommend reasonably priced cables. However, it is really Tivo and other manufacturers who get the raw end of this deal; a $130 cable is over 20% of the price of the actual Tivo! That is surely affecting people's purchasing decisions.

If Tivo and others included a certified HDMI cable with their device, as they do with component cables, and had "Certified HDMI 1.3 cable included" plastered on the box, it would be a harder sell to explain why the manufacturer would bother shipping a certified cable that is supposedly insufficient, and consumers would hopefully avoid the very distasteful high-pressure theatrics I was subjected to today.

Update: I have removed my description of the individual salesman in the title. Singling someone out invites ad hominem attacks and I have no interest in providing a forum for or perpetuating any such thing.

If it's one salesman, it's a thousand. To reiterate my main point, manufacturers must surely be annoyed that they participate in price wars with each other only to have their margins taken by a gold-plated optical cable company. I believe it is really up to them to get the information into their own market so it can operate efficiently.

posted at: Tue, 22 Dec 2009 20:23 | in /general

Distance to 1 million people

On a recent trip up the Oregon coast, a friendly doorman at our hotel in Portland was inquiring about our trip. When we mentioned we passed through Bandon, OR, he quipped that Bandon was the place furthest from a city of one million people in the USA. I guess a normal person would just think "oh, that's interesting" and move on, but it has been plaguing me ever since.

Firstly, I had to find what cities in the USA had more than 1 million people. Luckily Wolfram Alpha gives the answer:

(I certainly wouldn't have guessed that list!) From there my plan was to find the bounding box of the continental USA; luckily Wikipedia has the raw data for that. Combined with the latitude and longitude of the cities above, I had the raw data.

I couldn't figure out any way better than a simple brute-force of testing every degree and minute of latitude and longitude within the bounding box and calculating the distance to the closest large city; the theory being that from one particular point you would have to travel further than any other to reach a city of 1 million people. Luckily, that is short work for a modern processor, and hopefully the result would be a point somewhere around Bandon. I'd already become acquainted with the great circle and measuring distances when I did Tinymap, so a quick python program evolved.
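The original program isn't reproduced here, but the brute-force idea can be sketched roughly as follows. The function names are invented for this sketch, the haversine formula with a 6371 km mean Earth radius stands in for whatever great-circle calculation the real program used, and the coordinates for Bandon and LA are approximate values assumed for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, an assumption of this sketch

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def most_remote_point(cities, south, north, west, east, step_minutes=1):
    """Return (km, lat, lon) of the grid point farthest from its nearest city."""
    best = (-1.0, None, None)
    # Walk the bounding box in whole minutes of arc to avoid float drift.
    for lat_m in range(round(south * 60), round(north * 60) + 1, step_minutes):
        lat = lat_m / 60
        for lon_m in range(round(west * 60), round(east * 60) + 1, step_minutes):
            lon = lon_m / 60
            nearest = min(great_circle_km(lat, lon, clat, clon)
                          for clat, clon in cities)
            if nearest > best[0]:
                best = (nearest, lat, lon)
    return best

# Approximate coordinates, assumed for illustration only.
BANDON = (43.12, -124.41)
LOS_ANGELES = (34.05, -118.24)

bandon_to_la = great_circle_km(*BANDON, *LOS_ANGELES)
print("Bandon to LA: %.0f km" % bandon_to_la)
```

Calling most_remote_point with the full city list and the bounding box of the continental USA is all the search amounts to; a coarser step_minutes makes a quick first pass cheap.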

However, it turns out that the program picks the far south-east corner of the bounding box. Thanks to the shape of the USA, that is way out over the ocean somewhere. I can't figure out a way to get an outline of the USA to test if a given point is inside the border or not, but all is not lost.

I modified the program to output the distance to the closest large city along with the location to a log file, and then imported it into gnuplot to make a heat-map. The hardest part was finding an equirectangular outline of the USA to place the heat-map over, rather than a much more common Mercator projection; Wikimedia to the rescue!

I actually surprised myself at how well the two lined up when, after a little work with Gimp, I overlaid them.

[Figure: Distance to a city of 1 million people (km)]

From this, I can see that Bandon, about a third of the way up the Oregon coast, is a pretty good candidate. However, it is probably not the best; I get the feeling the real point that is the furthest from any city of 1 million people is actually somewhere in the middle of Montana.

However, we can also fiddle the program slightly to disprove the point about Bandon. The numbers show the closest large city to Bandon is LA, at ~1141km. Taking another point we suspect to be more remote, the closest large city to Portland (where we met the doorman) is also LA, at ~1329km. So to reach the closest large city you have to travel further from Portland than from Bandon, which means Bandon is not the furthest place in the USA from a city of one million people. Myth busted!

posted at: Sun, 06 Dec 2009 11:28 | in /general

Go L4!

By now everybody has heard about Go, Google's expressive, concurrent, garbage collecting language. One big, glaring thing stuck out at me when I was reading the documentation:

Do not communicate by sharing memory; instead, share memory by communicating.

One of the examples given is a semaphore using a channel, which I'll copy here for posterity.

var sem = make(chan int, MaxOutstanding)

func handle(r *Request) {
    sem <- 1;    // Wait for active queue to drain.
    process(r);  // May take a long time.
    <-sem;       // Done; enable next request to run.
}

func Serve(queue chan *Request) {
    for {
        req := <-queue;
        go handle(req);  // Don't wait for handle to finish.
    }
}

Here is a little illustration of that in operation.

[Figure: Semaphores with Google Go]

Serve creates goroutines via the go keyword; each of which tries to get a slot in the channel. In the example there are only 3 slots, so it acts like a semaphore of count 3. When done, each thread returns its slot to the channel, which allows anyone blocked to be woken and continued.
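For comparison, the same pattern can be approximated outside Go. The sketch below uses a bounded queue.Queue in Python as a stand-in for the buffered channel; it is an analogy, not a translation of Go's runtime behaviour. The events queue and the sleep standing in for process() are inventions of this example, used only to measure how many handlers ran at once.

```python
import queue
import threading
import time

MAX_OUTSTANDING = 3
sem = queue.Queue(maxsize=MAX_OUTSTANDING)  # plays the role of the channel
events = queue.Queue()                      # +1 on entry, -1 on exit

def handle(request):
    sem.put(1)        # blocks while all slots are taken
    events.put(+1)
    time.sleep(0.01)  # stand-in for process(r)
    events.put(-1)
    sem.get()         # done; free a slot for the next request

workers = [threading.Thread(target=handle, args=(n,)) for n in range(10)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Replay the event log to find the largest number of concurrent handlers.
active = peak = 0
while not events.empty():
    active += events.get()
    peak = max(peak, active)
print("peak concurrency:", peak)
```

Because a handler logs its exit before giving its slot back, the replayed count can never exceed the number of slots.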

This instantly reminded me of the very first thing you need to do if you ever want to pass Advanced Operating Systems -- write a semaphore server to provide synchronisation within your OS.

In L4, threads communicate with each other via inter-process communication (IPC). IPC messages have a fixed format - you specify a target thread, bundle some data into the available slots in the IPC format and fire it off. By default you block waiting for a reply -- this all happens within a single call for efficiency. On the other side, you can write servers that listen for remote IPC connections, where everything happens in reverse.

Here's another illustration of the trivial semaphore server concept Shehjar and I implemented.

[Figure: L4 semaphore server example]

Look familiar? Instead of a blocking push of a number into a slot into a channel, you make a blocking IPC call to a remote server.

My point here is that both take the approach of sharing memory via communication. When using IPC, you bundle up all your information into the available slots in the IPC message and send it. When using a channel, you bundle your information into an entry in the channel and call your goroutine. Receiving the IPC is the same as draining a channel - both result in you getting the information that was bundled into it by the caller.

IPC                                          | Go
Start thread                                 | Start goroutine
New thread blocks listening for IPC message  | Goroutine blocks draining empty channel
Bundle information into IPC message          | Bundle data into type of your channel
Send IPC to new thread                       | Push data into channel
Remote thread unbundles IPC                  | Goroutine drains channel and gets data
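The semaphore server idea can be sketched in the same message-passing style. This is not the L4 code; it is a loose Python analogy in which a queue stands in for the server's IPC endpoint and a per-call reply queue stands in for the blocking IPC reply. All names (semaphore_server, acquire, release, the message tuples) are made up for the sketch.

```python
import queue
import threading
import time
from collections import deque

def semaphore_server(inbox, count):
    """Own the semaphore count; the only way to touch it is a message."""
    waiting = deque()
    while True:
        op, reply = inbox.get()
        if op == "stop":
            return
        if op == "acquire":
            if count > 0:
                count -= 1
                reply.put("ok")        # reply immediately
            else:
                waiting.append(reply)  # withhold the reply: caller stays blocked
        elif op == "release":
            if waiting:
                waiting.popleft().put("ok")  # hand the slot straight on
            else:
                count += 1

def acquire(inbox):
    reply = queue.Queue(maxsize=1)
    inbox.put(("acquire", reply))
    reply.get()  # blocking "IPC call": wait for the server's answer

def release(inbox):
    inbox.put(("release", None))

inbox = queue.Queue()
events = queue.Queue()  # +1 on entry, -1 on exit, for measurement only
server = threading.Thread(target=semaphore_server, args=(inbox, 2))
server.start()

def client():
    acquire(inbox)
    events.put(+1)
    time.sleep(0.01)  # stand-in for the critical section
    events.put(-1)
    release(inbox)

clients = [threading.Thread(target=client) for _ in range(6)]
for c in clients:
    c.start()
for c in clients:
    c.join()
inbox.put(("stop", None))
server.join()

active = peak = 0
while not events.empty():
    active += events.get()
    peak = max(peak, active)
print("peak holders:", peak)
```

No client ever reads or writes the count; it lives entirely inside the server thread, and blocking falls out of the server simply not replying yet.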

Whenever you mention the word "microkernel", people go off the deep-end, and one thing they seem to forget about is the inherent advantage of sharing state only via communication. As soon as you do that, you've broken open an amazing new tool for concurrency, which is now implicit. By communicating via messages/channels rather than shared global state, it doesn't matter where you run! One of those threads in the example could be running on another computer in your cloud, marshalling up its IPC messages/channel entries and sending them over TCP/IP -- nobody would care!

At any rate, "do not communicate by sharing memory; instead, share memory by communicating" is certainly an idea whose time has come.

posted at: Fri, 20 Nov 2009 11:37 | in /code

Django toolchain on Debian

Although Django is well packaged for Debian, I've recently come to the conclusion that the packages are really not what I want. The problem is that my server runs Debian stable, while my development laptop runs unstable, and Django revisions definitely fall into the "unstable" category. There really is no way to use a system Django 1.1 on one side, and a system Django 1.0 on the other.

After a bit of work, I think I've got something together that works, and I post it here in the hope it is useful for someone else. This info has been gleaned from similar references such as this and this.

This is aimed at running a server using Debian stable (5.0) for production and an unstable environment for development. You actually need both to get this running. This is based on a project called "project" that lives in /var/www

  1. First step is to install python-virtualenv on both.
  2. Create a virtualenv on both, using the --no-site-packages to make it a stand-alone environment. This is like a chroot for python.
    $ virtualenv --no-site-packages project
    New python executable in project/bin/python
    Installing setuptools............done.
    
  3. The unstable environment has a file you'll need to copy into the stable environment - bin/activate_this.py. The stable version of python-virtualenv isn't recent enough to include this file, but you need it to essentially switch the system python into the chrooted environment. This will come in handy later when setting up the webserver.
  4. There are probably better ways to keep the two environments in sync, but I simply take a manual approach of doing everything twice, once in each. So from now on, do the following in both environments.
  5. Activate the environment
    /var/www$ cd project
    /var/www/project$ . bin/activate
    (project) /var/www/project$
    
  6. Use easy_install to install pip
    (project) /var/www/project$ easy_install pip
    Searching for pip
    Reading http://pypi.python.org/simple/pip/
    Reading http://pip.openplans.org
    Best match: pip 0.4
    Downloading http://pypi.python.org/packages/source/p/pip/pip-0.4.tar.gz#md5=b45714d04f8fd38fe8e3d4c7600b91a2
    Processing pip-0.4.tar.gz
    Running pip-0.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Wu9O-U/pip-0.4/egg-dist-tmp-xjSdxq
    warning: no previously-included files matching '*.txt' found under directory 'docs/_build'
    no previously-included directories found matching 'docs/_build/_sources'
    zip_safe flag not set; analyzing archive contents...
    pip: module references __file__
    Adding pip 0.4 to easy-install.pth file
    Installing pip script to /var/www/project/bin
    
    Installed /var/www/project/lib/python2.5/site-packages/pip-0.4-py2.5.egg
    Processing dependencies for pip
    Finished processing dependencies for pip
    
  7. Install setuptools, also using easy_install (for some reason, pip can't install it). There is a trick here, you need to specify at least version 0.6c9 or there will be issues with the SVN version on Debian stable when you try to get Django in the next step.
    (project) /var/www/project$ easy_install setuptools==0.6c9
    Searching for setuptools==0.6c9
    Reading http://pypi.python.org/simple/setuptools/
    Best match: setuptools 0.6c9
    Downloading http://pypi.python.org/packages/2.5/s/setuptools/setuptools-0.6c9-py2.5.egg#md5=fe67c3e5a17b12c0e7c541b7ea43a8e6
    Processing setuptools-0.6c9-py2.5.egg
    Moving setuptools-0.6c9-py2.5.egg to /var/www/project/lib/python2.5/site-packages
    Removing setuptools 0.6c8 from easy-install.pth file
    Adding setuptools 0.6c9 to easy-install.pth file
    Installing easy_install script to /var/www/project/bin
    Installing easy_install-2.5 script to /var/www/project/bin
    
    Installed /var/www/project/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg
    Processing dependencies for setuptools==0.6c9
    Finished processing dependencies for setuptools==0.6c9
    
  8. Create a requirements.txt with the path to the Django SVN for pip to install, and then install it.
    (project) /var/www/project$ cat requirements.txt
    -e svn+http://code.djangoproject.com/svn/django/tags/releases/1.0.3/#egg=Django
    (project) /var/www/project$ pip install -r requirements.txt
    Obtaining Django from svn+http://code.djangoproject.com/svn/django/tags/releases/1.0.3/#egg=Django (from -r requirements.txt (line 1))
      Checking out http://code.djangoproject.com/svn/django/tags/releases/1.0.3/ to ./src/django
    ... so on ...
    
  9. Almost there! You can keep installing more Python requirements with pip if you need, but we've got enough here to start.
  10. Create a file in /var/www/project called project-python.py. This will be the Python interpreter the webserver uses, and basically exec's itself into the virtualenv. The file should contain the following:
    activate_this = "/var/www/project/bin/activate_this.py"
    execfile(activate_this, dict(__file__=activate_this))
    
    from django.core.handlers.modpython import handler
    
  11. Now it's time to start the Django project. I like to create a new directory called project, which will be the parent directory kept in the SCM with all the code, media, templates, database (if using SQLite) etc. In this way to keep the two environments up-to-date I simply svn ci on one side, and svn co on the other.
    (project) /var/www/project$ mkdir project
    (project) /var/www/project$ cd project
    (project) /var/www/project/project$ mkdir db django media www
    (project) /var/www/project/project$ cd django/
    (project) /var/www/project/project/django$ django-admin startproject myproject
    
  12. Last step now is to wire-up Apache to serve it all up. The magic is making sure you specify the correct PythonHandler that you made before to use the virtualenv, and include the right paths so you can find it and all the required Django settings.
    DocumentRoot /var/www/project
    
    <Location "/">
        SetHandler python-program
        PythonHandler project-python
        PythonPath "['/var/www/project/','/var/www/project/project/django/'] + sys.path"
        SetEnv DJANGO_SETTINGS_MODULE myproject.settings
        PythonDebug On
    </Location>
    
    Alias /media /var/www/project/project/media
    <Location "/media">
        SetHandler none
    </Location>
    <Directory "/var/www/project/project/media">
        AllowOverride none
        Order allow,deny
        Allow from all
        Options FollowSymLinks Indexes
    </Directory>
    

With all this, you should be up and running in a basic but stable environment. It's easy enough to update packages for security fixes, etc via pip after activating your virtualenv.

posted at: Fri, 11 Sep 2009 22:49 | in /linux/debian

SIGTTOU and switching to canonical mode

Here's an interesting behaviour that, as far as I can tell, is completely undocumented, slightly confusing but fairly logical. Your program should receive a SIGTTOU when it is running in the background and attempts to output to the terminal -- the idea being that you shouldn't scramble the output by mixing it in while the shell is trying to operate. Here's what the bash manual has to say:

Background processes are those whose process group ID differs from the
terminal's; such processes are immune to keyboard-generated signals.
Only foreground processes are allowed to read from or write to the
terminal.  Background processes which attempt to read from (write to)
the terminal are sent a SIGTTIN (SIGTTOU) signal by the terminal
driver, which, unless caught, suspends the process.

So, consider the following short program, which writes some output and catches any SIGTTOU's, with an optional flag to switch between canonical and non-canonical mode.

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <termios.h>
#include <unistd.h>

static void sig_ttou(int signo) {
   printf("caught SIGTTOU\n");
   signal(SIGTTOU, SIG_DFL);
   kill(getpid(), SIGTTOU);
}

int main(int argc, char *argv[]) {

   signal(SIGTTOU, sig_ttou);

   if (argc != 1) {
      struct termios tty;

      printf("setting non-canoncial mode\n");
      tcgetattr(fileno(stdout), &tty);
      tty.c_lflag &= ~(ICANON);
      tcsetattr(fileno(stdout), TCSANOW, &tty);
   }

   int i = 0;
   while (1) {
      printf("  *** %d ***\n", i++);
      sleep(1);
   }
}

This program ends up operating in an interesting manner.

  1. Run in the background, canonical mode : no SIGTTOU and output gets multiplexed with shell.
    $ ./sigttou &
      *** 0 ***
    [1] 26171
    $   *** 1 ***
      *** 2 ***
      *** 3 ***
    
  2. Run in the background, non-canonical mode : SIGTTOU delivered
    $ ./sigttou 1 &
    [1] 26494
    ianw@jj:/tmp$ setting non-canoncial mode
    caught SIGTTOU
    
    
    [1]+  Stopped                 ./sigttou 1
    
  3. Run in the background, canonical mode, tostop set via stty : SIGTTOU delivered, seemingly after a write proceeds
    $ stty tostop
    $ ./sigttou &
    [2] 26531
    ianw@jj:/tmp$   *** 0 ***
    caught SIGTTOU
    
    
    [2]+  Stopped                 ./sigttou
    

You can see a practical example of this by comparing the difference between cat file & and more file &. The semantics make some sense -- anything switching off canonical mode is likely to really scramble your terminal, so it's good to stop it and let its terminal handling functions run. I'm not sure why canonical background output is considered useful mixed in with your prompt, but someone, somewhere must have decided it was so.

Update: upon further investigation, it is the switching of terminal modes that invokes the SIGTTOU. To follow the logic through more, see the various users of tty_check_change in the tty driver.

posted at: Fri, 21 Aug 2009 11:02 | in /linux/tips

Using frozen chocolate to visualise microwave heat distribution

My attempt at answering that most important of questions : where should one place their plate in the microwave to achieve maximal heating?

posted at: Thu, 23 Jul 2009 12:23 | in /humor

Review : The Race for a New Game Machine

I recently finished The Race for a New Game Machine: Creating the Chips Inside the XBox 360 and the Playstation 3 (David Shippy and Mickie Phipps); an interesting insight into the processor development process from some of the lead architects.

The executive summary is : Sony, Toshiba and IBM (STI) decided to get together to create the core of the Playstation 3 — the Cell processor. Sony, with their graphics and gaming experience, would do the Synergistic Processing Elements; extremely fast but limited sub-units specialising in doing 3D graphics and physics work (i.e. great for games). IBM would do a Power based core that handled the general purpose computing requirements.

The twist comes when Microsoft came along to IBM looking for the Xbox 360 processor, and someone at IBM mentioned the Power core that was being worked on for the Playstation. Unsurprisingly, the features being built for the Playstation also interested Microsoft, and the next thing you know, IBM is working on the same core for Microsoft and Sony at the same time, without telling either side.

This whole chain of events makes for a very interesting story. The book is written for a general audience, but you'll probably get the most out of it if you already have some knowledge of computer architecture; if you're trying to understand some of the concepts referred to from the two-line descriptions alone, you'll get a bit lost (H&P it is not).

The only small criticism is that it sometimes falls into reading a bit like a long LinkedIn recommendation. However, the book is very well paced, and throws in just enough technical tidbits amongst the corporate and personal dramas to make it a very fun read.

One thing that is talked about a bit is the fan-out of four (FO4) metric used in the designers' quest to push the chip as fast as possible (and, as mentioned many times in the book, faster than what Intel could do!). I thought it might be useful to expand on this interesting metric a bit.

FO4

One problem facing chip architects is that, thanks to Moore's Law, it is hard to find a constant to compare design versus implementation. For example, you may design an amazing logic-block to factor large integers into products of prime numbers, but somebody else with better fabrication facilities might be able to essentially brute-force a better solution by producing faster hardware using a much less innovative design.

Some metric is needed that can compare the two designs discounting who has the better fabrication process. This is where the FO4 comes in.

When you change the input to a logic gate, it is not like it magically flips the output to the correct level instantaneously. There is a latency while everything settles to its correct level: the gate delay. The more gates connected to the output of a gate, the more current required, which has additional effects on latency. The FO4 latency is defined as the time required to flip an inverter gate connected (fanned out) to four other inverter gates.

Fan-out of four

Thus you can describe the latency of other logic blocks in multiples of FO4 latencies. As this avoids measuring against wall-time it is an effective description of the relative efficiency of logic designs. For example, you may calculate that your factoriser has a latency of 100 FO4. Just because someone else's 200 FO4 factoriser gets a result a few microseconds faster thanks to their fancy ultra-low-FO4-latency fabrication process, you can still show that your design, at least a priori, is better.

The book refers several times to efforts to reduce the FO4 of the processor as much as possible. The reason this is important in this context is that the maximum latency on the critical path will determine the fastest clock speed you can run the processor at. For reasons explained in the book high clock speed was a primary goal, so every effort had to be made to reduce latencies.

All modern processors operate as a production line, with each stage doing some work and passing it on to the next stage. Clearly the slowest stage determines the maximum speed that the production line can run at (weakest link in the chain and all that). For example, if you clock at 1GHz, that means each cycle takes 1 nanosecond (1s / 1,000,000,000Hz). If you have a FO4 latency of, say, 10 picoseconds, that means any given stage can have a latency of no more than 100 FO4; otherwise that stage would not have enough time to settle and actually produce the correct result.
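The arithmetic in that example can be sketched in a few lines of Python. The function name is hypothetical, and the 1GHz clock and 10 picosecond FO4 latency are just the illustrative figures from the paragraph above; everything is kept in integer picoseconds to avoid floating point rounding.

```python
PS_PER_SECOND = 10**12


def max_fo4_per_stage(clock_hz, fo4_ps):
    """Return how many FO4 delays fit into one clock cycle."""
    period_ps = PS_PER_SECOND // clock_hz  # cycle time in picoseconds
    return period_ps // fo4_ps


# 1GHz clock, 10ps FO4 latency: a 1000ps cycle holds 100 FO4 delays.
print(max_fo4_per_stage(1_000_000_000, 10))  # 100
```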

Thus the smaller you can get the FO4 latencies of your various stages, the higher you can safely up the clock speed. One way around long latencies might be to split up your logic into smaller stages, making a much longer pipeline (production line). For example, split your 100 FO4 block into two 50 FO4 stages. You can now clock the processor higher, but this doesn't necessarily mean you'll get actual results out the end of the pipeline any faster (as Intel discovered with the Pentium 4 and its notoriously long pipelines and correspondingly high clock rates).
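To make the trade-off concrete, here is a small sketch that compares a single 100 FO4 stage against two 50 FO4 stages. It assumes a 10 picosecond FO4 latency and ignores the overhead of the latches between stages, so it is an idealised model rather than anything from the book; pipeline_timing is a made-up name.

```python
PS_PER_SECOND = 10**12
FO4_PS = 10  # assumed FO4 latency in picoseconds


def pipeline_timing(stage_fo4, fo4_ps=FO4_PS):
    """Return (max clock in Hz, end-to-end latency in ps) for a pipeline.

    stage_fo4 is a list with the FO4 depth of each stage.
    """
    # The slowest stage sets the clock period for every stage.
    period_ps = max(stage_fo4) * fo4_ps
    clock_hz = PS_PER_SECOND // period_ps
    # One result takes one full cycle per stage to pass through.
    latency_ps = len(stage_fo4) * period_ps
    return clock_hz, latency_ps


print(pipeline_timing([100]))     # (1000000000, 1000)
print(pipeline_timing([50, 50]))  # (2000000000, 1000)
```

The clock doubles, but a single result still takes 1000 picoseconds to emerge; the gain is in throughput, and only while the pipeline stays full.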

Of course, this doesn't even begin to describe the issues with superscalar design, instruction level parallelism, cache interaction and the myriad of other things the architects have to consider.

Anyway, after reading this book I guarantee you'll have an interesting new insight the next time you fire up Guitar Hero.

posted at: Wed, 15 Jul 2009 19:15 | in /code/arch

Dig Jazz Applet, V2

It seems the ABC updated the DIG Jazz now-playing list format, breaking V1. Some quick Flash disassembly and a bit of hacking, and order is restored. As a bonus, it now shows the upcoming songs.

DIG Jazz now-playing Gnome applet

Source or Debian package.

posted at: Mon, 18 May 2009 23:20 | in /code/gnome

Quickly describing hash utilisation

I think the most correct way to describe utilisation of a hash-table is using chi-squared distributions and hypothesis and degrees of freedom and a bunch of other things nobody but an actuary remembers. So I was looking for a quick method that was close-enough but didn't require digging out a statistics text-book.

I'm sure I've re-invented some well-known measurement, but I'm not sure what it is. The idea is to add up the total steps required to look up all elements in the hash-table, and compare that to the theoretical ideal of a uniformly balanced hash-table. You can then get a ratio that tells you if you're in the ball-park, or if you should try something else. A diagram should suffice.

Scheme for acquiring a hash-utilisation ratio

This seems to give quite useful results with a bare minimum of effort, and most importantly no tricky floating point math. For example, on the standard Unix words with a 2048 entry hash-table, the standard DJB hash came out very well (as expected):

Ideal 2408448
Actual 2473833
----
Ratio 0.973569

To contrast, a simple "add each character" type hash:

Ideal 2408448
Actual 6367489
----
Ratio 0.378241

Example code is hash-ratio.py. I expect this measurement is most useful when you have a largely static bunch of data for which you are attempting to choose an appropriate hash-function. I guess if you are really trying to hash more or less random incoming data and hence only have a random sample to work with, you can't avoid doing the "real" statistics.
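For readers without hash-ratio.py to hand, the scheme can be sketched roughly as follows. This is an independent reconstruction, not the original script: it assumes chained buckets where finding the n-th entry of a chain costs n steps, and it assumes the ideal is computed with the per-bucket count rounded down. That last assumption is at least consistent with the figures above, since 2048 buckets of 48 entries each cost 2048 x (48 x 49 / 2) = 2408448 steps. All function names are made up for illustration.

```python
def djb_hash(key):
    """The well-known DJB string hash (h * 33 + c), kept to 32 bits."""
    h = 5381
    for byte in key.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def additive_hash(key):
    """A deliberately poor 'add each character' hash."""
    return sum(key.encode())


def lookup_steps(chain_len):
    """Steps to look up every entry in one chain: 1 + 2 + ... + n."""
    return chain_len * (chain_len + 1) // 2


def utilisation_ratio(keys, hash_fn, n_buckets):
    """Return (ideal, actual, ideal / actual) for the given keys.

    Assumes there are at least as many keys as buckets; otherwise the
    rounded-down ideal is zero and the ratio is meaningless.
    """
    chains = [0] * n_buckets
    for key in keys:
        chains[hash_fn(key) % n_buckets] += 1
    actual = sum(lookup_steps(n) for n in chains)
    ideal = n_buckets * lookup_steps(len(keys) // n_buckets)
    return ideal, actual, ideal / actual


if __name__ == "__main__":
    # Synthetic keys stand in for a real word list here.
    words = ["word%d" % i for i in range(10000)]
    for fn in (djb_hash, additive_hash):
        ideal, actual, ratio = utilisation_ratio(words, fn, 64)
        print(fn.__name__, "Ideal", ideal, "Actual", actual, "Ratio", ratio)
```

A ratio near 1 means the chains are close to evenly filled; the further it falls towards 0, the more lookups are being wasted walking long chains.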

posted at: Thu, 07 May 2009 16:37 | in /code

Relocation truncated to fit - WTF?

If you code for long enough on x86-64, you'll eventually hit an error such as:

(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `array' defined in foo section in ./pcrel8.o

Here's a little example that might help you figure out what you've done wrong.

Consider the following code:

$ cat foo.s
.globl foovar
  .section   foo, "aw",@progbits
  .type foovar, @object
  .size foovar, 4
foovar:
   .long 0

.text
.globl _start
 .type function, @function
_start:
  movq $foovar, %rax

In case it's not clear, that would look something like:

int foovar = 0;

void function(void) {
  int *bar = &foovar;
}

Let's build that code, and see what it looks like:

$ gcc -c foo.s

$ objdump --disassemble-all ./foo.o

./foo.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_start>:
   0:		 48 c7 c0 00 00 00 00	mov    $0x0,%rax

Disassembly of section foo:

0000000000000000 <foovar>:
   0:		 00 00			add    %al,(%rax)
   ...

We can see that the mov instruction has only allocated 4 bytes (00 00 00 00) for the linker to put in the address of foovar. If we check the relocations:

$ readelf --relocs ./foo.o

Relocation section '.rela.text' at offset 0x3a0 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000003  00050000000b R_X86_64_32S      0000000000000000 foovar + 0

The R_X86_64_32S relocation is indeed only a 32-bit relocation. Now we can tickle this error. Consider the following linker script, which puts the foo section about 5 gigabytes away from the code.

$ cat test.lds
SECTIONS
{
 . = 10000;
 .text : { *(.text) }
 . = 5368709120;
 .data : { *(foo) }
}

This now means that we can not fit the address of foovar inside the space allocated by the relocation. When we try it:

$ ld -Ttest.lds ./foo.o
./foo.o: In function `_start':
(.text+0x3): relocation truncated to fit: R_X86_64_32S against symbol `foovar' defined in foo section in ./foo.o

What this means is that the full 64-bit address of foovar, which now lives somewhere above 5 gigabytes, can't be represented within the 32-bit space allocated for it.
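The check the linker is effectively making can be sketched in Python. R_X86_64_32S is a sign-extended 32-bit relocation, so the value has to fit in a signed 32-bit integer; the function name below is hypothetical, and the two addresses are simply the location counters from the linker script above.

```python
def fits_32s(value):
    """True if value can be stored in a sign-extended 32-bit field."""
    return -(2**31) <= value <= 2**31 - 1


text_addr = 10000        # ". = 10000" in the linker script
foo_addr = 5368709120    # ". = 5368709120", i.e. 5 * 2**30

print(fits_32s(text_addr))  # True
print(fits_32s(foo_addr))   # False
```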

For code optimisation purposes, the default immediate size to the mov instructions is a 32-bit value. This makes sense because, for the most part, programs can happily live within a 32-bit address space, and people don't do things like keep their data so far away from their code it requires more than a 32-bit address to represent it. Defaulting to using 32-bit immediates therefore cuts the code size considerably, because you don't have to make room for a possible 64-bit immediate for every mov.

So, if you really want to move a full 64-bit immediate into a register, you want the movabs instruction. Try it out with the code above - with movabs you should get an R_X86_64_64 relocation and 64 bits' worth of room to patch up the address, too.

If you're seeing this and you're not hand-coding, you probably want to check out the -mcmodel argument to gcc.

posted at: Thu, 12 Mar 2009 23:20 | in /code/c

YUI ButtonGroup Notes

Some tips and things to check if your YUI ButtonGroup isn't behaving as you wish it would.

Hopefully, this will save someone else a few hours!

posted at: Mon, 02 Mar 2009 23:36 | in /web

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.