Converting DICOM images, part 2

In a previous post I discussed converting medical images. I tried again today and hit another issue that could do with some Google help.

If you see the following:

$ dcm2pnm --all-frames input.dcm
E: can't change to unencapsulated representation for pixel data
E: can't determine 'PhotometricInterpretation' of decompressed image
E: can't decompress frame 0: Illegal call, perhaps wrong parameters
E: can't convert input pixel data, probably unsupported compression
F: Invalid DICOM image

and dcmdump reports somewhere:

# Dicom-Data-Set
# Used TransferSyntax: JPEG Baseline

Try the ``dcmj2pnm`` program rather than dcm2pnm. The man page does mention this, but that assumes you already know you have a JPEG-encoded image :) It takes the same options, can extract multiple frames, and the output can be converted to a movie as discussed in the previous post. Easy when you know how!
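
For example, the equivalent of the failing command above is simply the following (a sketch: as I understand it, the optional final argument names the output, with one file written per frame when extracting multiple frames):

$ dcmj2pnm --all-frames input.dcm output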

Pyblosxom Recaptcha Plugin

I think maybe the world has moved on from Pyblosxom and, as my logs can attest, spammers seem to have moved on past Recaptcha too (even if they have to solve it manually).

However, maybe there are some other holdouts like myself who will find this plugin useful. In reality, it will do nothing to stop spam — for that I am currently having good luck with Akismet. However, I like that displaying the captcha presumably costs the spammers something to crack before they can even attempt to submit a comment.

Comcast self-setup with Netgear routers

I just got a Zoom 5341 modem to replace a very sketchy old Motorola model and so far it seems to work fine.

However, the Comcast self-install switch-over process was not seamless. After you plug your new modem in, the process starts fine, capturing your session and prompting you for your account details. However, at the next step it prompts you to download the Windows- or Mac-only client, directing you to an address http://cdn/....

It is at this point you can get stuck if you're behind a Netgear router (I have a WNDR3700), and probably others. The simple problem is that, for whatever reason, the factory firmware in this router does not pass on the domain search suffix it receives to clients of its inbuilt DHCP server. Without that, cdn doesn't resolve to anything, so you can't download the self-help tool and you're effectively stuck at a "can not find this page" screen. If you google at this point you can find many people who have hit this problem, with responses ranging from erroneous suggestions that you've been hacked (?) to most people just giving up and moving into Comcast phone-support hell.

Assuming you now don't have internet access, you can complete the process with a quick swizzle. Somebody might be able to correct me on this, but I don't think you want to run the self-install tool from a computer plugged directly into the modem if you have a router in the picture, because the wrong MAC address will get registered. So your best bet is: turn everything off, plug your computer directly into the modem, turn it on, get an address and download the tool; then turn everything off again, plug the router back in to the modem, and run the self-install tool from behind the router. At that point, everything just worked for me.

Netgear should probably fix this by correctly passing through the domain search suffix in their routers. Comcast should probably fix this by doing some sort of geo-IP lookup to give clients a fully-qualified address on that webpage to download the tool, even if their router is broken (or really, does downloading that file actually require a content-delivery network?). You can probably fix this by running OpenWrt on your router before you start.

Otherwise, the Zoom modem and the Netgear WNDR3700 seem to make a very nice combination which I would recommend.

Adding a JavaScript tracker to your DocBook document

If you convert a DocBook document into chunked HTML form, it is presumably very useful to place one of the various available JavaScript-based trackers on each of your pages to see what is most popular, linked to, etc.

I'm assuming you've already got to the point where you have your DSSSL stylesheet going (see something like my prior entry).

It doesn't appear you can get arbitrary output into the <head> tag -- the %html-header-tags% variable is really only designed for META tags, and there doesn't appear to be anything else in the standard stylesheets to override.
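
For reference, %html-header-tags% takes a list of tag names with attribute/value pairs, so it is fine for something like this (a sketch based on the standard modular stylesheets; the values are illustrative):

(define %html-header-tags%
  '(("META" ("NAME" "keywords") ("CONTENT" "docbook,dsssl"))))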

So the trick is to use $html-body-start$. But you have to be a little bit tricky to actually slip your JavaScript past the SGML parser. After several attempts and a bit of googling I finally ended up with the following for a Google Analytics tracker, using string-append to assemble the HTML comment markers so that the SGML parser never sees them literally:

(define ($html-body-start$)
  (make element gi: "script"
        attributes: '(("language" "javascript")
                      ("type" "text/javascript"))
        (make formatting-instruction
          data: (string-append "<" "!--
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXXXXXXX-1']);
_gaq.push(['_trackPageview']);

(function() {
  var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
  ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
  var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
// --" ">"))))

easygeotag.info

I think I've just about finished my Thanksgiving project easygeotag.info.

I'm a little bit obsessive about geotagging my photos, and while I know there are many photo-management solutions out there that can do it in various ways, I generally find it quicker and easier to use exiv2 and simple shell scripts to embed the location info directly into my files.
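
A minimal sketch of the sort of thing I mean (the coordinates and filename are illustrative; exiv2 takes latitude and longitude as degree/minute/second rationals):

$ exiv2 -M"set Exif.GPSInfo.GPSLatitude 37/1 46/1 30/1" \
        -M"set Exif.GPSInfo.GPSLatitudeRef N" \
        -M"set Exif.GPSInfo.GPSLongitude 122/1 25/1 6/1" \
        -M"set Exif.GPSInfo.GPSLongitudeRef W" photo.jpg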

I've tried a number of things that have never worked out better than simply using Google Maps and panning around to find locations — I even bought a GPS tracker which would supposedly automatically tag my photos; assuming, of course, it could ever get a GPS lock, hadn't run out of batteries and corrupted its filesystem, and you had all the times in sync and could figure out the various timezone issues, daylight-savings changes, etc. etc. I always feel safer having all my metadata embedded in the actual files, just in case Yahoo ever does a del.icio.us to Flickr (I use a little Python script with IPTC bindings for comments, which I then back up similarly obsessively, both locally and to Amazon S3).

The site is fairly simple in concept: it allows you to search for locations, easily extract the geotag info, and save frequently used locations for easy reference.

Mostly it was an exercise for me to implement something after reading the excellent JavaScript Patterns, using YUI3, Google App Engine and OpenID — all of which I managed to cram in.

Although the audience may be limited (maybe just to me :) I hope someone else finds it useful for managing their memories! If you think this might be useful and would like the output in some other format, just let me know.

Symbol Versions and dependencies

The documentation on ld's symbol-versioning syntax is a little vague about "dependencies", which it talks about but doesn't give many details on.

Let's construct a small example:

``$ cat foo.c``

#include <stdio.h>

#ifndef VERSION_2
void foo(int f) {
     printf("version 1 called\n");
}

#else
void foo_v1(int f) {
     printf("version 1 called\n");
}
__asm__(".symver foo_v1,foo@VERSION_1");

void foo_v2(int f) {
     printf("version 2 called\n");
}
/* i.e. foo_v2 is really foo@VERSION_2
 * @@ means this is the default version
 */
__asm__(".symver foo_v2,foo@@VERSION_2");

#endif

``$ cat 1.ver``

VERSION_1 {
    global:
        foo;
    local:
        *;
};

``$ cat 2.ver``

VERSION_1 {
    local:
        *;
};

VERSION_2 {
    global:
        foo;
} VERSION_1;

``$ cat main.c``

#include <stdio.h>

void foo(int);

int main(void) {
    foo(100);
    return 0;
}

``$ cat Makefile``

all: v1 v2

libfoo.so.1: foo.c
	gcc -shared -fPIC -o libfoo.so.1 -Wl,--soname='libfoo.so.1' -Wl,--version-script=1.ver foo.c

libfoo.so.2: foo.c
	gcc -shared -fPIC -DVERSION_2 -o libfoo.so.2 -Wl,--soname='libfoo.so.2' -Wl,--version-script=2.ver foo.c

v1: main.c libfoo.so.1
	ln -sf libfoo.so.1 libfoo.so
	gcc -Wall -o v1 -lfoo -L. -Wl,-rpath=. main.c

v2: main.c libfoo.so.2
	ln -sf libfoo.so.2 libfoo.so
	gcc -Wall -o v2 -lfoo -L. -Wl,-rpath=. main.c

.PHONY: clean
clean:
	rm -f libfoo* v1 v2

``$ ./v1``

version 1 called

``$ ./v2``

version 2 called

In words, we create two libraries: a version 1 and a version 2, where the version 2 library provides a new implementation of foo. The soname is set in each library, so the v1 and v2 binaries each find the correct library at run time.

In the updated 2.ver version, we say that VERSION_2 depends on VERSION_1. So, the question is, what does this mean? Does it have any effect?

We can examine the version descriptors in the library and see that there is indeed a relationship recorded there.

``$ readelf --version-info ./libfoo.so.2``

[...]
Version definition section '.gnu.version_d' contains 3 entries:
  Addr: 0x0000000000000264  Offset: 0x000264  Link: 5 (.dynstr)
  000000: Rev: 1  Flags: BASE   Index: 1  Cnt: 1  Name: libfoo.so.2
  0x001c: Rev: 1  Flags: none  Index: 2  Cnt: 1  Name: VERSION_1
  0x0038: Rev: 1  Flags: none  Index: 3  Cnt: 2  Name: VERSION_2
  0x0054: Parent 1: VERSION_1

Looking at the specification, we can see that each version definition has a vd_aux field, which is a linked list of, essentially, strings giving "the version or dependency name". This is a little vague for a specification; however, it appears to mean that the first entry is the name of the version definition itself, and any following entries are its dependencies. At least, this is how readelf interprets it when it shows you the "Parent" field in the output above.
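
For reference, the relevant structures from glibc's <elf.h> are below; vd_aux gives the offset of the first auxiliary entry, and each vda_name holds one of those "version or dependency name" strings:

typedef struct {
  Elf64_Half vd_version;  /* Version revision */
  Elf64_Half vd_flags;    /* Version information flags */
  Elf64_Half vd_ndx;      /* Version index */
  Elf64_Half vd_cnt;      /* Number of associated aux entries */
  Elf64_Word vd_hash;     /* Version name hash value */
  Elf64_Word vd_aux;      /* Offset in bytes to verdaux array */
  Elf64_Word vd_next;     /* Offset in bytes to next verdef entry */
} Elf64_Verdef;

typedef struct {
  Elf64_Word vda_name;    /* Version or dependency name */
  Elf64_Word vda_next;    /* Offset in bytes to next verdaux entry */
} Elf64_Verdaux;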

This implies something that the ld documentation doesn't mention: you may list multiple dependencies for a version node. That does work, and readelf will simply report more parents if you try it.
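
For example, something like the following version node (a sketch extrapolating from the single-dependency form above; bar is a hypothetical new symbol):

VERSION_3 {
    global:
        bar;
} VERSION_2 VERSION_1;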

So the question is, what does this dependency actually do? Well, as far as I can tell, nothing much. The dynamic loader doesn't look at the dependency information, and has no need to — it is looking to resolve something specific, foo@VERSION_2 for example, and doesn't really care that VERSION_1 even exists.

ld does enforce the dependency, in that if a version node names a dependency you've left out or accidentally erased, the link will fail. However, it doesn't really convey anything other than its intrinsic documentation value.

Logging POST requests with Apache

After getting a flood of spam, I became suspicious that there was an exploit in my blog software allowing easy robo-posting. Despite a code audit I couldn't see anything, and thus wanted to log incoming POST requests before any local processing at all.

It took me a while to figure out how to do this, so hopefully this helps someone else. First install libapache-mod-security; then the magic incantation is:

SecRuleEngine On
SecAuditEngine on
SecAuditLog /var/log/apache2/website-audit.log
SecRequestBodyAccess on
SecAuditLogParts ABIFHZ

SecDefaultAction "nolog,noauditlog,allow,phase:2"

SecRule REQUEST_METHOD "^POST$" "chain,allow,phase:2"
SecRule REQUEST_URI ".*" "auditlog"

So, to break it down a little: the default action says to do nothing during phase 2 (when the body is available for inspection); the allow means that nothing further will happen in any of the remaining phases, so the module can shortcut through them. The two SecRules work together -- the first says that any POST request should be tested by the next rule (i.e. the chained rule), which in this case matches any request URI and sends the request to the audit log. After that, the same allow/phase arguments again say that nothing further is going to happen in any of the subsequent phases mod_security can work on. As for the parts between A and Z, we log the request headers, the request body, the final response headers and the audit trailer.
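
To check it is working, you can throw a POST at the server yourself and then look for the entry in the audit log (the URL and form data here are illustrative):

$ curl -d "comment=test" http://yourblog.example.com/comments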

So, as it turns out, there is no exploit; it seems most likely there is an actual human behind the spam that gets through, because every time they take a guess it is correct. So I'll take a glass-half-full kind of approach: rather than being annoyed at removing the spam, I'll just convince myself that I made a small donation from some spam overlord to one of their poor minions!

Split debugging info -- symbols

In a previous post I mentioned split debugging info.

One addendum to this is how symbols are handled. Symbols are separate from debugging info (i.e. the stuff about variable names, types, etc. you get when -g is turned on), but necessary for a good debugging experience.

You have a choice, however, of where you leave the symbol table. You can leave it in your shipping binary/library, so that users who don't have the full debugging info available still get a backtrace that at least has function names. The cost is slightly larger files for everyone, even if the symbols are never used. This appears to be what Red Hat does with its system libraries, for example.

The other option is to keep the symbols in the .debug files alongside the debug info. This results in smaller libraries, but really requires you to have the debug-info files available for workable debugging. This appears to be what Debian does.
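
You can check which way your distribution went by looking for a symbol table in a shipped library; something like the following (path illustrative), where no output means the symbols have been stripped:

$ readelf --sections /lib/libc.so.6 | grep symtab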

So, how do you go about this? Well, it depends on what tools you're using.

For binutils strip, there is some overlap between the --strip-debug and --only-keep-debug options: --strip-debug will keep the symbol table in the binary, and --only-keep-debug will also keep the symbol table in the debug file.

$ gcc -g -o main main.c
$ readelf --sections ./main | grep symtab
  [36] .symtab           SYMTAB          00000000 000f48 000490 10     37  53  4
$ cp main main.debug
$ strip --only-keep-debug main.debug
$ readelf --sections ./main.debug | grep symtab
  [36] .symtab           SYMTAB          00000000 000b1c 000490 10     37  53  4
$ strip --strip-debug ./main
$ readelf --sections ./main | grep symtab
  [36] .symtab           SYMTAB          00000000 000b1c 000490 10     37  53  4

Of course, you can then strip (with no arguments) the final binary to get rid of the symbol table; but other than manually pulling out the .symtab section with objcopy, I'm not aware of any way to remove it from the debug-info file.
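
That manual approach would look something like the following (a sketch; objcopy rewrites the file in place when given a single argument):

$ objcopy --remove-section=.symtab main.debug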

Contrast this with elfutils, more commonly used on Red Hat-based systems, I think.

eu-strip's -g (--strip-debug) does the same thing: it leaves the symtab section in the binary. However, it also has a -f option, which puts any sections removed during the strip into a separate file. Therefore, you can create any combination you wish: eu-strip -f results in a fully-stripped binary with both symbols and debug data in the .debug file, while eu-strip -g -f results in only debug data in the .debug file, with symbol data retained in the binary.
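
That is, using the same file names as the examples below:

$ eu-strip -f main.debug main        # symbols + debug info both go to main.debug
$ eu-strip -g -f main.debug main     # debug info to main.debug; symbols stay in main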

The only thing to be careful about is using eu-strip -g -f and then further stripping the binary, consequently destroying the symbol table while retaining the debug info. This can lead to some strange things in backtraces:

$ gcc -g -o main main.c
$ eu-strip -g -f main.debug main
$ strip ./main
$ gdb ./main
GNU gdb (GDB) 7.1-debian
...
(gdb) break foo
Breakpoint 1 at 0x8048397: file main.c, line 2.
(gdb) r
Starting program: /home/ianw/tmp/symtab/main

Breakpoint 1, foo (i=100) at main.c:2
2         return i + 100;
(gdb) back
#0  foo (i=100) at main.c:2
#1  0x080483b1 in main () at main.c:6
#2  0x423f1c76 in __libc_start_main (main=Could not find the frame base for "__libc_start_main".
) at libc-start.c:228
#3  0x08048301 in ?? ()

Note one difference between strip and eu-strip: binutils strip will leave the .gnu_debuglink section in, while eu-strip will not:

$ gcc -g -o main main.c
$ eu-strip -g -f main.debug main
$ readelf --sections ./main| grep debuglink
  [29] .gnu_debuglink    PROGBITS        00000000 000bd8 000010 00      0   0  4
$ eu-strip main
$ readelf --sections ./main| grep debuglink
$ gcc -g -o main main.c
$ eu-strip -g -f main.debug main
$ strip main
$ readelf --sections ./main| grep debuglink
  [27] .gnu_debuglink    PROGBITS        00000000 0005d8 000010 00      0   0  4