Toolchains, libc and interesting interactions

When building a user space binary, the -lc that gcc inserts into the final link seems pretty straight forward — link the C library. As with all things system-library related there is more to investigate.

If you look at /usr/lib/libc.so; the "library" that gets linked when you specify -lc, it is not a library as such, but a link script which specifies the libraries to link, which also includes the dynamic linker itself:

$ cat /usr/lib/libc.so
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a  AS_NEEDED ( /lib/ld-linux-x86-64.so.2 ) )

On very old glibc's the AS_NEEDED didn't appear; so every binary really did have a DT_NEEDED entry for the dynamic linker itself. This can be somewhat confusing if you're ever doing forensics on an old binary which seems to have these entries for not apparent reason. However, we can see that that /lib/libc.so itself does actually require symbols from the dynamic linker:

$ readelf --dynamic /lib/libc.so.6 | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]

Although you're unlikely to encounter it, a broken toolchain can cause havoc if it gets the link order wrong here, because the glibc dynamic linker defines minimal versions of malloc and friends -- you've got a chicken-and-egg problem using libc's malloc before you've loaded it! You can simulate this havoc with something like:

$ cat foo.c
#include <stdio.h>
#include <syslog.h>

int main(void)
{
   syslog(LOG_DEBUG, "hello, world!");
   return 0;
}

$ cat libbroken.so
GROUP ( /lib/ld-linux-x86-64.so.2 )

$ gcc -o -Wall -Wl,-rpath=. -L. -lbroken -g -o foo foo.c

$ ./foo
Inconsistency detected by ld.so: dl-minimal.c: 138: realloc: Assertion `ptr == alloc_last_block' failed!

Depending on various versions of things, you might see that assert or possibly just strange, corrupt output in your logs as syslog calls the wrong malloc. You could debug something like this by asking the dynamic linker to show you its bindings as it resolves them:

$ LD_DEBUG_OUTPUT=foo.txt LD_DEBUG=bindings ./foo
Inconsistency detected by ld.so: dl-minimal.c: 138: realloc: Assertion `ptr == alloc_last_block' failed!

$ cat foo.txt.11360 | grep "\`malloc'"
     11360: binding file /lib/libc.so.6 [0] to /lib64/ld-linux-x86-64.so.2 [0]: normal symbol `malloc' [GLIBC_2.2.5]
     11360: binding file /lib64/ld-linux-x86-64.so.2 [0] to /lib64/ld-linux-x86-64.so.2 [0]: normal symbol `malloc' [GLIBC_2.2.5]
     11360: binding file /lib/libc.so.6 [0] to /lib64/ld-linux-x86-64.so.2 [0]: normal symbol `malloc' [GLIBC_2.2.5]

Above, because the dynamic loader comes first in the link order, libc.so.6's malloc has bound to the minimal implementation it provides, rather the full-featured one it provides internally.

As an aside, AFAICT, there is really only one reason why a normal library will link against the dynamic loader -- for the thread-local storage support function __tls_get_addr. You can try this yourself:

$ cat tls.c
char __thread *foo;

char* moo(void) {
    return foo;
}
$ gcc -fPIC -o libtls.so -shared tls.c
$ readelf -d ./libtls.so  | grep NEED
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x00000001 (NEEDED)                     Shared library: [ld-linux.so.2]
 0x6ffffffe (VERNEED)                    0x314
 0x6fffffff (VERNEEDNUM)                 2

Thread-local storage is worthy of a book of its own, but the gist is that this support function says "hey, give me an address of foo in libtls.so", the magic being that if the current thread has never accessed foo then it may not actually have any storage for foo yet, so the dynamic linker can allocate some memory for it lazily and then return the right thing. Otherwise, every thread that started would need to reserve room for foo "just in case", even if it never cares about moo.

But looking a little closer at the symbols of libc.so is also interesting. libc.so doesn't actually have many functions you can override. You can see what it is possible to override by checking the relocations against the procdure-lookup table (PLT).

Relocation section '.rela.plt' at offset 0x1e770 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000035b000  084600000007 R_X86_64_JUMP_SLO 00000000000a2100 sysconf + 0
00000035b008  02e000000007 R_X86_64_JUMP_SLO 0000000000075e50 calloc + 0
00000035b010  01dd00000007 R_X86_64_JUMP_SLO 0000000000077910 realloc + 0
00000035b018  029300000007 R_X86_64_JUMP_SLO 00000000000661c0 feof + 0
00000035b020  046f00000007 R_X86_64_JUMP_SLO 00000000000768c0 malloc + 0
00000035b028  000400000007 R_X86_64_JUMP_SLO 0000000000000000 __tls_get_addr + 0
00000035b030  01b400000007 R_X86_64_JUMP_SLO 0000000000076dd0 memalign + 0
00000035b038  086000000007 R_X86_64_JUMP_SLO 00000000000767e0 free + 0

i.e. instead of jumping directly to the malloc defined in the libc code section, any internal calls will jump to this stub which, the first time, asks the dynamic linker to go out and find the address of malloc (it then saves it, so the second time the stub just jumps to the saved location).

This is an interesting list, seemingly without much order. feof, for example, stands out as worth checking out a bit closer — why would that be there when fopen isn't, say? We can track down where it comes from with a bit of detective work; knowing that the value of the symbol feof will be placed into 0x35b018 we can disassemble libc.so to see that this address is used by the feof PLT stub at 0x1e870 (luckily, objdump has done the math to offest from the rip for us; i.e. 0x1e876 + 0x33c7a2 = 0x35b018)

000000000001e870 <feof@plt>:
   1e870:       ff 25 a2 c7 33 00       jmpq   *0x33c7a2(%rip)        # 35b018 <_IO_file_jumps+0xb18>
   1e876:       68 03 00 00 00          pushq  $0x3
   1e87b:       e9 b0 ff ff ff          jmpq   1e830 <h_errno+0x1e7dc>

From there we can search for anyone jumping to that address, and find out the caller:

$ objdump --disassemble-all /lib/libc.so.6 | grep 1e870
000000000001e870 <feof@plt>:
   1e870:   ff 25 a2 c7 33 00       jmpq   *0x33c7a2(%rip)        # 35b018 <_IO_file_jumps+0xb18>
   f19f7:   e8 74 ce f2 ff          callq  1e870 <feof@plt>

$ addr2line -e /lib/libc.so.6 f19f7
/home/aurel32/eglibc/eglibc-2.11.2/sunrpc/bindrsvprt.c:70

Which turns out to be part of a local patch which probably gets things a little wrong, as described below. The sysconf relocation is from a similar add-on patch (being used to find the page size, it seems).

libc, like all sensible libraries, uses the hidden attribute on symbols to restrict what is exported by the library. The benefit of this is that when the linker knows you are referencing a hidden symbol it knows that the value can never be overridden, and thus does not need to emit extra code to do indirection just in case you ever wish to redirect the symbols. In the above, it appears that feof has never been marked as hidden, probably because no internal glibc functions used it until that add-on patch, and since it is not considered an internal function the linker must allow for the possibility that it will be overridden at run time and provide a PLT slot for it. There are consequences; if this was on a fast-path then the extra jumps required to go via the PLT may matter for performance and it may also cause strange behaviour if somebody had preloaded something that took over feof.

Note, this is different from saying that your library can override symbols provided by libc.so; such as when you LD_PRELOAD a library to wrap an open call. What you can not override is the open call that say, the internal libc.so function getusershell does to read /etc/shells.

Having the malloc related calls as preemptable seems intentional and sane; although I can not find a comment to the effect of "we deliberately leave these like this so that users may use alternative malloc implementations", it makes sense so that libc.so is internally using the same malloc as everything else if you choose to use something such as tc-malloc.

tl;dr? Digging into your system libraries is always interesting. Be careful with your link order when creating toolchains, and be careful about symbol visibility when you're working with libraries.