Infinity

I’m writing a replacement for libthread_db. It’s called Infinity.

Why? Because libthread_db is a pain in the ass for debuggers. GDB has to watch for inferiors loading thread libraries. It has to know that, for example, on GNU/Linux, when the inferior loads libpthread.so then GDB has to locate the corresponding libthread_db.so into itself and use that to inspect libpthread’s internal structures. How does GDB know where libthread_db is? It doesn’t, it has to search for it. How does it know, when it finds it, that the libthread_db it found is compatible with the libpthread the inferior loaded? It doesn’t, it has to load it to see, then unload it if it didn’t work. How does GDB know that the libthread_db it found is compatible with itself? It doesn’t, it has to load it and, erm, crash if it isn’t. How does GDB manage when the inferior (and its libthread_db) has a different ABI to GDB? Well, it doesn’t.

libthread_db means you can’t debug an application in a RHEL 6 container with a GDB in a RHEL 7 container. Probably. Not safely. Not without using gdbserver, anyway–and there’s no reason you should have to use gdbserver to debug what is essentially a native process.

So. Infinity. In Infinity, inspection functions for debuggers will be shipped as bytecode in ELF notes in the same file as the code they pertain to. libpthread.so, for example, will contain a bunch of Infinity notes, each representing some bit of functionality that GDB currently gets from libthread_db. When the inferior starts or loads libraries GDB will find the notes in the files it already loaded and register their functions. If GDB notices it has, for example, the full set of functions it requires for thread support then, boom, thread support switches on. This happens regardless of whether libpthread was dynamically or statically linked.

(If you’re using gdbserver, gdbserver gives GDB a list of Infinity functions it’s interested in. When GDB finds these functions it fires the (slightly rewritten) bytecode over to gdbserver and gdbserver takes it from there.)

Concrete things I have are: a bytecode format (but not the bytecode itself), an executable with a couple of handwritten notes (with some junk where the bytecode should be), a readelf that can decode the notes, a BFD that extracts the notes and a GDB that picks them up.

What I’m doing right now is rewriting a function I don’t understand (td_ta_map_lwp2thr) in a language I’m inventing as I go along (i8) that’ll be compiled with a compiler that barely exists (i8c) into a bytecode that’s totally undefined to be executed by an interpreter that doesn’t exist.

(The compiler’s going to be written in Python, and it’ll emit assembly language. It’s more of an assembler, really. Emitting assembler rather than going straight to bytecode simplifies things (e.g. the compiler won’t need to understand numbers!) at the expense of emitting some slightly crappy code (e.g. instruction sequences that add zero). I’m thinking GDB will eventually JIT the bytecode so this won’t matter. GDB will have to JIT if it’s to cope with millions of threads, but jitted Infinity should be faster than libthread_db. None of this is possible now, but it might be sooner than you thing with the GDB/GCC integration work that’s happening. Besides, I can think of about five different ways to make an interpreter skip null operations in zero time.)

Future applications for Infinity notes include the run-time linker interface.

Watch this space.

Remote debugging with GDB

→ originally posted on developers.redhat.com

This past few weeks I’ve been working on making remote debugging in GDB easier to use. What’s remote debugging? It’s where you run GDB on one machine and the program being debugged on another. To do this you need something to allow GDB to control the program being debugged, and that something is called the remote stub. GDB ships with a remote stub called gdbserver, but other remote stubs exist. You can write them into your own program too, which is handy if you’re using minimal or unusual hardware that cannot run regular applications… cellphone masts, satellites, that kind of thing. I bet you didn’t know GDB could do that!

If you’ve used remote debugging in GDB you’ll know it requires a certain amount of setup. You need to tell GDB how to access to your program’s binaries with a set sysroot command, you need to obtain a local copy of the main executable and supply that to GDB with a file command, and you need to tell GDB to commence remote debugging with a target remote command.

Until now. Now all you need is the target remote command.

This new code is really new. It’s not in any GDB release yet, let alone in RHEL or Fedora. It’s not even in the nightly GDB snapshot, it’s that fresh. So, with the caveat that none of these examples will work today unless you’re using a Git build, here’s some things you can do with gdbserver using the new code.

Here’s an example of a traditional remote debugging session, with the things you type in bold. In one window:

abc$ ssh xyz.example.com
xyz$ gdbserver :9999 --attach 5312
Attached; pid = 5312
Listening on port 9999

gdbserver attached to process 5312, stopped it, and is waiting for GDB to talk to it on TCP port 9999. Now, in another window:

abc$ gdb -q
(gdb) target remote xyz.example.com:9999
Remote debugging using xyz.example.com:9999
...lots of messages you can ignore...
(gdb) bt
#0 0x00000035b5edf098 in *__GI___poll (fds=0x27467a0, nfds=8,
timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
#1 0x00000035b76449f9 in ?? () from target:/lib64/libglib-2.0.so.0
#2 0x00000035b76451a5 in g_main_loop_run ()
from target:/lib64/libglib-2.0.so.0
#3 0x0000003dfd34dd17 in gtk_main ()
from target:/usr/lib64/libgtk-x11-2.0.so.0
#4 0x000000000040913d in main ()

Now you have GDB on one machine (abc) controlling process 5312 on another machine (xyz) via gdbserver. Here I did a backtrace, but you can do pretty much anything you can with regular, non-remote GDB.

I called that a “traditional” remote debugging session because that’s how a lot of people use this, but there’s a more flexible way of doing things if you’re using gdbserver as your stub. GDB and gdbserver can communicate over stdio pipes, so you can chain commands, and the new code to remove all the setup you used to need makes this really nice. Lets do that first example again, with pipes this time:

abc$ gdb -q
(gdb) target remote | ssh -T xyz.example.com gdbserver - --attach 5312
Remote debugging using | ssh -T xyz.example.com gdbserver - --attach 5312
Attached; pid = 5312
Remote debugging using stdio
...lots of messages...
(gdb)

The “-” in gdbserver’s argument list replaces the “:9999” in the previous example. It tells gdbserver we’re using stdio pipes rather than TCP port 9999. As well as configuring everything with single command, this has the advantage that the communication is through ssh; there’s no security in GDB’s remote protocol, so it’s not the kind of thing you want to do over the open internet.

What else can you do with this? Anything you can do through stdio pipes! You can enter Docker containers:

(gdb) target remote | sudo docker exec -i e0c1afa81e1d gdbserver - --attach 58
Remote debugging using | sudo docker exec -i e0c1afa81e1d gdbserver - --attach 58
Attached; pid = 58
Remote debugging using stdio
...

Notice how I slipped sudo in there too. Anything you can do over stdio pipes, remember? If you’re using Kubernetes you can use kubectl exec, or with OpenShift osc exec.

gdbserver can do more than just attach, you can start programs with it too:

(gdb) target remote | sudo docker exec -i e0c1afa81e1d gdbserver - /bin/sh
Remote debugging using | sudo docker exec -i e0c1afa81e1d gdbserver - /bin/sh
Process /bin/sh created; pid = 89
stdin/stdout redirected
Remote debugging using stdio
...

Or you can start it without any specific program, and then tell it what do do from within GDB. This is by far the most flexible way to use gdbserver. You can control more than one process, for example:

(gdb) target extended-remote | ssh -T root@xyz.example.com gdbserver --multi -
Remote debugging using | gdbserver --multi -
Remote debugging using stdio
(gdb) attach 774
...messages...
(gdb) add-inferior
Added inferior 2
(gdb) inferior 2
[Switching to inferior 2 [<null>] (<noexec>)]
(gdb) attach 871
...messages...
(gdb) info inferiors
Num Description Executable
* 2 process 871 target:/usr/sbin/httpd
  1 process 774 target:/usr/libexec/mysqld

Ready to debug that connection issue between your webserver and database?

libiberty demangler fuzzer

I wrote a quick fuzzer for libiberty’s demangler this morning. Here’s how to get and use it:

$ mkdir workdir
$ cd workdir
$ git clone git@github.com:gbenson/binutils-gdb.git src
  ...
$ cd src
$ git co demangler
$ cd ..
$ mkdir build
$ cd build
$ CFLAGS="-g -O0" ../src/configure --with-separate-debug-dir=/usr/lib/debug
  ...
$ make
  ...
$ ../src/fuzzer.sh 
  ...
+ /home/gary/workdir/build/fuzzer
../src/fuzzer.sh: line 12: 22482 Segmentation fault      (core dumped) $fuzzer
# copy the executable and PID from the two lines above
$ gdb -batch /home/gary/workdir/build/fuzzer core.22482 -ex "bt"
...
Core was generated by `/home/gary/workdir/build/fuzzer'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040d5f2 in op_is_new_cast (op=0x7ffffac19930) at libiberty/cp-demangle.c:3064
3064	  const char *code = op->u.s_operator.op->code;
#0  0x000000000040d5f2 in op_is_new_cast (op=0x7ffffac19930) at libiberty/cp-demangle.c:3064
#1  0x000000000040dc6f in d_expression_1 (di=0x7ffffac19c90) at libiberty/cp-demangle.c:3232
#2  0x000000000040dfbc in d_expression (di=0x7ffffac19c90) at libiberty/cp-demangle.c:3315
#3  0x000000000040cff6 in d_array_type (di=0x7ffffac19c90) at libiberty/cp-demangle.c:2821
#4  0x000000000040c260 in cplus_demangle_type (di=0x7ffffac19c90) at libiberty/cp-demangle.c:2330
#5  0x000000000040cd70 in d_parmlist (di=0x7ffffac19c90) at libiberty/cp-demangle.c:2718
#6  0x000000000040ceb8 in d_bare_function_type (di=0x7ffffac19c90, has_return_type=0) at libiberty/cp-demangle.c:2772
#7  0x000000000040a9b2 in d_encoding (di=0x7ffffac19c90, top_level=1) at libiberty/cp-demangle.c:1287
#8  0x000000000040a6be in cplus_demangle_mangled_name (di=0x7ffffac19c90, top_level=1) at libiberty/cp-demangle.c:1164
#9  0x0000000000413131 in d_demangle_callback (mangled=0x7ffffac19ea0 "_Z1-Av23*;cG~Wo2Vu", options=259, callback=0x40ed11 , opaque=0x7ffffac19d70)
    at libiberty/cp-demangle.c:5862
#10 0x0000000000413267 in d_demangle (mangled=0x7ffffac19ea0 "_Z1-Av23*;cG~Wo2Vu", options=259, palc=0x7ffffac19dd8) at libiberty/cp-demangle.c:5913
#11 0x00000000004132d6 in cplus_demangle_v3 (mangled=0x7ffffac19ea0 "_Z1-Av23*;cG~Wo2Vu", options=259) at libiberty/cp-demangle.c:6070
#12 0x0000000000401a87 in cplus_demangle (mangled=0x7ffffac19ea0 "_Z1-Av23*;cG~Wo2Vu", options=259) at libiberty/cplus-dem.c:858
#13 0x0000000000400ea0 in main (argc=1, argv=0x7ffffac1a0a8) at /home/gary/workdir/src/fuzzer.c:26

The symbol that crashed it is one frame up from the bottom of the backtrace.

r_debug

I’ve been trying to figure out how to get information about libraries loaded with dlmopen out of glibc‘s runtime linker and into GDB.

The current interface uses a structure called r_debug that’s defined in link.h. If the executable’s dynamic section has a DT_DEBUG element, the runtime linker sets that element’s value to the address where this structure can be found. I tried to discover where this interface originated, but I didn’t get very far. The only mention of it I found anywhere in any standard is in the System V Application Binary Interface, where it says:

If an object file participates in dynamic linking, its program header table will have an element of type PT_DYNAMIC. This “segment” contains the .dynamic section. A special symbol, _DYNAMIC, labels the section…

and later:

DT_DEBUG
This member is used for debugging. Its contents are not specified for the ABI; programs that access this entry are not ABI-conforming.

No help there then. In glibc, r_debug looks like this:

struct r_debug
{
  int r_version;              /* Version number for this protocol.  */

  struct link_map *r_map;     /* Head of the chain of loaded objects.  */

  /* This is the address of a function internal to the run-time linker,
     that will always be called when the linker begins to map in a
     library or unmap it, and again when the mapping change is complete.
     The debugger can set a breakpoint at this address if it wants to
     notice shared object mapping changes.  */
  ElfW(Addr) r_brk;
  enum
    {
      /* This state value describes the mapping change taking place when
         the `r_brk' address is called.  */
      RT_CONSISTENT,          /* Mapping change is complete.  */
      RT_ADD,                 /* Beginning to add a new object.  */
      RT_DELETE               /* Beginning to remove an object mapping.  */
    } r_state;

  ElfW(Addr) r_ldbase;        /* Base address the linker is loaded at.  */
};

With glibc, r_version == 1. At least some versions of Solaris have r_version == 2, and when this is the case there are three extra fields, r_ldsomap, r_rdevent, r_flags. GDB uses r_ldsomap if r_version == 2; the other two seem to be the interface with librtld_db. That’s not documented anywhere to my knowledge, and may not even be fixed: applications are supposed to use the external interface to librtld_db as documented here.

Here is the problem: r_debug, as it stands, has no way to access more than one namespace. The objects in r_map are the default namespace, directly linked, or opened with dlopen, or opened with dlmopen with lmid set to LM_ID_BASE. The r_ldsomap field in Solaris’s r_debug gives access to the linker’s namespace, opened with dlmopen with lmid set to LM_ID_LDSO, but you still can’t see any other namespaces.

glibc uses multiple r_debug structures internally, one per namespace. It would be trivial to add a “next r_debug” link to r_debug if it were possible to extend the structure, but to do this you’d need to set r_version > 2. Applications could arguably expect a runtime linker with r_version > 2 to support the version 2 interface in full, but it wouldn’t be possible to do that in glibc without reverse engineering Solaris’s implementation. glibc is therefore stuck at r_version == 1, and the r_debug structure is effectively immutable for all time.

Easy things to do with GDB #1

I’ve been meaning to write some introductory articles to GDB ever since back in February at FOSDEM, where I was surprised by just how many people do not use GDB–or even know what it is! It’s taken me a while to figure out a nice example, but I finally found them.

Last week I wanted to extend ogg123, a tool for playing music files on the commandline. It supports several formats–Ogg Vorbis, FLAC and Speex–and the way it abstracts that is by passing around a struct format_t full of function pointers to various callbacks. I was adding a new format, and wanted to know what kind of arguments it was calling its callbacks with. I could have tried to figure it out by reading the code, but it was easier to just run it in GDB and see what was happening. Here’s how:

  1. Build a copy of ogg123 with debugging support:
    wget http://downloads.xiph.org/releases/vorbis/vorbis-tools-1.4.0.tar.gz
    tar xf vorbis-tools-1.4.0.tar.gz
    cd vorbis-tools-1.4.0
    CFLAGS="-g -O0" ./configure
    make

    The important bit here is the CFLAGS="-g -O0". This causes configure to pass these extra options to GCC in the makefiles it generates. -g instructs GCC to include debugging information in the files it generates to allow GDB to relate the executable files you’re debugging to the source files they were generated from. -O0 instructs GCC not to optimise the code, which makes for easier debugging.

  2. Start up GDB on the ogg123 you just built:
    gdb -args ogg123/ogg123 ~/music/Various/Disco\ Demands/4-08\ -\ Give\ It\ Up.flac

    Everything after the -args option is the command you’d normally type to run the program. GDB will print out some stuff as it starts up, then present you with a (gdb) prompt and wait for you to enter some commands:

    GNU gdb (GDB) 7.4.50.20120313-cvs
    Copyright (C) 2012 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later 
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-unknown-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /home/gary/vorbis-tools/ogg123/ogg123...done.
    (gdb) 
  3. Ok, the callback I’m interested in is the “read” callback, and we’re playing a FLAC file so the callback is called flac_read. I’d like to run the program until flac_read is called, so I set a breakpoint on flac_read by entering break flac_read and then using the run command to set ogg123 going:
    (gdb) break flac_read 
    Breakpoint 1 at 0x40dde5: file flac_format.c, line 253.
    (gdb) run
    Starting program: /home/gary/vorbis-tools/ogg123/ogg123 /home/gary/music/Various/Disco\ Demands/4-08\ -\ Give\ It\ Up.flac
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    [New Thread 0x7fffeb863700 (LWP 10312)]
    [Thread 0x7fffeb863700 (LWP 10312) exited]
    
    Audio Device:   PulseAudio Output
    
    Playing: /home/gary/music/Various/Disco Demands/4-08 - Give It Up.flac
    
    Breakpoint 1, flac_read (decoder=0x637560, ptr=0x6182a0, nbytes=30240, 
        eos=0x7fffffffe37c, audio_fmt=0x7fffffffe380) at flac_format.c:253
    253	  flac_private_t *priv = decoder->private;
    (gdb)
  4. GDB now has a running copy of ogg123 stopped at the very start of flac_read. Pretty jazzy huh? You can see the values of the arguments in the second-to-last line it printed, so for example you can see that the caller has requested to read 30240 bytes of data. The various structures (decoder=0x637560 etc) aren’t so useful as they’re pointers, but we can deference them with the print command:
    (gdb) print *decoder
    $1 = {source = 0x637060, request_fmt = {big_endian = 0, word_size = 2, 
        signed_sample = 1, rate = 0, channels = 0, matrix = 0x0}, actual_fmt = {
        big_endian = 0, word_size = 2, signed_sample = 1, rate = 0, channels = 0, 
        matrix = 0x0}, format = 0x617ec0, callbacks = 0x7fffffffe3a0, 
      callback_arg = 0x0, private = 0x6375d0}

    That’s pretty nice, we can see for example what format it wants the data in: little endian, signed, etc. Also, notice the line of code it stopped on:

    253	  flac_private_t *priv = decoder->private;

    That’s the next line of code GDB will execute if we set the program going again in some way. We can advance over just that line with the next command:

    (gdb) next
    254	  decoder_callbacks_t *cb = decoder->callbacks;

    Now it’s initialised the local variable priv, which we can also print out:

    (gdb) p *priv
    $2 = {decoder = 0x637000, is_oggflac = 0, channels = 2, rate = 44100, 
      bits_per_sample = 16, totalsamples = 13337016, currentsample = 0, 
      samples_decoded = 4096, samples_decoded_previous = 0, bytes_read = 24576, 
      bytes_read_previous = 0, comments = 0x638240, bos = 1, eos = 0, 
      buf = 0x637020, buf_len = 4096, buf_start = 0, buf_fill = 4096, stats = {
        total_time = 0, current_time = 0, instant_bitrate = 0, avg_bitrate = 0}}

    Did you see what I did there? Most GDB commands have abbreviated forms, and the abbreviation for print is p. The commands I’ve introduced in this article are some of the most commonly used, and their abbreviated forms are all simply their first letter: b for breakpoint, r for run and n for next. You can also repeat the previous command by pressing Return, which is handy for doing a load of next commands one after the other, for instance.

Ok, that’s all for today. Go and have a play!

Working on GDB

For future reference, and because I keep forgetting things, here is how I work on GDB:

  1. Check out a copy of the sources. Some of this is described on the branch management page of the GDB wiki:
    mkdir /somewhere/to/work
    cd /somewhere/to/work
    git clone ssh://sourceware.org/git/archer.git src
    cd src
    git remote add gdb git://sourceware.org/git/gdb.git
  2. Point the master branch to the correct place. The real one is in the GDB repo, but there’s an old one in Archer that is no longer used. Open /somewhere/to/work/src/.git/config in your favourite editor and change remote = origin to remote = gdb in the [branch "master"] section. Then probably do a git pull to fetch the actual code you’ll be working on.
  3. Create yourself a nice new branch to make your changes in:
  4. git branch --track archer-gbenson-lazy-skip-inline-frames gdb/master
    git checkout archer-gbenson-lazy-skip-inline-frames
  5. (make your changes here)
  6. Build it. I build out-of-tree, like this:
    mkdir /somewhere/to/work/build
    cd /somewhere/to/work/build
    CFLAGS="-g -O0" ../src/configure --with-separate-debug-dir=/usr/lib/debug
    make
  7. Run the testsuite. I like to run the tests twice, once without and once with the changes, then I diff the results and pipe it through a script I wrote to hide the unimportant bits:
    make check >& build-20110921-3.log
    diff -u baseline-20110919.log build-20110921-3.log | gdbtestdiff
  8. If a test fails, you’ll want to hook gdb into it. To do that you run the specific test to create a transcript:
    cd /somewhere/to/work/build/gdb/testsuite
    runtest TRANSCRIPT=1 ../../../src/gdb/testsuite/gdb.opt/inline-cmds.exp
    gdb ../gdb
  9. Then you pipe the transcript into gdb:
    set prompt (top-gdb) 
    b internal_error
    r -nx <transcript.1

Ok, I think that is enough for now.