Remote debugging with GDB

→ originally posted on developerblog.redhat.com

This past few weeks I’ve been working on making remote debugging in GDB easier to use. What’s remote debugging? It’s where you run GDB on one machine and the program being debugged on another. To do this you need something to allow GDB to control the program being debugged, and that something is called the remote stub. GDB ships with a remote stub called gdbserver, but other remote stubs exist. You can write them into your own program too, which is handy if you’re using minimal or unusual hardware that cannot run regular applications… cellphone masts, satellites, that kind of thing. I bet you didn’t know GDB could do that!

If you’ve used remote debugging in GDB you’ll know it requires a certain amount of setup. You need to tell GDB how to access to your program’s binaries with a set sysroot command, you need to obtain a local copy of the main executable and supply that to GDB with a file command, and you need to tell GDB to commence remote debugging with a target remote command.

Until now. Now all you need is the target remote command.

This new code is really new. It’s not in any GDB release yet, let alone in RHEL or Fedora. It’s not even in the nightly GDB snapshot, it’s that fresh. So, with the caveat that none of these examples will work today unless you’re using a Git build, here’s some things you can do with gdbserver using the new code.

Here’s an example of a traditional remote debugging session, with the things you type in bold. In one window:

abc$ ssh xyz.example.com
xyz$ gdbserver :9999 --attach 5312
Attached; pid = 5312
Listening on port 9999

gdbserver attached to process 5312, stopped it, and is waiting for GDB to talk to it on TCP port 9999. Now, in another window:

abc$ gdb -q
(gdb) target remote xyz.example.com:9999
Remote debugging using xyz.example.com:9999
...lots of messages you can ignore...
(gdb) bt
#0 0x00000035b5edf098 in *__GI___poll (fds=0x27467a0, nfds=8,
timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
#1 0x00000035b76449f9 in ?? () from target:/lib64/libglib-2.0.so.0
#2 0x00000035b76451a5 in g_main_loop_run ()
from target:/lib64/libglib-2.0.so.0
#3 0x0000003dfd34dd17 in gtk_main ()
from target:/usr/lib64/libgtk-x11-2.0.so.0
#4 0x000000000040913d in main ()

Now you have GDB on one machine (abc) controlling process 5312 on another machine (xyz) via gdbserver. Here I did a backtrace, but you can do pretty much anything you can with regular, non-remote GDB.

I called that a “traditional” remote debugging session because that’s how a lot of people use this, but there’s a more flexible way of doing things if you’re using gdbserver as your stub. GDB and gdbserver can communicate over stdio pipes, so you can chain commands, and the new code to remove all the setup you used to need makes this really nice. Lets do that first example again, with pipes this time:

abc$ gdb -q
(gdb) target remote | ssh -T xyz.example.com gdbserver - --attach 5312
Remote debugging using | ssh -T xyz.example.com gdbserver - --attach 5312
Attached; pid = 5312
Remote debugging using stdio
...lots of messages...
(gdb)

The “-” in gdbserver’s argument list replaces the “:9999” in the previous example. It tells gdbserver we’re using stdio pipes rather than TCP port 9999. As well as configuring everything with single command, this has the advantage that the communication is through ssh; there’s no security in GDB’s remote protocol, so it’s not the kind of thing you want to do over the open internet.

What else can you do with this? Anything you can do through stdio pipes! You can enter Docker containers:

(gdb) target remote | sudo docker exec -i e0c1afa81e1d gdbserver - --attach 58
Remote debugging using | sudo docker exec -i e0c1afa81e1d gdbserver - --attach 58
Attached; pid = 58
Remote debugging using stdio
...

Notice how I slipped sudo in there too. Anything you can do over stdio pipes, remember? If you’re using Kubernetes you can use kubectl exec, or with OpenShift osc exec.

gdbserver can do more than just attach, you can start programs with it too:

(gdb) target remote | sudo docker exec -i e0c1afa81e1d gdbserver - /bin/sh
Remote debugging using | sudo docker exec -i e0c1afa81e1d gdbserver - /bin/sh
Process /bin/sh created; pid = 89
stdin/stdout redirected
Remote debugging using stdio
...

Or you can start it without any specific program, and then tell it what do do from within GDB. This is by far the most flexible way to use gdbserver. You can control more than one process, for example:

(gdb) target extended-remote | ssh -T root@xyz.example.com gdbserver --multi -
Remote debugging using | gdbserver --multi -
Remote debugging using stdio
(gdb) attach 774
...messages...
(gdb) add-inferior
Added inferior 2
(gdb) inferior 2
[Switching to inferior 2 [<null>] (<noexec>)]
(gdb) attach 871
...messages...
(gdb) info inferiors
Num Description Executable
* 2 process 871 target:/usr/sbin/httpd
  1 process 774 target:/usr/libexec/mysqld

Ready to debug that connection issue between your webserver and database?

Leave a comment

Judgement Day

GDB will be the weapon we fight with if we accidentally build Skynet.

Comments (3)

libiberty demangler fuzzer

I wrote a quick fuzzer for libiberty’s demangler this morning. Here’s how to get and use it:

$ mkdir workdir
$ cd workdir
$ git clone git@github.com:gbenson/binutils-gdb.git src
  ...
$ cd src
$ git co demangler
$ cd ..
$ mkdir build
$ cd build
$ CFLAGS="-g -O0" ../src/configure --with-separate-debug-dir=/usr/lib/debug
  ...
$ make
  ...
$ ../src/fuzzer.sh 
  ...
+ /home/gary/workdir/build/fuzzer
../src/fuzzer.sh: line 12: 22482 Segmentation fault      (core dumped) $fuzzer
# copy the executable and PID from the two lines above
$ gdb -batch /home/gary/workdir/build/fuzzer core.22482 -ex "bt"
...
Core was generated by `/home/gary/workdir/build/fuzzer'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040d5f2 in op_is_new_cast (op=0x7ffffac19930) at libiberty/cp-demangle.c:3064
3064	  const char *code = op->u.s_operator.op->code;
#0  0x000000000040d5f2 in op_is_new_cast (op=0x7ffffac19930) at libiberty/cp-demangle.c:3064
#1  0x000000000040dc6f in d_expression_1 (di=0x7ffffac19c90) at libiberty/cp-demangle.c:3232
#2  0x000000000040dfbc in d_expression (di=0x7ffffac19c90) at libiberty/cp-demangle.c:3315
#3  0x000000000040cff6 in d_array_type (di=0x7ffffac19c90) at libiberty/cp-demangle.c:2821
#4  0x000000000040c260 in cplus_demangle_type (di=0x7ffffac19c90) at libiberty/cp-demangle.c:2330
#5  0x000000000040cd70 in d_parmlist (di=0x7ffffac19c90) at libiberty/cp-demangle.c:2718
#6  0x000000000040ceb8 in d_bare_function_type (di=0x7ffffac19c90, has_return_type=0) at libiberty/cp-demangle.c:2772
#7  0x000000000040a9b2 in d_encoding (di=0x7ffffac19c90, top_level=1) at libiberty/cp-demangle.c:1287
#8  0x000000000040a6be in cplus_demangle_mangled_name (di=0x7ffffac19c90, top_level=1) at libiberty/cp-demangle.c:1164
#9  0x0000000000413131 in d_demangle_callback (mangled=0x7ffffac19ea0 "_Z1-Av23*;cG~Wo2Vu", options=259, callback=0x40ed11 , opaque=0x7ffffac19d70)
    at libiberty/cp-demangle.c:5862
#10 0x0000000000413267 in d_demangle (mangled=0x7ffffac19ea0 "_Z1-Av23*;cG~Wo2Vu", options=259, palc=0x7ffffac19dd8) at libiberty/cp-demangle.c:5913
#11 0x00000000004132d6 in cplus_demangle_v3 (mangled=0x7ffffac19ea0 "_Z1-Av23*;cG~Wo2Vu", options=259) at libiberty/cp-demangle.c:6070
#12 0x0000000000401a87 in cplus_demangle (mangled=0x7ffffac19ea0 "_Z1-Av23*;cG~Wo2Vu", options=259) at libiberty/cplus-dem.c:858
#13 0x0000000000400ea0 in main (argc=1, argv=0x7ffffac1a0a8) at /home/gary/workdir/src/fuzzer.c:26

The symbol that crashed it is one frame up from the bottom of the backtrace.

Leave a comment

Run-time linker interface

If you’re debugging an application that loads thousands of shared libraries then be sure to read the LinkerInterface page on the GDB wiki.

Leave a comment

r_debug

I’ve been trying to figure out how to get information about libraries loaded with dlmopen out of glibc‘s runtime linker and into GDB.

The current interface uses a structure called r_debug that’s defined in link.h. If the executable’s dynamic section has a DT_DEBUG element, the runtime linker sets that element’s value to the address where this structure can be found. I tried to discover where this interface originated, but I didn’t get very far. The only mention of it I found anywhere in any standard is in the System V Application Binary Interface, where it says:

If an object file participates in dynamic linking, its program header table will have an element of type PT_DYNAMIC. This “segment” contains the .dynamic section. A special symbol, _DYNAMIC, labels the section…

and later:

DT_DEBUG
This member is used for debugging. Its contents are not specified for the ABI; programs that access this entry are not ABI-conforming.

No help there then. In glibc, r_debug looks like this:

struct r_debug
{
  int r_version;              /* Version number for this protocol.  */

  struct link_map *r_map;     /* Head of the chain of loaded objects.  */

  /* This is the address of a function internal to the run-time linker,
     that will always be called when the linker begins to map in a
     library or unmap it, and again when the mapping change is complete.
     The debugger can set a breakpoint at this address if it wants to
     notice shared object mapping changes.  */
  ElfW(Addr) r_brk;
  enum
    {
      /* This state value describes the mapping change taking place when
         the `r_brk' address is called.  */
      RT_CONSISTENT,          /* Mapping change is complete.  */
      RT_ADD,                 /* Beginning to add a new object.  */
      RT_DELETE               /* Beginning to remove an object mapping.  */
    } r_state;

  ElfW(Addr) r_ldbase;        /* Base address the linker is loaded at.  */
};

With glibc, r_version == 1. At least some versions of Solaris have r_version == 2, and when this is the case there are three extra fields, r_ldsomap, r_rdevent, r_flags. GDB uses r_ldsomap if r_version == 2; the other two seem to be the interface with librtld_db. That’s not documented anywhere to my knowledge, and may not even be fixed: applications are supposed to use the external interface to librtld_db as documented here.

Here is the problem: r_debug, as it stands, has no way to access more than one namespace. The objects in r_map are the default namespace, directly linked, or opened with dlopen, or opened with dlmopen with lmid set to LM_ID_BASE. The r_ldsomap field in Solaris’s r_debug gives access to the linker’s namespace, opened with dlmopen with lmid set to LM_ID_LDSO, but you still can’t see any other namespaces.

glibc uses multiple r_debug structures internally, one per namespace. It would be trivial to add a “next r_debug” link to r_debug if it were possible to extend the structure, but to do this you’d need to set r_version > 2. Applications could arguably expect a runtime linker with r_version > 2 to support the version 2 interface in full, but it wouldn’t be possible to do that in glibc without reverse engineering Solaris’s implementation. glibc is therefore stuck at r_version == 1, and the r_debug structure is effectively immutable for all time.

Leave a comment

VM networking tip

If you are setting up VMs using libvirt then it’s a good idea to change the address of the virtual network to something other than the default. Why? Because if you don’t, and you create a guest which itself starts up libvirt and uses NetworkManager then at least some of the time your VM will start up with its networking hosed.

If the host is using the default network (192.168.122.0/24) and the guests also want to use that network then there is a race between NetworkManager bringing up eth0 and libvirt bringing up virbr0. libvirt checks for existing interfaces using the network it is configured for before starting up virbr0, so if NetworkManager brings up eth0 first then virbr0 will not be set up on the guest and everything will be fine. But, if eth0 is not set up by the time libvirt runs the check, then virbr0 will take 192.168.122.0/24, then eth0 will come up on 192.168.122.something, and you’ll have a VM with two separate interfaces connected to two separate networks that both have the same address range… and it won’t work!

The easy way to solve this is to not install libvirt on the guest, but you may not be able to change this until after the guest is running, and if libvirt starts up during a guest’s installer then you may need to complete parts of the installation with no networking. This may or may not be ok for you and your OS. I’m using VMs to set up clean test environments for GDB, and at the moment I’m setting up three or four new “machines” every day (and throwing them away when I’m done) so I want the process as streamlined as possible. If you only occasionally set up new VMs then some extra tasks during the installation may not be a problem, but it is pretty simple to change the network on the host and you only have to do it once:

  1. As root, run virsh net-edit default
  2. Change every occurence of 192.168.122 to something else
  3. Stop any running guests
  4. Run virsh net-destroy default
  5. Run virsh net-start default

It’s a shame this can’t be fixed more conclusively elsewhere, but NetworkManager brings up the interfaces asynchronously at boot time which makes it impossible to definitively schedule libvirt’s startup to happen after NetworkManager.

Thank you Laine Stump for helping me out with this.

Leave a comment

Only dest dir longer than base dir not supported

Hello,

Have you experienced the mysterious error, “Only dest dir longer than base dir not supported”?

I have.

The Problem

When you build an rpm, the code is built in %{_builddir}, which usually evaluates as %{_topdir}/BUILD, which in turn evaluates as something like /home/you/rpmbuild/BUILD. The built code (if built with GCC) ends up with loads of /home/you/rpmbuild/BUILD paths embedded in it, and the script /usr/lib/rpm/debugedit rewrites these paths to /usr/src/debug so that the debuginfo rpms work. /usr/lib/rpm/debugedit cannot extend strings, it can only shrink them. If you are seeing “Only dest dir longer than base dir not supported” then, somewhere in your build system, %{_builddir} is defined as something that expands to a string shorter than /usr/src/debug.

Example

In my case, I was building a glibc rpm in a VM that turned out not to have enough disk. I created a new disk, mounted it on /mnt, and added the line %_topdir /mnt to my ~/.rpmmacros. The result? “Only dest dir longer than base dir not supported”. I fixed it by editing ~/.rpmmacros to say %_topdir /mnt/rpmbuild.

Briefly

/usr/src/debug       # the reference
/mnt/BUILD           # too short!
/mnt/rpmbuild/BUILD  # long enough

Thank you,
The Mgt.

Leave a comment

Saving money

I have a pair of set-top box PCs I’ve been using as always-on servers. I used them because they’re silent, but lately I’ve been thinking about power consumption. They were pretty good when I bought them in 2006 and 2008, but there’s much better stuff available now. I spent £60 on a Raspberry Pi and some supporting bits; given that it uses roughly a tenth the power of one of the set-top boxes it will have paid for itself in about two months.

While reorganising everything I also decommissioned an old Netgear switch which was likely costing £100 a year to run. Maybe it’s time you looked in your networking cupboard too!

Comments (3)

ath9k_htc drivers for RHEL 6.4

This morning I packaged up ath9k_htc drivers for RHEL 6.4. This isn’t anything official, just something I knocked together. I’m using it with a TP-LINK TL-WN722N and kernel-2.6.32-358.2.1.el6.x86_64; it seems to work but your mileage may vary! RPMs here.

Comments (9)

Because I never remember how to use OProfile

sudo opcontrol --reset
sudo opcontrol --start
# the thing you want to profile
sudo opcontrol --stop
opreport -l | less
Comments (2)