r_debug

I’ve been trying to figure out how to get information about libraries loaded with dlmopen out of glibc’s runtime linker and into GDB.

The current interface uses a structure called r_debug that’s defined in link.h. If the executable’s dynamic section has a DT_DEBUG element, the runtime linker sets that element’s value to the address where this structure can be found. I tried to discover where this interface originated, but I didn’t get very far. The only mention of it I found anywhere in any standard is in the System V Application Binary Interface, where it says:

If an object file participates in dynamic linking, its program header table will have an element of type PT_DYNAMIC. This “segment” contains the .dynamic section. A special symbol, _DYNAMIC, labels the section…

and later:

DT_DEBUG
This member is used for debugging. Its contents are not specified for the ABI; programs that access this entry are not ABI-conforming.

No help there then. In glibc, r_debug looks like this:

struct r_debug
{
  int r_version;              /* Version number for this protocol.  */

  struct link_map *r_map;     /* Head of the chain of loaded objects.  */

  /* This is the address of a function internal to the run-time linker,
     that will always be called when the linker begins to map in a
     library or unmap it, and again when the mapping change is complete.
     The debugger can set a breakpoint at this address if it wants to
     notice shared object mapping changes.  */
  ElfW(Addr) r_brk;
  enum
    {
      /* This state value describes the mapping change taking place when
         the `r_brk' address is called.  */
      RT_CONSISTENT,          /* Mapping change is complete.  */
      RT_ADD,                 /* Beginning to add a new object.  */
      RT_DELETE               /* Beginning to remove an object mapping.  */
    } r_state;

  ElfW(Addr) r_ldbase;        /* Base address the linker is loaded at.  */
};
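To make the structure above concrete, here is a minimal sketch of how a process can find its own r_debug through DT_DEBUG and walk the r_map chain. It assumes a dynamically linked glibc executable on a target where the runtime linker fills in DT_DEBUG directly (some targets use a different tag), and the helper names find_r_debug and dump_link_maps are mine, not part of any interface. A debugger does the same thing from outside the process by reading the inferior's memory.

```c
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Scan this executable's dynamic section for DT_DEBUG.  The runtime
   linker stores the address of r_debug in that entry's value.
   Returns NULL if there is no DT_DEBUG entry (assumption: _DYNAMIC
   refers to the main executable, as it does when this code is linked
   into one).  */
static struct r_debug *find_r_debug(void)
{
  for (ElfW(Dyn) *dyn = _DYNAMIC; dyn->d_tag != DT_NULL; dyn++)
    if (dyn->d_tag == DT_DEBUG)
      return (struct r_debug *) dyn->d_un.d_ptr;
  return NULL;
}

/* Print every object on the r_map chain and return how many there
   were.  The main program is listed with an empty name.  */
static size_t dump_link_maps(const struct r_debug *dbg)
{
  size_t count = 0;
  for (const struct link_map *m = dbg->r_map; m != NULL; m = m->l_next)
    {
      printf("%#lx %s\n", (unsigned long) m->l_addr,
             m->l_name[0] != '\0' ? m->l_name : "(main program)");
      count++;
    }
  return count;
}
```

Calling dump_link_maps(find_r_debug()) from main should list the executable, its directly linked libraries and the runtime linker itself.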

With glibc, r_version == 1. At least some versions of Solaris have r_version == 2, in which case there are three extra fields: r_ldsomap, r_rdevent and r_flags. GDB uses r_ldsomap if r_version == 2; the other two seem to be the interface with librtld_db. The version 2 layout is not documented anywhere to my knowledge, and may not even be fixed: applications are supposed to use the external interface to librtld_db as documented here.

Here is the problem: r_debug, as it stands, has no way to access more than one namespace. The objects in r_map are those in the default namespace: directly linked, opened with dlopen, or opened with dlmopen with lmid set to LM_ID_BASE. The r_ldsomap field in Solaris’s r_debug gives access to the linker’s namespace, opened with dlmopen with lmid set to LM_ID_LDSO, but you still can’t see any other namespaces.
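The gap can be checked from inside a process. The sketch below, assuming glibc with _GNU_SOURCE (and -ldl on older releases), loads a library into a fresh namespace with dlmopen and LM_ID_NEWLM, then looks for its link_map on the chain hanging off _r_debug, the symbol link.h declares for the structure. The function name new_namespace_visible_in_r_map is made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for dlmopen, dlinfo and LM_ID_NEWLM */
#endif
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>

/* Load SONAME into a new namespace and report whether its link_map
   is reachable from the default r_debug:
     1  it is on the r_map chain,
     0  it is not,
    -1  the load or the dlinfo queries failed.  */
static int new_namespace_visible_in_r_map(const char *soname)
{
  void *handle = dlmopen(LM_ID_NEWLM, soname, RTLD_NOW);
  if (handle == NULL)
    return -1;

  struct link_map *loaded = NULL;
  Lmid_t lmid = LM_ID_BASE;
  int found = -1;

  if (dlinfo(handle, RTLD_DI_LINKMAP, &loaded) == 0
      && dlinfo(handle, RTLD_DI_LMID, &lmid) == 0
      && lmid != LM_ID_BASE)
    {
      found = 0;
      /* Walk the only chain a debugger can reach through r_debug.  */
      for (struct link_map *m = _r_debug.r_map; m != NULL; m = m->l_next)
        if (m == loaded)
          found = 1;
    }

  dlclose(handle);
  return found;
}
```

With a library name such as "libm.so.6" I would expect this to return 0: the object is loaded, but nothing reachable from r_map says so.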

glibc uses multiple r_debug structures internally, one per namespace. It would be trivial to add a “next r_debug” link to r_debug if it were possible to extend the structure, but to do this you’d need to set r_version > 2. Applications could arguably expect a runtime linker with r_version > 2 to support the version 2 interface in full, but it wouldn’t be possible to do that in glibc without reverse engineering Solaris’s implementation. glibc is therefore stuck at r_version == 1, and the r_debug structure is effectively immutable for all time.
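For illustration only, this is roughly what a “next r_debug” extension and a version-aware client could look like. Everything here is hypothetical: the demo_ types, the r_next field and the choice of 3 as the version number are assumptions of this sketch, not anything glibc or Solaris defines. It also deliberately ignores the Solaris version 2 fields, which is exactly the compatibility question raised above.

```c
#include <stddef.h>

struct demo_link_map;              /* opaque stand-in for struct link_map */

/* Stand-in for the version 1 layout shown earlier.  */
struct demo_r_debug
{
  int r_version;
  struct demo_link_map *r_map;
  unsigned long r_brk;
  int r_state;
  unsigned long r_ldbase;
};

/* Hypothetical extension: the old structure comes first so existing
   readers are unaffected, followed by a link to the next namespace.  */
struct demo_r_debug_v3
{
  struct demo_r_debug base;
  struct demo_r_debug_v3 *r_next;
};

/* Count the namespaces a client can see.  The client must check
   r_version before touching r_next, because a version 1 structure
   ends after r_ldbase.  */
static size_t count_namespaces(const struct demo_r_debug *dbg)
{
  if (dbg == NULL)
    return 0;
  if (dbg->r_version < 3)
    return 1;                      /* only the default namespace */

  size_t count = 0;
  for (const struct demo_r_debug_v3 *p = (const struct demo_r_debug_v3 *) dbg;
       p != NULL; p = p->r_next)
    count++;
  return count;
}
```

The version test is the whole design: a client that assumes “greater than 2 implies the version 2 fields exist” would misread this layout, which is why the extension is trivial to write and hard to ship.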

5 thoughts on “r_debug”

  1. I feel for you. I was the madman who added support for ld.so introspection for DTrace for Linux, and to say it was horrifically painful is to understate the case. If you want to look at the subsequent r_debugs you have to dig about in the rtld_global structure, and oh btw that contains every link map including the first but *only the link maps after the first are kept up to date*… and of course you don’t get the help you get with r_debug keeping everything synchronized: you have to spy on the dl_load_lock yourself (but you can’t block on it because it’s a private futex). (I had extra fun because the whole thing is non-GPLed so I couldn’t even #include any of the relevant headers. Not that I could have anyway because they’re private to glibc and not installed.)

    It’s all really quite unbelievably painful, and in the end not very useful because dlmopen() barely works on Linux: you can only dlmopen() a few times before you run out of TLS descriptors unless you rebuild the world using a different TLS model. I will add support for Infinity to everything I have anything to do with as soon as I see it landing anywhere at all, even though it means I’ll probably have to implement a DWARF expression evaluator first, because *anything* must be better than this.

  2. Nick, there are SystemTap probes in the runtime linker now that you should be able to use instead of spying on the lock. See rtld-debugger-interface.txt in the glibc sources for details.

    Also, somebody told me that dlmopen has recently been fixed on Linux to work as it should. I don’t know whether that means the old limit has been removed; I didn’t look into it closely.

    Also also a DWARF expression evaluator shouldn’t be too hard, just a stack and a few tens of bytecodes. There might even be something you could reuse (let me know if you find something!)

  3. If version 2 of the structure from Solaris is “not documented anywhere to my knowledge, and may not even be fixed: applications are supposed to use the external interface to librtld_db” then how could applications “arguably expect a runtime linker with r_version > 2 to support the version 2 interface in full”? That seems to be a logical error.

    While backward compatibility with the unsupported and potentially unfixed interface would be ideal, why would we slavishly adhere to something never intended to be set in stone?

    There are many other examples where fields in structures are documented to be unsupported. The two that jump to mind are rusage and mallinfo structures.

  4. Ok, maybe they shouldn’t expect r_version > 2 to mean “the version 2 interface is there”… but they might expect it. A case in point: GDB has at least one place where it tests r_version < 2 and operates differently depending on the result (in solib_svr4_r_ldsomap, in solib-svr4.c).

    Even if we did fully implement the v2 interface, there’s no guarantee bumping r_version to 3 would not break some client somewhere, precisely because the interface isn’t documented. Say some client checks the interface version and bails if it’s not 1 or 2. We’d bump the version in glibc and then start getting bug reports. If the interface was documented then we could say “oh, your client is in violation of the spec” but we can’t say that, so what may never have been intended to be set in stone has become so.

    I stand by my assertion that glibc’s r_debug structure cannot be changed without risk of breaking clients.
