Infinity client library

These past few weeks I’ve been working on an Infinity client library. This is what GDB will use to execute the notes it finds. It’s early days, but it executed its first note this morning, so I thought I’d put something together to show people what I’m doing. Here’s how to try it out:

  1. Install the elfutils libelf development package if you don’t have it already; the tlsdump example program needs it:
    sudo yum install elfutils-libelf-devel  # Fedora, RHEL, etc...
    sudo apt-get install libelf-dev         # Debian, Ubuntu, etc...
  2. Download and build the Infinity client library and example program:
    git clone -b libi8x-0.0.1 https://github.com/gbenson/libi8x.git libi8x-0.0.1
    cd libi8x-0.0.1
    ./autogen.sh
    ./configure --enable-logging --enable-debug
    make
  3. Check the tlsdump example program built:
    bash$ ls -l examples/tlsdump
    -rwxr-xr-x. 1 gary gary 5540 Apr 20 12:52 examples/tlsdump

    Yeah, there it is! (If it’s not there, go back to step 1.)

  4. Build a program with notes to run the example program against:
    gcc -o tests/ifact tests/ifact.S tests/main.c
  5. Run the program you just built:
    bash$ tests/ifact &
    [2] 8301
    Hello world I'm 8301
  6. Run the libi8x tlsdump example program with the test program’s PID as its argument:
    $ examples/tlsdump 8301
    0! = 1
    1! = 1
    2! = 2
    3! = 6
    4! = 24
    5! = 120
    6! = 720
    7! = 5040
    8! = 40320
    9! = 362880
    10! = 3628800
    11! = 39916800
    12! = 479001600

What just happened? The executable tests/ifact you built contains a single Infinity note, test::factorial(i)i, the source for which is in tests/ifact.i8. The tlsdump example located the ifact executable, loaded test::factorial(i)i from it, and ran it a few times, printing the results:

  err = i8x_ctx_get_funcref (ctx, "test", "factorial", "i", "i", &fr);
  if (err != I8X_OK)
    error_i8x (ctx, err);

  err = i8x_xctx_new (ctx, 512, &xctx);
  if (err != I8X_OK)
    error_i8x (ctx, err);

  for (int i = 0; i < 13; i++)
    {
      union i8x_value args[1], rets[1];

      args[0].i = i;
      err = i8x_xctx_call (xctx, fr, NULL, args, rets);
      if (err != I8X_OK)
	error_i8x (ctx, err);

      printf ("%d! = %d\n", i, rets[0].i);
    }

To see some debug output try this:

I8X_LOG=debug examples/tlsdump PID

Also try I8X_DEBUG=true in addition to I8X_LOG=debug to trace the bytecode as it executes.

Infinity status

I’m winding down for a month away from Infinity. The current status is that the language and note format changes for 0.0.2 are all done. You can get them with:

git clone https://github.com/gbenson/i8c.git

There’s also the beginnings of an Emacs major mode for i8 in there. My glibc tree now has notes for td_ta_thr_iter as well as td_ta_map_lwp2thr. That’s two of the three hard ones done. Get them with:

git clone https://github.com/gbenson/glibc.git -b infinity2

FWIW td_thr_get_info is just legwork and td_thr_tls_get_addr is just a wrapper for td_thr_tlsbase; td_thr_tlsbase is the other hard note.

All notes have testcases with 100% bytecode coverage. I may add a flag to I8X that makes anything less than 100% coverage a failure, and make glibc use it so nobody can commit notes with untested code.

The total note size so far is 720 bytes so I may still manage to get all five libpthread notes implemented in less than 1k:

Displaying notes found at file offset 0x00018f54 with length 0x000002d0:
  Owner                 Data size	Description
  GNU                  0x00000063	NT_GNU_INFINITY (inspection function)
    Signature: libpthread::__lookup_th_unique(i)ip
  GNU                  0x00000088	NT_GNU_INFINITY (inspection function)
    Signature: libpthread::map_lwp2thr(i)ip
  GNU                  0x000000cd	NT_GNU_INFINITY (inspection function)
    Signature: libpthread::__iterate_thread_list(Fi(po)oipii)ii
  GNU                  0x000000d2	NT_GNU_INFINITY (inspection function)
    Signature: libpthread::thr_iter(Fi(po)oiipi)i

td_ta_map_lwp2thr

To debug live processes on modern Linux, GDB needs four libthread_db functions:

  • td_ta_map_lwp2thr (required for initial attach)
  • td_thr_get_info (required for initial attach)
  • td_thr_tls_get_addr (not required for initial attach, but required for “p errno” on regular executables)
  • td_thr_tlsbase (not required for initial attach, but required for “p errno” for -static -pthread executables)

To debug a corefile on modern Linux, GDB needs one more libthread_db function:

  • td_ta_thr_iter

GDB makes some other libthread_db calls too, but these are bookkeeping calls that won’t be required with the replacement. So, the order of work will be:

  1. Implement replacements for the four core functions.
  2. Get those approved and committed in GDB, BFD and glibc (and in the binutils and elfutils versions of readelf).
  3. Replace td_ta_thr_iter too, and get that committed.
  4. Implement runtime-linker interface stuff to allow GDB to follow dlmopen.

The first (non-bookkeeping) function GDB calls is td_ta_map_lwp2thr and it’s a pig. If I can do td_ta_map_lwp2thr I can do anything.

When you call it, td_ta_map_lwp2thr has four ways it can proceed:

  1. If __pthread_initialize_minimal has not gotten far enough, we can’t rely on whatever’s in the thread registers. In that case td_ta_map_lwp2thr checks that the LWP is the initial thread and sets th->th_unique to NULL. (Other bits of libthread_db spot this NULL and act accordingly.) td_ta_map_lwp2thr decides whether __pthread_initialize_minimal has gotten far enough by examining __stack_user.next in the inferior: if it’s NULL, __pthread_initialize_minimal has not gotten far enough.
  2. On ta_howto_const_thread_area architectures (x86_64, aarch64, arm)
    [glibc/sysdeps/*/nptl/tls.h has
      #define DB_THREAD_SELF CONST_THREAD_AREA(bits, value)
    which exports
      const uint32_t _thread_db_const_thread_area = value;
    from glibc/nptl_db/db_info.c]:

    • td_ta_map_lwp2thr will call ps_get_thread_area with value to set th->th_unique.

    ps_get_thread_area (in GDB) does different things for different
    architectures:

    1. on x86_64, value is a register number (FS or GS) and ps_get_thread_area returns the contents of that register.
    2. on arm, GDB uses PTRACE_GET_THREAD_AREA, NULL and subtracts value from the result.
    3. on aarch64, GDB uses PTRACE_GETREGSET, NT_ARM_TLS and subtracts value from the result.
  3. On ta_howto_reg architectures (ppc*, s390*)
    [glibc/sysdeps/*/nptl/tls.h has
      #define DB_THREAD_SELF REGISTER(bits, size, regofs, bias)...
    which exports
      const uint32_t _thread_db_register32[3] = {size, regofs, bias};
    and/or
      const uint32_t _thread_db_register64[3] = {size, regofs, bias};
    from glibc/nptl_db/db_info.c]:

    td_ta_map_lwp2thr will:

    • call ps_lgetregs to get the inferior’s registers
    • get the contents of the specified register (with _td_fetch_value_local)
    • SUBTRACT bias from the register’s contents

    to set th->th_unique.

  4. On ta_howto_reg_thread_area architectures (i386)
    [glibc/sysdeps/*/nptl/tls.h has
      #define DB_THREAD_SELF REGISTER_THREAD_AREA(bits, size, regofs, bias)...
    which exports
      const uint32_t _thread_db_register32_thread_area[3] = {size, regofs, bias};
    and/or
      const uint32_t _thread_db_register64_thread_area[3] = {size, regofs, bias};
    from glibc/nptl_db/db_info.c]:

    td_ta_map_lwp2thr will:

    • call ps_lgetregs to get the inferior’s registers
    • get the contents of the specified register (with _td_fetch_value_local)
    • RIGHT SHIFT the register’s contents by bias
    • call ps_get_thread_area with the result

    to set th->th_unique.

    ps_get_thread_area (in GDB) does different things for different
    architectures:

    1. on i386, GDB uses PTRACE_GET_THREAD_AREA with that number and returns the second element of the result (the segment’s base address).

Cases 2, 3, and 4 will obviously be hardwired into the specific architecture’s libpthread. But… yeah.

Saving money

I have a pair of set-top box PCs I’ve been using as always-on servers. I used them because they’re silent, but lately I’ve been thinking about power consumption. They were pretty good when I bought them in 2006 and 2008, but there’s much better stuff available now. I spent £60 on a Raspberry Pi and some supporting bits; given that it uses roughly a tenth of the power of one of the set-top boxes, it will have paid for itself in about two months.

While reorganising everything I also decommissioned an old Netgear switch which was likely costing £100 a year to run. Maybe it’s time you looked in your networking cupboard too!

Future archaeology

Andrew Hughes pointed out yesterday that the ARM interpreter and JIT are slated for removal in IcedTea6-1.11 unless someone steps up to maintain them. Currently all the information about what’s required is collated in only one place, inside my head, so I thought I’d better write it up before I start forgetting. It’s entirely possible the interpreter will be removed, but it’s also possible that someone will end up trying to resurrect it months or years down the line. If you are that person and you are reading this then you owe me a beer ;)

The first change that broke the ARM code was the fix for PR icedtea/323, aka Sun bug 6939182. I described the required fix here:

“[In the ARM code] last_Java_sp is set to the address of the top Zero frame wherever the frame anchor is set up. It needs changing such that last_Java_sp is set to thread->zero_stack()->sp() (and the new field last_Java_fp gets set to what last_Java_sp used to be set to).”

The second change that broke the ARM code was the fix for PR icedtea/484, aka Sun bug 6951784. I described the required fix here:

“I have had to change the calling convention within Zero and Shark. All method entries (the C function that executes the method) now return an integer which is the number of deoptimized frames they have left on the stack. Whenever a method is called it is now the caller’s responsibility to check whether frames have been deoptimized and reenter the interpreter if they have.”

The third change, currently in progress, reverts the last commit by the ARM code’s author, Ed Nevill: “fix for fast bytecodes with ARM/Shark”. This piece of code was accidentally incorporated into one of the webrevs when Zero was upstreamed, and isn’t conditionalised correctly. It can cause problems when the ARM code is not present, and there’s no neat fix. Given that the ARM code has been broken for five days shy of a year now, I’ve asked for it to be removed from OpenJDK. This is Sun bug 7030207. If the ARM code is resurrected, this patch will need reinstating (with more specific conditionalisation please!)

The fourth change, currently in the future, is JSR 292. Explicit method handle stuff should just work–it’ll be handled by Zero–but the ARM interpreter and JIT will need updating to support three new instructions: invokedynamic, fast_aldc and fast_aldc_w. The latter two are internal instructions, in case you wondered why you’d never heard of them before!

Ok, that is all.