Inside Zero and Shark: Calling conventions and the call stub

JavaCalls::call is merely a thin wrapper around JavaCalls::call_helper, so let’s have a look in there (they’re both in javaCalls.cpp). The interesting part starts when the JavaCallWrapper is created. JavaCallWrapper’s constructor manages the transition to _thread_in_Java, amongst other things, and its destructor manages the transition back to _thread_in_vm, so the whole of that block will be _thread_in_Java. This idiom of using an object whose constructor and destructor manage things is a common one in HotSpot; the apparently unused HandleMark created directly after the JavaCallWrapper is another example of this.
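
In miniature, the idiom looks like this (StateGuard is a hypothetical name for illustration; JavaCallWrapper’s real constructor and destructor do considerably more):

// Hypothetical sketch of the constructor/destructor idiom: creating
// the guard performs the transition, and leaving the block -- by any
// route -- undoes it.
class StateGuard {
  JavaThread* _thread;
 public:
  StateGuard(JavaThread* thread) : _thread(thread) {
    // transition _thread_in_vm -> _thread_in_Java happens here
  }
  ~StateGuard() {
    // transition _thread_in_Java -> _thread_in_vm happens here
  }
};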

OK, so now we’re _thread_in_Java, and it’s time to execute some Java code. The call to the call stub is the bit that does that, but before we get to it, it’s worth skipping forward a little to see what happens before and after the HandleMark and JavaCallWrapper are destroyed. Immediately before the blocks close is this:

// Preserve oop return value across possible gc points
if (oop_result_flag) {
  thread->set_vm_result((oop) result->get_jobject());
}

and immediately after the blocks close is this:

// Restore possible oop return
if (oop_result_flag) {
  result->set_jobject((jobject) thread->vm_result());
  thread->set_vm_result(NULL);
}

If the Java code called by the call stub returned an object (a java.lang.Object) then a pointer to that object will now be in result — and it’s an oop. The destructors of both HandleMark and JavaCallWrapper contain code that can GC, so these blocks of code are needed to protect that oop. Here, rather than using a handle, the result is protected by being stored in the thread, in a location the GC knows to check and update.

Back to the call stub. What is it? Well, in what I’ll call “classic” HotSpot (where everything from here on in is written in assembly language) every methodOop has a pair of entry points: pointers to the native code that actually executes the method. When a method has been JIT compiled these entry points will point at the JIT compiled code; for interpreted methods they will point to some location within the interpreter. The reason there are two entry points is that the interpreter passes arguments and return values differently from compiled code: the two use different calling conventions. If a method is compiled then its compiled entry point (the entry point that will be called by compiled code) will point directly at the compiled code, but its interpreted entry point will point to the i2c adaptor, which translates from the interpreter calling convention to the compiler calling convention and then jumps to the compiled entry point. Interpreted methods get similar treatment: their interpreted entry point points to the part of the interpreter responsible for executing that method, and their compiled entry point points to the c2i adaptor.
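
Conceptually, then, a method carries something like the following (a sketch; the real fields and their exact names live in methodOop.hpp):

// Sketch of the two entry points carried by each method in classic
// HotSpot. A call from interpreted code goes via the first, a call
// from compiled code via the second; the adaptors are installed
// wherever the two worlds meet.
class methodOopDesc {
  // ...
  address _from_interpreted_entry;  // where calls from interpreted code land
  address _from_compiled_entry;     // where calls from compiled code land
  // ...
};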

What does this have to do with the call stub? Well, the call stub is the interface between VM code and the interpreter calling convention. It takes a C array of parameters and copies them to the locations specified by the interpreter calling convention. Then it invokes the method, by jumping to its interpreted entry point. Finally, it copies the result from the location specified by the interpreter calling convention to the address supplied by JavaCalls::call_helper.
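
Concretely, the stub is reached through a function pointer whose type looks roughly like this (compare stubRoutines.hpp; treat the details as a sketch):

// Rough shape of the call stub’s interface: the VM hands it the
// method, the entry point to jump to, a flat array of parameters,
// and a place to put the result.
typedef void (*CallStub)(
  address        link,            // linkage area
  intptr_t*      result,          // where to store the return value
  BasicType      result_type,     // what kind of value it is
  methodOopDesc* method,          // the method to invoke
  address        entry_point,     // its interpreted entry point
  intptr_t*      parameters,      // the C array of arguments
  int            parameter_size,  // their size, in words
  TRAPS);                         // the calling thread, as ever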

You’ll notice this description has been with reference to classic HotSpot. Zero and Shark are mostly the same, but there are two significant differences. Firstly, the reason classic HotSpot has two calling conventions is an optimization: the interpreter calling convention gives better performance in the interpreter, the compiler calling convention gives better performance in compiled code, and the difference is more than enough to offset the overhead of the bridging adaptors. In Zero and Shark, the limits of what can be done in C++ and with LLVM constrain the design of the calling convention such that having different ones doesn’t really make sense. So — for now, at least — Shark code also uses the interpreter calling convention, and the compiled entry point is never set or used. In Zero and Shark there is only “the calling convention”.

The second difference is that Shark methods require a bit of extra information to execute. Compiled methods need to be able to tell HotSpot where they are in the code at certain times, and in classic HotSpot this is done by looking at the PC. LLVM doesn’t allow us access to this — even if it did, it wouldn’t make much sense — so Shark compiled methods feed HotSpot faked PCs. To do this, each method needs to know where HotSpot thinks the compiled code starts, so in Zero, entry points are not pointers to code but pointers to ZeroEntry or SharkEntry objects. The real entry point is stored within those.
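
So where classic HotSpot stores a code address, Zero stores a pointer to a small object along these lines (a sketch of the idea rather than the exact class):

// Sketch of the indirection: the entry point slot in the method
// points at one of these, and the real code address lives inside.
class ZeroEntry {
 private:
  address _entry_point;  // the code HotSpot should actually run

 public:
  address entry_point() const {
    return _entry_point;
  }
  void set_entry_point(address entry) {
    _entry_point = entry;
  }
};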

Next time, some details about the calling convention, and some stuff about stacks.

Inside Zero and Shark: Handles and oops, traps and checks

You’re about to run the important enterprise application “Hello World”. What’s going to happen?

class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello world!");
  }
}

After initializing itself, HotSpot will create a new Java thread. This will initially be _thread_in_vm because it’s running VM code. Eventually it will call JavaCalls::call (in javaCalls.cpp) to bridge from VM code to Java code. Before we can look at what JavaCalls::call does, however, we need to understand a couple of HotSpot conventions. Look at its prototype:

void JavaCalls::call(JavaValue* result, methodHandle method, JavaCallArguments* args, TRAPS);

The first things we need to understand are handles and oops. All Java objects, and in fact all objects in HotSpot managed by the garbage collector, are oops (“ordinary object pointers”), and when you’re dealing with oops you need to keep the garbage collector in mind. More specifically, you need to know where in your code the GC might run, because when it does run you need to have told it the location of every single oop you’re using, and when it returns you need to deal with the fact that your oops have probably moved. If your C++ compiler has optimized your code such that an oop is in a register then the oop in that register is now stale, and you’re going to crash pretty soon.

Dealing with raw oops is hard, but luckily there are ways of protecting them. In VM code — when you’re _thread_in_vm — the protection of choice is to use handles. A handle wraps an oop, managing access to it such that GC activity becomes transparent. If you’re in VM code and you’re using handles then you don’t have to worry. But you do need to know what’s happening, because if you see some code that’s calling methodHandle methods and you grep the OpenJDK tree to find the methodHandle class definition you will not find it. The methods you are looking for are actually the methods of the methodOopDesc class (in methodOop.hpp). The handle is just a wrapper.
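
The reason the grep fails is operator overloading: a handle forwards member access to the oop it wraps, along these lines (a simplified sketch; the real classes are generated by macros in handles.hpp):

// Simplified sketch of a handle: the oop lives in a GC-visible slot,
// and operator-> forwards to it, so method->name() really calls
// methodOopDesc::name().
class methodHandle {
 private:
  methodOop _value;  // in real HotSpot this slot is registered with the thread

 public:
  methodOop operator->() const { return _value; }  // forwards h->foo() calls
  methodOop operator()() const { return _value; }  // yields the raw oop
};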

The other thing we need to understand in that prototype is the mysterious TRAPS at the end. It’s kind of a note to the programmer: functions that trap are functions that can throw Java exceptions. When you call them you use CHECK as their final argument for a convenient exception check:

JavaCalls::call(result, method, args, CHECK);

TRAPS and CHECK are defined in exceptions.hpp. You may wish to avert your eyes:

#define TRAPS   Thread* THREAD
#define CHECK   THREAD); if (HAS_PENDING_EXCEPTION) return; (0

Now we can see how HotSpot handles exceptions: they’re simply stored in the thread. Code that cares can access the exception using these guys:

#define PENDING_EXCEPTION       (((ThreadShadow *) THREAD)->pending_exception())
#define HAS_PENDING_EXCEPTION   (((ThreadShadow *) THREAD)->has_pending_exception())
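
Putting those together, the JavaCalls::call above, with CHECK as its final argument, expands to something like this (the trailing (0) exists to soak up the original closing parenthesis and semicolon):

// JavaCalls::call(result, method, args, CHECK); becomes:
JavaCalls::call(result, method, args, THREAD);
if (HAS_PENDING_EXCEPTION) return;
(0);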

Next time I really will explain how method invocation works…

Inside Zero and Shark: Java threads and state transitions

Andrew Haley has been doing some work on Zero and Shark lately, and his questions have made me realise that while Zero and Shark are pretty small in comparison with the rest of HotSpot, they’re not the easiest of things to get a handle on. I decided to write some articles to try and present a kind of overview of it all, to make things easier for others in the future.

HotSpot is the Java Virtual Machine (JVM) of the OpenJDK project. Its name refers to its primary mode of operation, in which Java methods are initially executed by a profiling interpreter, and only after they have been executed a certain number of times are they deemed “hot” enough to be compiled to native code by a Just In Time (JIT) compiler. The aim is to avoid wasting time compiling rarely-used methods, so that each method you do compile is one that will genuinely improve performance.

If you look inside a running HotSpot process you’ll see a number of different threads. There will be VM threads that handle such things as garbage collection. There may be one or more compiler threads — these are the JITs. And there will be Java threads. These are the threads that are executing Java code, the threads we are interested in.

At any time, each Java thread will be in one of (essentially) three states. A thread that is _thread_in_Java is executing code that was written in Java, either by interpreting bytecode or by executing native code compiled by the JIT. A thread that is _thread_in_native is executing a native Java method — JNI code. And a thread that is _thread_in_vm is running code that is part of the VM rather than code that is part of the application.
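
These states are values of an enum (JavaThreadState, in globalDefinitions.hpp); the real thing has several more states and transitional variants, but the three that matter here are:

// The three states this article cares about (the full enum has
// more, including _trans variants used mid-switch):
enum JavaThreadState {
  // ...
  _thread_in_native,  // running JNI code
  _thread_in_vm,      // running VM code
  _thread_in_Java,    // running Java code, interpreted or compiled
  // ...
};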

Threads change state all over the place. Imagine you’re in a Java method (you’re _thread_in_Java) and you invoke a native method. That switches you to _thread_in_native. Then, your native code calls some VM function, and suddenly you’re _thread_in_vm. Maybe that VM function calls some Java code? Now you’re back in _thread_in_Java. And as those calls return the transitions happen in reverse.

When hacking on HotSpot you tend to avoid thread state transitions where possible because various things happen during them and some directions are not cheap. The most obvious example of this is that threads remain _thread_in_Java across method calls, such that if one non-native method calls another then no transition occurs. In fact, _thread_in_Java is something of a default state. If you look at the function in Zero that handles calls to native methods (CppInterpreter::native_entry, in cppInterpreter_zero.cpp) you’ll see that the transition to _thread_in_native is the very last thing to happen before the actual call itself, and that transitioning back to _thread_in_Java is the very first thing to happen once the native method returns. And whilst you’re in there, check out what happens during the transition back to _thread_in_Java. The transition from _thread_in_native to _thread_in_Java is one of the expensive ones.
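
The shape of that code is roughly this (a heavily simplified sketch, with hypothetical names like native_function, rather than the real cppInterpreter_zero.cpp code):

// Entering native is little more than a store...
thread->set_thread_state(_thread_in_native);

// ...then the actual JNI call happens...
result = native_function(jni_env, args);

// ...but the way back must notice any in-progress safepoint and
// block until it completes before the thread may touch oops again.
ThreadStateTransition::transition_from_native(thread, _thread_in_Java);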

That pretty much covers threads and state transitions. Next time I’ll explain some of how method invocation actually works.

Porting Shark

Shark, when it’s done, will be great: a massive improvement over Zero. But LLVM only supports a couple of the platforms people use Zero on. I’ve wondered a few times how the task of porting LLVM to a new architecture compares with writing a full HotSpot port from scratch. This morning I realised I could get a rough idea by simply counting the lines of x86-specific code, the one port they share:

                   Lines of code
  LLVM 2.4                34,391
  HotSpot 14.0b08         77,329

This is just raw lines of code, nothing clever. Both implement a combined IA-32 and X86-64 port, and the HotSpot figure is for the Linux port with the server JIT — one OS, one JIT — so I believe it’s a fair comparison. You could infer that porting LLVM and using Zero and Shark will get you up and running with OpenJDK in about half the time. That’s not bad.

Fun things to type in gdb

Want to see what the C++ interpreter is up to in gdb?

(gdb) bt
...
#6  0x0f42155c in BytecodeInterpreter::run (istate=0xd0f7e55c) at bytecodeInterpreter.cpp:857
...
(gdb) call PI(0xd0f7e55c)
thread: 0x10108650
bcp: 0xf20efe8b
locals: 0xd0f7e5b4
constants: 0xf20f01f8
method: 0xf20efea8[ javasoft.sqe.tests.vm.jdwp.StackFrame.PopFrames.popframes001a$TestedThreadClass.testedMethod(I)I ]
mdx: 0x00000000
stack: 0xd0f7e558
msg: no_request
result_to_call._callee: 0xf2070188
result_to_call._callee_entry_point: 0xf5e95184
result_to_call._bcp_advance: 3 
osr._osr_buf: 0xf2070188
osr._osr_entry: 0xf5e95184
result_return_kind 0xf2070188 
prev_link: 0x00000000
native_mirror: 0x00000000
stack_base: 0xd0f7e55c
stack_limit: 0xd0f7e54c
monitor_base: 0xd0f7e55c
self_link: 0xd0f7e55c
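
PI isn’t a gdb built-in, by the way: it’s a little debug helper compiled into non-product HotSpot builds, in bytecodeInterpreter.cpp, which looks more or less like this:

// Cast the argument to an interpreter state pointer and pretty-print it.
extern "C" void PI(uintptr_t arg) {
  ((BytecodeInterpreter*)arg)->print();
}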

Super-dirty jtreg hacking

Today I made my second official patch to OpenJDK. I’d forgotten how to set up and run the jtreg test and had to figure it out all over again, so here’s my quick and dirty guide for the future:

  1. Build jtreg. I use the IcedTea one, because it’s there:
    make jtreg
  2. Make a test root and copy your test into it:
    mkdir -p tests/tests
    touch tests/TEST.ROOT
    mv ~/Test6779290.java tests/tests
  3. Run the tests:
    openjdk-ecj/control/build/linux-ppc/j2sdk-image/jre/bin/java -jar test/jtreg.jar -v1 -s tests

In other news it’s over a year since I started hacking on Zero. I was hoping to be able to announce a TCK-passing build before Christmas but that’s not going to happen. Oh well.

Fedora 10

Apparently Fedora 10’s eclipse-ecj doesn’t have gcj-compiled libraries any more. Never mind:

mkdir /usr/lib/gcj/eclipse-ecj
aot-compile -c "-O3" /usr/lib/eclipse/dropins/jdt/plugins /usr/lib/gcj/eclipse-ecj
rebuild-gcj-db

Also, whilst I’m messing with my system, I’ve always had to do the following for ppc64 builds to work:

mkdir -p /usr/lib/jvm/java-gcj/jre/lib/ppc64/server
ln -s /usr/lib64/gcj-4.3.2/libjvm.so /usr/lib/jvm/java-gcj/jre/lib/ppc64/server

I never figured out how anyone else manages without this. Maybe nobody else is trying to build two platforms on the one box.

Update

With talk of a new IcedTea release I thought I’d better commit what I had of Shark ready for it. I found a couple of what look like optimizer failures while testing (usually I build with optimization disabled, for debugging) but I managed to work around those this morning and get a set of DaCapo results:

            Status  Detail
  antlr     FAIL    too many open files
  bloat     pass    83178ms
  chart     pass    47227ms
  eclipse   FAIL    one method miscompiles, one method won’t compile
  fop       pass    15762ms
  hsqldb    pass    21190ms
  jython    pass    67533ms
  luindex   pass    35567ms
  lusearch  pass    35633ms
  pmd       pass    60637ms
  xalan     pass    48422ms

These are still with a non-optimized LLVM, but the numbers are much closer to what I was hoping for than the previous sets.