Inside Zero and Shark: HotSpot’s stacks

Now that we understand what Java is expecting of the stack we can take a look at how HotSpot and Zero implement it. It would, of course, be perfectly possible to implement a stack exactly as described, but in practice that’s not how it’s done.

The first difference is pretty straightforward. Remember I said that when a method is called the interpreter pops its arguments from the stack and copies them to the callee’s local variables? Well, all that copying would be pretty inefficient, so HotSpot simply doesn’t bother. Imagine you’re executing some method. You’ve just pushed three values onto the stack, value_a, value_b and value_c, and you’re about to invoke a method that takes two arguments:

 
value_c
value_b
value_a

When it enters the callee, rather than popping the arguments, it leaves them where they are:

 
value_c   local[1]
value_b   local[0]
value_a

If the callee has more locals than it has arguments (say this method needs four) then extra locals (set to zero) will be pushed onto the stack:

 
0         local[3]
0         local[2]
value_c   local[1]
value_b   local[0]
value_a

Execution continues as normal after that, with values being pushed onto the stack after the locals. As methods call methods call methods call methods, the stack becomes split into layers, with individual methods’ stacks interleaved with blocks of local variables. When a method returns, everything up to and including local[0] is popped. If a method is to return a value, then that will be popped before everything else and pushed back onto the stack afterwards. We’ve exchanged copying the arguments for copying the result, a good tradeoff given that methods can have many arguments but only one result.
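Here’s a minimal sketch of that trick in C++. It is a toy, not HotSpot code: ToyStack, enter and leave are names made up for illustration, and the stack is a vector that grows upwards rather than a real machine stack. The point is only that the callee’s first locals are the caller’s pushed arguments, left in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy expression stack (hypothetical): the last element is the top of stack.
struct ToyStack {
  std::vector<intptr_t> slots;

  void push(intptr_t v) { slots.push_back(v); }

  intptr_t pop() {
    assert(!slots.empty());
    intptr_t v = slots.back();
    slots.pop_back();
    return v;
  }

  // Enter a callee: the top num_args slots become local[0..num_args-1]
  // where they already are, and zeroed slots are pushed for the remaining
  // locals. Returns the index of local[0].
  std::size_t enter(std::size_t num_args, std::size_t num_locals) {
    assert(num_args <= slots.size() && num_args <= num_locals);
    std::size_t local0 = slots.size() - num_args;
    slots.resize(local0 + num_locals, 0);
    return local0;
  }

  intptr_t& local(std::size_t local0, std::size_t n) {
    return slots[local0 + n];
  }

  // Leave a callee: pop the result (if any), drop everything down to and
  // including local[0], then push the result back for the caller.
  void leave(std::size_t local0, bool has_result) {
    intptr_t result = has_result ? pop() : 0;
    slots.resize(local0);
    if (has_result) push(result);
  }
};
```

Push value_a, value_b and value_c, call enter(2, 4), and local[0] and local[1] are value_b and value_c without anything having been copied. After leave, only value_a and the result remain.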

So far so good, but there’s another difference. The stack I’ve been talking about until now is the stack of the Java language. In HotSpot this is variously referred to as the Java expression stack, the Java stack or the expression stack. But HotSpot is a program in itself, and it has a stack of its own, the ABI stack or native stack. This is the stack that C and C++ functions use to store their own bits and pieces on. HotSpot was originally written for i386, a platform notoriously starved of registers, and rather than maintaining two separate stack pointers the HotSpot engineers decided to store the Java stack on the ABI stack and save a register. Each time a Java method is invoked, the native code that executes it needs to store some state on the ABI stack, so chunks of Java stuff end up interleaved with chunks of ABI stuff between each method’s local variables and its part of the expression stack.

This tinkering with the ABI stack is one of the two reasons the C++ interpreter in HotSpot required a layer written in assembler — you don’t have that kind of access to the ABI stack in C++. Zero, of course, is written in C++, and doesn’t have that access; Zero maintains a separate stack, an instance of the ZeroStack class from stack_zero.hpp. That could have consigned this interleaving to a side-note from history, but sadly the C++ interpreter expects to find its state information stored between its local variables and its expression stack. Rather than rewriting the C++ interpreter, Zero interleaves too. It’s the path of least resistance.

You can see what I mean in this crash dump that — congratulations! — you are now qualified to understand. The trace is split into frames, with each frame representing one method invocation. The deepest frame is at the top, so here:

java.lang.Thread.run
called java.util.concurrent.ThreadPoolExecutor$Worker.run
called java.util.concurrent.ThreadPoolExecutor.runWorker
called sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run
called sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0
called java.net.Socket.getInputStream — which crashed.

The top frame’s expression stack is at 0xd0ffe7e4 to 0xd0ffe7ec, and its local variables are at the start of the next frame down, at 0xd0ffe83c to 0xd0ffe848. Between them is the C++ interpreter’s state (at 0xd0ffe7f0 to 0xd0ffe830) and two words which the stack walker uses to figure out where everything is.
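If you want to check that those ranges really do tile the frame, the arithmetic is short. This sketch assumes 4-byte words, which is my reading of the 32-bit addresses in the dump rather than something the dump states. The helper name words_in_range is made up, and the two stack-walker addresses are inferred from the gap between the state and the locals.

```cpp
#include <cassert>
#include <cstdint>

// Number of words in the inclusive address range [first, last],
// assuming 4-byte words (an assumption based on the 32-bit addresses).
inline uint32_t words_in_range(uint32_t first, uint32_t last) {
  assert(first <= last && (last - first) % 4 == 0);
  return (last - first) / 4 + 1;
}

// Ranges quoted in the text for the top frame.
inline uint32_t expression_stack_words() {
  return words_in_range(0xd0ffe7e4, 0xd0ffe7ec);
}
inline uint32_t interpreter_state_words() {
  return words_in_range(0xd0ffe7f0, 0xd0ffe830);
}
// The gap between the state and the locals: the stack walker's two words.
inline uint32_t stack_walker_words() {
  return words_in_range(0xd0ffe834, 0xd0ffe838);
}
inline uint32_t local_variable_words() {
  return words_in_range(0xd0ffe83c, 0xd0ffe848);
}
```

Under that assumption the expression stack is three words, the interpreter state seventeen, the locals four, and the gap left over is exactly the two stack-walker words.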

I’m nearly finished, but there’s one final thing. The Java Virtual Machine Specification specifies the sizes of the various types in terms of the number of stack slots or local variable slots they occupy: long and double values take two slots, and everything else takes one. If a method’s arguments are an int, an Object, a long and an int, its local variables on entry will look like this:

int      local[4]
long     local[3]
         local[2]
Object   local[1]
int      local[0]

This is pretty straightforward, aside from the fact that the long is officially in local[2] but its address is actually the address of local[3]. The problem arises when you’re using a 64-bit machine — the 64-bit Object pointer has been allocated the same number of slots as a 32-bit int. On 64-bit platforms, therefore, stack slots need to be 64 bits wide, which wastes space, and leaves us the choice of where in the slot to put non-Object types. The various classic HotSpot ports do this in different ways, but on Zero everything is accessed by slot number so values are positioned such that they start at the address of the start of the slot. This means the calculation is the same regardless of whether the machine is 32- or 64-bit, and makes the majority of this stuff transparent. The same local variable array on 64-bit Zero looks like this:

int      local[4]
long     local[3]
         local[2]
Object   local[1]
int      local[0]
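The slot arithmetic can be sketched like this. It assumes slots are intptr_t wide and that higher-numbered locals sit at lower addresses, as in the diagrams above. The helper names are invented for illustration and are not Zero’s own accessors.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// locals points at local[0]; local[n] lives n slots below it.
inline intptr_t* local_addr(intptr_t* locals, int n) {
  assert(n >= 0);
  return locals - n;
}

// One-slot values start at the start of their slot.
inline void set_int(intptr_t* locals, int n, int32_t v) {
  std::memcpy(local_addr(locals, n), &v, sizeof v);
}
inline int32_t get_int(intptr_t* locals, int n) {
  int32_t v;
  std::memcpy(&v, local_addr(locals, n), sizeof v);
  return v;
}

// A long "in local[n]" starts at the address of local[n + 1]. With 32-bit
// slots it spans local[n + 1] and local[n]; with 64-bit slots it fits
// entirely inside local[n + 1] and local[n] goes unused.
inline void set_long(intptr_t* locals, int n, int64_t v) {
  std::memcpy(local_addr(locals, n + 1), &v, sizeof v);
}
inline int64_t get_long(intptr_t* locals, int n) {
  int64_t v;
  std::memcpy(&v, local_addr(locals, n + 1), sizeof v);
  return v;
}
```

The same calls work unchanged whether intptr_t is 32 or 64 bits wide, which is the property the slot-number scheme is after.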

I’ll shut up about stacks now!

Inside Zero and Shark: The Java stack

This article will be a little generic — nothing about HotSpot, nothing about Zero — but before we can understand Zero’s calling convention we need to go up a level and understand the calling convention of Java itself, in which arguments and results are passed on the stack. Let’s have a look at an example to see how that works:

class HelloUser {
  public static void main(String[] args) {
    System.out.print("Hello ");
    System.out.println(System.getProperty("user.name"));
  }
}

We’re going to have to disassemble it to see what’s happening:

public static void main(java.lang.String[]);
    0:  getstatic       [Field java/lang/System.out:Ljava/io/PrintStream;]
    3:  ldc             [String "Hello "]
    5:  invokevirtual   [Method java/io/PrintStream.print:(Ljava/lang/String;)V]
    8:  getstatic       [Field java/lang/System.out:Ljava/io/PrintStream;]
   11:  ldc             [String "user.name"]
   13:  invokestatic    [Method java/lang/System.getProperty:(Ljava/lang/String;)Ljava/lang/String;]
   16:  invokevirtual   [Method java/io/PrintStream.println:(Ljava/lang/String;)V]
   19:  return

The getstatic instruction gets a value from a static field of a class (in this case the out field of the System class) and pushes it onto the stack. The ldc instruction loads a constant (the string "Hello ") and pushes that onto the stack. So far we have this:

                                      "Hello "
(empty)               System.out      System.out
Before 0: getstatic   Before 3: ldc   Before 5: invokevirtual

The next instruction is an invokevirtual, which is going to call the method java.io.PrintStream.print. This takes two arguments, the implicit argument this, and the string to print, so the interpreter pops two values from the stack, stores them as the callee’s first two local variables, and starts to execute the callee. When the callee returns the stack will be empty:

 
 
 
 
Before 8: getstatic

We now have another getstatic and another ldc:

                 "user.name"
System.out       System.out
Before 11: ldc   Before 13: invokestatic

The next instruction is an invokestatic, another method call. This is calling java.lang.System.getProperty, which takes only one argument, the name of the property to get (static methods have no this). Presently there are two values on the stack, but the interpreter doesn’t care about that. It simply pops the top value from the stack, stores it as the callee’s first local variable, and starts to execute the callee. This time, however, the callee returns a value, the user’s name, so when it returns it will have pushed that onto the stack:

 
 
"gbenson"
System.out
Before 16: invokevirtual

Now we’re ready for the final call, another invokevirtual. That extra value on the stack may have seemed odd before, but now it makes sense; it’s the first argument for this call! The interpreter pops two values from the stack, stores them as the callee’s first two local variables, and starts to execute the callee. This method returns nothing, so when the callee returns the stack will be empty. HelloUser.main returns nothing, so the stack is now exactly as it should be for us to execute the return instruction:

 
 
 
 
Before 19: return
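The whole walk-through can be replayed with a toy operand stack. This is a sketch of the bookkeeping only: it models each invoke as "pop the arguments, push the result if there is one" and doesn’t execute anything. OperandStack, invoke and trace_hello_user are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy operand stack: back() is the top of the stack.
typedef std::vector<std::string> OperandStack;

// Model of an invoke instruction: pop num_args values for the callee
// (including `this` for invokevirtual), then push the result if any.
inline void invoke(OperandStack& stack, std::size_t num_args,
                   const std::string* result = nullptr) {
  assert(num_args <= stack.size());
  stack.resize(stack.size() - num_args);
  if (result != nullptr) stack.push_back(*result);
}

// Replays HelloUser.main, recording the stack depth before each instruction.
inline std::vector<std::size_t> trace_hello_user(const std::string& user_name) {
  OperandStack stack;
  std::vector<std::size_t> depth;
  depth.push_back(stack.size()); stack.push_back("System.out");     //  0: getstatic
  depth.push_back(stack.size()); stack.push_back("\"Hello \"");     //  3: ldc
  depth.push_back(stack.size()); invoke(stack, 2);                  //  5: invokevirtual
  depth.push_back(stack.size()); stack.push_back("System.out");     //  8: getstatic
  depth.push_back(stack.size()); stack.push_back("\"user.name\"");  // 11: ldc
  depth.push_back(stack.size()); invoke(stack, 1, &user_name);      // 13: invokestatic
  depth.push_back(stack.size()); invoke(stack, 2);                  // 16: invokevirtual
  depth.push_back(stack.size());                                    // 19: return
  return depth;
}
```

The recorded depths match the diagrams: the stack holds two values before each invoke and is empty again before 8: getstatic and before 19: return.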

Next time we’ll see how all this works in HotSpot and Zero.

Inside Zero and Shark: Calling conventions and the call stub

JavaCalls::call is merely a thin wrapper around JavaCalls::call_helper, so let’s have a look in there (they’re both in javaCalls.cpp). The interesting part starts when the JavaCallWrapper is created. JavaCallWrapper’s constructor manages the transition to _thread_in_Java, amongst other things, and its destructor manages the transition back to _thread_in_vm, so the whole of that block will be _thread_in_Java. This idiom of using an object whose constructor and destructor manage things is a common one in HotSpot; the apparently unused HandleMark created directly after the JavaCallWrapper is another example of this.
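If you haven’t met the idiom before, here’s a stripped-down sketch of it. ToyJavaThread and ToyCallWrapper are stand-ins I’ve made up. The real JavaCallWrapper does a good deal more than flip a state, so read this as the shape of the idiom and nothing else.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's thread states and thread class.
enum ToyThreadState { toy_thread_in_vm, toy_thread_in_Java };

struct ToyJavaThread {
  ToyThreadState state = toy_thread_in_vm;
};

// The constructor makes the transition and the destructor undoes it, so
// the enclosing block is "in Java" for exactly as long as the object lives.
class ToyCallWrapper {
 public:
  explicit ToyCallWrapper(ToyJavaThread* thread) : _thread(thread) {
    assert(_thread->state == toy_thread_in_vm);
    _thread->state = toy_thread_in_Java;
  }
  ~ToyCallWrapper() { _thread->state = toy_thread_in_vm; }

  ToyCallWrapper(const ToyCallWrapper&) = delete;
  ToyCallWrapper& operator=(const ToyCallWrapper&) = delete;

 private:
  ToyJavaThread* _thread;
};

// Reports the state seen inside the block guarded by the wrapper.
inline ToyThreadState state_inside_block(ToyJavaThread* thread) {
  ToyThreadState seen;
  {
    ToyCallWrapper link(thread);  // apparently unused, but does the work
    seen = thread->state;
  }                               // destructor runs here
  return seen;
}
```

The object is never mentioned again after it’s declared, which is why such variables look unused when you first read the code.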

Ok, so now we’re _thread_in_Java, and it’s time to execute some Java code. The call to the call stub is the bit that does that, but before we look at that it’s interesting to skip forward a little, to look at what happens before and after the HandleMark and JavaCallWrapper are destroyed. Immediately before the blocks close is this:

// Preserve oop return value across possible gc points
if (oop_result_flag) {
  thread->set_vm_result((oop) result->get_jobject());
}

and immediately after the blocks close is this:

// Restore possible oop return
if (oop_result_flag) {
  result->set_jobject((jobject) thread->vm_result());
  thread->set_vm_result(NULL);
}

If the Java code called by the call stub returned an object (a java.lang.Object) then a pointer to that object will now be in result — and it’s an oop. The destructors of both HandleMark and JavaCallWrapper contain code that can GC, so these blocks of code are needed to protect that oop. Here, rather than using a handle, the result is protected by being stored in the thread, in a location the GC knows to check and update.

Back to the call stub. What is it? Well, in what I’ll call “classic” HotSpot (where everything from here on in is written in assembly language) every methodOop has a pair of entry points: pointers to the native code that actually executes the method. When a method has been JIT compiled these entry points will point at the JIT compiled code; for interpreted code they will point to some location within the interpreter. The reason there are two entry points is that the interpreter passes arguments and return values in a different manner to compiled code; the interpreter uses a different calling convention from the compiled code. If a method is compiled then its compiled entry point (the entry point that will be called by compiled code) will point directly at the compiled code, but its interpreted entry point will point to the i2c adaptor, which translates from the interpreter calling convention to the compiler calling convention and then jumps to the compiled entry point. Interpreted methods have similar treatment: their interpreted entry point points to the part of the interpreter responsible for executing that method, and their compiled entry point will point to the c2i adaptor.

What does this have to do with the call stub? Well, the call stub is the interface between VM code and the interpreter calling convention. It takes a C array of parameters and copies them to the locations specified by the interpreter calling convention. Then it invokes the method, by jumping to its interpreted entry point. Finally, it copies the result from the location specified by the interpreter calling convention to the address supplied by JavaCalls::call_helper.
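A toy model may help keep the pieces apart. In this sketch the "interpreter calling convention" is a vector used as a stack and the "compiled calling convention" is ordinary C++ parameters. Every name is invented, the method signature is fixed at two arguments, and the real stubs and adaptors are generated assembly, so treat it as a diagram in code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interpreter convention (toy): arguments and results live on this stack.
struct ToyInterpreterStack {
  std::vector<intptr_t> slots;
};

struct ToyCallMethod;
typedef void (*InterpretedEntry)(const ToyCallMethod&, ToyInterpreterStack&);
typedef intptr_t (*CompiledEntry)(intptr_t, intptr_t);

// Every method has a pair of entry points.
struct ToyCallMethod {
  InterpretedEntry interpreted_entry;
  CompiledEntry compiled_entry;
};

// Stands in for the JIT compiled body of a two-argument method.
inline intptr_t compiled_subtract(intptr_t a, intptr_t b) { return a - b; }

// i2c adaptor: take the arguments off the stack, call the compiled entry
// point with them as parameters, and put the result back on the stack.
inline void i2c_adaptor(const ToyCallMethod& method, ToyInterpreterStack& stack) {
  assert(stack.slots.size() >= 2);
  intptr_t b = stack.slots.back(); stack.slots.pop_back();
  intptr_t a = stack.slots.back(); stack.slots.pop_back();
  stack.slots.push_back(method.compiled_entry(a, b));
}

// Call stub: copy a C array of parameters to where the interpreter
// convention wants them, invoke the interpreted entry point, then copy
// the result to the address the caller supplied.
inline void call_stub(const ToyCallMethod& method, const intptr_t* params,
                      std::size_t num_params, intptr_t* result) {
  ToyInterpreterStack stack;
  for (std::size_t i = 0; i < num_params; i++) stack.slots.push_back(params[i]);
  method.interpreted_entry(method, stack);
  assert(!stack.slots.empty());
  *result = stack.slots.back();
}
```

Here the method is "compiled", so its interpreted entry point is the i2c adaptor. The call stub neither knows nor cares, because it only ever speaks the interpreter convention.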

You’ll notice this description has been with reference to classic HotSpot. Zero and Shark are mostly the same, but there are two significant differences. Firstly, the reason classic HotSpot has two calling conventions is an optimization. The interpreter calling convention gives better performance in the interpreter, the compiler calling convention gives better performance in the compiler, and the difference is enough to more than offset the overhead of using adaptors for bridging. In Zero and Shark, the limits of what can be done in C++ and with LLVM constrain the design of the calling convention such that having different ones doesn’t really make sense. So — for now, at least — Shark code also uses the interpreter calling convention, and the compiled entry point is never set or used. In Zero and Shark there is only “the calling convention”.

The second difference is that Shark methods require a bit of extra information to execute. Compiled methods need to be able to tell HotSpot where they are in the code at certain times, and in classic HotSpot this is done by looking at the PC. LLVM doesn’t allow us access to this — even if it did, it wouldn’t make much sense — so Shark compiled methods feed HotSpot faked PCs. To do this, each method needs to know where HotSpot thinks the compiled code starts, so in Zero, entry points are not pointers to code but pointers to ZeroEntry or SharkEntry objects. The real entry point is stored within those.

Next time, some details about the calling convention, and some stuff about stacks.

Inside Zero and Shark: Handles and oops, traps and checks

You’re about to run the important enterprise application “Hello World”. What’s going to happen?

class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello world!");
  }
}

After initializing itself, HotSpot will create a new Java thread. This will initially be _thread_in_vm because it’s running VM code. Eventually it will call JavaCalls::call (in javaCalls.cpp) to bridge from VM code to Java code. Before we can look at what JavaCalls::call does, however, we need to understand a couple of HotSpot conventions. Look at its prototype:

void JavaCalls::call(JavaValue* result, methodHandle method, JavaCallArguments* args, TRAPS);

The first things we need to understand are handles and oops. All Java objects, and in fact all objects in HotSpot managed by the garbage collector, are oops, and when you’re dealing with oops you need to keep the garbage collector in mind. More specifically, you need to know where in your code the GC might run, because when it does run you need to have told it the location of every single oop you’re using, and when it returns you need to deal with the fact that your oops have probably moved. If your C compiler has optimized your code such that an oop is in a register then the oop in that register is now wrong, and you’re going to crash pretty soon.

Dealing with raw oops is hard, but luckily there are ways of protecting them. In VM code — when you’re _thread_in_vm — the protection of choice is to use handles. A handle wraps an oop, managing access to it such that GC activity becomes transparent. If you’re in VM code and you’re using handles then you don’t have to worry. But you do need to know what’s happening, because if you see some code that’s calling methodHandle methods and you grep the OpenJDK tree to find the methodHandle class definition you will not find it. The methods you are looking for are actually the methods of the methodOopDesc class (in methodOop.hpp). The handle is just a wrapper.
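To see why the wrapper matters, here’s a toy moving collector. Nothing in it is HotSpot’s real implementation: ToyHeap, ToyHandle and the two-space copying scheme are assumptions made for the sketch, and it treats every object as live. Objects are only reached through slots the collector knows about, so it can move them and fix up the slots.

```cpp
#include <cassert>
#include <cstddef>

struct ToyObject { int value; };

const std::size_t kToyCapacity = 8;

class ToyHeap {
 public:
  // Allocate an object and return the handle slot that points at it.
  ToyObject** allocate(int value) {
    assert(_count < kToyCapacity);
    ToyObject* obj = &_space[_current][_count];
    obj->value = value;
    _handles[_count] = obj;
    return &_handles[_count++];
  }

  // Toy collection: copy every object to the other space, update the
  // handle slots, and poison the old copies so stale pointers show up.
  void collect() {
    int other = 1 - _current;
    for (std::size_t i = 0; i < _count; i++) {
      _space[other][i] = *_handles[i];
      _handles[i]->value = -1;
      _handles[i] = &_space[other][i];
    }
    _current = other;
  }

 private:
  ToyObject _space[2][kToyCapacity];
  ToyObject* _handles[kToyCapacity];
  std::size_t _count = 0;
  int _current = 0;
};

// The handle holds the address of the slot, not of the object, so every
// access goes through the pointer the collector keeps up to date.
class ToyHandle {
 public:
  explicit ToyHandle(ToyObject** slot) : _slot(slot) {}
  ToyObject* operator->() const { return *_slot; }
  ToyObject* raw() const { return *_slot; }  // only good until the next collection
 private:
  ToyObject** _slot;
};
```

Grab raw() before a collect() and you’re holding the old address afterwards. Go through the handle and you never notice the move. The operator-> forwarding is also why the methods you call on a handle turn out to be defined on the class it wraps.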

The other thing we need to understand in that prototype is the mysterious TRAPS at the end. It’s kind of a note to the programmer: functions that trap are functions that can throw Java exceptions. When you call them you use CHECK as their final argument for a convenient exception check:

JavaCalls::call(result, method, args, CHECK);

TRAPS and CHECK are defined in exceptions.hpp. You may wish to avert your eyes:

#define TRAPS   Thread* THREAD
#define CHECK   THREAD); if (HAS_PENDING_EXCEPTION) return; (0

Now we can see how HotSpot handles exceptions: they’re simply stored in the thread. Code that cares can access the exception using these guys:

#define PENDING_EXCEPTION       (((ThreadShadow *) THREAD)->pending_exception())
#define HAS_PENDING_EXCEPTION   (((ThreadShadow *) THREAD)->has_pending_exception())
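If the macro trickery is hard to follow, it helps to watch it expand in a standalone program. The sketch below re-creates the macros against a toy thread class. ToyThread, may_throw and caller are made up, and I’ve written (void)(0 where the definition quoted above has (0, purely to keep compilers from warning about an unused value.

```cpp
#include <cassert>

// Hypothetical thread class: the pending exception is stored in the thread.
struct ToyThread {
  const char* pending = nullptr;
  bool has_pending_exception() const { return pending != nullptr; }
};

#define TRAPS                  ToyThread* THREAD
#define HAS_PENDING_EXCEPTION  (THREAD->has_pending_exception())
#define CHECK                  THREAD); if (HAS_PENDING_EXCEPTION) return; (void)(0

// A function that traps: it may leave an exception pending in the thread.
inline void may_throw(bool fail, TRAPS) {
  assert(THREAD != nullptr);
  if (fail) THREAD->pending = "java.lang.RuntimeException";
}

inline void caller(bool fail, int* steps, TRAPS) {
  // The next line expands to:
  //   may_throw(fail, THREAD); if (HAS_PENDING_EXCEPTION) return; (void)(0);
  may_throw(fail, CHECK);
  ++*steps;  // only reached if no exception is pending
}
```

CHECK closes the call’s argument list itself, slips in the exception test, and then opens a dummy expression for the closing parenthesis and semicolon you typed to finish off.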

Next time I really will explain how method invocation works…

Inside Zero and Shark: Java threads and state transitions

Andrew Haley has been doing some work on Zero and Shark lately, and his questions have made me realise that while Zero and Shark are pretty small in comparison with the rest of HotSpot, they’re not the easiest of things to get a handle on. I decided to write some articles to try and present a kind of overview of it all, to make things easier for others in the future.

HotSpot is the Java Virtual Machine (JVM) of the OpenJDK project. Its name refers to its primary mode of operation, in which Java methods are initially executed by a profiling interpreter, and only after they have been executed a certain number of times are they deemed “hot” enough to be compiled to native code by a Just In Time (JIT) compiler. The aim is to avoid wasting time compiling rarely-used methods, so that each method you do compile is one that will improve performance the most.
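The basic idea fits in a few lines. This is a deliberately naive sketch: HotMethod, invoke_method and the threshold parameter are invented, and the real thing profiles far more than a call count and compiles on separate compiler threads.

```cpp
#include <cassert>

// Hypothetical per-method bookkeeping.
struct HotMethod {
  int invocations = 0;
  bool compiled = false;
};

// Runs the method once. Returns true if this invocation used compiled
// code, false if it was interpreted. Once the method has been interpreted
// `threshold` times it is deemed hot and "compiled" for later calls.
inline bool invoke_method(HotMethod& method, int threshold) {
  assert(threshold > 0);
  if (method.compiled) return true;
  ++method.invocations;
  if (method.invocations >= threshold) method.compiled = true;
  return false;
}
```

With a threshold of three, the first three calls are interpreted and the fourth runs compiled code. A method that’s only called twice never costs any compile time at all.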

If you look inside a running HotSpot process you’ll see a number of different threads. There will be VM threads that handle such things as garbage collection. There may be one or more compiler threads — these are the JITs. And there will be Java threads. These are the threads that are executing Java code, the threads we are interested in.

At any time, each Java thread will be in one of (essentially) three states. A thread that is _thread_in_Java is executing code that was written in Java, either by interpreting bytecode or by executing native code compiled by the JIT. A thread that is _thread_in_native is executing a native Java method — JNI code. And a thread that is _thread_in_vm is running code that is part of the VM rather than code that is part of the application.

Threads change state all over the place. Imagine you’re in a Java method (you’re _thread_in_Java) and you invoke a native method. That switches you to _thread_in_native. Then, your native code calls some VM function, and suddenly you’re _thread_in_vm. Maybe that VM function calls some Java code? Now you’re back in _thread_in_Java. And as those calls return the transitions happen in reverse.

When hacking on HotSpot you tend to avoid thread state transitions where possible because various things happen during them and some directions are not cheap. The most obvious example of this is that threads remain _thread_in_Java across method calls, such that if one non-native method calls another then no transition occurs. In fact, _thread_in_Java is something of a default state. If you look at the function in Zero that handles calls to native methods (CppInterpreter::native_entry, in cppInterpreter_zero.cpp) you’ll see that the transition to _thread_in_native is the very last thing to happen before the actual call itself, and that transitioning back to _thread_in_Java is the very first thing to happen once the native method returns. And whilst you’re in there, check out what happens during the transition back to _thread_in_Java. The transition from _thread_in_native to _thread_in_Java is one of the expensive ones.

That pretty much covers threads and state transitions. Next time I’ll explain some of how method invocation actually works.

Porting Shark

Shark when it’s done will be great, a massive improvement over Zero, but LLVM only supports a couple of the platforms people use Zero on. I’ve wondered a few times how the task of porting LLVM to a new architecture compares with writing a full HotSpot port from scratch. This morning I realised I could get a rough idea by simply counting the lines of x86-specific code, the one port they share:

                  Lines of code
LLVM 2.4                 34,391
HotSpot 14.0b08          77,329

This is just raw lines of code, nothing clever. Both implement a combined IA-32 and X86-64 port, and the HotSpot figure is for the Linux port with the server JIT — one OS, one JIT — so I believe it’s a fair comparison. You could infer that porting LLVM and using Zero and Shark will get you up and running with OpenJDK in about half the time. That’s not bad.

Fun things to type in gdb

Want to see what the C++ interpreter is up to in gdb?

(gdb) bt
...
#6  0x0f42155c in BytecodeInterpreter::run (istate=0xd0f7e55c) at bytecodeInterpreter.cpp:857
...
(gdb) call PI(0xd0f7e55c)
thread: 0x10108650
bcp: 0xf20efe8b
locals: 0xd0f7e5b4
constants: 0xf20f01f8
method: 0xf20efea8[ javasoft.sqe.tests.vm.jdwp.StackFrame.PopFrames.popframes001a$TestedThreadClass.testedMethod(I)I ]
mdx: 0x00000000
stack: 0xd0f7e558
msg: no_request
result_to_call._callee: 0xf2070188
result_to_call._callee_entry_point: 0xf5e95184
result_to_call._bcp_advance: 3 
osr._osr_buf: 0xf2070188
osr._osr_entry: 0xf5e95184
result_return_kind 0xf2070188 
prev_link: 0x00000000
native_mirror: 0x00000000
stack_base: 0xd0f7e55c
stack_limit: 0xd0f7e54c
monitor_base: 0xd0f7e55c
self_link: 0xd0f7e55c