My landline went dead shortly after I started work today. I managed to keep coding for an hour or so, but my PPC machine is in Boston and there’s only so much you can do blind. It turned out to be a blessing in disguise, however: I’ve been meaning to document the interpreter calling convention for some time, but every time I tried I got bored and went back to coding. Until today…

So, the interpreter calling convention. It is basically the register usage and the stack frame layout used within the interpreter, which for the C++ interpreter means everything in between StubRoutines::call_stub() and BytecodeInterpreter::run(). And while not immediately relevant, this includes any code created by a JIT.

The register usage is the simplest bit. There are three symbolically named registers that are valid at all times within the interpreter:

Rmethod
The address of the current methodOop.
Rlocals
The address of the first local variable. Local variables are accessed with negative indices, so the address of the second local variable is Rlocals - wordSize and so on. Note that Rlocals is essentially a stack pointer and is treated as such in the result convertors, so you need to be careful that it’s pointing where you expect while methods are returning.
Rstate
The address of the current interpreterState object.
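
To make the negative indexing concrete, here is a minimal C++ sketch of how a local variable's address could be derived from the value held in Rlocals. The helper names `local_offset_bytes` and `local_address` are hypothetical and exist only for illustration, and `wordSize` is assumed to be the size of a pointer; none of this is lifted from the interpreter itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: one interpreter word is the size of a pointer.
constexpr std::size_t wordSize = sizeof(std::intptr_t);

// Hypothetical helper: byte offset of local variable `index` from Rlocals.
// Local 0 is at offset 0, local 1 at -wordSize, and so on downwards.
inline std::ptrdiff_t local_offset_bytes(std::size_t index) {
    return -static_cast<std::ptrdiff_t>(index * wordSize);
}

// Hypothetical helper: address of local variable `index`, where `rlocals`
// stands in for the register value (the address of local variable 0).
inline std::intptr_t* local_address(std::intptr_t* rlocals, std::size_t index) {
    assert(rlocals != nullptr);
    return rlocals - index;  // pointer arithmetic scales by wordSize
}
```

With this, `local_address(rlocals, 1)` is one word below `rlocals`, matching the "Rlocals - wordSize" rule above.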

These registers are assigned to registers defined as non-volatile by the PPC ABIs, so they do not need to be saved or restored around calls to native ABI functions.

The stack is managed in accordance with the PPC ABIs, with r1 as the frame pointer and with frames laid out in the standard manner:

    | ...                  |  high addresses
+-> | Link area            |
|   +----------------------+
|   | Register save area   |
|   | Local variable space |
|   | Parameter list space |
+---+ Link area            |  low addresses
    +----------------------+

Each method has its own frame, the “local variable space” of which is laid out as follows:

    +----------------------+
    | interpreterState     |  high addresses
    +----------------------+
    | monitor 0            |
    |  ...                 |
    | monitor m            |
    +----------------------+
    | stack slot 0         |
    |  ...                 |
    | stack slot n         |
    +----------------------+
    | slop_factor          |
    +----------------------+
    | padding              |  low addresses
    +----------------------+
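
As a rough illustration of how the pieces in that diagram add up, the following C++ sketch computes the size of the local variable space. Everything here is an assumption made for the example: the `FrameShape` struct and both functions are hypothetical, the sizes of the interpreterState object and of a monitor are passed in rather than taken from the real VM, slop_factor is taken to be two words, and the padding is assumed to round the space up to a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t wordSize = sizeof(std::intptr_t);  // assumed word size
constexpr std::size_t slop_factor = 2;       // extra words, assumed to be two
constexpr std::size_t frame_alignment = 16;  // bytes; an assumption for this sketch

// Hypothetical description of one method's local variable space.
struct FrameShape {
    std::size_t state_words;    // size of interpreterState, in words
    std::size_t monitor_words;  // size of one monitor, in words
    std::size_t monitors;       // number of monitors in the frame
    std::size_t stack_slots;    // number of expression stack slots
};

// Bytes needed before padding: state, monitors, stack slots, slop.
inline std::size_t unpadded_bytes(const FrameShape& f) {
    const std::size_t words = f.state_words
                            + f.monitors * f.monitor_words
                            + f.stack_slots
                            + slop_factor;
    return words * wordSize;
}

// Bytes after rounding up to the assumed alignment; the difference
// between this and unpadded_bytes() is the "padding" box in the diagram.
inline std::size_t padded_bytes(const FrameShape& f) {
    const std::size_t bytes = unpadded_bytes(f);
    return (bytes + frame_alignment - 1) / frame_alignment * frame_alignment;
}
```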

slop_factor is a hack. When a method is called the callee’s parameters are pushed onto the caller’s stack. The callee pops these off and pushes its return value, if any. It is a given that the Java compiler allocates enough stack slots in a method for the parameters of every method it calls, but nobody seems sure it allocates enough slots for a return value in the event that the return value requires more slots than the parameters. This is dubiously referred to as “the static long no_params() issue”. slop_factor is essentially two extra words past the top of the expression stack to protect what follows from being overwritten in this case. It’s probably unnecessary, but on PPC32 there is a fair chance that a stack overrun will corrupt the return address, which is just the kind of fun bug where the crash happens long after the cause. I left it in; I value my sanity over some wasted bytes of memory.
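
The arithmetic behind the worry can be sketched in a few lines of C++. This is only a model of the worst case, assuming the caller's stack was exactly full once the parameters were pushed; `result_overrun` and `max_result_slots` are hypothetical names, and the two-slot maximum reflects long and double return values.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t slop_factor = 2;       // extra words, as described above
constexpr std::size_t max_result_slots = 2;  // long and double take two slots

// Hypothetical helper: how many slots a return value would spill past the
// caller's expression stack, assuming the stack was exactly full once the
// parameters had been pushed (the worst case).
inline std::size_t result_overrun(std::size_t param_slots, std::size_t result_slots) {
    assert(result_slots <= max_result_slots);
    return result_slots > param_slots ? result_slots - param_slots : 0;
}
```

For a `static long no_params()` call, `result_overrun(0, 2)` is 2, which is exactly what the two slop words absorb.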

The state-monitors-stack ordering is not random. On entry to a method the caller’s frame will be at the top of the stack and Rlocals will be pointing at an expression stack slot within that frame (or at the higher slop word, in frames with no stack). A method’s N parameters are its first N local variables, and in methods with more local variables than parameters this frame (the caller’s frame) may be extended to accommodate them. Having the expression stack as the lowest thing in the frame minimises the amount of stuff that needs moving when a frame is extended. Of course, this means that the entire expression stack needs moving every time a monitor is allocated, but synchronized methods pre-allocate a monitor, so the monitor list needs extending much less frequently than the expression stack. Having it this way round means the monitors never move, and you can’t move a monitor without a safepoint, so this is nice.
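
How far the caller’s frame has to grow follows directly from the parameters-are-locals rule. A minimal C++ sketch, assuming the callee’s total local count and parameter slot count are both known on entry (the function name `extension_words` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of extra words the caller's frame must be
// extended by so that all of the callee's locals fit. The first
// `param_slots` locals are already on the caller's expression stack.
inline std::size_t extension_words(std::size_t max_locals, std::size_t param_slots) {
    assert(param_slots <= max_locals);  // parameters are themselves locals
    return max_locals - param_slots;
}
```

A method with five locals, two of which are parameters, would need the caller’s frame extended by three words; a method whose only locals are its parameters needs no extension at all.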

I’m fading now, but one final point is that random frame extension means you cannot unwind frames assuming they’re the same size they were when you created them.
