My first builds ran to completion yesterday. It’s slow: the 32-bit one took nearly four hours and the 64-bit one a little over six (and this is on a quad 2.5GHz G5!). I don’t have exclusive use of the machine, though, so I suppose other stuff may have been running in the meantime.

The next step is IcedTea integration. My tree is currently an odd hybrid of b17 with b22’s hotspot/src dropped in, and with all the build system changes between b17 and b23 I’ll basically be starting the port from scratch. This is possibly a good thing, because I want to switch from using ARCH_DATA_MODEL to using setarch, but it’s a pain nonetheless. I plan to wait until the IcedTea guys have upgraded to b23, sync my tree with that, remake all the build system patches and then import everything into IcedTea.

I have plenty to do in the meantime. While the builds were running yesterday I audited the code for dodgy bits, mostly looking for Unimplemented()s where they shouldn’t be and, of course, the ubiquitous XXX_EVIL_EVIL_EVIL. This is the state of my to-do list:

Contended locks
The bit that handles locking and unlocking for synchronized native methods cannot cope when the lock is contended.
Relocations
The assembler doesn’t support relocations; if the garbage collector moves generated code then absolute addresses that reference other generated code will become invalid. I’m not sure if this matters currently.
Atomic copies
Copy::conjoint_jlongs_atomic() is not atomic on 32-bit. The other atomic copies are questionable too, but that one definitely doesn’t do what it says on the tin.
Stack args
The part of the signature handler that passes arguments to native methods on the stack is untested on 64-bit.
Unimplemented stuff
A brief and non-exhaustive list: stack overflow checks, stack banging (whatever that is), JVMTI, JDI, profiling, prefetch and a JIT.
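For reference, here is a minimal C++ sketch of what "conjoint" plus "atomic" ought to mean. The copy has to be safe when the ranges overlap, the way memmove is, and each 64-bit element has to be read and written as one indivisible access. The function name is hypothetical, and std::atomic<int64_t> stands in for whatever HotSpot really uses. This is not HotSpot’s implementation. On 32-bit PPC, std::atomic<int64_t> is not guaranteed to be lock-free, so a real port would need to guarantee single 64-bit accesses some other way.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch (not HotSpot's Copy::conjoint_jlongs_atomic):
// copy `count` 64-bit elements from `from` to `to`. The ranges may
// overlap ("conjoint"), and every element is read and written as a
// single atomic 64-bit access, so no reader can observe a torn value.
void conjoint_jlongs_atomic_sketch(const std::atomic<std::int64_t>* from,
                                   std::atomic<std::int64_t>* to,
                                   std::size_t count) {
  const auto src = reinterpret_cast<std::uintptr_t>(from);
  const auto dst = reinterpret_cast<std::uintptr_t>(to);
  if (dst < src) {
    // Destination below source: a forward copy never overwrites
    // elements that have yet to be read.
    for (std::size_t i = 0; i < count; i++) {
      to[i].store(from[i].load(std::memory_order_relaxed),
                  std::memory_order_relaxed);
    }
  } else if (dst > src) {
    // Destination above source: copy backwards for the same reason.
    for (std::size_t i = count; i > 0; i--) {
      to[i - 1].store(from[i - 1].load(std::memory_order_relaxed),
                      std::memory_order_relaxed);
    }
  }
  // dst == src: nothing to do.
}
```

The direction check is what makes the copy conjoint. The per-element atomic load and store is what makes it atomic. Splitting a jlong into two 32-bit moves breaks the second property, which is the 32-bit problem described above.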

I now have Hello World executing to completion, 303,371 instructions on 32-bit and 307,209 on 64. There’s a lot missing, and a lot I plain don’t understand. And it’s very, very slow. I have a copy of javac running on 64-bit as I write. And running. And running. I attached gdb briefly and it was nearing two billion instructions. On 64-bit, that is. 32-bit is crashing in the garbage collector after a mere 300 million bytecodes or so. Debugging GC crashes is something I was dreading.

$ control/build/linux-ppc64/bin/java Hello
Hello world
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/os_linux_ppc.hpp:28

  error: Unimplemented(): aztec/hotspot/src/os_cpu/linux_ppc/vm/os_linux_ppc.hpp:28

Only on 64-bit so far…

My first exception!

Error occurred during initialization of VM
java.lang.ExceptionInInitializerError
        at java.security.SecureClassLoader.<clinit>(SecureClassLoader.java:55)
        at sun.misc.Launcher.<init>(Launcher.java:71)
        at sun.misc.Launcher.<clinit>(Launcher.java:59)
        at java.lang.ClassLoader.initSystemClassLoader(ClassLoader.java:1322)
        at java.lang.ClassLoader.getSystemClassLoader(ClassLoader.java:1304)
Caused by: java.lang.ClassCastException: java.lang.Class cannot be cast to java.lang.String
        at sun.security.util.Debug.<clinit>(Debug.java:45)
        at java.security.SecureClassLoader.<clinit>(SecureClassLoader.java:55)
        at sun.misc.Launcher.<init>(Launcher.java:71)
        at sun.misc.Launcher.<clinit>(Launcher.java:59)
        at java.lang.ClassLoader.initSystemClassLoader(ClassLoader.java:1322)
        at java.lang.ClassLoader.getSystemClassLoader(ClassLoader.java:1304)

I just spent the best part of a day trying to figure out why something was happening differently on 32- and 64-bit PPC. It turned out that one was decoding the string "OpenJDK  VM" while the other was decoding "OpenJDK 64-Bit  VM". I should feel pleased, since things are working as they should, but it’s given me a headache.

Anyway, the point of this entry is that while I was in there I checked out that double space. It is where "Server" or "Client" would normally go, but of course mine is neither. It occurred to me that undocumented variants of OpenJDK with weird identifiers floating around could be slightly disturbing. So if you’re Googling this in the future trying to figure out what an "OpenJDK Aztec VM" or an "OpenJDK 64-Bit Aztec VM" is, this is the documentation: you’re using my stuff.
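The double space appears naturally if the name is glued together from pieces and one of those pieces is empty. Here is a small C++ sketch under that assumption. The function and piece names are hypothetical and are not HotSpot’s actual build macros.

```cpp
#include <string>

// Hypothetical reconstruction: the VM name is assembled from a distro
// name, an optional "64-Bit " marker and a VM type ("Server", "Client",
// ...), followed by " VM". An empty VM type leaves a double space.
std::string vm_name_sketch(bool is_64_bit, const std::string& vm_type) {
  const std::string distro = "OpenJDK";
  const std::string bits = is_64_bit ? "64-Bit " : "";  // note trailing space
  return distro + " " + bits + vm_type + " VM";
}
```

With an empty `vm_type` this yields "OpenJDK  VM" and "OpenJDK 64-Bit  VM". Passing "Aztec" fills the gap and gives the two names above.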

Because I keep getting this wrong, here is where the pointers in an interpreterState should point. This is a PPC32 frame with a four-word stack with one item pushed onto it and no monitors:

    Address  Contents             Pointers
    …680     Back chain
    …684     LR save word
    …688     Padding
    …68c     Padding
    …690     Slop
    …694     Slop                 _stack_limit
    …698     Stack
    …69c     Stack
    …6a0     Stack                _stack
    …6a4     Stack
    …6a8     Interpreter state    _stack_base, _monitor_base
      .
      .
      .
    …6ec

This is the same frame with a monitor allocated:

    Address  Contents             Pointers
    …680     Back chain
    …684     LR save word
    …688     Slop
    …68c     Slop                 _stack_limit
    …690     Stack
    …694     Stack
    …698     Stack                _stack
    …69c     Stack
    …6a0     Monitor              _stack_base
    …6a4     Monitor
    …6a8     Interpreter state    _monitor_base
      .
      .
      .
    …6ec
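Both diagrams follow one rule, which the hypothetical C++ helper below spells out. It assumes only what the diagrams show:

- words are 4 bytes;
- each monitor takes two words and sits directly below the interpreter state;
- _stack points at the next free slot;
- _stack_limit is one word below the lowest stack slot.

The struct and function names are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper (names are illustrative, not HotSpot's): compute
// where the interpreterState pointers land in a PPC32 frame, following
// the two diagrams. Addresses decrease as the expression stack grows.
struct StatePointers {
  std::uint32_t monitor_base;
  std::uint32_t stack_base;
  std::uint32_t stack;
  std::uint32_t stack_limit;
};

StatePointers state_pointers_sketch(std::uint32_t state_addr,
                                    std::uint32_t max_stack,
                                    std::uint32_t monitors,
                                    std::uint32_t pushed) {
  const std::uint32_t word = 4;           // PPC32 word size
  const std::uint32_t monitor_words = 2;  // each monitor spans two words
  StatePointers p;
  p.monitor_base = state_addr;  // monitors sit directly below the state
  p.stack_base = p.monitor_base - monitors * monitor_words * word;
  // _stack points at the next free slot, one word below the last item pushed.
  p.stack = p.stack_base - (pushed + 1) * word;
  // _stack_limit sits one word below the lowest stack slot.
  p.stack_limit = p.stack_base - (max_stack + 1) * word;
  return p;
}
```

Pass the state address 0x6a8, a four-word stack and one pushed item. With zero monitors you get the first diagram, and with one monitor you get the second.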

My landline went dead shortly after I started work today. I managed to keep coding for an hour or so, but my PPC machine is in Boston and there’s only so much you can do blind. It turned out to be a blessing in disguise, however. I’ve been meaning to document the interpreter calling convention for some time, but every time I tried I got bored and went back to coding. Until today…

So, the interpreter calling convention. This is basically the register usage and the stack frame layout used within the interpreter. For the C++ interpreter, that means everything in between StubRoutines::call_stub() and BytecodeInterpreter::run(). And, while not immediately relevant, it also covers any code created by a JIT.

The register usage is the simplest bit. There are three symbolically named registers that are valid at all times within the interpreter:

Rmethod
The address of the current methodOop.
Rlocals
The address of the first local variable. Local variables are accessed with negative indices, so the address of the second local variable is Rlocals - wordSize, and so on. Note that Rlocals is essentially a stack pointer and is treated as such in the result convertors, so you need to be careful that it’s pointing where you expect while methods are returning.
Rstate
The address of the current interpreterState object.
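In C++ terms, the negative indexing of locals looks like the hypothetical helper below. It assumes wordSize equals sizeof(intptr_t), which holds on both PPC32 and PPC64.

```cpp
#include <cstdint>

// Hypothetical helper: locals are indexed downwards from Rlocals, so
// local `index` lives `index` words below local 0. Pointer arithmetic
// on intptr_t* scales by sizeof(intptr_t), i.e. one machine word.
inline std::intptr_t* local_address_sketch(std::intptr_t* rlocals, int index) {
  return rlocals - index;
}
```

Local 0 is at Rlocals itself, and local 1 is exactly one word lower.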

Rmethod, Rlocals and Rstate are mapped onto registers that the PPC ABIs define as non-volatile, so they do not need to be saved or restored around calls to native ABI functions.
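Under both the 32-bit SysV and 64-bit ELF PowerPC ABIs, general-purpose registers r14 to r31 are non-volatile (callee-saved). The sketch below shows one way to have the compiler check an assignment against that rule. The register numbers chosen for Rmethod, Rlocals and Rstate are purely hypothetical, not the ones the port actually uses.

```cpp
// Under both the 32-bit SysV and 64-bit ELF PowerPC ABIs, general-purpose
// registers r14 to r31 are non-volatile: a called function must preserve them.
constexpr bool is_nonvolatile_gpr(int reg) { return reg >= 14 && reg <= 31; }

// Hypothetical register numbers, for illustration only.
constexpr int kRmethod = 31;
constexpr int kRlocals = 30;
constexpr int kRstate = 29;

// A failure here would mean the register needs saving around native calls.
static_assert(is_nonvolatile_gpr(kRmethod), "Rmethod must be non-volatile");
static_assert(is_nonvolatile_gpr(kRlocals), "Rlocals must be non-volatile");
static_assert(is_nonvolatile_gpr(kRstate), "Rstate must be non-volatile");
```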

The stack is managed in accordance with the PPC ABIs, with r1 as the frame pointer and with frames laid out in the standard manner:

    | ...                  |  high addresses
+-> | Link area            |
|   +----------------------+
|   | Register save area   |
|   | Local variable space |
|   | Parameter list space |
+---+ Link area            |  low addresses
    +----------------------+

Each method has its own frame, the “local variable space” of which is laid out as follows:

    +----------------------+
    | interpreterState     |  high addresses
    +----------------------+
    | monitor 0            |
    |  ...                 |
    | monitor m            |
    +----------------------+
    | stack slot 0         |
    |  ...                 |
    | stack slot n         |
    +----------------------+
    | slop_factor          |
    +----------------------+
    | padding              |  low addresses
    +----------------------+
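As a worked check, the two PPC32 frame diagrams above add up under this layout. Reading sizes off them gives:

- an 8-byte link area (back chain plus LR save word);
- two 4-byte slop words;
- 4-byte stack slots;
- 8-byte monitors;
- a 72-byte interpreterState (…6a8 to …6ec).

The padding is assumed to exist only to keep the frame a multiple of 16 bytes, which is the alignment the ABIs require of the stack pointer. The helper is hypothetical.

```cpp
#include <cstdint>

// Hypothetical PPC32 frame-size calculation for an interpreter frame.
// Sizes are read off the frame diagrams; only the 16-byte alignment
// comes from the ABI. Padding (at the low end) absorbs the remainder.
struct FrameSize {
  std::uint32_t total;    // bytes from the back chain word to the top of the state
  std::uint32_t padding;  // bytes of alignment padding
};

FrameSize interpreter_frame_size_sketch(std::uint32_t max_stack,
                                        std::uint32_t monitors) {
  const std::uint32_t word = 4;
  const std::uint32_t link_area = 2 * word;     // back chain + LR save word
  const std::uint32_t slop = 2 * word;          // slop_factor
  const std::uint32_t monitor_size = 2 * word;  // one monitor
  const std::uint32_t state_size = 72;          // interpreterState, 0x6a8..0x6ef
  const std::uint32_t alignment = 16;

  const std::uint32_t used = link_area + slop + max_stack * word +
                             monitors * monitor_size + state_size;
  const std::uint32_t total = (used + alignment - 1) / alignment * alignment;
  return FrameSize{total, total - used};
}
```

Both frames come out at 112 bytes (…680 to …6ef). The monitor-less frame needs 8 bytes of padding, while the frame with one monitor needs none, exactly as the diagrams show.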

slop_factor is a hack. When a method is called, the callee’s parameters are pushed onto the caller’s stack. The callee pops these off and pushes its return value, if any. It is a given that the Java compiler allocates enough stack slots for the parameters of the methods a method calls. Nobody seems sure, though, that it allocates enough slots for a return value that needs more slots than the parameters did. This is dubiously referred to as “the static long no_params() issue”. slop_factor is essentially two extra words beyond the end of the expression stack (below it in memory, since the stack grows downwards) that protect whatever follows from being overwritten in this case. It’s probably unnecessary, but on PPC32 there is a fair chance that a stack overrun will corrupt the return address. That is just the kind of fun bug where the crash happens long after the cause. I left it in; I value my sanity over a few wasted bytes of memory.
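Why two words, specifically? Assume the worry is exactly that a return value can spill past the slots its parameters freed. A return value takes at most two slots (a long or a double), so the worst overshoot is the no_params() case itself. Here is a hypothetical helper making that arithmetic explicit:

```cpp
#include <algorithm>

// Hypothetical helper: how many slots a call's return value can extend
// past the space its parameters occupied on the caller's stack.
// Java values take one slot, except long and double which take two.
constexpr int return_overshoot(int param_slots, int return_slots) {
  return std::max(0, return_slots - param_slots);
}

// Worst case over all calls: no parameters and a two-slot return value,
// i.e. "static long no_params()". Two slop words cover it.
constexpr int kSlopWords = 2;
static_assert(return_overshoot(0, 2) == kSlopWords, "no_params() case");
```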

The state-monitors-stack ordering is not random. On entry to a method, the caller’s frame will be at the top of the stack. Rlocals will be pointing at an expression stack slot within that frame (or at the higher slop word, in frames with no stack). A method’s N parameters are its first N local variables, and in methods with more local variables than parameters, this frame (the caller’s frame) may be extended to accommodate them. Having the expression stack as the lowest thing in the frame minimises the amount of stuff that needs moving when a frame is extended. Of course, this means that the entire expression stack needs moving every time a monitor is allocated. However, synchronized methods pre-allocate a monitor, so the monitor list needs extending much less frequently than the expression stack. Having it this way round means the monitors never move, which is nice because you can’t move a monitor without a safepoint.
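How much does the caller’s frame have to grow? A conservative hypothetical sketch follows. It assumes each extra local takes one word and that the growth is rounded up so r1 stays 16-byte aligned, as the ABIs require. It also ignores any free expression stack slots the caller might have spare, so the real code may well extend by less.

```cpp
#include <cstdint>

// Hypothetical: bytes by which the caller's frame must be extended so a
// callee has room for all its locals. The first size_of_parameters locals
// are the parameters already sitting on the caller's expression stack.
std::uint32_t frame_extension_bytes_sketch(std::uint32_t max_locals,
                                           std::uint32_t size_of_parameters,
                                           std::uint32_t word_size) {
  if (max_locals <= size_of_parameters) return 0;  // no extra locals needed
  const std::uint32_t alignment = 16;  // PPC ABI stack alignment
  const std::uint32_t extra = (max_locals - size_of_parameters) * word_size;
  return (extra + alignment - 1) / alignment * alignment;
}
```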

I’m fading now, but one final point is that random frame extension means you cannot unwind frames assuming they’re the same size they were when you created them.
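Concretely, unwinding has to follow the back chain word that the PPC ABIs store at the bottom of every frame. It cannot simply add a remembered frame size to r1. Below is a sketch over a simulated stack. Word 0 of each frame holds the address of the caller’s frame, and in this sketch a null back chain ends the walk. The function name is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: walk a chain of PPC-style frames. Word 0 of each
// frame (the back chain) holds the address of the caller's frame, and a
// null back chain ends the walk. Because the walk never assumes a frame
// size, frames that were extended after creation unwind correctly.
std::vector<const std::uintptr_t*> walk_back_chain_sketch(
    const std::uintptr_t* sp) {
  std::vector<const std::uintptr_t*> frames;
  while (sp != nullptr) {
    frames.push_back(sp);
    sp = reinterpret_cast<const std::uintptr_t*>(sp[0]);  // follow back chain
  }
  return frames;
}
```

The frames in the test below have deliberately uneven sizes. The walk still visits each one, because it only ever reads the back chain.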

Being at a point where I’m interpreting bytecodes is really cool: every new thing I implement gets the interpreter a whole load further. I’ve now executed 70 instructions, including calls to both native and non-native methods with both void and non-void return values. It’s currently stopping at the start of the first method to require more locals than it takes parameters: the first method whose frame may need expanding, in other words. This might be slightly complicated by the fact that Rlocals is not used exactly as I thought it would be when I planned the frame-expanding code. I’m hoping this doesn’t matter.