The State Decacher and Other Animals

It’s been a while. Here’s where I am:

Benchmark  Status  Detail
antlr      FAIL    too many open files
bloat      pass    699657ms
chart      pass    342527ms
eclipse    FAIL    one method miscompiles, one method won’t compile
fop        pass    35198ms
hsqldb     pass    178011ms
jython     pass    983272ms
luindex    pass    140654ms
lusearch   FAIL    segfault
pmd        pass    456881ms
xalan      pass    148200ms

After implementing deoptimization and the remaining bytecodes I’ve been taking some time to rewrite the state cache and decache code. When methods are compiled, the local variables and expression stack mostly end up in registers, but when you enter the VM some of the locals and stack slots need to be accessible. Garbage collection can happen when you invoke Java methods or call VM functions, for example, so object pointers need to be visible. The way this works in Shark is that each method allocates a frame on the call stack at entry, with enough space to store all of its locals and stack slots. At VM entry, whatever slots are needed are written to the frame (“decached”), and on return any object pointers are reloaded (“cached”) in case they changed.
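To make the decache/cache round trip concrete, here is a toy model in Python. It is not Shark's code: `Slot`, `Frame`, `fake_gc` and the string "addresses" are all invented for illustration, and the only thing it tries to show is the order of events (write slots to the frame, let the VM change them, reload the object pointers).

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    value: object
    is_object: bool  # True if the slot holds an object pointer the GC may move


@dataclass
class Frame:
    """Hypothetical stand-in for the per-method frame on the call stack."""
    memory: dict = field(default_factory=dict)


def decache(registers, frame, needed):
    """Write the slots the VM needs to see from 'registers' into the frame."""
    for index in needed:
        frame.memory[index] = registers[index].value


def cache(registers, frame, needed):
    """Reload only object pointers from the frame, in case the VM changed them."""
    for index in needed:
        if registers[index].is_object:
            registers[index].value = frame.memory[index]


def fake_gc(frame, moved):
    """Pretend collector: rewrite any frame slot whose object was moved."""
    for index in list(frame.memory):
        old = frame.memory[index]
        frame.memory[index] = moved.get(old, old)


def demo():
    registers = {0: Slot(41, False), 1: Slot("obj@0x1000", True)}
    frame = Frame()
    decache(registers, frame, needed=[0, 1])       # VM entry
    fake_gc(frame, moved={"obj@0x1000": "obj@0x2000"})
    cache(registers, frame, needed=[0, 1])         # return from the VM
    return registers[0].value, registers[1].value
```

Calling `demo()` leaves the integer untouched and picks up the moved object pointer, which is the whole reason the reload step exists.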

Unfortunately, over time, the cache and decache functions have become ridiculously overcomplicated. The problem is that there are three types of decache and cache (for a Java call, for a VM call, and for deoptimization), each with its own rules for exactly what needs writing and rereading. The story ends there for cache, but a decache has three separate jobs: it must generate the actual code to write the values, it must tell the garbage collector which slots contain objects, and it must describe the frame to the stack trace code. This is all further complicated by the fact that Shark uses a compressed expression stack, where long and double values take up one slot, whereas HotSpot uses an expanded version where they take up two.
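The compressed versus expanded stack difference can be sketched in a few lines of Python. This is only an illustration of the slot counting: the type names are plain strings, `expand` is a made-up helper, and the position of the padding slot relative to the value is an arbitrary choice here, not a statement about HotSpot's real layout.

```python
# Types that occupy two slots in the expanded layout.
TWO_SLOT_TYPES = {"long", "double"}

# Marker for the extra slot; purely illustrative.
PAD = "<pad>"


def expand(compressed):
    """Turn a compressed stack (one entry per value) into an expanded one
    where long and double values take up two slots."""
    expanded = []
    for type_name in compressed:
        expanded.append(type_name)
        if type_name in TWO_SLOT_TYPES:
            expanded.append(PAD)
    return expanded


def expanded_depth(compressed):
    """Number of slots the same stack needs in the expanded layout."""
    return len(expand(compressed))
```

For example, `expand(["int", "long", "object"])` gives a four-slot stack, so any code that indexes slots has to know which of the two layouts it is looking at.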

I’m pretty sure that the Eclipse failure is a decache failure, and I’m leaning towards the other two being decache failures as well, hence the rewrite. It’s in two parts: the first moves the interface between the compressed and expanded stacks from the decacher code into the bit that parses the bytecode, and the second abstracts everything so that cache and decache use as much of the same code as possible. Currently there is not much sharing between the two, and it’s messy.

The first part is done, and seems pretty stable. The only place where compressed stacks now exist is in the SharkBlock class, and where it was necessary or useful to expose the expanded stack I’ve prefixed the method names with “x”.

The second part is a work in progress…
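As a rough illustration of what sharing code between cache and decache could look like (not the actual Shark design, which isn't described here), one option is a single scanner that walks the slots once, with cache and decache as small subclasses. Every name below is hypothetical, and the rule "decache writes every slot" is a placeholder for the real per-call-type rules.

```python
class SlotScanner:
    """Shared traversal: visits each slot once and collects actions."""

    def __init__(self):
        self.actions = []

    def scan(self, slots):
        # Each slot is a (name, is_object) pair in this toy model.
        for index, (name, is_object) in enumerate(slots):
            self.visit(index, name, is_object)
        return self.actions

    def visit(self, index, name, is_object):
        raise NotImplementedError


class Decacher(SlotScanner):
    """Records a write per slot and notes which slots hold objects."""

    def __init__(self):
        super().__init__()
        self.oop_map = []  # indices the garbage collector must know about

    def visit(self, index, name, is_object):
        self.actions.append(("write", index, name))  # placeholder rule
        if is_object:
            self.oop_map.append(index)


class Cacher(SlotScanner):
    """Records a reload only for slots that hold object pointers."""

    def visit(self, index, name, is_object):
        if is_object:
            self.actions.append(("reload", index, name))
```

The point of the shape is that the loop over slots lives in one place, so the two directions cannot drift apart in how they number or order the slots.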

DaCapo status

I’ve been working on DaCapo for nearly two weeks now, so I took a bit of time out today to figure out where I am with it:

Benchmark  Status  Detail                   Unimplemented bytecodes
antlr      FAIL    too many open files      jsr (once)
bloat      pass    718149ms                 multianewarray (once)
chart      pass    337240ms
eclipse    FAIL    requires deoptimization
fop        pass    37126ms
hsqldb     pass    178120ms
jython     FAIL    requires deoptimization  jsr (21 times)
luindex    pass    149362ms                 jsr (3 times)
lusearch   FAIL    segfault
pmd        pass    457936ms                 jsr (15 times)
xalan      pass    174340ms                 jsr (8 times)

Note that this is a debug build, with optimization disabled and assertions enabled, so the times are in no way representative.