It’s been a while. Here’s where I am:
Benchmark | Status | Detail |
---|---|---|
antlr | FAIL | too many open files |
bloat | pass | 699657ms |
chart | pass | 342527ms |
eclipse | FAIL | one method miscompiles, one method won’t compile |
fop | pass | 35198ms |
hsqldb | pass | 178011ms |
jython | pass | 983272ms |
luindex | pass | 140654ms |
lusearch | FAIL | segfault |
pmd | pass | 456881ms |
xalan | pass | 148200ms |
After implementing deoptimization and the remaining bytecodes I’ve been taking some time to rewrite state cache and decache. When methods are compiled, the local variables and expression stack mostly end up in registers, but when you enter the VM some of the locals and stack slots need to be accessible. Garbage collection can happen when you invoke Java methods or call VM functions, for example, so object pointers need to be visible. The way this works in Shark is that each method allocates a frame on the call stack at entry with enough space to store all of its locals and stack slots. At VM entry, whatever slots are needed are written to the frame (“decached”), and on return any object pointers are reloaded (“cached”) in case they changed.
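The decache and cache cycle described above can be sketched with a toy model. This is only an illustration in Python, not Shark's actual C++ code: the names `Slot`, `Frame`, `decache`, `cache` and `toy_gc` are hypothetical, and the sketch writes every slot to the frame rather than only the ones that are needed.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One local or expression-stack entry, imagined to live in a register."""
    value: object
    is_object: bool  # True if the value is an object pointer


@dataclass
class Frame:
    """Per-method frame with room for every local and stack slot."""
    memory: list = field(default_factory=list)


def decache(slots, frame):
    """Write slots to the frame before entering the VM.

    Simplification: every slot is written. Returns the indices that hold
    object pointers, standing in for what the collector is told.
    """
    frame.memory = [s.value for s in slots]
    return [i for i, s in enumerate(slots) if s.is_object]


def toy_gc(frame, oop_indices, moved):
    """Pretend collector: relocate any object listed in `moved`."""
    for i in oop_indices:
        frame.memory[i] = moved.get(frame.memory[i], frame.memory[i])


def cache(slots, frame):
    """Reload object pointers on return; non-object slots are left alone."""
    for i, s in enumerate(slots):
        if s.is_object:
            s.value = frame.memory[i]


slots = [Slot(42, False), Slot("obj@0x1000", True)]
frame = Frame()
oop_indices = decache(slots, frame)                           # VM entry
toy_gc(frame, oop_indices, {"obj@0x1000": "obj@0x2000"})      # object moves
cache(slots, frame)                                           # VM return
```

After the cycle the integer slot is unchanged, while the object slot holds the relocated pointer that the pretend collector wrote into the frame.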
Unfortunately, over time, the cache and decache functions have become ridiculously overcomplicated. The problem is that there are three types of decache and cache (for a Java call, for a VM call, and for deoptimization), each with its own rules for exactly what needs writing and rereading. The story ends there for cache, but a decache has three separate jobs: it must generate the actual code to write the values, it must tell the garbage collector which slots contain objects, and it must describe the frame to the stack trace code. This is all further complicated by the fact that Shark uses a compressed expression stack, where long and double values take up one slot, whereas HotSpot uses an expanded version where they take up two.
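To make the compressed versus expanded distinction concrete, here is a small Python sketch. The function names and the `"(second half)"` placeholder are invented for illustration, and where the extra slot sits relative to the value is an assumption of the sketch, not something stated above.

```python
# Types that occupy two slots on the expanded stack.
TWO_SLOT_TYPES = {"long", "double"}


def expand(compressed):
    """Turn a compressed stack (one entry per value) into an expanded one."""
    expanded = []
    for kind in compressed:
        expanded.append(kind)
        if kind in TWO_SLOT_TYPES:
            # Hypothetical placeholder for the extra slot.
            expanded.append("(second half)")
    return expanded


def expanded_depth(compressed):
    """Number of slots the same values need on the expanded stack."""
    return sum(2 if kind in TWO_SLOT_TYPES else 1 for kind in compressed)


compressed_stack = ["int", "long", "object", "double"]
expanded_stack = expand(compressed_stack)
```

Four values on the compressed stack become six slots on the expanded one, so any code that indexes slots has to know which view it is working with.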
I’m pretty sure that the Eclipse failure is a decache failure, and I’m leaning towards the other two failures being decache failures as well, hence the rewrite. It’s in two parts, the first being to move the interface between the compressed and expanded stacks from the decacher code into the bit that parses the bytecode, and the second being to abstract everything so that cache and decache are using as much of the same code as possible. Currently there is not much sharing between the two, and it’s messy.
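One way the sharing could look, purely as a guess at the shape of such an abstraction and not a description of the actual rewrite: a single traversal over the slots, with cache and decache differing only in which slots they want and what they do with them. All class names here are hypothetical.

```python
class SlotWalker:
    """Shared traversal; subclasses supply the per-slot policy."""

    def __init__(self, frame):
        self.frame = frame

    def walk(self, slots):
        for index, slot in enumerate(slots):
            if self.wants(slot):
                self.process(index, slot)

    def wants(self, slot):
        raise NotImplementedError

    def process(self, index, slot):
        raise NotImplementedError


class Decacher(SlotWalker):
    def wants(self, slot):
        return True  # simplification: write every slot

    def process(self, index, slot):
        self.frame[index] = slot["value"]


class Cacher(SlotWalker):
    def wants(self, slot):
        return slot["is_object"]  # only object pointers are reloaded

    def process(self, index, slot):
        slot["value"] = self.frame[index]


walk_slots = [
    {"value": 7, "is_object": False},
    {"value": "a", "is_object": True},
]
walk_frame = [None, None]
Decacher(walk_frame).walk(walk_slots)
walk_frame[0], walk_frame[1] = 99, "b"  # pretend the VM changed the frame
Cacher(walk_frame).walk(walk_slots)
```

The three kinds of call (Java, VM, deoptimization) would then be further subclasses that override only `wants`, which is where their rules differ.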
The first part is done, and seems pretty stable. The only place where compressed stacks now exist is in the SharkBlock class, and where it was necessary or useful to expose the expanded stack I’ve prefixed the method names with “x”.
The second part is a work in progress…