Super-dirty jtreg hacking

Today I made my second official patch to OpenJDK. I forgot how to make the jtreg test and had to figure it out all over again, so here’s my quick and dirty guide for the future:

  1. Build jtreg. I use the IcedTea one, because it’s there:
    make jtreg
  2. Make a test root and copy your test into it:
    mkdir -p tests/tests
    touch tests/TEST.ROOT
    mv ~/Test6779290.java tests/tests
    
  3. Run the tests:
    openjdk-ecj/control/build/linux-ppc/j2sdk-image/jre/bin/java -jar test/jtreg.jar -v1 -s tests
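
For reference, a jtreg test is an ordinary Java class with a main method, plus a leading comment whose tags tell jtreg how to run it. The sketch below is a hypothetical stand-in and not the contents of Test6779290.java: the class name ExampleTest, the summary text and the check itself are all assumptions made for illustration. jtreg treats a normal return from main as a pass and an uncaught exception as a failure.

```java
/*
 * @test
 * @summary Hypothetical example: integer division by zero must throw
 * @run main ExampleTest
 */
public class ExampleTest {
    // Returns true if dividing by zero raises ArithmeticException.
    static boolean divideByZeroThrows(int numerator) {
        try {
            int zero = 0;
            int unused = numerator / zero;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Throwing from main is how a jtreg test reports failure.
        if (!divideByZeroThrows(42)) {
            throw new Exception("expected ArithmeticException");
        }
        System.out.println("ok");
    }
}
```

Saved as ExampleTest.java under tests/tests, a file like this should be picked up by the run in step 3.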

In other news, it’s been over a year since I started hacking on Zero. I was hoping to be able to announce a TCK-passing build before Christmas, but that’s not going to happen. Oh well.

Fedora 10

Apparently Fedora 10’s eclipse-ecj doesn’t have gcj-compiled libraries any more. Never mind:

mkdir /usr/lib/gcj/eclipse-ecj
aot-compile -c "-O3" /usr/lib/eclipse/dropins/jdt/plugins /usr/lib/gcj/eclipse-ecj
rebuild-gcj-db

Also, whilst I’m messing with my system, I’ve always had to do the following for ppc64 builds to work:

mkdir -p /usr/lib/jvm/java-gcj/jre/lib/ppc64/server
ln -s /usr/lib64/gcj-4.3.2/libjvm.so /usr/lib/jvm/java-gcj/jre/lib/ppc64/server

I never figured out how anyone else manages without this. Maybe nobody else is trying to build two platforms on the one box.

Update

With talk of a new IcedTea release I thought I’d better commit what I had of Shark ready for it. I found a couple of what look like optimizer failures while testing (usually I build with optimization disabled, for debugging) but I managed to work around those this morning and get a set of DaCapo results:

Benchmark  Status  Detail
antlr      FAIL    too many open files
bloat      pass    83178ms
chart      pass    47227ms
eclipse    FAIL    one method miscompiles, one method won’t compile
fop        pass    15762ms
hsqldb     pass    21190ms
jython     pass    67533ms
luindex    pass    35567ms
lusearch   pass    35633ms
pmd        pass    60637ms
xalan      pass    48422ms

These are still with a non-optimized LLVM, but the numbers are much closer to what I was hoping for than the previous sets.

The State Decacher and Other Animals

It’s been a while. Here’s where I am:

Benchmark  Status  Detail
antlr      FAIL    too many open files
bloat      pass    699657ms
chart      pass    342527ms
eclipse    FAIL    one method miscompiles, one method won’t compile
fop        pass    35198ms
hsqldb     pass    178011ms
jython     pass    983272ms
luindex    pass    140654ms
lusearch   FAIL    segfault
pmd        pass    456881ms
xalan      pass    148200ms

After implementing deoptimization and the remaining bytecodes I’ve been taking some time to rewrite the state cache and decache code. When methods are compiled, the local variables and expression stack mostly end up in registers, but when you enter the VM some of the locals and stack slots need to be accessible. Garbage collection can happen when you invoke Java methods or call VM functions, for example, so object pointers need to be visible. The way this works in Shark is that each method allocates a frame on the call stack at entry, with enough space to store all of its locals and stack slots. At VM entry, whatever slots are needed are written to the frame (“decached”), and on return any object pointers are reloaded (“cached”) in case they changed.
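
The decache and cache cycle can be pictured with a toy model. This is only a sketch under loud assumptions: Shark emits LLVM IR rather than running anything like this, and the class DecacheSketch, its fields and the pretend "moving collector" are all hypothetical names invented for illustration. For simplicity it decaches every slot, whereas the text above says only the needed slots are written.

```java
import java.util.Arrays;

// Toy model of decache/cache around a VM call. All names are hypothetical.
class DecacheSketch {
    final long[] registers;   // values the compiled method keeps "in registers"
    final boolean[] isObject; // which slots hold object pointers
    final long[] frame;       // frame allocated at method entry, one slot each

    DecacheSketch(long[] initial, boolean[] isObject) {
        this.registers = initial.clone();
        this.isObject = isObject.clone();
        this.frame = new long[initial.length];
    }

    // Decache: write register values to the frame so the VM can see them.
    void decache() {
        for (int i = 0; i < registers.length; i++) {
            frame[i] = registers[i];
        }
    }

    // Cache: reload only object pointers, since only those can have changed.
    void cache() {
        for (int i = 0; i < registers.length; i++) {
            if (isObject[i]) {
                registers[i] = frame[i];
            }
        }
    }

    // Stand-in for a VM call during which a collector relocates objects.
    void vmCallThatMovesObjects(long delta) {
        for (int i = 0; i < frame.length; i++) {
            if (isObject[i]) {
                frame[i] += delta;
            }
        }
    }

    static long[] demo() {
        DecacheSketch method = new DecacheSketch(
            new long[] {7, 0x1000, 3},
            new boolean[] {false, true, false});
        method.decache();
        method.vmCallThatMovesObjects(0x20);
        method.cache();
        return method.registers;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // [7, 4128, 3]
    }
}
```

Only the middle slot is an object pointer, so it is the only register that changes across the call.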

Unfortunately, over time, the cache and decache functions have become ridiculously overcomplicated. The problem is that there are three types of decache and cache (for a Java call, for a VM call, and for deoptimization), each with its own rules for exactly what needs writing and rereading. The story ends there for cache, but a decache has three separate functions: it must generate the actual code to write the values, it must tell the garbage collector which slots contain objects, and it must describe the frame to the stack trace code. This is all further complicated by the fact that Shark uses a compressed expression stack, where long and double values take up one slot, whereas HotSpot uses an expanded version where they take up two.
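
The difference between the compressed and expanded stacks comes down to slot arithmetic. The following is a minimal sketch of that arithmetic only; the class StackLayoutSketch and the Kind enum are hypothetical and not taken from Shark or HotSpot, and it does not model stack direction or which of the two expanded slots holds the value.

```java
// Slot arithmetic for a compressed stack (one slot per value) versus an
// expanded stack (long and double take two slots). Names are hypothetical.
class StackLayoutSketch {
    enum Kind {
        INT(1), FLOAT(1), OBJECT(1), LONG(2), DOUBLE(2);

        final int expandedSlots;

        Kind(int expandedSlots) {
            this.expandedSlots = expandedSlots;
        }
    }

    // Total number of slots the same values occupy in expanded form.
    static int expandedSize(Kind[] compressed) {
        int slots = 0;
        for (Kind kind : compressed) {
            slots += kind.expandedSlots;
        }
        return slots;
    }

    // Expanded offset of the first slot of the entry at a compressed index.
    static int expandedOffset(Kind[] compressed, int index) {
        int offset = 0;
        for (int i = 0; i < index; i++) {
            offset += compressed[i].expandedSlots;
        }
        return offset;
    }

    public static void main(String[] args) {
        Kind[] stack = {Kind.INT, Kind.LONG, Kind.OBJECT, Kind.DOUBLE};
        System.out.println(expandedSize(stack));      // 6
        System.out.println(expandedOffset(stack, 2)); // 3
    }
}
```

Four compressed entries become six expanded slots, which is why anything indexing the frame has to know which view it is using.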

I’m pretty sure that the Eclipse failure is a decache failure, and I’m leaning towards the other two failures being decache failures as well, hence the rewrite. It’s in two parts: the first is to move the interface between the compressed and expanded stacks from the decacher code into the bit that parses the bytecode, and the second is to abstract everything so that cache and decache use as much of the same code as possible. Currently there is not much sharing between the two, and it’s messy.

The first part is done, and seems pretty stable. The only place where compressed stacks now exist is in the SharkBlock class, and where it was necessary or useful to expose the expanded stack I’ve prefixed the method names with “x”.

The second part is a work in progress…

DaCapo status

I’ve been working on DaCapo for nearly two weeks now, so I took a bit of time out today to figure out where I am with it:

Benchmark  Status  Detail                   Unimplemented bytecodes
antlr      FAIL    too many open files      jsr (once)
bloat      pass    718149ms                 multianewarray (once)
chart      pass    337240ms
eclipse    FAIL    requires deoptimization
fop        pass    37126ms
hsqldb     pass    178120ms
jython     FAIL    requires deoptimization  jsr (21 times)
luindex    pass    149362ms                 jsr (3 times)
lusearch   FAIL    segfault
pmd        pass    457936ms                 jsr (15 times)
xalan      pass    174340ms                 jsr (8 times)

Note that this is a debug build, with no optimization and assertions enabled, so the times are in no way representative.

DaCapo

This past week or so I’ve been trying to get the DaCapo benchmarks running on Shark. It’s a total baptism of fire. ANTLR uses exceptions extensively, so I’ve had to implement exception handling. FOP is multithreaded, so I’ve had to implement slow-path monitor acquisition and release (all of synchronization is now done!). I’ve had to implement safepoints, unresolved field resolution, and unresolved method resolution for invokeinterface. I’ve had to replace the unentered block detection code to cope with the more complex flows introduced by exception handlers. I’ve fixed bugs in the divide-by-zero check, in aload, astore, checkcast and new, and to top it off I implemented lookupswitch for kicks. And I’m only halfway through the set of benchmarks…
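
To make a few of those features concrete, here is a small hypothetical smoke test touching three of them: a caught exception from the divide-by-zero check, a switch with sparse case values (which javac normally compiles to lookupswitch), and two threads contending for one monitor. The class FeatureSmokeSketch and its methods are invented for illustration and are not part of DaCapo or Shark, and contention like this may or may not reach the slow path on any given run.

```java
// Hypothetical smoke test for a few JVM features. Names are invented.
class FeatureSmokeSketch {
    private static int counter = 0;

    // Synchronized method: the VM acquires and releases the class monitor
    // around each call.
    static synchronized void bump() {
        counter++;
    }

    // Sparse case values, which javac normally compiles to lookupswitch.
    static String classify(int code) {
        switch (code) {
            case 1:       return "one";
            case 1000:    return "thousand";
            case 1000000: return "million";
            default:      return "other";
        }
    }

    // Integer division by zero must surface as a catchable exception.
    static String divide(int n, int d) {
        try {
            return Integer.toString(n / d);
        } catch (ArithmeticException e) {
            return "caught";
        }
    }

    // Two threads incrementing a shared counter under one monitor.
    static int contend(final int perThread) {
        counter = 0;
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    bump();
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(classify(1000));  // thousand
        System.out.println(divide(1, 0));    // caught
        System.out.println(contend(10000));  // 20000
    }
}
```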

Building Shark

For reference, this is how to reproduce my working environment and get a debuggable Shark built:

svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
cd llvm
./configure --with-pic --enable-pic
make
cd ..
hg clone http://icedtea.classpath.org/hg/icedtea6
cd icedtea6
curl http://gbenson.net/wp-content/uploads/2008/08/mixtec-hacks.patch | patch -p1
./autogen.sh
LLVM_CONFIG=$(dirname $PWD)/llvm/Debug/bin/llvm-config ./configure --enable-shark
make icedtea-against-ecj

After the initial make icedtea-against-ecj you can use make hotspot to rebuild only HotSpot.