Super-dirty jtreg hacking

Today I made my second official patch to OpenJDK. I forgot how to make the jtreg test and had to figure it out all over again, so here’s my quick and dirty guide for the future:

Build jtreg. I use the IcedTea one, because it’s there:
```
make jtreg
```

Make a test root and copy your test into it:

mkdir -p tests/tests
touch tests/TEST.ROOT
mv ~/Test6779290.java tests/tests

Run the tests:

openjdk-ecj/control/build/linux-ppc/j2sdk-image/jre/bin/java -jar test/jtreg.jar -v1 -s tests
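
For future reference, a jtreg test is just a Java class with a comment block of tags at the top. The sketch below is a minimal placeholder, not the actual Test6779290.java: the class name `HelloJtreg`, the summary text and the `add` check are all made up for illustration. As far as I know, jtreg counts a `main` test as passed if `main` returns normally and as failed if it throws.

```java
/*
 * @test
 * @summary Placeholder test (hypothetical; not the real Test6779290)
 * @run main HelloJtreg
 */
public class HelloJtreg {
    // Hypothetical code under test: stands in for whatever the patch fixes.
    public static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int result = add(2, 3);
        if (result != 5) {
            // Throwing any exception makes jtreg report the test as failed.
            throw new RuntimeException("expected 5, got " + result);
        }
        // Returning normally makes jtreg report the test as passed.
    }
}
```

Saved as tests/tests/HelloJtreg.java, it should be picked up by the same run command as the real test.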

In other news, it’s over a year since I started hacking on Zero. I was hoping to be able to announce a TCK-passing build before Christmas, but that’s not going to happen. Oh well.

Fedora 10

Apparently Fedora 10’s eclipse-ecj doesn’t have gcj-compiled libraries any more. Never mind:

mkdir /usr/lib/gcj/eclipse-ecj
aot-compile -c "-O3" /usr/lib/eclipse/dropins/jdt/plugins /usr/lib/gcj/eclipse-ecj
rebuild-gcj-db

Also, whilst I’m messing with my system, I’ve always had to do the following for ppc64 builds to work:

mkdir -p /usr/lib/jvm/java-gcj/jre/lib/ppc64/server
ln -s /usr/lib64/gcj-4.3.2/libjvm.so /usr/lib/jvm/java-gcj/jre/lib/ppc64/server

I never figured out how anyone else manages without this. Maybe nobody else is trying to build two platforms on the one box.

Not dead

For anyone who’s wondering what I’ve been up to, I’ve taken a bit of a break from Shark these past few weeks in order to concentrate on getting a build of Zero through the JCK.

Update

With talk of a new IcedTea release I thought I’d better commit what I had of Shark ready for it. I found a couple of what look like optimizer failures while testing (usually I build with optimization disabled, for debugging) but I managed to work around those this morning and get a set of DaCapo results:

Benchmark	Status	Detail
antlr	FAIL	too many open files
bloat	pass	83178ms
chart	pass	47227ms
eclipse	FAIL	one method miscompiles, one method won’t compile
fop	pass	15762ms
hsqldb	pass	21190ms
jython	pass	67533ms
luindex	pass	35567ms
lusearch	pass	35633ms
pmd	pass	60637ms
xalan	pass	48422ms

These are still with a non-optimized LLVM, but the numbers are much closer to what I was hoping for than the previous sets.
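
To put a number on “much closer”, here is a rough sketch that divides the older timings by the newer ones. It assumes the previous set is the table under “The State Decacher and Other Animals”, and it only includes benchmarks that passed in both runs. The class and method names are made up, and the two builds differ in more than one way, so the ratios are only a ballpark.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DacapoSpeedup {
    /** Ratio of old time to new time; values above 1.0 mean the new run was faster. */
    public static double speedup(long oldMs, long newMs) {
        return (double) oldMs / newMs;
    }

    public static void main(String[] args) {
        // {older time, newer time} in ms, copied from the two tables.
        Map<String, long[]> times = new LinkedHashMap<>();
        times.put("bloat",   new long[] {699657, 83178});
        times.put("chart",   new long[] {342527, 47227});
        times.put("fop",     new long[] {35198, 15762});
        times.put("hsqldb",  new long[] {178011, 21190});
        times.put("jython",  new long[] {983272, 67533});
        times.put("luindex", new long[] {140654, 35567});
        times.put("pmd",     new long[] {456881, 60637});
        times.put("xalan",   new long[] {148200, 48422});

        // Print one line per benchmark, e.g. "bloat    8.4x".
        for (Map.Entry<String, long[]> e : times.entrySet()) {
            long[] t = e.getValue();
            System.out.printf("%-8s %.1fx%n", e.getKey(), speedup(t[0], t[1]));
        }
    }
}
```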

Quickie

I have a really bad cold, but I fixed the lusearch bug.

The State Decacher and Other Animals

It’s been a while. Here’s where I am:

Benchmark	Status	Detail
antlr	FAIL	too many open files
bloat	pass	699657ms
chart	pass	342527ms
eclipse	FAIL	one method miscompiles, one method won’t compile
fop	pass	35198ms
hsqldb	pass	178011ms
jython	pass	983272ms
luindex	pass	140654ms
lusearch	FAIL	segfault
pmd	pass	456881ms
xalan	pass	148200ms

After implementing deoptimization and the remaining bytecodes I’ve been taking some time to rewrite state cache and decache. When methods are compiled, the local variables and expression stack mostly end up in registers, but when you enter the VM some of the locals and stack slots need to be accessible. Garbage collection can happen when you invoke Java methods or call VM functions, for example, so object pointers need to be visible. The way this works in Shark is that each method allocates a frame on the call stack at entry, with enough space to store all of its locals and stack slots. At VM entry, whatever slots are needed are written to the frame (“decached”), and on return any object pointers are reloaded (“cached”) in case they changed.
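
As a toy model of that round trip (nothing like the real Shark code, which emits LLVM IR from C++), the sketch below keeps “registers” in an array, decaches them to a frame, lets a pretend moving collector relocate the object pointers it has been told about, and then caches only those pointers back. Every name here, `Frame`, `simulateGc` and the rest, is hypothetical.

```java
import java.util.Arrays;

public class DecacheSketch {
    /** Frame on the call stack: one slot per local or stack value (hypothetical layout). */
    public static final class Frame {
        public final long[] slots;
        public final boolean[] isObject; // what the GC is told: which slots hold object pointers

        public Frame(int size) {
            slots = new long[size];
            isObject = new boolean[size];
        }
    }

    /** Decache: write register values to the frame before entering the VM. */
    public static void decache(long[] regs, boolean[] objectSlots, Frame frame) {
        for (int i = 0; i < regs.length; i++) {
            frame.slots[i] = regs[i];
            frame.isObject[i] = objectSlots[i];
        }
    }

    /** Stand-in for a moving GC: shifts every object pointer it can see by delta. */
    public static void simulateGc(Frame frame, long delta) {
        for (int i = 0; i < frame.slots.length; i++) {
            if (frame.isObject[i]) {
                frame.slots[i] += delta;
            }
        }
    }

    /** Cache: reload only the object pointers, since only they may have changed. */
    public static void cache(long[] regs, Frame frame) {
        for (int i = 0; i < regs.length; i++) {
            if (frame.isObject[i]) {
                regs[i] = frame.slots[i];
            }
        }
    }

    public static void main(String[] args) {
        long[] regs = {42, 0x1000, 7};          // slot 1 is an "object pointer"
        boolean[] objs = {false, true, false};
        Frame frame = new Frame(regs.length);

        decache(regs, objs, frame);
        simulateGc(frame, 0x100);               // the object moves while we are in the VM
        cache(regs, frame);

        System.out.println(Arrays.toString(regs)); // prints [42, 4352, 7]
    }
}
```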

Unfortunately, over time, the cache and decache functions have become ridiculously overcomplicated. The problem is that there are three types of decache and cache (for a Java call, for a VM call, and for deoptimization), each with its own rules for exactly what needs writing and rereading. The story ends there for cache, but a decache has three separate jobs: it must generate the actual code to write the values, it must tell the garbage collector which slots contain objects, and it must describe the frame to the stack trace code. This is all further complicated by the fact that Shark uses a compressed expression stack, where long and double values take up one slot, whereas HotSpot uses an expanded version where they take up two.
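
The compressed versus expanded mismatch is easy to show in miniature. The sketch below is an illustration only, with made-up names, and it counts slots from the bottom of the stack without trying to match HotSpot’s actual frame layout. It works out how deep a compressed stack becomes once longs and doubles take two slots, and where a given compressed entry starts in the expanded view.

```java
public class StackSlotsSketch {
    /** Hypothetical value types; LONG and DOUBLE are the two-word ones. */
    public enum Type {
        INT, FLOAT, OBJECT, LONG, DOUBLE;

        /** Slots this type occupies on the expanded stack. */
        public int expandedSize() {
            return (this == LONG || this == DOUBLE) ? 2 : 1;
        }
    }

    /** Total number of expanded slots for a compressed stack. */
    public static int expandedDepth(Type[] compressed) {
        int depth = 0;
        for (Type t : compressed) {
            depth += t.expandedSize();
        }
        return depth;
    }

    /** Expanded slot at which the compressed entry at index begins. */
    public static int expandedIndex(Type[] compressed, int index) {
        int slot = 0;
        for (int i = 0; i < index; i++) {
            slot += compressed[i].expandedSize();
        }
        return slot;
    }

    public static void main(String[] args) {
        Type[] stack = {Type.INT, Type.LONG, Type.OBJECT, Type.DOUBLE};
        System.out.println(stack.length);            // prints 4 (compressed slots)
        System.out.println(expandedDepth(stack));    // prints 6 (expanded slots)
        System.out.println(expandedIndex(stack, 2)); // prints 3 (OBJECT starts at slot 3)
    }
}
```

Every decache that has to do this translation itself ends up repeating the same bookkeeping, which is presumably why pushing it into one place is attractive.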

I’m pretty sure that the Eclipse failure is a decache failure, and I’m leaning towards the other two being decache failures as well, hence the rewrite. It’s in two parts, the first being to move the interface between the compressed and expanded stacks from the decacher code into the bit that parses the bytecode, and the second being to abstract everything so that cache and decache are using as much of the same code as possible. Currently there is not much sharing between the two, and it’s messy.

The first part is done, and seems pretty stable. The only place where compressed stacks now exist is in the SharkBlock class, and where it was necessary or useful to expose the expanded stack I’ve prefixed the method names with “x”.

The second part is a work in progress…

Shark, now with 100% bytecode coverage

I did jsr, ret and multianewarray today; Shark now has 100% bytecode coverage.

DaCapo status

I’ve been working on DaCapo for nearly two weeks now, so I took a bit of time out today to figure out where I am with it:

Benchmark	Status	Detail	Unimplemented bytecodes
antlr	FAIL	too many open files	jsr (once)
bloat	pass	718149ms	multianewarray (once)
chart	pass	337240ms
eclipse	FAIL	requires deoptimization
fop	pass	37126ms
hsqldb	pass	178120ms
jython	FAIL	requires deoptimization	jsr (21 times)
luindex	pass	149362ms	jsr (3 times)
lusearch	FAIL	segfault
pmd	pass	457936ms	jsr (15 times)
xalan	pass	174340ms	jsr (8 times)

Note that this is a debug build, with no optimization and assertions enabled, so the times are in no way representative.

DaCapo

This past week or so I’ve been trying to get the DaCapo benchmarks running on Shark. It’s a total baptism of fire. ANTLR uses exceptions extensively, so I’ve had to implement exception handling. FOP is multithreaded, so I’ve had to implement slow-path monitor acquisition and release (all of synchronization is now done!). I’ve had to implement safepoints, unresolved field resolution, and unresolved method resolution for invokeinterface. I’ve had to replace the unentered block detection code to cope with the more complex flows introduced by exception handlers. I’ve fixed bugs in the divide-by-zero check, in aload, astore, checkcast and new, and to top it off I implemented lookupswitch for kicks. And I’m only halfway through the set of benchmarks…

Building Shark

For reference, this is how to reproduce my working environment and get a debuggable Shark built:

svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
cd llvm
./configure --with-pic --enable-pic
make
cd ..
hg clone http://icedtea.classpath.org/hg/icedtea6
cd icedtea6
curl http://gbenson.net/wp-content/uploads/2008/08/mixtec-hacks.patch | patch -p1
./autogen.sh
LLVM_CONFIG=$(dirname $PWD)/llvm/Debug/bin/llvm-config ./configure --enable-shark
make icedtea-against-ecj

After the initial make icedtea-against-ecj you can use make hotspot to rebuild only HotSpot.