I just committed what I have of Shark to icedtea6 hg. Its version number should be taken as an indication of its level of completeness. To build it you need to use ./configure --enable-shark with a very recent LLVM installed on your system. I’m using 52487…
I spent quite a lot of time tracking down a miscompilation in LLVM lately but I’m pretty much back on track now. One fun thing is that I figured out how to add diagnostic options to OpenJDK, so when I eventually release Shark you’ll be able to debug it with the following exciting options:
- -XX:SharkStartAt=N: Start compiling only after N compilation requests. (Use in conjunction with -XX:+PrintCompilation.)
- -XX:SharkStopAfter=N: Stop compiling after N compilation requests.
- -XX:SharkDumpModuleAfter=N: Dump all generated LLVM bitcode (to hotspot.bc) after N compilation requests.
- -XX:SharkPrintTypeflowAfter=N: Print the results of the typeflow pass of the Nth compilation request.
- -XX:+SharkTraceBytecodes: Print the name and bci of each bytecode compiled. (Handy for crashes.)
I tend to use them in pairs, e.g. -XX:Shark{StartAt,DumpModuleAfter}=17, which the shell's brace expansion turns into -XX:SharkStartAt=17 -XX:SharkDumpModuleAfter=17.
This list will probably get real old real quick, but the current options will all be listed in ports/hotspot/src/share/vm/shark/shark_globals.hpp.
Keeping the (Java) stack and locals in registers is all very well, but it makes method calls and safepoints tricky since previously everything was on the (Zero) stack where it needed to be and now it isn’t.
So here it is, the first method from the all new Shark, String.hashCode(). And the same method from old Shark for comparison. Some highlights:
- The inner loop for the actual calculation is 18 instructions (down from 53)
- A complete pass through the method when fetching a cached hashcode is 39 instructions (down from 66)
All in all I’m pretty pleased with myself.
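For readers who don't have the method in their head, here is a simplified Java model of what String.hashCode() does. It is a sketch, not the OpenJDK source: the class name HashSketch and the method name hashCodeSketch are made up for illustration, and the real String class of this era also carries offset and count fields that are left out here. The for loop is the "inner loop" counted above, and the path where the hash field is already non-zero is the "cached hashcode" pass.

```java
// A simplified model of java.lang.String.hashCode().
// HashSketch and hashCodeSketch are hypothetical names, not OpenJDK code.
public final class HashSketch {
    private final char[] value;
    private int hash; // 0 means "not computed yet"

    public HashSketch(String s) {
        this.value = s.toCharArray();
    }

    public int hashCodeSketch() {
        int h = hash;                      // field access
        if (h == 0 && value.length > 0) {  // conditional branch
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + value[i];     // array load plus integer arithmetic
            }
            hash = h;                      // cache the result for next time
        }
        return h;                          // return with a result
    }

    public static void main(String[] args) {
        HashSketch s = new HashSketch("shark");
        // Compare against the real implementation.
        System.out.println(s.hashCodeSketch() == "shark".hashCode());
    }
}
```

The second and later calls on the same object skip the loop entirely, which is why the cached pass is the shorter of the two instruction counts.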
I nearly have String.hashCode() compiling with Shark II and the code is so short it’s unbelievable!
I started tearing Shark apart yesterday. I had a chat with Tom Tromey on Saturday and he mentioned some properties of Java bytecode of which I was unaware — like, you can compile it without using a stack pointer. I was trying to figure out how to modify my first pass to generate some of this stuff when a reply to a seemingly unrelated question made me realise that not only does the server JIT have a pass that does all this and more, but it’s been abstracted out for easy reuse! So stay tuned for the Shark that keeps its stack and locals in registers :)
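The property Tom mentioned can be illustrated with a toy translator. Because every bytecode has a fixed stack effect, the stack depth at each instruction is known at compile time, so each stack slot can be given a fixed name instead of being addressed through a runtime stack pointer. What follows is only a sketch of that idea for a handful of integer bytecodes in straight-line code; it is not the server compiler's pass, and the class name StackToRegisters and the s0, s1 naming scheme are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: stack slots become named values because the
// depth is known while translating, so no stack pointer exists at runtime.
public final class StackToRegisters {
    public static List<String> translate(String... bytecodes) {
        List<String> out = new ArrayList<>();
        int depth = 0; // tracked by the translator only, never emitted
        for (String bc : bytecodes) {
            String[] p = bc.split(" ");
            switch (p[0]) {
                case "iload":   // push a local variable
                    out.add("s" + depth + " = local" + p[1]);
                    depth++;
                    break;
                case "bipush":  // push a small constant
                    out.add("s" + depth + " = " + p[1]);
                    depth++;
                    break;
                case "imul":
                case "iadd": {  // pop two values, push the result
                    String op = p[0].equals("imul") ? "*" : "+";
                    depth--;
                    out.add("s" + (depth - 1) + " = s" + (depth - 1) + " " + op + " s" + depth);
                    break;
                }
                case "istore":  // pop into a local variable
                    depth--;
                    out.add("local" + p[1] + " = s" + depth);
                    break;
                default:
                    throw new IllegalArgumentException("unsupported bytecode: " + bc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Roughly h = 31 * h + c, with h in local 1 and c in local 2.
        for (String line : translate("bipush 31", "iload 1", "imul",
                                     "iload 2", "iadd", "istore 1")) {
            System.out.println(line);
        }
    }
}
```

Real methods also have branches, where the depth has to agree on every path into a block; this sketch does not handle that case.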
Today I got my second method working, String.hashCode(). Now I have conditional and unconditional branching, field access, array loads, a whole bunch of integer operators, and returning with a result implemented and (somewhat) tested. The bytecode coverage chart says I’m 50% done, but I don’t believe it.
Well, it’s taken a month and a half — and over 2000 lines of code — but I finally got a method out of Shark.
I made a chart showing which bytecodes are implemented, which I’ll keep updated as I progress. The estimated total coverage of 18% is slightly fanciful as it treats all bytecodes as equally complex, with nop having the same weight as new for example. Some codes are marked as complete but untested too. The way the compiler is structured means that in simple cases I can copy and paste whole blocks of bytecodes from the server compiler, so where I was doing one bytecode in a block I’ve copied the lot across. Most of them ought to be fine, but a couple are dubious. I’m still shuffling things around to try and make things less so.
Onwards…
Shark just JITted its first bytecode:
    load i32* %local0               ; <i32>:15 [#uses=1]
    inttoptr i32 %sp to i32*        ; <i32*>:16 [#uses=1]
    store i32 %15, i32* %16
    %sp1 = sub i32 %sp, 4           ; <i32> [#uses=2]
Ladies and gentlemen, it is aload_0!
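In case the IR is hard to read, here is a rough Java model of what those instructions do: load local 0, store it at the address held in the stack pointer, then move the stack pointer down by one 4-byte word. The int array standing in for memory, the byte-address convention and the name Aload0Sketch are all assumptions made for this sketch.

```java
// Hypothetical model of the generated code for aload_0 on a 32-bit,
// downward-growing stack. Not Shark code.
public final class Aload0Sketch {
    // memory is word-addressed; sp is a byte address, hence sp / 4.
    static int push(int[] memory, int sp, int value) {
        memory[sp / 4] = value; // store i32 %15, i32* %16
        return sp - 4;          // %sp1 = sub i32 %sp, 4
    }

    public static void main(String[] args) {
        int[] memory = new int[16];
        int local0 = 0x1234;    // stands in for the reference held in local 0
        int sp = 40;
        int sp1 = push(memory, sp, local0);
        System.out.println(memory[10] + " " + sp1); // prints "4660 36"
    }
}
```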
Yesterday I had a mini-milestone in that I got enough bits and pieces written to have HotSpot call my JIT’s compile_method() method. The next trick will be to get a piece of code back into HotSpot without it being relocated on the way.
Of course, the Zero interpreter doesn’t have profiling, so at the moment compilation only happens if you force it with -Xcomp.
Also, I disabled InlineIntrinsics, which reminded me that there was some kind of fast accessor thing for JNI that I disabled in the original ppc port. I should fix that at some point.