I finished converting everything to 64-bit. It was way easier than I expected and I’m very glad I took the time to do it because what I thought were vast and insurmountable differences turned out to be pretty minuscule. I managed to tuck everything away in macros: the ABI differences in prolog and epilog, and enter (née function_entry_point) and call, and the register-size differences in much simpler ones like load and store, which just map to lwz/ld and stw/std respectively. Hiding the ugly stuff in the assembler keeps the generators happily free of conditionals.
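As a rough illustration of the idea (not the actual macros, which live in the assembler), a word-sized load or store helper can pick its mnemonic from the target word size. The names load_mnemonic, store_mnemonic and format_load are made up for this sketch; only the lwz/ld and stw/std pairing comes from the text above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: mnemonic for a word-sized load. */
static const char *load_mnemonic(int word_bits)
{
    assert(word_bits == 32 || word_bits == 64);
    return word_bits == 64 ? "ld" : "lwz";
}

/* Hypothetical helper: mnemonic for a word-sized store. */
static const char *store_mnemonic(int word_bits)
{
    assert(word_bits == 32 || word_bits == 64);
    return word_bits == 64 ? "std" : "stw";
}

/* Format "<mnemonic> rD, offset(rA)" into buf; returns snprintf's result.
   The caller never has to test the word size itself. */
static int format_load(char *buf, size_t len, int word_bits,
                       int dst, int offset, int base)
{
    return snprintf(buf, len, "%s r%d, %d(r%d)",
                    load_mnemonic(word_bits), dst, offset, base);
}
```

Calling format_load with a word size of 64 and 32 yields an ld and an lwz line respectively, so the conditional lives in one place instead of in every generator.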
My next job is designing the interpreter calling convention. There’s no real reason for the interpreter to follow the platform’s ABI, and the fact that Java is essentially stack-based and PPC is essentially register-based is a very good reason not to. So I’m trying to figure out how to arrange the stack.
The general layout of stack frames is the same under both 32- and 64-bit ABIs:
    |         ...          |  high addresses
+-> | Link area            |
|   +----------------------+
|   | Register save area   |
|   | Local variable space |
|   | Parameter list space |
+---+ Link area            |  low addresses
    +----------------------+
The stack grows downwards, and the stack pointer, r1, points to the first word of the link area, the lowest address, such that all accesses into the stack are relative to r1 with a positive offset. The ABIs are pretty relaxed about what happens in the stack, but one thing they’re firm about is that 0(r1) points to the previous frame: essentially it’s where you save your caller’s r1. This is slightly irritating, because I think the interpreter would like an open-ended stack, but the requirement to maintain a valid link area at the very top of the stack would seem to preclude this. Aside from anything else, if r1 isn’t pointing to a valid link area then gdb cannot unwind the stack and produce backtraces. I discovered this empirically a while back ;)
My thinking at the moment is to leave r1 alone, and to use another register (r31 maybe) as the interpreter’s stack pointer. That way the interpreter can extend the stack however it likes, without thought to link areas and alignment, on the assumption that if it ever jumps out into C-land then it must first create a valid stack frame around its own data to protect it. Specifically, this will be a frame with no register save area, meaning the interpreter’s stuff falls neatly into the local variable space.
I’m not sure how stack walking will work under this scenario. It may be that it’s better to do this stack-shuffling every time the interpreter calls a new method, such that each method call, be it Java or C, has its own valid ABI stack frame. This will undoubtedly resolve itself as I progress.
OK, since you are using the C++-based interpreter, the frame you describe here:
This is one thing I do find strange, as I can’t see why you would not simply allocate space for all locals in one go. Certainly call_stub can read max_locals from the methodOop. I wondered if somewhere in the interpreter there would be cases where a method is entered without having access to its methodOop but that sounds pretty implausible.
You can’t get rid of stack extension even if you did this, because of adapters. (Unless you went back to adapters with frames, and we don’t want to go there.) When you do an invoke from either compiled or interpreted you don’t know if the callee is compiled or interpreted. So the calling convention essentially has to change as you cross the boundary. The interpreter always assumes it is calling interpreted code and the compiler assumes compiled. In the case where this is wrong you will need stack space somewhere (for certain if c2i). So this means the caller’s frame is extended or, in the bad old days of framed adapters, that an intermediate frame gets created.
So you could make a version of the interpreter that, as part of the call, extended its own stack to account for the “extra” locals. Then if you ended up calling compiled you’d have wasted that space. It’s not a large amount of space for sure but I don’t see the real benefit of doing it before you call. In either case extension will happen on either the front side or back side of the call.
Well I guess at the moment I won’t need adapters because I don’t have a compiler :) But it’s good to know…
My concern is that extending the stack frame requires either moving the locals or storing them in reverse order. The former adds a chunk of overhead to every (interpreted) method call, and the latter would add overhead (negation) to every variable access. Both seem pretty bad. BTW I notice you drew your stack frames with reversed locals — was that intentional?
Of course, if I do end up extending the stack for the extra locals then I may as well extend it far enough to fit the interpreter state and monitors and expression stack too. It’s not such an issue on 32-bit but ppc64 has a lot of overhead in its frames (14 slots, so 112 bytes) and it’d be nice to save that.
Actually, I’m not even sure it’s possible to extend a stack frame on PPC without violating the ABI. The stack is arranged such that the stack pointer points to the first word of the top frame (the “back chain” word), which points to the back chain word of the next frame, and so on until the last frame, which has a back chain of NULL. When creating a new frame you’re required to update the stack pointer and the back chain word atomically so that the stack pointer is always pointing to the beginning of a linked list of frames. There are special instructions for it, basically “store the contents of register X at address Y then store address Y in register X”, but I can’t see anything that could be used to atomically extend a frame as would be required. And a non-atomic extension risks having a signal turn up and trash everything.
I may end up having to make the caller allocate space for the parameters and have the callee allocate space for all locals and copy the parameters into it. You’d waste the parameter space for interpreted callees, but I’m not sure there’s a way around it.
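To make the back chain arrangement concrete, here is a small model in C of the linked list of frames. It is only a sketch: the "stack" is an ordinary array, push_frame and count_frames are names invented for the example, and the store-then-update that real hardware does in one instruction is modelled as two plain C statements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of creating a frame: write the old stack pointer into the
   first word of the new frame (the back chain), then return the new
   stack pointer. The stack grows downwards. */
static uintptr_t *push_frame(uintptr_t *sp, size_t words)
{
    uintptr_t *new_sp = sp - words;
    *new_sp = (uintptr_t)sp;
    return new_sp;
}

/* Walk the back chain from the top frame until the NULL terminator,
   which is roughly what a debugger does to produce a backtrace. */
static size_t count_frames(const uintptr_t *sp)
{
    size_t n = 0;
    while (sp != NULL) {
        n++;
        sp = (const uintptr_t *)*sp;  /* word 0 of each frame is its back chain */
    }
    return n;
}
```

Starting from an outermost frame whose back chain word is zero, each push_frame adds one to the count that count_frames reports. If the stack pointer ever pointed at a word that was not a back chain, the walk would wander off into garbage, which is the failure mode a badly timed signal would hit.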
Uh, the locals are stored in reverse order. Since it is a stack, the interpreter would push local[0] first, …, so when you extend the stack you don’t do anything but allocate space. That is of course why I drew the stack with the locals reversed. The numbers refer to the Java local number. If you are actually doing array accesses then the number is negated. Take a look at the code in bytecodeInterpreter and you’ll see that the index is negated.
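A minimal sketch of the reversed layout, assuming a downward-growing stack of word-sized slots. The helper names local_slot and push are invented here and are not taken from bytecodeInterpreter; the point is only that local n lives at a negative index from local 0, so adding locals means moving the top of the stack and nothing else.

```c
#include <assert.h>
#include <stdint.h>

/* locals points at local[0]; higher-numbered locals sit at lower
   addresses, so the Java local number is negated to form the index. */
static intptr_t *local_slot(intptr_t *locals, int n)
{
    assert(n >= 0);
    return &locals[-n];
}

/* Push one value on a downward-growing stack; returns the new top. */
static intptr_t *push(intptr_t *top, intptr_t value)
{
    top--;
    *top = value;
    return top;
}
```

If the caller pushes two parameters and the callee then lowers the top by two more slots, local_slot(locals, 3) lands exactly on the new top without any of the existing values having been moved.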
I don’t have a copy of either the ABI or a ppc manual in front of me but it must be possible since that is how the other ppc ports work even though they have adapter frames and I know of no problem with signal handlers.
I figured it out. Say the top of the stack is frame N, such that frame N’s back chain word points at frame N-1. First you create a new frame in the normal way, N+1. Then you change N+1’s back chain to point at N-1, effectively turning N and N+1 into the new frame N. It’ll be slightly more complex on ppc64 since the minimum frame is larger than the smallest amount you might want to extend it by, but the principle is the same.
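The two steps can be modelled in C as below. This is a sketch on a simulated stack, with push_frame and extend_top_frame as made-up names, and it ignores the ppc64 minimum frame size mentioned above. What it does show is that the stack pointer is pointing at a valid back chain after each individual step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of normal frame creation: store the old stack pointer as the
   new frame's back chain and return the new stack pointer. */
static uintptr_t *push_frame(uintptr_t *sp, size_t words)
{
    uintptr_t *new_sp = sp - words;   /* stack grows downwards */
    *new_sp = (uintptr_t)sp;
    return new_sp;
}

/* Extend the top frame (N) by extra_words slots. */
static uintptr_t *extend_top_frame(uintptr_t *sp, size_t extra_words)
{
    uintptr_t caller = *sp;                           /* address of frame N-1 */
    uintptr_t *new_sp = push_frame(sp, extra_words);  /* step 1: N+1 chains to N */
    *new_sp = caller;                                 /* step 2: relink to N-1 */
    return new_sp;
}
```

After the call, the old back chain word of frame N is still in memory but nothing points at it any more; it is just a dead slot in the middle of the merged frame.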
Oh, cool. In which case extending the stack only involves copying a couple of words.