On 2 March 2016 at 11:35, Edward Nevill <edward.nevill@linaro.org> wrote:
> cmp x2, 8 <<< (1)
> (1) If count as a 64-bit unsigned is <= 8 then it is probably still <= 8 as a 32-bit unsigned.
Do you mean using "cmp w2, 8" instead? Is there any difference?
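To make the question concrete: "cmp w2, 8" only compares the low 32 bits of x2, so it matches "cmp x2, 8" only when the upper half of the count is known to be zero. A minimal C sketch of the two checks (the function names are hypothetical, chosen for illustration, and assume an unsigned <= test follows the compare):

```c
#include <assert.h>
#include <stdint.h>

/* Models "cmp x2, 8" + unsigned <=: all 64 bits of the count take part. */
static int le8_x(uint64_t count) { return count <= 8; }

/* Models "cmp w2, 8" + unsigned <=: only the low 32 bits (w2) are compared. */
static int le8_w(uint64_t count) { return (uint32_t)count <= 8; }

/* If the 64-bit test passes, the 32-bit test must pass too, because a value
   <= 8 is unchanged by truncation. The converse does not hold: 0x100000005
   passes the 32-bit test but fails the 64-bit one. */
static void check_implication(uint64_t count) {
    assert(!le8_x(count) || le8_w(count));
}
```

So the two forms only diverge when the upper 32 bits of the count can be non-zero; whether that can happen depends on what the caller guarantees about count.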
> (2) Nowhere in the function does it store anything on the stack, so why drop and restore the stack every time? Also, a minor quibble about the disassembly: why does sub use #64 whereas add uses just 64? (I appreciate this is probably binutils, not GCC.)
My reading of the AAPCS64 is that a frame isn't necessary at all; the only requirement is that, if you do have one, it must be quad-word (16-byte) aligned.
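In other words, the only constraint is on the size of whatever frame you allocate. A small C sketch of that rule, using a hypothetical helper that rounds a frame size up to the 16-byte stack alignment AAPCS64 requires (a size of 0 meaning no frame at all):

```c
#include <assert.h>
#include <stddef.h>

/* AAPCS64: SP must stay 16-byte (quad-word) aligned. */
#define STACK_ALIGN ((size_t)16)

/* Hypothetical helper: round a frame size up to a multiple of 16 bytes.
   A result of 0 means no frame, i.e. no "sub sp"/"add sp" pair needed. */
static size_t frame_size(size_t bytes) {
    return (bytes + STACK_ALIGN - 1) & ~(STACK_ALIGN - 1);
}

/* Sanity check: every rounded size keeps SP quad-word aligned and wastes
   fewer than 16 bytes. */
static void check_frame(size_t bytes) {
    size_t n = frame_size(bytes);
    assert(n % STACK_ALIGN == 0);
    assert(n >= bytes && n - bytes < STACK_ALIGN);
}
```

The 64-byte adjustment in the quoted code is a legal size, but with nothing stored on the stack the needed size is frame_size(0) == 0, so the sub/add pair could be dropped entirely.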
Clang/LLVM doesn't seem to bother with the push and pop, but it also uses the 64-bit "cmp x2, 8".
> .L15:
>     adrp x3, .L4
>     add  x3, x3, :lo12:.L4
>     ldr  x2, [x3, x2, lsl #3]
>     br   x2
Hum, this is *exactly* what Clang generates... :)
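For anyone following along, that sequence is a plain jump table: adrp/add materialise the address of .L4, ldr with "lsl #3" loads the 8-byte entry at index x2, and br jumps to it. A rough C model, assuming the earlier "cmp x2, 8" is the range check guarding it (the quote doesn't show that). The case handlers are hypothetical placeholders, and real code branches with br rather than calling through a function pointer:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical case bodies standing in for the code the .L4 entries point at. */
static int do_none(uint64_t n)    { (void)n; return 0; }
static int do_small(uint64_t n)   { return (int)n; }        /* counts 1..4 */
static int do_large(uint64_t n)   { return 100 + (int)n; }  /* counts 5..8 */
static int do_default(uint64_t n) { (void)n; return -1; }   /* out of range */

typedef int (*case_fn)(uint64_t);

/* Each entry is a 64-bit address, hence the "lsl #3" (x 8) when indexing. */
static case_fn const table[9] = {
    do_none,
    do_small, do_small, do_small, do_small,
    do_large, do_large, do_large, do_large,
};

static int dispatch(uint64_t count) {
    /* Assumed range check: "cmp x2, 8" then branch to the default if higher. */
    if (count > 8)
        return do_default(count);
    /* adrp/add form &table; ldr x2, [x3, x2, lsl #3] loads table[count];
       br x2 transfers control (modelled here as a call). */
    return table[count](count);
}
```

Note that with the 64-bit compare, a count such as 0x100000005 goes to the default case rather than indexing the table.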
> (4) Is there something wrong with the load scheduler here? Why not move the stp x2, x3 to the end? It does this repeatedly.
Again, Clang seems to do what you want...
Have you tried building OpenJDK with Clang?
cheers,
--renato