On 02/03/16 14:25, Renato Golin wrote:
On 2 March 2016 at 11:35, Edward Nevill <edward.nevill@linaro.org> wrote:
cmp x2, 8 <<< (1)
(1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a 32 bit unsigned.
You mean to use "cmp w2, 8" instead? Is there any difference?
No, the code is equivalent to:
    unsigned long x;
    if (x <= 8) { if ((unsigned) x <= 8) { ... } }
where the inner test is clearly redundant (for unsigned values).
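A minimal C sketch of the point above, assuming an LP64 target such as AArch64 where unsigned long is 64 bits (uint64_t stands in for it here). The helper names are hypothetical and are not taken from the code under discussion. It shows that the 32-bit test alone is not the same as the 64-bit test, but that it adds nothing once the 64-bit test has already passed:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the 64-bit comparison, as in "cmp x2, 8". */
static bool le8_64(uint64_t x)
{
    return x <= 8;
}

/* Hypothetical helper: the same comparison on the low 32 bits only,
 * as in "cmp w2, 8". The upper 32 bits are discarded by the cast. */
static bool le8_32(uint64_t x)
{
    return (uint32_t)x <= 8;
}

/* The nested form from the message: the inner test can never fail,
 * because a value <= 8 fits in 32 bits and survives truncation. */
static bool le8_nested(uint64_t x)
{
    if (x <= 8) {
        if ((uint32_t)x <= 8) {
            return true;
        }
        return false; /* unreachable for unsigned x */
    }
    return false;
}
```

For every input, le8_nested agrees with le8_64. le8_32 on its own differs for values such as 1ULL << 32, whose low 32 bits are zero.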
R.
(2) Nowhere in the function does it store anything on the stack, so why drop and restore the stack pointer every time? Also, a minor quibble about the disassembly: why does sub use '#64' whereas add uses just '64'? (I appreciate this is probably binutils, not gcc.)
My reading of the AAPCS64 is that it's not necessary to have a frame at all, only that if you do, it must be quad-word aligned.
Clang/LLVM doesn't seem to bother with the push and pop, but it also uses "cmp x".
.L15:
        adrp    x3, .L4
        add     x3, x3, :lo12:.L4
        ldr     x2, [x3, x2, lsl #3]
        br      x2
Hum, this is *exactly* what Clang generates... :)
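For readers following along, the quoted adrp/add/ldr/br sequence is a table dispatch: it forms the address of the table at .L4, loads the 8-byte entry at index x2 (hence lsl #3), and branches to it. A rough C sketch of the same idea follows, assuming 8-byte function pointers as on AArch64 LP64. The handler names and the table contents are hypothetical, and a compiler is not guaranteed to emit exactly this sequence for it:

```c
#include <stddef.h>

/* Hypothetical handler type: one entry per small case. */
typedef int (*handler_fn)(void);

static int case0(void) { return 0; }
static int case1(void) { return 1; }
static int case2(void) { return 2; }
static int case3(void) { return 3; }

/* Plays the role of the table at .L4: an array of code addresses. */
static const handler_fn table[] = { case0, case1, case2, case3 };

/* Bounds check first (like the earlier cmp), then load the entry at
 * table + idx * sizeof(handler_fn) and call through it. */
static int dispatch(size_t idx)
{
    if (idx >= sizeof table / sizeof table[0]) {
        return -1; /* out of range: take the fallback path instead */
    }
    return table[idx]();
}
```

Calling dispatch(2) goes through the third table entry, while an out-of-range index takes the fallback path and returns -1.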
(4) Seems to be something wrong with the load scheduler here? Why not move the stp x2, x3 to the end. It does this repeatedly.
Again, Clang seems to do what you want...
Have you tried building OpenJDK with Clang?
cheers,
--renato