On 21 November 2012 09:20, Zhenqiang Chen zhenqiang.chen@linaro.org wrote:
On 21 November 2012 03:26, Michael Hope michael.hope@linaro.org wrote:
On 20 November 2012 22:10, Zhenqiang Chen zhenqiang.chen@linaro.org wrote:
Hi,
I tried the povray benchmark on ARM, MIPS, PowerPC and x86. None of them can shrink-wrap the function Ray_In_Bound.
Here is the function:

  bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
  {
    ...
    for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling)
      {...}
    return (true);
  }

For ARM at -O2/-O3, "Bound" is allocated to "r6" during IRA, so there is a copy

  r6 = r1

before the test Bound != NULL. Since r6 is callee-saved, that copy needs the prologue, which keeps the prologue ahead of the early-exit test.
Could you hack the benchmark to make the early exit explicit and see if that changes the result? That lets us know if improving shrink wrap is worthwhile.
Something like:
  bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
  {
    if (Bounding_Object == NULL)
      return true;
I tried it. The result is the same as with the original code. (The added check is optimized away.)
After hacking the assembly code by hand, I got a 2-3% performance improvement for -O2. Here is the assembly change.

Original code:

	push	{r4, r5, r6, r7, r8, r9, lr}
	.save	{r4, r5, r6, r7, r8, r9, lr}
	mov	r6, r1
	.pad	#196
	sub	sp, sp, #196
	cbz	r1, .L113
	ldr	r8, .L117
	...
  .L113:
	movs	r0, #1
	add	sp, sp, #196
	@ sp needed
	pop	{r4, r5, r6, r7, r8, r9, pc}
After shrink-wrap:

	cbz	r1, .L1131
	push	{r4, r5, r6, r7, r8, r9, lr}
	.save	{r4, r5, r6, r7, r8, r9, lr}
	mov	r6, r1
	.pad	#196
	sub	sp, sp, #196
	ldr	r8, .L117
	...
  .L113:
	movs	r0, #1
	add	sp, sp, #196
	@ sp needed
	pop	{r4, r5, r6, r7, r8, r9, pc}
  .L1131:
	movs	r0, #1
	bx	lr
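For readers who want to see the same idea at the source level, here is a minimal C sketch. It is not what was done above (the change above was made by hand in the assembly), and it is not povray code: the OBJECT layout, the "inside" field, and the names ray_in_bound and ray_in_bound_slow are hypothetical stand-ins, since the real loop body is elided in the original. The sketch splits the function into a thin wrapper that only does the NULL test and a separate body marked with the GCC/Clang noinline attribute, so the register saves belong to the body rather than to the early-exit path. Whether the wrapper really ends up without a prologue depends on the compiler, target, and options.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for povray's OBJECT; only the Sibling link
   comes from the original code. */
typedef struct Object {
    struct Object *Sibling;
    int inside;               /* hypothetical: nonzero if the ray is inside this bound */
} OBJECT;

/* Heavy part: this is where callee-saved registers and stack space
   would be needed. noinline (a GCC/Clang attribute) keeps it from
   being merged back into the wrapper. */
__attribute__((noinline))
static bool ray_in_bound_slow(const OBJECT *bound)
{
    for (; bound != NULL; bound = bound->Sibling) {
        if (!bound->inside)
            return false;
    }
    return true;
}

/* Thin wrapper: the early-exit test comes first, before any work
   that needs saved registers. */
bool ray_in_bound(const OBJECT *bounding_object)
{
    if (bounding_object == NULL)
        return true;
    return ray_in_bound_slow(bounding_object);
}
```

Calling ray_in_bound(NULL) returns true without entering ray_in_bound_slow, which mirrors the cbz/bx lr path in the hand-edited assembly.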
But the same simple hack gives a ~1% regression at -O3. A change in code alignment is probably the root cause. To verify this, I added 6 NOPs after "bx lr", which makes block .L1131 16 bytes long. With this change, -O3 also shows a 2-3% performance improvement.
-Zhenqiang
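
The 16-byte figure for block .L1131 can be checked with a little arithmetic. The sketch below assumes that "movs r0, #1", "bx lr" and each NOP use the 16-bit (2-byte) Thumb encoding, which is the usual case for these instructions but is not stated in the message. The helper names block_bytes and nops_to_pad are made up for illustration.

```c
#include <stddef.h>

/* Assumption: every instruction in the block is a 16-bit Thumb encoding. */
enum { THUMB16_BYTES = 2 };

/* Size in bytes of a block made only of 16-bit Thumb instructions. */
size_t block_bytes(size_t insn_count)
{
    return insn_count * THUMB16_BYTES;
}

/* Number of 16-bit NOPs needed to grow a block from current_bytes to
   target_bytes; rounds up if the gap is odd, returns 0 if no padding
   is needed. */
size_t nops_to_pad(size_t current_bytes, size_t target_bytes)
{
    if (current_bytes >= target_bytes)
        return 0;
    return (target_bytes - current_bytes + THUMB16_BYTES - 1) / THUMB16_BYTES;
}
```

Under that assumption, the two-instruction block is block_bytes(2) = 4 bytes, nops_to_pad(4, 16) gives the 6 NOPs mentioned above, and block_bytes(2 + 6) is the resulting 16 bytes.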