Re: Shrink-wrap on povray

23 Nov 2012


      On 23 November 2012 04:25, Michael Hope michael.hope@linaro.org wrote:
...
On 22 November 2012 20:53, Zhenqiang Chen zhenqiang.chen@linaro.org wrote:
...
On 21 November 2012 09:20, Zhenqiang Chen zhenqiang.chen@linaro.org wrote:
...
On 21 November 2012 03:26, Michael Hope michael.hope@linaro.org wrote:
...
On 20 November 2012 22:10, Zhenqiang Chen zhenqiang.chen@linaro.org wrote:
...
Hi,
I tried ARM, MIPS, PowerPC and x86 on the povray benchmark. None of them
can shrink-wrap the function Ray_In_Bound.
Here it is:
bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
{
  ...
  for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling)
  {...}
  return (true);
}
For ARM at -O2/-O3, "Bound" is allocated to "r6" during IRA, so there is a
copy r6 = r1 before the Bound != NULL test.
Could you hack the benchmark to make the early exit explicit and see
if that changes the result?  That lets us know if improving shrink
wrap is worthwhile.
Something like:
bool Ray_In_Bound (RAY *Ray, OBJECT *Bounding_Object)
 {
  if (Bounding_Object == NULL) return true;
I tried it. The result is the same as the original. (The
hack is optimized away.)
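For reference, the suggested source-level hack can be written out in full roughly as below. This is only a sketch: the OBJECT and RAY definitions and the "Hit" field are hypothetical stand-ins (povray's real types are far larger), and the loop body is a placeholder for the part elided above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for povray's types, reduced to what the
   sketch needs. Only the Sibling link appears in the original code. */
typedef struct Object {
    struct Object *Sibling;
    bool Hit;               /* hypothetical: outcome of the per-bound test */
} OBJECT;

typedef struct Ray {
    int unused;             /* hypothetical placeholder member */
} RAY;

bool Ray_In_Bound(RAY *Ray, OBJECT *Bounding_Object)
{
    OBJECT *Bound;

    (void)Ray;              /* unused in this reduced sketch */

    /* Explicit early exit: this path needs no saved registers or frame. */
    if (Bounding_Object == NULL)
        return true;

    for (Bound = Bounding_Object; Bound != NULL; Bound = Bound->Sibling) {
        /* Placeholder for the elided loop body. */
        if (!Bound->Hit)
            return false;
    }
    return true;
}
```

The explicit test is redundant with the loop condition on the first iteration, which is consistent with the compiler folding the two together and producing the same code as before.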
After hacking the assembly code by hand, I got a 2-3% performance
improvement at -O2. Here is the assembly change:
Original code:
        push    {r4, r5, r6, r7, r8, r9, lr}
        .save {r4, r5, r6, r7, r8, r9, lr}
        mov     r6, r1
        .pad #196
        sub     sp, sp, #196
        cbz     r1, .L113
        ldr     r8, .L117
        ...
.L113:
        movs    r0, #1
        add     sp, sp, #196
        @ sp needed
        pop     {r4, r5, r6, r7, r8, r9, pc}
After shrink-wrap:
        cbz     r1, .L1131
        push    {r4, r5, r6, r7, r8, r9, lr}
        .save {r4, r5, r6, r7, r8, r9, lr}
        mov     r6, r1
        .pad #196
        sub     sp, sp, #196
        ldr     r8, .L117
        ...
.L113:
        movs    r0, #1
        add     sp, sp, #196
        @ sp needed
        pop     {r4, r5, r6, r7, r8, r9, pc}
.L1131:
        movs    r0, #1
        bx      lr
But the same simple hack at -O3 shows a ~1% regression. A change in code
alignment should be the root cause. To verify it, I added 6 NOPs after
"bx lr". With them, the size of block .L1131 is 16 bytes. After this
change, -O3 shows a 2-3% performance improvement.
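The 16-byte figure can be checked with a little arithmetic, assuming every instruction in .L1131 (movs, bx and the NOPs) uses a 16-bit Thumb encoding. The helper name below is made up for illustration.

```c
/* Assumption: all instructions in the block are 16-bit Thumb encodings. */
enum { THUMB16_BYTES = 2 };

/* Hypothetical helper: size in bytes of a block made of `insns` real
   instructions followed by `nops` padding NOPs. */
unsigned padded_block_bytes(unsigned insns, unsigned nops)
{
    return (insns + nops) * THUMB16_BYTES;
}
```

With the two instructions in .L1131 and 6 NOPs, padded_block_bytes(2, 6) gives (2 + 6) * 2 = 16 bytes.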
That's good then.  So modulo supposed alignment changes, your current
shrink wrap patch causes no speed regressions and has the potential to
show an improvement.
Worth finishing and committing.  Shrinkwrap was a mess last time - we
need to check that all of these bugs:
 http://goo.gl/6fGg5
are clear before upstreaming/backporting.
I will build a clean toolchain and verify them.
Thanks!
-Zhenqiang
