On 17 May 2013 08:18, YongQin Liu <yongqin.liu@linaro.org> wrote:
> I compiled it with android-ndk-r8d using the attached build_native.sh script.

That's GCC 4.7, right?
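If in doubt, the build itself can tell you. GCC bakes its version into the predefined macros __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__. Here is a minimal sketch. The helper name compiler_version is made up, and Clang is checked first because it also defines __GNUC__:

```cpp
// Two-step stringizing so the macro *value* (e.g. 4), not its name, is used.
#define VERSION_STR_HELPER(x) #x
#define VERSION_STR(x) VERSION_STR_HELPER(x)

// Hypothetical helper: returns the compiler version fixed at build time,
// e.g. "gcc 4.7.x" for an NDK GCC 4.7 toolchain.
const char* compiler_version()
{
#if defined(__clang__)
    return "clang " VERSION_STR(__clang_major__) "." VERSION_STR(__clang_minor__)
           "." VERSION_STR(__clang_patchlevel__);
#elif defined(__GNUC__)
    return "gcc " VERSION_STR(__GNUC__) "." VERSION_STR(__GNUC_MINOR__)
           "." VERSION_STR(__GNUC_PATCHLEVEL__);
#else
    return "unknown";
#endif
}
```

Logging the returned string once at startup is enough to rule out a different toolchain being picked up by the script.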


>> * What platform are you testing on?
> The device I am testing on is an SP8810. Here is the content of its /proc/cpuinfo:

I believe that's a Cortex-A5.


> root@android:/ # cat /proc/cpuinfo
> Processor : ARMv7 Processor rev 1 (v7l)

<rant>Why don't we print full information like Intel?</rant>
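The quoted output only shows the "Processor" line, but ARM kernels usually also print a "CPU part" field, which identifies the core (0xc05 is Cortex-A5). Here is a minimal sketch that reads it. The function names are hypothetical, and the table only covers a handful of Cortex-A part numbers:

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: maps the "CPU part" field of /proc/cpuinfo text
// to an ARM core name. Only a few Cortex-A part numbers are listed.
std::string core_from_cpuinfo(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 8, "CPU part") != 0)
            continue;
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        // Base 16 accepts the "0x" prefix and skips leading whitespace.
        unsigned long part = std::strtoul(line.c_str() + colon + 1, nullptr, 16);
        switch (part) {
        case 0xc05: return "Cortex-A5";
        case 0xc07: return "Cortex-A7";
        case 0xc08: return "Cortex-A8";
        case 0xc09: return "Cortex-A9";
        case 0xc0f: return "Cortex-A15";
        default:    return "unknown part";
        }
    }
    return "no CPU part field";
}

// Hypothetical helper: reads the live file on the device.
// Returns an empty string if the file cannot be read.
std::string read_cpuinfo()
{
    std::ifstream f("/proc/cpuinfo");
    std::ostringstream ss;
    ss << f.rdbuf();
    return ss.str();
}
```

On the device, core_from_cpuinfo(read_cpuinfo()) should settle the Cortex-A5 question without needing the full dump.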


> As for "-fprefetch-loop-arrays", I enabled it by appending the options "-O3 -fprefetch-loop-arrays", but there seems to be no improvement.

It's possible that tCCParticle is too complex for the compiler to prefetch. Could it be bigger than a cache line?
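A quick way to check the cache-line theory is to compare sizeof(tCCParticle) with the line size of the core. The sketch below uses a hypothetical stand-in struct, because the real tCCParticle definition isn't quoted here. It also assumes a 32-byte L1 line, which is what I'd expect on a Cortex-A5, but check the TRM for the actual part:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in: replace with the real tCCParticle from the engine headers.
struct ParticleStandIn {
    float pos[2];
    float color[4];
    float deltaColor[4];
    float size;
    float timeToLive;
};

// Assumed L1 data cache line size in bytes.
const std::size_t kAssumedCacheLine = 32;

// Number of cache lines touched by a contiguous array of `count` elements
// of `elem_size` bytes, assuming the array starts on a line boundary.
std::size_t lines_touched(std::size_t elem_size, std::size_t count, std::size_t line)
{
    assert(line > 0);
    return (elem_size * count + line - 1) / line;
}

// True when one element is larger than a cache line, so loading a single
// element needs more than one line fill.
bool spans_multiple_lines(std::size_t elem_size, std::size_t line)
{
    return elem_size > line;
}
```

If the real struct comes out well above the line size, each iteration touches more than one line. That would fit the theory above.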

If you can change the source code, try moving the first line of that loop further down, to just after the last use of p. That way it updates a pointer hoisted out of the loop and initialized before the first iteration:

        tCCParticle *p = &m_pParticles[m_uParticleIdx];
        while (m_uParticleIdx < m_uParticleCount)
        {
            // life
            p->timeToLive -= dt;
            ...
            // last use of p is above: point it at the next particle
            p = &m_pParticles[m_uParticleIdx + 1];
            ...
            ++m_uParticleIdx;
        }

Not that you should leave the source code like that, but it'll give us a clue as to whether the load is really the problem here.
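Here is a self-contained sketch of the same experiment. It uses a hypothetical Particle array rather than the real cocos2d-x types. The next element's address is computed right after the last use of the current one. As an alternative to relying on -fprefetch-loop-arrays, a GCC __builtin_prefetch is also issued by hand for that element. The struct and function names are assumptions, and the sketch leaves out whatever the elided parts of the original loop do:

```cpp
#include <cstddef>

// Hypothetical stand-in for tCCParticle.
struct Particle {
    float timeToLive;
    float pos[2];
    float dir[2];
};

// Hypothetical update loop with the particle pointer hoisted out of the loop.
void update_particles(Particle* particles, std::size_t count, float dt)
{
    Particle* p = particles;  // initialized for the first iteration
    for (std::size_t idx = 0; idx < count; ++idx) {
        // All uses of the current particle.
        p->timeToLive -= dt;
        p->pos[0] += p->dir[0] * dt;
        p->pos[1] += p->dir[1] * dt;

        // Last use of p is above, so point it at the next particle now.
        // On the final iteration this is the one-past-the-end address,
        // which is valid to form but is never dereferenced.
        p = particles + idx + 1;

#if defined(__GNUC__)
        // Optional hint: request the next particle for reading (rw = 0)
        // with low temporal locality (1). Skipped past the end of the array.
        if (idx + 1 < count)
            __builtin_prefetch(p, 0, 1);
#endif
    }
}
```

If the hand-written version is noticeably faster than the plain -fprefetch-loop-arrays build, that points at the load. If not, the bottleneck is probably elsewhere.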

cheers,
--renato