Hi, All
Thanks for all of your analysis.
The related information is as follows:
- What is your source code?
It's the "void CCParticleSystem::update(float dt)" method of the attached CCParticleSystem.cpp file.
- How did you compile your source code?
I compiled it with android-ndk-r8d using the attached build_native.sh script. I also set it to build for armeabi-v7a with NEON enabled, but NEON should have no effect because none of the source uses NEON features.
- What compiler did you use?
I use the default compiler of android-ndk-r8d, which should be arm-linux-androideabi-4.6.
- What platform are you testing on?
The device I am testing on is an SP8810. Here is the content of /proc/cpuinfo:
root@android:/ # cat /proc/cpuinfo
Processor : ARMv7 Processor rev 1 (v7l)
BogoMIPS : 1024.00
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc05
CPU revision : 1

Hardware : SP8810
Revision : 0000
Serial : 0000000000000000
root@android:/ #
- Is there any way you can generate a smaller test case?
Sorry, I am not able to do that now.
As for "-fprefetch-loop-arrays", I enabled it by appending the options "-O3 -fprefetch-loop-arrays", but there seems to be no improvement.
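For reference, the manual prefetching that Will suggested as the alternative could look roughly like the sketch below. This is only an illustration: the Particle struct, its field names, the updateParticles function and the prefetch distance are all hypothetical and are not taken from CCParticleSystem.cpp. The only real API used is GCC's __builtin_prefetch, and whether it helps at all would have to be measured on the device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle layout, NOT the one used in CCParticleSystem.cpp.
struct Particle {
    float x, y;        // position
    float vx, vy;      // velocity
    float timeToLive;  // remaining lifetime in seconds
};

// Assumed prefetch distance in elements; it would need tuning on the target CPU.
static const std::size_t kPrefetchDistance = 8;

// Advances every particle by dt and returns how many are still alive.
std::size_t updateParticles(std::vector<Particle>& particles, float dt) {
    std::size_t alive = 0;
    const std::size_t n = particles.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kPrefetchDistance < n) {
            // GCC builtin: request the element we will touch a few iterations
            // from now. Second argument 1 = we intend to write to it,
            // third argument 3 = keep it in cache (high temporal locality).
            __builtin_prefetch(&particles[i + kPrefetchDistance], 1, 3);
        }
        Particle& p = particles[i];
        p.timeToLive -= dt;
        if (p.timeToLive > 0.0f) {
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            ++alive;
        }
    }
    return alive;
}
```

The prefetch is only a hint, so the results of the loop are the same with or without it; the idea is just to start the load early enough that the later VLDR does not have to wait for memory.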
Thanks, Yongqin Liu
On 14 May 2013 22:03, Renato Golin renato.golin@linaro.org wrote:
On 14 May 2013 13:23, Will Newton will.newton@linaro.org wrote:
It looks like there is a data dependency on the preceding load, so it might be worth looking into prefetching the data, either manually or maybe with -fprefetch-loop-arrays?
I agree with Matt on needing more info, but I also agree with Will that a pre-fetch could speed things up.
The beginning of the block is a few instructions up, and the address of the VLDR is computed by almost all instructions in the block, in a chain. I'm assuming (without evidence) that it's the VLDR itself that is taking all that time to release S15 for the VSUB.
Furthermore, the VLDR was hit 100x less than the VSUB, hinting that it's not waiting long for anything, so the instructions before it that calculate the offset are pretty much streamlined. That is another hint that it's the VLDR itself that is taking that long.
cheers, --renato