YongQin,
On 13/05/13 13:52, YongQin Liu wrote:
Hi, All
The attached file was analysed with the Streamline tool. What do you think about the vsub.f32 hot spot there? Is there any way we can improve that?
Maybe - but you give very little information for us to help you.
We need more context - a screengrab is not enough:
* What is your source code?
* How did you compile your source code?
* What compiler did you use?
* What platform are you testing on?
* Is there any way you can generate a smaller test case?
Also, please don't attach images to emails. Use text files wherever possible, and if you do need to share an image, please provide a link to it rather than including it in the email.
Don't get too focused on the one instruction Streamline reports as being hot. Streamline is restricted to reporting what it is told by the kernel, and on an out-of-order superscalar core that doesn't always match precisely what is happening.
In this case my intuition tells me that the problem is the code sequence before the VSUB:
    LDR      r5, [r4, #0x16c]    @ 1
    MOVS     r3, #0xa4           @ 2
    MLA      r5, r3, r2, r5      @ 3
    VLDR     s15, [r5, #0x68]    @ 4
    VSUB.F32 s15, s15, s16       @ 5
Note that the s15 source of insn 5 is loaded by the previous insn. Insn 4's base register is calculated by the multiply-accumulate in insn 3, one of whose sources is loaded in insn 1.
So I *think* the 'hotness' is coming from this chain of dependent instructions: by the time the core gets to the VSUB, it has to wait for the whole chain to complete before it can do the calculation. But this is a guess - I don't have enough information.
I can immediately think of many theories about why this is the case, but I can't work out what is actually going on without the information I asked for above.
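To make the guess concrete, here is a purely hypothetical C sketch of the kind of source that could compile to a chain like that. I have not seen your code, so the names (struct elem, struct ctx, diff_at) are invented, and the sizes and offsets are simply copied from the disassembly. The 0x16c offset of the pointer would only hold on a 32-bit target, so it is not checked here.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Hypothetical element: 0xa4 bytes, with a float at offset 0x68. */
struct elem {
    char  pad0[0x68];
    float value;                                /* read by the VLDR (insn 4) */
    char  pad1[0xa4 - 0x68 - sizeof(float)];
};

/* Hypothetical owner holding a pointer to the element array
 * (at offset 0x16c on a 32-bit target; assumption, not checked). */
struct ctx {
    char         pad[0x16c];
    struct elem *elems;                         /* read by the LDR (insn 1) */
};

/* The layout matches the constants seen in the disassembly. */
static_assert(sizeof(struct elem) == 0xa4, "element stride");
static_assert(offsetof(struct elem, value) == 0x68, "field offset");

/* Each step needs the result of the previous one:
 * load base pointer -> base + i * 0xa4 -> load float -> subtract. */
float diff_at(const struct ctx *c, unsigned i, float ref)
{
    const struct elem *base = c->elems;         /* insn 1 */
    return base[i].value - ref;                 /* insns 2 to 5 */
}
```

If the real code looks anything like this inside a loop, the subtract cannot start until both loads and the multiply have finished, which would be consistent with the samples piling up on the VSUB.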
Thanks,
Matt