Hi Vijay,
On Sat, Jan 22, 2011 at 9:59 AM, Vijay Kilari vijay.kilari@gmail.com wrote:
Hello Dave,
Thanks for this info.
I have a few more queries after looking at the results of memset on A9 & A8. I agree that external bus speed matters in comparisons across platforms.
- Why is memset performance better on the A8 than on the A9? Any justification?
I've CC'd the linaro-toolchain list, who have been working on this topic and may be able to provide you with more information.
Cheers ---Dave
On 26 January 2011 12:12, Dave Martin dave.martin@linaro.org wrote:
[...]
Unfortunately we don't know why NEON is a bad idea for memset etc. on the A9. The tests show it being worse and the advice we get says to avoid it, but we don't have an explanation.
The test code is trivially simple for the cases I tried.
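For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of what such a test might look like. It is not the code used for the numbers discussed in this thread; the function names, buffer sizes and repetition counts are all made up for illustration, and the empty asm statement is a GCC/Clang extension used only to stop the compiler from dropping the memset() calls.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `reps` calls of memset() over `len` bytes.
 * Returns throughput in MB/s, -1.0 if the buffer cannot be allocated,
 * or 0.0 if the run was too short for the clock to measure. */
double memset_mb_per_s(size_t len, unsigned reps)
{
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return -1.0;

    /* Touch the buffer once so page faults are not part of the timing. */
    memset(buf, 0, len);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < reps; i++) {
        memset(buf, (int)(i & 0xff), len);
        /* Compiler barrier (GCC/Clang extension): keeps each call alive. */
        __asm__ volatile("" : : "r"(buf) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return 0.0;
    return ((double)len * (double)reps) / (1024.0 * 1024.0) / secs;
}

/* Hypothetical sweep: 1 KiB to 4 MiB, about 64 MiB written per size,
 * so both cache-resident and memory-bound cases are covered. */
void memset_sweep(void)
{
    for (size_t len = 1024; len <= ((size_t)4 << 20); len *= 4) {
        unsigned reps = (unsigned)(((size_t)64 << 20) / len);
        printf("%8lu bytes: %10.1f MB/s\n",
               (unsigned long)len, memset_mb_per_s(len, reps));
    }
}
```

Calling memset_sweep() from main() on each board and comparing the two tables gives the kind of A8 versus A9 comparison discussed above. Build with optimisation on; older glibc versions may need -lrt for clock_gettime(). To compare implementations, the memset() call in the loop would be swapped for the NEON or non-NEON routine under test.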
Dave
On Thu, Jan 27, 2011 at 2:03 AM, David Gilbert david.gilbert@linaro.org wrote:
[...]
To add some hearsay to the mix, I understand that the NEON unit on the A8 has a 128-bit connection straight into the L2 cache, while the core has a 64-bit connection into the L1. This means that a NEON memcpy() (and probably memset()) on the A8 does twice as well for large or cold writes.
-- Michael
linaro-toolchain@lists.linaro.org