Re: NEON intrinsics vs. assembly code

22 Apr 2011


On 21 April 2011 23:38, Richard Sandiford richard.sandiford@linaro.org wrote:
...
Michael mentioned that some users reported seeing better performance from
RVCT using arm_neon.h than they did when coding directly in assembler.
He suggested we try the same thing for GCC.  Here's an experiment using
the example that Jim Huang posted to the dev list recently:
https://wiki.linaro.org/RichardSandiford/Sandbox/IntrinsicsPerformance
Hi Richard,
I appreciate your analysis very much!
In fact, that was the practice when I learned ARM NEON.
...
The summary is that the C version needs to borrow a trick from the
assembly code in order to be competitive.  If it does that, though,
the C code can be faster.  I think this is mostly down to scheduling,
although I haven't checked in detail yet.
Thanks for the conclusion.  Indeed, GCC needs extra hints for NEON
iterative modulo scheduling.
Do you have any further plans to improve this?
Sincerely,
-jserv
