NEON intrinsics vs. assembly code

21 Apr 2011


Michael mentioned that some users reported seeing better performance from
RVCT using arm_neon.h than they did when coding directly in assembler.
He suggested we try the same thing for GCC.  Here's an experiment using
the example that Jim Huang posted to the dev list recently:
https://wiki.linaro.org/RichardSandiford/Sandbox/IntrinsicsPerformance
The summary is that the C version needs to borrow a trick from the
assembly code in order to be competitive.  If it does that, though,
the C code can be faster.  I think this is mostly down to scheduling,
although I haven't checked in detail yet.
Richard
