I've updated:
https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv
so that it gives the output for current trunk, including Ira's commit yesterday to reduce the amount of overpromotion. I also reran the microbenchmarks. The good news is that the vectorised code is now better in all cases than the non-vectorised code.
The biggest winner from last time was rgb24tobgr16_C(). It used to be much worse with vectorisation due to lots of excessive widening. Thanks to Ira's patch, the loop now looks pretty respectable, and is ~3.25x faster than the non-vectorised code.
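For anyone who hasn't looked at the loop, here is a rough sketch of the kind of code involved. It is not the libav source: the name pack_rgb24_to_565 and the channel order are only assumptions for illustration. Each byte is loaded, masked, shifted and packed into a 16-bit result, so 16-bit intermediates are enough. Written with plain int temporaries, C's integer promotion takes each byte to 32 bits, and that is where a vectoriser can end up widening more than it needs to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not libav's rgb24tobgr16_C):
 * packs three 8-bit channels per pixel into one 16-bit 5-6-5 pixel.
 *
 * One way to try it on ARM, assuming a NEON-capable GCC cross toolchain:
 *   gcc -O3 -mfpu=neon -ftree-vectorize -mvectorize-with-neon-quad ...
 */
void pack_rgb24_to_565(const uint8_t *src, uint16_t *dst, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        /* 16-bit temporaries: the packed result never needs more bits. */
        uint16_t c0 = src[3 * i + 0];
        uint16_t c1 = src[3 * i + 1];
        uint16_t c2 = src[3 * i + 2];

        /* Top 5 bits of c0, top 6 bits of c1, top 5 bits of c2. */
        dst[i] = (uint16_t)((c2 >> 3) | ((c1 & 0xFC) << 3) | ((c0 & 0xF8) << 8));
    }
}
```

Comparing the vectoriser's output for this version against one that uses int temporaries should show whether the extra widening is still there.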
As well as using a more recent compiler, the new version also uses -mvectorize-with-neon-quad. Once again it shows a significant improvement over the default.
Richard
On 5 August 2011 08:53, Richard Sandiford <richard.sandiford@linaro.org> wrote:
> As well as using a more recent compiler, the new version also uses -mvectorize-with-neon-quad. Once again it shows a significant improvement over the default.
Richard E., Ramana and I finally came to an agreement about how it should work, so hopefully quad-words will be the default from some time next week.
Thanks, Ira
linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain