 
On Thu, Oct 27, 2011 at 10:30:14PM +0300, Siarhei Siamashka wrote:
On Thu, Oct 27, 2011 at 9:45 PM, Christian Robottom Reis kiko@linaro.org wrote:
On Wed, Oct 26, 2011 at 03:54:52PM -0500, Tom Gall wrote:
Hardware used includes the i.MX53 QuickStart board by Freescale and an Intel Core 2 Duo in my Lenovo T400.
The results can be found here, including both the raw numbers and pretty graphs.
Again, wow, thanks for such a thorough analysis. I think this is indeed very good material for discussing with Ubuntu. Do we have a session scheduled with them to talk about this?
I have a question: any idea why the gap between 8c and turbo8 is so much more impressive on x86(_64) than on ARM?
Out of curiosity, how much is "much more impressive"? Which case in particular has caught your attention?
Sorry; here are the graphs where I was surprised to see a 3-4x gap:
https://wiki.linaro.org/TomGall/LibJpeg8?action=AttachFile&do=view&t...
https://wiki.linaro.org/TomGall/LibJpeg8?action=AttachFile&do=view&t...
https://wiki.linaro.org/TomGall/LibJpeg8?action=AttachFile&do=view&t...
https://wiki.linaro.org/TomGall/LibJpeg8?action=AttachFile&do=view&t...
Could there be low-hanging fruit left in the NEON codepaths?
Yes. The ARM NEON optimizations for chroma upsampling/downsampling and grayscale color conversion are still missing, and that definitely affects the tjbench results for subsampled formats and grayscale.
Also, the Huffman decoder optimizations (which are C code, not SIMD) in libjpeg-turbo seem to provide only a barely measurable improvement on ARM, while the Huffman speedup is clearly more impressive on x86. As a result, libjpeg-turbo gains more over IJG jpeg on x86.
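To make the missing pieces concrete, here is a rough scalar sketch of the two kinds of per-pixel loops in question: simple 2:1 horizontal chroma upsampling and RGB to grayscale conversion. This is only an illustration, not libjpeg-turbo code; the function names `upsample_h2v1` and `rgb_to_gray` are made up for this example, the upsampling is plain sample replication rather than any smoothing filter, and the luma weights are the usual JFIF ones (0.299, 0.587, 0.114) in integer form.

```python
def upsample_h2v1(chroma_row):
    """Double the width of one chroma row by repeating each sample.

    Hypothetical helper: plain replication, no interpolation.
    """
    out = []
    for sample in chroma_row:
        out.extend((sample, sample))
    return out


def rgb_to_gray(pixels):
    """Convert (r, g, b) tuples to 8-bit luma.

    Uses the JFIF weights scaled by 1000 so the arithmetic stays in
    integers; the +500 rounds to nearest before the division.
    """
    return [(299 * r + 587 * g + 114 * b + 500) // 1000 for r, g, b in pixels]


if __name__ == "__main__":
    print(upsample_h2v1([10, 20, 30]))
    print(rgb_to_gray([(0, 0, 0), (255, 255, 255), (255, 0, 0)]))
```

Loops like these touch every sample independently, which is why they tend to map well onto SIMD, and why formats that exercise them would run through unoptimized C on ARM for now.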
We should definitely consider these for future work, then. Thanks for the reply,