Måns pointed me at the IDCT throughput test that's included with libav. I've written up a page on how to build and run it at: https://wiki.linaro.org/MichaelHope/Sandbox/LibAvDCT
Included are results with and without the vectoriser. In all cases the vectoriser improves things, including raising the throughput of the SIMPLE-C version by 11 % and the peak by 17 %.
The coefficient of variation is low, so the results are consistent. I haven't investigated the benchmark itself to see if it's valid; we could be vectorising the loop overhead instead of the IDCT itself.
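For anyone checking the consistency figure by hand, here is a minimal sketch of how a coefficient of variation can be computed (sample standard deviation divided by the mean). The function name and the readings are made up for illustration; they are not the numbers from the wiki page, and the page may compute the figure differently.

```python
import math


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance, using n - 1 in the denominator.
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(variance) / mean


# Hypothetical throughput readings from repeated runs, not measured results.
runs = [100.0, 102.0, 98.0, 101.0, 99.0]
print(f"CV = {100 * coefficient_of_variation(runs):.2f} %")
```

A value of a few percent or less means the run-to-run spread is small next to the 11 % and 17 % gains, which is what makes the comparison meaningful.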
-- Michael
linaro-toolchain@lists.linaro.org