On 6 July 2012 16:52, Mans Rullgard mans.rullgard@linaro.org wrote:
I ran my usual set of benchmarks of libav compiled with the current gcc releases (hand-written assembly disabled). The results are in this spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0AguHvNGaLXy9dHExeWZ1YWZ1c0s2Vnp...
First the good news: almost everything is faster with 4.6+ than with linaro-4.5.
The bad news is that some things have regressed since 4.6, even if not all the way back to 4.5 levels. A few especially problematic pieces stand out:
The mp3 test performs 5-15% worse. This regression is (mostly) attributable to the ff_mpadsp_apply_window_fixed [1] function. We have looked at this one before.
FLAC is 9% slower in upstream 4.7/4.8 compared to Linaro releases. Here flac_lpc_16_c [2] and flac_decorrelate_indep_c_16 [3] are mainly to blame.
Looking at this in the middle of the summit. For flac_lpc_16_c in the vectorized case, could you take a look with perf and say which part is hot?
Is it the top-level nested loop over i and j, or the loop that does a summation when i < len?
The non-vectorized case looks interesting because it might be fallout from sched-pressure.
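For anyone following along without the source open, here is a rough sketch of the loop shape in question. It is not the libav code: lpc_restore_sketch and its layout are made up for illustration, and the real flac_lpc_16_c [2] interleaves the two accumulators differently. It only shows the two candidate hot spots, the main nested loop over i and j that produces two samples per outer iteration, and the tail summation taken when i < len.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified LPC restore. NOT the libav implementation;
 * it only mirrors the loop structure discussed above. */
static void lpc_restore_sketch(int32_t *decoded, const int *coeffs,
                               int pred_order, int qlevel, int len)
{
    int i, j;

    assert(pred_order > 0 && qlevel >= 0);

    /* Main nested loop: two output samples per outer iteration. */
    for (i = pred_order; i < len - 1; i += 2, decoded += 2) {
        int s0 = 0, s1 = 0;

        /* Prediction for the first sample of the pair. */
        for (j = 0; j < pred_order; j++)
            s0 += coeffs[j] * decoded[j];
        decoded[pred_order] += s0 >> qlevel;

        /* The second sample depends on the one just written. */
        for (j = 0; j < pred_order; j++)
            s1 += coeffs[j] * decoded[j + 1];
        decoded[pred_order + 1] += s1 >> qlevel;
    }

    /* Tail: one leftover sample when the remaining count is odd. */
    if (i < len) {
        int sum = 0;
        for (j = 0; j < pred_order; j++)
            sum += coeffs[j] * decoded[j];
        decoded[pred_order] += sum >> qlevel;
    }
}
```

With pred_order = 2, coefficients {1, 1}, qlevel = 0 and a buffer of {1, 1, 0, 0, 0}, the main loop fills two samples and the tail fills the last one, so both paths are exercised. A perf annotate of the real function should show which of the two dominates.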
MPEG2/MPEG4 decoding is ~10% slower with vectorisation turned on. The culprit here is the ff_simple_idct_8_c [4] function.
H.264 and DTS seem 1-2% slower, although this could be just noise.
Code size has increased by ~10% in all post-4.6 releases.
All builds were compiled with -O3 -mcpu=cortex-a9. The vectorised builds all use -fvect-cost-model; without this flag the results are much worse.
[1] http://git.libav.org/?p=libav.git%3Ba=blob%3Bf=libavcodec/mpegaudiodsp_templ...
[2] http://git.libav.org/?p=libav.git%3Ba=blob%3Bf=libavcodec/flacdsp.c%3Bh=a2e3...
[3] http://git.libav.org/?p=libav.git%3Ba=blob%3Bf=libavcodec/flacdsp_template.c...
[4] http://git.libav.org/?p=libav.git%3Ba=blob%3Bf=libavcodec/simple_idct_templa...
-- Mans Rullgard / mru
linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain