Revital Eres <revital.eres@linaro.org> writes:
> btw, do you also have numbers of how much SMS (hopefully) improves performance on top of the vectorized code?
OK, here's a comparison of:
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad -fno-auto-inc-dec
vs:
-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad -fmodulo-sched -fmodulo-sched-allow-regmoves -fno-auto-inc-dec
(including the register-scheduling patch). As you can see, it's a bit of a mixed bag.
mjpegenc is another case where SMS generates lots of spilling while the normal scheduler doesn't.
Richard
a3dec
  before:  500000 runs take 4.61447s
  after:   500000 runs take 4.61377s
  speedup: x1
aacsbr-1
  before:  5000000 runs take 4.08304s
  after:   5000000 runs take 4.37424s
  speedup: x0.933
aacsbr-2
  before:  5000000 runs take 3.01974s
  after:   5000000 runs take 3.08987s
  speedup: x0.977
aacsbr-3
  before:  4000000 runs take 5.77838s
  after:   4000000 runs take 5.63406s
  speedup: x1.03
aes
  before:  500000 runs take 24.6801s
  after:   500000 runs take 16.9731s
  speedup: x1.45
avs
  before:  1000000 runs take 2.26315s
  after:   1000000 runs take 2.23679s
  speedup: x1.01
cdgraphics
  before:  1000000 runs take 2.40573s
  after:   1000000 runs take 2.40582s
  speedup: x1
dwt
  before:  2000000 runs take 9.02847s
  after:   2000000 runs take 9.1022s
  speedup: x0.992
dxa
  before:  2000000 runs take 4.55194s
  after:   2000000 runs take 4.40613s
  speedup: x1.03
mjpegenc
  before:  500000 runs take 3.28186s
  after:   500000 runs take 7.31247s
  speedup: x0.449
qtrle
  before:  1000000 runs take 4.52829s
  after:   1000000 runs take 4.54483s
  speedup: x0.996
resample
  before:  1000000 runs take 2.32559s
  after:   1000000 runs take 1.91016s
  speedup: x1.22
rgb2rgb-rgb24tobgr16
  before:  1000000 runs take 1.15713s
  after:   1000000 runs take 1.1557s
  speedup: x1
rgb2rgb-rgb24tobgr32
  before:  2000000 runs take 4.55701s
  after:   2000000 runs take 4.55148s
  speedup: x1
rgb2rgb-rgb32tobgr24
  before:  2000000 runs take 3.59705s
  after:   2000000 runs take 3.59683s
  speedup: x1
rgb2rgb-shuffle-bytes
  before:  500000 runs take 2.23944s
  after:   500000 runs take 2.24091s
  speedup: x0.999
rgb2rgb-yuy2toyv12
  before:  500000 runs take 4.51581s
  after:   500000 runs take 4.51593s
  speedup: x1
rgb2rgb-yv12touyvy
  before:  1500000 runs take 3.52603s
  after:   1500000 runs take 3.49863s
  speedup: x1.01
twinvq
  before:  500000 runs take 0.446442s
  after:   500000 runs take 0.452545s
  speedup: x0.987
wmavoice
  before:  500000 runs take 0.864716s
  after:   500000 runs take 0.864685s
  speedup: x1
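
For anyone re-checking the numbers: the speedup figures look consistent with the "before" time divided by the "after" time, printed to three significant digits, so values below 1 are slowdowns. The sketch below is only an illustration of that reading, not the script that produced the results; the function names and the `timings` dictionary are made up here, and the times are copied from four of the entries above.

```python
# Hypothetical helper, not the original benchmark script.
# Assumption: speedup = before_time / after_time, shown to 3 significant digits.

def speedup(before_s, after_s):
    """Ratio of the baseline time to the SMS time; above 1 means SMS was faster."""
    return before_s / after_s


def format_speedup(before_s, after_s):
    """Format the ratio in the same style as the listing, e.g. 'x1.45'."""
    return "x%.3g" % speedup(before_s, after_s)


# (before, after) wall-clock seconds taken from the results above.
timings = {
    "a3dec": (4.61447, 4.61377),
    "aes": (24.6801, 16.9731),
    "mjpegenc": (3.28186, 7.31247),
    "resample": (2.32559, 1.91016),
}

for name, (before_s, after_s) in timings.items():
    print(name, format_speedup(before_s, after_s))
```

Read this way, aes and resample are the clear wins from SMS, mjpegenc runs at less than half its previous speed, and most of the other entries are within a few percent of the baseline.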