I've tried to clean up the libav microbenchmarks that I wrote for the strided load/store work. They're on Launchpad at:
lp:~rsandifo/+junk/loop-microbenchmarks
The main changes are that the benchmarks now preload the caches (for the benefit of CPUs that don't allocate on write) and that they now check the results of the optimised loop against an unoptimised version of the same loop.
The usual big caveat applies: these loops were chosen because they were affected by strided loads/stores. They aren't necessarily interesting for any other reason, and some were even explicitly marked as cold.
I'm going to add some of the video decode routines from Michael's benchmark soon. These microbenchmarks aren't supposed to be libav-specific though, so if you have other interesting ones, please do add them.
Richard