I put a build harness around libav and gathered some profiling data. See: bzr branch lp:~linaro-toolchain-dev/+junk/libav-suite
It includes a Makefile that builds a C-only, H.264-only decoder, plus two Creative Commons licensed videos to use as input.
README.rst has the basic commands for running ffmpeg and the initial perf results showing the hot functions. Dave, 20 % of the time is spent in memcpy(), so you might want to have a look.
The vectoriser has no effect on the run time. The decoder built with GCC 4.5 is ~17 % faster than the one built with 4.6. I'll look into extracting and harnessing the functions themselves later this week.
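For anyone wanting to repeat the GCC 4.5 versus 4.6 comparison, here is a rough sketch of one way to time two builds against the same input. It is not what the harness in the branch does. The function names, the binary paths, and the input file are hypothetical, and the `-i <file> -f null -` invocation (decode and discard the output) is an assumption about how the decoder would be run; README.rst has the actual commands.

```python
import statistics
import subprocess
import sys
import time


def decode_cmd(binary, video):
    """Build a decode-only command line (assumed invocation, see README.rst)."""
    return [binary, "-i", video, "-f", "null", "-"]


def median_runtime(cmd, runs=5):
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_faster(t_slow, t_fast):
    """Speed advantage of the faster run over the slower one, in percent."""
    return (t_slow / t_fast - 1.0) * 100.0


def compare(bin_fast, bin_slow, video, runs=5):
    """Time both (hypothetical) builds on one input and report the difference."""
    t_fast = median_runtime(decode_cmd(bin_fast, video), runs)
    t_slow = median_runtime(decode_cmd(bin_slow, video), runs)
    return percent_faster(t_slow, t_fast)
```

Calling `compare("./ffmpeg-gcc45", "./ffmpeg-gcc46", "input.mp4")` with real paths substituted would return the percentage by which the first build outruns the second. The median is used rather than the mean so that a single slow run (cold cache, background load) does not skew the result.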
-- Michael