The goal and plan of this investigation have been described in [1].
In the plan, the task is divided into three parts: 1) patch backport, 2) regression fix, and 3) exploration and study of other ARM compilers. This report follows the same structure.
1. Patch backport. Eight patches are listed in [1]. Backporting them to the Linaro 4.5 tree should improve speed. Action/Recommendation: Backport them if speed improves. These are patches that I think *should* improve speed, but a "performance surprise" is not impossible.
2. Regression fix. So far (up to r99399), Linaro GCC 4.5 is slower than FSF GCC 4.5.0 on some EEMBC benchmarks. The performance regression is introduced by four commits, r99324, r99330, r99369, and r99380; see details in [2]. Action/Recommendation: Figure out why each commit introduces a speed regression, and try to fix it. My one cent here is about how to avoid speed regressions. I do believe that regressions are sometimes unavoidable, but it is better if we can track them and keep them manageable.
3. Exploration and study of other ARM compilers. In this part, I did not find any possible Thumb-2 specific improvements. However, loop optimization and instruction scheduling should be improved on ARM. (This statement may be true of all ports, or even all compilers.)
Some tickets have been opened for this part:
LP:660644 Missed optimization opportunities
LP:662692 Inner loop in autcor00 can be optimized better
LP:656957
LP:645267 Improve code generation on switch statement
LP:663793 Tune Swing Modulo Scheduling or Selective Scheduling for ARM
LP:656373 Try -fsched-pressure for ARM

I have to admit that instruction scheduling is quite hard, but if we can do something here, that would be great. I've put it in the "performance-inside-gcc" session at UDS. Let us talk about it a little there next week.
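
For readers who do not have the benchmark at hand, LP:662692 concerns the inner loop of an autocorrelation kernel. The following is a minimal sketch of a generic fixed-point autocorrelation, not the EEMBC source; the function name, parameters, and the choice of 16-bit samples with a 32-bit accumulator are all assumptions made for illustration. It only shows the shape of loop (two loads, a multiply, a shift, and an accumulate per iteration) where loop optimization and instruction scheduling are likely to matter.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generic fixed-point autocorrelation. NOT the EEMBC autcor00 source.
 * All names here (autocorr_sketch, input, n, out, lags, scale) are
 * hypothetical. The caller must keep values small enough that the
 * 32-bit accumulator does not overflow.
 */
void autocorr_sketch(const int16_t *input, size_t n,
                     int32_t *out, size_t lags, unsigned scale)
{
    for (size_t lag = 0; lag < lags; lag++) {
        int32_t acc = 0;

        /* Inner loop: two loads, one multiply, one shift, one add. */
        for (size_t i = 0; i + lag < n; i++) {
            int32_t prod = (int32_t)input[i] * (int32_t)input[i + lag];
            /* Right shift of a negative value is implementation-defined
               in C; GCC performs an arithmetic shift. */
            acc += prod >> scale;
        }

        /* Lags that reach past the end of the data yield 0. */
        out[lag] = acc;
    }
}
```

The loop body is small and the trip count is usually known only at run time, so how well the compiler pipelines the loads and the multiply is visible directly in the benchmark time.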
During this investigation, I also found that LTO or "whole-program optimization" could be useful for some EEMBC benchmarks. (I didn't run LTO/WPO at all; I concluded this from reading the source of the benchmarks.)
[1] Plan of CS304: Thumb2 tuning investigation. http://lists.linaro.org/pipermail/linaro-toolchain/2010-October/000300.html
[2] https://wiki.linaro.org/YaoQi/Sandbox/Thumb2SpeedOptimization