Continued looking at my constant reuse optimization. I've identified a couple of hundred optimization opportunities in the whole of GCC itself, which is fewer than I had hoped. There are almost no opportunities when compiling for size, because constants are always loaded from a constant pool in that case. I'm not sure why: a constant pool load isn't any more space efficient than movw+movt, unless the pool entry can be shared by more than one use of the constant.
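A minimal sketch of the size arithmetic behind the movw+movt remark. The byte counts are assumptions for illustration, not measurements from GCC: every instruction is taken to be a 32-bit encoding, a pool entry is 4 bytes, and alignment padding is ignored. The function names are hypothetical.

```python
# Assumed sizes (illustrative only): 32-bit movw, movt and ldr encodings,
# one 4-byte constant pool entry, no alignment padding.
INSN_BYTES = 4
POOL_ENTRY_BYTES = 4


def movw_movt_bytes(uses: int) -> int:
    """Bytes needed when every use builds the constant with movw+movt."""
    return uses * 2 * INSN_BYTES


def constant_pool_bytes(uses: int) -> int:
    """Bytes needed when every use is one ldr and all uses share one pool entry."""
    if uses == 0:
        return 0
    return uses * INSN_BYTES + POOL_ENTRY_BYTES


if __name__ == "__main__":
    for uses in (1, 2, 3):
        print(uses, movw_movt_bytes(uses), constant_pool_bytes(uses))
```

Under these assumptions a single use costs 8 bytes either way, and the pool only wins once the entry is shared (two uses: 16 bytes against 12). Thumb-2 can sometimes use a 16-bit literal load, which this sketch ignores and which would tip the balance further towards the pool.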
Backported my -mtune=native patch to Linaro GCC.
Backported my generic tuning patch to Linaro GCC.
Backported my PR50717 patch to Linaro, and pushed to Launchpad for testing.
Analysed the benchmark results I gathered to aid generic tuning. Disappointingly, the A8/A9 tuning is not as beneficial as I had hoped. In fact, the existing generic tuning patch (which was supposed to be a framework only) is actually quite competitive, and gives better performance in some cases.
Set more benchmarks running, this time with NEON enabled. That's about 36 hours' worth on the A9, and more like 90 hours on my A8 (obviously, there's some difference in the clock speeds there).
Discovered that my native tuning code won't compile with a C++ compiler (GCC Bugzilla PR50809). Tested and committed a fix upstream.