Continued looking at constant reuse optimizations, as a background task. I've fiddled with the costs a bit more to remove false positives.
Continued benchmarking different generic tuning ideas. With each test run taking most of a day, this is slow going.
Took Michael's rootfs that is used for all the toolchain testing and benchmarking, unpacked it, and repacked it so that it is compatible with "linaro-media-create", then tested that I could use it to run tests on LAVA successfully. I was hoping to use this for extra benchmarking bandwidth, but there's a permissions problem in the LAVA website software that means it's not yet possible to post private results to the system, so no proprietary benchmarks yet. I can still continue pipe-cleaning my process, and maybe run some benchmarks without actually reporting the results (or perhaps posting them somewhere write-only).
Began work on adding GCC support for 64-bit shifts with NEON. This is not quite as simple as it ought to be because a) it's inefficient to move a value to NEON registers just to do a shift, so it needs to detect where the value is, and b) right shifts by a register amount are encoded as left shifts by a negative amount, and negative shift amounts are normally considered undefined behaviour.
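To make point b) concrete, here is a rough C model of how a register-controlled NEON shift (VSHL.U64) treats its shift operand, as I understand the architecture: only the low byte of the shift register is used, it is read as a signed value, and a negative value shifts right. This is only an illustration of the semantics, not the GCC implementation; the function names vshl_u64_model and lshr64_via_vshl are made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical model of VSHL.U64 with a register shift amount.
   The hardware reads the low byte of the shift register as a signed
   number: positive shifts left, negative shifts right. */
static uint64_t vshl_u64_model(uint64_t value, int64_t shift_reg)
{
    /* Take the low byte and sign-extend it by hand, so the model does
       not itself depend on implementation-defined conversions. */
    int shift = (int)((uint64_t)shift_reg & 0xff);
    if (shift >= 128)
        shift -= 256;

    /* Shifting by the full width or more leaves nothing behind. Plain C
       shifts are undefined for these amounts, hence the explicit check. */
    if (shift >= 64 || shift <= -64)
        return 0;

    if (shift >= 0)
        return value << shift;
    return value >> -shift;   /* negative amount: logical right shift */
}

/* A logical right shift by a variable amount (assumed 0..63 here) is
   expressed by negating the amount and doing a "left" shift. */
static uint64_t lshr64_via_vshl(uint64_t value, unsigned amount)
{
    return vshl_u64_model(value, -(int64_t)amount);
}
```

For example, lshr64_via_vshl(0xFF00, 8) goes through vshl_u64_model(0xFF00, -8) and gives 0xFF. The awkward part for the compiler is that a plain C "x << -8" has no defined meaning, so the negated amount has to be kept away from anything that assumes ordinary shift semantics.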