Hi,
* finished the SLP for reduction patch. The loop in DenBench that needs this feature also requires support for load permutation, and I am considering implementing that too. I looked for other cases that need this feature, but only found loops that are not vectorizable, so I am not sure I'll proceed in this direction.
* looked into the extract_even/odd and interleave_high/low implementation on ARM as a backup plan in case we don't have special load/store support in time for the next release. Even though the NEON VZIP and VUZP instructions can each compute both results simultaneously (even and odd for VUZP, high and low for VZIP), I don't see how we can express that at the tree level.
* looking into ffmpeg
* non-Linaro issues
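
For reference, here is a scalar C model of what the two-output operations in the VZIP/VUZP item compute. It is only a sketch: the names unzip_pair, zip_pair and roundtrip_check and the vector length N = 4 are made up for illustration and are not GCC or NEON identifiers. Which half counts as "high" or "low" in the actual tree codes may also depend on endianness.

```c
#include <assert.h>

#define N 4  /* elements per vector; illustrative only */

/* Scalar model of an unzip (VUZP-like) step: from the concatenation a|b,
   produce the even-indexed and the odd-indexed elements in one call. */
static void unzip_pair(const int a[N], const int b[N], int even[N], int odd[N])
{
    int cat[2 * N];
    for (int i = 0; i < N; i++) {
        cat[i] = a[i];
        cat[N + i] = b[i];
    }
    for (int i = 0; i < N; i++) {
        even[i] = cat[2 * i];
        odd[i] = cat[2 * i + 1];
    }
}

/* Scalar model of a zip (VZIP-like) step: interleave a and b, then return
   the first half of the interleaving as "low" and the second as "high". */
static void zip_pair(const int a[N], const int b[N], int low[N], int high[N])
{
    int out[2 * N];
    for (int i = 0; i < N; i++) {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
    for (int i = 0; i < N; i++) {
        low[i] = out[i];
        high[i] = out[N + i];
    }
}

/* Zipping and then unzipping gives back the original vectors. */
static void roundtrip_check(void)
{
    const int a[N] = {0, 1, 2, 3};
    const int b[N] = {4, 5, 6, 7};
    int low[N], high[N], even[N], odd[N];

    zip_pair(a, b, low, high);
    unzip_pair(low, high, even, odd);
    for (int i = 0; i < N; i++) {
        assert(even[i] == a[i]);
        assert(odd[i] == b[i]);
    }
}
```

Each function returns two result vectors from a single call, which is the part that separate single-result even/odd and high/low operations do not capture.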
Ira