I've added some ideas to the NEON blueprint. There are now six separate tasks, each broken down into subitems, so it looks like we could have six separate blueprints, as you mentioned on the wiki page. I wasn't sure how to create those blueprints correctly, though. Please let me know if they don't look sensible!
Another one that would be interesting is the missed SMS opportunity exposed by Jim Huang's NEON intrinsic example from a while back. If we have a loop such as:
    for (int i = 0; i < n; i++) {
        unsigned short foo = a[i];
        ...
        a[i] = ...;
    }
then SMS treats the read from a[i + 1] in the next iteration as having a true dependency on the store to a[i], preventing any useful cross-iteration scheduling.
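To make the point concrete, here is a minimal sketch of a loop in that shape. The per-element operation (scale_and_bias) and the helper names are made up for illustration, since the real loop body is elided above. It only shows that iterations of such a loop are independent at the source level: running them in either order gives the same result, so there is no real dependency for the scheduler to respect. It says nothing about what SMS itself does with the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the elided "..." in the loop body. */
static unsigned short scale_and_bias(unsigned short x)
{
    return (unsigned short)(x * 3u + 1u);
}

/* Loop in the shape discussed: iteration i loads a[i] and stores to a[i]. */
void update_forward(unsigned short *a, int n)
{
    for (int i = 0; i < n; i++) {
        unsigned short foo = a[i];
        a[i] = scale_and_bias(foo);
    }
}

/* Same body with the iteration order reversed. */
void update_backward(unsigned short *a, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        unsigned short foo = a[i];
        a[i] = scale_and_bias(foo);
    }
}

/* Returns 1 if both iteration orders produce identical arrays.
   Works on local copies, so src is left untouched. Limited to 64 elements. */
int orders_agree(const unsigned short *src, int n)
{
    unsigned short fwd[64], bwd[64];
    assert(n >= 0 && n <= 64);
    memcpy(fwd, src, (size_t)n * sizeof *src);
    memcpy(bwd, src, (size_t)n * sizeof *src);
    update_forward(fwd, n);
    update_backward(bwd, n);
    return memcmp(fwd, bwd, (size_t)n * sizeof *src) == 0;
}
```

Since iteration i + 1 only ever touches a[i + 1], the load at the top of one iteration could in principle be hoisted above the store at the bottom of the previous one, which is the overlap we would want SMS to find.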
Is that already on our radar? If not, could it be treated as another NEON work item? Like my auto inc/dec suggestion in the blueprint, it's really a generic improvement. However, like the inc/dec thing, I expect it's going to affect NEON more than core code.
Richard