The remaining change for neon-strided-load-extract is to allow fwprop.c to propagate:
(set (reg X) (subreg (reg Y) N))
even if no further simplifications are possible. I posted the original patch for comments here:
http://article.gmane.org/gmane.comp.gcc.patches/246180/
and fixed the problem that H.J. spotted. I wasn't entirely happy with the benchmark results though, so it never became an RFA.
Richard
gcc/
	* fwprop.c (propagate_rtx): Also set PR_CAN_APPEAR for subregs.
Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	2011-09-15 14:36:23.206143787 +0100
+++ gcc/fwprop.c	2011-09-15 14:36:40.995131564 +0100
@@ -664,7 +664,12 @@ propagate_rtx (rtx x, enum machine_mode
     return NULL_RTX;
 
   flags = 0;
-  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
+  if (REG_P (new_rtx)
+      || CONSTANT_P (new_rtx)
+      || (GET_CODE (new_rtx) == SUBREG
+          && REG_P (SUBREG_REG (new_rtx))
+          && (GET_MODE_SIZE (mode)
+              <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))))))
     flags |= PR_CAN_APPEAR;
   if (!for_each_rtx (&new_rtx, varying_mem_p, NULL))
     flags |= PR_HANDLE_MEM;
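
For readers who want to experiment with the new condition outside GCC, here is a minimal standalone sketch of the predicate. It is a toy model only: `toy_rtx`, `toy_code` and `toy_can_appear` are hypothetical names invented for illustration, not GCC's real RTL types or functions, and `mode_size` is assumed to stand in for GET_MODE_SIZE (mode) in the hunk above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's RTL codes; these are NOT the real definitions.  */
enum toy_code { TOY_REG, TOY_CONST, TOY_SUBREG, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  unsigned mode_size;           /* Size in bytes of this expression's mode.  */
  const struct toy_rtx *inner;  /* Operand of a TOY_SUBREG, otherwise NULL.  */
};

/* Model of the patched test for setting PR_CAN_APPEAR: accept registers
   and constants as before, and additionally accept a subreg of a register
   when MODE_SIZE is no larger than the size of the inner register's mode.  */
static bool
toy_can_appear (const struct toy_rtx *new_rtx, unsigned mode_size)
{
  if (new_rtx->code == TOY_REG || new_rtx->code == TOY_CONST)
    return true;

  return (new_rtx->code == TOY_SUBREG
          && new_rtx->inner != NULL
          && new_rtx->inner->code == TOY_REG
          && mode_size <= new_rtx->inner->mode_size);
}
```

Under these assumptions, a 4-byte subreg of an 8-byte register is accepted, while a subreg of a memory reference, or a request wider than the inner register, is rejected.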