Hi,
I am trying to implement interleave_high/low and extract_even/odd using vzip and vuzp instructions. I am attaching a patch that attempts to do that. It uses already existing neon_vzip<mode>_internal. The problem with it is that it doesn't express the fact that the two outputs of vzip depend on both inputs, which causes wrong code generation in CSE: for (a,b)<- vzip (c,d) and (e,f) <- vzip (g,d) CSE decides that b==f, since on RTL level b and f depend only on d.
Here is neon_vzip<mode>_internal:
(define_insn "neon_vzip<mode>_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] UNSPEC_VZIP1)) (set (match_operand:VDQW 2 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] UNSPEC_VZIP2))] "TARGET_NEON" "vzip.<V_sz_elem>\t%<V_reg>0, %<V_reg>2" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "<Is_d_reg>") (const_int 0)) (const_string "neon_bp_simple") (const_string "neon_bp_3cycle")))] )
Is there a way to properly mark the dependence?
Thanks, Ira
On Mon, 2011-01-31 at 16:02 +0200, Ira Rosen wrote:
Hi,
I am trying to implement interleave_high/low and extract_even/odd using vzip and vuzp instructions. I am attaching a patch that attempts to do that. It uses already existing neon_vzip<mode>_internal. The problem with it is that it doesn't express the fact that the two outputs of vzip depend on both inputs, which causes wrong code generation in CSE: for (a,b)<- vzip (c,d) and (e,f) <- vzip (g,d) CSE decides that b==f, since on RTL level b and f depend only on d.
Here is neon_vzip<mode>_internal:
(define_insn "neon_vzip<mode>_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] UNSPEC_VZIP1)) (set (match_operand:VDQW 2 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] UNSPEC_VZIP2))]
Just to let you know, here, when I tried to make the two sets parallel and "+w" for operands 0 and 2 (instead of =w), it made reload ICE. I did this to represent the 'meaning' of the instruction in md-speak, but no success :-(
Tejas.
"TARGET_NEON" "vzip.<V_sz_elem>\t%<V_reg>0, %<V_reg>2" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "<Is_d_reg>") (const_int 0)) (const_string "neon_bp_simple") (const_string "neon_bp_3cycle")))] )
Is there a way to properly mark the dependence?
Thanks, Ira _______________________________________________ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Ira Rosen ira.rosen@linaro.org wrote:
(define_insn "neon_vzip<mode>_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] UNSPEC_VZIP1)) (set (match_operand:VDQW 2 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] UNSPEC_VZIP2))] "TARGET_NEON" "vzip.<V_sz_elem>\t%<V_reg>0, %<V_reg>2" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "<Is_d_reg>") (const_int 0)) (const_string "neon_bp_simple") (const_string "neon_bp_3cycle")))] )
Is there a way to properly mark the dependence?
I guess you need to make sure the UNSPEC_VZIP forms both explicitly depend on both input operands. This can be a bit tricky due to various reload constraints on how matching constraints and/or match_dup can be used, but I think something along the following lines should work:
[(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") (match_operand:VDQW 2 "s_register_operand" "w")] UNSPEC_VZIP1)) (set (match_operand:VDQW 3 "s_register_operand" "=2") (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VZIP2))]
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 31 January 2011 16:53, Ulrich Weigand Ulrich.Weigand@de.ibm.com wrote:
Ira Rosen ira.rosen@linaro.org wrote:
(define_insn "neon_vzip<mode>_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] UNSPEC_VZIP1)) (set (match_operand:VDQW 2 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] UNSPEC_VZIP2))] "TARGET_NEON" "vzip.<V_sz_elem>\t%<V_reg>0, %<V_reg>2" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "<Is_d_reg>") (const_int 0)) (const_string "neon_bp_simple") (const_string "neon_bp_3cycle")))] )
Is there a way to properly mark the dependence?
I guess you need to make sure the UNSPEC_VZIP forms both explicitly depend on both input operands. This can be a bit tricky due to various reload constraints on how matching constraints and/or match_dup can be used, but I think something along the following lines should work:
[(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") (match_operand:VDQW 2 "s_register_operand" "w")] UNSPEC_VZIP1)) (set (match_operand:VDQW 3 "s_register_operand" "=2") (unspec:VDQW [(match_dup 1) (match_dup 2)] UNSPEC_VZIP2))]
Thanks a lot! It seems to work. It fixed the problem and I am now testing the patch on the rest of the vectorizer testsuite.
Thanks, Ira
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 1 February 2011 11:47, Ira Rosen ira.rosen@linaro.org wrote:
Thanks a lot! It seems to work. It fixed the problem and I am now testing the patch on the rest of the vectorizer testsuite.
After testing only with the vectorizer testsuite (which contains at least 30 tests for strided accesses), I'd appreciate comments on the patch before I start full testing (cross testing with qemu doesn't work so well for NEON).
Thanks, Ira
ChangeLog:
* config/arm/neon.md (UNSPEC_EXTEVEN): New constant. (UNSPEC_EXTODD, UNSPEC_INTERHI, UNSPEC_INTERLO): Likewise. (neon_vuzp<mode>_extract): New instruction pattern. (vec_extract_even<mode>): New expander. (vec_extract_odd<mode>): Likewise. (neon_vzip<mode>_interleave): New instruction pattern. (vec_interleave_high<mode>): New expander. (vec_interleave_low<mode>): Likewise.
testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_vect_extract_even_odd): Add NEON to the list of supporting targets. (check_effective_target_vect_extract_even_odd_wide, check_effective_target_vect_interleave): Likewise.
Ira Rosen ira.rosen@linaro.org wrote:
After testing only with the vectorizer testsuite (which contains at least 30 tests for strided accesses), I'd appreciate comments on the patch before I start full testing (cross testing with qemu doesn't work so well for NEON).
I cannot really comment on the ARM semantics, but just from the point of view of how the .md file looks:
- You define new patterns neon_vzip<mode>_interleave etc. instead of using the existing neon_vzip<mode>_internal etc., presumably because the existing patterns are broken. But because they are actually broken, they should rather be fixed themselves (and *then* used instead of duplicated).
- You define a bunch of new UNSPEC_ codes which will actually never show up in RTL, because the expanders that appear to use them always end in a DONE statement:
+(define_expand "vec_extract_even<mode>" + [(set (match_operand:VDQW 0 "register_operand" "") + (unspec:VDQW [(match_operand:VDQW 1 "register_operand" "") + (match_operand:VDQW 2 "register_operand" "")] + UNSPEC_EXTEVEN))] + "TARGET_NEON" +{ + rtx tmp = gen_reg_rtx (<MODE>mode); + + if (BYTES_BIG_ENDIAN) + emit_insn (gen_neon_vuzp<mode>_extract (tmp, operands[1], operands[2], + operands[0])); + else + emit_insn (gen_neon_vuzp<mode>_extract (operands[0], operands[1], + operands[2], tmp)); + DONE; +})
It doesn't seem to make much sense to define UNSPECs here; such expanders can instead simply use naked match_operands like so:
(define_expand "vec_extract_even<mode>" [(match_operand:VDQW 0 "register_operand" "") (match_operand:VDQW 1 "register_operand" "") (match_operand:VDQW 2 "register_operand" "")]
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 1 February 2011 14:01, Ulrich Weigand Ulrich.Weigand@de.ibm.com wrote:
Ira Rosen ira.rosen@linaro.org wrote:
After testing only with the vectorizer testsuite (which contains at least 30 tests for strided accesses), I'd appreciate comments on the patch before I start full testing (cross testing with qemu doesn't work so well for NEON).
I cannot really comment on the ARM semantics, but just from the point of view of how the .md file looks:
- You define new patterns neon_vzip<mode>_interleave etc. instead of using
the existing neon_vzip<mode>_internal etc., presumably because the existing patterns are broken. But because they are actually broken, they should rather be fixed themselves (and *then* used instead of duplicated).
Are they actually broken ? I'd be worried if that were the case. My understanding is that the existing ones are being used for the Neon intrinsics / builtins.
It would be a good thing to integrate the 2 of them together if at all possible.
Ramana
Ramana Radhakrishnan ramana.radhakrishnan@linaro.org wrote:
On 1 February 2011 14:01, Ulrich Weigand Ulrich.Weigand@de.ibm.com
wrote:
- You define new patterns neon_vzip<mode>_interleave etc. instead of
using
the existing neon_vzip<mode>_internal etc., presumably because the existing patterns are broken. But because they are actually broken, they should rather be fixed themselves (and *then* used instead of duplicated).
Are they actually broken ? I'd be worried if that were the case. My understanding is that the existing ones are being used for the Neon intrinsics / builtins.
Yes, they're broken, for the reason Ira originally points out:
"The problem with it is that it doesn't express the fact that the two outputs of vzip depend on both inputs, which causes wrong code generation in CSE: for (a,b)<- vzip (c,d) and (e,f) <- vzip (g,d) CSE decides that b==f, since on RTL level b and f depend only on d."
(define_insn "neon_vzip<mode>_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] UNSPEC_VZIP1)) (set (match_operand:VDQW 2 "s_register_operand" "=w") (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] UNSPEC_VZIP2))]
The VZIP unspecs must have *both* input registers as operand, since the output actually depends on both.
It may be that in the original use (where the output seems to always be written to memory for some reason?) CSE rarely has an opportunity for perform this transformation, but it's still at least latently broken ...
It would be a good thing to integrate the 2 of them together if at all possible.
I think if the original pattern is fixed, Ira should be able to just use it without problem.
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
-- Dr. Ulrich Weigand | Phone: +49-7031/16-3727 STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht Stuttgart, HRB 243294
On 1 February 2011 16:23, Ulrich Weigand Ulrich.Weigand@de.ibm.com wrote:
Are they actually broken ? I'd be worried if that were the case. My understanding is that the existing ones are being used for the Neon intrinsics / builtins.
Yes, they're broken, for the reason Ira originally points out:
Right. The intrinsics are broken as well (I wrote a test to check that). The existing intrinsic testcases are compiled with -O0, which doesn't apply CSE (also there are no testcases that actually check the results of vzip, vuzp...).
I'll fix the patch according to your comments, add intrinsic testcases, and also fix neon_vtrn<mode>_internal.
Thanks, Ira
linaro-toolchain@lists.linaro.org