Re: [CI-NOTIFY]: TCWG Bisect tcwg_gnu/gnu-master-aarch64-check_bootstrap - Build # 80 - Successful! - linaro-toolchain

19 Jul 2021

Reported to upstream at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101506 .
--
Maxim Kuvyrkov
https://www.linaro.org
On 19 Jul 2021, at 02:30, ci_notify@linaro.org wrote:
Successfully identified regression in *gcc* in CI configuration tcwg_gnu/gnu-master-aarch64-check_bootstrap.  So far, this commit has regressed CI configurations:

tcwg_gnu/gnu-master-aarch64-check_bootstrap

Culprit:
<cut>
commit 1dd3f21095858fbfd3e28a149578d5fb67e75f95
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Jul 13 13:59:15 2021 +0200

    Support reduction def re-use for epilogue with different vector size

    The following adds support for re-using the vector reduction def
    from the main loop in vectorized epilogue loops on architectures
    which use different vector sizes for the epilogue.  That's only
    x86 as far as I am aware.

    2021-07-13  Richard Biener  <rguenther@suse.de>

        * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
        vector types where the old vector type has a multiple of
        the new vector type elements.
        (vect_create_partial_epilog): New function, split out from...
        (vect_create_epilog_for_reduction): ... here.
        (vect_transform_cycle_phi): Reduce the re-used accumulator
        to the new vector type.

        * gcc.target/i386/vect-reduc-1.c: New testcase.

</cut>
Results regressed to (for first_bad == 1dd3f21095858fbfd3e28a149578d5fb67e75f95)
# reset_artifacts:
-10
# build_abe binutils:
-2
# build_abe bootstrap:
-1
# build_abe dejagnu:
0
# build_abe check_bootstrap -- --set runtestflags=g++.dg/dg.exp --set runtestflags=gcc.target/aarch64/aarch64.exp:
1
# Getting actual results from build directory /home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles
# 	/home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libstdc++.sum
# 	/home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/gfortran.sum
# 	/home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libitm.sum
# 	/home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libgomp.sum
# 	/home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/libatomic.sum
# 	/home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/g++.sum
# 	/home/tcwg-buildslave/workspace/tcwg_gnu_3/artifacts/build-1dd3f21095858fbfd3e28a149578d5fb67e75f95/sumfiles/gcc.sum
# Manifest:         gcc-compare-results/contrib/testsuite-management/flaky/gnu-master-aarch64-check_bootstrap.xfail
# Getting actual results from build directory base-artifacts/sumfiles
# 	base-artifacts/sumfiles/libstdc++.sum
# 	base-artifacts/sumfiles/gfortran.sum
# 	base-artifacts/sumfiles/libitm.sum
# 	base-artifacts/sumfiles/libgomp.sum
# 	base-artifacts/sumfiles/libatomic.sum
# 	base-artifacts/sumfiles/g++.sum
# 	base-artifacts/sumfiles/gcc.sum
# 
# 
# Unexpected results in this build (new failures)
# 		=== gcc tests ===
# 
# Running gcc.target/aarch64/aarch64.exp ...
# FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fminnmv
# FAIL: gcc.target/aarch64/vect-fmaxv-fminv-compile.c scan-assembler fmaxnmv
# 
# 		=== Results Summary ===
from (for last_good == a7098d6ef4e4e799dab8ef925c62b199d707694b)
# reset_artifacts:
-10
# build_abe binutils:
-2
# build_abe bootstrap:
-1
# build_abe dejagnu:
0
# build_abe check_bootstrap -- --set runtestflags=g++.dg/dg.exp --set runtestflags=gcc.target/aarch64/aarch64.exp:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap...
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap...
Build top page/logs: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap...
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-1dd3f21095858fbfd3e28a149578d5fb67e75f95
cd investigate-gcc-1dd3f21095858fbfd3e28a149578d5fb67e75f95
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap... --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap... --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap... --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
rsync -a --del --delete-excluded --exclude bisect/ --exclude artifacts/ --exclude gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 1dd3f21095858fbfd3e28a149578d5fb67e75f95
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach a7098d6ef4e4e799dab8ef925c62b199d707694b
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/c...
Artifacts: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap...
Build log: https://ci.linaro.org/job/tcwg_gcc-bisect-gnu-master-aarch64-check_bootstrap...
Full commit (up to 1000 lines):
<cut>
commit 1dd3f21095858fbfd3e28a149578d5fb67e75f95
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Jul 13 13:59:15 2021 +0200

    Support reduction def re-use for epilogue with different vector size

    The following adds support for re-using the vector reduction def
    from the main loop in vectorized epilogue loops on architectures
    which use different vector sizes for the epilogue.  That's only
    x86 as far as I am aware.

    2021-07-13  Richard Biener  <rguenther@suse.de>

        * tree-vect-loop.c (vect_find_reusable_accumulator): Handle
        vector types where the old vector type has a multiple of
        the new vector type elements.
        (vect_create_partial_epilog): New function, split out from...
        (vect_create_epilog_for_reduction): ... here.
        (vect_transform_cycle_phi): Reduce the re-used accumulator
        to the new vector type.

        * gcc.target/i386/vect-reduc-1.c: New testcase.


gcc/testsuite/gcc.target/i386/vect-reduc-1.c |  17 ++
gcc/tree-vect-loop.c                         | 227 ++++++++++++++++-----------
2 files changed, 156 insertions(+), 88 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/vect-reduc-1.c b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
new file mode 100644
index 00000000000..9ee9ba4e736
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-reduc-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -mno-avx512f -fdump-tree-vect-details" } */
+
+#define N 32
+int foo (int *a, int n)
+{
+  int sum = 1;
+  for (int i = 0; i < 8*N + 4; ++i)
+    sum += a[i];
+  return sum;
+}
+
+/* The reduction epilog should be vectorized and the accumulator
+   re-used.  */
+/* { dg-final { scan-tree-dump "LOOP EPILOGUE VECTORIZED" "vect" } } */
+/* { dg-final { scan-assembler-times "psrl" 2 } } */
+/* { dg-final { scan-assembler-times "padd" 5 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 8c27d75f889..e9780158a51 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4896,12 +4896,11 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
   	      accumulator->reduc_info->reduc_scalar_results.begin ()))
    return false;

-  /* For now, only handle the case in which both loops are operating on the
-     same vector types.  In future we could reduce wider vectors to narrower
-     ones as well.  */
+  /* Handle the case where we can reduce wider vectors to narrower ones.  */
   tree vectype = STMT_VINFO_VECTYPE (reduc_info);
   tree old_vectype = TREE_TYPE (accumulator->reduc_input);
-  if (!useless_type_conversion_p (old_vectype, vectype))
+  if (!constant_multiple_p (TYPE_VECTOR_SUBPARTS (old_vectype),
+			    TYPE_VECTOR_SUBPARTS (vectype)))
     return false;

   /* Non-SLP reductions might apply an adjustment after the reduction
@@ -4935,6 +4934,101 @@ vect_find_reusable_accumulator (loop_vec_info loop_vinfo,
   return true;
 }

+/* Reduce the vector VEC_DEF down to VECTYPE with reduction operation
+   CODE emitting stmts before GSI.  Returns a vector def of VECTYPE.  */
+
+static tree
+vect_create_partial_epilog (tree vec_def, tree vectype, enum tree_code code,
+			    gimple_seq *seq)
+{
+  unsigned nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_def)).to_constant ();
+  unsigned nunits1 = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+  tree stype = TREE_TYPE (vectype);
+  tree new_temp = vec_def;
+  while (nunits > nunits1)
+    {
+      nunits /= 2;
+      tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+							   stype, nunits);
+      unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
+
+      /* The target has to make sure we support lowpart/highpart
+	 extraction, either via direct vector extract or through
+	 an integer mode punning.  */
+      tree dst1, dst2;
+      gimple *epilog_stmt;
+      if (convert_optab_handler (vec_extract_optab,
+				 TYPE_MODE (TREE_TYPE (new_temp)),
+				 TYPE_MODE (vectype1))
+	  != CODE_FOR_nothing)
+	{
+	  /* Extract sub-vectors directly once vec_extract becomes
+	     a conversion optab.  */
+	  dst1 = make_ssa_name (vectype1);
+	  epilog_stmt
+	      = gimple_build_assign (dst1, BIT_FIELD_REF,
+				     build3 (BIT_FIELD_REF, vectype1,
+					     new_temp, TYPE_SIZE (vectype1),
+					     bitsize_int (0)));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  dst2 =  make_ssa_name (vectype1);
+	  epilog_stmt
+	      = gimple_build_assign (dst2, BIT_FIELD_REF,
+				     build3 (BIT_FIELD_REF, vectype1,
+					     new_temp, TYPE_SIZE (vectype1),
+					     bitsize_int (bitsize)));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	}
+      else
+	{
+	  /* Extract via punning to appropriately sized integer mode
+	     vector.  */
+	  tree eltype = build_nonstandard_integer_type (bitsize, 1);
+	  tree etype = build_vector_type (eltype, 2);
+	  gcc_assert (convert_optab_handler (vec_extract_optab,
+					     TYPE_MODE (etype),
+					     TYPE_MODE (eltype))
+		      != CODE_FOR_nothing);
+	  tree tem = make_ssa_name (etype);
+	  epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
+					     build1 (VIEW_CONVERT_EXPR,
+						     etype, new_temp));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  new_temp = tem;
+	  tem = make_ssa_name (eltype);
+	  epilog_stmt
+	      = gimple_build_assign (tem, BIT_FIELD_REF,
+				     build3 (BIT_FIELD_REF, eltype,
+					     new_temp, TYPE_SIZE (eltype),
+					     bitsize_int (0)));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  dst1 = make_ssa_name (vectype1);
+	  epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
+					     build1 (VIEW_CONVERT_EXPR,
+						     vectype1, tem));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  tem = make_ssa_name (eltype);
+	  epilog_stmt
+	      = gimple_build_assign (tem, BIT_FIELD_REF,
+				     build3 (BIT_FIELD_REF, eltype,
+					     new_temp, TYPE_SIZE (eltype),
+					     bitsize_int (bitsize)));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	  dst2 =  make_ssa_name (vectype1);
+	  epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
+					     build1 (VIEW_CONVERT_EXPR,
+						     vectype1, tem));
+	  gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+	}
+
+      new_temp = make_ssa_name (vectype1);
+      epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
+      gimple_seq_add_stmt_without_update (seq, epilog_stmt);
+    }
+
+  return new_temp;
+}
+
 /* Function vect_create_epilog_for_reduction

    Create code at the loop-epilog to finalize the result of a reduction
@@ -5684,87 +5778,11 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* First reduce the vector to the desired vector size we should
 	 do shift reduction on by combining upper and lower halves.  */
-      new_temp = reduc_inputs[0];
-      while (nunits > nunits1)
-	{
-	  nunits /= 2;
-	  vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
-							  stype, nunits);
-	  unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
-
-	  /* The target has to make sure we support lowpart/highpart
-	     extraction, either via direct vector extract or through
-	     an integer mode punning.  */
-	  tree dst1, dst2;
-	  if (convert_optab_handler (vec_extract_optab,
-				     TYPE_MODE (TREE_TYPE (new_temp)),
-				     TYPE_MODE (vectype1))
-	      != CODE_FOR_nothing)
-	    {
-	      /* Extract sub-vectors directly once vec_extract becomes
-		 a conversion optab.  */
-	      dst1 = make_ssa_name (vectype1);
-	      epilog_stmt
-		  = gimple_build_assign (dst1, BIT_FIELD_REF,
-					 build3 (BIT_FIELD_REF, vectype1,
-						 new_temp, TYPE_SIZE (vectype1),
-						 bitsize_int (0)));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      dst2 =  make_ssa_name (vectype1);
-	      epilog_stmt
-		  = gimple_build_assign (dst2, BIT_FIELD_REF,
-					 build3 (BIT_FIELD_REF, vectype1,
-						 new_temp, TYPE_SIZE (vectype1),
-						 bitsize_int (bitsize)));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	    }
-	  else
-	    {
-	      /* Extract via punning to appropriately sized integer mode
-		 vector.  */
-	      tree eltype = build_nonstandard_integer_type (bitsize, 1);
-	      tree etype = build_vector_type (eltype, 2);
-	      gcc_assert (convert_optab_handler (vec_extract_optab,
-						 TYPE_MODE (etype),
-						 TYPE_MODE (eltype))
-			  != CODE_FOR_nothing);
-	      tree tem = make_ssa_name (etype);
-	      epilog_stmt = gimple_build_assign (tem, VIEW_CONVERT_EXPR,
-						 build1 (VIEW_CONVERT_EXPR,
-							 etype, new_temp));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      new_temp = tem;
-	      tem = make_ssa_name (eltype);
-	      epilog_stmt
-		  = gimple_build_assign (tem, BIT_FIELD_REF,
-					 build3 (BIT_FIELD_REF, eltype,
-						 new_temp, TYPE_SIZE (eltype),
-						 bitsize_int (0)));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      dst1 = make_ssa_name (vectype1);
-	      epilog_stmt = gimple_build_assign (dst1, VIEW_CONVERT_EXPR,
-						 build1 (VIEW_CONVERT_EXPR,
-							 vectype1, tem));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      tem = make_ssa_name (eltype);
-	      epilog_stmt
-		  = gimple_build_assign (tem, BIT_FIELD_REF,
-					 build3 (BIT_FIELD_REF, eltype,
-						 new_temp, TYPE_SIZE (eltype),
-						 bitsize_int (bitsize)));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	      dst2 =  make_ssa_name (vectype1);
-	      epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
-						 build1 (VIEW_CONVERT_EXPR,
-							 vectype1, tem));
-	      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	    }
-
-	  new_temp = make_ssa_name (vectype1);
-	  epilog_stmt = gimple_build_assign (new_temp, code, dst1, dst2);
-	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-	  reduc_inputs[0] = new_temp;
-	}
+      gimple_seq stmts = NULL;
+      new_temp = vect_create_partial_epilog (reduc_inputs[0], vectype1,
+					     code, &stmts);
+      gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+      reduc_inputs[0] = new_temp;

       if (reduce_with_shift && !slp_reduc)
 	{

@@ -7681,13 +7699,46 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
   if (auto *accumulator = reduc_info->reused_accumulator)
     {
+      tree def = accumulator->reduc_input;
+      unsigned int nreduc;
+      bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)),
+				      TYPE_VECTOR_SUBPARTS (vectype_out),
+				      &nreduc);
+      gcc_assert (res);
+      if (nreduc != 1)
+	{
+	  /* Reduce the single vector to a smaller one.  */
+	  gimple_seq stmts = NULL;
+	  def = vect_create_partial_epilog (def, vectype_out,
+					    STMT_VINFO_REDUC_CODE (reduc_info),
+					    &stmts);
+	  /* Adjust the input so we pick up the partially reduced value
+	     for the skip edge in vect_create_epilog_for_reduction.  */
+	  accumulator->reduc_input = def;
+	  if (loop_vinfo->main_loop_edge)
+	    {
+	      /* While we'd like to insert on the edge this will split
+		 blocks and disturb bookkeeping, we also will eventually
+		 need this on the skip edge.  Rely on sinking to
+		 fixup optimal placement and insert in the pred.  */
+	      gimple_stmt_iterator gsi
+		= gsi_last_bb (loop_vinfo->main_loop_edge->src);
+	      /* Insert before a cond that eventually skips the
+		 epilogue.  */
+	      if (!gsi_end_p (gsi) && stmt_ends_bb_p (gsi_stmt (gsi)))
+		gsi_prev (&gsi);
+	      gsi_insert_seq_after (&gsi, stmts, GSI_CONTINUE_LINKING);
+	    }
+	  else
+	    gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop),
+					      stmts);
+	}
       if (loop_vinfo->main_loop_edge)
 	vec_initial_defs[0]
-	  = vect_get_main_loop_result (loop_vinfo, accumulator->reduc_input,
+	  = vect_get_main_loop_result (loop_vinfo, def,
 				       vec_initial_defs[0]);
       else
-	vec_initial_defs.safe_push (accumulator->reduc_input);
-      gcc_assert (vec_initial_defs.length () == 1);
+	vec_initial_defs.safe_push (def);
     }

   /* Generate the reduction PHIs upfront.  */


</cut>
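
For readers skimming the diff: the loop in vect_create_partial_epilog repeatedly splits the current vector into a low and a high half and combines them with the reduction operation until the vector has the target number of lanes. Below is a minimal scalar sketch of that halving scheme in plain C. It is only an illustration: the name partial_reduce_add, the use of int arrays in place of GIMPLE vector defs, and the choice of + as the reduction code are assumptions, not part of the GCC patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only (hypothetical helper, not GCC code).
   Fold VEC, which holds NUNITS lanes, down to TARGET lanes by
   repeatedly adding the high half onto the low half.  Assumes
   NUNITS and TARGET are powers of two with 0 < TARGET <= NUNITS.  */
static void
partial_reduce_add (int *vec, size_t nunits, size_t target)
{
  assert (target > 0 && nunits >= target);
  while (nunits > target)
    {
      nunits /= 2;
      /* vec[0..nunits) plays the role of dst1 (low half) and
         vec[nunits..2*nunits) the role of dst2 (high half).  */
      for (size_t i = 0; i < nunits; ++i)
        vec[i] += vec[i + nunits];
    }
}
```

Calling partial_reduce_add on an 8-lane array with target 4 leaves the four partial sums in the first four elements, which mirrors how the patch turns a wide main-loop accumulator into the narrower vector type used by the epilogue loop.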