Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
 - tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3_LTO
Culprit:
<cut>
commit f31da42e047e8018ca6ad9809273bc7efb6ffcaf
Author: Richard Biener rguenther@suse.de
Date:   Fri Aug 6 14:39:05 2021 +0200

    tree-optimization/101801 - remove vect_worthwhile_without_simd_p

    This removes the cost part of vect_worthwhile_without_simd_p, retaining
    only the correctness bits. The reason is that the cost heuristic
    do not properly account for SLP plus the check whether "without simd"
    applies misfires for AVX512 mask vectors at the moment, leading to
    missed vectorizations there.

    Any costing decision should take place in the cost modeling, no
    single stmt is to disable all vectorization on its own.

    2021-08-06 Richard Biener rguenther@suse.de

            PR tree-optimization/101801
            * tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename...
            (vect_can_vectorize_without_simd_p): ... to this.
            * tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename...
            (vect_can_vectorize_without_simd_p): ... to this and fold
            in vect_min_worthwhile_factor.
            (vect_min_worthwhile_factor): Remove.
            (vectorizable_reduction): Adjust and remove the cost part.
            * tree-vect-stmts.c (vectorizable_shift): Likewise.
            (vectorizable_operation): Likewise.
</cut>
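
To make the behavioural change in the culprit commit concrete, here is a small standalone model of the old and new predicates. It is a sketch based only on the diff in this report, not GCC code: the `Code` enum, the function names, and the boolean/integer parameters standing in for `vec_info` and the vectorization factor are all hypothetical simplifications.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for GCC's tree_code: the codes named in the diff,
// plus Mult as an example of a code that neither predicate accepts.
enum class Code { Plus, Minus, Negate, BitAnd, BitIor, BitXor, BitNot, Mult };

// Model of the removed vect_min_worthwhile_factor: the minimum
// vectorization factor at which word-mode emulation was deemed worthwhile.
inline unsigned min_worthwhile_factor(Code code) {
  switch (code) {
    case Code::Plus:
    case Code::Minus:
    case Code::Negate:
      return 4;
    case Code::BitAnd:
    case Code::BitIor:
    case Code::BitXor:
    case Code::BitNot:
      return 2;
    default:
      return INT_MAX;
  }
}

// Model of the removed vect_worthwhile_without_simd_p: only loop
// vectorization with a constant, large enough factor passed the check.
inline bool worthwhile_without_simd_old(bool is_loop, bool vf_is_constant,
                                        unsigned long long vf, Code code) {
  return is_loop && vf_is_constant && vf >= min_worthwhile_factor(code);
}

// Model of the new vect_can_vectorize_without_simd_p: a pure
// "can this code be emulated" check with no cost component.
inline bool can_vectorize_without_simd_new(Code code) {
  switch (code) {
    case Code::Plus:
    case Code::Minus:
    case Code::Negate:
    case Code::BitAnd:
    case Code::BitIor:
    case Code::BitXor:
    case Code::BitNot:
      return true;
    default:
      return false;
  }
}

// A few cases where the two models agree or differ.
inline void check_model() {
  // A loop with factor 2 doing additions: rejected before, accepted now.
  assert(!worthwhile_without_simd_old(true, true, 2, Code::Plus));
  assert(can_vectorize_without_simd_new(Code::Plus));
  // Non-loop (basic-block) vectorization: always rejected before.
  assert(!worthwhile_without_simd_old(false, true, 8, Code::BitAnd));
  assert(can_vectorize_without_simd_new(Code::BitAnd));
  // Codes outside the list are rejected by both.
  assert(!worthwhile_without_simd_old(true, true, 8, Code::Mult));
  assert(!can_vectorize_without_simd_new(Code::Mult));
}
```

In this model, the cases that change are low-factor loops and non-loop vectorization. After the commit, whether they are actually vectorized is left to the cost model, as the commit message describes.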
Results regressed to (for first_bad == f31da42e047e8018ca6ad9809273bc7efb6ffcaf)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -O3_LTO_marm artifacts/build-f31da42e047e8018ca6ad9809273bc7efb6ffcaf/results_id: 1
# 482.sphinx3,sphinx_livepretend_base.default regressed by 105

from (for last_good == c2a984a3570b908a44a35e43bb48f0a05196156a)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5
# true: 0
# benchmark -- -O3_LTO_marm artifacts/build-c2a984a3570b908a44a35e43bb48f0a05196156a/results_id: 1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm...
Results ID of last_good: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O3_LTO/3203
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm...
Results ID of first_bad: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O3_LTO/3211
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm...
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-f31da42e047e8018ca6ad9809273bc7efb6ffcaf
cd investigate-gcc-f31da42e047e8018ca6ad9809273bc7efb6ffcaf

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm... --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm... --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm... --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach f31da42e047e8018ca6ad9809273bc7efb6ffcaf
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach c2a984a3570b908a44a35e43bb48f0a05196156a
../artifacts/test.sh

cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/c...
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm...
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm...
Full commit (up to 1000 lines):
<cut>
commit f31da42e047e8018ca6ad9809273bc7efb6ffcaf
Author: Richard Biener rguenther@suse.de
Date:   Fri Aug 6 14:39:05 2021 +0200

    tree-optimization/101801 - remove vect_worthwhile_without_simd_p

    This removes the cost part of vect_worthwhile_without_simd_p, retaining
    only the correctness bits. The reason is that the cost heuristic
    do not properly account for SLP plus the check whether "without simd"
    applies misfires for AVX512 mask vectors at the moment, leading to
    missed vectorizations there.

    Any costing decision should take place in the cost modeling, no
    single stmt is to disable all vectorization on its own.

    2021-08-06 Richard Biener rguenther@suse.de

            PR tree-optimization/101801
            * tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename...
            (vect_can_vectorize_without_simd_p): ... to this.
            * tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename...
            (vect_can_vectorize_without_simd_p): ... to this and fold
            in vect_min_worthwhile_factor.
            (vect_min_worthwhile_factor): Remove.
            (vectorizable_reduction): Adjust and remove the cost part.
            * tree-vect-stmts.c (vectorizable_shift): Likewise.
            (vectorizable_operation): Likewise.
---
 gcc/tree-vect-loop.c  | 43 +++++++------------------------------------
 gcc/tree-vect-stmts.c | 26 ++------------------------
 gcc/tree-vectorizer.h |  2 +-
 3 files changed, 10 insertions(+), 61 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 1e21fe6b13d..37c7daa7f9e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7227,24 +7227,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
           if (dump_enabled_p ())
             dump_printf (MSG_NOTE, "op not supported by target.\n");
           if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
-              || !vect_worthwhile_without_simd_p (loop_vinfo, code))
+              || !vect_can_vectorize_without_simd_p (code))
             ok = false;
           else
             if (dump_enabled_p ())
               dump_printf (MSG_NOTE, "proceeding using word mode.\n");
         }
 
-      /* Worthwhile without SIMD support? */
-      if (ok
-          && !VECTOR_MODE_P (TYPE_MODE (vectype_in))
-          && !vect_worthwhile_without_simd_p (loop_vinfo, code))
-        {
-          if (dump_enabled_p ())
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "not worthwhile without SIMD support.\n");
-          ok = false;
-        }
-
       /* lane-reducing operations have to go through vect_transform_reduction.
          For the other cases try without the single cycle optimization. */
       if (!ok)
@@ -7948,46 +7937,28 @@ vectorizable_phi (vec_info *,
 }
 
 
-/* Function vect_min_worthwhile_factor.
+/* Return true if we can emulate CODE on an integer mode representation
+   of a vector. */
 
-   For a loop where we could vectorize the operation indicated by CODE,
-   return the minimum vectorization factor that makes it worthwhile
-   to use generic vectors. */
-static unsigned int
-vect_min_worthwhile_factor (enum tree_code code)
+bool
+vect_can_vectorize_without_simd_p (tree_code code)
 {
   switch (code)
     {
     case PLUS_EXPR:
     case MINUS_EXPR:
     case NEGATE_EXPR:
-      return 4;
-
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_NOT_EXPR:
-      return 2;
+      return true;
 
     default:
-      return INT_MAX;
+      return false;
     }
 }
 
-/* Return true if VINFO indicates we are doing loop vectorization and if
-   it is worth decomposing CODE operations into scalar operations for
-   that loop's vectorization factor. */
-
-bool
-vect_worthwhile_without_simd_p (vec_info *vinfo, tree_code code)
-{
-  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
-  unsigned HOST_WIDE_INT value;
-  return (loop_vinfo
-          && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&value)
-          && value >= vect_min_worthwhile_factor (code));
-}
-
 /* Function vectorizable_induction
 
    Check if STMT_INFO performs an induction computation that can be vectorized.
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 94bdb74ea8d..5b94d41e292 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5685,24 +5685,13 @@ vectorizable_shift (vec_info *vinfo,
       /* Check only during analysis. */
       if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
           || (!vec_stmt
-              && !vect_worthwhile_without_simd_p (vinfo, code)))
+              && !vect_can_vectorize_without_simd_p (code)))
         return false;
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
                          "proceeding using word mode.\n");
     }
 
-  /* Worthwhile without SIMD support? Check only during analysis. */
-  if (!vec_stmt
-      && !VECTOR_MODE_P (TYPE_MODE (vectype))
-      && !vect_worthwhile_without_simd_p (vinfo, code))
-    {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "not worthwhile without SIMD support.\n");
-      return false;
-    }
-
   if (!vec_stmt) /* transformation not required. */
     {
       if (slp_node
@@ -6094,24 +6083,13 @@ vectorizable_operation (vec_info *vinfo,
                          "op not supported by target.\n");
       /* Check only during analysis. */
       if (maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
-          || (!vec_stmt && !vect_worthwhile_without_simd_p (vinfo, code)))
+          || (!vec_stmt && !vect_can_vectorize_without_simd_p (code)))
         return false;
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
                          "proceeding using word mode.\n");
     }
 
-  /* Worthwhile without SIMD support? Check only during analysis. */
-  if (!VECTOR_MODE_P (vec_mode)
-      && !vec_stmt
-      && !vect_worthwhile_without_simd_p (vinfo, code))
-    {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "not worthwhile without SIMD support.\n");
-      return false;
-    }
-
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
   internal_fn cond_fn = get_conditional_internal_fn (code);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 5571b3cce3b..de0ecf86478 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2061,7 +2061,7 @@ extern bool vectorizable_lc_phi (loop_vec_info, stmt_vec_info,
                                  gimple **, slp_tree);
 extern bool vectorizable_phi (vec_info *, stmt_vec_info, gimple **, slp_tree,
                               stmt_vector_for_cost *);
-extern bool vect_worthwhile_without_simd_p (vec_info *, tree_code);
+extern bool vect_can_vectorize_without_simd_p (tree_code);
 extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
                                         stmt_vector_for_cost *,
                                         stmt_vector_for_cost *,
</cut>
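
For readers unfamiliar with the "proceeding using word mode" path, the sketch below illustrates what emulating a vector operation "on an integer mode representation of a vector" can look like for four 8-bit lanes packed into one 32-bit word. It shows the generic SIMD-within-a-register technique and is not code taken from GCC. The function names and the lane layout are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise operations need no special handling: every bit is independent,
// so a single word-sized AND processes all four lanes at once.
inline std::uint32_t and_bytes_in_word(std::uint32_t a, std::uint32_t b) {
  return a & b;
}

// Lane-wise addition must stop carries from crossing lane boundaries.
inline std::uint32_t add_bytes_in_word(std::uint32_t a, std::uint32_t b) {
  const std::uint32_t high = 0x80808080u;  // top bit of each 8-bit lane
  const std::uint32_t low = 0x7f7f7f7fu;   // remaining seven bits of each lane
  // Add the low seven bits of each lane; a carry can reach bit 7 of its
  // own lane but can never spill into the neighbouring lane.
  std::uint32_t sum_low = (a & low) + (b & low);
  // Fold in the top bits with XOR, which is addition without carry-out.
  return sum_low ^ ((a ^ b) & high);
}

// Worked example: lanes (from the most significant byte) are
// 0x01+0x01, 0xFF+0x01, 0x7F+0x01 and 0x80+0x80, each wrapping modulo 256.
inline void check_word_mode_add() {
  assert(add_bytes_in_word(0x01FF7F80u, 0x01010180u) == 0x02008000u);
  assert(and_bytes_in_word(0xF0F0F0F0u, 0x3C3C3C3Cu) == 0x30303030u);
}
```

The addition takes several extra masking operations while the bitwise case takes none. That is plausibly why the removed heuristic required a larger vectorization factor for `PLUS_EXPR`, `MINUS_EXPR` and `NEGATE_EXPR` (4) than for the bitwise codes (2).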