On Sat, 26 Jun 2021, Maxim Kuvyrkov wrote:
Hi Richard,
This is a report from Linaro’s automatic benchmarking system, which tracks upstream components and automatically bisects performance regressions down to a single commit. We have recently enabled these emails to be sent to patch authors automatically.
It appears that after your patch SPEC CPU2006’s 483.xalancbmk slowed down by 3% on arm-linux-gnueabihf when built with -O2 -flto. Any theories on what could’ve caused it?
The patch enables more DSE in some rare cases (plus the associated DCE). I guess that this somehow triggers followup transforms that are the real cause for the slowdown - you'd have to try to isolate the offending DSE (like add a debug counter to dse_optimize_stmt and "bisect" that).
This is one of the first reports, and we welcome feedback on improving reporting template.
I can only guess the used options (-O2 -flto, no -march?) from the "name" of the report. Reporting the full flags in effect for the benchmark in question (including any added via portability) plus the exact configury used for building the compiler (where -march might come into effect) would be more useful information compared to the rest of the mostly redundant info in the report.
Thanks, Richard.
Regards,
-- Maxim Kuvyrkov https://www.linaro.org
On 26 Jun 2021, at 16:24, ci_notify@linaro.org wrote:
Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2_LTO
Culprit:
<cut> commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6 Author: Richard Biener <rguenther@suse.de> Date: Mon May 3 12:07:58 2021 +0200
Improve PHI handling in DSE
This improves handling of PHI defs when walking uses in dse_classify_store to track two PHI defs. This happens when there are CFG merges and one PHI feeds into another. If we decide to want more then using a sbitmap for this might be the way to go.
2021-05-03 Richard Biener rguenther@suse.de
* tree-ssa-dse.c (dse_classify_store): Track two PHI defs. * gcc.dg/tree-ssa/ssa-dse-42.c: New testcase. * gcc.dg/pr81192.c: Disable DSE.
</cut>
Results regressed to (for first_bad == 32955416d8040b1fa1ba21cd4179b3264e6c5bd6) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # true: 0 # benchmark -O2_LTO_marm -- artifacts/build-32955416d8040b1fa1ba21cd4179b3264e6c5bd6/results_id: 1 # 483.xalancbmk,Xalan_base.default regressed by 103
from (for last_good == ed3c43224cc4e378dbab066122bc63536ccb1276) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5 # true: 0 # benchmark -O2_LTO_marm -- artifacts/build-ed3c43224cc4e378dbab066122bc63536ccb1276/results_id: 1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm... Results ID of last_good: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O2_LTO/458 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm... Results ID of first_bad: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O2_LTO/480 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm...
Configuration details:
Reproduce builds:
<cut> mkdir investigate-gcc-32955416d8040b1fa1ba21cd4179b3264e6c5bd6 cd investigate-gcc-32955416d8040b1fa1ba21cd4179b3264e6c5bd6
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm... --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm... --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm... --fail chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd gcc
# Reproduce first_bad build git checkout --detach 32955416d8040b1fa1ba21cd4179b3264e6c5bd6 ../artifacts/test.sh
# Reproduce last_good build git checkout --detach ed3c43224cc4e378dbab066122bc63536ccb1276 ../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/c...
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm... Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm...
Full commit (up to 1000 lines):
<cut> commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6 Author: Richard Biener <rguenther@suse.de> Date: Mon May 3 12:07:58 2021 +0200
Improve PHI handling in DSE
This improves handling of PHI defs when walking uses in dse_classify_store to track two PHI defs. This happens when there are CFG merges and one PHI feeds into another. If we decide to want more then using a sbitmap for this might be the way to go.
2021-05-03 Richard Biener rguenther@suse.de
* tree-ssa-dse.c (dse_classify_store): Track two PHI defs. * gcc.dg/tree-ssa/ssa-dse-42.c: New testcase. * gcc.dg/pr81192.c: Disable DSE.
gcc/testsuite/gcc.dg/pr81192.c | 4 +++- gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c | 20 ++++++++++++++++++++ gcc/tree-ssa-dse.c | 23 ++++++++++++++--------- 3 files changed, 37 insertions(+), 10 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c index 71bbc13a0e9..6cab6056558 100644 --- a/gcc/testsuite/gcc.dg/pr81192.c +++ b/gcc/testsuite/gcc.dg/pr81192.c @@ -1,4 +1,4 @@ -/* { dg-options "-Os -fdump-tree-pre-details -fdisable-tree-evrp" } */ +/* { dg-options "-Os -fdump-tree-pre-details -fdisable-tree-evrp -fno-tree-dse" } */
/* Disable tree-evrp because the new version of evrp sees <bb 3> : @@ -16,6 +16,8 @@ produces # iftmp.2_12 = PHI <2147483647(3), iftmp.2_11(4)> which causes the situation being tested to dissapear before we get to PRE. */
+/* Likewise disable DSE which also elides the tail merging "opportunity". */
#if __SIZEOF_INT__ == 2 #define unsigned __UINT32_TYPE__ #define int __INT32_TYPE__ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c new file mode 100644 index 00000000000..34108c83828 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-dse1" } */
+int a[2]; +void foo(int i, int k, int j) +{
- a[0] = i;
- if (k)
- a[0] = a[i] + k;
- else
- {
if (j)
a[1] = 1;
a[0] = a[i] + 3;
- }
- a[0] = 0;
+}
+/* The last stores to a[0] and a[1] remain. */ +/* { dg-final { scan-tree-dump-times " = " 2 "dse1" } } */ diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c index e0a944c704a..dfa6d314727 100644 --- a/gcc/tree-ssa-dse.c +++ b/gcc/tree-ssa-dse.c @@ -799,7 +799,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt, return DSE_STORE_LIVE;
auto_vec<gimple *, 10> defs;
gimple *phi_def = NULL;
gimple *first_phi_def = NULL;
{ /* Limit stmt walking. */gimple *last_phi_def = NULL; FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
@@ -825,7 +826,9 @@ dse_classify_store (ao_ref *ref, gimple *stmt, SSA_NAME_VERSION (PHI_RESULT (use_stmt)))) { defs.safe_push (use_stmt);
phi_def = use_stmt;
if (!first_phi_def)
first_phi_def = use_stmt;
} } /* If the statement is a use the store is not dead. */last_phi_def = use_stmt;
@@ -889,6 +892,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple *def = defs[i]; gimple *use_stmt; use_operand_p use_p;
tree vdef = (gimple_code (def) == GIMPLE_PHI
/* If the path to check starts with a kill we do not need to process it further. ??? With byte tracking we need only kill the bytes currently? gimple_phi_result (def) : gimple_vdef (def));
@@ -901,8 +906,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt, } /* If the path ends here we do not need to process it further. This for example happens with calls to noreturn functions. */
else if (gimple_code (def) != GIMPLE_PHI
&& has_zero_uses (gimple_vdef (def)))
not dead. */else if (has_zero_uses (vdef)) { /* But if the store is to global memory it is definitely
@@ -912,12 +916,13 @@ dse_classify_store (ao_ref *ref, gimple *stmt, } /* In addition to kills we can remove defs whose only use is another def in defs. That can only ever be PHIs of which
we track a single for simplicity reasons (we fail for multiple
PHIs anyways). We can also ignore defs that feed only into
we track two for simplicity reasons, the first and last in
{first,last}_phi_def (we fail for multiple PHIs anyways).
We can also ignore defs that feed only into already visited PHIs. */
else if (gimple_code (def) != GIMPLE_PHI
&& single_imm_use (gimple_vdef (def), &use_p, &use_stmt)
&& (use_stmt == phi_def
else if (single_imm_use (vdef, &use_p, &use_stmt)
&& (use_stmt == first_phi_def
|| use_stmt == last_phi_def || (gimple_code (use_stmt) == GIMPLE_PHI && bitmap_bit_p (visited, SSA_NAME_VERSION
</cut>