On Wed, 22 Sep 2021, Maxim Kuvyrkov wrote:
Hi Richard,
We have improved reporting of our benchmarking CI. Any feedback on the below report format?
Interestingly I saw no -fopt-info-loop report change for 482.sphinx3 on x86_64.
The change improved infrastructure, please file a bugreport and try pointing to a specific bad optimization decision - the infrastructure is (hopefully) not to blame here but a consumer which might have an off cost decision.
Richard.
Regards,
-- Maxim Kuvyrkov https://www.linaro.org
On 22 Sep 2021, at 01:58, ci_notify@linaro.org wrote:
After gcc commit f92901a508305f291fcf2acae0825379477724de Author: Richard Biener rguenther@suse.de
tree-optimization/65206 - dependence analysis on mixed pointer/array
the following benchmarks slowed down by more than 2%:
- 482.sphinx3 slowed down by 4% from 20816 to 21661 perf samples
Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar...
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar...
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar...
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O3
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar... Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar... Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar... Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar...
Reproduce builds:
<cut> mkdir investigate-gcc-f92901a508305f291fcf2acae0825379477724de cd investigate-gcc-f92901a508305f291fcf2acae0825379477724de
# Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar... --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar... --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-master-aar... --fail chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build git checkout --detach f92901a508305f291fcf2acae0825379477724de ../artifacts/test.sh
# Reproduce last_good build git checkout --detach abdf63d782cba82b5ecf264248518cbb065650ed ../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut> commit f92901a508305f291fcf2acae0825379477724de Author: Richard Biener <rguenther@suse.de> Date: Wed Sep 8 14:42:31 2021 +0200
tree-optimization/65206 - dependence analysis on mixed pointer/array
This adds the capability to analyze the dependence of mixed pointer/array accesses. The example is from where using a masked load/store creates the pointer-based access when an otherwise unconditional access is array based. Other examples would include accesses to an array mixed with accesses from inlined helpers that work on pointers.
The idea is quite simple and old - analyze the data-ref indices as if the reference was pointer-based. The following change does this by changing dr_analyze_indices to work on the indices sub-structure and storing an alternate indices substructure in each data reference. That alternate set of indices is analyzed lazily by initialize_data_dependence_relation when it fails to match-up the main set of indices of two data references. initialize_data_dependence_relation is refactored into a head and a tail worker and changed to work on one of the indices structures and thus away from using DR_* access macros which continue to reference the main indices substructure.
There are quite some vectorization and loop distribution opportunities unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r, 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and 544.nab_r see amendments in what they report with -fopt-info-loop while the rest of the specrate set sees no changes there. Measuring runtime for the set where changes were reported reveals nothing off-noise besides 511.povray_r which seems to regress slightly for me (on a Zen2 machine with -Ofast -march=native).
2021-09-08 Richard Biener rguenther@suse.de
PR tree-optimization/65206 * tree-data-ref.h (struct data_reference): Add alt_indices, order it last. * tree-data-ref.c (free_data_ref): Release alt_indices. (dr_analyze_indices): Work on struct indices and get DR_REF as tree. (create_data_ref): Adjust. (initialize_data_dependence_relation): Split into head and tail. When the base objects fail to match up try again with pointer-based analysis of indices. * tree-vectorizer.c (vec_info_shared::check_datarefs): Do not compare the lazily computed alternate set of indices. * gcc.dg/torture/20210916.c: New testcase. * gcc.dg/vect/pr65206.c: Likewise.
gcc/testsuite/gcc.dg/torture/20210916.c | 20 ++++ gcc/testsuite/gcc.dg/vect/pr65206.c | 22 ++++ gcc/tree-data-ref.c | 174 +++++++++++++++++++++----------- gcc/tree-data-ref.h | 9 +- gcc/tree-vectorizer.c | 3 +- 5 files changed, 168 insertions(+), 60 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c new file mode 100644 index 00000000000..0ea6d45e463 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/20210916.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */
+typedef union tree_node *tree; +struct tree_base {
- unsigned : 1;
- unsigned lang_flag_2 : 1;
+}; +struct tree_type {
- tree main_variant;
+}; +union tree_node {
- struct tree_base base;
- struct tree_type type;
+}; +tree finish_struct_t, finish_struct_x; +void finish_struct() +{
- for (; finish_struct_t->type.main_variant;)
- finish_struct_x->base.lang_flag_2 = 0;
+} diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c new file mode 100644 index 00000000000..3b6262622c0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr65206.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */ +/* { dg-additional-options "-mavx" { target avx } } */
+#define N 1024
+double a[N], b[N];
+void foo () +{
- for (int i = 0; i < N; ++i)
- if (b[i] < 3.)
a[i] += b[i];
+}
+/* We get a .MASK_STORE because while the load of a[i] does not trap
- the store would introduce store data races. Make sure we still
- can handle the data dependence with zero distance. */
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */ +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */ diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index e061baa7c20..18307a554fc 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3. If not see #include "internal-fn.h" #include "vr-values.h" #include "range-op.h" +#include "tree-ssa-loop-ivopts.h"
static struct datadep_stats { @@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base) DR, analyzed in LOOP and instantiated before NEST. */
static void -dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) +dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop) {
- vec<tree> access_fns = vNULL;
- tree ref, op;
- tree base, off, access_fn;
- /* If analyzing a basic-block there are no indices to analyze and thus no access functions. */ if (!nest) {
DR_BASE_OBJECT (dr) = DR_REF (dr);
DR_ACCESS_FNS (dr).create (0);
dri->base_object = ref;
}dri->access_fns.create (0); return;
- ref = DR_REF (dr);
vec<tree> access_fns = vNULL;
/* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses into a two element array with a constant index. The base is
@@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) { if (TREE_CODE (ref) == ARRAY_REF) {
op = TREE_OPERAND (ref, 1);
access_fn = analyze_scalar_evolution (loop, op);
tree op = TREE_OPERAND (ref, 1);
access_fn = instantiate_scev (nest, loop, access_fn); access_fns.safe_push (access_fn); }tree access_fn = analyze_scalar_evolution (loop, op);
@@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) analyzed nest, add it as an additional independent access-function. */ if (TREE_CODE (ref) == MEM_REF) {
op = TREE_OPERAND (ref, 0);
access_fn = analyze_scalar_evolution (loop, op);
tree op = TREE_OPERAND (ref, 0);
{tree access_fn = analyze_scalar_evolution (loop, op); access_fn = instantiate_scev (nest, loop, access_fn); if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
tree memoff = TREE_OPERAND (ref, 1);tree orig_type;
base = initial_condition (access_fn);
orig_type = TREE_TYPE (base);
tree base = initial_condition (access_fn);
STRIP_USELESS_TYPE_CONVERSION (base);tree orig_type = TREE_TYPE (base);
split_constant_offset (base, &base, &off); STRIP_USELESS_TYPE_CONVERSION (base); /* Fold the MEM_REF offset into the evolutions initialtree off;
@@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) base, memoff); MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old); MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
DR_UNCONSTRAINED_BASE (dr) = true;
access_fns.safe_push (access_fn); } }dri->unconstrained_base = true;
@@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop) build_int_cst (reference_alias_ptr_type (ref), 0)); }
- DR_BASE_OBJECT (dr) = ref;
- DR_ACCESS_FNS (dr) = access_fns;
- dri->base_object = ref;
- dri->access_fns = access_fns;
}
/* Extracts the alias analysis information from the memory reference DR. */ @@ -1463,6 +1460,8 @@ void free_data_ref (data_reference_p dr) { DR_ACCESS_FNS (dr).release ();
- if (dr->alt_indices.base_object)
- dr->alt_indices.access_fns.release (); free (dr);
}
@@ -1497,7 +1496,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
dr_analyze_innermost (&DR_INNERMOST (dr), memref, nest != NULL ? loop : NULL, stmt);
- dr_analyze_indices (dr, nest, loop);
dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop); dr_analyze_alias (dr);
if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3066,41 +3065,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b) TREE_TYPE (TREE_OPERAND (ref_b, 0))); }
-/* Initialize a data dependence relation between data accesses A and
- B. NB_LOOPS is the number of loops surrounding the references: the
- size of the classic distance/direction vectors. */
+/* Initialize a data dependence relation RES in LOOP_NEST. USE_ALT_INDICES
- is true when the main indices of A and B were not comparable so we try again
- with alternate indices computed on an indirect reference. */
struct data_dependence_relation * -initialize_data_dependence_relation (struct data_reference *a,
struct data_reference *b,
vec<loop_p> loop_nest)
+initialize_data_dependence_relation (struct data_dependence_relation *res,
vec<loop_p> loop_nest,
bool use_alt_indices)
{
- struct data_dependence_relation *res;
- struct data_reference *a = DDR_A (res);
- struct data_reference *b = DDR_B (res); unsigned int i;
- res = XCNEW (struct data_dependence_relation);
- DDR_A (res) = a;
- DDR_B (res) = b;
- DDR_LOOP_NEST (res).create (0);
- DDR_SUBSCRIPTS (res).create (0);
- DDR_DIR_VECTS (res).create (0);
- DDR_DIST_VECTS (res).create (0);
- if (a == NULL || b == NULL)
- struct indices *indices_a = &a->indices;
- struct indices *indices_b = &b->indices;
- if (use_alt_indices) {
DDR_ARE_DEPENDENT (res) = chrec_dont_know;
return res;
if (TREE_CODE (DR_REF (a)) != MEM_REF)
- indices_a = &a->alt_indices;
if (TREE_CODE (DR_REF (b)) != MEM_REF)
- indices_b = &b->alt_indices; }
- /* If the data references do not alias, then they are independent. */
- if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
- {
DDR_ARE_DEPENDENT (res) = chrec_known;
return res;
- }
- unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
- unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
- unsigned int num_dimensions_a = indices_a->access_fns.length ();
- unsigned int num_dimensions_b = indices_b->access_fns.length (); if (num_dimensions_a == 0 || num_dimensions_b == 0) { DDR_ARE_DEPENDENT (res) = chrec_dont_know;
@@ -3125,9 +3113,9 @@ initialize_data_dependence_relation (struct data_reference *a,
the a and b accesses have a single ARRAY_REF component reference [0] but have two subscripts. */
- if (DR_UNCONSTRAINED_BASE (a))
- if (indices_a->unconstrained_base) num_dimensions_a -= 1;
- if (DR_UNCONSTRAINED_BASE (b))
if (indices_b->unconstrained_base) num_dimensions_b -= 1;
/* These structures describe sequences of component references in
@@ -3210,6 +3198,10 @@ initialize_data_dependence_relation (struct data_reference *a, B: [3, 4] (i.e. s.e) */ while (index_a < num_dimensions_a && index_b < num_dimensions_b) {
/* The alternate indices form always has a single dimension
with unconstrained base. */
gcc_assert (!use_alt_indices);
allowed by dr_analyze_indices. */ gcc_checking_assert (access_fn_component_p (ref_a));/* REF_A and REF_B must be one of the component access types
@@ -3280,11 +3272,12 @@ initialize_data_dependence_relation (struct data_reference *a, /* See whether FULL_SEQ ends at the base and whether the two bases are equal. We do not care about TBAA or alignment info so we can use OEP_ADDRESS_OF to avoid false negatives. */
- tree base_a = DR_BASE_OBJECT (a);
- tree base_b = DR_BASE_OBJECT (b);
- tree base_a = indices_a->base_object;
- tree base_b = indices_b->base_object; bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a && full_seq.start_b + full_seq.length == num_dimensions_b
&& DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
&& (indices_a->unconstrained_base
== indices_b->unconstrained_base) && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) && (types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b))
@@ -3323,7 +3316,7 @@ initialize_data_dependence_relation (struct data_reference *a, both lvalues are distinct from the object's declared type. */ if (same_base_p) {
if (DR_UNCONSTRAINED_BASE (a))
full_seq.length += 1; } elseif (indices_a->unconstrained_base)
@@ -3332,8 +3325,41 @@ initialize_data_dependence_relation (struct data_reference *a, /* Punt if we didn't find a suitable sequence. */ if (full_seq.length == 0) {
DDR_ARE_DEPENDENT (res) = chrec_dont_know;
return res;
if (use_alt_indices
|| (TREE_CODE (DR_REF (a)) == MEM_REF
&& TREE_CODE (DR_REF (b)) == MEM_REF)
|| may_be_nonaddressable_p (DR_REF (a))
|| may_be_nonaddressable_p (DR_REF (b)))
{
/* Fully exhausted possibilities. */
DDR_ARE_DEPENDENT (res) = chrec_dont_know;
return res;
}
/* Try evaluating both DRs as dereferences of pointers. */
if (!a->alt_indices.base_object
&& TREE_CODE (DR_REF (a)) != MEM_REF)
{
tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
build_int_cst
(reference_alias_ptr_type (DR_REF (a)), 0));
dr_analyze_indices (&a->alt_indices, alt_ref,
loop_preheader_edge (loop_nest[0]),
loop_containing_stmt (DR_STMT (a)));
}
if (!b->alt_indices.base_object
&& TREE_CODE (DR_REF (b)) != MEM_REF)
{
tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
build_int_cst
(reference_alias_ptr_type (DR_REF (b)), 0));
dr_analyze_indices (&b->alt_indices, alt_ref,
loop_preheader_edge (loop_nest[0]),
loop_containing_stmt (DR_STMT (b)));
}
return initialize_data_dependence_relation (res, loop_nest, true);
}
if (!same_base_p)
@@ -3381,8 +3407,8 @@ initialize_data_dependence_relation (struct data_reference *a, struct subscript *subscript;
subscript = XNEW (struct subscript);
SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i]; SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
@@ -3393,6 +3419,40 @@ initialize_data_dependence_relation (struct data_reference *a, return res; }
+/* Initialize a data dependence relation between data accesses A and
- B. NB_LOOPS is the number of loops surrounding the references: the
- size of the classic distance/direction vectors. */
+struct data_dependence_relation * +initialize_data_dependence_relation (struct data_reference *a,
struct data_reference *b,
vec<loop_p> loop_nest)
+{
- data_dependence_relation *res = XCNEW (struct data_dependence_relation);
- DDR_A (res) = a;
- DDR_B (res) = b;
- DDR_LOOP_NEST (res).create (0);
- DDR_SUBSCRIPTS (res).create (0);
- DDR_DIR_VECTS (res).create (0);
- DDR_DIST_VECTS (res).create (0);
- if (a == NULL || b == NULL)
- {
DDR_ARE_DEPENDENT (res) = chrec_dont_know;
return res;
- }
- /* If the data references do not alias, then they are independent. */
- if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
- {
DDR_ARE_DEPENDENT (res) = chrec_known;
return res;
- }
- return initialize_data_dependence_relation (res, loop_nest, false);
+}
/* Frees memory used by the conflict function F. */
static void diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index 685f33d85ae..74f579c9f3f 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -166,14 +166,19 @@ struct data_reference and runs to completion. */ bool is_conditional_in_stmt;
/* Alias information for the data reference. */
struct dr_alias alias;
/* Behavior of the memory reference in the innermost loop. */ struct innermost_loop_behavior innermost;
/* Subscripts of this data reference. */ struct indices indices;
- /* Alias information for the data reference. */
- struct dr_alias alias;
- /* Alternate subscripts initialized lazily and used by data-dependence
analysis only when the main indices of two DRs are not comparable.
Keep last to keep vec_info_shared::check_datarefs happy. */
- struct indices alt_indices;
};
#define DR_STMT(DR) (DR)->stmt diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 3aa3e2a6783..20daa31187d 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -507,7 +507,8 @@ vec_info_shared::check_datarefs () return; gcc_assert (datarefs.length () == datarefs_copy.length ()); for (unsigned i = 0; i < datarefs.length (); ++i)
- if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0)
- if (memcmp (&datarefs_copy[i], datarefs[i],
gcc_unreachable ();offsetof (data_reference, alt_indices)) != 0)
}
</cut>