Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7
This commit has regressed these CI configurations:
- tcwg_gnu_cross_build/master-aarch64
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Even more details: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400
c++: implement C++17 hardware interference size
The last missing piece of the C++17 standard library is the hardware
interference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.
In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.
To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.
A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable. The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.
JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74
I implement this by adding new --params for the two sizes. Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.
64 bytes still seems correct for all x86.
I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.
He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.
Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.
With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.
gcc/ChangeLog:
* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.
gcc/c-family/ChangeLog:
* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.
gcc/cp/ChangeLog:
* constexpr.c (maybe_warn_about_constant_value):
Complain about std::hardware_destructive_interference_size.
(cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.
libstdc++-v3/ChangeLog:
* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Winterference.H: New file.
* g++.dg/warn/Winterference.C: New test.
* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.
---
gcc/c-family/c-cppbuiltin.c | 14 ++++++
gcc/c-family/c.opt | 5 ++
gcc/config/aarch64/aarch64.c | 22 +++++++++
gcc/config/arm/arm.c | 22 +++++++++
gcc/config/i386/i386-options.c | 6 +++
gcc/cp/constexpr.c | 33 +++++++++++++
gcc/cp/decl.c | 32 ++++++++++++
gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++
gcc/params.opt | 16 ++++++
gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++
gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++
gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++
gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++
gcc/testsuite/g++.target/arm/interference.C | 9 ++++
gcc/testsuite/g++.target/i386/interference.C | 8 +++
libstdc++-v3/include/std/version | 3 ++
libstdc++-v3/libsupc++/new | 10 +++-
17 files changed, 279 insertions(+), 2 deletions(-)
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 48cbefd8bf8..ce88e707127 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile)
builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
targetm.atomic_test_and_set_trueval);
+ /* Macros for C++17 hardware interference size constants. Either both or
+ neither should be set. */
+ gcc_assert (!param_destruct_interfere_size
+ == !param_construct_interfere_size);
+ if (param_destruct_interfere_size)
+ {
+ /* FIXME The way of communicating these values to the library should be
+ part of the C++ ABI, whether macro or builtin. */
+ builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
+ param_destruct_interfere_size);
+ builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
+ param_construct_interfere_size);
+ }
+
/* ptr_type_node can't be used here since ptr_mode is only set when
toplev calls backend_init which is not done with -E or pch. */
psize = POINTER_SIZE_UNITS;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index c5fe90003f2..9c151d19870 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -722,6 +722,11 @@ Winit-list-lifetime
C++ ObjC++ Var(warn_init_list) Warning Init(1)
Warn about uses of std::initializer_list that can result in dangling pointers.
+Winterference-size
+C++ ObjC++ Var(warn_interference_size) Warning Init(1)
+Warn about nonsensical values of --param destructive-interference-size or
+constructive-interference-size.
+
Wimplicit
C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
Warn about implicit declarations.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 30d9a0b7a3d..36519ccc5a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l1_cache_line_size,
aarch64_tune_params.prefetch->l1_cache_line_size);
+
+ if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic AArch64 target, cover the current range of cache line
+ sizes. */
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ 256);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ 64);
+ }
+
if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l2_cache_size,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1e628253d0..6c6e77fab66 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3669,6 +3669,28 @@ arm_option_override (void)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_line_size,
current_tune->prefetch.l1_cache_line_size);
+ if (current_tune->prefetch.l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic ARM target, JF Bastien proposed using 64 for both. */
+ /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for
+ constructive? */
+ /* More recent Cortex chips have a 64-byte cache line, but are marked
+ ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+ }
+
if (current_tune->prefetch.l1_cache_size >= 0)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_size,
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 2cb87cedec0..c0006b3674b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p,
SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
ix86_tune_cost->l2_cache_size);
+ /* 64B is the accepted value for these for all x86. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+
/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */
if (opts->x_flag_prefetch_loop_arrays < 0
&& HAVE_prefetch
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7772fe62d95..0c2498aee22 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc)
"%<constexpr%> function in C++20");
}
+/* We're getting the constant value of DECL in a manifestly constant-evaluated
+ context; maybe complain about that. */
+
+static void
+maybe_warn_about_constant_value (location_t loc, tree decl)
+{
+ static bool explained = false;
+ if (cxx_dialect >= cxx17
+ && warn_interference_size
+ && !global_options_set.x_param_destruct_interfere_size
+ && DECL_CONTEXT (decl) == std_node
+ && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size")
+ && (LOCATION_FILE (input_location) != main_input_filename
+ || module_exporting_p ())
+ && warning_at (loc, OPT_Winterference_size, "use of %qD", decl)
+ && !explained)
+ {
+ explained = true;
+ inform (loc, "its value can vary between compiler versions or "
+ "with different %<-mtune%> or %<-mcpu%> flags");
+ inform (loc, "if this use is part of a public ABI, change it to "
+ "instead use a constant variable you define");
+ inform (loc, "the default value for the current CPU tuning "
+ "is %d bytes", param_destruct_interfere_size);
+ inform (loc, "you can stabilize this value with %<--param "
+ "hardware_destructive_interference_size=%d%>, or disable "
+ "this warning with %<-Wno-interference-size%>",
+ param_destruct_interfere_size);
+ }
+}
+
/* Attempt to reduce the expression T to a constant value.
On failure, issue diagnostic and return error_mark_node. */
/* FIXME unify with c_fully_fold */
@@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
r = *p;
break;
}
+ if (ctx->manifestly_const_eval)
+ maybe_warn_about_constant_value (loc, t);
if (COMPLETE_TYPE_P (TREE_TYPE (t))
&& is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false))
{
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index bce62ad202a..c2065027369 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void)
/* Show we use EH for cleanups. */
if (flag_exceptions)
using_eh_for_cleanups ();
+
+ /* Check that the hardware interference sizes are at least
+ alignof(max_align_t), as required by the standard. */
+ const int max_align = max_align_t_align () / BITS_PER_UNIT;
+ if (param_destruct_interfere_size)
+ {
+ if (param_destruct_interfere_size < max_align)
+ error ("%<--param destructive-interference-size=%d%> is less than "
+ "%d", param_destruct_interfere_size, max_align);
+ else if (param_destruct_interfere_size < param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param destructive-interference-size=%d%> "
+ "is less than %<--param l1-cache-line-size=%d%>",
+ param_destruct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_destruct_interfere_size = param_l1_cache_line_size;
+ /* else leave it unset. */
+
+ if (param_construct_interfere_size)
+ {
+ if (param_construct_interfere_size < max_align)
+ error ("%<--param constructive-interference-size=%d%> is less than "
+ "%d", param_construct_interfere_size, max_align);
+ else if (param_construct_interfere_size > param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param constructive-interference-size=%d%> "
+ "is greater than %<--param l1-cache-line-size=%d%>",
+ param_construct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_construct_interfere_size = param_l1_cache_line_size;
}
/* Enter an abi node in global-module context. returns a cookie to
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23cc68f92b5..78cfc100ac2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore,
seemingly insignificant changes in the source program can cause the
warnings produced by @option{-Winline} to appear or disappear.
+@item -Winterference-size
+@opindex Winterference-size
+Warn about use of C++17 @code{std::hardware_destructive_interference_size}
+without specifying its value with @option{--param destructive-interference-size}.
+Also warn about questionable values for that option.
+
+This variable is intended to be used for controlling class layout, to
+avoid false sharing in concurrent code:
+
+@smallexample
+struct independent_fields @{
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> one;
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> two;
+@};
+@end smallexample
+
+Here @samp{one} and @samp{two} are intended to be far enough apart
+that stores to one won't require accesses to the other to reload the
+cache line.
+
+By default, @option{--param destructive-interference-size} and
+@option{--param constructive-interference-size} are set based on the
+current @option{-mtune} option, typically to the L1 cache line size
+for the particular target CPU, sometimes to a range if tuning for a
+generic target. So all translation units that depend on ABI
+compatibility for the use of these variables must be compiled with
+the same @option{-mtune} (or @option{-mcpu}).
+
+If ABI stability is important, such as if the use is in a header for a
+library, you should probably not use the hardware interference size
+variables at all. Alternatively, you can force a particular value
+with @option{--param}.
+
+If you are confident that your use of the variable does not affect ABI
+outside a single build of your project, you can turn off the warning
+with @option{-Wno-interference-size}.
+
@item -Wint-in-bool-context
@opindex Wint-in-bool-context
@opindex Wno-int-in-bool-context
@@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride.
This setting is only useful for strides that are known and constant.
+@item destructive-interference-size
+@item constructive-interference-size
+The values for the C++17 variables
+@code{std::hardware_destructive_interference_size} and
+@code{std::hardware_constructive_interference_size}. The destructive
+interference size is the minimum recommended offset between two
+independent concurrently-accessed objects; the constructive
+interference size is the maximum recommended size of contiguous memory
+accessed together. Typically both will be the size of an L1 cache
+line for the target, in bytes. For a generic target covering a range of L1
+cache line sizes, typically the constructive interference size will be
+the small end of the range and the destructive size will be the large
+end.
+
+The destructive interference size is intended to be used for layout,
+and thus has ABI impact. The default value is not expected to be
+stable, and on some targets varies with @option{-mtune}, so use of
+this variable in a context where ABI stability is important, such as
+the public interface of a library, is strongly discouraged; if it is
+used in that context, users can stabilize the value using this
+option.
+
+The constructive interference size is less sensitive, as it is
+typically only used in a @samp{static_assert} to make sure that a type
+fits within a cache line.
+
+See also @option{-Winterference-size}.
+
@item loop-interchange-max-num-stmts
The maximum number of stmts in a loop to be interchanged.
diff --git a/gcc/params.opt b/gcc/params.opt
index 3a701e22c46..658ca028851 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent)
Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization
The size of L1 cache line.
+-param=destructive-interference-size=
+Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization
+The minimum recommended offset between two concurrently-accessed objects to
+avoid additional performance degradation due to contention introduced by the
+implementation. Typically the L1 cache line size, but can be larger to
+accommodate a variety of target processors with different cache line sizes.
+C++17 code might use this value in structure layout, but is strongly
+discouraged from doing so in public ABIs.
+
+-param=constructive-interference-size=
+Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization
+The maximum recommended size of contiguous memory occupied by two objects
+accessed with temporal locality by concurrent threads. Typically the L1 cache
+line size, but can be smaller to accommodate a variety of target processors with
+different cache line sizes.
+
-param=l1-cache-size=
Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization
The size of L1 cache.
diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C
new file mode 100644
index 00000000000..2af75c63f83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++20 } }
+// { dg-additional-options -fmodules-ts }
+
+module ;
+
+#include <new>
+
+export module foo;
+
+export {
+ struct A {
+ alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size }
+ };
+}
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C
new file mode 100644
index 00000000000..57c001bc032
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.C
@@ -0,0 +1,6 @@
+// Test that we warn about use of std::hardware_destructive_interference_size
+// in a header.
+// { dg-do compile { target c++17 } }
+
+// { dg-warning Winterference-size "" { target *-*-* } 0 }
+#include "Winterference.H"
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H
new file mode 100644
index 00000000000..36f0ad5f6d1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.H
@@ -0,0 +1,7 @@
+#include <new>
+
+struct A
+{
+ alignas(std::hardware_destructive_interference_size) int i;
+ alignas(std::hardware_destructive_interference_size) int j;
+};
diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C
new file mode 100644
index 00000000000..0fc01655223
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use
+// 128 or even 256.
+static_assert(std::hardware_destructive_interference_size == 256);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C
new file mode 100644
index 00000000000..34fe8a52bff
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Recent ARM CPUs have a cache line size of 64. Older ones have
+// a size of 32, but I guess they're old enough that we don't care?
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C
new file mode 100644
index 00000000000..c7b910e3ada
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/interference.C
@@ -0,0 +1,8 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// It is generally agreed that these are the right values for all x86.
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index f950bf0f0db..f41004b5911 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -140,6 +140,9 @@
#define __cpp_lib_filesystem 201703
#define __cpp_lib_gcd 201606
#define __cpp_lib_gcd_lcm 201606
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+#endif
#define __cpp_lib_hypot 201603
#define __cpp_lib_invoke 201411L
#define __cpp_lib_lcm 201606
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 3349b13fd1b..7bc67a6cb02 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
} // extern "C++"
#if __cplusplus >= 201703L
-#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
namespace std
{
+#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
#define __cpp_lib_launder 201606
/// Pointer optimization barrier [ptr.launder]
template<typename _Tp>
@@ -205,8 +205,14 @@ namespace std
void launder(const void*) = delete;
void launder(volatile void*) = delete;
void launder(const volatile void*) = delete;
-}
#endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
+
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+ inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
+ inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
+#endif // __GCC_DESTRUCTIVE_SIZE
+}
#endif // C++17
#if __cplusplus > 201703L
</cut>
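The new warning text in the commit suggests that code exposing a layout through a public ABI should "instead use a constant variable you define". A minimal sketch of that advice follows, assuming a C++17 compiler. The names `kCacheLineAlign`, `kTunedAlign`, `Counters` and `HotPair` are hypothetical, and the value 64 is an illustrative assumption, not a recommendation from the commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical project-local constant (not part of GCC or libstdc++).
// Layouts visible in headers use this fixed value, so the ABI does not
// change with -mtune/-mcpu or the compiler version. 64 is an assumption.
inline constexpr std::size_t kCacheLineAlign = 64;

// Inside a single build, the standard constant can still serve as a tuning
// hint when the library provides it; otherwise fall back to the fixed value.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kTunedAlign =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kTunedAlign = kCacheLineAlign;
#endif

// Two independently updated counters, kept on separate cache lines to
// avoid false sharing (the "destructive" case).
struct Counters {
    alignas(kCacheLineAlign) std::atomic<int> produced{0};
    alignas(kCacheLineAlign) std::atomic<int> consumed{0};
};

static_assert(alignof(Counters) == kCacheLineAlign);
static_assert(sizeof(Counters) == 2 * kCacheLineAlign);

// Data that is accessed together should fit in one cache line
// (the "constructive" case), which a static_assert can check.
struct HotPair {
    int key;
    int value;
};

static_assert(sizeof(HotPair) <= kCacheLineAlign);
```

Because `Counters` depends only on `kCacheLineAlign`, its layout stays the same across translation units built with different tuning flags. `kTunedAlign` is meant for uses that never leave a single build.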
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap:
2
This commit has regressed these CI configurations:
- tcwg_gcc_bootstrap/master-aarch64-bootstrap
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Even more details: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400
c++: implement C++17 hardware interference size
The last missing piece of the C++17 standard library is the hardware
intereference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.
In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.
To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.
A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable. The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.
JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74
I implement this by adding new --params for the two sizes. Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.
64 bytes still seems correct for all x86.
I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.
He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.
Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.
With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.
gcc/ChangeLog:
* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.
gcc/c-family/ChangeLog:
* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.
gcc/cp/ChangeLog:
* constexpr.c (maybe_warn_about_constant_value):
Complain about std::hardware_destructive_interference_size.
(cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.
libstdc++-v3/ChangeLog:
* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Winterference.H: New file.
* g++.dg/warn/Winterference.C: New test.
* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.
---
gcc/c-family/c-cppbuiltin.c | 14 ++++++
gcc/c-family/c.opt | 5 ++
gcc/config/aarch64/aarch64.c | 22 +++++++++
gcc/config/arm/arm.c | 22 +++++++++
gcc/config/i386/i386-options.c | 6 +++
gcc/cp/constexpr.c | 33 +++++++++++++
gcc/cp/decl.c | 32 ++++++++++++
gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++
gcc/params.opt | 16 ++++++
gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++
gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++
gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++
gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++
gcc/testsuite/g++.target/arm/interference.C | 9 ++++
gcc/testsuite/g++.target/i386/interference.C | 8 +++
libstdc++-v3/include/std/version | 3 ++
libstdc++-v3/libsupc++/new | 10 +++-
17 files changed, 279 insertions(+), 2 deletions(-)
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 48cbefd8bf8..ce88e707127 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile)
builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
targetm.atomic_test_and_set_trueval);
+ /* Macros for C++17 hardware interference size constants. Either both or
+ neither should be set. */
+ gcc_assert (!param_destruct_interfere_size
+ == !param_construct_interfere_size);
+ if (param_destruct_interfere_size)
+ {
+ /* FIXME The way of communicating these values to the library should be
+ part of the C++ ABI, whether macro or builtin. */
+ builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
+ param_destruct_interfere_size);
+ builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
+ param_construct_interfere_size);
+ }
+
/* ptr_type_node can't be used here since ptr_mode is only set when
toplev calls backend_init which is not done with -E or pch. */
psize = POINTER_SIZE_UNITS;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index c5fe90003f2..9c151d19870 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -722,6 +722,11 @@ Winit-list-lifetime
C++ ObjC++ Var(warn_init_list) Warning Init(1)
Warn about uses of std::initializer_list that can result in dangling pointers.
+Winterference-size
+C++ ObjC++ Var(warn_interference_size) Warning Init(1)
+Warn about nonsensical values of --param destructive-interference-size or
+constructive-interference-size.
+
Wimplicit
C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
Warn about implicit declarations.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 30d9a0b7a3d..36519ccc5a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l1_cache_line_size,
aarch64_tune_params.prefetch->l1_cache_line_size);
+
+ if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic AArch64 target, cover the current range of cache line
+ sizes. */
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ 256);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ 64);
+ }
+
if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l2_cache_size,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1e628253d0..6c6e77fab66 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3669,6 +3669,28 @@ arm_option_override (void)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_line_size,
current_tune->prefetch.l1_cache_line_size);
+ if (current_tune->prefetch.l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic ARM target, JF Bastien proposed using 64 for both. */
+ /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for
+ constructive? */
+ /* More recent Cortex chips have a 64-byte cache line, but are marked
+ ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+ }
+
if (current_tune->prefetch.l1_cache_size >= 0)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_size,
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 2cb87cedec0..c0006b3674b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p,
SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
ix86_tune_cost->l2_cache_size);
+ /* 64B is the accepted value for these for all x86. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+
/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */
if (opts->x_flag_prefetch_loop_arrays < 0
&& HAVE_prefetch
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7772fe62d95..0c2498aee22 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc)
"%<constexpr%> function in C++20");
}
+/* We're getting the constant value of DECL in a manifestly constant-evaluated
+ context; maybe complain about that. */
+
+static void
+maybe_warn_about_constant_value (location_t loc, tree decl)
+{
+ static bool explained = false;
+ if (cxx_dialect >= cxx17
+ && warn_interference_size
+ && !global_options_set.x_param_destruct_interfere_size
+ && DECL_CONTEXT (decl) == std_node
+ && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size")
+ && (LOCATION_FILE (input_location) != main_input_filename
+ || module_exporting_p ())
+ && warning_at (loc, OPT_Winterference_size, "use of %qD", decl)
+ && !explained)
+ {
+ explained = true;
+ inform (loc, "its value can vary between compiler versions or "
+ "with different %<-mtune%> or %<-mcpu%> flags");
+ inform (loc, "if this use is part of a public ABI, change it to "
+ "instead use a constant variable you define");
+ inform (loc, "the default value for the current CPU tuning "
+ "is %d bytes", param_destruct_interfere_size);
+ inform (loc, "you can stabilize this value with %<--param "
+ "hardware_destructive_interference_size=%d%>, or disable "
+ "this warning with %<-Wno-interference-size%>",
+ param_destruct_interfere_size);
+ }
+}
+
/* Attempt to reduce the expression T to a constant value.
On failure, issue diagnostic and return error_mark_node. */
/* FIXME unify with c_fully_fold */
@@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
r = *p;
break;
}
+ if (ctx->manifestly_const_eval)
+ maybe_warn_about_constant_value (loc, t);
if (COMPLETE_TYPE_P (TREE_TYPE (t))
&& is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false))
{
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index bce62ad202a..c2065027369 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void)
/* Show we use EH for cleanups. */
if (flag_exceptions)
using_eh_for_cleanups ();
+
+ /* Check that the hardware interference sizes are at least
+ alignof(max_align_t), as required by the standard. */
+ const int max_align = max_align_t_align () / BITS_PER_UNIT;
+ if (param_destruct_interfere_size)
+ {
+ if (param_destruct_interfere_size < max_align)
+ error ("%<--param destructive-interference-size=%d%> is less than "
+ "%d", param_destruct_interfere_size, max_align);
+ else if (param_destruct_interfere_size < param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param destructive-interference-size=%d%> "
+ "is less than %<--param l1-cache-line-size=%d%>",
+ param_destruct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_destruct_interfere_size = param_l1_cache_line_size;
+ /* else leave it unset. */
+
+ if (param_construct_interfere_size)
+ {
+ if (param_construct_interfere_size < max_align)
+ error ("%<--param constructive-interference-size=%d%> is less than "
+ "%d", param_construct_interfere_size, max_align);
+ else if (param_construct_interfere_size > param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param constructive-interference-size=%d%> "
+ "is greater than %<--param l1-cache-line-size=%d%>",
+ param_construct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_construct_interfere_size = param_l1_cache_line_size;
}
/* Enter an abi node in global-module context. returns a cookie to
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23cc68f92b5..78cfc100ac2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore,
seemingly insignificant changes in the source program can cause the
warnings produced by @option{-Winline} to appear or disappear.
+@item -Winterference-size
+@opindex Winterference-size
+Warn about use of C++17 @code{std::hardware_destructive_interference_size}
+without specifying its value with @option{--param destructive-interference-size}.
+Also warn about questionable values for that option.
+
+This variable is intended to be used for controlling class layout, to
+avoid false sharing in concurrent code:
+
+@smallexample
+struct independent_fields @{
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> one;
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> two;
+@};
+@end smallexample
+
+Here @samp{one} and @samp{two} are intended to be far enough apart
+that stores to one won't require accesses to the other to reload the
+cache line.
+
+By default, @option{--param destructive-interference-size} and
+@option{--param constructive-interference-size} are set based on the
+current @option{-mtune} option, typically to the L1 cache line size
+for the particular target CPU, sometimes to a range if tuning for a
+generic target. So all translation units that depend on ABI
+compatibility for the use of these variables must be compiled with
+the same @option{-mtune} (or @option{-mcpu}).
+
+If ABI stability is important, such as if the use is in a header for a
+library, you should probably not use the hardware interference size
+variables at all. Alternatively, you can force a particular value
+with @option{--param}.
+
+If you are confident that your use of the variable does not affect ABI
+outside a single build of your project, you can turn off the warning
+with @option{-Wno-interference-size}.
+
@item -Wint-in-bool-context
@opindex Wint-in-bool-context
@opindex Wno-int-in-bool-context
@@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride.
This setting is only useful for strides that are known and constant.
+@item destructive-interference-size
+@item constructive-interference-size
+The values for the C++17 variables
+@code{std::hardware_destructive_interference_size} and
+@code{std::hardware_constructive_interference_size}. The destructive
+interference size is the minimum recommended offset between two
+independent concurrently-accessed objects; the constructive
+interference size is the maximum recommended size of contiguous memory
+accessed together. Typically both will be the size of an L1 cache
+line for the target, in bytes. For a generic target covering a range of L1
+cache line sizes, typically the constructive interference size will be
+the small end of the range and the destructive size will be the large
+end.
+
+The destructive interference size is intended to be used for layout,
+and thus has ABI impact. The default value is not expected to be
+stable, and on some targets varies with @option{-mtune}, so use of
+this variable in a context where ABI stability is important, such as
+the public interface of a library, is strongly discouraged; if it is
+used in that context, users can stabilize the value using this
+option.
+
+The constructive interference size is less sensitive, as it is
+typically only used in a @samp{static_assert} to make sure that a type
+fits within a cache line.
+
+See also @option{-Winterference-size}.
+
@item loop-interchange-max-num-stmts
The maximum number of stmts in a loop to be interchanged.
diff --git a/gcc/params.opt b/gcc/params.opt
index 3a701e22c46..658ca028851 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent)
Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization
The size of L1 cache line.
+-param=destructive-interference-size=
+Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization
+The minimum recommended offset between two concurrently-accessed objects to
+avoid additional performance degradation due to contention introduced by the
+implementation. Typically the L1 cache line size, but can be larger to
+accommodate a variety of target processors with different cache line sizes.
+C++17 code might use this value in structure layout, but is strongly
+discouraged from doing so in public ABIs.
+
+-param=constructive-interference-size=
+Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization
+The maximum recommended size of contiguous memory occupied by two objects
+accessed with temporal locality by concurrent threads. Typically the L1 cache
+line size, but can be smaller to accommodate a variety of target processors with
+different cache line sizes.
+
-param=l1-cache-size=
Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization
The size of L1 cache.
diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C
new file mode 100644
index 00000000000..2af75c63f83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++20 } }
+// { dg-additional-options -fmodules-ts }
+
+module ;
+
+#include <new>
+
+export module foo;
+
+export {
+ struct A {
+ alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size }
+ };
+}
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C
new file mode 100644
index 00000000000..57c001bc032
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.C
@@ -0,0 +1,6 @@
+// Test that we warn about use of std::hardware_destructive_interference_size
+// in a header.
+// { dg-do compile { target c++17 } }
+
+// { dg-warning Winterference-size "" { target *-*-* } 0 }
+#include "Winterference.H"
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H
new file mode 100644
index 00000000000..36f0ad5f6d1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.H
@@ -0,0 +1,7 @@
+#include <new>
+
+struct A
+{
+ alignas(std::hardware_destructive_interference_size) int i;
+ alignas(std::hardware_destructive_interference_size) int j;
+};
diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C
new file mode 100644
index 00000000000..0fc01655223
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use
+// 128 or even 256.
+static_assert(std::hardware_destructive_interference_size == 256);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C
new file mode 100644
index 00000000000..34fe8a52bff
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Recent ARM CPUs have a cache line size of 64. Older ones have
+// a size of 32, but I guess they're old enough that we don't care?
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C
new file mode 100644
index 00000000000..c7b910e3ada
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/interference.C
@@ -0,0 +1,8 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// It is generally agreed that these are the right values for all x86.
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index f950bf0f0db..f41004b5911 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -140,6 +140,9 @@
#define __cpp_lib_filesystem 201703
#define __cpp_lib_gcd 201606
#define __cpp_lib_gcd_lcm 201606
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+#endif
#define __cpp_lib_hypot 201603
#define __cpp_lib_invoke 201411L
#define __cpp_lib_lcm 201606
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 3349b13fd1b..7bc67a6cb02 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
} // extern "C++"
#if __cplusplus >= 201703L
-#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
namespace std
{
+#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
#define __cpp_lib_launder 201606
/// Pointer optimization barrier [ptr.launder]
template<typename _Tp>
@@ -205,8 +205,14 @@ namespace std
void launder(const void*) = delete;
void launder(volatile void*) = delete;
void launder(const volatile void*) = delete;
-}
#endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
+
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+ inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
+ inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
+#endif // __GCC_DESTRUCTIVE_SIZE
+}
#endif // C++17
#if __cplusplus > 201703L
</cut>
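The new warning in the commit above suggests that code whose layout is part of a public ABI should "use a constant variable you define" rather than std::hardware_destructive_interference_size. The following is a minimal sketch of that advice, not code from the commit. The names kFalseSharingAlign, IndependentFields, field_gap, compiler_hint and check_padding are hypothetical, and the value 64 is an assumption that a real project would have to choose for its own targets.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>  // std::hardware_destructive_interference_size, if provided

// Hypothetical project-owned constant. It is pinned to 64 (an assumption) so
// that the layout below does not change with -mtune/-mcpu or with the
// compiler version.
inline constexpr std::size_t kFalseSharingAlign = 64;

// Same shape as the example in the invoke.texi hunk, but aligned with the
// project's constant instead of the standard library variable.
struct IndependentFields {
  alignas(kFalseSharingAlign) std::atomic<int> one{0};
  alignas(kFalseSharingAlign) std::atomic<int> two{0};
};

// The layout is now a property of the project, not of the compiler's tuning.
static_assert(alignof(IndependentFields) == kFalseSharingAlign);
static_assert(sizeof(IndependentFields) == 2 * kFalseSharingAlign);

// Byte distance between the two fields of one object.
inline std::size_t field_gap(const IndependentFields& f) {
  const char* a = reinterpret_cast<const char*>(&f.one);
  const char* b = reinterpret_cast<const char*>(&f.two);
  return static_cast<std::size_t>(b - a);
}

// The compiler's own hint, or 0 when the library does not provide it.
// Intended for implementation files only, where ABI stability is not at stake.
inline std::size_t compiler_hint() {
#ifdef __cpp_lib_hardware_interference_size
  return std::hardware_destructive_interference_size;
#else
  return 0;
#endif
}

// Runtime sanity check that a unit test could call.
inline void check_padding() {
  IndependentFields f;
  assert(field_gap(f) == kFalseSharingAlign);
}
```

A project could compare compiler_hint() with kFalseSharingAlign in its own tests to notice when the pinned value falls below what the toolchain recommends for the current tuning, without letting that recommendation leak into a header.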
Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
c++: implement C++17 hardware interference size
Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6
This commit has regressed these CI configurations:
- tcwg_gnu_native_build/master-arm
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Even more details: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400
c++: implement C++17 hardware interference size
The last missing piece of the C++17 standard library is the hardware
interference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.
In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.
From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.
To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.
A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable. The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.
JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74
I implement this by adding new --params for the two sizes. Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.
64 bytes still seems correct for all x86.
I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.
He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.
Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.
With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.
gcc/ChangeLog:
* params.opt: Add destructive-interference-size and
constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal):
Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal):
Set them.
gcc/c-family/ChangeLog:
* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
and __GCC_CONSTRUCTIVE_SIZE.
gcc/cp/ChangeLog:
* constexpr.c (maybe_warn_about_constant_value):
Complain about std::hardware_destructive_interference_size.
(cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
--param *-interference-size values.
libstdc++-v3/ChangeLog:
* include/std/version: Define __cpp_lib_hardware_interference_size.
* libsupc++/new: Define hardware interference size variables.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Winterference.H: New file.
* g++.dg/warn/Winterference.C: New test.
* g++.target/aarch64/interference.C: New test.
* g++.target/arm/interference.C: New test.
* g++.target/i386/interference.C: New test.
---
gcc/c-family/c-cppbuiltin.c | 14 ++++++
gcc/c-family/c.opt | 5 ++
gcc/config/aarch64/aarch64.c | 22 +++++++++
gcc/config/arm/arm.c | 22 +++++++++
gcc/config/i386/i386-options.c | 6 +++
gcc/cp/constexpr.c | 33 +++++++++++++
gcc/cp/decl.c | 32 ++++++++++++
gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++
gcc/params.opt | 16 ++++++
gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++
gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++
gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++
gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++
gcc/testsuite/g++.target/arm/interference.C | 9 ++++
gcc/testsuite/g++.target/i386/interference.C | 8 +++
libstdc++-v3/include/std/version | 3 ++
libstdc++-v3/libsupc++/new | 10 +++-
17 files changed, 279 insertions(+), 2 deletions(-)
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 48cbefd8bf8..ce88e707127 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile)
builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL",
targetm.atomic_test_and_set_trueval);
+ /* Macros for C++17 hardware interference size constants. Either both or
+ neither should be set. */
+ gcc_assert (!param_destruct_interfere_size
+ == !param_construct_interfere_size);
+ if (param_destruct_interfere_size)
+ {
+ /* FIXME The way of communicating these values to the library should be
+ part of the C++ ABI, whether macro or builtin. */
+ builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE",
+ param_destruct_interfere_size);
+ builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE",
+ param_construct_interfere_size);
+ }
+
/* ptr_type_node can't be used here since ptr_mode is only set when
toplev calls backend_init which is not done with -E or pch. */
psize = POINTER_SIZE_UNITS;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index c5fe90003f2..9c151d19870 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -722,6 +722,11 @@ Winit-list-lifetime
C++ ObjC++ Var(warn_init_list) Warning Init(1)
Warn about uses of std::initializer_list that can result in dangling pointers.
+Winterference-size
+C++ ObjC++ Var(warn_interference_size) Warning Init(1)
+Warn about nonsensical values of --param destructive-interference-size or
+constructive-interference-size.
+
Wimplicit
C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall)
Warn about implicit declarations.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 30d9a0b7a3d..36519ccc5a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l1_cache_line_size,
aarch64_tune_params.prefetch->l1_cache_line_size);
+
+ if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ aarch64_tune_params.prefetch->l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic AArch64 target, cover the current range of cache line
+ sizes. */
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_destruct_interfere_size,
+ 256);
+ SET_OPTION_IF_UNSET (opts, &global_options_set,
+ param_construct_interfere_size,
+ 64);
+ }
+
if (aarch64_tune_params.prefetch->l2_cache_size >= 0)
SET_OPTION_IF_UNSET (opts, &global_options_set,
param_l2_cache_size,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f1e628253d0..6c6e77fab66 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3669,6 +3669,28 @@ arm_option_override (void)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_line_size,
current_tune->prefetch.l1_cache_line_size);
+ if (current_tune->prefetch.l1_cache_line_size >= 0)
+ {
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size,
+ current_tune->prefetch.l1_cache_line_size);
+ }
+ else
+ {
+ /* For a generic ARM target, JF Bastien proposed using 64 for both. */
+ /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for
+ constructive? */
+ /* More recent Cortex chips have a 64-byte cache line, but are marked
+ ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+ }
+
if (current_tune->prefetch.l1_cache_size >= 0)
SET_OPTION_IF_UNSET (&global_options, &global_options_set,
param_l1_cache_size,
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 2cb87cedec0..c0006b3674b 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p,
SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size,
ix86_tune_cost->l2_cache_size);
+ /* 64B is the accepted value for these for all x86. */
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_destruct_interfere_size, 64);
+ SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+ param_construct_interfere_size, 64);
+
/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */
if (opts->x_flag_prefetch_loop_arrays < 0
&& HAVE_prefetch
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7772fe62d95..0c2498aee22 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc)
"%<constexpr%> function in C++20");
}
+/* We're getting the constant value of DECL in a manifestly constant-evaluated
+ context; maybe complain about that. */
+
+static void
+maybe_warn_about_constant_value (location_t loc, tree decl)
+{
+ static bool explained = false;
+ if (cxx_dialect >= cxx17
+ && warn_interference_size
+ && !global_options_set.x_param_destruct_interfere_size
+ && DECL_CONTEXT (decl) == std_node
+ && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size")
+ && (LOCATION_FILE (input_location) != main_input_filename
+ || module_exporting_p ())
+ && warning_at (loc, OPT_Winterference_size, "use of %qD", decl)
+ && !explained)
+ {
+ explained = true;
+ inform (loc, "its value can vary between compiler versions or "
+ "with different %<-mtune%> or %<-mcpu%> flags");
+ inform (loc, "if this use is part of a public ABI, change it to "
+ "instead use a constant variable you define");
+ inform (loc, "the default value for the current CPU tuning "
+ "is %d bytes", param_destruct_interfere_size);
+ inform (loc, "you can stabilize this value with %<--param "
+ "hardware_destructive_interference_size=%d%>, or disable "
+ "this warning with %<-Wno-interference-size%>",
+ param_destruct_interfere_size);
+ }
+}
+
/* Attempt to reduce the expression T to a constant value.
On failure, issue diagnostic and return error_mark_node. */
/* FIXME unify with c_fully_fold */
@@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
r = *p;
break;
}
+ if (ctx->manifestly_const_eval)
+ maybe_warn_about_constant_value (loc, t);
if (COMPLETE_TYPE_P (TREE_TYPE (t))
&& is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false))
{
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index bce62ad202a..c2065027369 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void)
/* Show we use EH for cleanups. */
if (flag_exceptions)
using_eh_for_cleanups ();
+
+ /* Check that the hardware interference sizes are at least
+ alignof(max_align_t), as required by the standard. */
+ const int max_align = max_align_t_align () / BITS_PER_UNIT;
+ if (param_destruct_interfere_size)
+ {
+ if (param_destruct_interfere_size < max_align)
+ error ("%<--param destructive-interference-size=%d%> is less than "
+ "%d", param_destruct_interfere_size, max_align);
+ else if (param_destruct_interfere_size < param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param destructive-interference-size=%d%> "
+ "is less than %<--param l1-cache-line-size=%d%>",
+ param_destruct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_destruct_interfere_size = param_l1_cache_line_size;
+ /* else leave it unset. */
+
+ if (param_construct_interfere_size)
+ {
+ if (param_construct_interfere_size < max_align)
+ error ("%<--param constructive-interference-size=%d%> is less than "
+ "%d", param_construct_interfere_size, max_align);
+ else if (param_construct_interfere_size > param_l1_cache_line_size)
+ warning (OPT_Winterference_size,
+ "%<--param constructive-interference-size=%d%> "
+ "is greater than %<--param l1-cache-line-size=%d%>",
+ param_construct_interfere_size, param_l1_cache_line_size);
+ }
+ else if (param_l1_cache_line_size >= max_align)
+ param_construct_interfere_size = param_l1_cache_line_size;
}
/* Enter an abi node in global-module context. returns a cookie to
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23cc68f92b5..78cfc100ac2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore,
seemingly insignificant changes in the source program can cause the
warnings produced by @option{-Winline} to appear or disappear.
+@item -Winterference-size
+@opindex Winterference-size
+Warn about use of C++17 @code{std::hardware_destructive_interference_size}
+without specifying its value with @option{--param destructive-interference-size}.
+Also warn about questionable values for that option.
+
+This variable is intended to be used for controlling class layout, to
+avoid false sharing in concurrent code:
+
+@smallexample
+struct independent_fields @{
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> one;
+ alignas(std::hardware_destructive_interference_size) std::atomic<int> two;
+@};
+@end smallexample
+
+Here @samp{one} and @samp{two} are intended to be far enough apart
+that stores to one won't require accesses to the other to reload the
+cache line.
+
+By default, @option{--param destructive-interference-size} and
+@option{--param constructive-interference-size} are set based on the
+current @option{-mtune} option, typically to the L1 cache line size
+for the particular target CPU, sometimes to a range if tuning for a
+generic target. So all translation units that depend on ABI
+compatibility for the use of these variables must be compiled with
+the same @option{-mtune} (or @option{-mcpu}).
+
+If ABI stability is important, such as if the use is in a header for a
+library, you should probably not use the hardware interference size
+variables at all. Alternatively, you can force a particular value
+with @option{--param}.
+
+If you are confident that your use of the variable does not affect ABI
+outside a single build of your project, you can turn off the warning
+with @option{-Wno-interference-size}.
+
@item -Wint-in-bool-context
@opindex Wint-in-bool-context
@opindex Wno-int-in-bool-context
@@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride.
This setting is only useful for strides that are known and constant.
+@item destructive-interference-size
+@item constructive-interference-size
+The values for the C++17 variables
+@code{std::hardware_destructive_interference_size} and
+@code{std::hardware_constructive_interference_size}. The destructive
+interference size is the minimum recommended offset between two
+independent concurrently-accessed objects; the constructive
+interference size is the maximum recommended size of contiguous memory
+accessed together. Typically both will be the size of an L1 cache
+line for the target, in bytes. For a generic target covering a range of L1
+cache line sizes, typically the constructive interference size will be
+the small end of the range and the destructive size will be the large
+end.
+
+The destructive interference size is intended to be used for layout,
+and thus has ABI impact. The default value is not expected to be
+stable, and on some targets varies with @option{-mtune}, so use of
+this variable in a context where ABI stability is important, such as
+the public interface of a library, is strongly discouraged; if it is
+used in that context, users can stabilize the value using this
+option.
+
+The constructive interference size is less sensitive, as it is
+typically only used in a @samp{static_assert} to make sure that a type
+fits within a cache line.
+
+See also @option{-Winterference-size}.
+
@item loop-interchange-max-num-stmts
The maximum number of stmts in a loop to be interchanged.
diff --git a/gcc/params.opt b/gcc/params.opt
index 3a701e22c46..658ca028851 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent)
Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization
The size of L1 cache line.
+-param=destructive-interference-size=
+Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization
+The minimum recommended offset between two concurrently-accessed objects to
+avoid additional performance degradation due to contention introduced by the
+implementation. Typically the L1 cache line size, but can be larger to
+accommodate a variety of target processors with different cache line sizes.
+C++17 code might use this value in structure layout, but is strongly
+discouraged from doing so in public ABIs.
+
+-param=constructive-interference-size=
+Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization
+The maximum recommended size of contiguous memory occupied by two objects
+accessed with temporal locality by concurrent threads. Typically the L1 cache
+line size, but can be smaller to accommodate a variety of target processors with
+different cache line sizes.
+
-param=l1-cache-size=
Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization
The size of L1 cache.
diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C
new file mode 100644
index 00000000000..2af75c63f83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++20 } }
+// { dg-additional-options -fmodules-ts }
+
+module ;
+
+#include <new>
+
+export module foo;
+
+export {
+ struct A {
+ alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size }
+ };
+}
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C
new file mode 100644
index 00000000000..57c001bc032
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.C
@@ -0,0 +1,6 @@
+// Test that we warn about use of std::hardware_destructive_interference_size
+// in a header.
+// { dg-do compile { target c++17 } }
+
+// { dg-warning Winterference-size "" { target *-*-* } 0 }
+#include "Winterference.H"
diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H
new file mode 100644
index 00000000000..36f0ad5f6d1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Winterference.H
@@ -0,0 +1,7 @@
+#include <new>
+
+struct A
+{
+ alignas(std::hardware_destructive_interference_size) int i;
+ alignas(std::hardware_destructive_interference_size) int j;
+};
diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C
new file mode 100644
index 00000000000..0fc01655223
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use
+// 128 or even 256.
+static_assert(std::hardware_destructive_interference_size == 256);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C
new file mode 100644
index 00000000000..34fe8a52bff
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/interference.C
@@ -0,0 +1,9 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// Recent ARM CPUs have a cache line size of 64. Older ones have
+// a size of 32, but I guess they're old enough that we don't care?
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C
new file mode 100644
index 00000000000..c7b910e3ada
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/interference.C
@@ -0,0 +1,8 @@
+// Test C++17 hardware interference size constants
+// { dg-do compile { target c++17 } }
+
+#include <new>
+
+// It is generally agreed that these are the right values for all x86.
+static_assert(std::hardware_destructive_interference_size == 64);
+static_assert(std::hardware_constructive_interference_size == 64);
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index f950bf0f0db..f41004b5911 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -140,6 +140,9 @@
#define __cpp_lib_filesystem 201703
#define __cpp_lib_gcd 201606
#define __cpp_lib_gcd_lcm 201606
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+#endif
#define __cpp_lib_hypot 201603
#define __cpp_lib_invoke 201411L
#define __cpp_lib_lcm 201606
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 3349b13fd1b..7bc67a6cb02 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
} // extern "C++"
#if __cplusplus >= 201703L
-#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
namespace std
{
+#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
#define __cpp_lib_launder 201606
/// Pointer optimization barrier [ptr.launder]
template<typename _Tp>
@@ -205,8 +205,14 @@ namespace std
void launder(const void*) = delete;
void launder(volatile void*) = delete;
void launder(const volatile void*) = delete;
-}
#endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
+
+#ifdef __GCC_DESTRUCTIVE_SIZE
+# define __cpp_lib_hardware_interference_size 201703L
+ inline constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE;
+ inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE;
+#endif // __GCC_DESTRUCTIVE_SIZE
+}
#endif // C++17
#if __cplusplus > 201703L
</cut>
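For readers triaging this one: the invoke.texi text in the commit above advises against using the interference-size variables where ABI stability matters, and notes that the constructive size is typically only used in a static_assert. Below is a minimal sketch of that advice, not code from the commit. The names kCacheLinePad, PublicCounters and HotPair are hypothetical, and the value 64 is an assumption chosen for illustration.

```cpp
#include <cstddef>
#include <new>

// Hypothetical project-defined constant (not provided by GCC or libstdc++).
// Pinning the value keeps the layout of public types independent of
// -mtune/-mcpu and of the compiler version. 64 is an assumed value.
inline constexpr std::size_t kCacheLinePad = 64;

// Type exposed in a public header: the padding comes from the pinned
// constant, so every translation unit agrees on the layout.
struct PublicCounters {
  alignas(kCacheLinePad) int produced;
  alignas(kCacheLinePad) int consumed;
};

static_assert(alignof(PublicCounters) == kCacheLinePad);
static_assert(sizeof(PublicCounters) == 2 * kCacheLinePad);

#ifdef __cpp_lib_hardware_interference_size
// The constructive size is only consulted in a static_assert here, so it
// does not influence layout: it checks that two fields used together are
// small enough to share one cache line on the current tuning target.
struct HotPair {
  int head;
  int tail;
};
static_assert(sizeof(HotPair) <= std::hardware_constructive_interference_size);
#endif
```

The feature-test macro guard matters because, per the libstdc++ hunks above, the variables are only defined when the compiler provides __GCC_DESTRUCTIVE_SIZE.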
Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig. So far, this commit has regressed CI configurations:
- tcwg_kernel/llvm-master-aarch64-mainline-allmodconfig
Culprit:
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date: Tue Aug 31 10:40:25 2021 +0800
net: Add depends on OF_NET for LiteX's LiteETH
Current settings may produce a build error when
CONFIG_OF_NET is disabled. CONFIG_OF_NET controls
the header file <linux/of.h> and some functions
in <linux/of_net.h>.
Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
</cut>
Results regressed to (for first_bad == c3496da580b0fc10fdeba8f6a5e6aef4c78b5598)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# First few build errors in logs:
from (for last_good == a9e7c3cedc2914f63cd135b75832b9bf850af782)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29873
# linux build successful:
all
# linux boot successful:
boot
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build top page/logs: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
cd investigate-linux-c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/
cd linux
# Reproduce first_bad build
git checkout --detach c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach a9e7c3cedc2914f63cd135b75832b9bf850af782
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Build log: https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-aarch64-mainl…
Full commit (up to 1000 lines):
<cut>
commit c3496da580b0fc10fdeba8f6a5e6aef4c78b5598
Author: Slark Xiao <slark_xiao(a)163.com>
Date: Tue Aug 31 10:40:25 2021 +0800
net: Add depends on OF_NET for LiteX's LiteETH
Current settings may produce a build error when
CONFIG_OF_NET is disabled. CONFIG_OF_NET controls
the header file <linux/of.h> and some functions
in <linux/of_net.h>.
Signed-off-by: Slark Xiao <slark_xiao(a)163.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
drivers/net/ethernet/litex/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/litex/Kconfig b/drivers/net/ethernet/litex/Kconfig
index 265dba414b41..63bf01d28f0c 100644
--- a/drivers/net/ethernet/litex/Kconfig
+++ b/drivers/net/ethernet/litex/Kconfig
@@ -17,6 +17,7 @@ if NET_VENDOR_LITEX
config LITEX_LITEETH
tristate "LiteX Ethernet support"
+ depends on OF_NET
help
If you wish to compile a kernel for hardware with a LiteX LiteEth
device then you should answer Y to this.
</cut>
Identified regression caused by *gcc:01b5038718056b024b370b74a874fbd92c5bbab3*:
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Disable threading through latches until after loop optimizations.
Results regressed to (for first_bad == 01b5038718056b024b370b74a874fbd92c5bbab3)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-01b5038718056b024b370b74a874fbd92c5bbab3/results_id:
1
# 459.GemsFDTD,GemsFDTD_base.default regressed by 102
# 464.h264ref,h264ref_base.default regressed by 102
from (for last_good == fb88bf9931f17d137eb50c001e1c924aa1e34e83)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -Os artifacts/build-fb88bf9931f17d137eb50c001e1c924aa1e34e83/results_id:
1
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Reproduce builds:
<cut>
mkdir investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
cd investigate-gcc-01b5038718056b024b370b74a874fbd92c5bbab3
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach 01b5038718056b024b370b74a874fbd92c5bbab3
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach fb88bf9931f17d137eb50c001e1c924aa1e34e83
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 01b5038718056b024b370b74a874fbd92c5bbab3
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date: Thu Sep 9 20:30:28 2021 +0200
Disable threading through latches until after loop optimizations.
The motivation for this patch was enabling the use of global ranges in
the path solver, but this destroyed certain properties of loops,
which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.
As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]). I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite. They're mostly noise
now.
Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point. I have added a reminder to the function to keep this
in mind.
Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader. At some point I'd like to combine the cost models.
Tested on x86-64 Linux.
p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:
https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html
gcc/ChangeLog:
* tree-pass.h (PROP_loop_opts_done): New.
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Intersect with global range.
* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Disable
threading through latches until after loop optimizations have run.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
threading through latches.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
Co-authored-by: Michael Matz <matz(a)suse.de>
---
gcc/gimple-range-path.cc | 3 ++
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c | 4 +--
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 37 ++---------------------
gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 17 +----------
gcc/tree-pass.h | 2 ++
gcc/tree-ssa-loop.c | 2 +-
gcc/tree-ssa-threadbackward.c | 28 +++++++++++++++--
7 files changed, 37 insertions(+), 56 deletions(-)
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a4fa3b296ff..c616b65756f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt)
basic_block bb = stmt ? gimple_bb (stmt) : exit_bb ();
if (stmt && range_defined_in_block (r, name, bb))
{
+ if (TREE_CODE (name) == SSA_NAME)
+ r.intersect (gimple_range_global (name));
+
set_cache (r, name);
return true;
}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
index e1c33e86cd7..823ada982ff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
void foo();
void bla();
@@ -26,4 +26,4 @@ void thread_latch_through_header (void)
case. And we want to thread through the header as well. These
are both caught by threading in DOM. */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index c7bf867b084..ee46759bacc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,41 +1,8 @@
/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
-/* All the threads in the thread1 dump start on a X->BB12 edge, as can
- be seen in the dump:
-
- Registering FSM jump thread: (x, 12) incoming edge; ...
- etc
- etc
-
- Before the new evrp, we were threading paths that started at the
- following edges:
-
- Registering FSM jump thread: (10, 12) incoming edge
- Registering FSM jump thread: (6, 12) incoming edge
- Registering FSM jump thread: (9, 12) incoming edge
-
- This was because the PHI at BB12 had constant values coming in from
- BB10, BB6, and BB9:
-
- # state_10 = PHI <state_11(7), 0(10), state_11(5), 1(6), state_11(8), 2(9), state_11(11)>
-
- Now with the new evrp, we get:
-
- # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)>
-
- Thus, we have 3 more paths that are known to be constant and can be
- threaded. Which means that by the second threading pass, we can
- only find one profitable path.
-
- For the record, all these extra constants are better paths coming
- out of switches. For example:
-
- SWITCH_BB -> BBx -> BBy -> BBz -> PHI
-
- We now know the value of the switch index at PHI. */
/* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */
+/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */
int sum0, sum1, sum2, sum3;
int foo (char *s, char **ret)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 5fc2145a432..ba07942f9dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,23 +1,8 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
-/* Here we have the same issue as was commented in ssa-dom-thread-6.c.
- The PHI coming into the threader has a lot more constants, so the
- threader can thread more paths.
-
-$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2
-252c252
-< # s_50 = PHI <s_49(10), 5(14), s_51(18), s_51(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), s_51(30)>
----
-> # s_50 = PHI <s_49(10), 5(14), 4(18), 5(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), 7(30)>
-272a273
-
- I spot checked a few and they all have the same pattern. We are
- basically tracking the switch index better through multiple
- paths. */
-
/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 83941bc0cee..eb75eb17951 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -225,6 +225,8 @@ protected:
been optimized. */
#define PROP_gimple_lomp_dev (1 << 16) /* done omp_device_lower */
#define PROP_rtl_split_insns (1 << 17) /* RTL has insns split. */
+#define PROP_loop_opts_done (1 << 18) /* SSA loop optimizations
+ have completed. */
#define PROP_gimple \
(PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 0cc4b3bbccf..1bbf2f1fb2c 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -540,7 +540,7 @@ const pass_data pass_data_tree_loop_done =
OPTGROUP_LOOP, /* optinfo_flags */
TV_NONE, /* tv_id */
PROP_cfg, /* properties_required */
- 0, /* properties_provided */
+ PROP_loop_opts_done, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
TODO_cleanup_cfg, /* todo_flags_finish */
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..e72992328de 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see
#include "ssa.h"
#include "tree-cfgcleanup.h"
#include "tree-pretty-print.h"
+#include "cfghooks.h"
// Path registry for the backwards threader. After all paths have been
// registered with register_path(), thread_through_all_blocks() is called
@@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers)
TAKEN_EDGE, otherwise it is NULL.
CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path
- would create an irreducible loop. */
+ would create an irreducible loop.
+
+ ?? It seems we should be able to loosen some of the restrictions in
+ this function after loop optimizations have run. */
bool
back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
@@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
the last entry in the array when determining if we thread
through the loop latch. */
if (loop->latch == bb)
- threaded_through_latch = true;
+ {
+ threaded_through_latch = true;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, " (latch)");
+ }
}
gimple *stmt = get_gimple_control_stmt (m_path[0]);
@@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
"a multiway branch.\n");
return false;
}
+
+ /* Threading through an empty latch would cause code to be added to
+ the latch. This could alter the loop form sufficiently to cause
+ loop optimizations to fail. Disable these threads until after
+ loop optimizations have run. */
+ if ((threaded_through_latch
+ || (taken_edge && taken_edge->dest == loop->latch))
+ && !(cfun->curr_properties & PROP_loop_opts_done)
+ && empty_block_p (loop->latch))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ " FAIL: FSM Thread through latch before loop opts would create non-empty latch\n");
+ return false;
+
+ }
return true;
}
</cut>
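To make the new rejection rule in profitable_path_p easier to reason about when looking at the benchmark deltas, here is a toy model of just that condition. It is a sketch only: PathFacts and rejected_by_latch_check are hypothetical names, and the struct flattens values that GCC actually reads from the loop, the taken edge and cfun. Only the bit position (1 << 18) and the shape of the boolean test are taken from the diff above.

```cpp
#include <cstdint>

// Same bit position as the PROP_loop_opts_done macro added to tree-pass.h.
constexpr std::uint32_t kPropLoopOptsDone = 1u << 18;

// Hypothetical flattened view of the inputs consulted by the new check.
struct PathFacts {
  bool threaded_through_latch;    // a block on the path is loop->latch
  bool taken_edge_enters_latch;   // taken_edge && taken_edge->dest == loop->latch
  bool latch_is_empty;            // empty_block_p (loop->latch)
  std::uint32_t curr_properties;  // cfun->curr_properties
};

// Returns true when the modelled check would refuse to thread the path:
// the path touches an empty latch and loop optimizations have not yet run.
constexpr bool rejected_by_latch_check(const PathFacts& f) {
  const bool touches_latch =
      f.threaded_through_latch || f.taken_edge_enters_latch;
  const bool loop_opts_done = (f.curr_properties & kPropLoopOptsDone) != 0;
  return touches_latch && !loop_opts_done && f.latch_is_empty;
}

// Before loop optimizations: a path through an empty latch is rejected.
static_assert(rejected_by_latch_check(PathFacts{true, false, true, 0}));
// After loop optimizations (property bit set): the same path is allowed.
static_assert(
    !rejected_by_latch_check(PathFacts{true, false, true, kPropLoopOptsDone}));
```

This matches the test adjustments in the commit, where threads previously counted in the thread1/thread2 dumps are now expected in thread3, which runs after the property is provided by pass_data_tree_loop_done.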