linaro-toolchain September 2021

linaro-toolchain@lists.linaro.org

18 participants
86 discussions

by Peter Maydell

Progress (short week, 3 days)

* UM-2 [QEMU upstream maintainership]
  + more code review, notably the Apple Silicon hvf support, which is
    nearly ready to go in
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
  + Sent out v2 of the "optimized code gen for MVE" patchset; this now
    covers all the insns that have an easy optimized version.
  + Fixed a bug where we weren't correctly setting up FPSCR.LTPSIZE
    when using QEMU's user-mode-only emulator
  + Wrote some code to add support for the (not yet finalized) gdbstub
    XML that tells GDB that the guest CPU has MVE. This causes a GDB
    with the MVE handling to crash, so one or the other of us has got
    something wrong :-)

KVM Forum was this week, as a 2-day virtual conference. I felt the
programme was comparatively a bit small this year, but there were
some interesting talks. Also a BoF session on whether/how we should
consider adding Rust code to QEMU: I am pushing for (a) a clearer
medium-to-long-term vision of where we would be going and why we'd
be doing this and (b) more design-sketch type work of "what would
XYZ in rust look like", which would hopefully both (a) make the
benefit/lack thereof a bit more clear and (b) demonstrate that there
are enough people enthusiastic enough about the prospect to make it
a success...

-- PMM

4 years, 6 months

[TCWG CI] 447.dealII,[.] grew in size by 164% after llvm: [libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.

by ci_notify＠linaro.org

After llvm commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>

    [libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.

the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII,[.] contract<3> grew in size by 164%

Benchmark:
Toolchain: Clang + Glibc + LLVM Linker
Version: all components were built from their latest release branch
Target: aarch64-linux-gnu
Compiler flags: -Oz
Hardware: APM Mustang 8x X-Gene1

This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Oz

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
cd investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach c8905f1bb304f1cfe297312ae0dda9946cb27594
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date:   Fri Sep 3 14:53:57 2021 -0400

    [libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.

    It appears when testing LLVM 13 on Power, we run into failures with the
    `libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp` test case optimizing
    values out. Despite some the functions in the test already being marked with
    optnone, adding the `MarkAsLive()` calls inside of the pretty printer
    comparison functions resolves the issues of the values being optimized out.

    This patch aims to address https://llvm.org/PR51675.

    Differential Revision: https://reviews.llvm.org/D109204

    (cherry picked from commit 217c6d643124be312f4a99b203118744edb9d54c)
---
 libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
index 2d8e9620089a..7c8d307d19fb 100644
--- a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
+++ b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
@@ -92,24 +92,28 @@ void MarkAsLive(Type &&) {}
 template <typename TypeToPrint> void ComparePrettyPrintToChars(
     TypeToPrint value,
     const char *expectation) {
+  MarkAsLive(value);
   StopForDebugger(&value, &expectation);
 }
 
 template <typename TypeToPrint> void ComparePrettyPrintToRegex(
     TypeToPrint value,
     const char *expectation) {
+  MarkAsLive(value);
   StopForDebugger(&value, &expectation);
 }
 
 void CompareExpressionPrettyPrintToChars(
     std::string value,
     const char *expectation) {
+  MarkAsLive(value);
   StopForDebugger(&value, &expectation);
 }
 
 void CompareExpressionPrettyPrintToRegex(
     std::string value,
     const char *expectation) {
+  MarkAsLive(value);
   StopForDebugger(&value, &expectation);
 }
</cut>
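
For readers unfamiliar with these reports, the trigger rule quoted above (a hot function grows by more than 10% while its benchmark grows by less than 1%) can be sketched as below. This is only an illustration in Python, not the TCWG CI implementation: the function names and all byte counts are made up, and the sizes were chosen so that the arithmetic reproduces the 164% figure.

```python
def percent_growth(before: int, after: int) -> float:
    """Return the size growth in percent, e.g. 100 -> 264 bytes gives 164.0."""
    if before <= 0:
        raise ValueError("baseline size must be positive")
    return (after - before) * 100.0 / before


def flag_hot_function(sym_before: int, sym_after: int,
                      bench_before: int, bench_after: int,
                      sym_threshold: float = 10.0,
                      bench_threshold: float = 1.0) -> bool:
    """Apply the rule quoted in the report: the symbol grew by more than
    10% while the whole benchmark grew by less than 1%."""
    sym = percent_growth(sym_before, sym_after)
    bench = percent_growth(bench_before, bench_after)
    return sym > sym_threshold and bench < bench_threshold


# Hypothetical sizes in bytes, chosen only for illustration.
growth = percent_growth(100, 264)
flagged = flag_hot_function(100, 264, 1_000_000, 1_000_164)
print(f"symbol grew by {growth:.0f}%, flagged={flagged}")
```

The second condition explains why a 164% jump in one function can coexist with an almost unchanged benchmark size: a small function can more than double while adding only a few bytes to the binary.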

4 years, 6 months

[TCWG CI] 456.hmmer slowed down by 4% after gcc: c++ ICE with nested requirement as default tpl parm[PR94827]

by ci_notify＠linaro.org

After gcc commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>

    c++ ICE with nested requirement as default tpl parm[PR94827]

the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 4%

Benchmark:
Toolchain: GCC + Glibc + GNU Linker
Version: all components were built from their latest release branch
Target: aarch64-linux-gnu
Compiler flags: -O3 -flto
Hardware: NVidia TX1 4x Cortex-A57

This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Reproduce builds:
<cut>
mkdir investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b
cd investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach c416c52bcdb120db5e8c53a51bd78c4360daf79b
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach b1983f4582bbe060b7da83578acb9ed653681fc8
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>
Date:   Thu Apr 30 08:23:16 2020 -0700

    c++ ICE with nested requirement as default tpl parm[PR94827]

    Template headers are not incrementally updated as we parse its
    parameters. We maintain a dummy level until the closing > when we
    replace the dummy with a real parameter set. requires processing was
    expecting a properly populated arg_vec in current_template_parms, and
    then creates a self-mapping of parameters from that. But we don't need
    to do that, just teach map_arguments to look at TREE_VALUE when args is
    NULL.

    * constraint.cc (map_arguments): If ARGS is null, it's a
    self-mapping of parms.
    (finish_nested_requirement): Do not pass argified
    current_template_parms to normalization.
    (tsubst_nested_requirement): Don't assert no template parms.
---
 gcc/cp/ChangeLog                        | 10 ++++++++++
 gcc/cp/constraint.cc                    | 27 ++++++++++++++++-----------
 gcc/testsuite/g++.dg/concepts/pr94827.C | 15 +++++++++++++++
 3 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 1fa0e123cb1..3c57945cecf 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,13 @@
+2020-04-30 Jason Merrill <jason(a)redhat.com>
+           Nathan Sidwell <nathan(a)acm.org>
+
+    PR c++/94827
+    * constraint.cc (map_arguments): If ARGS is null, it's a
+    self-mapping of parms.
+    (finish_nested_requirement): Do not pass argified
+    current_template_parms to normalization.
+    (tsubst_nested_requirement): Don't assert no template parms.
+
 2020-04-30 Iain Sandoe <iain(a)sandoe.co.uk>
 
     PR c++/94886
diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 866b0f51b05..85513fecf43 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -546,12 +546,16 @@ static tree
 map_arguments (tree parms, tree args)
 {
   for (tree p = parms; p; p = TREE_CHAIN (p))
-    {
-      int level;
-      int index;
-      template_parm_level_and_index (TREE_VALUE (p), &level, &index);
-      TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
-    }
+    if (args)
+      {
+        int level;
+        int index;
+        template_parm_level_and_index (TREE_VALUE (p), &level, &index);
+        TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
+      }
+    else
+      TREE_PURPOSE (p) = TREE_VALUE (p);
+
   return parms;
 }
 
@@ -2005,8 +2009,6 @@ tsubst_compound_requirement (tree t, tree args, subst_info info)
 static tree
 tsubst_nested_requirement (tree t, tree args, subst_info info)
 {
-  gcc_assert (!uses_template_parms (args));
-
   /* Ensure that we're in an evaluation context prior to satisfaction. */
   tree norm = TREE_VALUE (TREE_TYPE (t));
   tree result = satisfy_constraint (norm, args, info);
@@ -2953,12 +2955,15 @@ finish_compound_requirement (location_t loc, tree expr, tree type, bool noexcept
 tree
 finish_nested_requirement (location_t loc, tree expr)
 {
+  /* Currently open template headers have dummy arg vectors, so don't
+     pass into normalization. */
+  tree norm = normalize_constraint_expression (expr, NULL_TREE, false);
+  tree args = current_template_parms
+    ? template_parms_to_args (current_template_parms) : NULL_TREE;
+
   /* Save the normalized constraint and complete set of normalization
      arguments with the requirement. We keep the complete set of arguments
      around for re-normalization during diagnostics. */
-  tree args = current_template_parms
-    ? template_parms_to_args (current_template_parms) : NULL_TREE;
-  tree norm = normalize_constraint_expression (expr, args, false);
   tree info = build_tree_list (args, norm);
 
   /* Build the constraint, saving its normalization as its type. */
diff --git a/gcc/testsuite/g++.dg/concepts/pr94827.C b/gcc/testsuite/g++.dg/concepts/pr94827.C
new file mode 100644
index 00000000000..f14ec2551a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/pr94827.C
@@ -0,0 +1,15 @@
+// PR 94287 ICE looking inside open template-parm level
+// { dg-do run { target c++17 } }
+// { dg-options -fconcepts }
+
+template <typename T,
+  bool X = requires { requires (sizeof(T)==1); } >
+  int foo(T) { return X; }
+
+int main() {
+  if (!foo('4'))
+    return 1;
+  if (foo (4))
+    return 2;
+  return 0;
+}
</cut>
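
The first_bad and last_good commits named in the report are the two ends of a bisection. A minimal sketch of that search is shown below, assuming a linear history with a single point at which the slowdown appears. It is not the jenkins-scripts implementation: `bisect_commits` and `is_bad` are hypothetical names, and `is_bad` stands in for "check out this commit, run artifacts/test.sh, compare against the baseline".

```python
from typing import Callable, Sequence, Tuple


def bisect_commits(commits: Sequence[str],
                   is_bad: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (last_good, first_bad) for an ordered list of commits.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is bad as well.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid               # regression is at mid or earlier
        else:
            lo = mid               # regression is after mid
    return commits[lo], commits[hi]


history = [f"c{i}" for i in range(8)]   # made-up commit names, oldest first
regressed_from = 5                       # pretend the slowdown starts at c5
last_good, first_bad = bisect_commits(
    history, lambda c: int(c[1:]) >= regressed_from)
print(last_good, first_bad)
```

One caveat worth keeping in mind when reading these reports: with noisy benchmark timings the "every later commit is bad" assumption can fail, and the search may then land on a commit that is unlikely to affect the benchmark at all.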

4 years, 6 months

[TCWG CI] Regression caused by llvm: Inform pass manager when child loops are deleted

by ci_notify＠linaro.org

After llvm commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>

    Inform pass manager when child loops are deleted

The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…

Reproduce builds:
<cut>
mkdir investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
cd investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach f17d60d620283b5d53286056ceeaeb8c27b6530a
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach f56129fe78d5c849971017976c71333b6b1a27c6
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>
Date:   Fri Sep 3 20:50:33 2021 +0200

    Inform pass manager when child loops are deleted

    As part of the nontrivial unswitching we could end up removing child
    loops. This patch add a notification to the pass manager when
    that happens (using the markLoopAsDeleted callback).

    Without this there could be stale LoopAccessAnalysis results cached
    in the analysis manager. Those analysis results are cached based on
    a Loop* as key. Since the BumpPtrAllocator used to allocate
    Loop objects could be resetted between different runs of for
    example the loop-distribute pass (running on different functions),
    a new Loop object could be created using the same Loop pointer.
    And then when requiring the LoopAccessAnalysis for the loop we
    got the stale (corrupt) result from the destroyed loop.

    Reviewed By: aeubanks

    Differential Revision: https://reviews.llvm.org/D109257

    (fixes PR51754)
    (cherry-picked from commit 0f0344dd1e3b53387bb396070916e67f4c426da6)
---
 llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp  | 43 +++++++++----
 .../nontrivial-unswitch-markloopasdeleted.ll       | 71 ++++++++++++++++++++++
 2 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index b9cccc2af309..b1c105258027 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -1587,10 +1587,12 @@ deleteDeadClonedBlocks(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
     BB->eraseFromParent();
 }
 
-static void deleteDeadBlocksFromLoop(Loop &L,
-                                     SmallVectorImpl<BasicBlock *> &ExitBlocks,
-                                     DominatorTree &DT, LoopInfo &LI,
-                                     MemorySSAUpdater *MSSAU) {
+static void
+deleteDeadBlocksFromLoop(Loop &L,
+                         SmallVectorImpl<BasicBlock *> &ExitBlocks,
+                         DominatorTree &DT, LoopInfo &LI,
+                         MemorySSAUpdater *MSSAU,
+                         function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
   // Find all the dead blocks tied to this loop, and remove them from their
   // successors.
   SmallSetVector<BasicBlock *, 8> DeadBlockSet;
@@ -1640,6 +1642,7 @@ static void deleteDeadBlocksFromLoop(Loop &L,
                         }) &&
                "If the child loop header is dead all blocks in the child loop must "
                "be dead as well!");
+        DestroyLoopCB(*ChildL, ChildL->getName());
         LI.destroy(ChildL);
         return true;
       });
@@ -1980,6 +1983,8 @@ static bool rebuildLoopAfterUnswitch(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
       ParentL->removeChildLoop(llvm::find(*ParentL, &L));
     else
       LI.removeLoop(llvm::find(LI, &L));
+    // markLoopAsDeleted for L should be triggered by the caller (it is typically
+    // done by using the UnswitchCB callback).
     LI.destroy(&L);
     return false;
   }
@@ -2019,7 +2024,8 @@ static void unswitchNontrivialInvariants(
     SmallVectorImpl<BasicBlock *> &ExitBlocks, IVConditionInfo &PartialIVInfo,
     DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
     function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
-    ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+    ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+    function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
   auto *ParentBB = TI.getParent();
   BranchInst *BI = dyn_cast<BranchInst>(&TI);
   SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI);
@@ -2319,7 +2325,7 @@ static void unswitchNontrivialInvariants(
   // Now that our cloned loops have been built, we can update the original loop.
   // First we delete the dead blocks from it and then we rebuild the loop
   // structure taking these deletions into account.
-  deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU);
+  deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU, DestroyLoopCB);
 
   if (MSSAU && VerifyMemorySSA)
     MSSAU->getMemorySSA()->verifyMemorySSA();
@@ -2670,7 +2676,8 @@ static bool unswitchBestCondition(
     Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
     AAResults &AA, TargetTransformInfo &TTI,
     function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
-    ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+    ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+    function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
   // Collect all invariant conditions within this loop (as opposed to an inner
   // loop which would be handled when visiting that inner loop).
   SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4>
@@ -2958,7 +2965,7 @@ static bool unswitchBestCondition(
                     << "\n");
   unswitchNontrivialInvariants(L, *BestUnswitchTI, BestUnswitchInvariants,
                                ExitBlocks, PartialIVInfo, DT, LI, AC,
-                               UnswitchCB, SE, MSSAU);
+                               UnswitchCB, SE, MSSAU, DestroyLoopCB);
   return true;
 }
 
@@ -2988,7 +2995,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
              AAResults &AA, TargetTransformInfo &TTI, bool Trivial,
              bool NonTrivial,
              function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
-             ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+             ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+             function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
   assert(L.isRecursivelyLCSSAForm(DT, LI) &&
          "Loops must be in LCSSA form before unswitching.");
 
@@ -3036,7 +3044,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
 
   // Try to unswitch the best invariant condition. We prefer this full unswitch to
   // a partial unswitch when possible below the threshold.
-  if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU))
+  if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU,
+                            DestroyLoopCB))
     return true;
 
   // No other opportunities to unswitch.
@@ -3083,6 +3092,10 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
       U.markLoopAsDeleted(L, LoopName);
   };
 
+  auto DestroyLoopCB = [&U](Loop &L, StringRef Name) {
+    U.markLoopAsDeleted(L, Name);
+  };
+
   Optional<MemorySSAUpdater> MSSAU;
   if (AR.MSSA) {
     MSSAU = MemorySSAUpdater(AR.MSSA);
@@ -3091,7 +3104,8 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
   }
   if (!unswitchLoop(L, AR.DT, AR.LI, AR.AC, AR.AA, AR.TTI, Trivial, NonTrivial,
                     UnswitchCB, &AR.SE,
-                    MSSAU.hasValue() ? MSSAU.getPointer() : nullptr))
+                    MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+                    DestroyLoopCB))
     return PreservedAnalyses::all();
 
   if (AR.MSSA && VerifyMemorySSA)
@@ -3179,12 +3193,17 @@ bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
       LPM.markLoopAsDeleted(*L);
   };
 
+  auto DestroyLoopCB = [&LPM](Loop &L, StringRef /* Name */) {
+    LPM.markLoopAsDeleted(L);
+  };
+
   if (MSSA && VerifyMemorySSA)
     MSSA->verifyMemorySSA();
 
   bool Changed =
       unswitchLoop(*L, DT, LI, AC, AA, TTI, true, NonTrivial, UnswitchCB, SE,
-                   MSSAU.hasValue() ? MSSAU.getPointer() : nullptr);
+                   MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+                   DestroyLoopCB);
 
   if (MSSA && VerifyMemorySSA)
     MSSA->verifyMemorySSA();
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
new file mode 100644
index 000000000000..455a38535576
--- /dev/null
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
@@ -0,0 +1,71 @@
+; RUN: opt < %s -enable-loop-distribute -passes='loop-distribute,loop-mssa(simple-loop-unswitch<nontrivial>),loop-distribute' -o /dev/null -S -debug-pass-manager=verbose 2>&1 | FileCheck %s
+
+
+; Running loop-distribute will result in LoopAccessAnalysis being required and
+; cached in the LoopAnalysisManagerFunctionProxy.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner<header><latch><exiting>
+
+
+; Then simple-loop-unswitch is removing/replacing some loops (resulting in
+; Loop objects used as key in the analyses cache is destroyed). So here we
+; want to see that any analysis results cached on the destroyed loop is
+; cleared. A special case here is that loop_a_inner is destroyed when
+; unswitching the parent loop.
+;
+; The bug solved and verified by this test case was related to the
+; SimpleLoopUnswitch not marking the Loop as removed, so we missed clearing
+; the analysis caches.
+;
+; CHECK: Running pass: SimpleLoopUnswitchPass on Loop at depth 1 containing: %loop_begin<header>,%loop_b,%loop_b_inner,%loop_b_inner_exit,%loop_a,%loop_a_inner,%loop_a_inner_exit,%latch<latch><exiting>
+; CHECK-NEXT: Clearing all analysis results for: loop_a_inner
+
+
+; When running loop-distribute the second time we can see that loop_a_inner
+; isn't analysed because the loop no longer exists (instead we find a new loop,
+; loop_a_inner.us). This kind of verifies that it was correct to remove the
+; loop_a_inner related analysis above.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner.us<header><latch><exiting>
+
+
+define i32 @test6(i1* %ptr, i1 %cond1, i32* %a.ptr, i32* %b.ptr) {
+entry:
+  br label %loop_begin
+
+loop_begin:
+  %v = load i1, i1* %ptr
+  br i1 %cond1, label %loop_a, label %loop_b
+
+loop_a:
+  br label %loop_a_inner
+
+loop_a_inner:
+  %va = load i1, i1* %ptr
+  %a = load i32, i32* %a.ptr
+  br i1 %va, label %loop_a_inner, label %loop_a_inner_exit
+
+loop_a_inner_exit:
+  %a.lcssa = phi i32 [ %a, %loop_a_inner ]
+  br label %latch
+
+loop_b:
+  br label %loop_b_inner
+
+loop_b_inner:
+  %vb = load i1, i1* %ptr
+  %b = load i32, i32* %b.ptr
+  br i1 %vb, label %loop_b_inner, label %loop_b_inner_exit
+
+loop_b_inner_exit:
+  %b.lcssa = phi i32 [ %b, %loop_b_inner ]
+  br label %latch
+
+latch:
+  %ab.phi = phi i32 [ %a.lcssa, %loop_a_inner_exit ], [ %b.lcssa, %loop_b_inner_exit ]
+  br i1 %v, label %loop_begin, label %loop_exit
+
+loop_exit:
+  %ab.lcssa = phi i32 [ %ab.phi, %latch ]
+  ret i32 %ab.lcssa
+}
</cut>
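
The failure mode described in the commit message (a cache keyed on a pointer, combined with an allocator that hands the same address out again) can be modelled in a few lines. The sketch below is a toy model in Python and does not use any LLVM API: `BumpAllocator`, `AnalysisCache` and `run` are made-up names, and the addresses are arbitrary. Only the idea of a mark-as-deleted notification is taken from the patch.

```python
class BumpAllocator:
    """Toy allocator that reuses the same addresses after reset()."""

    def __init__(self):
        self.next_addr = 0x1000

    def allocate(self) -> int:
        addr = self.next_addr
        self.next_addr += 0x40
        return addr

    def reset(self) -> None:
        self.next_addr = 0x1000


class AnalysisCache:
    """Caches one analysis result per loop, keyed on the loop's address."""

    def __init__(self):
        self.results = {}

    def get(self, addr: int, compute):
        if addr not in self.results:
            self.results[addr] = compute()
        return self.results[addr]

    def mark_loop_as_deleted(self, addr: int) -> None:
        # Drop whatever was cached for the loop that is going away.
        self.results.pop(addr, None)


def run(notify_on_delete: bool) -> str:
    alloc, cache = BumpAllocator(), AnalysisCache()

    old_loop = alloc.allocate()
    cache.get(old_loop, lambda: "analysis of loop_a_inner")

    # The transformation destroys the loop here. With the fix, the cache
    # is told about it; without the fix, the entry silently survives.
    if notify_on_delete:
        cache.mark_loop_as_deleted(old_loop)

    alloc.reset()
    new_loop = alloc.allocate()   # same address as old_loop
    return cache.get(new_loop, lambda: "analysis of loop_a_inner.us")


print("without notification:", run(False))   # stale result for the old loop
print("with notification:   ", run(True))    # fresh result for the new loop
```

Without the notification the second lookup returns the result computed for the destroyed loop, which corresponds to the stale LoopAccessAnalysis result the commit message describes.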

4 years, 6 months

[TCWG CI] Regression caused by llvm:d2bb6d512c0f7da8749de51522a0a98f3f08242a

by ci_notify＠linaro.org

Identified regression caused by *llvm:d2bb6d512c0f7da8749de51522a0a98f3f08242a*:

commit d2bb6d512c0f7da8749de51522a0a98f3f08242a
Author: Min-Yih Hsu <minyihh(a)uci.edu>

    [X86] Add explicit library dependency on LLVMInstrumentation

Results regressed to (for first_bad == d2bb6d512c0f7da8749de51522a0a98f3f08242a)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2 artifacts/build-d2bb6d512c0f7da8749de51522a0a98f3f08242a/results_id:
1
# 433.milc,milc_base.default regressed by 103
# 433.milc,[.] mult_su3_mat_vec regressed by 123

from (for last_good == ef8707574bbc7d264644d9e6730118cc0addd871)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O2 artifacts/build-ef8707574bbc7d264644d9e6730118cc0addd871/results_id:
1

This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-d2bb6d512c0f7da8749de51522a0a98f3f08242a
cd investigate-llvm-d2bb6d512c0f7da8749de51522a0a98f3f08242a

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach d2bb6d512c0f7da8749de51522a0a98f3f08242a
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach ef8707574bbc7d264644d9e6730118cc0addd871
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit d2bb6d512c0f7da8749de51522a0a98f3f08242a
Author: Min-Yih Hsu <minyihh(a)uci.edu>
Date:   Tue Aug 24 14:07:21 2021 -0700

    [X86] Add explicit library dependency on LLVMInstrumentation

    Patch 9588b685c6b2 introduced dependency on ASAN. But it didn't
    explicitly put LLVMInstrumentation as one of the library dependencies
    such that the build will fail if we're building LLVM as shared
    libraries (i.e. -DBUILD_SHARED_LIBS=ON). This patch explicitly links
    X86CodeGen against the Instrumentation component.

    Differential Revision: https://reviews.llvm.org/D108662
---
 llvm/lib/Target/X86/CMakeLists.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/llvm/lib/Target/X86/CMakeLists.txt b/llvm/lib/Target/X86/CMakeLists.txt
index a2816f6e5e84..d13c5d250554 100644
--- a/llvm/lib/Target/X86/CMakeLists.txt
+++ b/llvm/lib/Target/X86/CMakeLists.txt
@@ -91,6 +91,7 @@ add_llvm_target(X86CodeGen ${sources}
   AsmPrinter
   CodeGen
   Core
+  Instrumentation
   MC
   SelectionDAG
   Support
</cut>

4 years, 6 months

[TCWG CI] Regression caused by gcc:392aa7d7adfbd84253121d2ef779bf3c627e8d0b

by ci_notify＠linaro.org

Identified regression caused by *gcc:392aa7d7adfbd84253121d2ef779bf3c627e8d0b*:

commit 392aa7d7adfbd84253121d2ef779bf3c627e8d0b
Author: Jeff Law <law(a)torsion.usersys.redhat.com>

    Fix some testsuite failures for H8/SX multilibs where short branches where used when long branches were necessary.

Results regressed to (for first_bad == 392aa7d7adfbd84253121d2ef779bf3c627e8d0b)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O3_LTO artifacts/build-392aa7d7adfbd84253121d2ef779bf3c627e8d0b/results_id:
1
# 456.hmmer,hmmer_base.default regressed by 104

from (for last_good == c7137fcc7cbc1f1f14f9fed75adcc6bd8f1d418c)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O3_LTO artifacts/build-c7137fcc7cbc1f1f14f9fed75adcc6bd8f1d418c/results_id:
1

This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…

Reproduce builds:
<cut>
mkdir investigate-gcc-392aa7d7adfbd84253121d2ef779bf3c627e8d0b
cd investigate-gcc-392aa7d7adfbd84253121d2ef779bf3c627e8d0b

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 392aa7d7adfbd84253121d2ef779bf3c627e8d0b
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach c7137fcc7cbc1f1f14f9fed75adcc6bd8f1d418c
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 392aa7d7adfbd84253121d2ef779bf3c627e8d0b
Author: Jeff Law <law(a)torsion.usersys.redhat.com>
Date:   Wed Apr 29 10:19:22 2020 -0400

    Fix some testsuite failures for H8/SX multilibs where short branches where used when long branches were necessary.

    * config/h8300/h8300.md (H8/SX div patterns): All H8/SX specific
    division instructions are 4 bytes long.
---
 gcc/ChangeLog             | 5 +++++
 gcc/config/h8300/h8300.md | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 80064da83ce..a2d4a1b82f4 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-04-29 Jeff Law <law(a)redhat.com>
+
+    * config/h8300/h8300.md (H8/SX div patterns): All H8/SX specific
+    division instructions are 4 bytes long.
+
 2020-04-29 Jakub Jelinek <jakub(a)redhat.com>
 
     PR target/94826
diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md
index 3e5cdbeeebe..a86b8ea2074 100644
--- a/gcc/config/h8300/h8300.md
+++ b/gcc/config/h8300/h8300.md
@@ -1218,7 +1218,7 @@
         (match_operand:HSI 2 "reg_or_nibble_operand" "r IP4>X")))]
   "TARGET_H8300SX"
   { return <MODE>mode == HImode ? "divu.w\\t%T2,%T0" : "divu.l\\t%S2,%S0"; }
-  [(set_attr "length" "2")])
+  [(set_attr "length" "4")])
 
 (define_insn "div<mode>3"
   [(set (match_operand:HSI 0 "register_operand" "=r")
@@ -1226,7 +1226,7 @@
         (match_operand:HSI 2 "reg_or_nibble_operand" "r IP4>X")))]
   "TARGET_H8300SX"
   { return <MODE>mode == HImode ? "divs.w\\t%T2,%T0" : "divs.l\\t%S2,%S0"; }
-  [(set_attr "length" "2")])
+  [(set_attr "length" "4")])
 
 (define_insn "udivmodqi4"
   [(set (match_operand:QI 0 "register_operand" "=r")
</cut>

4 years, 6 months

[TCWG CI] Regression caused by llvm:2eb554a9feafff5188d8b924908205c87d7f2fee

by ci_notify＠linaro.org

Identified regression caused by *llvm:2eb554a9feafff5188d8b924908205c87d7f2fee*:

commit 2eb554a9feafff5188d8b924908205c87d7f2fee
Author: Roman Lebedev <lebedev.ri(a)gmail.com>

    Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)"

Results regressed to (for first_bad == 2eb554a9feafff5188d8b924908205c87d7f2fee)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-2eb554a9feafff5188d8b924908205c87d7f2fee/results_id:
1
# 483.xalancbmk,libc.so.6 regressed by 81400

from (for last_good == 7142eb17fb3419a76c9ac4afce0df986ff08d61c)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-7142eb17fb3419a76c9ac4afce0df986ff08d61c/results_id:
1

This commit has regressed these CI configurations:
 - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-2eb554a9feafff5188d8b924908205c87d7f2fee
cd investigate-llvm-2eb554a9feafff5188d8b924908205c87d7f2fee

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 2eb554a9feafff5188d8b924908205c87d7f2fee
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 7142eb17fb3419a76c9ac4afce0df986ff08d61c
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 2eb554a9feafff5188d8b924908205c87d7f2fee
Author: Roman Lebedev <lebedev.ri(a)gmail.com>
Date: Mon Aug 16 10:53:15 2021 +0300

    Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)"

    This is still wrong, as failing bots suggest.

    This reverts commit 3d9beefc7d713ad8462d92427ccd17b9532ce904.
--- llvm/lib/Transforms/Utils/SimplifyCFG.cpp | 75 +++------------------- .../SimplifyCFG/fold-branch-to-common-dest.ll | 18 +++--- 2 files changed, 18 insertions(+), 75 deletions(-) diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp index 68a0388398fc..847fdd760d2f 100644 --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -1095,24 +1095,17 @@ static void CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses( // Update (liveout) uses of bonus instructions, // now that the bonus instruction has been cloned into predecessor. - // Note that we expect to be in a block-closed SSA form for this to work! + SSAUpdater SSAUpdate; + SSAUpdate.Initialize(BonusInst.getType(), + (NewBonusInst->getName() + ".merge").str()); + SSAUpdate.AddAvailableValue(BB, &BonusInst); + SSAUpdate.AddAvailableValue(PredBlock, NewBonusInst); for (Use &U : make_early_inc_range(BonusInst.uses())) { auto *UI = cast<Instruction>(U.getUser()); - auto *PN = dyn_cast<PHINode>(UI); - if (!PN) { - assert(UI->getParent() == BB && BonusInst.comesBefore(UI) && - "If the user is not a PHI node, then it should be in the same " - "block as, and come after, the original bonus instruction."); - continue; // Keep using the original bonus instruction. - } - // Is this the block-closed SSA form PHI node? - if (PN->getIncomingBlock(U) == BB) - continue; // Great, keep using the original bonus instruction. - // The only other alternative is an "use" when coming from - // the predecessor block - here we should refer to the cloned bonus instr. - assert(PN->getIncomingBlock(U) == PredBlock && - "Not in block-closed SSA form?"); - U.set(NewBonusInst); + if (UI->getParent() != PredBlock) + SSAUpdate.RewriteUseAfterInsertions(U); + else // Use is in the same block as, and comes before, NewBonusInst. 
+ SSAUpdate.RewriteUse(U); } } } @@ -3039,56 +3032,6 @@ static bool performBranchToCommonDestFolding(BranchInst *BI, BranchInst *PBI, LLVM_DEBUG(dbgs() << "FOLDING BRANCH TO COMMON DEST:\n" << *PBI << *BB); - // We want to duplicate all the bonus instructions in this block, - // and rewrite their uses, but in some cases with self-loops, - // the naive use rewrite approach won't work (will result in miscompilations). - // To avoid this problem, let's form block-closed SSA form. - for (Instruction &BonusInst : - reverse(iterator_range<BasicBlock::iterator>(*BB))) { - auto IsBCSSAUse = [BB, &BonusInst](Use &U) { - auto *UI = cast<Instruction>(U.getUser()); - if (auto *PN = dyn_cast<PHINode>(UI)) - return PN->getIncomingBlock(U) == BB; - return UI->getParent() == BB && BonusInst.comesBefore(UI); - }; - - // Does this instruction require rewriting of uses? - if (all_of(BonusInst.uses(), IsBCSSAUse)) - continue; - - SSAUpdater SSAUpdate; - Type *Ty = BonusInst.getType(); - SmallVector<PHINode *, 8> BCSSAPHIs; - SSAUpdate.Initialize(Ty, BonusInst.getName()); - - // Into each successor block of BB, insert a PHI node, that receives - // the BonusInst when coming from it's basic block, or poison otherwise. - for (BasicBlock *Succ : successors(BB)) { - // The block may have the same successor multiple times. Do it only once. - if (SSAUpdate.HasValueForBlock(Succ)) - continue; - BCSSAPHIs.emplace_back(PHINode::Create( - Ty, 0, BonusInst.getName() + ".bcssa", &Succ->front())); - PHINode *PN = BCSSAPHIs.back(); - for (BasicBlock *PredOfSucc : predecessors(Succ)) - PN->addIncoming(PredOfSucc == BB ? (Value *)&BonusInst - : PoisonValue::get(Ty), - PredOfSucc); - SSAUpdate.AddAvailableValue(Succ, PN); - } - - // And rewrite all uses that break block-closed SSA form. 
- for (Use &U : make_early_inc_range(BonusInst.uses())) - if (!IsBCSSAUse(U)) - SSAUpdate.RewriteUseAfterInsertions(U); - - // We might not have ended up needing PHI's in all of the succ blocks, - // drop the ones that are certainly unused, but don't bother otherwise. - for (PHINode *PN : BCSSAPHIs) - if (PN->use_empty()) - PN->eraseFromParent(); - } - IRBuilder<> Builder(PBI); // The builder is used to create instructions to eliminate the branch in BB. // If BB's terminator has !annotation metadata, add it to the new diff --git a/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll b/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll index d948b61d65a0..2ff041826077 100644 --- a/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll +++ b/llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest.ll @@ -834,7 +834,7 @@ define void @pr48450() { ; CHECK-NEXT: entry: ; CHECK-NEXT: br label [[FOR_BODY:%.*]] ; CHECK: for.body: -; CHECK-NEXT: [[COUNTDOWN:%.*]] = phi i8 [ 8, [[ENTRY:%.*]] ], [ [[DEC_BCSSA1:%.*]], [[FOR_BODYTHREAD_PRE_SPLIT:%.*]] ] +; CHECK-NEXT: [[COUNTDOWN:%.*]] = phi i8 [ 8, [[ENTRY:%.*]] ], [ [[DEC_MERGE:%.*]], [[FOR_BODYTHREAD_PRE_SPLIT:%.*]] ] ; CHECK-NEXT: [[C:%.*]] = call i1 @gen1() ; CHECK-NEXT: br i1 [[C]], label [[FOR_INC:%.*]], label [[IF_THEN:%.*]] ; CHECK: for.inc: @@ -849,7 +849,7 @@ define void @pr48450() { ; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[C2_NOT]], i1 true, i1 [[CMP_NOT]] ; CHECK-NEXT: br i1 [[OR_COND]], label [[IF_END_LOOPEXIT]], label [[FOR_BODYTHREAD_PRE_SPLIT]] ; CHECK: for.bodythread-pre-split: -; CHECK-NEXT: [[DEC_BCSSA1]] = phi i8 [ [[DEC_OLD]], [[FOR_INC]] ], [ [[DEC]], [[IF_THEN]] ] +; CHECK-NEXT: [[DEC_MERGE]] = phi i8 [ [[DEC]], [[IF_THEN]] ], [ [[DEC_OLD]], [[FOR_INC]] ] ; CHECK-NEXT: call void @sideeffect0() ; CHECK-NEXT: br label [[FOR_BODY]] ; CHECK: if.end.loopexit: @@ -885,7 +885,7 @@ define void @pr48450_2(i1 %enable_loopback) { ; CHECK-NEXT: entry: ; CHECK-NEXT: br label 
[[FOR_BODY:%.*]] ; CHECK: for.body: -; CHECK-NEXT: [[COUNTDOWN:%.*]] = phi i8 [ 8, [[ENTRY:%.*]] ], [ [[DEC_BCSSA1:%.*]], [[FOR_BODYTHREAD_PRE_SPLIT:%.*]] ] +; CHECK-NEXT: [[COUNTDOWN:%.*]] = phi i8 [ 8, [[ENTRY:%.*]] ], [ [[DEC_MERGE:%.*]], [[FOR_BODYTHREAD_PRE_SPLIT:%.*]] ] ; CHECK-NEXT: [[C:%.*]] = call i1 @gen1() ; CHECK-NEXT: br i1 [[C]], label [[FOR_INC:%.*]], label [[IF_THEN:%.*]] ; CHECK: for.inc: @@ -900,7 +900,7 @@ define void @pr48450_2(i1 %enable_loopback) { ; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[C2_NOT]], i1 true, i1 [[CMP_NOT]] ; CHECK-NEXT: br i1 [[OR_COND]], label [[IF_END_LOOPEXIT]], label [[FOR_BODYTHREAD_PRE_SPLIT]] ; CHECK: for.bodythread-pre-split: -; CHECK-NEXT: [[DEC_BCSSA1]] = phi i8 [ poison, [[FOR_BODYTHREAD_PRE_SPLIT_LOOPBACK:%.*]] ], [ [[DEC_OLD]], [[FOR_INC]] ], [ [[DEC]], [[IF_THEN]] ] +; CHECK-NEXT: [[DEC_MERGE]] = phi i8 [ [[DEC_OLD]], [[FOR_INC]] ], [ [[DEC_MERGE]], [[FOR_BODYTHREAD_PRE_SPLIT_LOOPBACK:%.*]] ], [ [[DEC]], [[IF_THEN]] ] ; CHECK-NEXT: [[SHOULD_LOOPBACK:%.*]] = phi i1 [ true, [[FOR_INC]] ], [ false, [[FOR_BODYTHREAD_PRE_SPLIT_LOOPBACK]] ], [ true, [[IF_THEN]] ] ; CHECK-NEXT: [[DO_LOOPBACK:%.*]] = and i1 [[SHOULD_LOOPBACK]], [[ENABLE_LOOPBACK:%.*]] ; CHECK-NEXT: call void @sideeffect0() @@ -1005,8 +1005,8 @@ define void @pr49510() { ; CHECK-NEXT: [[TOBOOL_OLD:%.*]] = icmp ne i16 [[DOTOLD]], 0 ; CHECK-NEXT: br i1 [[TOBOOL_OLD]], label [[LAND_RHS:%.*]], label [[FOR_END:%.*]] ; CHECK: land.rhs: -; CHECK-NEXT: [[DOTBCSSA:%.*]] = phi i16 [ [[DOTOLD]], [[ENTRY:%.*]] ], [ [[TMP0:%.*]], [[LAND_RHS]] ] -; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[DOTBCSSA]], 0 +; CHECK-NEXT: [[DOTMERGE:%.*]] = phi i16 [ [[TMP0:%.*]], [[LAND_RHS]] ], [ [[DOTOLD]], [[ENTRY:%.*]] ] +; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[DOTMERGE]], 0 ; CHECK-NEXT: [[TMP0]] = load i16, i16* @global_pr49510, align 1 ; CHECK-NEXT: [[TOBOOL:%.*]] = icmp ne i16 [[TMP0]], 0 ; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 [[TOBOOL]], i1 false @@ -1043,15 
+1043,15 @@ define i32 @pr51125() { ; CHECK-NEXT: [[ISZERO_OLD:%.*]] = icmp eq i32 [[LD_OLD]], 0 ; CHECK-NEXT: br i1 [[ISZERO_OLD]], label [[EXIT:%.*]], label [[L2:%.*]] ; CHECK: L2: -; CHECK-NEXT: [[LD_BCSSA1:%.*]] = phi i32 [ [[LD_OLD]], [[ENTRY:%.*]] ], [ [[LD:%.*]], [[L2]] ] +; CHECK-NEXT: [[LD_MERGE:%.*]] = phi i32 [ [[LD:%.*]], [[L2]] ], [ [[LD_OLD]], [[ENTRY:%.*]] ] ; CHECK-NEXT: store i32 -1, i32* @global_pr51125, align 4 -; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[LD_BCSSA1]], -1 +; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[LD_MERGE]], -1 ; CHECK-NEXT: [[LD]] = load i32, i32* @global_pr51125, align 4 ; CHECK-NEXT: [[ISZERO:%.*]] = icmp eq i32 [[LD]], 0 ; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 true, i1 [[ISZERO]] ; CHECK-NEXT: br i1 [[OR_COND]], label [[EXIT]], label [[L2]] ; CHECK: exit: -; CHECK-NEXT: [[R:%.*]] = phi i32 [ [[LD_BCSSA1]], [[L2]] ], [ [[LD_OLD]], [[ENTRY]] ] +; CHECK-NEXT: [[R:%.*]] = phi i32 [ [[LD]], [[L2]] ], [ [[LD_OLD]], [[ENTRY]] ] ; CHECK-NEXT: ret i32 [[R]] ; entry: </cut>

4 years, 6 months

[TCWG CI] Regression caused by gcc:76b75018b3d053a890ebe155e47814de14b3c9fb

by ci_notify＠linaro.org

Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:

commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>

    c++: implement C++17 hardware interference size

Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# First few build errors in logs:

from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7

This commit has regressed these CI configurations:
 - tcwg_gnu_cross_build/master-aarch64

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…
Even more details: https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti…

Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-bisect-master-aarch64/2/arti… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400

    c++: implement C++17 hardware interference size

    The last missing piece of the C++17 standard library is the hardware
    intereference size constants. Much of the delay in implementing these has
    been due to uncertainty about what the right values are, and even whether
    there is a single constant value that is suitable; the destructive
    interference size is intended to be used in structure layout, so program
    ABIs will depend on it.

    In principle, both of these values should be the same as the target's L1
    cache line size. When compiling for a generic target that is intended to
    support a range of target CPUs with different cache line sizes, the
    constructive size should probably be the minimum size, and the destructive
    size the maximum, unless you are constrained by ABI compatibility with
    previous code.

    From discussion on gcc-patches, I've come to the conclusion that the
    solution to the difficulty of choosing stable values is to give up on it,
    and instead encourage only uses where ABI stability is unimportant: in
    particular, uses where the ABI is shared at most between translation units
    built at the same time with the same flags.

    To that end, I've added a warning for any use of the constant value of
    std::hardware_destructive_interference_size in a header or module export.
    Appropriate uses within a project can disable the warning.

    A previous iteration of this patch included an -finterference-tune flag to
    make the value vary with -mtune; this iteration makes that the default
    behavior, which should be appropriate for all reasonable uses of the
    variable. The previous default of "stable-ish" seems to me likely to have
    been more of an attractive nuisance; since we can't promise actual
    stability, we should instead make proper uses more convenient.

    JF Bastien's implementation proposal is summarized at
    https://github.com/itanium-cxx-abi/cxx-abi/issues/74

    I implement this by adding new --params for the two sizes. Targets can
    override these values in targetm.target_option.override() to support a
    range of values for the generic target; otherwise, both will default to the
    L1 cache line size.

    64 bytes still seems correct for all x86.

    I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
    A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.

    He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
    cache line, I've changed that to 64/256.

    Other arch maintainers are invited to set ranges for their generic targets
    if that seems better than using the default cache line size for both values.

    With the above choice to reject stability as a goal, getting these values
    "right" is now just a matter of what we want the default optimization to
    be, and we can feel free to adjust them as CPUs with different cache lines
    become more and less common.

    gcc/ChangeLog:

    * params.opt: Add destructive-interference-size and
    constructive-interference-size.
    * doc/invoke.texi: Document them.
    * config/aarch64/aarch64.c (aarch64_override_options_internal):
    Set them.
    * config/arm/arm.c (arm_option_override): Set them.
    * config/i386/i386-options.c (ix86_option_override_internal):
    Set them.

    gcc/c-family/ChangeLog:

    * c.opt: Add -Winterference-size.
    * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
    and __GCC_CONSTRUCTIVE_SIZE.

gcc/cp/ChangeLog: * constexpr.c (maybe_warn_about_constant_value): Complain about std::hardware_destructive_interference_size. (cxx_eval_constant_expression): Call it. * decl.c (cxx_init_decl_processing): Check --param *-interference-size values. libstdc++-v3/ChangeLog: * include/std/version: Define __cpp_lib_hardware_interference_size. * libsupc++/new: Define hardware interference size variables. gcc/testsuite/ChangeLog: * g++.dg/warn/Winterference.H: New file. * g++.dg/warn/Winterference.C: New test. * g++.target/aarch64/interference.C: New test. * g++.target/arm/interference.C: New test. * g++.target/i386/interference.C: New test. --- gcc/c-family/c-cppbuiltin.c | 14 ++++++ gcc/c-family/c.opt | 5 ++ gcc/config/aarch64/aarch64.c | 22 +++++++++ gcc/config/arm/arm.c | 22 +++++++++ gcc/config/i386/i386-options.c | 6 +++ gcc/cp/constexpr.c | 33 +++++++++++++ gcc/cp/decl.c | 32 ++++++++++++ gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++ gcc/params.opt | 16 ++++++ gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++ gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++ gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++ gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++ gcc/testsuite/g++.target/arm/interference.C | 9 ++++ gcc/testsuite/g++.target/i386/interference.C | 8 +++ libstdc++-v3/include/std/version | 3 ++ libstdc++-v3/libsupc++/new | 10 +++- 17 files changed, 279 insertions(+), 2 deletions(-) diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index 48cbefd8bf8..ce88e707127 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile) builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL", targetm.atomic_test_and_set_trueval); + /* Macros for C++17 hardware interference size constants. Either both or + neither should be set. 
*/ + gcc_assert (!param_destruct_interfere_size + == !param_construct_interfere_size); + if (param_destruct_interfere_size) + { + /* FIXME The way of communicating these values to the library should be + part of the C++ ABI, whether macro or builtin. */ + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE", + param_destruct_interfere_size); + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE", + param_construct_interfere_size); + } + /* ptr_type_node can't be used here since ptr_mode is only set when toplev calls backend_init which is not done with -E or pch. */ psize = POINTER_SIZE_UNITS; diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index c5fe90003f2..9c151d19870 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -722,6 +722,11 @@ Winit-list-lifetime C++ ObjC++ Var(warn_init_list) Warning Init(1) Warn about uses of std::initializer_list that can result in dangling pointers. +Winterference-size +C++ ObjC++ Var(warn_interference_size) Warning Init(1) +Warn about nonsensical values of --param destructive-interference-size or +constructive-interference-size. + Wimplicit C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall) Warn about implicit declarations. 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 30d9a0b7a3d..36519ccc5a5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l1_cache_line_size, aarch64_tune_params.prefetch->l1_cache_line_size); + + if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + } + else + { + /* For a generic AArch64 target, cover the current range of cache line + sizes. */ + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + 256); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + 64); + } + if (aarch64_tune_params.prefetch->l2_cache_size >= 0) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l2_cache_size, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f1e628253d0..6c6e77fab66 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3669,6 +3669,28 @@ arm_option_override (void) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_line_size, current_tune->prefetch.l1_cache_line_size); + if (current_tune->prefetch.l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + } + else + { + /* For a generic ARM target, JF Bastien proposed using 64 for both. */ + /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for + constructive? 
*/ + /* More recent Cortex chips have a 64-byte cache line, but are marked + ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + } + if (current_tune->prefetch.l1_cache_size >= 0) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_size, diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 2cb87cedec0..c0006b3674b 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p, SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size, ix86_tune_cost->l2_cache_size); + /* 64B is the accepted value for these for all x86. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ if (opts->x_flag_prefetch_loop_arrays < 0 && HAVE_prefetch diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 7772fe62d95..0c2498aee22 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc) "%<constexpr%> function in C++20"); } +/* We're getting the constant value of DECL in a manifestly constant-evaluated + context; maybe complain about that. 
*/ + +static void +maybe_warn_about_constant_value (location_t loc, tree decl) +{ + static bool explained = false; + if (cxx_dialect >= cxx17 + && warn_interference_size + && !global_options_set.x_param_destruct_interfere_size + && DECL_CONTEXT (decl) == std_node + && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size") + && (LOCATION_FILE (input_location) != main_input_filename + || module_exporting_p ()) + && warning_at (loc, OPT_Winterference_size, "use of %qD", decl) + && !explained) + { + explained = true; + inform (loc, "its value can vary between compiler versions or " + "with different %<-mtune%> or %<-mcpu%> flags"); + inform (loc, "if this use is part of a public ABI, change it to " + "instead use a constant variable you define"); + inform (loc, "the default value for the current CPU tuning " + "is %d bytes", param_destruct_interfere_size); + inform (loc, "you can stabilize this value with %<--param " + "hardware_destructive_interference_size=%d%>, or disable " + "this warning with %<-Wno-interference-size%>", + param_destruct_interfere_size); + } +} + /* Attempt to reduce the expression T to a constant value. On failure, issue diagnostic and return error_mark_node. */ /* FIXME unify with c_fully_fold */ @@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, r = *p; break; } + if (ctx->manifestly_const_eval) + maybe_warn_about_constant_value (loc, t); if (COMPLETE_TYPE_P (TREE_TYPE (t)) && is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false)) { diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index bce62ad202a..c2065027369 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void) /* Show we use EH for cleanups. */ if (flag_exceptions) using_eh_for_cleanups (); + + /* Check that the hardware interference sizes are at least + alignof(max_align_t), as required by the standard. 
*/ + const int max_align = max_align_t_align () / BITS_PER_UNIT; + if (param_destruct_interfere_size) + { + if (param_destruct_interfere_size < max_align) + error ("%<--param destructive-interference-size=%d%> is less than " + "%d", param_destruct_interfere_size, max_align); + else if (param_destruct_interfere_size < param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param destructive-interference-size=%d%> " + "is less than %<--param l1-cache-line-size=%d%>", + param_destruct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_destruct_interfere_size = param_l1_cache_line_size; + /* else leave it unset. */ + + if (param_construct_interfere_size) + { + if (param_construct_interfere_size < max_align) + error ("%<--param constructive-interference-size=%d%> is less than " + "%d", param_construct_interfere_size, max_align); + else if (param_construct_interfere_size > param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param constructive-interference-size=%d%> " + "is greater than %<--param l1-cache-line-size=%d%>", + param_construct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_construct_interfere_size = param_l1_cache_line_size; } /* Enter an abi node in global-module context. returns a cookie to diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 23cc68f92b5..78cfc100ac2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by @option{-Winline} to appear or disappear. +@item -Winterference-size +@opindex Winterference-size +Warn about use of C++17 @code{std::hardware_destructive_interference_size} +without specifying its value with @option{--param destructive-interference-size}. +Also warn about questionable values for that option. 
+ +This variable is intended to be used for controlling class layout, to +avoid false sharing in concurrent code: + +@smallexample +struct independent_fields @{ + alignas(std::hardware_destructive_interference_size) std::atomic<int> one; + alignas(std::hardware_destructive_interference_size) std::atomic<int> two; +@}; +@end smallexample + +Here @samp{one} and @samp{two} are intended to be far enough apart +that stores to one won't require accesses to the other to reload the +cache line. + +By default, @option{--param destructive-interference-size} and +@option{--param constructive-interference-size} are set based on the +current @option{-mtune} option, typically to the L1 cache line size +for the particular target CPU, sometimes to a range if tuning for a +generic target. So all translation units that depend on ABI +compatibility for the use of these variables must be compiled with +the same @option{-mtune} (or @option{-mcpu}). + +If ABI stability is important, such as if the use is in a header for a +library, you should probably not use the hardware interference size +variables at all. Alternatively, you can force a particular value +with @option{--param}. + +If you are confident that your use of the variable does not affect ABI +outside a single build of your project, you can turn off the warning +with @option{-Wno-interference-size}. + @item -Wint-in-bool-context @opindex Wint-in-bool-context @opindex Wno-int-in-bool-context @@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride. This setting is only useful for strides that are known and constant. +@item destructive-interference-size +@item constructive-interference-size +The values for the C++17 variables +@code{std::hardware_destructive_interference_size} and +@code{std::hardware_constructive_interference_size}. 
The destructive +interference size is the minimum recommended offset between two +independent concurrently-accessed objects; the constructive +interference size is the maximum recommended size of contiguous memory +accessed together. Typically both will be the size of an L1 cache +line for the target, in bytes. For a generic target covering a range of L1 +cache line sizes, typically the constructive interference size will be +the small end of the range and the destructive size will be the large +end. + +The destructive interference size is intended to be used for layout, +and thus has ABI impact. The default value is not expected to be +stable, and on some targets varies with @option{-mtune}, so use of +this variable in a context where ABI stability is important, such as +the public interface of a library, is strongly discouraged; if it is +used in that context, users can stabilize the value using this +option. + +The constructive interference size is less sensitive, as it is +typically only used in a @samp{static_assert} to make sure that a type +fits within a cache line. + +See also @option{-Winterference-size}. + @item loop-interchange-max-num-stmts The maximum number of stmts in a loop to be interchanged. diff --git a/gcc/params.opt b/gcc/params.opt index 3a701e22c46..658ca028851 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent) Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization The size of L1 cache line. +-param=destructive-interference-size= +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization +The minimum recommended offset between two concurrently-accessed objects to +avoid additional performance degradation due to contention introduced by the +implementation. Typically the L1 cache line size, but can be larger to +accommodate a variety of target processors with different cache line sizes. 
+C++17 code might use this value in structure layout, but is strongly +discouraged from doing so in public ABIs. + +-param=constructive-interference-size= +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization +The maximum recommended size of contiguous memory occupied by two objects +accessed with temporal locality by concurrent threads. Typically the L1 cache +line size, but can be smaller to accommodate a variety of target processors with +different cache line sizes. + -param=l1-cache-size= Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization The size of L1 cache. diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C new file mode 100644 index 00000000000..2af75c63f83 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C @@ -0,0 +1,14 @@ +// { dg-do compile { target c++20 } } +// { dg-additional-options -fmodules-ts } + +module ; + +#include <new> + +export module foo; + +export { + struct A { + alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size } + }; +} diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C new file mode 100644 index 00000000000..57c001bc032 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.C @@ -0,0 +1,6 @@ +// Test that we warn about use of std::hardware_destructive_interference_size +// in a header. 
+// { dg-do compile { target c++17 } } + +// { dg-warning Winterference-size "" { target *-*-* } 0 } +#include "Winterference.H" diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H new file mode 100644 index 00000000000..36f0ad5f6d1 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.H @@ -0,0 +1,7 @@ +#include <new> + +struct A +{ + alignas(std::hardware_destructive_interference_size) int i; + alignas(std::hardware_destructive_interference_size) int j; +}; diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C new file mode 100644 index 00000000000..0fc01655223 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use +// 128 or even 256. +static_assert(std::hardware_destructive_interference_size == 256); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C new file mode 100644 index 00000000000..34fe8a52bff --- /dev/null +++ b/gcc/testsuite/g++.target/arm/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Recent ARM CPUs have a cache line size of 64. Older ones have +// a size of 32, but I guess they're old enough that we don't care? 
+static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C new file mode 100644 index 00000000000..c7b910e3ada --- /dev/null +++ b/gcc/testsuite/g++.target/i386/interference.C @@ -0,0 +1,8 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// It is generally agreed that these are the right values for all x86. +static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version index f950bf0f0db..f41004b5911 100644 --- a/libstdc++-v3/include/std/version +++ b/libstdc++-v3/include/std/version @@ -140,6 +140,9 @@ #define __cpp_lib_filesystem 201703 #define __cpp_lib_gcd 201606 #define __cpp_lib_gcd_lcm 201606 +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L +#endif #define __cpp_lib_hypot 201603 #define __cpp_lib_invoke 201411L #define __cpp_lib_lcm 201606 diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new index 3349b13fd1b..7bc67a6cb02 100644 --- a/libstdc++-v3/libsupc++/new +++ b/libstdc++-v3/libsupc++/new @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { } } // extern "C++" #if __cplusplus >= 201703L -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER namespace std { +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER #define __cpp_lib_launder 201606 /// Pointer optimization barrier [ptr.launder] template<typename _Tp> @@ -205,8 +205,14 @@ namespace std void launder(const void*) = delete; void launder(volatile void*) = delete; void launder(const volatile void*) = delete; -} #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER + +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L + inline 
constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE; + inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE; +#endif // __GCC_DESTRUCTIVE_SIZE +} #endif // C++17 #if __cplusplus > 201703L </cut>
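The invoke.texi text quoted above says the constructive interference size is typically only used in a static_assert, to check that a type fits within a cache line. A minimal sketch of that use follows. The names kConstructiveSize, HotPair and fits_in_one_line are hypothetical and do not appear in the patch, and the 64-byte fallback is an assumption for toolchains that do not define __cpp_lib_hardware_interference_size.

```cpp
#include <cstddef>
#include <new>

// Hypothetical project-level constant (not part of the patch).  Use the
// library value when the feature-test macro says it exists; otherwise fall
// back to an assumed 64 bytes.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kConstructiveSize =
    std::hardware_constructive_interference_size;
#else
inline constexpr std::size_t kConstructiveSize = 64;  // assumed fallback
#endif

// Two fields that are read together and should share one cache line.
struct HotPair {
    int key;
    int value;
};

// Compile-time check that the type fits within the constructive size.
static_assert(sizeof(HotPair) <= kConstructiveSize,
              "HotPair should fit within one cache line");

// Helper for checking other sizes against the same limit.
constexpr bool fits_in_one_line(std::size_t size) {
    return size <= kConstructiveSize;
}
```

Because the value only feeds a static_assert here, a change in the default with -mtune can at worst turn a passing build into a failing one; it does not change the layout of HotPair.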

4 years, 6 months

[TCWG CI] Regression caused by gcc:76b75018b3d053a890ebe155e47814de14b3c9fb

by ci_notify＠linaro.org

Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>

    c++: implement C++17 hardware interference size

Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:

from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap:
2

This commit has regressed these CI configurations:
 - tcwg_gcc_bootstrap/master-aarch64-bootstrap

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…
Even more details: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra…

Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstra… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400

c++: implement C++17 hardware interference size

The last missing piece of the C++17 standard library is the hardware
interference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.

In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.

From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.

To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.

A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable.
The previous default of "stable-ish" seems to me likely to have been more of an attractive nuisance; since we can't promise actual stability, we should instead make proper uses more convenient. JF Bastien's implementation proposal is summarized at https://github.com/itanium-cxx-abi/cxx-abi/issues/74 I implement this by adding new --params for the two sizes. Targets can override these values in targetm.target_option.override() to support a range of values for the generic target; otherwise, both will default to the L1 cache line size. 64 bytes still seems correct for all x86. I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex A9 has a 32-byte cache line, so I'd think 32/64 would make more sense. He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B cache line, I've changed that to 64/256. Other arch maintainers are invited to set ranges for their generic targets if that seems better than using the default cache line size for both values. With the above choice to reject stability as a goal, getting these values "right" is now just a matter of what we want the default optimization to be, and we can feel free to adjust them as CPUs with different cache lines become more and less common. gcc/ChangeLog: * params.opt: Add destructive-interference-size and constructive-interference-size. * doc/invoke.texi: Document them. * config/aarch64/aarch64.c (aarch64_override_options_internal): Set them. * config/arm/arm.c (arm_option_override): Set them. * config/i386/i386-options.c (ix86_option_override_internal): Set them. gcc/c-family/ChangeLog: * c.opt: Add -Winterference-size. * c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE and __GCC_CONSTRUCTIVE_SIZE. gcc/cp/ChangeLog: * constexpr.c (maybe_warn_about_constant_value): Complain about std::hardware_destructive_interference_size. (cxx_eval_constant_expression): Call it. * decl.c (cxx_init_decl_processing): Check --param *-interference-size values. 
libstdc++-v3/ChangeLog: * include/std/version: Define __cpp_lib_hardware_interference_size. * libsupc++/new: Define hardware interference size variables. gcc/testsuite/ChangeLog: * g++.dg/warn/Winterference.H: New file. * g++.dg/warn/Winterference.C: New test. * g++.target/aarch64/interference.C: New test. * g++.target/arm/interference.C: New test. * g++.target/i386/interference.C: New test. --- gcc/c-family/c-cppbuiltin.c | 14 ++++++ gcc/c-family/c.opt | 5 ++ gcc/config/aarch64/aarch64.c | 22 +++++++++ gcc/config/arm/arm.c | 22 +++++++++ gcc/config/i386/i386-options.c | 6 +++ gcc/cp/constexpr.c | 33 +++++++++++++ gcc/cp/decl.c | 32 ++++++++++++ gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++ gcc/params.opt | 16 ++++++ gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++ gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++ gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++ gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++ gcc/testsuite/g++.target/arm/interference.C | 9 ++++ gcc/testsuite/g++.target/i386/interference.C | 8 +++ libstdc++-v3/include/std/version | 3 ++ libstdc++-v3/libsupc++/new | 10 +++- 17 files changed, 279 insertions(+), 2 deletions(-) diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index 48cbefd8bf8..ce88e707127 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile) builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL", targetm.atomic_test_and_set_trueval); + /* Macros for C++17 hardware interference size constants. Either both or + neither should be set. */ + gcc_assert (!param_destruct_interfere_size + == !param_construct_interfere_size); + if (param_destruct_interfere_size) + { + /* FIXME The way of communicating these values to the library should be + part of the C++ ABI, whether macro or builtin. 
*/ + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE", + param_destruct_interfere_size); + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE", + param_construct_interfere_size); + } + /* ptr_type_node can't be used here since ptr_mode is only set when toplev calls backend_init which is not done with -E or pch. */ psize = POINTER_SIZE_UNITS; diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index c5fe90003f2..9c151d19870 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -722,6 +722,11 @@ Winit-list-lifetime C++ ObjC++ Var(warn_init_list) Warning Init(1) Warn about uses of std::initializer_list that can result in dangling pointers. +Winterference-size +C++ ObjC++ Var(warn_interference_size) Warning Init(1) +Warn about nonsensical values of --param destructive-interference-size or +constructive-interference-size. + Wimplicit C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall) Warn about implicit declarations. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 30d9a0b7a3d..36519ccc5a5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l1_cache_line_size, aarch64_tune_params.prefetch->l1_cache_line_size); + + if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + } + else + { + /* For a generic AArch64 target, cover the current range of cache line + sizes. 
*/ + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + 256); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + 64); + } + if (aarch64_tune_params.prefetch->l2_cache_size >= 0) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l2_cache_size, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f1e628253d0..6c6e77fab66 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3669,6 +3669,28 @@ arm_option_override (void) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_line_size, current_tune->prefetch.l1_cache_line_size); + if (current_tune->prefetch.l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + } + else + { + /* For a generic ARM target, JF Bastien proposed using 64 for both. */ + /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for + constructive? */ + /* More recent Cortex chips have a 64-byte cache line, but are marked + ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. 
*/ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + } + if (current_tune->prefetch.l1_cache_size >= 0) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_size, diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 2cb87cedec0..c0006b3674b 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p, SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size, ix86_tune_cost->l2_cache_size); + /* 64B is the accepted value for these for all x86. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ if (opts->x_flag_prefetch_loop_arrays < 0 && HAVE_prefetch diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 7772fe62d95..0c2498aee22 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc) "%<constexpr%> function in C++20"); } +/* We're getting the constant value of DECL in a manifestly constant-evaluated + context; maybe complain about that. 
*/ + +static void +maybe_warn_about_constant_value (location_t loc, tree decl) +{ + static bool explained = false; + if (cxx_dialect >= cxx17 + && warn_interference_size + && !global_options_set.x_param_destruct_interfere_size + && DECL_CONTEXT (decl) == std_node + && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size") + && (LOCATION_FILE (input_location) != main_input_filename + || module_exporting_p ()) + && warning_at (loc, OPT_Winterference_size, "use of %qD", decl) + && !explained) + { + explained = true; + inform (loc, "its value can vary between compiler versions or " + "with different %<-mtune%> or %<-mcpu%> flags"); + inform (loc, "if this use is part of a public ABI, change it to " + "instead use a constant variable you define"); + inform (loc, "the default value for the current CPU tuning " + "is %d bytes", param_destruct_interfere_size); + inform (loc, "you can stabilize this value with %<--param " + "hardware_destructive_interference_size=%d%>, or disable " + "this warning with %<-Wno-interference-size%>", + param_destruct_interfere_size); + } +} + /* Attempt to reduce the expression T to a constant value. On failure, issue diagnostic and return error_mark_node. */ /* FIXME unify with c_fully_fold */ @@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, r = *p; break; } + if (ctx->manifestly_const_eval) + maybe_warn_about_constant_value (loc, t); if (COMPLETE_TYPE_P (TREE_TYPE (t)) && is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false)) { diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index bce62ad202a..c2065027369 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void) /* Show we use EH for cleanups. */ if (flag_exceptions) using_eh_for_cleanups (); + + /* Check that the hardware interference sizes are at least + alignof(max_align_t), as required by the standard. 
*/ + const int max_align = max_align_t_align () / BITS_PER_UNIT; + if (param_destruct_interfere_size) + { + if (param_destruct_interfere_size < max_align) + error ("%<--param destructive-interference-size=%d%> is less than " + "%d", param_destruct_interfere_size, max_align); + else if (param_destruct_interfere_size < param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param destructive-interference-size=%d%> " + "is less than %<--param l1-cache-line-size=%d%>", + param_destruct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_destruct_interfere_size = param_l1_cache_line_size; + /* else leave it unset. */ + + if (param_construct_interfere_size) + { + if (param_construct_interfere_size < max_align) + error ("%<--param constructive-interference-size=%d%> is less than " + "%d", param_construct_interfere_size, max_align); + else if (param_construct_interfere_size > param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param constructive-interference-size=%d%> " + "is greater than %<--param l1-cache-line-size=%d%>", + param_construct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_construct_interfere_size = param_l1_cache_line_size; } /* Enter an abi node in global-module context. returns a cookie to diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 23cc68f92b5..78cfc100ac2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by @option{-Winline} to appear or disappear. +@item -Winterference-size +@opindex Winterference-size +Warn about use of C++17 @code{std::hardware_destructive_interference_size} +without specifying its value with @option{--param destructive-interference-size}. +Also warn about questionable values for that option. 
+ +This variable is intended to be used for controlling class layout, to +avoid false sharing in concurrent code: + +@smallexample +struct independent_fields @{ + alignas(std::hardware_destructive_interference_size) std::atomic<int> one; + alignas(std::hardware_destructive_interference_size) std::atomic<int> two; +@}; +@end smallexample + +Here @samp{one} and @samp{two} are intended to be far enough apart +that stores to one won't require accesses to the other to reload the +cache line. + +By default, @option{--param destructive-interference-size} and +@option{--param constructive-interference-size} are set based on the +current @option{-mtune} option, typically to the L1 cache line size +for the particular target CPU, sometimes to a range if tuning for a +generic target. So all translation units that depend on ABI +compatibility for the use of these variables must be compiled with +the same @option{-mtune} (or @option{-mcpu}). + +If ABI stability is important, such as if the use is in a header for a +library, you should probably not use the hardware interference size +variables at all. Alternatively, you can force a particular value +with @option{--param}. + +If you are confident that your use of the variable does not affect ABI +outside a single build of your project, you can turn off the warning +with @option{-Wno-interference-size}. + @item -Wint-in-bool-context @opindex Wint-in-bool-context @opindex Wno-int-in-bool-context @@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride. This setting is only useful for strides that are known and constant. +@item destructive-interference-size +@item constructive-interference-size +The values for the C++17 variables +@code{std::hardware_destructive_interference_size} and +@code{std::hardware_constructive_interference_size}. 
The destructive +interference size is the minimum recommended offset between two +independent concurrently-accessed objects; the constructive +interference size is the maximum recommended size of contiguous memory +accessed together. Typically both will be the size of an L1 cache +line for the target, in bytes. For a generic target covering a range of L1 +cache line sizes, typically the constructive interference size will be +the small end of the range and the destructive size will be the large +end. + +The destructive interference size is intended to be used for layout, +and thus has ABI impact. The default value is not expected to be +stable, and on some targets varies with @option{-mtune}, so use of +this variable in a context where ABI stability is important, such as +the public interface of a library, is strongly discouraged; if it is +used in that context, users can stabilize the value using this +option. + +The constructive interference size is less sensitive, as it is +typically only used in a @samp{static_assert} to make sure that a type +fits within a cache line. + +See also @option{-Winterference-size}. + @item loop-interchange-max-num-stmts The maximum number of stmts in a loop to be interchanged. diff --git a/gcc/params.opt b/gcc/params.opt index 3a701e22c46..658ca028851 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent) Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization The size of L1 cache line. +-param=destructive-interference-size= +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization +The minimum recommended offset between two concurrently-accessed objects to +avoid additional performance degradation due to contention introduced by the +implementation. Typically the L1 cache line size, but can be larger to +accommodate a variety of target processors with different cache line sizes. 
+C++17 code might use this value in structure layout, but is strongly +discouraged from doing so in public ABIs. + +-param=constructive-interference-size= +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization +The maximum recommended size of contiguous memory occupied by two objects +accessed with temporal locality by concurrent threads. Typically the L1 cache +line size, but can be smaller to accommodate a variety of target processors with +different cache line sizes. + -param=l1-cache-size= Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization The size of L1 cache. diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C new file mode 100644 index 00000000000..2af75c63f83 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C @@ -0,0 +1,14 @@ +// { dg-do compile { target c++20 } } +// { dg-additional-options -fmodules-ts } + +module ; + +#include <new> + +export module foo; + +export { + struct A { + alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size } + }; +} diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C new file mode 100644 index 00000000000..57c001bc032 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.C @@ -0,0 +1,6 @@ +// Test that we warn about use of std::hardware_destructive_interference_size +// in a header. 
+// { dg-do compile { target c++17 } } + +// { dg-warning Winterference-size "" { target *-*-* } 0 } +#include "Winterference.H" diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H new file mode 100644 index 00000000000..36f0ad5f6d1 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.H @@ -0,0 +1,7 @@ +#include <new> + +struct A +{ + alignas(std::hardware_destructive_interference_size) int i; + alignas(std::hardware_destructive_interference_size) int j; +}; diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C new file mode 100644 index 00000000000..0fc01655223 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use +// 128 or even 256. +static_assert(std::hardware_destructive_interference_size == 256); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C new file mode 100644 index 00000000000..34fe8a52bff --- /dev/null +++ b/gcc/testsuite/g++.target/arm/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Recent ARM CPUs have a cache line size of 64. Older ones have +// a size of 32, but I guess they're old enough that we don't care? 
+static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C new file mode 100644 index 00000000000..c7b910e3ada --- /dev/null +++ b/gcc/testsuite/g++.target/i386/interference.C @@ -0,0 +1,8 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// It is generally agreed that these are the right values for all x86. +static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version index f950bf0f0db..f41004b5911 100644 --- a/libstdc++-v3/include/std/version +++ b/libstdc++-v3/include/std/version @@ -140,6 +140,9 @@ #define __cpp_lib_filesystem 201703 #define __cpp_lib_gcd 201606 #define __cpp_lib_gcd_lcm 201606 +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L +#endif #define __cpp_lib_hypot 201603 #define __cpp_lib_invoke 201411L #define __cpp_lib_lcm 201606 diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new index 3349b13fd1b..7bc67a6cb02 100644 --- a/libstdc++-v3/libsupc++/new +++ b/libstdc++-v3/libsupc++/new @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { } } // extern "C++" #if __cplusplus >= 201703L -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER namespace std { +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER #define __cpp_lib_launder 201606 /// Pointer optimization barrier [ptr.launder] template<typename _Tp> @@ -205,8 +205,14 @@ namespace std void launder(const void*) = delete; void launder(volatile void*) = delete; void launder(const volatile void*) = delete; -} #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER + +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L + inline 
constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE; + inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE; +#endif // __GCC_DESTRUCTIVE_SIZE +} #endif // C++17 #if __cplusplus > 201703L </cut>
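The warning text in the patch suggests that code whose layout is part of a public ABI should "instead use a constant variable you define". A minimal sketch of that advice is below, reusing the shape of the independent_fields example from invoke.texi. The name kCacheLinePad and the value 64 are assumptions chosen for illustration; a project would pick the value that suits its own targets (the patch itself uses 256 for generic AArch64).

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical project-defined constant (not part of the patch).  Because it
// is fixed in the source, the struct layout below does not change with
// -mtune, -mcpu, or the compiler version.  The value 64 is an assumption.
inline constexpr std::size_t kCacheLinePad = 64;

// Same shape as the invoke.texi example: each counter starts on its own
// kCacheLinePad-aligned boundary, so stores to one do not share a line
// with the other on CPUs whose cache line is at most kCacheLinePad bytes.
struct IndependentFields {
    alignas(kCacheLinePad) std::atomic<int> one{0};
    alignas(kCacheLinePad) std::atomic<int> two{0};
};

// Layout checks: these hold for any power-of-two kCacheLinePad that is at
// least sizeof(std::atomic<int>).
static_assert(alignof(IndependentFields) == kCacheLinePad,
              "struct alignment follows the padded members");
static_assert(sizeof(IndependentFields) == 2 * kCacheLinePad,
              "each member occupies its own padded slot");
static_assert(offsetof(IndependentFields, two) -
                      offsetof(IndependentFields, one) >= kCacheLinePad,
              "members are at least one pad apart");
```

The trade-off is the one the commit message describes: a fixed constant gives a stable ABI but may be too small for some CPUs, whereas the library constant tracks the tuning target at the cost of layout stability.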

4 years, 6 months

[TCWG CI] Regression caused by gcc:76b75018b3d053a890ebe155e47814de14b3c9fb

by ci_notify＠linaro.org

Identified regression caused by *gcc:76b75018b3d053a890ebe155e47814de14b3c9fb*:
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>

    c++: implement C++17 hardware interference size

Results regressed to (for first_bad == 76b75018b3d053a890ebe155e47814de14b3c9fb)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:

from (for last_good == 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85)
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe gcc:
2
# build_abe linux:
4
# build_abe glibc:
5
# build_abe gdb:
6

This commit has regressed these CI configurations:
 - tcwg_gnu_native_build/master-arm

Artifacts of last_good build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…
Even more details: https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac…

Reproduce builds:
<cut>
mkdir investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb
cd investigate-gcc-76b75018b3d053a890ebe155e47814de14b3c9fb

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gnu_native_build-bisect-master-arm/2/artifac… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 76b75018b3d053a890ebe155e47814de14b3c9fb
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 8ea292591e42aa4d52b4b7a00b86335bfd2e2e85
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 76b75018b3d053a890ebe155e47814de14b3c9fb
Author: Jason Merrill <jason(a)redhat.com>
Date: Thu Jul 15 15:30:17 2021 -0400

c++: implement C++17 hardware interference size

The last missing piece of the C++17 standard library is the hardware
interference size constants. Much of the delay in implementing these has
been due to uncertainty about what the right values are, and even whether
there is a single constant value that is suitable; the destructive
interference size is intended to be used in structure layout, so program
ABIs will depend on it.

In principle, both of these values should be the same as the target's L1
cache line size. When compiling for a generic target that is intended to
support a range of target CPUs with different cache line sizes, the
constructive size should probably be the minimum size, and the destructive
size the maximum, unless you are constrained by ABI compatibility with
previous code.

From discussion on gcc-patches, I've come to the conclusion that the
solution to the difficulty of choosing stable values is to give up on it,
and instead encourage only uses where ABI stability is unimportant: in
particular, uses where the ABI is shared at most between translation units
built at the same time with the same flags.

To that end, I've added a warning for any use of the constant value of
std::hardware_destructive_interference_size in a header or module export.
Appropriate uses within a project can disable the warning.

A previous iteration of this patch included an -finterference-tune flag to
make the value vary with -mtune; this iteration makes that the default
behavior, which should be appropriate for all reasonable uses of the
variable.
The previous default of "stable-ish" seems to me likely to have
been more of an attractive nuisance; since we can't promise actual
stability, we should instead make proper uses more convenient.

JF Bastien's implementation proposal is summarized at
https://github.com/itanium-cxx-abi/cxx-abi/issues/74

I implement this by adding new --params for the two sizes. Targets can
override these values in targetm.target_option.override() to support a range
of values for the generic target; otherwise, both will default to the L1
cache line size.

64 bytes still seems correct for all x86.

I'm not sure why he proposed 64/64 for generic 32-bit ARM, since the Cortex
A9 has a 32-byte cache line, so I'd think 32/64 would make more sense.

He proposed 64/128 for generic AArch64, but since the A64FX now has a 256B
cache line, I've changed that to 64/256.

Other arch maintainers are invited to set ranges for their generic targets
if that seems better than using the default cache line size for both values.

With the above choice to reject stability as a goal, getting these values
"right" is now just a matter of what we want the default optimization to be,
and we can feel free to adjust them as CPUs with different cache lines
become more and less common.

gcc/ChangeLog:

* params.opt: Add destructive-interference-size and
  constructive-interference-size.
* doc/invoke.texi: Document them.
* config/aarch64/aarch64.c (aarch64_override_options_internal): Set them.
* config/arm/arm.c (arm_option_override): Set them.
* config/i386/i386-options.c (ix86_option_override_internal): Set them.

gcc/c-family/ChangeLog:

* c.opt: Add -Winterference-size.
* c-cppbuiltin.c (cpp_atomic_builtins): Add __GCC_DESTRUCTIVE_SIZE
  and __GCC_CONSTRUCTIVE_SIZE.

gcc/cp/ChangeLog:

* constexpr.c (maybe_warn_about_constant_value): Complain about
  std::hardware_destructive_interference_size.
  (cxx_eval_constant_expression): Call it.
* decl.c (cxx_init_decl_processing): Check
  --param *-interference-size values.

libstdc++-v3/ChangeLog: * include/std/version: Define __cpp_lib_hardware_interference_size. * libsupc++/new: Define hardware interference size variables. gcc/testsuite/ChangeLog: * g++.dg/warn/Winterference.H: New file. * g++.dg/warn/Winterference.C: New test. * g++.target/aarch64/interference.C: New test. * g++.target/arm/interference.C: New test. * g++.target/i386/interference.C: New test. --- gcc/c-family/c-cppbuiltin.c | 14 ++++++ gcc/c-family/c.opt | 5 ++ gcc/config/aarch64/aarch64.c | 22 +++++++++ gcc/config/arm/arm.c | 22 +++++++++ gcc/config/i386/i386-options.c | 6 +++ gcc/cp/constexpr.c | 33 +++++++++++++ gcc/cp/decl.c | 32 ++++++++++++ gcc/doc/invoke.texi | 65 +++++++++++++++++++++++++ gcc/params.opt | 16 ++++++ gcc/testsuite/g++.dg/warn/Winterference-2.C | 14 ++++++ gcc/testsuite/g++.dg/warn/Winterference.C | 6 +++ gcc/testsuite/g++.dg/warn/Winterference.H | 7 +++ gcc/testsuite/g++.target/aarch64/interference.C | 9 ++++ gcc/testsuite/g++.target/arm/interference.C | 9 ++++ gcc/testsuite/g++.target/i386/interference.C | 8 +++ libstdc++-v3/include/std/version | 3 ++ libstdc++-v3/libsupc++/new | 10 +++- 17 files changed, 279 insertions(+), 2 deletions(-) diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index 48cbefd8bf8..ce88e707127 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -741,6 +741,20 @@ cpp_atomic_builtins (cpp_reader *pfile) builtin_define_with_int_value ("__GCC_ATOMIC_TEST_AND_SET_TRUEVAL", targetm.atomic_test_and_set_trueval); + /* Macros for C++17 hardware interference size constants. Either both or + neither should be set. */ + gcc_assert (!param_destruct_interfere_size + == !param_construct_interfere_size); + if (param_destruct_interfere_size) + { + /* FIXME The way of communicating these values to the library should be + part of the C++ ABI, whether macro or builtin. 
*/ + builtin_define_with_int_value ("__GCC_DESTRUCTIVE_SIZE", + param_destruct_interfere_size); + builtin_define_with_int_value ("__GCC_CONSTRUCTIVE_SIZE", + param_construct_interfere_size); + } + /* ptr_type_node can't be used here since ptr_mode is only set when toplev calls backend_init which is not done with -E or pch. */ psize = POINTER_SIZE_UNITS; diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index c5fe90003f2..9c151d19870 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -722,6 +722,11 @@ Winit-list-lifetime C++ ObjC++ Var(warn_init_list) Warning Init(1) Warn about uses of std::initializer_list that can result in dangling pointers. +Winterference-size +C++ ObjC++ Var(warn_interference_size) Warning Init(1) +Warn about nonsensical values of --param destructive-interference-size or +constructive-interference-size. + Wimplicit C ObjC Var(warn_implicit) Warning LangEnabledBy(C ObjC,Wall) Warn about implicit declarations. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 30d9a0b7a3d..36519ccc5a5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16540,6 +16540,28 @@ aarch64_override_options_internal (struct gcc_options *opts) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l1_cache_line_size, aarch64_tune_params.prefetch->l1_cache_line_size); + + if (aarch64_tune_params.prefetch->l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + aarch64_tune_params.prefetch->l1_cache_line_size); + } + else + { + /* For a generic AArch64 target, cover the current range of cache line + sizes. 
*/ + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_destruct_interfere_size, + 256); + SET_OPTION_IF_UNSET (opts, &global_options_set, + param_construct_interfere_size, + 64); + } + if (aarch64_tune_params.prefetch->l2_cache_size >= 0) SET_OPTION_IF_UNSET (opts, &global_options_set, param_l2_cache_size, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f1e628253d0..6c6e77fab66 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3669,6 +3669,28 @@ arm_option_override (void) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_line_size, current_tune->prefetch.l1_cache_line_size); + if (current_tune->prefetch.l1_cache_line_size >= 0) + { + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, + current_tune->prefetch.l1_cache_line_size); + } + else + { + /* For a generic ARM target, JF Bastien proposed using 64 for both. */ + /* ??? Cortex A9 has a 32-byte cache line, so why not 32 for + constructive? */ + /* More recent Cortex chips have a 64-byte cache line, but are marked + ARM_PREFETCH_NOT_BENEFICIAL, so they get these defaults. 
*/ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + } + if (current_tune->prefetch.l1_cache_size >= 0) SET_OPTION_IF_UNSET (&global_options, &global_options_set, param_l1_cache_size, diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 2cb87cedec0..c0006b3674b 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -2579,6 +2579,12 @@ ix86_option_override_internal (bool main_args_p, SET_OPTION_IF_UNSET (opts, opts_set, param_l2_cache_size, ix86_tune_cost->l2_cache_size); + /* 64B is the accepted value for these for all x86. */ + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_destruct_interfere_size, 64); + SET_OPTION_IF_UNSET (&global_options, &global_options_set, + param_construct_interfere_size, 64); + /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ if (opts->x_flag_prefetch_loop_arrays < 0 && HAVE_prefetch diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 7772fe62d95..0c2498aee22 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -6075,6 +6075,37 @@ inline_asm_in_constexpr_error (location_t loc) "%<constexpr%> function in C++20"); } +/* We're getting the constant value of DECL in a manifestly constant-evaluated + context; maybe complain about that. 
*/ + +static void +maybe_warn_about_constant_value (location_t loc, tree decl) +{ + static bool explained = false; + if (cxx_dialect >= cxx17 + && warn_interference_size + && !global_options_set.x_param_destruct_interfere_size + && DECL_CONTEXT (decl) == std_node + && id_equal (DECL_NAME (decl), "hardware_destructive_interference_size") + && (LOCATION_FILE (input_location) != main_input_filename + || module_exporting_p ()) + && warning_at (loc, OPT_Winterference_size, "use of %qD", decl) + && !explained) + { + explained = true; + inform (loc, "its value can vary between compiler versions or " + "with different %<-mtune%> or %<-mcpu%> flags"); + inform (loc, "if this use is part of a public ABI, change it to " + "instead use a constant variable you define"); + inform (loc, "the default value for the current CPU tuning " + "is %d bytes", param_destruct_interfere_size); + inform (loc, "you can stabilize this value with %<--param " + "hardware_destructive_interference_size=%d%>, or disable " + "this warning with %<-Wno-interference-size%>", + param_destruct_interfere_size); + } +} + /* Attempt to reduce the expression T to a constant value. On failure, issue diagnostic and return error_mark_node. */ /* FIXME unify with c_fully_fold */ @@ -6219,6 +6250,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, r = *p; break; } + if (ctx->manifestly_const_eval) + maybe_warn_about_constant_value (loc, t); if (COMPLETE_TYPE_P (TREE_TYPE (t)) && is_really_empty_class (TREE_TYPE (t), /*ignore_vptr*/false)) { diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index bce62ad202a..c2065027369 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -4752,6 +4752,38 @@ cxx_init_decl_processing (void) /* Show we use EH for cleanups. */ if (flag_exceptions) using_eh_for_cleanups (); + + /* Check that the hardware interference sizes are at least + alignof(max_align_t), as required by the standard. 
*/ + const int max_align = max_align_t_align () / BITS_PER_UNIT; + if (param_destruct_interfere_size) + { + if (param_destruct_interfere_size < max_align) + error ("%<--param destructive-interference-size=%d%> is less than " + "%d", param_destruct_interfere_size, max_align); + else if (param_destruct_interfere_size < param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param destructive-interference-size=%d%> " + "is less than %<--param l1-cache-line-size=%d%>", + param_destruct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_destruct_interfere_size = param_l1_cache_line_size; + /* else leave it unset. */ + + if (param_construct_interfere_size) + { + if (param_construct_interfere_size < max_align) + error ("%<--param constructive-interference-size=%d%> is less than " + "%d", param_construct_interfere_size, max_align); + else if (param_construct_interfere_size > param_l1_cache_line_size) + warning (OPT_Winterference_size, + "%<--param constructive-interference-size=%d%> " + "is greater than %<--param l1-cache-line-size=%d%>", + param_construct_interfere_size, param_l1_cache_line_size); + } + else if (param_l1_cache_line_size >= max_align) + param_construct_interfere_size = param_l1_cache_line_size; } /* Enter an abi node in global-module context. returns a cookie to diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 23cc68f92b5..78cfc100ac2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -9018,6 +9018,43 @@ that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by @option{-Winline} to appear or disappear. +@item -Winterference-size +@opindex Winterference-size +Warn about use of C++17 @code{std::hardware_destructive_interference_size} +without specifying its value with @option{--param destructive-interference-size}. +Also warn about questionable values for that option. 
+ +This variable is intended to be used for controlling class layout, to +avoid false sharing in concurrent code: + +@smallexample +struct independent_fields @{ + alignas(std::hardware_destructive_interference_size) std::atomic<int> one; + alignas(std::hardware_destructive_interference_size) std::atomic<int> two; +@}; +@end smallexample + +Here @samp{one} and @samp{two} are intended to be far enough apart +that stores to one won't require accesses to the other to reload the +cache line. + +By default, @option{--param destructive-interference-size} and +@option{--param constructive-interference-size} are set based on the +current @option{-mtune} option, typically to the L1 cache line size +for the particular target CPU, sometimes to a range if tuning for a +generic target. So all translation units that depend on ABI +compatibility for the use of these variables must be compiled with +the same @option{-mtune} (or @option{-mcpu}). + +If ABI stability is important, such as if the use is in a header for a +library, you should probably not use the hardware interference size +variables at all. Alternatively, you can force a particular value +with @option{--param}. + +If you are confident that your use of the variable does not affect ABI +outside a single build of your project, you can turn off the warning +with @option{-Wno-interference-size}. + @item -Wint-in-bool-context @opindex Wint-in-bool-context @opindex Wno-int-in-bool-context @@ -13938,6 +13975,34 @@ prefetch hints can be issued for any constant stride. This setting is only useful for strides that are known and constant. +@item destructive-interference-size +@item constructive-interference-size +The values for the C++17 variables +@code{std::hardware_destructive_interference_size} and +@code{std::hardware_constructive_interference_size}. 
The destructive +interference size is the minimum recommended offset between two +independent concurrently-accessed objects; the constructive +interference size is the maximum recommended size of contiguous memory +accessed together. Typically both will be the size of an L1 cache +line for the target, in bytes. For a generic target covering a range of L1 +cache line sizes, typically the constructive interference size will be +the small end of the range and the destructive size will be the large +end. + +The destructive interference size is intended to be used for layout, +and thus has ABI impact. The default value is not expected to be +stable, and on some targets varies with @option{-mtune}, so use of +this variable in a context where ABI stability is important, such as +the public interface of a library, is strongly discouraged; if it is +used in that context, users can stabilize the value using this +option. + +The constructive interference size is less sensitive, as it is +typically only used in a @samp{static_assert} to make sure that a type +fits within a cache line. + +See also @option{-Winterference-size}. + @item loop-interchange-max-num-stmts The maximum number of stmts in a loop to be interchanged. diff --git a/gcc/params.opt b/gcc/params.opt index 3a701e22c46..658ca028851 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -361,6 +361,22 @@ The maximum code size growth ratio when expanding into a jump table (in percent) Common Joined UInteger Var(param_l1_cache_line_size) Init(32) Param Optimization The size of L1 cache line. +-param=destructive-interference-size= +Common Joined UInteger Var(param_destruct_interfere_size) Init(0) Param Optimization +The minimum recommended offset between two concurrently-accessed objects to +avoid additional performance degradation due to contention introduced by the +implementation. Typically the L1 cache line size, but can be larger to +accommodate a variety of target processors with different cache line sizes. 
+C++17 code might use this value in structure layout, but is strongly +discouraged from doing so in public ABIs. + +-param=constructive-interference-size= +Common Joined UInteger Var(param_construct_interfere_size) Init(0) Param Optimization +The maximum recommended size of contiguous memory occupied by two objects +accessed with temporal locality by concurrent threads. Typically the L1 cache +line size, but can be smaller to accommodate a variety of target processors with +different cache line sizes. + -param=l1-cache-size= Common Joined UInteger Var(param_l1_cache_size) Init(64) Param Optimization The size of L1 cache. diff --git a/gcc/testsuite/g++.dg/warn/Winterference-2.C b/gcc/testsuite/g++.dg/warn/Winterference-2.C new file mode 100644 index 00000000000..2af75c63f83 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference-2.C @@ -0,0 +1,14 @@ +// { dg-do compile { target c++20 } } +// { dg-additional-options -fmodules-ts } + +module ; + +#include <new> + +export module foo; + +export { + struct A { + alignas(std::hardware_destructive_interference_size) int x; // { dg-warning Winterference-size } + }; +} diff --git a/gcc/testsuite/g++.dg/warn/Winterference.C b/gcc/testsuite/g++.dg/warn/Winterference.C new file mode 100644 index 00000000000..57c001bc032 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.C @@ -0,0 +1,6 @@ +// Test that we warn about use of std::hardware_destructive_interference_size +// in a header. 
+// { dg-do compile { target c++17 } } + +// { dg-warning Winterference-size "" { target *-*-* } 0 } +#include "Winterference.H" diff --git a/gcc/testsuite/g++.dg/warn/Winterference.H b/gcc/testsuite/g++.dg/warn/Winterference.H new file mode 100644 index 00000000000..36f0ad5f6d1 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Winterference.H @@ -0,0 +1,7 @@ +#include <new> + +struct A +{ + alignas(std::hardware_destructive_interference_size) int i; + alignas(std::hardware_destructive_interference_size) int j; +}; diff --git a/gcc/testsuite/g++.target/aarch64/interference.C b/gcc/testsuite/g++.target/aarch64/interference.C new file mode 100644 index 00000000000..0fc01655223 --- /dev/null +++ b/gcc/testsuite/g++.target/aarch64/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Most AArch64 CPUs have an L1 cache line size of 64, but some recent ones use +// 128 or even 256. +static_assert(std::hardware_destructive_interference_size == 256); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/arm/interference.C b/gcc/testsuite/g++.target/arm/interference.C new file mode 100644 index 00000000000..34fe8a52bff --- /dev/null +++ b/gcc/testsuite/g++.target/arm/interference.C @@ -0,0 +1,9 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// Recent ARM CPUs have a cache line size of 64. Older ones have +// a size of 32, but I guess they're old enough that we don't care? 
+static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/gcc/testsuite/g++.target/i386/interference.C b/gcc/testsuite/g++.target/i386/interference.C new file mode 100644 index 00000000000..c7b910e3ada --- /dev/null +++ b/gcc/testsuite/g++.target/i386/interference.C @@ -0,0 +1,8 @@ +// Test C++17 hardware interference size constants +// { dg-do compile { target c++17 } } + +#include <new> + +// It is generally agreed that these are the right values for all x86. +static_assert(std::hardware_destructive_interference_size == 64); +static_assert(std::hardware_constructive_interference_size == 64); diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version index f950bf0f0db..f41004b5911 100644 --- a/libstdc++-v3/include/std/version +++ b/libstdc++-v3/include/std/version @@ -140,6 +140,9 @@ #define __cpp_lib_filesystem 201703 #define __cpp_lib_gcd 201606 #define __cpp_lib_gcd_lcm 201606 +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L +#endif #define __cpp_lib_hypot 201603 #define __cpp_lib_invoke 201411L #define __cpp_lib_lcm 201606 diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new index 3349b13fd1b..7bc67a6cb02 100644 --- a/libstdc++-v3/libsupc++/new +++ b/libstdc++-v3/libsupc++/new @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { } } // extern "C++" #if __cplusplus >= 201703L -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER namespace std { +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER #define __cpp_lib_launder 201606 /// Pointer optimization barrier [ptr.launder] template<typename _Tp> @@ -205,8 +205,14 @@ namespace std void launder(const void*) = delete; void launder(volatile void*) = delete; void launder(const volatile void*) = delete; -} #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER + +#ifdef __GCC_DESTRUCTIVE_SIZE +# define __cpp_lib_hardware_interference_size 201703L + inline 
constexpr size_t hardware_destructive_interference_size = __GCC_DESTRUCTIVE_SIZE; + inline constexpr size_t hardware_constructive_interference_size = __GCC_CONSTRUCTIVE_SIZE; +#endif // __GCC_DESTRUCTIVE_SIZE +} #endif // C++17 #if __cplusplus > 201703L </cut>
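The commit's diagnostic suggests that code whose layout is part of a public ABI should "instead use a constant variable you define". A minimal sketch of that advice follows. The names kCacheLineAlign, IndependentCounters, tuned_destructive_size, field_distance and check_layout are hypothetical and not part of the patch, and the value 64 is only an assumed example, not a recommendation for any particular target.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical project-local constant (assumed value). Pinning it keeps the
// struct layout below independent of -mtune / -mcpu.
inline constexpr std::size_t kCacheLineAlign = 64;

// Same idea as the independent_fields example in the new invoke.texi text,
// but aligned with the project-local constant instead of the std:: variable.
struct IndependentCounters {
    alignas(kCacheLineAlign) std::atomic<int> one{0};
    alignas(kCacheLineAlign) std::atomic<int> two{0};
};

static_assert(alignof(IndependentCounters) == kCacheLineAlign);
static_assert(sizeof(IndependentCounters) >= 2 * kCacheLineAlign);

// Returns the library's tuning-dependent value when the feature-test macro
// says it exists, otherwise falls back to the project-local constant.
constexpr std::size_t tuned_destructive_size() {
#ifdef __cpp_lib_hardware_interference_size
    return std::hardware_destructive_interference_size;
#else
    return kCacheLineAlign;
#endif
}

// Byte distance between the two fields of one object.
inline std::ptrdiff_t field_distance(const IndependentCounters& c) {
    return reinterpret_cast<const char*>(&c.two) -
           reinterpret_cast<const char*>(&c.one);
}

// Runtime check that the two fields sit at least one assumed line apart.
inline void check_layout() {
    IndependentCounters c;
    assert(field_distance(c) >= static_cast<std::ptrdiff_t>(kCacheLineAlign));
}
```

With this arrangement the tuning-dependent value can still be consulted inside a single build (for example, to log a mismatch with the pinned constant), while the layout itself stays fixed. Whether 64 is the right pinned value is a per-project decision.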

4 years, 6 months
