Progress (short week, 3 days)
 * UM-2 [QEMU upstream maintainership]
   + more code review, notably the Apple Silicon hvf support, which is
     nearly ready to go in
 * QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
   + Sent out v2 of the "optimized code gen for MVE" patchset;
     this now covers all the insns that have an easy optimized version.
   + Fixed a bug where we weren't correctly setting up FPSCR.LTPSIZE
     when using QEMU's user-mode-only emulator
   + Wrote some code to add support for the (not yet finalized) gdbstub
     XML that tells GDB that the guest CPU has MVE. This causes a GDB
     with the MVE handling to crash, so one or the other of us has
     got something wrong :-)
KVM Forum was this week, as a 2-day virtual conference. I felt the
programme was a bit smaller than usual this year, but there were some
interesting talks. There was also a BoF session on whether/how we should
consider adding Rust code to QEMU. I am pushing for two things: (1) a
clearer medium-to-long-term vision of where we would be going and why
we'd be doing this, and (2) more design-sketch type work of "what would
XYZ in Rust look like". This would hopefully both (a) make the benefit,
or lack thereof, a bit clearer and (b) demonstrate that there are enough
people enthusiastic enough about the prospect to make it a success...
-- PMM
After llvm commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>
[libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.
the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%):
- 447.dealII,[.] contract<3> grew in size by 164%
Benchmark:
Toolchain: Clang + Glibc + LLVM Linker
Version: all components were built from their latest release branch
Target: aarch64-linux-gnu
Compiler flags: -Oz
Hardware: APM Mustang 8x X-Gene1
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Oz
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
cd investigate-llvm-1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c8905f1bb304f1cfe297312ae0dda9946cb27594
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 1c3fcc8ae92ebfe9a9d1a21a288ad71ef7f98091
Author: Amy Kwan <amy.kwan1(a)ibm.com>
Date: Fri Sep 3 14:53:57 2021 -0400
[libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.
It appears when testing LLVM 13 on Power, we run into failures with the
`libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp` test case optimizing
values out.
Despite some the functions in the test already being marked with optnone,
adding the `MarkAsLive()` calls inside of the pretty printer comparison functions
resolves the issues of the values being optimized out.
This patch aims to address https://llvm.org/PR51675.
Differential Revision: https://reviews.llvm.org/D109204
(cherry picked from commit 217c6d643124be312f4a99b203118744edb9d54c)
---
libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
index 2d8e9620089a..7c8d307d19fb 100644
--- a/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
+++ b/libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp
@@ -92,24 +92,28 @@ void MarkAsLive(Type &&) {}
template <typename TypeToPrint> void ComparePrettyPrintToChars(
TypeToPrint value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
template <typename TypeToPrint> void ComparePrettyPrintToRegex(
TypeToPrint value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
void CompareExpressionPrettyPrintToChars(
std::string value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
void CompareExpressionPrettyPrintToRegex(
std::string value,
const char *expectation) {
+ MarkAsLive(value);
StopForDebugger(&value, &expectation);
}
</cut>
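The pattern used by the patch above can be shown in isolation. This is a minimal sketch, not the libc++ test itself: MarkAsLive mirrors the empty template visible in the hunk header, while the body of StopForDebugger and the stop_count counter are assumptions added so the example is self-contained. Whether the extra call actually keeps a value visible to a debugger depends on the compiler and optimization level, so the sketch only illustrates the shape of the change.

```cpp
#include <string>

// Empty function taking a forwarding reference, as in the test file.
// Passing a value here gives the compiler a use of that value.
template <typename Type> void MarkAsLive(Type &&) {}

// Hypothetical stand-in for the breakpoint location used by the test;
// the counter exists only so this sketch has observable behaviour.
static int stop_count = 0;
void StopForDebugger(void *, void *) { ++stop_count; }

// Same structure as the patched comparison helpers: mark the value as
// live first, then stop so a debugger can inspect it.
template <typename TypeToPrint>
void ComparePrettyPrintToChars(TypeToPrint value, const char *expectation) {
  MarkAsLive(value);
  StopForDebugger(&value, &expectation);
}
```

Calling ComparePrettyPrintToChars(std::string("abc"), "\"abc\"") in this sketch reaches the stop function once with the value marked as used.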
After gcc commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>
c++ ICE with nested requirement as default tpl parm[PR94827]
the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 4%
Benchmark:
Toolchain: GCC + Glibc + GNU Linker
Version: all components were built from their latest release branch
Target: aarch64-linux-gnu
Compiler flags: -O3 -flto
Hardware: NVidia TX1 4x Cortex-A57
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Reproduce builds:
<cut>
mkdir investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b
cd investigate-gcc-c416c52bcdb120db5e8c53a51bd78c4360daf79b
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach c416c52bcdb120db5e8c53a51bd78c4360daf79b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b1983f4582bbe060b7da83578acb9ed653681fc8
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit c416c52bcdb120db5e8c53a51bd78c4360daf79b
Author: Nathan Sidwell <nathan(a)acm.org>
Date: Thu Apr 30 08:23:16 2020 -0700
c++ ICE with nested requirement as default tpl parm[PR94827]
Template headers are not incrementally updated as we parse its parameters.
We maintain a dummy level until the closing > when we replace the dummy with
a real parameter set. requires processing was expecting a properly populated
arg_vec in current_template_parms, and then creates a self-mapping of parameters
from that. But we don't need to do that, just teach map_arguments to look at
TREE_VALUE when args is NULL.
* constraint.cc (map_arguments): If ARGS is null, it's a
self-mapping of parms.
(finish_nested_requirement): Do not pass argified
current_template_parms to normalization.
(tsubst_nested_requirement): Don't assert no template parms.
---
gcc/cp/ChangeLog | 10 ++++++++++
gcc/cp/constraint.cc | 27 ++++++++++++++++-----------
gcc/testsuite/g++.dg/concepts/pr94827.C | 15 +++++++++++++++
3 files changed, 41 insertions(+), 11 deletions(-)
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 1fa0e123cb1..3c57945cecf 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,13 @@
+2020-04-30 Jason Merrill <jason(a)redhat.com>
+ Nathan Sidwell <nathan(a)acm.org>
+
+ PR c++/94827
+ * constraint.cc (map_arguments): If ARGS is null, it's a
+ self-mapping of parms.
+ (finish_nested_requirement): Do not pass argified
+ current_template_parms to normalization.
+ (tsubst_nested_requirement): Don't assert no template parms.
+
2020-04-30 Iain Sandoe <iain(a)sandoe.co.uk>
PR c++/94886
diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 866b0f51b05..85513fecf43 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -546,12 +546,16 @@ static tree
map_arguments (tree parms, tree args)
{
for (tree p = parms; p; p = TREE_CHAIN (p))
- {
- int level;
- int index;
- template_parm_level_and_index (TREE_VALUE (p), &level, &index);
- TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
- }
+ if (args)
+ {
+ int level;
+ int index;
+ template_parm_level_and_index (TREE_VALUE (p), &level, &index);
+ TREE_PURPOSE (p) = TMPL_ARG (args, level, index);
+ }
+ else
+ TREE_PURPOSE (p) = TREE_VALUE (p);
+
return parms;
}
@@ -2005,8 +2009,6 @@ tsubst_compound_requirement (tree t, tree args, subst_info info)
static tree
tsubst_nested_requirement (tree t, tree args, subst_info info)
{
- gcc_assert (!uses_template_parms (args));
-
/* Ensure that we're in an evaluation context prior to satisfaction. */
tree norm = TREE_VALUE (TREE_TYPE (t));
tree result = satisfy_constraint (norm, args, info);
@@ -2953,12 +2955,15 @@ finish_compound_requirement (location_t loc, tree expr, tree type, bool noexcept
tree
finish_nested_requirement (location_t loc, tree expr)
{
+ /* Currently open template headers have dummy arg vectors, so don't
+ pass into normalization. */
+ tree norm = normalize_constraint_expression (expr, NULL_TREE, false);
+ tree args = current_template_parms
+ ? template_parms_to_args (current_template_parms) : NULL_TREE;
+
/* Save the normalized constraint and complete set of normalization
arguments with the requirement. We keep the complete set of arguments
around for re-normalization during diagnostics. */
- tree args = current_template_parms
- ? template_parms_to_args (current_template_parms) : NULL_TREE;
- tree norm = normalize_constraint_expression (expr, args, false);
tree info = build_tree_list (args, norm);
/* Build the constraint, saving its normalization as its type. */
diff --git a/gcc/testsuite/g++.dg/concepts/pr94827.C b/gcc/testsuite/g++.dg/concepts/pr94827.C
new file mode 100644
index 00000000000..f14ec2551a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/pr94827.C
@@ -0,0 +1,15 @@
+// PR 94287 ICE looking inside open template-parm level
+// { dg-do run { target c++17 } }
+// { dg-options -fconcepts }
+
+template <typename T,
+ bool X = requires { requires (sizeof(T)==1); } >
+ int foo(T) { return X; }
+
+int main() {
+ if (!foo('4'))
+ return 1;
+ if (foo (4))
+ return 2;
+ return 0;
+}
</cut>
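The map_arguments change above can be read as "when no argument vector is supplied, map each parameter to itself". The following toy model sketches that control flow only. ParmNode, map_arguments_sketch and the use of plain integers are assumptions for illustration; GCC's real code works on tree nodes through TREE_VALUE, TREE_PURPOSE and TMPL_ARG, which are not modelled here.

```cpp
#include <vector>

// Toy stand-in for one node of a parameter list: `value` plays the role
// of TREE_VALUE (here just a parameter index) and `purpose` the role of
// TREE_PURPOSE (the argument the parameter is mapped to).
struct ParmNode {
  int value;
  int purpose;
  ParmNode *chain;  // next node, like TREE_CHAIN
};

// If args is non-null, map each parameter to the argument at its index.
// If args is null, fall back to a self-mapping of the parameters.
ParmNode *map_arguments_sketch(ParmNode *parms, const std::vector<int> *args) {
  for (ParmNode *p = parms; p; p = p->chain) {
    if (args)
      p->purpose = args->at(p->value);
    else
      p->purpose = p->value;
  }
  return parms;
}
```

In this model a null args pointer corresponds to the open template header case described in the commit message, where no populated argument vector is available yet.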
After llvm commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>
Inform pass manager when child loops are deleted
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Reproduce builds:
<cut>
mkdir investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
cd investigate-llvm-f17d60d620283b5d53286056ceeaeb8c27b6530a
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach f17d60d620283b5d53286056ceeaeb8c27b6530a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach f56129fe78d5c849971017976c71333b6b1a27c6
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit f17d60d620283b5d53286056ceeaeb8c27b6530a
Author: Bjorn Pettersson <bjorn.a.pettersson(a)ericsson.com>
Date: Fri Sep 3 20:50:33 2021 +0200
Inform pass manager when child loops are deleted
As part of the nontrivial unswitching we could end up removing child
loops. This patch add a notification to the pass manager when
that happens (using the markLoopAsDeleted callback).
Without this there could be stale LoopAccessAnalysis results cached
in the analysis manager. Those analysis results are cached based on
a Loop* as key. Since the BumpPtrAllocator used to allocate
Loop objects could be resetted between different runs of for
example the loop-distribute pass (running on different functions),
a new Loop object could be created using the same Loop pointer.
And then when requiring the LoopAccessAnalysis for the loop we
got the stale (corrupt) result from the destroyed loop.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109257
(fixes PR51754)
(cherry-picked from commit 0f0344dd1e3b53387bb396070916e67f4c426da6)
---
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | 43 +++++++++----
.../nontrivial-unswitch-markloopasdeleted.ll | 71 ++++++++++++++++++++++
2 files changed, 102 insertions(+), 12 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index b9cccc2af309..b1c105258027 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -1587,10 +1587,12 @@ deleteDeadClonedBlocks(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
BB->eraseFromParent();
}
-static void deleteDeadBlocksFromLoop(Loop &L,
- SmallVectorImpl<BasicBlock *> &ExitBlocks,
- DominatorTree &DT, LoopInfo &LI,
- MemorySSAUpdater *MSSAU) {
+static void
+deleteDeadBlocksFromLoop(Loop &L,
+ SmallVectorImpl<BasicBlock *> &ExitBlocks,
+ DominatorTree &DT, LoopInfo &LI,
+ MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
// Find all the dead blocks tied to this loop, and remove them from their
// successors.
SmallSetVector<BasicBlock *, 8> DeadBlockSet;
@@ -1640,6 +1642,7 @@ static void deleteDeadBlocksFromLoop(Loop &L,
}) &&
"If the child loop header is dead all blocks in the child loop must "
"be dead as well!");
+ DestroyLoopCB(*ChildL, ChildL->getName());
LI.destroy(ChildL);
return true;
});
@@ -1980,6 +1983,8 @@ static bool rebuildLoopAfterUnswitch(Loop &L, ArrayRef<BasicBlock *> ExitBlocks,
ParentL->removeChildLoop(llvm::find(*ParentL, &L));
else
LI.removeLoop(llvm::find(LI, &L));
+ // markLoopAsDeleted for L should be triggered by the caller (it is typically
+ // done by using the UnswitchCB callback).
LI.destroy(&L);
return false;
}
@@ -2019,7 +2024,8 @@ static void unswitchNontrivialInvariants(
SmallVectorImpl<BasicBlock *> &ExitBlocks, IVConditionInfo &PartialIVInfo,
DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
auto *ParentBB = TI.getParent();
BranchInst *BI = dyn_cast<BranchInst>(&TI);
SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI);
@@ -2319,7 +2325,7 @@ static void unswitchNontrivialInvariants(
// Now that our cloned loops have been built, we can update the original loop.
// First we delete the dead blocks from it and then we rebuild the loop
// structure taking these deletions into account.
- deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU);
+ deleteDeadBlocksFromLoop(L, ExitBlocks, DT, LI, MSSAU, DestroyLoopCB);
if (MSSAU && VerifyMemorySSA)
MSSAU->getMemorySSA()->verifyMemorySSA();
@@ -2670,7 +2676,8 @@ static bool unswitchBestCondition(
Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
// Collect all invariant conditions within this loop (as opposed to an inner
// loop which would be handled when visiting that inner loop).
SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4>
@@ -2958,7 +2965,7 @@ static bool unswitchBestCondition(
<< "\n");
unswitchNontrivialInvariants(L, *BestUnswitchTI, BestUnswitchInvariants,
ExitBlocks, PartialIVInfo, DT, LI, AC,
- UnswitchCB, SE, MSSAU);
+ UnswitchCB, SE, MSSAU, DestroyLoopCB);
return true;
}
@@ -2988,7 +2995,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
AAResults &AA, TargetTransformInfo &TTI, bool Trivial,
bool NonTrivial,
function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
- ScalarEvolution *SE, MemorySSAUpdater *MSSAU) {
+ ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
+ function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
assert(L.isRecursivelyLCSSAForm(DT, LI) &&
"Loops must be in LCSSA form before unswitching.");
@@ -3036,7 +3044,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
// Try to unswitch the best invariant condition. We prefer this full unswitch to
// a partial unswitch when possible below the threshold.
- if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU))
+ if (unswitchBestCondition(L, DT, LI, AC, AA, TTI, UnswitchCB, SE, MSSAU,
+ DestroyLoopCB))
return true;
// No other opportunities to unswitch.
@@ -3083,6 +3092,10 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
U.markLoopAsDeleted(L, LoopName);
};
+ auto DestroyLoopCB = [&U](Loop &L, StringRef Name) {
+ U.markLoopAsDeleted(L, Name);
+ };
+
Optional<MemorySSAUpdater> MSSAU;
if (AR.MSSA) {
MSSAU = MemorySSAUpdater(AR.MSSA);
@@ -3091,7 +3104,8 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
}
if (!unswitchLoop(L, AR.DT, AR.LI, AR.AC, AR.AA, AR.TTI, Trivial, NonTrivial,
UnswitchCB, &AR.SE,
- MSSAU.hasValue() ? MSSAU.getPointer() : nullptr))
+ MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+ DestroyLoopCB))
return PreservedAnalyses::all();
if (AR.MSSA && VerifyMemorySSA)
@@ -3179,12 +3193,17 @@ bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
LPM.markLoopAsDeleted(*L);
};
+ auto DestroyLoopCB = [&LPM](Loop &L, StringRef /* Name */) {
+ LPM.markLoopAsDeleted(L);
+ };
+
if (MSSA && VerifyMemorySSA)
MSSA->verifyMemorySSA();
bool Changed =
unswitchLoop(*L, DT, LI, AC, AA, TTI, true, NonTrivial, UnswitchCB, SE,
- MSSAU.hasValue() ? MSSAU.getPointer() : nullptr);
+ MSSAU.hasValue() ? MSSAU.getPointer() : nullptr,
+ DestroyLoopCB);
if (MSSA && VerifyMemorySSA)
MSSA->verifyMemorySSA();
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
new file mode 100644
index 000000000000..455a38535576
--- /dev/null
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-markloopasdeleted.ll
@@ -0,0 +1,71 @@
+; RUN: opt < %s -enable-loop-distribute -passes='loop-distribute,loop-mssa(simple-loop-unswitch<nontrivial>),loop-distribute' -o /dev/null -S -debug-pass-manager=verbose 2>&1 | FileCheck %s
+
+
+; Running loop-distribute will result in LoopAccessAnalysis being required and
+; cached in the LoopAnalysisManagerFunctionProxy.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner<header><latch><exiting>
+
+
+; Then simple-loop-unswitch is removing/replacing some loops (resulting in
+; Loop objects used as key in the analyses cache is destroyed). So here we
+; want to see that any analysis results cached on the destroyed loop is
+; cleared. A special case here is that loop_a_inner is destroyed when
+; unswitching the parent loop.
+;
+; The bug solved and verified by this test case was related to the
+; SimpleLoopUnswitch not marking the Loop as removed, so we missed clearing
+; the analysis caches.
+;
+; CHECK: Running pass: SimpleLoopUnswitchPass on Loop at depth 1 containing: %loop_begin<header>,%loop_b,%loop_b_inner,%loop_b_inner_exit,%loop_a,%loop_a_inner,%loop_a_inner_exit,%latch<latch><exiting>
+; CHECK-NEXT: Clearing all analysis results for: loop_a_inner
+
+
+; When running loop-distribute the second time we can see that loop_a_inner
+; isn't analysed because the loop no longer exists (instead we find a new loop,
+; loop_a_inner.us). This kind of verifies that it was correct to remove the
+; loop_a_inner related analysis above.
+;
+; CHECK: Running analysis: LoopAccessAnalysis on Loop at depth 2 containing: %loop_a_inner.us<header><latch><exiting>
+
+
+define i32 @test6(i1* %ptr, i1 %cond1, i32* %a.ptr, i32* %b.ptr) {
+entry:
+ br label %loop_begin
+
+loop_begin:
+ %v = load i1, i1* %ptr
+ br i1 %cond1, label %loop_a, label %loop_b
+
+loop_a:
+ br label %loop_a_inner
+
+loop_a_inner:
+ %va = load i1, i1* %ptr
+ %a = load i32, i32* %a.ptr
+ br i1 %va, label %loop_a_inner, label %loop_a_inner_exit
+
+loop_a_inner_exit:
+ %a.lcssa = phi i32 [ %a, %loop_a_inner ]
+ br label %latch
+
+loop_b:
+ br label %loop_b_inner
+
+loop_b_inner:
+ %vb = load i1, i1* %ptr
+ %b = load i32, i32* %b.ptr
+ br i1 %vb, label %loop_b_inner, label %loop_b_inner_exit
+
+loop_b_inner_exit:
+ %b.lcssa = phi i32 [ %b, %loop_b_inner ]
+ br label %latch
+
+latch:
+ %ab.phi = phi i32 [ %a.lcssa, %loop_a_inner_exit ], [ %b.lcssa, %loop_b_inner_exit ]
+ br i1 %v, label %loop_begin, label %loop_exit
+
+loop_exit:
+ %ab.lcssa = phi i32 [ %ab.phi, %latch ]
+ ret i32 %ab.lcssa
+}
</cut>
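The failure mode described in the commit message above (results cached by Loop pointer, then a new Loop created at the same address) can be reproduced with a toy cache. This is a sketch under stated assumptions: Loop, AnalysisCache and resultAfterReuse are simplified stand-ins written for this example, and a fixed storage slot with placement new is used to force the address reuse that the BumpPtrAllocator reset can cause in LLVM. Only the name markLoopAsDeleted is taken from the patch.

```cpp
#include <map>
#include <new>
#include <string>

// Simplified stand-in for llvm::Loop.
struct Loop {
  std::string name;
};

// Toy analysis cache keyed by object address, like results cached per Loop*.
class AnalysisCache {
  std::map<const Loop *, std::string> results_;

public:
  // Return the cached result for this address, computing it on a miss.
  const std::string &get(const Loop &l) {
    auto it = results_.find(&l);
    if (it == results_.end())
      it = results_.emplace(&l, "analysis of " + l.name).first;
    return it->second;
  }

  // Drop any result cached for a loop that is about to be destroyed.
  void markLoopAsDeleted(const Loop &l) { results_.erase(&l); }
};

// Destroy a loop and create a new one in the same storage, then return the
// result the cache reports for the new loop.
std::string resultAfterReuse(bool notify) {
  AnalysisCache cache;
  alignas(Loop) unsigned char slot[sizeof(Loop)];

  Loop *old_loop = new (slot) Loop{"loop_a_inner"};
  cache.get(*old_loop);  // populate the cache for the old loop
  if (notify)
    cache.markLoopAsDeleted(*old_loop);
  old_loop->~Loop();

  Loop *new_loop = new (slot) Loop{"loop_a_inner.us"};  // same address
  std::string result = cache.get(*new_loop);
  new_loop->~Loop();
  return result;
}
```

Without the notification the new loop picks up the result computed for the destroyed one; with it, the cache recomputes for the new loop. That is the behaviour the added CHECK lines in the test look for.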