Progress:
* UM-2 [QEMU upstream maintainership]
+ Code review:
- RTH's latest bswap series
- RTH's series converting nios2 to TranslatorOps
- Alex's SYS_HEAPINFO series
- RTH's series to avoid linux-user putting signal trampoline code
on the stack
- Renesas SCI device refactoring
+ Working on making the docs report the license, version, etc. in each
page's footer (something I was asked for back when I did the initial
conversion to Sphinx, and which I promised I would do afterwards...)
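For context, Sphinx exposes a handful of conf.py values to the page
templates, which a footer template can then render on every page. A
minimal sketch (this is not the actual QEMU conf.py; the values and the
"license" key are assumptions for illustration - "license" is not a
standard Sphinx variable and would need a custom footer template to
pick it up):

```python
# conf.py fragment (sketch, illustrative values only).
# "version"/"release"/"copyright" are standard Sphinx settings and are
# available to templates; "html_context" passes extra variables through.
project = "QEMU"
version = "6.0"            # short X.Y version shown in templates
release = "6.0.0"          # full version string
copyright = "2021, The QEMU Project Developers"

# A custom footer override (e.g. _templates/footer.html) could render
# {{ license }} from here; this key is an invented example.
html_context = {
    "license": "GNU GPL v2",
}
```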
+ Investigated a bug reported by Maxim Uvarov where the virt board's
GPIO-driven shutdown/reset mechanism for secure firmware wasn't
working correctly. This turns out to be caused by our PL061 GPIO
implementation hardcoding that GPIO lines have pullup resistors,
whereas the virt board and the gpio-pwr device implicitly assume
pulldown. That bug was then masked/confused by a second bug, where
we only actually signal the line level implied by the pullup when
the guest first touches the PL061, rather than on reset. Sent a
series that fixes both of these things (and does a bit of other
PL061 cleanup in the process).
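As a toy illustration of the two bugs described above (pull direction
hardcoded to pull-up, and the pull-implied line level only being
signalled on first guest access rather than at reset), here is a
minimal self-contained model. This is not QEMU code; all names are
invented, and it only shows the fixed behaviour:

```python
# Toy model of a GPIO controller whose floating input lines are pulled
# to a default level by a per-line pull resistor. The fix: honour the
# board-configured pull direction, and signal the implied level at
# reset time instead of waiting for the guest to touch the device.
class ToyPL061:
    def __init__(self, pullups, notify):
        self.pullups = pullups   # per-line: True = pull-up, False = pull-down
        self.notify = notify     # callback(line, level), like a qemu_irq
        self.levels = [False] * len(pullups)

    def reset(self):
        # Signal the pull-implied level for every line at reset.
        for line, pullup in enumerate(self.pullups):
            self.levels[line] = pullup
            self.notify(line, pullup)

events = []
# Board wires lines 0 and 1 with pull-downs (e.g. gpio-pwr style) and
# line 2 with a pull-up.
gpio = ToyPL061([False, False, True], lambda l, v: events.append((l, v)))
gpio.reset()
# Lines 0 and 1 start low, line 2 starts high.
```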
* QEMU-406 [QEMU support for MVE (M-profile Vector Extension; Helium)]
+ Implemented all the non-vector shift insns
+ Sent out a second slice of patches for review
+ Progress: 112/210 (53%)
-- PMM
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os
Culprit:
<cut>
commit 3a2b05f9fe74fcf9560632cf2695058d47d8683b
Author: Evgeniy Brevnov <ybrevnov(a)azul.com>
Date: Fri Jul 24 18:57:10 2020 +0700
[BPI][NFC] Consolidate code to deal with SCCs under a dedicated data structure.
In order to facilitate review of D79485 here is a small NFC change which restructures code around handling of SCCs in BPI.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84514
</cut>
Results regressed to (for first_bad == 3a2b05f9fe74fcf9560632cf2695058d47d8683b)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -Os_mthumb -- artifacts/build-3a2b05f9fe74fcf9560632cf2695058d47d8683b/results_id:
1
# 401.bzip2,bzip2_base.default regressed by 104
from (for last_good == 7294ca3f6ecacd05a197bbf0637e10afcb99b6d6)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -Os_mthumb -- artifacts/build-7294ca3f6ecacd05a197bbf0637e10afcb99b6d6/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/699
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/688
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-3a2b05f9fe74fcf9560632cf2695058d47d8683b
cd investigate-llvm-3a2b05f9fe74fcf9560632cf2695058d47d8683b
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd llvm
# Reproduce first_bad build
git checkout --detach 3a2b05f9fe74fcf9560632cf2695058d47d8683b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 7294ca3f6ecacd05a197bbf0637e10afcb99b6d6
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 3a2b05f9fe74fcf9560632cf2695058d47d8683b
Author: Evgeniy Brevnov <ybrevnov(a)azul.com>
Date: Fri Jul 24 18:57:10 2020 +0700
[BPI][NFC] Consolidate code to deal with SCCs under a dedicated data structure.
In order to facilitate review of D79485 here is a small NFC change which restructures code around handling of SCCs in BPI.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84514
---
llvm/include/llvm/Analysis/BranchProbabilityInfo.h | 71 ++++++++-
llvm/lib/Analysis/BranchProbabilityInfo.cpp | 164 +++++++++++++--------
2 files changed, 169 insertions(+), 66 deletions(-)
diff --git a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h
index 3e72afba36c3..7feb5b625938 100644
--- a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h
+++ b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h
@@ -151,13 +151,66 @@ public:
/// Forget analysis results for the given basic block.
void eraseBlock(const BasicBlock *BB);
- // Use to track SCCs for handling irreducible loops.
- using SccMap = DenseMap<const BasicBlock *, int>;
- using SccHeaderMap = DenseMap<const BasicBlock *, bool>;
- using SccHeaderMaps = std::vector<SccHeaderMap>;
- struct SccInfo {
+ class SccInfo {
+ // Enum of types to classify basic blocks in SCC. Basic block belonging to
+ // SCC is 'Inner' until it is either 'Header' or 'Exiting'. Note that a
+ // basic block can be 'Header' and 'Exiting' at the same time.
+ enum SccBlockType {
+ Inner = 0x0,
+ Header = 0x1,
+ Exiting = 0x2,
+ };
+ // Map of basic blocks to SCC IDs they belong to. If basic block doesn't
+ // belong to any SCC it is not in the map.
+ using SccMap = DenseMap<const BasicBlock *, int>;
+ // Each basic block in SCC is attributed with one or several types from
+ // SccBlockType. Map value has uint32_t type (instead of SccBlockType)
+ // since basic block may be for example "Header" and "Exiting" at the same
+ // time and we need to be able to keep more than one value from
+ // SccBlockType.
+ using SccBlockTypeMap = DenseMap<const BasicBlock *, uint32_t>;
+ // Vector containing classification of basic blocks for all SCCs where i'th
+ // vector element corresponds to SCC with ID equal to i.
+ using SccBlockTypeMaps = std::vector<SccBlockTypeMap>;
+
SccMap SccNums;
- SccHeaderMaps SccHeaders;
+ SccBlockTypeMaps SccBlocks;
+
+ public:
+ explicit SccInfo(const Function &F);
+
+ /// If \p BB belongs to some SCC then ID of that SCC is returned, otherwise
+ /// -1 is returned. If \p BB belongs to more than one SCC at the same time
+ /// result is undefined.
+ int getSCCNum(const BasicBlock *BB) const;
+ /// Returns true if \p BB is a 'header' block in SCC with \p SccNum ID,
+ /// false otherwise.
+ bool isSCCHeader(const BasicBlock *BB, int SccNum) const {
+ return getSccBlockType(BB, SccNum) & Header;
+ }
+ /// Returns true if \p BB is an 'exiting' block in SCC with \p SccNum ID,
+ /// false otherwise.
+ bool isSCCExitingBlock(const BasicBlock *BB, int SccNum) const {
+ return getSccBlockType(BB, SccNum) & Exiting;
+ }
+ /// Fills in \p Enters vector with all such blocks that don't belong to
+ /// SCC with \p SccNum ID but there is an edge to a block belonging to the
+ /// SCC.
+ void getSccEnterBlocks(int SccNum,
+ SmallVectorImpl<BasicBlock *> &Enters) const;
+ /// Fills in \p Exits vector with all such blocks that don't belong to
+ /// SCC with \p SccNum ID but there is an edge from a block belonging to the
+ /// SCC.
+ void getSccExitBlocks(int SccNum,
+ SmallVectorImpl<BasicBlock *> &Exits) const;
+
+ private:
+ /// Returns \p BB's type according to classification given by SccBlockType
+ /// enum. Please note that \p BB must belong to SSC with \p SccNum ID.
+ uint32_t getSccBlockType(const BasicBlock *BB, int SccNum) const;
+ /// Calculates \p BB's type and stores it in internal data structures for
+ /// future use. Please note that \p BB must belong to SSC with \p SccNum ID.
+ void calculateSccBlockType(const BasicBlock *BB, int SccNum);
};
private:
@@ -196,6 +249,9 @@ private:
/// Track the last function we run over for printing.
const Function *LastF = nullptr;
+ /// Keeps information about all SCCs in a function.
+ std::unique_ptr<const SccInfo> SccI;
+
/// Track the set of blocks directly succeeded by a returning block.
SmallPtrSet<const BasicBlock *, 16> PostDominatedByUnreachable;
@@ -210,8 +266,7 @@ private:
bool calcMetadataWeights(const BasicBlock *BB);
bool calcColdCallHeuristics(const BasicBlock *BB);
bool calcPointerHeuristics(const BasicBlock *BB);
- bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI,
- SccInfo &SccI);
+ bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI);
bool calcZeroHeuristics(const BasicBlock *BB, const TargetLibraryInfo *TLI);
bool calcFloatingPointHeuristics(const BasicBlock *BB);
bool calcInvokeHeuristics(const BasicBlock *BB);
diff --git a/llvm/lib/Analysis/BranchProbabilityInfo.cpp b/llvm/lib/Analysis/BranchProbabilityInfo.cpp
index a396b5ad21c6..195fc69d9601 100644
--- a/llvm/lib/Analysis/BranchProbabilityInfo.cpp
+++ b/llvm/lib/Analysis/BranchProbabilityInfo.cpp
@@ -148,6 +148,105 @@ static const uint32_t IH_TAKEN_WEIGHT = 1024 * 1024 - 1;
/// instruction. This is essentially never taken.
static const uint32_t IH_NONTAKEN_WEIGHT = 1;
+BranchProbabilityInfo::SccInfo::SccInfo(const Function &F) {
+ // Record SCC numbers of blocks in the CFG to identify irreducible loops.
+ // FIXME: We could only calculate this if the CFG is known to be irreducible
+ // (perhaps cache this info in LoopInfo if we can easily calculate it there?).
+ int SccNum = 0;
+ for (scc_iterator<const Function *> It = scc_begin(&F); !It.isAtEnd();
+ ++It, ++SccNum) {
+ // Ignore single-block SCCs since they either aren't loops or LoopInfo will
+ // catch them.
+ const std::vector<const BasicBlock *> &Scc = *It;
+ if (Scc.size() == 1)
+ continue;
+
+ LLVM_DEBUG(dbgs() << "BPI: SCC " << SccNum << ":");
+ for (const auto *BB : Scc) {
+ LLVM_DEBUG(dbgs() << " " << BB->getName());
+ SccNums[BB] = SccNum;
+ calculateSccBlockType(BB, SccNum);
+ }
+ LLVM_DEBUG(dbgs() << "\n");
+ }
+}
+
+int BranchProbabilityInfo::SccInfo::getSCCNum(const BasicBlock *BB) const {
+ auto SccIt = SccNums.find(BB);
+ if (SccIt == SccNums.end())
+ return -1;
+ return SccIt->second;
+}
+
+void BranchProbabilityInfo::SccInfo::getSccEnterBlocks(
+ int SccNum, SmallVectorImpl<BasicBlock *> &Enters) const {
+
+ for (auto MapIt : SccBlocks[SccNum]) {
+ const auto *BB = MapIt.first;
+ if (isSCCHeader(BB, SccNum))
+ for (const auto *Pred : predecessors(BB))
+ if (getSCCNum(Pred) != SccNum)
+ Enters.push_back(const_cast<BasicBlock *>(BB));
+ }
+}
+
+void BranchProbabilityInfo::SccInfo::getSccExitBlocks(
+ int SccNum, SmallVectorImpl<BasicBlock *> &Exits) const {
+ for (auto MapIt : SccBlocks[SccNum]) {
+ const auto *BB = MapIt.first;
+ if (isSCCExitingBlock(BB, SccNum))
+ for (const auto *Succ : successors(BB))
+ if (getSCCNum(Succ) != SccNum)
+ Exits.push_back(const_cast<BasicBlock *>(BB));
+ }
+}
+
+uint32_t BranchProbabilityInfo::SccInfo::getSccBlockType(const BasicBlock *BB,
+ int SccNum) const {
+ assert(getSCCNum(BB) == SccNum);
+
+ assert(SccBlocks.size() > static_cast<unsigned>(SccNum) && "Unknown SCC");
+ const auto &SccBlockTypes = SccBlocks[SccNum];
+
+ auto It = SccBlockTypes.find(BB);
+ if (It != SccBlockTypes.end()) {
+ return It->second;
+ }
+ return Inner;
+}
+
+void BranchProbabilityInfo::SccInfo::calculateSccBlockType(const BasicBlock *BB,
+ int SccNum) {
+ assert(getSCCNum(BB) == SccNum);
+ uint32_t BlockType = Inner;
+
+ if (llvm::any_of(make_range(pred_begin(BB), pred_end(BB)),
+ [&](const BasicBlock *Pred) {
+ // Consider any block that is an entry point to the SCC as
+ // a header.
+ return getSCCNum(Pred) != SccNum;
+ }))
+ BlockType |= Header;
+
+ if (llvm::any_of(
+ make_range(succ_begin(BB), succ_end(BB)),
+ [&](const BasicBlock *Succ) { return getSCCNum(Succ) != SccNum; }))
+ BlockType |= Exiting;
+
+ // Lazily compute the set of headers for a given SCC and cache the results
+ // in the SccHeaderMap.
+ if (SccBlocks.size() <= static_cast<unsigned>(SccNum))
+ SccBlocks.resize(SccNum + 1);
+ auto &SccBlockTypes = SccBlocks[SccNum];
+
+ if (BlockType != Inner) {
+ bool IsInserted;
+ std::tie(std::ignore, IsInserted) =
+ SccBlockTypes.insert(std::make_pair(BB, BlockType));
+ assert(IsInserted && "Duplicated block in SCC");
+ }
+}
+
static void UpdatePDTWorklist(const BasicBlock *BB, PostDominatorTree *PDT,
SmallVectorImpl<const BasicBlock *> &WorkList,
SmallPtrSetImpl<const BasicBlock *> &TargetSet) {
@@ -511,38 +610,6 @@ bool BranchProbabilityInfo::calcPointerHeuristics(const BasicBlock *BB) {
return true;
}
-static int getSCCNum(const BasicBlock *BB,
- const BranchProbabilityInfo::SccInfo &SccI) {
- auto SccIt = SccI.SccNums.find(BB);
- if (SccIt == SccI.SccNums.end())
- return -1;
- return SccIt->second;
-}
-
-// Consider any block that is an entry point to the SCC as a header.
-static bool isSCCHeader(const BasicBlock *BB, int SccNum,
- BranchProbabilityInfo::SccInfo &SccI) {
- assert(getSCCNum(BB, SccI) == SccNum);
-
- // Lazily compute the set of headers for a given SCC and cache the results
- // in the SccHeaderMap.
- if (SccI.SccHeaders.size() <= static_cast<unsigned>(SccNum))
- SccI.SccHeaders.resize(SccNum + 1);
- auto &HeaderMap = SccI.SccHeaders[SccNum];
- bool Inserted;
- BranchProbabilityInfo::SccHeaderMap::iterator HeaderMapIt;
- std::tie(HeaderMapIt, Inserted) = HeaderMap.insert(std::make_pair(BB, false));
- if (Inserted) {
- bool IsHeader = llvm::any_of(make_range(pred_begin(BB), pred_end(BB)),
- [&](const BasicBlock *Pred) {
- return getSCCNum(Pred, SccI) != SccNum;
- });
- HeaderMapIt->second = IsHeader;
- return IsHeader;
- } else
- return HeaderMapIt->second;
-}
-
// Compute the unlikely successors to the block BB in the loop L, specifically
// those that are unlikely because this is a loop, and add them to the
// UnlikelyBlocks set.
@@ -653,12 +720,11 @@ computeUnlikelySuccessors(const BasicBlock *BB, Loop *L,
// Calculate Edge Weights using "Loop Branch Heuristics". Predict backedges
// as taken, exiting edges as not-taken.
bool BranchProbabilityInfo::calcLoopBranchHeuristics(const BasicBlock *BB,
- const LoopInfo &LI,
- SccInfo &SccI) {
+ const LoopInfo &LI) {
int SccNum;
Loop *L = LI.getLoopFor(BB);
if (!L) {
- SccNum = getSCCNum(BB, SccI);
+ SccNum = SccI->getSCCNum(BB);
if (SccNum < 0)
return false;
}
@@ -685,9 +751,9 @@ bool BranchProbabilityInfo::calcLoopBranchHeuristics(const BasicBlock *BB,
else
InEdges.push_back(I.getSuccessorIndex());
} else {
- if (getSCCNum(*I, SccI) != SccNum)
+ if (SccI->getSCCNum(*I) != SccNum)
ExitingEdges.push_back(I.getSuccessorIndex());
- else if (isSCCHeader(*I, SccNum, SccI))
+ else if (SccI->isSCCHeader(*I, SccNum))
BackEdges.push_back(I.getSuccessorIndex());
else
InEdges.push_back(I.getSuccessorIndex());
@@ -1072,26 +1138,7 @@ void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,
assert(PostDominatedByUnreachable.empty());
assert(PostDominatedByColdCall.empty());
- // Record SCC numbers of blocks in the CFG to identify irreducible loops.
- // FIXME: We could only calculate this if the CFG is known to be irreducible
- // (perhaps cache this info in LoopInfo if we can easily calculate it there?).
- int SccNum = 0;
- SccInfo SccI;
- for (scc_iterator<const Function *> It = scc_begin(&F); !It.isAtEnd();
- ++It, ++SccNum) {
- // Ignore single-block SCCs since they either aren't loops or LoopInfo will
- // catch them.
- const std::vector<const BasicBlock *> &Scc = *It;
- if (Scc.size() == 1)
- continue;
-
- LLVM_DEBUG(dbgs() << "BPI: SCC " << SccNum << ":");
- for (auto *BB : Scc) {
- LLVM_DEBUG(dbgs() << " " << BB->getName());
- SccI.SccNums[BB] = SccNum;
- }
- LLVM_DEBUG(dbgs() << "\n");
- }
+ SccI = std::make_unique<SccInfo>(F);
std::unique_ptr<PostDominatorTree> PDTPtr;
@@ -1119,7 +1166,7 @@ void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,
continue;
if (calcColdCallHeuristics(BB))
continue;
- if (calcLoopBranchHeuristics(BB, LI, SccI))
+ if (calcLoopBranchHeuristics(BB, LI))
continue;
if (calcPointerHeuristics(BB))
continue;
@@ -1131,6 +1178,7 @@ void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,
PostDominatedByUnreachable.clear();
PostDominatedByColdCall.clear();
+ SccI.release();
if (PrintBranchProb &&
(PrintBranchProbFuncName.empty() ||
</cut>
Hi folks,
I'm hoping that I might be able to get some development help with
binutils for aarch64...
I'm maintaining the UEFI Secure Boot stack in Debian (shim etc.),
including for arm64/aarch64 (as I wanted to make that work too!). UEFI
binaries are awkward for those of us used to the Linux and ELF world -
they're PE/COFF format, with different calling conventions to match the
Microsoft world. But we've made things work.
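For anyone unfamiliar with the format: a PE/COFF image starts with a
DOS stub whose dword at offset 0x3c points at the "PE\0\0" signature,
followed by the COFF file header whose machine field is 0xaa64 for
aarch64. A self-contained sketch of locating that header (the bytes
here are synthetic, just enough to demonstrate the layout, not a valid
executable):

```python
import struct

# Build a minimal synthetic PE/COFF header.
IMAGE_FILE_MACHINE_ARM64 = 0xAA64
dos_stub = b"MZ" + b"\x00" * 58 + struct.pack("<I", 64)  # e_lfanew at 0x3c
coff = b"PE\x00\x00" + struct.pack("<HH", IMAGE_FILE_MACHINE_ARM64, 3)
image = dos_stub + coff

def machine_type(data):
    """Return the COFF machine field of a PE image."""
    assert data[:2] == b"MZ", "missing DOS magic"
    pe_off = struct.unpack_from("<I", data, 0x3C)[0]
    assert data[pe_off:pe_off + 4] == b"PE\x00\x00", "missing PE signature"
    return struct.unpack_from("<H", data, pe_off + 4)[0]

print(hex(machine_type(image)))
```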
On x86 platforms, the shim build process uses objcopy
--target=efi-app-$(ARCH) to produce the final output binaries. We've
never had similar support for the aarch64 platform, and instead
somebody came up with a method using a locally-hacked linker script and
"-O binary" to generate the output binaries. That's worked well
enough for a while, but it's been annoying for various reasons
(particularly debugging problems).
*However*, recently for security reasons we've tweaked the layout of
Secure Boot binaries [1] and this has caused lots of problems. The
older hacks to hand-build the right sections etc. needed significant
extra work, and we're still dealing with awkward bugs related to
this. Based on these problems, I recently had to make the painful
decision to drop support for arm64 SB in Debian. I know that other
distributions are feeling similar pain. :-(
Rather than continuing to hack on things, I think it's (way past) time
that we did things correctly! We need aarch64 binary format support in
binutils so we can just use it like we do on x86. AFAICS, there is
already a bug open asking for this from last year [2]. Could I please
prevail on some friendly neighbourhood aarch64 toolchain engineer to
help with that?
Thanks for considering,
Steve
[1] https://github.com/rhboot/shim/blob/main/SBAT.md
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=26206#add_comment
--
Steve McIntyre, Cambridge, UK. steve(a)einval.com
"...In the UNIX world, people tend to interpret `non-technical user'
as meaning someone who's only ever written one device driver." -- Daniel Pead
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
Culprit:
<cut>
commit 8fe62b7af1127691d6732438b322e66ae6d39a03
Author: Max Kazantsev <mkazantsev(a)azul.com>
Date: Thu Apr 22 12:50:38 2021 +0700
[GVN] Introduce loop load PRE
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.
There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
</cut>
Results regressed to (for first_bad == 8fe62b7af1127691d6732438b322e66ae6d39a03)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_LTO -- artifacts/build-8fe62b7af1127691d6732438b322e66ae6d39a03/results_id:
1
# 464.h264ref,h264ref_base.default regressed by 104
from (for last_good == 722d4d8e7585457d407d0639a4ae2610157e06a8)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_LTO -- artifacts/build-722d4d8e7585457d407d0639a4ae2610157e06a8/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/641
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/642
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-8fe62b7af1127691d6732438b322e66ae6d39a03
cd investigate-llvm-8fe62b7af1127691d6732438b322e66ae6d39a03
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd llvm
# Reproduce first_bad build
git checkout --detach 8fe62b7af1127691d6732438b322e66ae6d39a03
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 722d4d8e7585457d407d0639a4ae2610157e06a8
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 8fe62b7af1127691d6732438b322e66ae6d39a03
Author: Max Kazantsev <mkazantsev(a)azul.com>
Date: Thu Apr 22 12:50:38 2021 +0700
[GVN] Introduce loop load PRE
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.
There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
---
llvm/include/llvm/Transforms/Scalar/GVN.h | 6 ++
llvm/lib/Transforms/Scalar/GVN.cpp | 92 +++++++++++++++++++++++++--
llvm/test/Transforms/GVN/PRE/pre-loop-load.ll | 9 ++-
3 files changed, 98 insertions(+), 9 deletions(-)
diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index 13f55ddcf2d6..70662ca213c3 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -328,6 +328,12 @@ private:
bool PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
UnavailBlkVect &UnavailableBlocks);
+ /// Try to replace a load which executes on each loop iteraiton with Phi
+ /// translation of load in preheader and load(s) in conditionally executed
+ /// paths.
+ bool performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+ UnavailBlkVect &UnavailableBlocks);
+
/// Eliminates partially redundant \p Load, replacing it with \p
/// AvailableLoads (connected by Phis if needed).
void eliminatePartiallyRedundantLoad(
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index 846a5cdb33d1..29da739fa16e 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -91,13 +91,14 @@ using namespace PatternMatch;
#define DEBUG_TYPE "gvn"
-STATISTIC(NumGVNInstr, "Number of instructions deleted");
-STATISTIC(NumGVNLoad, "Number of loads deleted");
-STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
+STATISTIC(NumGVNInstr, "Number of instructions deleted");
+STATISTIC(NumGVNLoad, "Number of loads deleted");
+STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
STATISTIC(NumGVNBlocks, "Number of blocks merged");
-STATISTIC(NumGVNSimpl, "Number of instructions simplified");
+STATISTIC(NumGVNSimpl, "Number of instructions simplified");
STATISTIC(NumGVNEqProp, "Number of equalities propagated");
-STATISTIC(NumPRELoad, "Number of loads PRE'd");
+STATISTIC(NumPRELoad, "Number of loads PRE'd");
+STATISTIC(NumPRELoopLoad, "Number of loop loads PRE'd");
STATISTIC(IsValueFullyAvailableInBlockNumSpeculationsMax,
"Number of blocks speculated as available in "
@@ -1447,6 +1448,84 @@ bool GVN::PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
return true;
}
+bool GVN::performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+ UnavailBlkVect &UnavailableBlocks) {
+ if (!LI)
+ return false;
+
+ const Loop *L = LI->getLoopFor(Load->getParent());
+ // TODO: Generalize to other loop blocks that dominate the latch.
+ if (!L || L->getHeader() != Load->getParent())
+ return false;
+
+ BasicBlock *Preheader = L->getLoopPreheader();
+ BasicBlock *Latch = L->getLoopLatch();
+ if (!Preheader || !Latch)
+ return false;
+
+ Value *LoadPtr = Load->getPointerOperand();
+ // Must be available in preheader.
+ if (!L->isLoopInvariant(LoadPtr))
+ return false;
+
+ // We plan to hoist the load to preheader without introducing a new fault.
+ // In order to do it, we need to prove that we cannot side-exit the loop
+ // once loop header is first entered before execution of the load.
+ if (ICF->isDominatedByICFIFromSameBlock(Load))
+ return false;
+
+ BasicBlock *LoopBlock = nullptr;
+ for (auto *Blocker : UnavailableBlocks) {
+ // Blockers from outside the loop are handled in preheader.
+ if (!L->contains(Blocker))
+ continue;
+
+ // Only allow one loop block. Loop header is not less frequently executed
+ // than each loop block, and likely it is much more frequently executed. But
+ // in case of multiple loop blocks, we need extra information (such as block
+ // frequency info) to understand whether it is profitable to PRE into
+ // multiple loop blocks.
+ if (LoopBlock)
+ return false;
+
+ // Do not sink into inner loops. This may be non-profitable.
+ if (L != LI->getLoopFor(Blocker))
+ return false;
+
+ // Blocks that dominate the latch execute on every single iteration, maybe
+ // except the last one. So PREing into these blocks doesn't make much sense
+ // in most cases. But the blocks that do not necessarily execute on each
+ // iteration are sometimes much colder than the header, and this is when
+ // PRE is potentially profitable.
+ if (DT->dominates(Blocker, Latch))
+ return false;
+
+ // Make sure that the terminator itself doesn't clobber.
+ if (Blocker->getTerminator()->mayWriteToMemory())
+ return false;
+
+ LoopBlock = Blocker;
+ }
+
+ if (!LoopBlock)
+ return false;
+
+ // Make sure the memory at this pointer cannot be freed, therefore we can
+ // safely reload from it after clobber.
+ if (LoadPtr->canBeFreed())
+ return false;
+
+ // TODO: Support critical edge splitting if blocker has more than 1 successor.
+ MapVector<BasicBlock *, Value *> AvailableLoads;
+ AvailableLoads[LoopBlock] = LoadPtr;
+ AvailableLoads[Preheader] = LoadPtr;
+
+ LLVM_DEBUG(dbgs() << "GVN REMOVING PRE LOOP LOAD: " << *Load << '\n');
+ eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, AvailableLoads);
+ ++NumPRELoopLoad;
+ return true;
+}
+
static void reportLoadElim(LoadInst *Load, Value *AvailableValue,
OptimizationRemarkEmitter *ORE) {
using namespace ore;
@@ -1544,7 +1623,8 @@ bool GVN::processNonLocalLoad(LoadInst *Load) {
if (!isLoadInLoopPREEnabled() && LI && LI->getLoopFor(Load->getParent()))
return Changed;
- return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
+ return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks) ||
+ performLoopLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
}
static bool impliesEquivalanceIfTrue(CmpInst* Cmp) {
diff --git a/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll b/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
index 15bc49e7ab9a..8ca12284d5c4 100644
--- a/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
+++ b/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
@@ -7,22 +7,25 @@ declare void @may_free_memory()
declare i32 @personality_function()
-; TODO: We can PRE the load from gc-managed memory away from the hot path.
+; We can PRE the load from gc-managed memory away from the hot path.
define i32 @test_load_on_cold_path_gc(i32 addrspace(1)* %p) gc "statepoint-example" personality i32 ()* @"personality_function" {
; CHECK-LABEL: @test_load_on_cold_path_gc(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32 addrspace(1)* [[P:%.*]], align 4
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
-; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
-; CHECK-NEXT: [[X:%.*]] = load i32, i32 addrspace(1)* [[P:%.*]], align 4
+; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE1]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0
; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
; CHECK: hot_path:
; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: cold_path:
; CHECK-NEXT: call void @may_free_memory()
+; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32 addrspace(1)* [[P]], align 4
; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: backedge:
+; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE]], [[COLD_PATH]] ], [ [[X]], [[HOT_PATH]] ]
; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
</cut>