All,
During Connect the suggestion was made that each working group should have
its own IRC Channel for discussions and topics relating to the group in
particular (as opposed to #linaro which is 'generic' Linaro conversations).
Therefore I have just set up #linaro-tcwg on Freenode for the Toolchain
Working Group.
This channel is public and open to anyone who wants to talk with the TCWG
group about anything toolchain related.
Thanks,
Matt
--
Matthew Gretton-Dann
Toolchain Working Group, Linaro
Hi folks,
I'm hoping that I might be able to get some development help with
binutils for aarch64...
I'm maintaining the UEFI Secure Boot stack in Debian (shim etc.),
including for arm64/aarch64 (as I wanted to make that work too!). UEFI
binaries are awkward for those of used to the Linux and ELF world -
they're PE/COFF format with different calling conventions to match the
Microsoft world. But we've made things work.
On x86 platforms, the shim build process uses objcopy
--target=efi-app-$(ARCH) to produce the final output binaries. We've
never had similar support for the aarch64 platform, and instead
somebody came up with a method using locally-hacked linker script and
"-O binary" to generate the output binaries. That's worked well
enough for a while, but it's been annoying for various reasons
(particularly debugging problems).
*However*, recently for security reasons we've tweaked the layout of
Secure Boot binaries [1] and this has caused lots of problems. The
older hacks to hand-build the right sections etc. needed significant
extra work, and we're still dealing with awkward bugs related to
this. Based ont these problems, I recently had to make the painful
decision to drop support for arm64 SB in Debian. I know that other
distributions are feeling similar pain. :-(
Rather than continuing to hack on things, I think it's (way past) time
that we did things correctly! We need aarch64 binary format support in
binutils so we can just use it like we do on x86. AFAICS, there is
already a bug open asking for this from last year [2]. Could I please
prevail on some friendly neighourhood aarch64 toolchain engineer to
help with that?
Thanks for considering,
Steve
[1] https://github.com/rhboot/shim/blob/main/SBAT.md
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=26206#add_comment
--
Steve McIntyre, Cambridge, UK. steve(a)einval.com
"...In the UNIX world, people tend to interpret `non-technical user'
as meaning someone who's only ever written one device driver." -- Daniel Pead
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
Culprit:
<cut>
commit 8fe62b7af1127691d6732438b322e66ae6d39a03
Author: Max Kazantsev <mkazantsev(a)azul.com>
Date: Thu Apr 22 12:50:38 2021 +0700
[GVN] Introduce loop load PRE
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.
There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
</cut>
Results regressed to (for first_bad == 8fe62b7af1127691d6732438b322e66ae6d39a03)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_LTO -- artifacts/build-8fe62b7af1127691d6732438b322e66ae6d39a03/results_id:
1
# 464.h264ref,h264ref_base.default regressed by 104
from (for last_good == 722d4d8e7585457d407d0639a4ae2610157e06a8)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -O2_LTO -- artifacts/build-722d4d8e7585457d407d0639a4ae2610157e06a8/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/641
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O2_LTO/642
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-8fe62b7af1127691d6732438b322e66ae6d39a03
cd investigate-llvm-8fe62b7af1127691d6732438b322e66ae6d39a03
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd llvm
# Reproduce first_bad build
git checkout --detach 8fe62b7af1127691d6732438b322e66ae6d39a03
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 722d4d8e7585457d407d0639a4ae2610157e06a8
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 8fe62b7af1127691d6732438b322e66ae6d39a03
Author: Max Kazantsev <mkazantsev(a)azul.com>
Date: Thu Apr 22 12:50:38 2021 +0700
[GVN] Introduce loop load PRE
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.
There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
---
llvm/include/llvm/Transforms/Scalar/GVN.h | 6 ++
llvm/lib/Transforms/Scalar/GVN.cpp | 92 +++++++++++++++++++++++++--
llvm/test/Transforms/GVN/PRE/pre-loop-load.ll | 9 ++-
3 files changed, 98 insertions(+), 9 deletions(-)
diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index 13f55ddcf2d6..70662ca213c3 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -328,6 +328,12 @@ private:
bool PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
UnavailBlkVect &UnavailableBlocks);
+ /// Try to replace a load which executes on each loop iteraiton with Phi
+ /// translation of load in preheader and load(s) in conditionally executed
+ /// paths.
+ bool performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+ UnavailBlkVect &UnavailableBlocks);
+
/// Eliminates partially redundant \p Load, replacing it with \p
/// AvailableLoads (connected by Phis if needed).
void eliminatePartiallyRedundantLoad(
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index 846a5cdb33d1..29da739fa16e 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -91,13 +91,14 @@ using namespace PatternMatch;
#define DEBUG_TYPE "gvn"
-STATISTIC(NumGVNInstr, "Number of instructions deleted");
-STATISTIC(NumGVNLoad, "Number of loads deleted");
-STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
+STATISTIC(NumGVNInstr, "Number of instructions deleted");
+STATISTIC(NumGVNLoad, "Number of loads deleted");
+STATISTIC(NumGVNPRE, "Number of instructions PRE'd");
STATISTIC(NumGVNBlocks, "Number of blocks merged");
-STATISTIC(NumGVNSimpl, "Number of instructions simplified");
+STATISTIC(NumGVNSimpl, "Number of instructions simplified");
STATISTIC(NumGVNEqProp, "Number of equalities propagated");
-STATISTIC(NumPRELoad, "Number of loads PRE'd");
+STATISTIC(NumPRELoad, "Number of loads PRE'd");
+STATISTIC(NumPRELoopLoad, "Number of loop loads PRE'd");
STATISTIC(IsValueFullyAvailableInBlockNumSpeculationsMax,
"Number of blocks speculated as available in "
@@ -1447,6 +1448,84 @@ bool GVN::PerformLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
return true;
}
+bool GVN::performLoopLoadPRE(LoadInst *Load, AvailValInBlkVect &ValuesPerBlock,
+ UnavailBlkVect &UnavailableBlocks) {
+ if (!LI)
+ return false;
+
+ const Loop *L = LI->getLoopFor(Load->getParent());
+ // TODO: Generalize to other loop blocks that dominate the latch.
+ if (!L || L->getHeader() != Load->getParent())
+ return false;
+
+ BasicBlock *Preheader = L->getLoopPreheader();
+ BasicBlock *Latch = L->getLoopLatch();
+ if (!Preheader || !Latch)
+ return false;
+
+ Value *LoadPtr = Load->getPointerOperand();
+ // Must be available in preheader.
+ if (!L->isLoopInvariant(LoadPtr))
+ return false;
+
+ // We plan to hoist the load to preheader without introducing a new fault.
+ // In order to do it, we need to prove that we cannot side-exit the loop
+ // once loop header is first entered before execution of the load.
+ if (ICF->isDominatedByICFIFromSameBlock(Load))
+ return false;
+
+ BasicBlock *LoopBlock = nullptr;
+ for (auto *Blocker : UnavailableBlocks) {
+ // Blockers from outside the loop are handled in preheader.
+ if (!L->contains(Blocker))
+ continue;
+
+ // Only allow one loop block. Loop header is not less frequently executed
+ // than each loop block, and likely it is much more frequently executed. But
+ // in case of multiple loop blocks, we need extra information (such as block
+ // frequency info) to understand whether it is profitable to PRE into
+ // multiple loop blocks.
+ if (LoopBlock)
+ return false;
+
+ // Do not sink into inner loops. This may be non-profitable.
+ if (L != LI->getLoopFor(Blocker))
+ return false;
+
+ // Blocks that dominate the latch execute on every single iteration, maybe
+ // except the last one. So PREing into these blocks doesn't make much sense
+ // in most cases. But the blocks that do not necessarily execute on each
+ // iteration are sometimes much colder than the header, and this is when
+ // PRE is potentially profitable.
+ if (DT->dominates(Blocker, Latch))
+ return false;
+
+ // Make sure that the terminator itself doesn't clobber.
+ if (Blocker->getTerminator()->mayWriteToMemory())
+ return false;
+
+ LoopBlock = Blocker;
+ }
+
+ if (!LoopBlock)
+ return false;
+
+ // Make sure the memory at this pointer cannot be freed, therefore we can
+ // safely reload from it after clobber.
+ if (LoadPtr->canBeFreed())
+ return false;
+
+ // TODO: Support critical edge splitting if blocker has more than 1 successor.
+ MapVector<BasicBlock *, Value *> AvailableLoads;
+ AvailableLoads[LoopBlock] = LoadPtr;
+ AvailableLoads[Preheader] = LoadPtr;
+
+ LLVM_DEBUG(dbgs() << "GVN REMOVING PRE LOOP LOAD: " << *Load << '\n');
+ eliminatePartiallyRedundantLoad(Load, ValuesPerBlock, AvailableLoads);
+ ++NumPRELoopLoad;
+ return true;
+}
+
static void reportLoadElim(LoadInst *Load, Value *AvailableValue,
OptimizationRemarkEmitter *ORE) {
using namespace ore;
@@ -1544,7 +1623,8 @@ bool GVN::processNonLocalLoad(LoadInst *Load) {
if (!isLoadInLoopPREEnabled() && LI && LI->getLoopFor(Load->getParent()))
return Changed;
- return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
+ return Changed || PerformLoadPRE(Load, ValuesPerBlock, UnavailableBlocks) ||
+ performLoopLoadPRE(Load, ValuesPerBlock, UnavailableBlocks);
}
static bool impliesEquivalanceIfTrue(CmpInst* Cmp) {
diff --git a/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll b/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
index 15bc49e7ab9a..8ca12284d5c4 100644
--- a/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
+++ b/llvm/test/Transforms/GVN/PRE/pre-loop-load.ll
@@ -7,22 +7,25 @@ declare void @may_free_memory()
declare i32 @personality_function()
-; TODO: We can PRE the load from gc-managed memory away from the hot path.
+; We can PRE the load from gc-managed memory away from the hot path.
define i32 @test_load_on_cold_path_gc(i32 addrspace(1)* %p) gc "statepoint-example" personality i32 ()* @"personality_function" {
; CHECK-LABEL: @test_load_on_cold_path_gc(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[X_PRE1:%.*]] = load i32, i32 addrspace(1)* [[P:%.*]], align 4
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
-; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ]
-; CHECK-NEXT: [[X:%.*]] = load i32, i32 addrspace(1)* [[P:%.*]], align 4
+; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[X_PRE1]], [[ENTRY:%.*]] ], [ [[X2:%.*]], [[BACKEDGE:%.*]] ]
+; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[IV_NEXT:%.*]], [[BACKEDGE]] ]
; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[X]], 0
; CHECK-NEXT: br i1 [[COND]], label [[HOT_PATH:%.*]], label [[COLD_PATH:%.*]]
; CHECK: hot_path:
; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: cold_path:
; CHECK-NEXT: call void @may_free_memory()
+; CHECK-NEXT: [[X_PRE:%.*]] = load i32, i32 addrspace(1)* [[P]], align 4
; CHECK-NEXT: br label [[BACKEDGE]]
; CHECK: backedge:
+; CHECK-NEXT: [[X2]] = phi i32 [ [[X_PRE]], [[COLD_PATH]] ], [ [[X]], [[HOT_PATH]] ]
; CHECK-NEXT: [[IV_NEXT]] = add i32 [[IV]], [[X]]
; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp ult i32 [[IV_NEXT]], 1000
; CHECK-NEXT: br i1 [[LOOP_COND]], label [[LOOP]], label [[EXIT:%.*]]
</cut>
This bot failure appears to be unrelated to the fingered change. From
the commit log, I'd guess that 9eaf0d120 by Joel E. Denny
<jdenny.ornl(a)gmail.com> was the triggering change, but that's not in the
blame list.
Joel, FYI.
Linaro folks, as bot owner, you should investigate why the blame list is
wrong. JFYI, this is not the only bot failing with the wrong blame list,
so it might be a common problem.
Philip
On 6/25/21 10:43 AM, llvm.buildmaster(a)lab.llvm.org wrote:
> The Buildbot has detected a failed build on builder clang-armv7-quick while building llvm.
>
> Full details are available at:
> https://lab.llvm.org/buildbot#builders/171/builds/113
>
> Worker for this Build: linaro-clang-armv7-quick
> Blamelist:
> Philip Reames <listmail(a)philipreames.com>
>
> BUILD FAILED: failed 48104 expected passes 1 unexpected failures 72 expected failures Unexpected test result output SKIPPED 26089 unsupported tests (failure)
>
> Step 5 (ninja check 1) failure: 48104 expected passes 1 unexpected failures 72 expected failures Unexpected test result output SKIPPED 26089 unsupported tests (failure)
> ******************** TEST 'Clang :: utils/update_cc_test_checks/check-globals.test' FAILED ********************
> Script:
> --
> : 'RUN: at line 1'; rm -rf /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp && mkdir /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp
> : 'RUN: at line 5'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/check-globals.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c
> : 'RUN: at line 6'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py --clang /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang --opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c --check-globals
> : 'RUN: at line 7'; /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test --input-file=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c --match-full-lines -strict-whitespace -check-prefixes=BOTH,NRM
> : 'RUN: at line 10'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/check-globals.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c
> : 'RUN: at line 11'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py --clang /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang --opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c --check-globals --include-generated-funcs
> : 'RUN: at line 12'; /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test --input-file=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c --match-full-lines -strict-whitespace -check-prefixes=BOTH,IGF
> : 'RUN: at line 17'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c
> : 'RUN: at line 18'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py --clang /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang --opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c --check-globals
> : 'RUN: at line 19'; diff -u /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c
> : 'RUN: at line 20'; rm /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c
> : 'RUN: at line 22'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf-again.c
> : 'RUN: at line 23'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py --clang /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang --opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf-again.c --check-globals --include-generated-funcs
> : 'RUN: at line 25'; diff -u /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf-again.c
> : 'RUN: at line 26'; rm /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf-again.c
> : 'RUN: at line 31'; cp /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/lit.cfg.example /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/lit.cfg
> : 'RUN: at line 33'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/lit/lit.py -Dclang_obj_root=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang -j1 -vv /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp
> : 'RUN: at line 35'; /usr/bin/python3.6 /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/lit/lit.py -Dclang_obj_root=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang -j1 -vv /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp 2>&1 | /home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck -check-prefix=LIT-RUN /home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test
> --
> Exit Code: 1
>
> Command Output (stdout):
> --
> $ ":" "RUN: at line 1"
> $ "rm" "-rf" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp"
> $ "mkdir" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp"
> $ ":" "RUN: at line 5"
> $ "cp" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/check-globals.c" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c"
> $ ":" "RUN: at line 6"
> $ "/usr/bin/python3.6" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py" "--clang" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang" "--opt" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c" "--check-globals"
> # command stderr:
> NOTE: Executing non-FileChecked RUN line: true
>
> $ ":" "RUN: at line 7"
> $ "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test" "--input-file=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c" "--match-full-lines" "-strict-whitespace" "-check-prefixes=BOTH,NRM"
> $ ":" "RUN: at line 10"
> $ "cp" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/Inputs/check-globals.c" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c"
> $ ":" "RUN: at line 11"
> $ "/usr/bin/python3.6" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/llvm/utils/update_cc_test_checks.py" "--clang" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/clang" "--opt" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/opt" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c" "--check-globals" "--include-generated-funcs"
> # command stderr:
> NOTE: Executing non-FileChecked RUN line: true
>
> $ ":" "RUN: at line 12"
> $ "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/bin/FileCheck" "/home/tcwg-buildslave/worker/clang-armv7-quick/llvm/clang/test/utils/update_cc_test_checks/check-globals.test" "--input-file=/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/igf.c" "--match-full-lines" "-strict-whitespace" "-check-prefixes=BOTH,IGF"
> $ ":" "RUN: at line 17"
> $ "cp" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm.c" "/home/tcwg-buildslave/worker/clang-armv7-quick/stage1/tools/clang/test/utils/update_cc_test_checks/Output/check-globals.test.tmp/norm-again.c"
> $ ":" "RUN: at line 18"
> ...
>
> Sincerely,
> LLVM Buildbot
>
Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2_LTO
Culprit:
<cut>
commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6
Author: Richard Biener <rguenther(a)suse.de>
Date: Mon May 3 12:07:58 2021 +0200
Improve PHI handling in DSE
This improves handling of PHI defs when walking uses in
dse_classify_store to track two PHI defs. This happens
when there are CFG merges and one PHI feeds into another.
If we decide to want more then using a sbitmap for this might be
the way to go.
2021-05-03 Richard Biener <rguenther(a)suse.de>
* tree-ssa-dse.c (dse_classify_store): Track two PHI defs.
* gcc.dg/tree-ssa/ssa-dse-42.c: New testcase.
* gcc.dg/pr81192.c: Disable DSE.
</cut>
Results regressed to (for first_bad == 32955416d8040b1fa1ba21cd4179b3264e6c5bd6)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -O2_LTO_marm -- artifacts/build-32955416d8040b1fa1ba21cd4179b3264e6c5bd6/results_id:
1
# 483.xalancbmk,Xalan_base.default regressed by 103
from (for last_good == ed3c43224cc4e378dbab066122bc63536ccb1276)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -O2_LTO_marm -- artifacts/build-ed3c43224cc4e378dbab066122bc63536ccb1276/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Results ID of last_good: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O2_LTO/458
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Results ID of first_bad: tk1_32/tcwg_bmk_gnu_tk1/bisect-gnu-master-arm-spec2k6-O2_LTO/480
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-32955416d8040b1fa1ba21cd4179b3264e6c5bd6
cd investigate-gcc-32955416d8040b1fa1ba21cd4179b3264e6c5bd6
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
cd gcc
# Reproduce first_bad build
git checkout --detach 32955416d8040b1fa1ba21cd4179b3264e6c5bd6
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach ed3c43224cc4e378dbab066122bc63536ccb1276
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar…
Full commit (up to 1000 lines):
<cut>
commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6
Author: Richard Biener <rguenther(a)suse.de>
Date: Mon May 3 12:07:58 2021 +0200
Improve PHI handling in DSE
This improves handling of PHI defs when walking uses in
dse_classify_store to track two PHI defs. This happens
when there are CFG merges and one PHI feeds into another.
If we decide to want more then using a sbitmap for this might be
the way to go.
2021-05-03 Richard Biener <rguenther(a)suse.de>
* tree-ssa-dse.c (dse_classify_store): Track two PHI defs.
* gcc.dg/tree-ssa/ssa-dse-42.c: New testcase.
* gcc.dg/pr81192.c: Disable DSE.
---
gcc/testsuite/gcc.dg/pr81192.c | 4 +++-
gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c | 20 ++++++++++++++++++++
gcc/tree-ssa-dse.c | 23 ++++++++++++++---------
3 files changed, 37 insertions(+), 10 deletions(-)
diff --git a/gcc/testsuite/gcc.dg/pr81192.c b/gcc/testsuite/gcc.dg/pr81192.c
index 71bbc13a0e9..6cab6056558 100644
--- a/gcc/testsuite/gcc.dg/pr81192.c
+++ b/gcc/testsuite/gcc.dg/pr81192.c
@@ -1,4 +1,4 @@
-/* { dg-options "-Os -fdump-tree-pre-details -fdisable-tree-evrp" } */
+/* { dg-options "-Os -fdump-tree-pre-details -fdisable-tree-evrp -fno-tree-dse" } */
/* Disable tree-evrp because the new version of evrp sees
<bb 3> :
@@ -16,6 +16,8 @@ produces
# iftmp.2_12 = PHI <2147483647(3), iftmp.2_11(4)>
which causes the situation being tested to dissapear before we get to PRE. */
+/* Likewise disable DSE which also elides the tail merging "opportunity". */
+
#if __SIZEOF_INT__ == 2
#define unsigned __UINT32_TYPE__
#define int __INT32_TYPE__
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c
new file mode 100644
index 00000000000..34108c83828
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-42.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1" } */
+
+int a[2];
+void foo(int i, int k, int j)
+{
+ a[0] = i;
+ if (k)
+ a[0] = a[i] + k;
+ else
+ {
+ if (j)
+ a[1] = 1;
+ a[0] = a[i] + 3;
+ }
+ a[0] = 0;
+}
+
+/* The last stores to a[0] and a[1] remain. */
+/* { dg-final { scan-tree-dump-times " = " 2 "dse1" } } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index e0a944c704a..dfa6d314727 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -799,7 +799,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
return DSE_STORE_LIVE;
auto_vec<gimple *, 10> defs;
- gimple *phi_def = NULL;
+ gimple *first_phi_def = NULL;
+ gimple *last_phi_def = NULL;
FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
{
/* Limit stmt walking. */
@@ -825,7 +826,9 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
SSA_NAME_VERSION (PHI_RESULT (use_stmt))))
{
defs.safe_push (use_stmt);
- phi_def = use_stmt;
+ if (!first_phi_def)
+ first_phi_def = use_stmt;
+ last_phi_def = use_stmt;
}
}
/* If the statement is a use the store is not dead. */
@@ -889,6 +892,8 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
gimple *def = defs[i];
gimple *use_stmt;
use_operand_p use_p;
+ tree vdef = (gimple_code (def) == GIMPLE_PHI
+ ? gimple_phi_result (def) : gimple_vdef (def));
/* If the path to check starts with a kill we do not need to
process it further.
??? With byte tracking we need only kill the bytes currently
@@ -901,8 +906,7 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
}
/* If the path ends here we do not need to process it further.
This for example happens with calls to noreturn functions. */
- else if (gimple_code (def) != GIMPLE_PHI
- && has_zero_uses (gimple_vdef (def)))
+ else if (has_zero_uses (vdef))
{
/* But if the store is to global memory it is definitely
not dead. */
@@ -912,12 +916,13 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
}
/* In addition to kills we can remove defs whose only use
is another def in defs. That can only ever be PHIs of which
- we track a single for simplicity reasons (we fail for multiple
- PHIs anyways). We can also ignore defs that feed only into
+ we track two for simplicity reasons, the first and last in
+ {first,last}_phi_def (we fail for multiple PHIs anyways).
+ We can also ignore defs that feed only into
already visited PHIs. */
- else if (gimple_code (def) != GIMPLE_PHI
- && single_imm_use (gimple_vdef (def), &use_p, &use_stmt)
- && (use_stmt == phi_def
+ else if (single_imm_use (vdef, &use_p, &use_stmt)
+ && (use_stmt == first_phi_def
+ || use_stmt == last_phi_def
|| (gimple_code (use_stmt) == GIMPLE_PHI
&& bitmap_bit_p (visited,
SSA_NAME_VERSION
</cut>