Hi Peter,
Welcome back, hope you had a good Christmas break. I'm off on holiday myself for the next
two weeks, so this would be an ideal time to pass merge control back to you.
The board is mostly green now, with occasional allowed failures on centos-stream and
freebsd caused by upstream package manager failures.
See y'all in a couple of weeks.
r~
[UM-2]
* Re-greening of gitlab-ci.
- There are continuing issues with cross-i386-tci.
Occasionally I see *really* long test times:
https://gitlab.com/qemu-project/qemu/-/jobs/1941996332
with qtest-aarch64/qom-test taking 1738s, or about 29 of the 60 minute budget.
More often it's merely slow:
https://gitlab.com/qemu-project/qemu/-/jobs/1954634840
with qtest-aarch64/qom-test taking 538s. Note that locally this test
runs in about 100s, and I have been unable to determine why it runs so
much slower on gitlab.
- Worked on a ppc64-softmmu slowdown leading to timeouts.
- Fixes for meson regressions affecting testing.
* Refresh tcg unaligned user patch sets.
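
For scale, the qtest-aarch64/qom-test timings quoted above can be put side by side with a few lines of Python. This is only an arithmetic sketch: the 100s local runtime is the approximate figure from the report, and the names `BUDGET_S`, `LOCAL_S` and `describe` are made up for illustration.

```python
# Figures taken from the report above, in seconds.
BUDGET_S = 60 * 60   # 60 minute job budget
LOCAL_S = 100        # approximate local runtime of qtest-aarch64/qom-test

def describe(runtime_s):
    """Return (minutes used, slowdown factor versus the local run)."""
    return runtime_s / 60, runtime_s / LOCAL_S

for label, runtime_s in (("worst case", 1738), ("merely slow", 538)):
    minutes, factor = describe(runtime_s)
    share = 100 * runtime_s / BUDGET_S  # percentage of the job budget
    print(f"{label}: {minutes:.1f} min, {share:.0f}% of budget, {factor:.1f}x local")
```

Even the "merely slow" case is more than five times the local runtime, which is why a single test can dominate the job.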
r~
Progress (short week, 2 days):
* UM-2 [QEMU upstream maintainership]
- Catching up with email and codereview backlog from 3 weeks holiday :-)
(Have got the codereview queue down to less than a dozen things
so should be able to do some more GICv4 development next week.)
-- PMM
Project Stratos
===============
- got Xen working on the MachiatoBin
- posted Configuring the host GIC for guest to guest IPI Message-Id:
<87fsqwn2sd.fsf(a)linaro.org>
QEMU Upstream Work ([UM-2])
===========================
- posted [RFC PATCH] linux-user: don't adjust base of found hole
Message-Id: <20211216144442.2270605-1-alex.bennee(a)linaro.org>
- posted [PATCH] hw/arm: add control knob to disable kaslr_seed via
DTB Message-Id: <20211215120926.1696302-1-alex.bennee(a)linaro.org>
Completed Reviews [3/3]
=======================
[PATCH 00/26] arm gicv3 ITS: Various bug fixes and refactorings
Message-Id: <20211211191135.1764649-1-peter.maydell(a)linaro.org>
[PATCH for-7.0 0/6] target/arm: Implement LVA, LPA, LPA2 features
Message-Id: <20211208231154.392029-1-richard.henderson(a)linaro.org>
[PATCH-for-6.2? v2 0/5] docs/devel/style: Improve rST rendering
Message-Id: <20211118145716.4116731-1-philmd(a)redhat.com>
Absences
========
Off for holidays, back in the new year. Merry Christmas everyone!
--
Alex Bennée
Project Stratos
===============
- posted Potential demo setup for a TSN/XDP networking Message-Id:
<87wnkfkp2f.fsf(a)linaro.org>
- final Stratos call of the year
- CC and Arnd will look at fat virtq
- nice update from EPAM on Zephyr
- had another round of getting ACPI working on MachiatoBin
- posted [PR to clean up some typos in EDK2]
- might have a working Xen setup without needing SMC hacks
[PR to clean up some typos in EDK2]
<https://github.com/tianocore/edk2-platforms/pull/34>
vhost-device maintainer effort ([UM-196])
- finished review of https://github.com/rust-vmm/vhost-device/pull/4
[UM-196] <https://linaro.atlassian.net/browse/UM-196>
QEMU Upstream Work ([UM-2])
===========================
- discussion around Suggestions for TCG performance improvements
Message-Id: <c76bde31-8f3b-2d03-b7c7-9e026d4b5873(a)huawei.com>
- did a bunch of bug triage and tagging
[UM-2] <https://linaro.atlassian.net/browse/UM-2>
Upstream MTTCG tests ([QEMU-52])
- awaiting final review of [kvm-unit-tests PATCH v9 0/9] MTTCG sanity
tests for ARM Message-Id:
<20211202115352.951548-1-alex.bennee(a)linaro.org>
[QEMU-52] <https://linaro.atlassian.net/browse/QEMU-52>
Completed Reviews [3/3]
=======================
[PATCH] tests/plugin/syscall.c: fix compiler warnings
Message-Id: <20211128011551.2115468-1-juro.bystricky(a)intel.com>
[PATCH for-6.2? 0/2] arm_gicv3: Fix handling of LPIs in list registers
Message-Id: <20211126163915.1048353-2-peter.maydell(a)linaro.org>
[PATCH] tests/docker: add libfuse3 development headers
Message-Id: <20211207160025.52466-1-stefanha(a)redhat.com>
Absences
========
Current Review Queue
====================
TODO [PATCH 0/8] virtio: Add vhost-user based Video decode
Message-Id: <20211209145601.331477-1-peter.griffin(a)linaro.org>
TODO [PATCH for-7.0 0/6] target/arm: Implement LVA, LPA, LPA2 features
Message-Id: <20211208231154.392029-1-richard.henderson(a)linaro.org>
TODO [PATCH-4.16 v2] xen/efi: Fix Grub2 boot on arm64
Message-Id: <20211104141206.25153-1-luca.fancellu(a)arm.com>
TODO [PATCH 00/16] fdt: Make OF_BOARD a boolean option
Message-Id: <20211013010120.96851-1-sjg(a)chromium.org>
--
Alex Bennée
Progress:
* UM-2 [QEMU upstream maintainership]
- More code review: now have a target-arm.next poised and ready to
send once 6.2 is released
* QEMU-420 [GICv4 emulation]
- Working on the ITS changes needed for GICv4 support (this turns
out to be a more tractable end to start from than the redistributor)
- I have a preliminary set of 25 or so patches to the ITS which
clean up the code and fix some pre-existing bugs that I found
while working on the GICv4 changes
- have implemented the new VMAPI, VMAPTI, VMAPP ITS commands
-- PMM
After llvm commit 3d549dddf75b6ff9e0ec8c053677750bde4226ea
Author: Sander de Smalen <sander.desmalen(a)arm.com>
[LV] Pass compare predicate to getCmpSelInstrCost.
the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 11115 to 11846 perf samples
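
As a sanity check on the figure above, the slowdown can be recomputed from the two sample counts. This sketch assumes the percentage is the relative change in perf samples; the function name `slowdown_percent` is illustrative and is not taken from the CI scripts.

```python
def slowdown_percent(before, after):
    """Relative increase in perf samples, as a percentage of the 'before' count."""
    return 100.0 * (after - before) / before

# Sample counts for 464.h264ref quoted in the report.
pct = slowdown_percent(11115, 11846)
print(f"464.h264ref: {pct:.1f}% slower")

# The report flags benchmarks that slow down by more than 2%.
print("over threshold:", pct > 2.0)
```

The unrounded value is a little under 7%, which matches the rounded figure in the report.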
The reproducer instructions below can be used to rebuild both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org. Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
Reproduce builds:
<cut>
mkdir investigate-llvm-3d549dddf75b6ff9e0ec8c053677750bde4226ea
cd investigate-llvm-3d549dddf75b6ff9e0ec8c053677750bde4226ea
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 3d549dddf75b6ff9e0ec8c053677750bde4226ea
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach ab31d003e16e483bff298ea2f28fec0f23e8eb79
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 3d549dddf75b6ff9e0ec8c053677750bde4226ea
Author: Sander de Smalen <sander.desmalen(a)arm.com>
Date: Mon Dec 6 11:14:27 2021 +0000
[LV] Pass compare predicate to getCmpSelInstrCost.
If the condition of a select is a compare, pass its predicate to
TTI::getCmpSelInstrCost to get a more accurate cost value instead
of passing BAD_ICMP_PREDICATE.
I noticed that the commit message from D90070 had a comment about the
vectorized select predicate possibly being composed of other compares with
different predicate values, but I wasn't able to construct an example
where this was an actual issue. If this is an issue, I guess we could
add another check that the block isn't predicated for any reason.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D114646
---
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 11 ++++++++---
llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll | 14 +++++++-------
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 050879144afd..c03e506b7474 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7570,8 +7570,12 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Type *CondTy = SI->getCondition()->getType();
if (!ScalarCond)
CondTy = VectorType::get(CondTy, VF);
- return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy,
- CmpInst::BAD_ICMP_PREDICATE, CostKind, I);
+
+ CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
+ if (auto *Cmp = dyn_cast<CmpInst>(SI->getCondition()))
+ Pred = Cmp->getPredicate();
+ return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy, Pred,
+ CostKind, I);
}
case Instruction::ICmp:
case Instruction::FCmp: {
@@ -7581,7 +7585,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
ValTy = IntegerType::get(ValTy->getContext(), MinBWs[Op0AsInstruction]);
VectorTy = ToVectorTy(ValTy, VF);
return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, nullptr,
- CmpInst::BAD_ICMP_PREDICATE, CostKind, I);
+ cast<CmpInst>(I)->getPredicate(), CostKind,
+ I);
}
case Instruction::Store:
case Instruction::Load: {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
index 62b18f44fbc5..20d2dc0b7cda 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/select-costs.ll
@@ -5,17 +5,17 @@ target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-ios5.0.0"
define void @selects_1(i32* nocapture %dst, i32 %A, i32 %B, i32 %C, i32 %N) {
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
-; CHECK: LV: Found an estimated cost of 5 for VF 2 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
-; CHECK: LV: Found an estimated cost of 13 for VF 4 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond = select i1 %cmp1, i32 10, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond6 = select i1 %cmp2, i32 30, i32 %and
+; CHECK: LV: Found an estimated cost of 1 for VF 4 For instruction: %cond11 = select i1 %cmp7, i32 %cond, i32 %cond6
; CHECK-LABEL: define void @selects_1(
; CHECK: vector.body:
-; CHECK: select <2 x i1>
+; CHECK: select <4 x i1>
entry:
%cmp26 = icmp sgt i32 %N, 0
</cut>
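
The predicate selection described in the commit message above can be modelled in a few lines. This is a toy Python sketch, not LLVM code: the `Cmp` and `Select` classes and the `predicate_for_cost` helper are hypothetical stand-ins, and the string constant only mimics `CmpInst::BAD_ICMP_PREDICATE`.

```python
from dataclasses import dataclass

# Stand-in for CmpInst::BAD_ICMP_PREDICATE (illustrative only).
BAD_ICMP_PREDICATE = "BAD_ICMP_PREDICATE"

@dataclass
class Cmp:
    predicate: str      # e.g. "ICMP_SGT"

@dataclass
class Select:
    condition: object   # a Cmp, or any other kind of value

def predicate_for_cost(select):
    """Use the compare's predicate when the select condition is a compare,
    otherwise fall back to the 'bad predicate' placeholder."""
    if isinstance(select.condition, Cmp):
        return select.condition.predicate
    return BAD_ICMP_PREDICATE

print(predicate_for_cost(Select(Cmp("ICMP_SGT"))))   # condition is a compare
print(predicate_for_cost(Select("some_i1_value")))   # condition is not a compare
```

Before the change, the cost query for a select always received the placeholder, which is the second case here.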
Dear Linaro Toolchain Working Group,
clang-thumbv7-full-2stage has been red for 20 days.
Could you take it to the staging area and make it green again, please?
Thanks
Galina