Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os. So far, this commit has regressed CI configurations: - tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os
Culprit: <cut> commit ab97c9bdb747c873cd35a18229e2694156a7607d Author: David Green david.green@arm.com Date: Sat Dec 12 14:21:40 2020 +0000
[LV] Fix scalar cost for tail predicated loops
When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop.
Original patch by Anna Welker
Differential Revision: https://reviews.llvm.org/D86452 </cut>
Results regressed to (for first_bad == ab97c9bdb747c873cd35a18229e2694156a7607d) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -Os_mthumb -- artifacts/build-ab97c9bdb747c873cd35a18229e2694156a7607d/results_id: 1 # 401.bzip2,bzip2_base.default regressed by 103 # 401.bzip2,[.] BZ2_compressBlock regressed by 113 # 473.astar,astar_base.default regressed by 103
from (for last_good == d716eab197abec0b9aab4a76cd1a52b248b8c3b1) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -Os_mthumb -- artifacts/build-d716eab197abec0b9aab4a76cd1a52b248b8c3b1/results_id: 1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/1878 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/1876 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-...
Configuration details:
Reproduce builds: <cut> mkdir investigate-llvm-ab97c9bdb747c873cd35a18229e2694156a7607d cd investigate-llvm-ab97c9bdb747c873cd35a18229e2694156a7607d
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... --fail chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh) rsync -a --del --delete-excluded --exclude bisect/ --exclude artifacts/ --exclude llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build git checkout --detach ab97c9bdb747c873cd35a18229e2694156a7607d ../artifacts/test.sh
# Reproduce last_good build git checkout --detach d716eab197abec0b9aab4a76cd1a52b248b8c3b1 ../artifacts/test.sh
cd .. </cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/c...
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-...
Full commit (up to 1000 lines): <cut> commit ab97c9bdb747c873cd35a18229e2694156a7607d Author: David Green david.green@arm.com Date: Sat Dec 12 14:21:40 2020 +0000
[LV] Fix scalar cost for tail predicated loops
When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop.
Original patch by Anna Welker
Differential Revision: https://reviews.llvm.org/D86452 --- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 7 ++++--- llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index c381377b67c9..663ea50c4c02 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -6483,9 +6483,10 @@ LoopVectorizationCostModel::expectedCost(ElementCount VF) { // if-converted. This means that the block's instructions (aside from // stores and instructions that may divide by zero) will now be // unconditionally executed. For the scalar case, we may not always execute - // the predicated block. Thus, scale the block's cost by the probability of - // executing it. - if (VF.isScalar() && blockNeedsPredication(BB)) + // the predicated block, if it is an if-else block. Thus, scale the block's + // cost by the probability of executing it. blockNeedsPredication from + // Legal is used so as to not include all blocks in tail folded loops. + if (VF.isScalar() && Legal->blockNeedsPredication(BB)) BlockCost.first /= getReciprocalPredBlockProb();
Cost.first += BlockCost.first; diff --git a/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll b/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll index 959fbe676e67..fc8ea4fc938c 100644 --- a/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll +++ b/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll @@ -15,7 +15,7 @@ define void @pred_loop(i32* %off, i32* %data, i32* %dst, i32 %n) #0 { ; CHECK-COST-NEXT: LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %add1, i32* %arrayidx2, align 4 ; CHECK-COST-NEXT: LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond.not = icmp eq i32 %add, %n ; CHECK-COST-NEXT: LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond.not, label %exit.loopexit, label %for.body -; CHECK-COST-NEXT: LV: Scalar loop costs: 2. +; CHECK-COST-NEXT: LV: Scalar loop costs: 5.
entry: %cmp8 = icmp sgt i32 %n, 0 </cut>
Hi David,
Linaro’s benchmarking CI is churning through a backlog of old commits, and it appears that your patch increases code-size of a single function by 13% — from 5890 bytes to 6642 bytes — that’s on Thumb2 at -Os. Not a big deal, but may be it is exposing a case worth investigating.
The function is BZ2_compressBlock() from SPEC CPU2006’s 401.bzip2. Let us know if you are interested in investigating it, and we’ll provide details to reproduce.
Regards,
-- Maxim Kuvyrkov https://www.linaro.org
On 19 Jul 2021, at 00:39, ci_notify@linaro.org wrote:
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-Os
Culprit:
<cut> commit ab97c9bdb747c873cd35a18229e2694156a7607d Author: David Green <david.green@arm.com> Date: Sat Dec 12 14:21:40 2020 +0000
[LV] Fix scalar cost for tail predicated loops
When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop.
Original patch by Anna Welker
Differential Revision: https://reviews.llvm.org/D86452
</cut>
Results regressed to (for first_bad == ab97c9bdb747c873cd35a18229e2694156a7607d) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -Os_mthumb -- artifacts/build-ab97c9bdb747c873cd35a18229e2694156a7607d/results_id: 1 # 401.bzip2,bzip2_base.default regressed by 103 # 401.bzip2,[.] BZ2_compressBlock regressed by 113 # 473.astar,astar_base.default regressed by 103
from (for last_good == d716eab197abec0b9aab4a76cd1a52b248b8c3b1) # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -8 # build_abe linux: -7 # build_abe glibc: -6 # build_abe stage2 -- --set gcc_override_configure=--with-mode=thumb --set gcc_override_configure=--disable-libsanitizer: -5 # build_llvm true: -3 # true: 0 # benchmark -Os_mthumb -- artifacts/build-d716eab197abec0b9aab4a76cd1a52b248b8c3b1/results_id: 1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/1878 Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-Os/1876 Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-...
Configuration details:
Reproduce builds:
<cut> mkdir investigate-llvm-ab97c9bdb747c873cd35a18229e2694156a7607d cd investigate-llvm-ab97c9bdb747c873cd35a18229e2694156a7607d
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... --fail chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh) rsync -a --del --delete-excluded --exclude bisect/ --exclude artifacts/ --exclude llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build git checkout --detach ab97c9bdb747c873cd35a18229e2694156a7607d ../artifacts/test.sh
# Reproduce last_good build git checkout --detach d716eab197abec0b9aab4a76cd1a52b248b8c3b1 ../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/c...
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-... Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release-...
Full commit (up to 1000 lines):
<cut> commit ab97c9bdb747c873cd35a18229e2694156a7607d Author: David Green <david.green@arm.com> Date: Sat Dec 12 14:21:40 2020 +0000
[LV] Fix scalar cost for tail predicated loops
When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop.
Original patch by Anna Welker
Differential Revision: https://reviews.llvm.org/D86452
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 7 ++++--- llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index c381377b67c9..663ea50c4c02 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -6483,9 +6483,10 @@ LoopVectorizationCostModel::expectedCost(ElementCount VF) { // if-converted. This means that the block's instructions (aside from // stores and instructions that may divide by zero) will now be // unconditionally executed. For the scalar case, we may not always execute
- // the predicated block. Thus, scale the block's cost by the probability of
- // executing it.
- if (VF.isScalar() && blockNeedsPredication(BB))
// the predicated block, if it is an if-else block. Thus, scale the block's
// cost by the probability of executing it. blockNeedsPredication from
// Legal is used so as to not include all blocks in tail folded loops.
if (VF.isScalar() && Legal->blockNeedsPredication(BB)) BlockCost.first /= getReciprocalPredBlockProb();
Cost.first += BlockCost.first;
diff --git a/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll b/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll index 959fbe676e67..fc8ea4fc938c 100644 --- a/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll +++ b/llvm/test/Transforms/LoopVectorize/ARM/scalar-block-cost.ll @@ -15,7 +15,7 @@ define void @pred_loop(i32* %off, i32* %data, i32* %dst, i32 %n) #0 { ; CHECK-COST-NEXT: LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %add1, i32* %arrayidx2, align 4 ; CHECK-COST-NEXT: LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond.not = icmp eq i32 %add, %n ; CHECK-COST-NEXT: LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond.not, label %exit.loopexit, label %for.body -; CHECK-COST-NEXT: LV: Scalar loop costs: 2. +; CHECK-COST-NEXT: LV: Scalar loop costs: 5.
entry: %cmp8 = icmp sgt i32 %n, 0
</cut>
linaro-toolchain@lists.linaro.org