After llvm commit fbc0c308d599fe3300ab6516650b65b41979446d
Author: Nikita Popov <nikita.ppv@gmail.com>
[BasicAA] Handle known bits as ranges
the following benchmarks slowed down by more than 2%:
- 464.h264ref slowed down by 7% from 10899 to 11610 perf samples
- 464.h264ref:libc.so.6 slowed down by 11% from 3538 to 3922 perf samples
The reproducer instructions below can be used to re-build both the "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -O2 -flto
- Hardware: NVidia TX1 4x Cortex-A57
This benchmarking CI is a work in progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org . We plan to add support for SPEC CPU2017 benchmarks and to provide "perf report/annotate" data behind these reports.
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a...
Reproduce builds:
<cut>
mkdir investigate-llvm-fbc0c308d599fe3300ab6516650b65b41979446d
cd investigate-llvm-fbc0c308d599fe3300ab6516650b65b41979446d
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a... --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a... --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-a... --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach fbc0c308d599fe3300ab6516650b65b41979446d
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 30a3652b6ade43504087f6e3acd8dc879055f501
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit fbc0c308d599fe3300ab6516650b65b41979446d
Author: Nikita Popov <nikita.ppv@gmail.com>
Date:   Mon Oct 25 15:47:21 2021 +0200
[BasicAA] Handle known bits as ranges
BasicAA currently tries to determine that the offset is positive by checking whether all variable indices are positive based on known bits, multiplied by a positive scale. However, this is incorrect if the scale multiplication might overflow. In the modified test case the original value is positive, but may be negative after a left shift.
Fix this by converting known bits into a constant range and reusing the range-based logic, which handles overflow correctly.
Differential Revision: https://reviews.llvm.org/D112611
---
 llvm/lib/Analysis/BasicAliasAnalysis.cpp           | 51 +++++-----------------
 .../test/Analysis/BasicAA/assume-index-positive.ll |  4 +-
 2 files changed, 12 insertions(+), 43 deletions(-)
diff --git a/llvm/lib/Analysis/BasicAliasAnalysis.cpp b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
index 0305732ca5d5..8cf947c43bf4 100644
--- a/llvm/lib/Analysis/BasicAliasAnalysis.cpp
+++ b/llvm/lib/Analysis/BasicAliasAnalysis.cpp
@@ -318,15 +318,6 @@ struct CastedValue {
     return N;
   }
-  KnownBits evaluateWith(KnownBits N) const {
-    assert(N.getBitWidth() == V->getType()->getPrimitiveSizeInBits() &&
-           "Incompatible bit width");
-    if (TruncBits) N = N.trunc(N.getBitWidth() - TruncBits);
-    if (SExtBits) N = N.sext(N.getBitWidth() + SExtBits);
-    if (ZExtBits) N = N.zext(N.getBitWidth() + ZExtBits);
-    return N;
-  }
-
   ConstantRange evaluateWith(ConstantRange N) const {
     assert(N.getBitWidth() == V->getType()->getPrimitiveSizeInBits() &&
            "Incompatible bit width");
@@ -1250,8 +1241,6 @@ AliasResult BasicAAResult::aliasGEP(
   if (!DecompGEP1.VarIndices.empty()) {
     APInt GCD;
-    bool AllNonNegative = DecompGEP1.Offset.isNonNegative();
-    bool AllNonPositive = DecompGEP1.Offset.isNonPositive();
     ConstantRange OffsetRange = ConstantRange(DecompGEP1.Offset);
     for (unsigned i = 0, e = DecompGEP1.VarIndices.size(); i != e; ++i) {
       const VariableGEPIndex &Index = DecompGEP1.VarIndices[i];
@@ -1266,24 +1255,19 @@ AliasResult BasicAAResult::aliasGEP(
       else
         GCD = APIntOps::GreatestCommonDivisor(GCD, ScaleForGCD.abs());
-      if (AllNonNegative || AllNonPositive) {
-        KnownBits Known = Index.Val.evaluateWith(
-            computeKnownBits(Index.Val.V, DL, 0, &AC, Index.CxtI, DT));
-        bool SignKnownZero = Known.isNonNegative();
-        bool SignKnownOne = Known.isNegative();
-        AllNonNegative &= (SignKnownZero && Scale.isNonNegative()) ||
-                          (SignKnownOne && Scale.isNonPositive());
-        AllNonPositive &= (SignKnownZero && Scale.isNonPositive()) ||
-                          (SignKnownOne && Scale.isNonNegative());
-      }
+      ConstantRange CR =
+          computeConstantRange(Index.Val.V, true, &AC, Index.CxtI);
+      KnownBits Known =
+          computeKnownBits(Index.Val.V, DL, 0, &AC, Index.CxtI, DT);
+      CR = CR.intersectWith(
+          ConstantRange::fromKnownBits(Known, /* Signed */ true),
+          ConstantRange::Signed);
       assert(OffsetRange.getBitWidth() == Scale.getBitWidth() &&
              "Bit widths are normalized to MaxPointerSize");
-      OffsetRange = OffsetRange.add(Index.Val
-                                        .evaluateWith(computeConstantRange(
-                                            Index.Val.V, true, &AC, Index.CxtI))
-                                        .sextOrTrunc(OffsetRange.getBitWidth())
-                                        .smul_fast(ConstantRange(Scale)));
+      OffsetRange = OffsetRange.add(
+          Index.Val.evaluateWith(CR).sextOrTrunc(OffsetRange.getBitWidth())
+              .smul_fast(ConstantRange(Scale)));
     }
     // We now have accesses at two offsets from the same base:
@@ -1300,21 +1284,6 @@ AliasResult BasicAAResult::aliasGEP(
         (GCD - ModOffset).uge(V1Size.getValue()))
       return AliasResult::NoAlias;
-    // If we know all the variables are non-negative, then the total offset is
-    // also non-negative and >= DecompGEP1.Offset. We have the following layout:
-    // [0, V2Size) ... [TotalOffset, TotalOffer+V1Size]
-    // If DecompGEP1.Offset >= V2Size, the accesses don't alias.
-    if (AllNonNegative && V2Size.hasValue() &&
-        DecompGEP1.Offset.uge(V2Size.getValue()))
-      return AliasResult::NoAlias;
-    // Similarly, if the variables are non-positive, then the total offset is
-    // also non-positive and <= DecompGEP1.Offset. We have the following layout:
-    // [TotalOffset, TotalOffset+V1Size) ... [0, V2Size)
-    // If -DecompGEP1.Offset >= V1Size, the accesses don't alias.
-    if (AllNonPositive && V1Size.hasValue() &&
-        (-DecompGEP1.Offset).uge(V1Size.getValue()))
-      return AliasResult::NoAlias;
-
     if (V1Size.hasValue() && V2Size.hasValue()) {
       // Compute ranges of potentially accessed bytes for both accesses. If the
       // interseciton is empty, there can be no overlap.
diff --git a/llvm/test/Analysis/BasicAA/assume-index-positive.ll b/llvm/test/Analysis/BasicAA/assume-index-positive.ll
index 451592067f4b..a53fff2c6009 100644
--- a/llvm/test/Analysis/BasicAA/assume-index-positive.ll
+++ b/llvm/test/Analysis/BasicAA/assume-index-positive.ll
@@ -130,12 +130,12 @@ define void @symmetry([0 x i8]* %ptr, i32 %a, i32 %b, i32 %c) {
   ret void
 }
-; TODO: %ptr.neg and %ptr.shl may alias, as the shl renders the previously
+; %ptr.neg and %ptr.shl may alias, as the shl renders the previously
 ; non-negative value potentially negative.
 define void @shl_of_non_negative(i8* %ptr, i64 %a) {
 ; CHECK-LABEL: Function: shl_of_non_negative
 ; CHECK: NoAlias: i8* %ptr.a, i8* %ptr.neg
-; CHECK: NoAlias: i8* %ptr.neg, i8* %ptr.shl
+; CHECK: MayAlias: i8* %ptr.neg, i8* %ptr.shl
   %a.cmp = icmp sge i64 %a, 0
   call void @llvm.assume(i1 %a.cmp)
   %ptr.neg = getelementptr i8, i8* %ptr, i64 -2
</cut>