Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3
Culprit:
<cut>
commit 6998f8ae2d14e096aff33968f226587b5c1a193a
Author: David Sherwood <david.sherwood(a)arm.com>
Date: Wed Mar 10 08:34:19 2021 +0000
[LoopVectorize] Simplify scalar cost calculation in getInstructionCost

This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

  1. VF is a scalar, in which case N = 1.
  2. VF is a vector. We can only get here if: a) the instruction is a
     GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
     variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
</cut>
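For readers skimming the culprit, the effect of the change on a single cost query can be sketched as follows. This is only an illustration under stated assumptions: `costBeforePatch`, `costAfterPatch` and the unit instruction cost are hypothetical stand-ins, not LLVM functions or values taken from the cost tables.

```cpp
// Hypothetical stand-ins for illustration; these are not LLVM functions.
// `instrCost` plays the role of the value returned by the TTI cost query.

// Before the patch: an instruction that stays scalar after vectorization
// had its cost multiplied by N = VF (N = 1 otherwise).
constexpr unsigned costBeforePatch(unsigned instrCost, unsigned vf,
                                   bool scalarAfterVectorization) {
  return (scalarAfterVectorization ? vf : 1u) * instrCost;
}

// After the patch: N is assumed to always be 1, so the cost is returned as is.
constexpr unsigned costAfterPatch(unsigned instrCost) { return instrCost; }

// A scalar add at VF = 2, assuming a unit cost for the add itself.
static_assert(costBeforePatch(1, 2, true) == 2, "old model charges VF copies");
static_assert(costAfterPatch(1) == 1, "new model charges a single copy");
```

A lower estimated cost for the scalar parts of a loop can change which vectorization factor the cost model prefers, which is one plausible way such a change could show up in benchmark results; the report itself does not establish the mechanism.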
Results regressed to (for first_bad == 6998f8ae2d14e096aff33968f226587b5c1a193a)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-6998f8ae2d14e096aff33968f226587b5c1a193a/results_id:
1
# 462.libquantum,libquantum_base.default regressed by 114
# 462.libquantum,[.] quantum_toffoli regressed by 123
# 462.libquantum,[.] quantum_cnot regressed by 115
from (for last_good == c835630c25a4f9925517949579f66a43b113fbc9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-c835630c25a4f9925517949579f66a43b113fbc9/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3/3744
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3/3755
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-6998f8ae2d14e096aff33968f226587b5c1a193a
cd investigate-llvm-6998f8ae2d14e096aff33968f226587b5c1a193a
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 6998f8ae2d14e096aff33968f226587b5c1a193a
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach c835630c25a4f9925517949579f66a43b113fbc9
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 6998f8ae2d14e096aff33968f226587b5c1a193a
Author: David Sherwood <david.sherwood(a)arm.com>
Date: Wed Mar 10 08:34:19 2021 +0000
[LoopVectorize] Simplify scalar cost calculation in getInstructionCost

This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

  1. VF is a scalar, in which case N = 1.
  2. VF is a vector. We can only get here if: a) the instruction is a
     GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
     variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
---
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 73 +++++++++++++---------
.../AArch64/no_vector_instructions.ll | 2 +-
.../LoopVectorize/AArch64/predication_costs.ll | 35 +++++++++++
.../Transforms/LoopVectorize/scalarized-bitcast.ll | 40 ++++++++++++
4 files changed, 121 insertions(+), 29 deletions(-)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 2b413fc49505..f25af23c86c2 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7383,10 +7383,39 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Type *RetTy = I->getType();
if (canTruncateToMinimalBitwidth(I, VF))
RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);
- VectorTy = isScalarAfterVectorization(I, VF) ? RetTy : ToVectorTy(RetTy, VF);
auto SE = PSE.getSE();
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+ auto hasSingleCopyAfterVectorization = [this](Instruction *I,
+ ElementCount VF) -> bool {
+ if (VF.isScalar())
+ return true;
+
+ auto Scalarized = InstsToScalarize.find(VF);
+ assert(Scalarized != InstsToScalarize.end() &&
+ "VF not yet analyzed for scalarization profitability");
+ return !Scalarized->second.count(I) &&
+ llvm::all_of(I->users(), [&](User *U) {
+ auto *UI = cast<Instruction>(U);
+ return !Scalarized->second.count(UI);
+ });
+ };
+
+ if (isScalarAfterVectorization(I, VF)) {
+ // With the exception of GEPs and PHIs, after scalarization there should
+ // only be one copy of the instruction generated in the loop. This is
+ // because the VF is either 1, or any instructions that need scalarizing
+ // have already been dealt with by the the time we get here. As a result,
+ // it means we don't have to multiply the instruction cost by VF.
+ assert(I->getOpcode() == Instruction::GetElementPtr ||
+ I->getOpcode() == Instruction::PHI ||
+ (I->getOpcode() == Instruction::BitCast &&
+ I->getType()->isPointerTy()) ||
+ hasSingleCopyAfterVectorization(I, VF));
+ VectorTy = RetTy;
+ } else
+ VectorTy = ToVectorTy(RetTy, VF);
+
// TODO: We need to estimate the cost of intrinsic calls.
switch (I->getOpcode()) {
case Instruction::GetElementPtr:
@@ -7514,21 +7543,16 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
Op2VK = TargetTransformInfo::OK_UniformValue;
SmallVector<const Value *, 4> Operands(I->operand_values());
- unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
- return N * TTI.getArithmeticInstrCost(
- I->getOpcode(), VectorTy, CostKind,
- TargetTransformInfo::OK_AnyValue,
- Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
+ return TTI.getArithmeticInstrCost(
+ I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
+ Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
}
case Instruction::FNeg: {
assert(!VF.isScalable() && "VF is assumed to be non scalable.");
- unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
- return N * TTI.getArithmeticInstrCost(
- I->getOpcode(), VectorTy, CostKind,
- TargetTransformInfo::OK_AnyValue,
- TargetTransformInfo::OK_AnyValue,
- TargetTransformInfo::OP_None, TargetTransformInfo::OP_None,
- I->getOperand(0), I);
+ return TTI.getArithmeticInstrCost(
+ I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
+ TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None,
+ TargetTransformInfo::OP_None, I->getOperand(0), I);
}
case Instruction::Select: {
SelectInst *SI = cast<SelectInst>(I);
@@ -7583,6 +7607,10 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
VectorTy = ToVectorTy(getMemInstValueType(I), Width);
return getMemoryInstructionCost(I, VF);
}
+ case Instruction::BitCast:
+ if (I->getType()->isPointerTy())
+ return 0;
+ LLVM_FALLTHROUGH;
case Instruction::ZExt:
case Instruction::SExt:
case Instruction::FPToUI:
@@ -7593,8 +7621,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
case Instruction::SIToFP:
case Instruction::UIToFP:
case Instruction::Trunc:
- case Instruction::FPTrunc:
- case Instruction::BitCast: {
+ case Instruction::FPTrunc: {
// Computes the CastContextHint from a Load/Store instruction.
auto ComputeCCH = [&](Instruction *I) -> TTI::CastContextHint {
assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
@@ -7672,14 +7699,7 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
}
}
- unsigned N;
- if (isScalarAfterVectorization(I, VF)) {
- assert(!VF.isScalable() && "VF is assumed to be non scalable");
- N = VF.getKnownMinValue();
- } else
- N = 1;
- return N *
- TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
+ return TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
}
case Instruction::Call: {
bool NeedToScalarize;
@@ -7694,11 +7714,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
case Instruction::ExtractValue:
return TTI.getInstructionCost(I, TTI::TCK_RecipThroughput);
default:
- // The cost of executing VF copies of the scalar instruction. This opcode
- // is unknown. Assume that it is the same as 'mul'.
- return VF.getKnownMinValue() * TTI.getArithmeticInstrCost(
- Instruction::Mul, VectorTy, CostKind) +
- getScalarizationOverhead(I, VF);
+ // This opcode is unknown. Assume that it is the same as 'mul'.
+ return TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy, CostKind);
} // end of switch.
}
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll b/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
index 247ea35ff5d0..3061998518ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/no_vector_instructions.ll
@@ -6,7 +6,7 @@ target triple = "aarch64--linux-gnu"
; CHECK-LABEL: all_scalar
; CHECK: LV: Found scalar instruction: %i.next = add nuw nsw i64 %i, 2
-; CHECK: LV: Found an estimated cost of 2 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
+; CHECK: LV: Found an estimated cost of 1 for VF 2 For instruction: %i.next = add nuw nsw i64 %i, 2
; CHECK: LV: Not considering vector loop of width 2 because it will not generate any vector instructions
;
define void @all_scalar(i64* %a, i64 %n) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
index b0ebb4edf2ad..858b28ddd321 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/predication_costs.ll
@@ -86,6 +86,41 @@ for.end:
ret void
}
+; CHECK-LABEL: predicated_store_phi
+;
+; Same as predicate_store except we use a pointer PHI to maintain the address
+;
+; CHECK: Found new scalar instruction: %addr = phi i32* [ %a, %entry ], [ %addr.next, %for.inc ]
+; CHECK: Found new scalar instruction: %addr.next = getelementptr inbounds i32, i32* %addr, i64 1
+; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %addr, align 4
+; CHECK: Found an estimated cost of 0 for VF 2 For instruction: %addr = phi i32* [ %a, %entry ], [ %addr.next, %for.inc ]
+; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, i32* %addr, align 4
+;
+define void @predicated_store_phi(i32* %a, i1 %c, i32 %x, i64 %n) {
+entry:
+ br label %for.body
+
+for.body:
+ %i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
+ %addr = phi i32 * [ %a, %entry ], [ %addr.next, %for.inc ]
+ %tmp1 = load i32, i32* %addr, align 4
+ %tmp2 = add nsw i32 %tmp1, %x
+ br i1 %c, label %if.then, label %for.inc
+
+if.then:
+ store i32 %tmp2, i32* %addr, align 4
+ br label %for.inc
+
+for.inc:
+ %i.next = add nuw nsw i64 %i, 1
+ %cond = icmp slt i64 %i.next, %n
+ %addr.next = getelementptr inbounds i32, i32* %addr, i64 1
+ br i1 %cond, label %for.body, label %for.end
+
+for.end:
+ ret void
+}
+
; CHECK-LABEL: predicated_udiv_scalarized_operand
;
; This test checks that we correctly compute the cost of the predicated udiv
diff --git a/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll b/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll
new file mode 100644
index 000000000000..0c97e6ac475e
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/scalarized-bitcast.ll
@@ -0,0 +1,40 @@
+; REQUIRES: asserts
+; RUN: opt -loop-vectorize -force-vector-width=2 -debug-only=loop-vectorize -S -o - < %s 2>&1 | FileCheck %s
+
+%struct.foo = type { i32, i64 }
+
+; CHECK: LV: Found an estimated cost of 0 for VF 2 For instruction: %0 = bitcast i64* %b to i32*
+
+; The bitcast below will be scalarized due to the predication in the loop. Bitcasts
+; between pointer types should be treated as free, despite the scalarization.
+define void @foo(%struct.foo* noalias nocapture %in, i32* noalias nocapture readnone %out, i64 %n) {
+entry:
+ br label %for.body
+
+for.body: ; preds = %entry, %if.end
+ %i.012 = phi i64 [ %inc, %if.end ], [ 0, %entry ]
+ %b = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 1
+ %0 = bitcast i64* %b to i32*
+ %a = getelementptr inbounds %struct.foo, %struct.foo* %in, i64 %i.012, i32 0
+ %1 = load i32, i32* %a, align 8
+ %tobool.not = icmp eq i32 %1, 0
+ br i1 %tobool.not, label %if.end, label %land.lhs.true
+
+land.lhs.true: ; preds = %for.body
+ %2 = load i32, i32* %0, align 4
+ %cmp2 = icmp sgt i32 %2, 0
+ br i1 %cmp2, label %if.then, label %if.end
+
+if.then: ; preds = %land.lhs.true
+ %sub = add nsw i32 %2, -1
+ store i32 %sub, i32* %0, align 4
+ br label %if.end
+
+if.end: ; preds = %if.then, %land.lhs.true, %for.body
+ %inc = add nuw nsw i64 %i.012, 1
+ %exitcond.not = icmp eq i64 %inc, %n
+ br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end: ; preds = %if.end
+ ret void
+}
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTO
Culprit:
<cut>
commit 4aafd5f00c2a772337ec065d4542ef158453a343
Author: Jan Svoboda <jan_svoboda(a)apple.com>
Date: Fri Aug 6 14:46:41 2021 +0200
[clang] Remove misleading assertion in FullSourceLoc

D31709 added an assertion to `FullSourceLoc::hasManager()` that ensured a valid `SourceLocation` is always paired with a `SourceManager`, and missing `SourceManager` is always paired with an invalid `SourceLocation`.

This appears to be incorrect, since clients never cared about constructing `FullSourceLoc` to uphold that invariant, or always checking `isValid()` before calling `hasManager()`.

The assertion started failing when serializing diagnostics pointing into an explicit module. Explicit modules don't have valid `SourceLocation` for the `import` statement, since they are "imported" from the command-line argument `-fmodule-name=x.pcm`.

This patch removes the assertion, since `FullSourceLoc` was never intended to uphold any kind of invariants between the validity of `SourceLocation` and presence of `SourceManager`.

Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D106862
</cut>
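The relaxed contract described in the culprit can be illustrated with a small self-contained sketch. The types `Manager`, `Loc` and `FullLoc` and the helper `invalidLocWithManagerIsAccepted` are simplified, hypothetical stand-ins, not the clang classes; the sketch only shows that, once the assertion is gone, `hasManager()` no longer depends on the validity of the location.

```cpp
#include <cassert>

// Simplified, hypothetical stand-ins; not the real clang classes.
struct Manager {};

struct Loc {
  unsigned id = 0; // 0 marks an invalid location
  bool isValid() const { return id != 0; }
};

class FullLoc : public Loc {
  const Manager *mgr = nullptr;

public:
  FullLoc() = default;
  FullLoc(Loc loc, const Manager &m) : Loc(loc), mgr(&m) {}

  // After the patch: reports only whether a manager is present.
  bool hasManager() const { return mgr != nullptr; }

  // Precondition: hasManager().
  const Manager &getManager() const {
    assert(mgr && "manager is null");
    return *mgr;
  }
};

// Builds an invalid location paired with a manager, the combination the
// commit message describes for explicit modules, and returns true when
// that pairing is accepted.
inline bool invalidLocWithManagerIsAccepted() {
  Manager m;
  FullLoc loc(Loc{}, m);
  return loc.hasManager() && !loc.isValid();
}
```

Callers are expected to check `isValid()` and `hasManager()` separately, as the comment added by the patch states.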
Results regressed to (for first_bad == 4aafd5f00c2a772337ec065d4542ef158453a343)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_LTO artifacts/build-4aafd5f00c2a772337ec065d4542ef158453a343/results_id:
1
# 470.lbm,lbm_base.default regressed by 104
from (for last_good == 3709822d2602b8b7db2d9bcc0e856f676582f25d)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -Oz_LTO artifacts/build-3709822d2602b8b7db2d9bcc0e856f676582f25d/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of last_good: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Oz_LTO/3746
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Results ID of first_bad: apm_64/tcwg_bmk_llvm_apm/bisect-llvm-master-aarch64-spec2k6-Oz_LTO/3725
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-4aafd5f00c2a772337ec065d4542ef158453a343
cd investigate-llvm-4aafd5f00c2a772337ec065d4542ef158453a343
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 4aafd5f00c2a772337ec065d4542ef158453a343
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 3709822d2602b8b7db2d9bcc0e856f676582f25d
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-…
Full commit (up to 1000 lines):
<cut>
commit 4aafd5f00c2a772337ec065d4542ef158453a343
Author: Jan Svoboda <jan_svoboda(a)apple.com>
Date: Fri Aug 6 14:46:41 2021 +0200
[clang] Remove misleading assertion in FullSourceLoc

D31709 added an assertion to `FullSourceLoc::hasManager()` that ensured a valid `SourceLocation` is always paired with a `SourceManager`, and missing `SourceManager` is always paired with an invalid `SourceLocation`.

This appears to be incorrect, since clients never cared about constructing `FullSourceLoc` to uphold that invariant, or always checking `isValid()` before calling `hasManager()`.

The assertion started failing when serializing diagnostics pointing into an explicit module. Explicit modules don't have valid `SourceLocation` for the `import` statement, since they are "imported" from the command-line argument `-fmodule-name=x.pcm`.

This patch removes the assertion, since `FullSourceLoc` was never intended to uphold any kind of invariants between the validity of `SourceLocation` and presence of `SourceManager`.

Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D106862
---
clang/include/clang/Basic/SourceLocation.h | 13 +++++++------
clang/test/Modules/Inputs/explicit-build-diags/a.h | 1 +
.../Modules/Inputs/explicit-build-diags/module.modulemap | 1 +
clang/test/Modules/explicit-build-diags.cpp | 8 ++++++++
4 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/clang/include/clang/Basic/SourceLocation.h b/clang/include/clang/Basic/SourceLocation.h
index 540de23b9f55..ba2e9156a2b1 100644
--- a/clang/include/clang/Basic/SourceLocation.h
+++ b/clang/include/clang/Basic/SourceLocation.h
@@ -363,6 +363,10 @@ class FileEntry;
/// A SourceLocation and its associated SourceManager.
///
/// This is useful for argument passing to functions that expect both objects.
+///
+/// This class does not guarantee the presence of either the SourceManager or
+/// a valid SourceLocation. Clients should use `isValid()` and `hasManager()`
+/// before calling the member functions.
class FullSourceLoc : public SourceLocation {
const SourceManager *SrcMgr = nullptr;
@@ -373,13 +377,10 @@ public:
explicit FullSourceLoc(SourceLocation Loc, const SourceManager &SM)
: SourceLocation(Loc), SrcMgr(&SM) {}
- bool hasManager() const {
- bool hasSrcMgr = SrcMgr != nullptr;
- assert(hasSrcMgr == isValid() && "FullSourceLoc has location but no manager");
- return hasSrcMgr;
- }
+ /// Checks whether the SourceManager is present.
+ bool hasManager() const { return SrcMgr != nullptr; }
- /// \pre This FullSourceLoc has an associated SourceManager.
+ /// \pre hasManager()
const SourceManager &getManager() const {
assert(SrcMgr && "SourceManager is NULL.");
return *SrcMgr;
diff --git a/clang/test/Modules/Inputs/explicit-build-diags/a.h b/clang/test/Modules/Inputs/explicit-build-diags/a.h
new file mode 100644
index 000000000000..486941dde83b
--- /dev/null
+++ b/clang/test/Modules/Inputs/explicit-build-diags/a.h
@@ -0,0 +1 @@
+void a() __attribute__((deprecated));
diff --git a/clang/test/Modules/Inputs/explicit-build-diags/module.modulemap b/clang/test/Modules/Inputs/explicit-build-diags/module.modulemap
new file mode 100644
index 000000000000..bb00c840ce39
--- /dev/null
+++ b/clang/test/Modules/Inputs/explicit-build-diags/module.modulemap
@@ -0,0 +1 @@
+module a { header "a.h" }
diff --git a/clang/test/Modules/explicit-build-diags.cpp b/clang/test/Modules/explicit-build-diags.cpp
new file mode 100644
index 000000000000..4a37dc108a68
--- /dev/null
+++ b/clang/test/Modules/explicit-build-diags.cpp
@@ -0,0 +1,8 @@
+// RUN: rm -rf %t && mkdir %t
+// RUN: %clang_cc1 -fmodules -x c++ %S/Inputs/explicit-build-diags/module.modulemap -fmodule-name=a -emit-module -o %t/a.pcm
+// RUN: %clang_cc1 -fmodules -Wdeprecated-declarations -fdiagnostics-show-note-include-stack -serialize-diagnostic-file %t/tu.dia \
+// RUN: -I %S/Inputs/explicit-build-diags -fmodule-file=%t/a.pcm -fsyntax-only %s
+
+#include "a.h"
+
+void foo() { a(); }
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3
Culprit:
<cut>
commit e771614bae0a05585f720812d5936a0b81dcddf0
Author: David Green <david.green(a)arm.com>
Date: Thu Feb 11 11:58:55 2021 +0000
[ARM] Change getScalarizationOverhead overload used in gather costs. NFC

This changes which of the getScalarizationOverhead overloads is used in
the gather/scatter cost to use the base variant directly, not relying on
the version using heuristics on the number of args with no args
provided. It should still produce the same costs for scalarized
gathers/scatters.
</cut>
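The "NFC" claim in the culprit (same cost through a different overload) can be sketched with a toy model. Everything here is hypothetical: `overhead`, `overheadFromArgs` and `scalarOverheadAfterCommit` are illustrative names, the per-element unit cost is invented, and the behaviour of the argument-list variant with an empty list is an assumption about what the commit message means by "heuristics on the number of args with no args provided".

```cpp
// Hypothetical unit-cost model; not the LLVM implementation.

// Flag-based variant: one unit per element inserted and/or extracted.
constexpr unsigned overhead(unsigned numElems, bool insert, bool extract) {
  return (insert ? numElems : 0u) + (extract ? numElems : 0u);
}

// Argument-list variant. Assumption: with no arguments supplied it charges
// inserts plus extracts for the vector type. The per-argument branch is a
// placeholder only and is not exercised by the comparison below.
constexpr unsigned overheadFromArgs(unsigned numElems, unsigned numArgs) {
  return overhead(numElems, true, false) +
         (numArgs == 0 ? overhead(numElems, false, true)
                       : numArgs * numElems);
}

// The form used after the commit: both flag-based calls, summed.
constexpr unsigned scalarOverheadAfterCommit(unsigned numElems) {
  return overhead(numElems, true, false) + overhead(numElems, false, true);
}

// Under these assumptions the two forms agree, which is what "NFC" claims.
static_assert(overheadFromArgs(4, 0) == scalarOverheadAfterCommit(4),
              "both forms should give the same scalarization overhead");
```

If the two forms really are equivalent in LLVM, the commit would not be expected to change generated code, so the benchmark result reported below may deserve a second look before the commit is treated as the cause.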
Results regressed to (for first_bad == e771614bae0a05585f720812d5936a0b81dcddf0)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-e771614bae0a05585f720812d5936a0b81dcddf0/results_id:
1
# 445.gobmk,[.] fastlib regressed by 115
from (for last_good == a31eae840525e9292a3a42c1fdac3fc594f42949)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_marm artifacts/build-a31eae840525e9292a3a42c1fdac3fc594f42949/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/3644
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-release-arm-spec2k6-O3/3642
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-e771614bae0a05585f720812d5936a0b81dcddf0
cd investigate-llvm-e771614bae0a05585f720812d5936a0b81dcddf0
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach e771614bae0a05585f720812d5936a0b81dcddf0
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach a31eae840525e9292a3a42c1fdac3fc594f42949
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit e771614bae0a05585f720812d5936a0b81dcddf0
Author: David Green <david.green(a)arm.com>
Date: Thu Feb 11 11:58:55 2021 +0000
[ARM] Change getScalarizationOverhead overload used in gather costs. NFC

This changes which of the getScalarizationOverhead overloads is used in
the gather/scatter cost to use the base variant directly, not relying on
the version using heuristics on the number of args with no args
provided. It should still produce the same costs for scalarized
gathers/scatters.
---
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index af67839c2d75..de2c0607d2ed 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -1416,8 +1416,9 @@ unsigned ARMTTIImpl::getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
unsigned VectorCost = NumElems * LT.first * ST->getMVEVectorCostFactor();
// The scalarization cost should be a lot higher. We use the number of vector
// elements plus the scalarization overhead.
- unsigned ScalarCost =
- NumElems * LT.first + BaseT::getScalarizationOverhead(VTy, {});
+ unsigned ScalarCost = NumElems * LT.first +
+ BaseT::getScalarizationOverhead(VTy, true, false) +
+ BaseT::getScalarizationOverhead(VTy, false, true);
if (EltSize < 8 || Alignment < EltSize / 8)
return ScalarCost;
</cut>
Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO. So far, this commit has regressed CI configurations:
- tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTO
Culprit:
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default

This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.

If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.

Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline

https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html

Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
</cut>
Results regressed to (for first_bad == 669ddd1e9b1226432b003dbba05b99f8e992285b)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-669ddd1e9b1226432b003dbba05b99f8e992285b/results_id:
1
# 473.astar,astar_base.default regressed by 106
from (for last_good == b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3_LTO artifacts/build-b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of last_good: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/3543
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Results ID of first_bad: tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-release-aarch64-spec2k6-O3_LTO/3539
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
cd investigate-llvm-669ddd1e9b1226432b003dbba05b99f8e992285b
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 669ddd1e9b1226432b003dbba05b99f8e992285b
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach b15cbaf5a03d0b32dbc32c37766e32ccf66e6c87
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-release…
Full commit (up to 1000 lines):
<cut>
commit 669ddd1e9b1226432b003dbba05b99f8e992285b
Author: Arthur Eubanks <aeubanks(a)google.com>
Date: Mon Jan 25 11:00:56 2021 -0800
Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).
This does not affect the backend target-dependent codegen pipeline.
If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.
Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline
https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html
Reviewed By: asbirlea, ychen, MaskRay, echristo
Differential Revision: https://reviews.llvm.org/D95380
---
llvm/CMakeLists.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index 1affc289e64b..f5298de9f7ca 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -688,8 +688,8 @@ else()
endif()
option(LLVM_ENABLE_PLUGINS "Enable plugin support" ${LLVM_ENABLE_PLUGINS_default})
-set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER FALSE CACHE BOOL
- "Enable the experimental new pass manager by default.")
+set(ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER TRUE CACHE BOOL
+ "Enable the new pass manager by default.")
include(HandleLLVMOptions)
</cut>
Successfully identified regression in *gcc* in CI configuration tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O2. So far, this commit has regressed CI configurations:
- tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O2
Culprit:
<cut>
commit df7c22831f1e48dba49479c5960c1c180d8eab2c
Author: Richard Sandiford <richard.sandiford(a)arm.com>
Date: Thu Nov 14 15:12:58 2019 +0000
Support vectorisation with mixed vector sizes
After previous patches, it's now possible to make the vectoriser
support multiple vector sizes in the same vector region, using
related_vector_mode to pick the right vector mode for a given
element mode. No port yet takes advantage of this, but I have
a follow-on patch for AArch64.
This patch also seemed like a good opportunity to add some more dump
messages: one to make it clear which vector size/mode was being used
when analysis passed or failed, and another to say when we've decided
to skip a redundant vector size/mode.
2019-11-14 Richard Sandiford <richard.sandiford(a)arm.com>
gcc/
* machmode.h (opt_machine_mode::operator==): New function.
(opt_machine_mode::operator!=): Likewise.
* tree-vectorizer.h (vec_info::vector_mode): Update comment.
(get_related_vectype_for_scalar_type): Delete.
(get_vectype_for_scalar_type_and_size): Declare.
* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
whether analysis passed or failed, and with what vector modes.
Use related_vector_mode to check whether trying a particular
vector mode would be redundant with the autodetected mode,
and print a dump message if we decide to skip it.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
(vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type instead of
get_vectype_for_scalar_type_and_size.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
with...
(get_related_vectype_for_scalar_type): ...this new function.
Take a starting/"prevailing" vector mode rather than a vector size.
Take an optional nunits argument, with the same meaning as for
related_vector_mode. Use related_vector_mode when not
auto-detecting a mode, falling back to mode_for_vector if no
target mode exists.
(get_vectype_for_scalar_type): Update accordingly.
(get_same_sized_vectype): Likewise.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
From-SVN: r278240
</cut>
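The commit message above says a candidate vector mode is skipped when trying it "would be redundant with the autodetected mode". A minimal sketch of that idea follows, assuming a toy model in which a mode is only an element width and a lane count. `Mode`, `toy_related_vector_mode` and `would_repeat_analysis` are hypothetical names invented for illustration. They are not GCC's types or functions, and GCC's real related_vector_mode is target-dependent.

```cpp
#include <cassert>
#include <optional>

// Toy stand-in for a machine mode (an assumption for illustration only).
struct Mode {
    unsigned elt_bits;  // width of one element in bits
    unsigned nunits;    // number of elements; 1 means a scalar mode

    bool operator==(const Mode &o) const {
        return elt_bits == o.elt_bits && nunits == o.nunits;
    }
    unsigned size_bits() const { return elt_bits * nunits; }
    bool is_vector() const { return nunits > 1; }
};

// Hypothetical model of related_vector_mode: keep the total size of `vec`
// and switch to elements of `elt_bits`. Returns nullopt when no such
// vector exists in this toy model.
std::optional<Mode> toy_related_vector_mode(const Mode &vec, unsigned elt_bits) {
    if (!vec.is_vector() || elt_bits == 0 || vec.size_bits() % elt_bits != 0)
        return std::nullopt;
    unsigned n = vec.size_bits() / elt_bits;
    if (n < 2)
        return std::nullopt;
    return Mode{elt_bits, n};
}

// Same shape as the new skip test: the candidate is redundant only when
// each mode maps to the other through the related-mode lookup.
bool would_repeat_analysis(const Mode &candidate, const Mode &autodetected) {
    if (!autodetected.is_vector())
        return false;
    auto from_candidate = toy_related_vector_mode(candidate, autodetected.elt_bits);
    auto from_autodetected = toy_related_vector_mode(autodetected, candidate.elt_bits);
    return from_candidate && *from_candidate == autodetected
           && from_autodetected && *from_autodetected == candidate;
}
```

In this toy model, a 16 x 8-bit candidate is redundant with an autodetected 4 x 32-bit mode, because both describe the same 128-bit vector. An 8 x 8-bit candidate is not redundant, because its related 32-bit mode has only two lanes.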
Results regressed to (for first_bad == df7c22831f1e48dba49479c5960c1c180d8eab2c)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O2 artifacts/build-df7c22831f1e48dba49479c5960c1c180d8eab2c/results_id:
1
# 453.povray,[.] _ZN3povL24All_Sphere_IntersectionsEPNS_13Objec regressed by 114
# 482.sphinx3,[.] subvq_mgau_shortlist regressed by 112
from (for last_good == 7f52eb891b738337d5cf82c7c440a5eea8c7b0c9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# true:
0
# benchmark -- -O2 artifacts/build-7f52eb891b738337d5cf82c7c440a5eea8c7b0c9/results_id:
1
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of last_good: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O2/3483
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Results ID of first_bad: tx1_64/tcwg_bmk_gnu_tx1/bisect-gnu-release-aarch64-spec2k6-O2/3492
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-gcc-df7c22831f1e48dba49479c5960c1c180d8eab2c
cd investigate-gcc-df7c22831f1e48dba49479c5960c1c180d8eab2c
git clone https://git.linaro.org/toolchain/jenkins-scripts
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a… --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/
cd gcc
# Reproduce first_bad build
git checkout --detach df7c22831f1e48dba49479c5960c1c180d8eab2c
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 7f52eb891b738337d5cf82c7c440a5eea8c7b0c9
../artifacts/test.sh
cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/…
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tx1-gnu-release-a…
Full commit (up to 1000 lines):
<cut>
commit df7c22831f1e48dba49479c5960c1c180d8eab2c
Author: Richard Sandiford <richard.sandiford(a)arm.com>
Date: Thu Nov 14 15:12:58 2019 +0000
Support vectorisation with mixed vector sizes
After previous patches, it's now possible to make the vectoriser
support multiple vector sizes in the same vector region, using
related_vector_mode to pick the right vector mode for a given
element mode. No port yet takes advantage of this, but I have
a follow-on patch for AArch64.
This patch also seemed like a good opportunity to add some more dump
messages: one to make it clear which vector size/mode was being used
when analysis passed or failed, and another to say when we've decided
to skip a redundant vector size/mode.
2019-11-14 Richard Sandiford <richard.sandiford(a)arm.com>
gcc/
* machmode.h (opt_machine_mode::operator==): New function.
(opt_machine_mode::operator!=): Likewise.
* tree-vectorizer.h (vec_info::vector_mode): Update comment.
(get_related_vectype_for_scalar_type): Delete.
(get_vectype_for_scalar_type_and_size): Declare.
* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
whether analysis passed or failed, and with what vector modes.
Use related_vector_mode to check whether trying a particular
vector mode would be redundant with the autodetected mode,
and print a dump message if we decide to skip it.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
(vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type instead of
get_vectype_for_scalar_type_and_size.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
with...
(get_related_vectype_for_scalar_type): ...this new function.
Take a starting/"prevailing" vector mode rather than a vector size.
Take an optional nunits argument, with the same meaning as for
related_vector_mode. Use related_vector_mode when not
auto-detecting a mode, falling back to mode_for_vector if no
target mode exists.
(get_vectype_for_scalar_type): Update accordingly.
(get_same_sized_vectype): Likewise.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
From-SVN: r278240
---
gcc/ChangeLog | 28 +++++++++++++++++++++++++
gcc/machmode.h | 3 +++
gcc/tree-vect-loop.c | 54 +++++++++++++++++++++++++++++++++++-------------
gcc/tree-vect-slp.c | 33 +++++++++++++++++++++++++----
gcc/tree-vect-stmts.c | 57 ++++++++++++++++++++++++++++++++++++---------------
gcc/tree-vectorizer.c | 2 +-
gcc/tree-vectorizer.h | 8 +++++---
7 files changed, 147 insertions(+), 38 deletions(-)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 41c94140b1a..680aa85121a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2019-11-14 Richard Sandiford <richard.sandiford(a)arm.com>
+
+ * machmode.h (opt_machine_mode::operator==): New function.
+ (opt_machine_mode::operator!=): Likewise.
+ * tree-vectorizer.h (vec_info::vector_mode): Update comment.
+ (get_related_vectype_for_scalar_type): Delete.
+ (get_vectype_for_scalar_type_and_size): Declare.
+ * tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
+ whether analysis passed or failed, and with what vector modes.
+ Use related_vector_mode to check whether trying a particular
+ vector mode would be redundant with the autodetected mode,
+ and print a dump message if we decide to skip it.
+ * tree-vect-loop.c (vect_analyze_loop): Likewise.
+ (vect_create_epilog_for_reduction): Use
+ get_related_vectype_for_scalar_type instead of
+ get_vectype_for_scalar_type_and_size.
+ * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
+ with...
+ (get_related_vectype_for_scalar_type): ...this new function.
+ Take a starting/"prevailing" vector mode rather than a vector size.
+ Take an optional nunits argument, with the same meaning as for
+ related_vector_mode. Use related_vector_mode when not
+ auto-detecting a mode, falling back to mode_for_vector if no
+ target mode exists.
+ (get_vectype_for_scalar_type): Update accordingly.
+ (get_same_sized_vectype): Likewise.
+ * tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
+
2019-11-14 Richard Sandiford <richard.sandiford(a)arm.com>
* tree-vect-stmts.c (vectorizable_call): Require the types
diff --git a/gcc/machmode.h b/gcc/machmode.h
index 6750833c2fe..a507ed66c3f 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -258,6 +258,9 @@ public:
bool exists () const;
template<typename U> bool exists (U *) const;
+ bool operator== (const T &m) const { return m_mode == m; }
+ bool operator!= (const T &m) const { return m_mode != m; }
+
private:
machine_mode m_mode;
};
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 213d620ed2c..e60c159d11a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2435,6 +2435,17 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
if (mode_i == 0)
autodetected_vector_mode = loop_vinfo->vector_mode;
+ if (dump_enabled_p ())
+ {
+ if (res)
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Analysis succeeded with vector mode %s\n",
+ GET_MODE_NAME (loop_vinfo->vector_mode));
+ else
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Analysis failed with vector mode %s\n",
+ GET_MODE_NAME (loop_vinfo->vector_mode));
+ }
loop->aux = NULL;
if (res)
@@ -2501,9 +2512,22 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
}
if (mode_i < vector_modes.length ()
- && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
- GET_MODE_SIZE (autodetected_vector_mode)))
- mode_i += 1;
+ && VECTOR_MODE_P (autodetected_vector_mode)
+ && (related_vector_mode (vector_modes[mode_i],
+ GET_MODE_INNER (autodetected_vector_mode))
+ == autodetected_vector_mode)
+ && (related_vector_mode (autodetected_vector_mode,
+ GET_MODE_INNER (vector_modes[mode_i]))
+ == vector_modes[mode_i]))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Skipping vector mode %s, which would"
+ " repeat the analysis for %s\n",
+ GET_MODE_NAME (vector_modes[mode_i]),
+ GET_MODE_NAME (autodetected_vector_mode));
+ mode_i += 1;
+ }
if (mode_i == vector_modes.length ()
|| autodetected_vector_mode == VOIDmode)
@@ -4898,13 +4922,14 @@ vect_create_epilog_for_reduction (stmt_vec_info stmt_info,
halves against each other. */
enum machine_mode mode1 = mode;
tree stype = TREE_TYPE (vectype);
- unsigned sz = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
- unsigned sz1 = sz;
+ unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+ unsigned nunits1 = nunits;
if (!slp_reduc
&& (mode1 = targetm.vectorize.split_reduction (mode)) != mode)
- sz1 = GET_MODE_SIZE (mode1).to_constant ();
+ nunits1 = GET_MODE_NUNITS (mode1).to_constant ();
- tree vectype1 = get_vectype_for_scalar_type_and_size (stype, sz1);
+ tree vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+ stype, nunits1);
reduce_with_shift = have_whole_vector_shift (mode1);
if (!VECTOR_MODE_P (mode1))
reduce_with_shift = false;
@@ -4918,11 +4943,13 @@ vect_create_epilog_for_reduction (stmt_vec_info stmt_info,
/* First reduce the vector to the desired vector size we should
do shift reduction on by combining upper and lower halves. */
new_temp = new_phi_result;
- while (sz > sz1)
+ while (nunits > nunits1)
{
gcc_assert (!slp_reduc);
- sz /= 2;
- vectype1 = get_vectype_for_scalar_type_and_size (stype, sz);
+ nunits /= 2;
+ vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
+ stype, nunits);
+ unsigned int bitsize = tree_to_uhwi (TYPE_SIZE (vectype1));
/* The target has to make sure we support lowpart/highpart
extraction, either via direct vector extract or through
@@ -4947,15 +4974,14 @@ vect_create_epilog_for_reduction (stmt_vec_info stmt_info,
= gimple_build_assign (dst2, BIT_FIELD_REF,
build3 (BIT_FIELD_REF, vectype1,
new_temp, TYPE_SIZE (vectype1),
- bitsize_int (sz * BITS_PER_UNIT)));
+ bitsize_int (bitsize)));
gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
}
else
{
/* Extract via punning to appropriately sized integer mode
vector. */
- tree eltype = build_nonstandard_integer_type (sz * BITS_PER_UNIT,
- 1);
+ tree eltype = build_nonstandard_integer_type (bitsize, 1);
tree etype = build_vector_type (eltype, 2);
gcc_assert (convert_optab_handler (vec_extract_optab,
TYPE_MODE (etype),
@@ -4984,7 +5010,7 @@ vect_create_epilog_for_reduction (stmt_vec_info stmt_info,
= gimple_build_assign (tem, BIT_FIELD_REF,
build3 (BIT_FIELD_REF, eltype,
new_temp, TYPE_SIZE (eltype),
- bitsize_int (sz * BITS_PER_UNIT)));
+ bitsize_int (bitsize)));
gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
dst2 = make_ssa_name (vectype1);
epilog_stmt = gimple_build_assign (dst2, VIEW_CONVERT_EXPR,
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 3885d9cbe4a..1e00db5a326 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3203,7 +3203,12 @@ vect_slp_bb_region (gimple_stmt_iterator region_begin,
&& dbg_cnt (vect_slp))
{
if (dump_enabled_p ())
- dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+ {
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Analysis succeeded with vector mode"
+ " %s\n", GET_MODE_NAME (bb_vinfo->vector_mode));
+ dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
+ }
bb_vinfo->shared->check_datarefs ();
vect_schedule_slp (bb_vinfo);
@@ -3223,6 +3228,13 @@ vect_slp_bb_region (gimple_stmt_iterator region_begin,
vectorized = true;
}
+ else
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Analysis failed with vector mode %s\n",
+ GET_MODE_NAME (bb_vinfo->vector_mode));
+ }
if (mode_i == 0)
autodetected_vector_mode = bb_vinfo->vector_mode;
@@ -3230,9 +3242,22 @@ vect_slp_bb_region (gimple_stmt_iterator region_begin,
delete bb_vinfo;
if (mode_i < vector_modes.length ()
- && known_eq (GET_MODE_SIZE (vector_modes[mode_i]),
- GET_MODE_SIZE (autodetected_vector_mode)))
- mode_i += 1;
+ && VECTOR_MODE_P (autodetected_vector_mode)
+ && (related_vector_mode (vector_modes[mode_i],
+ GET_MODE_INNER (autodetected_vector_mode))
+ == autodetected_vector_mode)
+ && (related_vector_mode (autodetected_vector_mode,
+ GET_MODE_INNER (vector_modes[mode_i]))
+ == vector_modes[mode_i]))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+ "***** Skipping vector mode %s, which would"
+ " repeat the analysis for %s\n",
+ GET_MODE_NAME (vector_modes[mode_i]),
+ GET_MODE_NAME (autodetected_vector_mode));
+ mode_i += 1;
+ }
if (vectorized
|| mode_i == vector_modes.length ()
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 80f59accad7..36f832bb522 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -11138,18 +11138,28 @@ vect_remove_stores (stmt_vec_info first_stmt_info)
}
}
-/* Function get_vectype_for_scalar_type_and_size.
+/* If NUNITS is nonzero, return a vector type that contains NUNITS
+ elements of type SCALAR_TYPE, or null if the target doesn't support
+ such a type.
- Returns the vector type corresponding to SCALAR_TYPE and SIZE as supported
- by the target. */
+ If NUNITS is zero, return a vector type that contains elements of
+ type SCALAR_TYPE, choosing whichever vector size the target prefers.
+
+ If PREVAILING_MODE is VOIDmode, we have not yet chosen a vector mode
+ for this vectorization region and want to "autodetect" the best choice.
+ Otherwise, PREVAILING_MODE is a previously-chosen vector TYPE_MODE
+ and we want the new type to be interoperable with it. PREVAILING_MODE
+ in this case can be a scalar integer mode or a vector mode; when it
+ is a vector mode, the function acts like a tree-level version of
+ related_vector_mode. */
tree
-get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
+get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
+ tree scalar_type, poly_uint64 nunits)
{
tree orig_scalar_type = scalar_type;
scalar_mode inner_mode;
machine_mode simd_mode;
- poly_uint64 nunits;
tree vectype;
if (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
@@ -11189,10 +11199,11 @@ get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
if (scalar_type == NULL_TREE)
return NULL_TREE;
- /* If no size was supplied use the mode the target prefers. Otherwise
- lookup a vector mode of the specified size. */
- if (known_eq (size, 0U))
+ /* If no prevailing mode was supplied, use the mode the target prefers.
+ Otherwise lookup a vector mode based on the prevailing mode. */
+ if (prevailing_mode == VOIDmode)
{
+ gcc_assert (known_eq (nunits, 0U));
simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
if (SCALAR_INT_MODE_P (simd_mode))
{
@@ -11208,9 +11219,19 @@ get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
return NULL_TREE;
}
}
- else if (!multiple_p (size, nbytes, &nunits)
- || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
- return NULL_TREE;
+ else if (SCALAR_INT_MODE_P (prevailing_mode)
+ || !related_vector_mode (prevailing_mode,
+ inner_mode, nunits).exists (&simd_mode))
+ {
+ /* Fall back to using mode_for_vector, mostly in the hope of being
+ able to use an integer mode. */
+ if (known_eq (nunits, 0U)
+ && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
+ return NULL_TREE;
+
+ if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+ return NULL_TREE;
+ }
vectype = build_vector_type_for_mode (scalar_type, simd_mode);
@@ -11238,9 +11259,8 @@ get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
tree
get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type)
{
- tree vectype;
- poly_uint64 vector_size = GET_MODE_SIZE (vinfo->vector_mode);
- vectype = get_vectype_for_scalar_type_and_size (scalar_type, vector_size);
+ tree vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
+ scalar_type);
if (vectype && vinfo->vector_mode == VOIDmode)
vinfo->vector_mode = TYPE_MODE (vectype);
return vectype;
@@ -11273,8 +11293,13 @@ get_same_sized_vectype (tree scalar_type, tree vector_type)
if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
return truth_type_for (vector_type);
- return get_vectype_for_scalar_type_and_size
- (scalar_type, GET_MODE_SIZE (TYPE_MODE (vector_type)));
+ poly_uint64 nunits;
+ if (!multiple_p (GET_MODE_SIZE (TYPE_MODE (vector_type)),
+ GET_MODE_SIZE (TYPE_MODE (scalar_type)), &nunits))
+ return NULL_TREE;
+
+ return get_related_vectype_for_scalar_type (TYPE_MODE (vector_type),
+ scalar_type, nunits);
}
/* Function vect_is_simple_use.
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index d6de78350e6..7be81a0b27f 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -1359,7 +1359,7 @@ get_vec_alignment_for_array_type (tree type)
poly_uint64 array_size, vector_size;
tree scalar_type = strip_array_types (type);
- tree vectype = get_vectype_for_scalar_type_and_size (scalar_type, 0);
+ tree vectype = get_related_vectype_for_scalar_type (VOIDmode, scalar_type);
if (!vectype
|| !poly_int_tree_p (TYPE_SIZE (type), &array_size)
|| !poly_int_tree_p (TYPE_SIZE (vectype), &vector_size)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f6efed1f863..fadc4d89d16 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -335,8 +335,9 @@ public:
/* Cost data used by the target cost model. */
void *target_cost_data;
- /* If we've chosen a vector size for this vectorization region,
- this is one mode that has such a size, otherwise it is VOIDmode. */
+ /* The argument we should pass to related_vector_mode when looking up
+ the vector mode for a scalar mode, or VOIDmode if we haven't yet
+ made any decisions about which vector modes to use. */
machine_mode vector_mode;
private:
@@ -1624,8 +1625,9 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
/* In tree-vect-stmts.c. */
+extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
+ poly_uint64 = 0);
extern tree get_vectype_for_scalar_type (vec_info *, tree);
-extern tree get_vectype_for_scalar_type_and_size (tree, poly_uint64);
extern tree get_mask_type_for_scalar_type (vec_info *, tree);
extern tree get_same_sized_vectype (tree, tree);
extern bool vect_get_loop_mask_type (loop_vec_info);
</cut>
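In vect_create_epilog_for_reduction, the patch above changes the halving loop to count elements (nunits) rather than bytes (sz). The loop shape can be sketched as follows, assuming a plain integer sum over a std::vector and power-of-two lane counts. `halve_to` is a hypothetical helper written for illustration and does not appear in GCC.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduce `lanes` to `target_nunits` lanes by repeatedly adding the upper
// half to the lower half, following the `while (nunits > nunits1)` loop
// shape. Assumes both lane counts are powers of two.
std::vector<int> halve_to(std::vector<int> lanes, std::size_t target_nunits) {
    std::size_t nunits = lanes.size();
    assert(target_nunits >= 1 && target_nunits <= nunits);
    while (nunits > target_nunits) {
        nunits /= 2;
        for (std::size_t i = 0; i < nunits; ++i)
            lanes[i] += lanes[i + nunits];  // low half += high half
        lanes.resize(nunits);               // continue with the narrower vector
    }
    return lanes;
}
```

Called with eight lanes holding 1 through 8 and a target of two lanes, the sketch performs two halving steps and leaves the partial sums of the even-indexed and odd-indexed lanes.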