Successfully identified regression in *llvm* in CI configuration tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2.  So far, this commit has regressed CI configurations:
 - tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2
Culprit:
<cut>
commit d181fd918d18cbd99768f025e14a69d35d275f14
Author: Simon Pilgrim llvm-dev@redking.me.uk
Date:   Fri Jul 2 14:27:27 2021 +0100

    [CostModel][X86] Drop some hard coded fp<->int scalarization costs

    Scalarization costs handling is a lot better now, and the hard coded costs were higher than the worse case numbers from the script in D103695
</cut>
Results regressed to (for first_bad == d181fd918d18cbd99768f025e14a69d35d275f14)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -O2_marm -- artifacts/build-d181fd918d18cbd99768f025e14a69d35d275f14/results_id: 1
# 400.perlbench,libc-2.33.9000.so regressed by 113

from (for last_good == 5df556ac8bb8c5f4ef3dff1a2039dd389d1d27c0)
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -8
# build_abe linux: -7
# build_abe glibc: -6
# build_abe stage2 -- --set gcc_override_configure=--with-mode=arm --set gcc_override_configure=--disable-libsanitizer: -5
# build_llvm true: -3
# true: 0
# benchmark -O2_marm -- artifacts/build-5df556ac8bb8c5f4ef3dff1a2039dd389d1d27c0/results_id: 1
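The only entry that differs between the two result listings is the "regressed by" line. As a minimal sketch, such entries could be pulled out of a results block as follows. The function name list_regressions is hypothetical, the sample text is taken from the first_bad results in this report, and the unit of the reported number is not stated here, so the sketch only extracts it without interpreting it.

```shell
#!/bin/sh
# Sample lines taken from the first_bad results in this report.
results='# true: 0
# benchmark -O2_marm -- artifacts/build-d181fd918d18cbd99768f025e14a69d35d275f14/results_id: 1
# 400.perlbench,libc-2.33.9000.so regressed by 113'

# Hypothetical helper: print "<benchmark,symbol> <amount>" for every line
# shaped like "# <name> regressed by <number>". Other lines are ignored.
list_regressions() {
  awk '$1 == "#" && $3 == "regressed" && $4 == "by" { print $2, $5 }'
}

regressions=$(printf '%s\n' "$results" | list_regressions)
echo "$regressions"
```

For the sample above this prints the single entry for 400.perlbench,libc-2.33.9000.so with the value 113.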
Artifacts of last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a...
Results ID of last_good: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-master-arm-spec2k6-O2/1840
Artifacts of first_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a...
Results ID of first_bad: tk1_32/tcwg_bmk_llvm_tk1/bisect-llvm-master-arm-spec2k6-O2/1837
Build top page/logs: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a...
Configuration details:
Reproduce builds:
<cut>
mkdir investigate-llvm-d181fd918d18cbd99768f025e14a69d35d275f14
cd investigate-llvm-d181fd918d18cbd99768f025e14a69d35d275f14

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a... --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a... --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a... --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
rsync -a --del --delete-excluded --exclude bisect/ --exclude artifacts/ --exclude llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach d181fd918d18cbd99768f025e14a69d35d275f14
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 5df556ac8bb8c5f4ef3dff1a2039dd389d1d27c0
../artifacts/test.sh

cd ..
</cut>
History of pending regressions and results: https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/c...
Artifacts: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a...
Build log: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a...
Full commit (up to 1000 lines):
<cut>
commit d181fd918d18cbd99768f025e14a69d35d275f14
Author: Simon Pilgrim llvm-dev@redking.me.uk
Date:   Fri Jul 2 14:27:27 2021 +0100

    [CostModel][X86] Drop some hard coded fp<->int scalarization costs

    Scalarization costs handling is a lot better now, and the hard coded costs were higher than the worse case numbers from the script in D103695
---
 llvm/lib/Target/X86/X86TargetTransformInfo.cpp | 13 -------------
 llvm/test/Analysis/CostModel/X86/sitofp.ll     |  6 +++---
 2 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index d55cd8a8c7a8..9eb5abe4dd9b 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -1977,13 +1977,6 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
     { ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i32, 10 },
     { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 5 },
     { ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 6 },
-    // The generic code to compute the scalar overhead is currently broken.
-    // Workaround this limitation by estimating the scalarization overhead
-    // here. We have roughly 10 instructions per scalar element.
-    // Multiply that by the vector width.
-    // FIXME: remove that when PR19268 is fixed.
-    { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },
-    { ISD::SINT_TO_FP, MVT::v4f64, MVT::v4i64, 13 },
 
     { ISD::FP_TO_SINT, MVT::v8i8, MVT::v8f32, 4 },
     { ISD::FP_TO_SINT, MVT::v4i8, MVT::v4f64, 3 },
@@ -2003,12 +1996,6 @@ InstructionCost X86TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
     { ISD::FP_TO_UINT, MVT::v8i16, MVT::v8f32, 3 },
     { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f32, 9 },
     { ISD::FP_TO_UINT, MVT::v8i32, MVT::v8f64, 19 },
-    // This node is expanded into scalarized operations but BasicTTI is overly
-    // optimistic estimating its cost. It computes 3 per element (one
-    // vector-extract, one scalar conversion and one vector-insert). The
-    // problem is that the inserts form a read-modify-write chain so latency
-    // should be factored in too. Inflating the cost per element by 1.
-    { ISD::FP_TO_UINT, MVT::v4i32, MVT::v4f64, 4*4 },
 
     { ISD::FP_EXTEND, MVT::v4f64, MVT::v4f32, 1 },
     { ISD::FP_ROUND, MVT::v4f32, MVT::v4f64, 1 },
diff --git a/llvm/test/Analysis/CostModel/X86/sitofp.ll b/llvm/test/Analysis/CostModel/X86/sitofp.ll
index b3c400c93b9f..b327454c1d09 100644
--- a/llvm/test/Analysis/CostModel/X86/sitofp.ll
+++ b/llvm/test/Analysis/CostModel/X86/sitofp.ll
@@ -122,14 +122,14 @@ define i32 @sitofp_i64_double() {
 ; AVX-LABEL: 'sitofp_i64_double'
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cvt_i64_f64 = sitofp i64 undef to double
 ; AVX-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %cvt_v2i64_v2f64 = sitofp <2 x i64> undef to <2 x double>
-; AVX-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
-; AVX-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %cvt_v8i64_v8f64 = sitofp <8 x i64> undef to <8 x double>
+; AVX-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
+; AVX-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cvt_v8i64_v8f64 = sitofp <8 x i64> undef to <8 x double>
 ; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
 ; AVX512F-LABEL: 'sitofp_i64_double'
 ; AVX512F-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %cvt_i64_f64 = sitofp i64 undef to double
 ; AVX512F-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %cvt_v2i64_v2f64 = sitofp <2 x i64> undef to <2 x double>
-; AVX512F-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
+; AVX512F-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
 ; AVX512F-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %cvt_v8i64_v8f64 = sitofp <8 x i64> undef to <8 x double>
 ; AVX512F-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
</cut>
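The expectation changes in sitofp.ll are easier to read as old/new pairs. The following is a rough sketch that tabulates them from the changed lines of the hunk. The function name cost_deltas is hypothetical, the input is limited to the six changed lines copied from the diff above, and lines are paired by check prefix plus instruction name, which is an assumption that happens to hold for this hunk.

```shell
#!/bin/sh
# Changed lines copied from the sitofp.ll hunk in this report.
hunk='-; AVX-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
-; AVX-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %cvt_v8i64_v8f64 = sitofp <8 x i64> undef to <8 x double>
+; AVX-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
+; AVX-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %cvt_v8i64_v8f64 = sitofp <8 x i64> undef to <8 x double>
-; AVX512F-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>
+; AVX512F-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %cvt_v4i64_v4f64 = sitofp <4 x i64> undef to <4 x double>'

# Hypothetical helper: remember the cost on each removed ("-") line and,
# when the matching added ("+") line appears, print old -> new (delta).
cost_deltas() {
  awk '/^[-+]; / {
    sign = substr($0, 1, 1)
    for (i = 1; i <= NF; i++) {
      if ($i == "of") cost = $(i + 1)   # number after "cost of"
      if ($i ~ /^%/) name = $i          # e.g. %cvt_v4i64_v4f64
    }
    key = $2 " " name                   # check prefix + instruction name
    if (sign == "-") old[key] = cost
    else if (key in old)
      printf "%s %d -> %d (%d)\n", key, old[key], cost, cost - old[key]
  }'
}

deltas=$(printf '%s\n' "$hunk" | cost_deltas)
echo "$deltas"
```

With the lines above, the AVX expectations drop from 13 to 11 and from 26 to 22, and the AVX512F v4i64 expectation drops from 13 to 11. The AVX512F v8i64 line (cost 25) is unchanged context and therefore does not appear in the output.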