linaro-toolchain

linaro-toolchain@lists.linaro.org

14 participants
5758 discussions

[TCWG CI] 433.milc slowed down by 4% after llvm: [LLDB] Remove xfail decorator TestInferiorAssert.py AArch64/Linux

by ci_notify＠linaro.org

After llvm commit 483db1c706864d0940206228dfe64bdcd17faa4e Author: Muhammad Omair Javaid <omair.javaid(a)linaro.org> [LLDB] Remove xfail decorator TestInferiorAssert.py AArch64/Linux the following benchmarks slowed down by more than 2%: - 433.milc slowed down by 4% from 13309 to 13838 perf samples - 433.milc:[.] mult_su3_mat_vec slowed down by 17% from 2058 to 2409 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O3 - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3 First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-483db1c706864d0940206228dfe64bdcd17faa4e cd investigate-llvm-483db1c706864d0940206228dfe64bdcd17faa4e # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 483db1c706864d0940206228dfe64bdcd17faa4e ../artifacts/test.sh # Reproduce last_good build git checkout --detach d11ec6f67e45c630ab87bfb6010dcc93e89542fc ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 483db1c706864d0940206228dfe64bdcd17faa4e Author: Muhammad Omair Javaid <omair.javaid(a)linaro.org> Date: Mon Oct 11 14:34:41 2021 +0500 [LLDB] Remove xfail decorator TestInferiorAssert.py AArch64/Linux TestInferiorAssert.py test_inferior_asserting_disassemble passes after upgrading LLDB AArch64/Linux buildbot to Ubuntu Focal. --- lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py b/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py index c533a1e29a12..5ac4eeb0514a 100644 --- a/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py +++ b/lldb/test/API/functionalities/inferior-assert/TestInferiorAssert.py @@ -45,9 +45,7 @@ class AssertingInferiorTestCase(TestBase): bugnumber="llvm.org/pr21793: need to implement support for detecting assertion / abort on Windows") @expectedFailureAll( oslist=["linux"], - archs=[ - "aarch64", - "arm"], + archs=["arm"], triple=no_match(".*-android"), bugnumber="llvm.org/pr25338") @expectedFailureAll(bugnumber="llvm.org/pr26592", triple='^mips') </cut>

4 years, 9 months

[TCWG CI] 444.namd grew in size by 2% after llvm: [AArch64] Make -mcpu=generic schedule for an in-order core

by ci_notify＠linaro.org

After llvm commit adec9223616477df023026b0269ccd008701cc94 Author: David Green <david.green(a)arm.com> [AArch64] Make -mcpu=generic schedule for an in-order core the following benchmarks grew in size by more than 1%: - 444.namd grew in size by 2% from 185531 to 188815 bytes the following hot functions grew in size by more than 10% (but their benchmarks grew in size by less than 1%): - 482.sphinx3:[.] OUTLINED_FUNCTION_4 grew in size by 14% from 28 to 32 bytes Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -Oz - Hardware: APM Mustang 8x X-Gene1 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-adec9223616477df023026b0269ccd008701cc94 cd investigate-llvm-adec9223616477df023026b0269ccd008701cc94 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_apm-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach adec9223616477df023026b0269ccd008701cc94 ../artifacts/test.sh # Reproduce last_good build git checkout --detach e2a2e5475cbd370044474e132a1b5c58e6a3d458 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit adec9223616477df023026b0269ccd008701cc94 Author: David Green <david.green(a)arm.com> Date: Sat Oct 9 15:58:31 2021 +0100 [AArch64] Make -mcpu=generic schedule for an in-order core We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting any others. A blend of the performance options hopefully beneficial to all CPUs. The largest part of that is enabling in-order scheduling using the Cortex-A55 schedule model. This is similar to the Arm backend change from eecb353d0e25ba which made -mcpu=generic perform in-order scheduling using the cortex-a8 schedule model. The idea is that in-order cpu's require the most help in instruction scheduling, whereas out-of-order cpus can for the most part out-of-order schedule around different codegen. Our benchmarking suggests that hypothesis holds. When running on an in-order core this improved performance by 3.8% geomean on a set of DSP workloads, 2% geomean on some other embedded benchmark and between 1% and 1.8% on a set of singlecore and multicore workloads, all running on a Cortex-A55 cluster. On an out-of-order cpu the results are a lot more noisy but show flat performance or an improvement. On the set of DSP and embedded benchmarks, run on a Cortex-A78 there was a very noisy 1% speed improvement. Using the most detailed results I could find, SPEC2006 runs on a Neoverse N1 show a small increase in instruction count (+0.127%), but a decrease in cycle counts (-0.155%, on average). The instruction count is very low noise, the cycle count is more noisy with a 0.15% decrease not being significant. SPEC2k17 shows a small decrease (-0.2%) in instruction count leading to a -0.296% decrease in cycle count. These results are within noise margins but tend to show a small improvement in general. When specifying an Apple target, clang will set "-target-cpu apple-a7" on the command line, so should not be affected by this change when running from clang. This also doesn't enable more runtime unrolling like -mcpu=cortex-a55 does, only changing the schedule used. A lot of existing tests have updated. This is a summary of the important differences: - Most changes are the same instructions in a different order. - Sometimes this leads to very minor inefficiencies, such as requiring an extra mov to move variables into r0/v0 for the return value of a test function. - misched-fusion.ll was no longer fusing the pairs of instructions it should, as per D110561. I've changed the schedule used in the test for now. - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to the different latencies. This seems fine to me. - Some SVE tests do not always remove movprfx where they did before due to different register allocation giving different destructive forms. - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll produce two LDR where they previously produced an LDP due to store-pair-suppress kicking in. - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LPD. - Some tests such as arm64-neon-mul-div.ll and ragreedy-local-interval-cost.ll have more, less or just different spilling. - In aarch64_generated_funcs.ll.generated.expected one part of the function is no longer outlined. Interestingly if I switch this to use any other scheduled even less is outlined. Some of these are expected to happen, such as differences in outlining or register spilling. There will be places where these result in worse codegen, places where they are better, with the SPEC instruction counts suggesting it is not a decrease overall, on average. Differential Revision: https://reviews.llvm.org/D110830 --- llvm/lib/Target/AArch64/AArch64.td | 2 +- .../Analysis/CostModel/AArch64/shuffle-select.ll | 2 +- .../Analysis/CostModel/AArch64/vector-select.ll | 4 +- llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll | 2 +- .../CodeGen/AArch64/GlobalISel/arm64-atomic.ll | 68 +- llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll | 4 +- .../call-translator-variadic-musttail.ll | 26 +- .../CodeGen/AArch64/GlobalISel/combine-udiv.ll | 308 +- .../AArch64/GlobalISel/merge-stores-truncating.ll | 10 +- llvm/test/CodeGen/AArch64/GlobalISel/swifterror.ll | 86 +- llvm/test/CodeGen/AArch64/aarch64-addv.ll | 2 +- llvm/test/CodeGen/AArch64/aarch64-be-bv.ll | 40 +- .../CodeGen/AArch64/aarch64-dup-ext-scalable.ll | 40 +- llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll | 18 +- llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll | 12 +- llvm/test/CodeGen/AArch64/aarch64-load-ext.ll | 36 +- .../CodeGen/AArch64/aarch64-matrix-umull-smull.ll | 24 +- llvm/test/CodeGen/AArch64/aarch64-smull.ll | 124 +- llvm/test/CodeGen/AArch64/aarch64-tail-dup-size.ll | 6 +- .../test/CodeGen/AArch64/aarch64_win64cc_vararg.ll | 4 +- llvm/test/CodeGen/AArch64/addimm-mulimm.ll | 32 +- .../CodeGen/AArch64/addsub-constant-folding.ll | 18 +- llvm/test/CodeGen/AArch64/addsub.ll | 2 +- llvm/test/CodeGen/AArch64/align-down.ll | 10 +- llvm/test/CodeGen/AArch64/and-mask-removal.ll | 12 +- .../AArch64/argument-blocks-array-of-struct.ll | 51 +- llvm/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll | 24 +- .../CodeGen/AArch64/arm64-addr-type-promotion.ll | 37 +- llvm/test/CodeGen/AArch64/arm64-addrmode.ll | 6 +- .../test/CodeGen/AArch64/arm64-bitfield-extract.ll | 14 +- llvm/test/CodeGen/AArch64/arm64-collect-loh.ll | 2 +- llvm/test/CodeGen/AArch64/arm64-convert-v4f64.ll | 22 +- llvm/test/CodeGen/AArch64/arm64-csel.ll | 16 +- llvm/test/CodeGen/AArch64/arm64-dup.ll | 10 +- llvm/test/CodeGen/AArch64/arm64-fcopysign.ll | 18 +- llvm/test/CodeGen/AArch64/arm64-fmadd.ll | 4 +- .../arm64-homogeneous-prolog-epilog-no-helper.ll | 18 +- llvm/test/CodeGen/AArch64/arm64-indexed-memory.ll | 54 +- .../CodeGen/AArch64/arm64-indexed-vector-ldst.ll | 180 +- llvm/test/CodeGen/AArch64/arm64-inline-asm.ll | 8 +- .../AArch64/arm64-instruction-mix-remarks.ll | 20 +- llvm/test/CodeGen/AArch64/arm64-ldp.ll | 20 +- llvm/test/CodeGen/AArch64/arm64-memset-inline.ll | 4 +- llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll | 64 +- llvm/test/CodeGen/AArch64/arm64-neon-aba-abd.ll | 6 +- llvm/test/CodeGen/AArch64/arm64-neon-copy.ll | 13 +- llvm/test/CodeGen/AArch64/arm64-neon-mul-div.ll | 1428 ++++---- llvm/test/CodeGen/AArch64/arm64-nvcast.ll | 10 +- llvm/test/CodeGen/AArch64/arm64-popcnt.ll | 198 +- .../arm64-promote-const-complex-initializers.ll | 8 +- .../test/CodeGen/AArch64/arm64-register-pairing.ll | 4 +- llvm/test/CodeGen/AArch64/arm64-rev.ll | 14 +- .../AArch64/arm64-setcc-int-to-fp-combine.ll | 20 +- llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll | 92 +- llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll | 30 +- llvm/test/CodeGen/AArch64/arm64-srl-and.ll | 2 +- .../test/CodeGen/AArch64/arm64-subvector-extend.ll | 630 ++-- llvm/test/CodeGen/AArch64/arm64-tls-dynamics.ll | 8 +- llvm/test/CodeGen/AArch64/arm64-tls-local-exec.ll | 8 +- llvm/test/CodeGen/AArch64/arm64-trunc-store.ll | 4 +- llvm/test/CodeGen/AArch64/arm64-vabs.ll | 446 ++- llvm/test/CodeGen/AArch64/arm64-vhadd.ll | 32 +- llvm/test/CodeGen/AArch64/arm64-vmul.ll | 226 +- llvm/test/CodeGen/AArch64/arm64-windows-calls.ll | 19 +- .../CodeGen/AArch64/arm64-zero-cycle-zeroing.ll | 8 +- llvm/test/CodeGen/AArch64/arm64_32-addrs.ll | 6 +- llvm/test/CodeGen/AArch64/arm64_32-atomics.ll | 2 +- llvm/test/CodeGen/AArch64/atomic-ops-lse.ll | 17 +- .../CodeGen/AArch64/atomic-ops-not-barriers.ll | 2 +- llvm/test/CodeGen/AArch64/bcmp-inline-small.ll | 4 +- llvm/test/CodeGen/AArch64/bitcast-promote-widen.ll | 8 +- llvm/test/CodeGen/AArch64/bitfield-insert.ll | 34 +- llvm/test/CodeGen/AArch64/build-one-lane.ll | 9 +- llvm/test/CodeGen/AArch64/build-vector-extract.ll | 126 +- llvm/test/CodeGen/AArch64/cgp-usubo.ll | 24 +- llvm/test/CodeGen/AArch64/cmp-select-sign.ll | 44 +- llvm/test/CodeGen/AArch64/cmpxchg-idioms.ll | 16 +- .../CodeGen/AArch64/combine-comparisons-by-cse.ll | 50 +- llvm/test/CodeGen/AArch64/cond-sel-value-prop.ll | 12 +- llvm/test/CodeGen/AArch64/consthoist-gep.ll | 32 +- llvm/test/CodeGen/AArch64/csr-split.ll | 4 +- llvm/test/CodeGen/AArch64/ctpop-nonean.ll | 30 +- llvm/test/CodeGen/AArch64/dag-combine-select.ll | 2 +- .../CodeGen/AArch64/dag-combine-trunc-build-vec.ll | 14 +- llvm/test/CodeGen/AArch64/dag-numsignbits.ll | 12 +- .../AArch64/div-rem-pair-recomposition-signed.ll | 210 +- .../AArch64/div-rem-pair-recomposition-unsigned.ll | 210 +- llvm/test/CodeGen/AArch64/emutls.ll | 6 +- llvm/test/CodeGen/AArch64/expand-select.ll | 50 +- llvm/test/CodeGen/AArch64/expand-vector-rot.ll | 12 +- llvm/test/CodeGen/AArch64/extract-bits.ll | 484 +-- llvm/test/CodeGen/AArch64/extract-lowbits.ll | 116 +- llvm/test/CodeGen/AArch64/f16-instructions.ll | 18 +- llvm/test/CodeGen/AArch64/fabs.ll | 8 +- llvm/test/CodeGen/AArch64/fadd-combines.ll | 14 +- llvm/test/CodeGen/AArch64/faddp-half.ll | 8 +- .../CodeGen/AArch64/fast-isel-addressing-modes.ll | 6 +- .../CodeGen/AArch64/fast-isel-branch-cond-split.ll | 4 +- llvm/test/CodeGen/AArch64/fast-isel-gep.ll | 6 +- llvm/test/CodeGen/AArch64/fast-isel-memcpy.ll | 6 +- llvm/test/CodeGen/AArch64/fast-isel-shift.ll | 24 +- llvm/test/CodeGen/AArch64/fdiv_combine.ll | 6 +- llvm/test/CodeGen/AArch64/fold-global-offsets.ll | 10 +- llvm/test/CodeGen/AArch64/fp16-v8-instructions.ll | 1441 ++++---- llvm/test/CodeGen/AArch64/fp16-vector-shuffle.ll | 2 +- llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll | 198 +- llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll | 958 +++--- llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll | 114 +- llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll | 708 ++-- .../CodeGen/AArch64/framelayout-frame-record.mir | 3 +- .../CodeGen/AArch64/framelayout-unaligned-fp.ll | 4 +- llvm/test/CodeGen/AArch64/func-calls.ll | 2 +- llvm/test/CodeGen/AArch64/funnel-shift-rot.ll | 30 +- llvm/test/CodeGen/AArch64/funnel-shift.ll | 108 +- llvm/test/CodeGen/AArch64/global-merge-3.ll | 24 +- llvm/test/CodeGen/AArch64/half.ll | 10 +- .../hoist-and-by-const-from-lshr-in-eqcmp-zero.ll | 6 +- .../test/CodeGen/AArch64/hwasan-check-memaccess.ll | 2 +- .../CodeGen/AArch64/i128_volatile_load_store.ll | 36 +- llvm/test/CodeGen/AArch64/implicit-null-check.ll | 12 +- .../AArch64/insert-subvector-res-legalization.ll | 70 +- llvm/test/CodeGen/AArch64/isinf.ll | 2 +- llvm/test/CodeGen/AArch64/known-never-nan.ll | 16 +- llvm/test/CodeGen/AArch64/ldst-opt.ll | 5 +- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll | 163 +- llvm/test/CodeGen/AArch64/logical_shifted_reg.ll | 137 +- llvm/test/CodeGen/AArch64/lowerMUL-newload.ll | 24 +- .../CodeGen/AArch64/machine-licm-sink-instr.ll | 24 +- .../test/CodeGen/AArch64/machine-outliner-throw.ll | 4 +- .../AArch64/machine_cse_impdef_killflags.ll | 4 +- llvm/test/CodeGen/AArch64/madd-lohi.ll | 4 +- llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll | 50 +- llvm/test/CodeGen/AArch64/merge-trunc-store.ll | 72 +- llvm/test/CodeGen/AArch64/midpoint-int.ll | 308 +- llvm/test/CodeGen/AArch64/min-max.ll | 260 +- llvm/test/CodeGen/AArch64/minmax-of-minmax.ll | 256 +- llvm/test/CodeGen/AArch64/minmax.ll | 10 +- llvm/test/CodeGen/AArch64/misched-fusion-lit.ll | 5 +- llvm/test/CodeGen/AArch64/misched-fusion.ll | 4 +- .../CodeGen/AArch64/named-vector-shuffles-neon.ll | 18 +- .../CodeGen/AArch64/named-vector-shuffles-sve.ll | 408 +-- llvm/test/CodeGen/AArch64/neg-abs.ll | 8 +- llvm/test/CodeGen/AArch64/neg-imm.ll | 3 +- .../CodeGen/AArch64/neon-bitwise-instructions.ll | 6 +- llvm/test/CodeGen/AArch64/neon-dotpattern.ll | 4 +- llvm/test/CodeGen/AArch64/neon-dotreduce.ll | 88 +- llvm/test/CodeGen/AArch64/neon-mla-mls.ll | 30 +- llvm/test/CodeGen/AArch64/neon-mov.ll | 2 +- llvm/test/CodeGen/AArch64/neon-reverseshuffle.ll | 2 +- llvm/test/CodeGen/AArch64/neon-shift-neg.ll | 24 +- llvm/test/CodeGen/AArch64/neon-truncstore.ll | 30 +- llvm/test/CodeGen/AArch64/nontemporal.ll | 74 +- llvm/test/CodeGen/AArch64/overeager_mla_fusing.ll | 10 +- llvm/test/CodeGen/AArch64/pow.ll | 12 +- .../pull-conditional-binop-through-shift.ll | 6 +- llvm/test/CodeGen/AArch64/qmovn.ll | 8 +- .../AArch64/ragreedy-local-interval-cost.ll | 187 +- llvm/test/CodeGen/AArch64/rand.ll | 10 +- llvm/test/CodeGen/AArch64/reduce-and.ll | 348 +- llvm/test/CodeGen/AArch64/reduce-or.ll | 348 +- llvm/test/CodeGen/AArch64/reduce-xor.ll | 164 +- llvm/test/CodeGen/AArch64/regress-tblgen-chains.ll | 4 +- llvm/test/CodeGen/AArch64/rotate-extract.ll | 14 +- .../rvmarker-pseudo-expansion-and-outlining.mir | 4 +- llvm/test/CodeGen/AArch64/sadd_sat.ll | 12 +- llvm/test/CodeGen/AArch64/sadd_sat_plus.ll | 36 +- llvm/test/CodeGen/AArch64/sadd_sat_vec.ll | 68 +- llvm/test/CodeGen/AArch64/sat-add.ll | 30 +- llvm/test/CodeGen/AArch64/sdivpow2.ll | 2 +- llvm/test/CodeGen/AArch64/seh-finally.ll | 8 +- llvm/test/CodeGen/AArch64/select-with-and-or.ll | 32 +- llvm/test/CodeGen/AArch64/select_const.ll | 112 +- llvm/test/CodeGen/AArch64/select_fmf.ll | 32 +- llvm/test/CodeGen/AArch64/selectcc-to-shiftand.ll | 16 +- llvm/test/CodeGen/AArch64/settag-merge-order.ll | 4 +- llvm/test/CodeGen/AArch64/settag-merge.ll | 8 +- llvm/test/CodeGen/AArch64/settag.ll | 10 +- llvm/test/CodeGen/AArch64/shift-amount-mod.ll | 168 +- llvm/test/CodeGen/AArch64/shift-by-signext.ll | 20 +- llvm/test/CodeGen/AArch64/shift-mod.ll | 2 +- llvm/test/CodeGen/AArch64/shrink-wrapping-vla.ll | 4 +- llvm/test/CodeGen/AArch64/sibling-call.ll | 2 +- llvm/test/CodeGen/AArch64/signbit-shift.ll | 8 +- llvm/test/CodeGen/AArch64/sink-addsub-of-const.ll | 48 +- llvm/test/CodeGen/AArch64/sitofp-fixed-legal.ll | 18 +- .../CodeGen/AArch64/speculation-hardening-loads.ll | 4 +- .../test/CodeGen/AArch64/speculation-hardening.mir | 2 +- llvm/test/CodeGen/AArch64/split-vector-insert.ll | 70 +- llvm/test/CodeGen/AArch64/sqrt-fastmath.ll | 254 +- llvm/test/CodeGen/AArch64/srem-lkk.ll | 2 +- .../CodeGen/AArch64/srem-seteq-illegal-types.ll | 90 +- llvm/test/CodeGen/AArch64/srem-seteq-optsize.ll | 16 +- .../CodeGen/AArch64/srem-seteq-vec-nonsplat.ll | 382 +-- llvm/test/CodeGen/AArch64/srem-seteq-vec-splat.ll | 64 +- llvm/test/CodeGen/AArch64/srem-seteq.ll | 12 +- llvm/test/CodeGen/AArch64/srem-vector-lkk.ll | 446 +-- llvm/test/CodeGen/AArch64/ssub_sat.ll | 12 +- llvm/test/CodeGen/AArch64/ssub_sat_plus.ll | 36 +- llvm/test/CodeGen/AArch64/ssub_sat_vec.ll | 68 +- .../CodeGen/AArch64/stack-guard-remat-bitcast.ll | 12 +- llvm/test/CodeGen/AArch64/stack-guard-sysreg.ll | 30 +- .../CodeGen/AArch64/statepoint-call-lowering.ll | 6 +- .../AArch64/sve-calling-convention-mixed.ll | 16 +- llvm/test/CodeGen/AArch64/sve-expand-div.ll | 12 +- llvm/test/CodeGen/AArch64/sve-extract-element.ll | 4 +- .../CodeGen/AArch64/sve-extract-fixed-vector.ll | 64 +- .../CodeGen/AArch64/sve-extract-scalable-vector.ll | 60 +- llvm/test/CodeGen/AArch64/sve-fcopysign.ll | 18 +- llvm/test/CodeGen/AArch64/sve-fcvt.ll | 64 +- .../CodeGen/AArch64/sve-fixed-length-concat.ll | 28 +- .../AArch64/sve-fixed-length-extract-vector-elt.ll | 12 +- .../AArch64/sve-fixed-length-float-compares.ll | 28 +- .../AArch64/sve-fixed-length-fp-extend-trunc.ll | 54 +- .../CodeGen/AArch64/sve-fixed-length-fp-select.ll | 48 +- .../CodeGen/AArch64/sve-fixed-length-fp-to-int.ll | 54 +- .../CodeGen/AArch64/sve-fixed-length-fp-vselect.ll | 1716 +++++----- .../AArch64/sve-fixed-length-insert-vector-elt.ll | 148 +- .../CodeGen/AArch64/sve-fixed-length-int-div.ll | 216 +- .../AArch64/sve-fixed-length-int-extends.ll | 56 +- .../AArch64/sve-fixed-length-int-immediates.ll | 56 +- .../CodeGen/AArch64/sve-fixed-length-int-mulh.ll | 30 +- .../CodeGen/AArch64/sve-fixed-length-int-rem.ll | 282 +- .../CodeGen/AArch64/sve-fixed-length-int-select.ll | 144 +- .../CodeGen/AArch64/sve-fixed-length-int-to-fp.ll | 108 +- .../AArch64/sve-fixed-length-int-vselect.ll | 3584 ++++++++++---------- .../AArch64/sve-fixed-length-masked-gather.ll | 296 +- .../AArch64/sve-fixed-length-masked-loads.ll | 46 +- .../AArch64/sve-fixed-length-masked-scatter.ll | 342 +- .../AArch64/sve-fixed-length-masked-stores.ll | 82 +- .../AArch64/sve-fixed-length-vector-shuffle.ll | 78 +- llvm/test/CodeGen/AArch64/sve-forward-st-to-ld.ll | 7 +- llvm/test/CodeGen/AArch64/sve-fptrunc-store.ll | 4 +- llvm/test/CodeGen/AArch64/sve-gep.ll | 4 +- .../CodeGen/AArch64/sve-implicit-zero-filling.ll | 13 +- llvm/test/CodeGen/AArch64/sve-insert-element.ll | 192 +- llvm/test/CodeGen/AArch64/sve-insert-vector.ll | 80 +- llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll | 30 +- llvm/test/CodeGen/AArch64/sve-int-arith.ll | 2 +- llvm/test/CodeGen/AArch64/sve-intrinsics-index.ll | 10 +- .../CodeGen/AArch64/sve-intrinsics-int-arith.ll | 4 +- llvm/test/CodeGen/AArch64/sve-ld-post-inc.ll | 6 +- llvm/test/CodeGen/AArch64/sve-ld1r.ll | 2 +- .../sve-lsr-scaled-index-addressing-mode.ll | 1 + .../CodeGen/AArch64/sve-masked-gather-legalize.ll | 6 +- .../CodeGen/AArch64/sve-masked-scatter-legalize.ll | 2 +- llvm/test/CodeGen/AArch64/sve-masked-scatter.ll | 2 +- llvm/test/CodeGen/AArch64/sve-pred-arith.ll | 16 +- llvm/test/CodeGen/AArch64/sve-sext-zext.ll | 12 +- llvm/test/CodeGen/AArch64/sve-split-extract-elt.ll | 100 +- llvm/test/CodeGen/AArch64/sve-split-fcvt.ll | 40 +- llvm/test/CodeGen/AArch64/sve-split-fp-reduce.ll | 2 +- llvm/test/CodeGen/AArch64/sve-split-insert-elt.ll | 72 +- llvm/test/CodeGen/AArch64/sve-split-int-reduce.ll | 10 +- llvm/test/CodeGen/AArch64/sve-split-load.ll | 6 +- llvm/test/CodeGen/AArch64/sve-split-store.ll | 6 +- .../AArch64/sve-st1-addressing-mode-reg-imm.ll | 12 +- llvm/test/CodeGen/AArch64/sve-stepvector.ll | 22 +- llvm/test/CodeGen/AArch64/sve-trunc.ll | 30 +- llvm/test/CodeGen/AArch64/sve-vscale-attr.ll | 40 +- llvm/test/CodeGen/AArch64/sve-vscale.ll | 2 +- llvm/test/CodeGen/AArch64/sve-vselect-imm.ll | 12 +- llvm/test/CodeGen/AArch64/swift-async.ll | 20 +- llvm/test/CodeGen/AArch64/swift-return.ll | 2 +- llvm/test/CodeGen/AArch64/swifterror.ll | 6 +- llvm/test/CodeGen/AArch64/tiny-model-pic.ll | 12 +- llvm/test/CodeGen/AArch64/tiny-model-static.ll | 12 +- .../test/CodeGen/AArch64/typepromotion-overflow.ll | 136 +- llvm/test/CodeGen/AArch64/typepromotion-signed.ll | 38 +- llvm/test/CodeGen/AArch64/uadd_sat.ll | 6 +- llvm/test/CodeGen/AArch64/uadd_sat_plus.ll | 30 +- llvm/test/CodeGen/AArch64/uadd_sat_vec.ll | 72 +- .../AArch64/umulo-128-legalisation-lowering.ll | 27 +- ...old-masked-merge-scalar-constmask-innerouter.ll | 18 +- ...asked-merge-scalar-constmask-interleavedbits.ll | 12 +- ...merge-scalar-constmask-interleavedbytehalves.ll | 12 +- ...unfold-masked-merge-scalar-constmask-lowhigh.ll | 2 +- .../unfold-masked-merge-scalar-variablemask.ll | 98 +- llvm/test/CodeGen/AArch64/urem-lkk.ll | 20 +- .../CodeGen/AArch64/urem-seteq-illegal-types.ll | 28 +- llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll | 46 +- llvm/test/CodeGen/AArch64/urem-seteq-optsize.ll | 14 +- .../CodeGen/AArch64/urem-seteq-vec-nonsplat.ll | 340 +- .../test/CodeGen/AArch64/urem-seteq-vec-nonzero.ll | 56 +- llvm/test/CodeGen/AArch64/urem-seteq-vec-splat.ll | 38 +- .../CodeGen/AArch64/urem-seteq-vec-tautological.ll | 56 +- llvm/test/CodeGen/AArch64/urem-seteq.ll | 14 +- llvm/test/CodeGen/AArch64/urem-vector-lkk.ll | 330 +- .../AArch64/use-cr-result-of-dom-icmp-st.ll | 8 +- llvm/test/CodeGen/AArch64/usub_sat_plus.ll | 20 +- llvm/test/CodeGen/AArch64/usub_sat_vec.ll | 48 +- llvm/test/CodeGen/AArch64/vcvt-oversize.ll | 4 +- llvm/test/CodeGen/AArch64/vec-libcalls.ll | 34 +- llvm/test/CodeGen/AArch64/vec_cttz.ll | 8 +- llvm/test/CodeGen/AArch64/vec_uaddo.ll | 168 +- llvm/test/CodeGen/AArch64/vec_umulo.ll | 296 +- .../CodeGen/AArch64/vecreduce-and-legalization.ll | 36 +- .../AArch64/vecreduce-fadd-legalization-strict.ll | 96 +- .../CodeGen/AArch64/vecreduce-fadd-legalization.ll | 6 +- llvm/test/CodeGen/AArch64/vecreduce-fadd.ll | 188 +- .../CodeGen/AArch64/vecreduce-fmax-legalization.ll | 246 +- .../CodeGen/AArch64/vecreduce-fmin-legalization.ll | 246 +- .../CodeGen/AArch64/vecreduce-umax-legalization.ll | 14 +- llvm/test/CodeGen/AArch64/vector-fcopysign.ll | 346 +- llvm/test/CodeGen/AArch64/vector-gep.ll | 6 +- .../CodeGen/AArch64/vector-popcnt-128-ult-ugt.ll | 680 ++-- llvm/test/CodeGen/AArch64/vldn_shuffle.ll | 6 +- llvm/test/CodeGen/AArch64/vselect-constants.ll | 42 +- llvm/test/CodeGen/AArch64/win-tls.ll | 6 +- llvm/test/CodeGen/AArch64/win64_vararg.ll | 32 +- llvm/test/CodeGen/AArch64/win64_vararg_float.ll | 12 +- llvm/test/CodeGen/AArch64/win64_vararg_float_cc.ll | 12 +- llvm/test/CodeGen/AArch64/xor.ll | 8 +- llvm/test/MC/AArch64/elf-globaladdress.ll | 6 +- .../CanonicalizeFreezeInLoops/aarch64.ll | 2 +- .../CodeGenPrepare/AArch64/large-offset-gep.ll | 30 +- .../AArch64/lsr-pre-inc-offset-check.ll | 12 +- .../LoopStrengthReduce/AArch64/small-constant.ll | 2 +- .../aarch64_generated_funcs.ll.generated.expected | 30 +- ...aarch64_generated_funcs.ll.nogenerated.expected | 24 +- 319 files changed, 14045 insertions(+), 13817 deletions(-) diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td index 5c1bf783ba2a..cb52532343fe 100644 --- a/llvm/lib/Target/AArch64/AArch64.td +++ b/llvm/lib/Target/AArch64/AArch64.td @@ -1156,7 +1156,7 @@ def ProcTSV110 : SubtargetFeature<"tsv110", "ARMProcFamily", "TSV110", FeatureFP16FML, FeatureDotProd]>; -def : ProcessorModel<"generic", NoSchedModel, [ +def : ProcessorModel<"generic", CortexA55Model, [ FeatureFPARMv8, FeatureFuseAES, FeatureNEON, diff --git a/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll b/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll index 5008c7f5c847..cb8ec7ba6f21 100644 --- a/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll +++ b/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll @@ -4,7 +4,7 @@ ; COST-LABEL: sel.v8i8 ; COST: Found an estimated cost of 42 for instruction: %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15> ; CODE-LABEL: sel.v8i8 -; CODE: tbl v0.8b, { v0.16b }, v2.8b +; CODE: tbl v0.8b, { v0.16b }, v1.8b define <8 x i8> @sel.v8i8(<8 x i8> %v0, <8 x i8> %v1) { %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15> ret <8 x i8> %tmp0 diff --git a/llvm/test/Analysis/CostModel/AArch64/vector-select.ll b/llvm/test/Analysis/CostModel/AArch64/vector-select.ll index f2271c4ed71f..6e77612815f4 100644 --- a/llvm/test/Analysis/CostModel/AArch64/vector-select.ll +++ b/llvm/test/Analysis/CostModel/AArch64/vector-select.ll @@ -119,15 +119,15 @@ define <2 x i64> @v2i64_select_sle(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) { ; CODE-LABEL: v3i64_select_sle ; CODE: bb.0 -; CODE: ldr ; CODE: mov +; CODE: ldr ; CODE: mov ; CODE: mov ; CODE: cmge ; CODE: cmge ; CODE: bif -; CODE: ext ; CODE: bif +; CODE: ext ; CODE: ret define <3 x i64> @v3i64_select_sle(<3 x i64> %a, <3 x i64> %b, <3 x i64> %c) { diff --git a/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll b/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll index 6fe73e067e1a..c2436ccecc75 100644 --- a/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll +++ b/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll @@ -51,8 +51,8 @@ define <vscale x 4 x i32> @ashr_add_shl_nxv4i8(<vscale x 4 x i32> %a) { ; CHECK-LABEL: ashr_add_shl_nxv4i8: ; CHECK: // %bb.0: ; CHECK-NEXT: mov w8, #16777216 -; CHECK-NEXT: mov z1.s, w8 ; CHECK-NEXT: lsl z0.s, z0.s, #24 +; CHECK-NEXT: mov z1.s, w8 ; CHECK-NEXT: add z0.s, z0.s, z1.s ; CHECK-NEXT: asr z0.s, z0.s, #24 ; CHECK-NEXT: ret diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll index fd3a0072d2a8..4385e3ede36f 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll @@ -705,14 +705,14 @@ define i32 @atomic_load(i32* %p) #0 { define i8 @atomic_load_relaxed_8(i8* %p, i32 %off32) #0 { ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_8: ; CHECK-NOLSE-O1: ; %bb.0: -; CHECK-NOLSE-O1-NEXT: ldrb w8, [x0, #4095] -; CHECK-NOLSE-O1-NEXT: ldrb w9, [x0, w1, sxtw] -; CHECK-NOLSE-O1-NEXT: ldurb w10, [x0, #-256] -; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936 -; CHECK-NOLSE-O1-NEXT: ldrb w11, [x11] -; CHECK-NOLSE-O1-NEXT: add w8, w8, w9 -; CHECK-NOLSE-O1-NEXT: add w8, w8, w10 -; CHECK-NOLSE-O1-NEXT: add w0, w8, w11 +; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936 +; CHECK-NOLSE-O1-NEXT: ldrb w9, [x0, #4095] +; CHECK-NOLSE-O1-NEXT: ldrb w10, [x0, w1, sxtw] +; CHECK-NOLSE-O1-NEXT: ldurb w11, [x0, #-256] +; CHECK-NOLSE-O1-NEXT: ldrb w8, [x8] +; CHECK-NOLSE-O1-NEXT: add w9, w9, w10 +; CHECK-NOLSE-O1-NEXT: add w9, w9, w11 +; CHECK-NOLSE-O1-NEXT: add w0, w9, w8 ; CHECK-NOLSE-O1-NEXT: ret ; ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_8: @@ -775,14 +775,14 @@ define i8 @atomic_load_relaxed_8(i8* %p, i32 %off32) #0 { define i16 @atomic_load_relaxed_16(i16* %p, i32 %off32) #0 { ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_16: ; CHECK-NOLSE-O1: ; %bb.0: -; CHECK-NOLSE-O1-NEXT: ldrh w8, [x0, #8190] -; CHECK-NOLSE-O1-NEXT: ldrh w9, [x0, w1, sxtw #1] -; CHECK-NOLSE-O1-NEXT: ldurh w10, [x0, #-256] -; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936 -; CHECK-NOLSE-O1-NEXT: ldrh w11, [x11] -; CHECK-NOLSE-O1-NEXT: add w8, w8, w9 -; CHECK-NOLSE-O1-NEXT: add w8, w8, w10 -; CHECK-NOLSE-O1-NEXT: add w0, w8, w11 +; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936 +; CHECK-NOLSE-O1-NEXT: ldrh w9, [x0, #8190] +; CHECK-NOLSE-O1-NEXT: ldrh w10, [x0, w1, sxtw #1] +; CHECK-NOLSE-O1-NEXT: ldurh w11, [x0, #-256] +; CHECK-NOLSE-O1-NEXT: ldrh w8, [x8] +; CHECK-NOLSE-O1-NEXT: add w9, w9, w10 +; CHECK-NOLSE-O1-NEXT: add w9, w9, w11 +; CHECK-NOLSE-O1-NEXT: add w0, w9, w8 ; CHECK-NOLSE-O1-NEXT: ret ; ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_16: @@ -845,14 +845,14 @@ define i16 @atomic_load_relaxed_16(i16* %p, i32 %off32) #0 { define i32 @atomic_load_relaxed_32(i32* %p, i32 %off32) #0 { ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_32: ; CHECK-NOLSE-O1: ; %bb.0: -; CHECK-NOLSE-O1-NEXT: ldr w8, [x0, #16380] -; CHECK-NOLSE-O1-NEXT: ldr w9, [x0, w1, sxtw #2] -; CHECK-NOLSE-O1-NEXT: ldur w10, [x0, #-256] -; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936 -; CHECK-NOLSE-O1-NEXT: ldr w11, [x11] -; CHECK-NOLSE-O1-NEXT: add w8, w8, w9 -; CHECK-NOLSE-O1-NEXT: add w8, w8, w10 -; CHECK-NOLSE-O1-NEXT: add w0, w8, w11 +; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936 +; CHECK-NOLSE-O1-NEXT: ldr w9, [x0, #16380] +; CHECK-NOLSE-O1-NEXT: ldr w10, [x0, w1, sxtw #2] +; CHECK-NOLSE-O1-NEXT: ldur w11, [x0, #-256] +; CHECK-NOLSE-O1-NEXT: ldr w8, [x8] +; CHECK-NOLSE-O1-NEXT: add w9, w9, w10 +; CHECK-NOLSE-O1-NEXT: add w9, w9, w11 +; CHECK-NOLSE-O1-NEXT: add w0, w9, w8 ; CHECK-NOLSE-O1-NEXT: ret ; ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_32: @@ -911,14 +911,14 @@ define i32 @atomic_load_relaxed_32(i32* %p, i32 %off32) #0 { define i64 @atomic_load_relaxed_64(i64* %p, i32 %off32) #0 { ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_64: ; CHECK-NOLSE-O1: ; %bb.0: -; CHECK-NOLSE-O1-NEXT: ldr x8, [x0, #32760] -; CHECK-NOLSE-O1-NEXT: ldr x9, [x0, w1, sxtw #3] -; CHECK-NOLSE-O1-NEXT: ldur x10, [x0, #-256] -; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936 -; CHECK-NOLSE-O1-NEXT: ldr x11, [x11] -; CHECK-NOLSE-O1-NEXT: add x8, x8, x9 -; CHECK-NOLSE-O1-NEXT: add x8, x8, x10 -; CHECK-NOLSE-O1-NEXT: add x0, x8, x11 +; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936 +; CHECK-NOLSE-O1-NEXT: ldr x9, [x0, #32760] +; CHECK-NOLSE-O1-NEXT: ldr x10, [x0, w1, sxtw #3] +; CHECK-NOLSE-O1-NEXT: ldur x11, [x0, #-256] +; CHECK-NOLSE-O1-NEXT: ldr x8, [x8] +; CHECK-NOLSE-O1-NEXT: add x9, x9, x10 +; CHECK-NOLSE-O1-NEXT: add x9, x9, x11 +; CHECK-NOLSE-O1-NEXT: add x0, x9, x8 ; CHECK-NOLSE-O1-NEXT: ret ; ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_64: @@ -2717,8 +2717,8 @@ define { i8, i1 } @cmpxchg_i8(i8* %ptr, i8 %desired, i8 %new) { ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0 ; CHECK-NOLSE-O1-NEXT: ret ; CHECK-NOLSE-O1-NEXT: LBB47_4: ; %cmpxchg.nostore -; CHECK-NOLSE-O1-NEXT: clrex ; CHECK-NOLSE-O1-NEXT: mov w1, wzr +; CHECK-NOLSE-O1-NEXT: clrex ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0 ; CHECK-NOLSE-O1-NEXT: ret ; @@ -2783,8 +2783,8 @@ define { i16, i1 } @cmpxchg_i16(i16* %ptr, i16 %desired, i16 %new) { ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0 ; CHECK-NOLSE-O1-NEXT: ret ; CHECK-NOLSE-O1-NEXT: LBB48_4: ; %cmpxchg.nostore -; CHECK-NOLSE-O1-NEXT: clrex ; CHECK-NOLSE-O1-NEXT: mov w1, wzr +; CHECK-NOLSE-O1-NEXT: clrex ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0 ; CHECK-NOLSE-O1-NEXT: ret ; diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll b/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll index f8d4731d3249..651ca31ae555 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll @@ -27,8 +27,8 @@ define void @call_byval_a64i32([64 x i32]* %incoming) { ; CHECK: // %bb.0: ; CHECK-NEXT: sub sp, sp, #288 ; CHECK-NEXT: stp x29, x30, [sp, #256] // 16-byte Folded Spill -; CHECK-NEXT: str x28, [sp, #272] // 8-byte Folded Spill ; CHECK-NEXT: add x29, sp, #256 +; CHECK-NEXT: str x28, [sp, #272] // 8-byte Folded Spill ; CHECK-NEXT: .cfi_def_cfa w29, 32 ; CHECK-NEXT: .cfi_offset w28, -16 ; CHECK-NEXT: .cfi_offset w30, -24 @@ -66,8 +66,8 @@ define void @call_byval_a64i32([64 x i32]* %incoming) { ; CHECK-NEXT: ldr q0, [x0, #240] ; CHECK-NEXT: str q0, [sp, #240] ; CHECK-NEXT: bl byval_a64i32 -; CHECK-NEXT: ldr x28, [sp, #272] // 8-byte Folded Reload ; CHECK-NEXT: ldp x29, x30, [sp, #256] // 16-byte Folded Reload +; CHECK-NEXT: ldr x28, [sp, #272] // 8-byte Folded Reload ; CHECK-NEXT: add sp, sp, #288 ; CHECK-NEXT: ret call void @byval_a64i32([64 x i32]* byval([64 x i32]) %incoming) diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll index 42e91f631822..44c0854ea03d 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll @@ -63,15 +63,12 @@ define i32 @test_musttail_variadic_spill(i32 %arg0, ...) { ; CHECK-NEXT: mov x25, x6 ; CHECK-NEXT: mov x26, x7 ; CHECK-NEXT: stp q1, q0, [sp, #96] ; 32-byte Folded Spill +; CHECK-NEXT: mov x27, x8 ; CHECK-NEXT: stp q3, q2, [sp, #64] ; 32-byte Folded Spill ; CHECK-NEXT: stp q5, q4, [sp, #32] ; 32-byte Folded Spill ; CHECK-NEXT: stp q7, q6, [sp] ; 32-byte Folded Spill -; CHECK-NEXT: mov x27, x8 ; CHECK-NEXT: bl _puts ; CHECK-NEXT: ldp q1, q0, [sp, #96] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload ; CHECK-NEXT: mov w0, w19 ; CHECK-NEXT: mov x1, x20 ; CHECK-NEXT: mov x2, x21 @@ -81,6 +78,9 @@ define i32 @test_musttail_variadic_spill(i32 %arg0, ...) { ; CHECK-NEXT: mov x6, x25 ; CHECK-NEXT: mov x7, x26 ; CHECK-NEXT: mov x8, x27 +; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload +; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload +; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload ; CHECK-NEXT: ldp x29, x30, [sp, #208] ; 16-byte Folded Reload ; CHECK-NEXT: ldp x20, x19, [sp, #192] ; 16-byte Folded Reload ; CHECK-NEXT: ldp x22, x21, [sp, #176] ; 16-byte Folded Reload @@ -122,9 +122,8 @@ define void @f_thunk(i8* %this, ...) { ; CHECK-NEXT: .cfi_offset w26, -80 ; CHECK-NEXT: .cfi_offset w27, -88 ; CHECK-NEXT: .cfi_offset w28, -96 -; CHECK-NEXT: mov x27, x8 -; CHECK-NEXT: add x8, sp, #128 -; CHECK-NEXT: add x9, sp, #256 +; CHECK-NEXT: add x9, sp, #128 +; CHECK-NEXT: add x10, sp, #256 ; CHECK-NEXT: mov x19, x0 ; CHECK-NEXT: mov x20, x1 ; CHECK-NEXT: mov x21, x2 @@ -134,16 +133,14 @@ define void @f_thunk(i8* %this, ...) { ; CHECK-NEXT: mov x25, x6 ; CHECK-NEXT: mov x26, x7 ; CHECK-NEXT: stp q1, q0, [sp, #96] ; 32-byte Folded Spill +; CHECK-NEXT: mov x27, x8 ; CHECK-NEXT: stp q3, q2, [sp, #64] ; 32-byte Folded Spill ; CHECK-NEXT: stp q5, q4, [sp, #32] ; 32-byte Folded Spill ; CHECK-NEXT: stp q7, q6, [sp] ; 32-byte Folded Spill -; CHECK-NEXT: str x9, [x8] +; CHECK-NEXT: str x10, [x9] ; CHECK-NEXT: bl _get_f -; CHECK-NEXT: mov x9, x0 ; CHECK-NEXT: ldp q1, q0, [sp, #96] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload +; CHECK-NEXT: mov x9, x0 ; CHECK-NEXT: mov x0, x19 ; CHECK-NEXT: mov x1, x20 ; CHECK-NEXT: mov x2, x21 @@ -153,6 +150,9 @@ define void @f_thunk(i8* %this, ...) { ; CHECK-NEXT: mov x6, x25 ; CHECK-NEXT: mov x7, x26 ; CHECK-NEXT: mov x8, x27 +; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload +; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload +; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload ; CHECK-NEXT: ldp x29, x30, [sp, #240] ; 16-byte Folded Reload ; CHECK-NEXT: ldp x20, x19, [sp, #224] ; 16-byte Folded Reload ; CHECK-NEXT: ldp x22, x21, [sp, #208] ; 16-byte Folded Reload @@ -195,9 +195,9 @@ define void @h_thunk(%struct.Foo* %this, ...) { ; CHECK-NEXT: Lloh2: ; CHECK-NEXT: adrp x10, _g@GOTPAGE ; CHECK-NEXT: ldr x9, [x0, #16] +; CHECK-NEXT: mov w11, #42 ; CHECK-NEXT: Lloh3: ; CHECK-NEXT: ldr x10, [x10, _g@GOTPAGEOFF] -; CHECK-NEXT: mov w11, #42 ; CHECK-NEXT: Lloh4: ; CHECK-NEXT: str w11, [x10] ; CHECK-NEXT: br x9 diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll index 6d9dad450ef1..3dc45e4cf5a7 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll @@ -18,20 +18,20 @@ define <8 x i16> @combine_vec_udiv_uniform(<8 x i16> %x) { ; ; GISEL-LABEL: combine_vec_udiv_uniform: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI0_1 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI0_1] -; GISEL-NEXT: adrp x8, .LCPI0_0 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI0_0] ; GISEL-NEXT: adrp x8, .LCPI0_2 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI0_2] -; GISEL-NEXT: sub v1.8h, v2.8h, v1.8h -; GISEL-NEXT: neg v1.8h, v1.8h -; GISEL-NEXT: umull2 v2.4s, v0.8h, v3.8h -; GISEL-NEXT: umull v3.4s, v0.4h, v3.4h -; GISEL-NEXT: uzp2 v2.8h, v3.8h, v2.8h -; GISEL-NEXT: sub v0.8h, v0.8h, v2.8h -; GISEL-NEXT: ushl v0.8h, v0.8h, v1.8h -; GISEL-NEXT: add v0.8h, v0.8h, v2.8h +; GISEL-NEXT: adrp x9, .LCPI0_0 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI0_2] +; GISEL-NEXT: adrp x8, .LCPI0_1 +; GISEL-NEXT: ldr q4, [x9, :lo12:.LCPI0_0] +; GISEL-NEXT: umull2 v2.4s, v0.8h, v1.8h +; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI0_1] +; GISEL-NEXT: umull v1.4s, v0.4h, v1.4h +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v2.8h +; GISEL-NEXT: sub v2.8h, v4.8h, v3.8h +; GISEL-NEXT: sub v0.8h, v0.8h, v1.8h +; GISEL-NEXT: neg v2.8h, v2.8h +; GISEL-NEXT: ushl v0.8h, v0.8h, v2.8h +; GISEL-NEXT: add v0.8h, v0.8h, v1.8h ; GISEL-NEXT: ushr v0.8h, v0.8h, #4 ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 23, i16 23, i16 23, i16 23, i16 23, i16 23, i16 23, i16 23> @@ -44,53 +44,53 @@ define <8 x i16> @combine_vec_udiv_nonuniform(<8 x i16> %x) { ; SDAG-NEXT: adrp x8, .LCPI1_0 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI1_0] ; SDAG-NEXT: adrp x8, .LCPI1_1 +; SDAG-NEXT: ushl v1.8h, v0.8h, v1.8h ; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI1_1] ; SDAG-NEXT: adrp x8, .LCPI1_2 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI1_2] -; SDAG-NEXT: ushl v1.8h, v0.8h, v1.8h -; SDAG-NEXT: umull2 v4.4s, v1.8h, v2.8h +; SDAG-NEXT: umull2 v3.4s, v1.8h, v2.8h ; SDAG-NEXT: umull v1.4s, v1.4h, v2.4h +; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI1_2] ; SDAG-NEXT: adrp x8, .LCPI1_3 -; SDAG-NEXT: uzp2 v1.8h, v1.8h, v4.8h -; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI1_3] +; SDAG-NEXT: uzp2 v1.8h, v1.8h, v3.8h ; SDAG-NEXT: sub v0.8h, v0.8h, v1.8h -; SDAG-NEXT: umull2 v4.4s, v0.8h, v3.8h -; SDAG-NEXT: umull v0.4s, v0.4h, v3.4h -; SDAG-NEXT: uzp2 v0.8h, v0.8h, v4.8h +; SDAG-NEXT: umull2 v3.4s, v0.8h, v2.8h +; SDAG-NEXT: umull v0.4s, v0.4h, v2.4h +; SDAG-NEXT: uzp2 v0.8h, v0.8h, v3.8h ; SDAG-NEXT: add v0.8h, v0.8h, v1.8h -; SDAG-NEXT: ushl v0.8h, v0.8h, v2.8h +; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI1_3] +; SDAG-NEXT: ushl v0.8h, v0.8h, v1.8h ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI1_5 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI1_5] ; GISEL-NEXT: adrp x8, .LCPI1_4 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI1_4] +; GISEL-NEXT: adrp x10, .LCPI1_0 +; GISEL-NEXT: adrp x9, .LCPI1_1 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI1_4] ; GISEL-NEXT: adrp x8, .LCPI1_3 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI1_3] -; GISEL-NEXT: adrp x8, .LCPI1_1 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI1_1] -; GISEL-NEXT: adrp x8, .LCPI1_0 -; GISEL-NEXT: ldr q5, [x8, :lo12:.LCPI1_0] +; GISEL-NEXT: ldr q5, [x10, :lo12:.LCPI1_0] +; GISEL-NEXT: ldr q6, [x9, :lo12:.LCPI1_1] +; GISEL-NEXT: neg v1.8h, v1.8h +; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI1_3] ; GISEL-NEXT: adrp x8, .LCPI1_2 -; GISEL-NEXT: neg v2.8h, v2.8h -; GISEL-NEXT: ldr q6, [x8, :lo12:.LCPI1_2] -; GISEL-NEXT: ushl v2.8h, v0.8h, v2.8h -; GISEL-NEXT: cmeq v1.8h, v1.8h, v5.8h -; GISEL-NEXT: umull2 v5.4s, v2.8h, v3.8h +; GISEL-NEXT: ushl v1.8h, v0.8h, v1.8h +; GISEL-NEXT: umull2 v3.4s, v1.8h, v2.8h +; GISEL-NEXT: umull v1.4s, v1.4h, v2.4h +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v3.8h +; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI1_2] +; GISEL-NEXT: adrp x8, .LCPI1_5 +; GISEL-NEXT: sub v2.8h, v0.8h, v1.8h +; GISEL-NEXT: umull2 v4.4s, v2.8h, v3.8h ; GISEL-NEXT: umull v2.4s, v2.4h, v3.4h -; GISEL-NEXT: uzp2 v2.8h, v2.8h, v5.8h -; GISEL-NEXT: sub v3.8h, v0.8h, v2.8h -; GISEL-NEXT: umull2 v5.4s, v3.8h, v6.8h -; GISEL-NEXT: umull v3.4s, v3.4h, v6.4h -; GISEL-NEXT: uzp2 v3.8h, v3.8h, v5.8h -; GISEL-NEXT: neg v4.8h, v4.8h -; GISEL-NEXT: shl v1.8h, v1.8h, #15 -; GISEL-NEXT: add v2.8h, v3.8h, v2.8h -; GISEL-NEXT: ushl v2.8h, v2.8h, v4.8h -; GISEL-NEXT: sshr v1.8h, v1.8h, #15 -; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b +; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI1_5] +; GISEL-NEXT: cmeq v3.8h, v3.8h, v5.8h +; GISEL-NEXT: uzp2 v2.8h, v2.8h, v4.8h +; GISEL-NEXT: neg v4.8h, v6.8h +; GISEL-NEXT: add v1.8h, v2.8h, v1.8h +; GISEL-NEXT: shl v2.8h, v3.8h, #15 +; GISEL-NEXT: ushl v1.8h, v1.8h, v4.8h +; GISEL-NEXT: sshr v2.8h, v2.8h, #15 +; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 23, i16 34, i16 -23, i16 56, i16 128, i16 -1, i16 -256, i16 -32768> ret <8 x i16> %1 @@ -100,41 +100,41 @@ define <8 x i16> @combine_vec_udiv_nonuniform2(<8 x i16> %x) { ; SDAG-LABEL: combine_vec_udiv_nonuniform2: ; SDAG: // %bb.0: ; SDAG-NEXT: adrp x8, .LCPI2_0 -; SDAG-NEXT: adrp x9, .LCPI2_1 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI2_0] -; SDAG-NEXT: ldr q2, [x9, :lo12:.LCPI2_1] +; SDAG-NEXT: adrp x8, .LCPI2_1 +; SDAG-NEXT: ushl v0.8h, v0.8h, v1.8h +; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI2_1] ; SDAG-NEXT: adrp x8, .LCPI2_2 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI2_2] +; SDAG-NEXT: umull2 v2.4s, v0.8h, v1.8h +; SDAG-NEXT: umull v0.4s, v0.4h, v1.4h +; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI2_2] +; SDAG-NEXT: uzp2 v0.8h, v0.8h, v2.8h ; SDAG-NEXT: ushl v0.8h, v0.8h, v1.8h -; SDAG-NEXT: umull2 v1.4s, v0.8h, v2.8h -; SDAG-NEXT: umull v0.4s, v0.4h, v2.4h -; SDAG-NEXT: uzp2 v0.8h, v0.8h, v1.8h -; SDAG-NEXT: ushl v0.8h, v0.8h, v3.8h ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform2: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI2_4 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI2_4] ; GISEL-NEXT: adrp x8, .LCPI2_3 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI2_3] -; GISEL-NEXT: adrp x8, .LCPI2_1 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI2_1] -; GISEL-NEXT: adrp x8, .LCPI2_0 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI2_0] +; GISEL-NEXT: adrp x9, .LCPI2_4 +; GISEL-NEXT: adrp x10, .LCPI2_0 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI2_3] ; GISEL-NEXT: adrp x8, .LCPI2_2 -; GISEL-NEXT: ldr q5, [x8, :lo12:.LCPI2_2] +; GISEL-NEXT: ldr q3, [x9, :lo12:.LCPI2_4] +; GISEL-NEXT: ldr q4, [x10, :lo12:.LCPI2_0] +; GISEL-NEXT: neg v1.8h, v1.8h +; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI2_2] +; GISEL-NEXT: adrp x8, .LCPI2_1 +; GISEL-NEXT: cmeq v3.8h, v3.8h, v4.8h +; GISEL-NEXT: ushl v1.8h, v0.8h, v1.8h +; GISEL-NEXT: shl v3.8h, v3.8h, #15 +; GISEL-NEXT: umull2 v5.4s, v1.8h, v2.8h +; GISEL-NEXT: umull v1.4s, v1.4h, v2.4h +; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI2_1] ; GISEL-NEXT: neg v2.8h, v2.8h -; GISEL-NEXT: ushl v2.8h, v0.8h, v2.8h -; GISEL-NEXT: cmeq v1.8h, v1.8h, v4.8h -; GISEL-NEXT: umull2 v4.4s, v2.8h, v5.8h -; GISEL-NEXT: umull v2.4s, v2.4h, v5.4h -; GISEL-NEXT: neg v3.8h, v3.8h -; GISEL-NEXT: shl v1.8h, v1.8h, #15 -; GISEL-NEXT: uzp2 v2.8h, v2.8h, v4.8h -; GISEL-NEXT: ushl v2.8h, v2.8h, v3.8h -; GISEL-NEXT: sshr v1.8h, v1.8h, #15 -; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v5.8h +; GISEL-NEXT: ushl v1.8h, v1.8h, v2.8h +; GISEL-NEXT: sshr v2.8h, v3.8h, #15 +; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 -34, i16 35, i16 36, i16 -37, i16 38, i16 -39, i16 40, i16 -41> ret <8 x i16> %1 @@ -146,43 +146,43 @@ define <8 x i16> @combine_vec_udiv_nonuniform3(<8 x i16> %x) { ; SDAG-NEXT: adrp x8, .LCPI3_0 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI3_0] ; SDAG-NEXT: adrp x8, .LCPI3_1 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI3_1] ; SDAG-NEXT: umull2 v2.4s, v0.8h, v1.8h ; SDAG-NEXT: umull v1.4s, v0.4h, v1.4h ; SDAG-NEXT: uzp2 v1.8h, v1.8h, v2.8h ; SDAG-NEXT: sub v0.8h, v0.8h, v1.8h ; SDAG-NEXT: usra v1.8h, v0.8h, #1 -; SDAG-NEXT: ushl v0.8h, v1.8h, v3.8h +; SDAG-NEXT: ldr q0, [x8, :lo12:.LCPI3_1] +; SDAG-NEXT: ushl v0.8h, v1.8h, v0.8h ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform3: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI3_5 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI3_5] ; GISEL-NEXT: adrp x8, .LCPI3_4 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI3_4] -; GISEL-NEXT: adrp x8, .LCPI3_2 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI3_2] -; GISEL-NEXT: adrp x8, .LCPI3_1 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI3_1] -; GISEL-NEXT: adrp x8, .LCPI3_3 -; GISEL-NEXT: ldr q5, [x8, :lo12:.LCPI3_3] -; GISEL-NEXT: adrp x8, .LCPI3_0 -; GISEL-NEXT: ldr q6, [x8, :lo12:.LCPI3_0] -; GISEL-NEXT: sub v3.8h, v4.8h, v3.8h -; GISEL-NEXT: umull2 v4.4s, v0.8h, v2.8h -; GISEL-NEXT: umull v2.4s, v0.4h, v2.4h -; GISEL-NEXT: uzp2 v2.8h, v2.8h, v4.8h -; GISEL-NEXT: neg v3.8h, v3.8h -; GISEL-NEXT: sub v4.8h, v0.8h, v2.8h -; GISEL-NEXT: cmeq v1.8h, v1.8h, v6.8h -; GISEL-NEXT: ushl v3.8h, v4.8h, v3.8h -; GISEL-NEXT: neg v5.8h, v5.8h -; GISEL-NEXT: shl v1.8h, v1.8h, #15 -; GISEL-NEXT: add v2.8h, v3.8h, v2.8h -; GISEL-NEXT: ushl v2.8h, v2.8h, v5.8h -; GISEL-NEXT: sshr v1.8h, v1.8h, #15 -; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b +; GISEL-NEXT: adrp x9, .LCPI3_2 +; GISEL-NEXT: adrp x10, .LCPI3_1 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI3_4] +; GISEL-NEXT: adrp x8, .LCPI3_5 +; GISEL-NEXT: ldr q2, [x9, :lo12:.LCPI3_2] +; GISEL-NEXT: adrp x9, .LCPI3_3 +; GISEL-NEXT: ldr q3, [x10, :lo12:.LCPI3_1] +; GISEL-NEXT: adrp x10, .LCPI3_0 +; GISEL-NEXT: umull2 v4.4s, v0.8h, v1.8h +; GISEL-NEXT: umull v1.4s, v0.4h, v1.4h +; GISEL-NEXT: ldr q6, [x9, :lo12:.LCPI3_3] +; GISEL-NEXT: sub v2.8h, v3.8h, v2.8h +; GISEL-NEXT: ldr q5, [x10, :lo12:.LCPI3_0] +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v4.8h +; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI3_5] +; GISEL-NEXT: neg v2.8h, v2.8h +; GISEL-NEXT: sub v3.8h, v0.8h, v1.8h +; GISEL-NEXT: ushl v2.8h, v3.8h, v2.8h +; GISEL-NEXT: cmeq v3.8h, v4.8h, v5.8h +; GISEL-NEXT: neg v4.8h, v6.8h +; GISEL-NEXT: add v1.8h, v2.8h, v1.8h +; GISEL-NEXT: shl v2.8h, v3.8h, #15 +; GISEL-NEXT: ushl v1.8h, v1.8h, v4.8h +; GISEL-NEXT: sshr v2.8h, v2.8h, #15 +; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 7, i16 23, i16 25, i16 27, i16 31, i16 47, i16 63, i16 127> ret <8 x i16> %1 @@ -192,39 +192,39 @@ define <16 x i8> @combine_vec_udiv_nonuniform4(<16 x i8> %x) { ; SDAG-LABEL: combine_vec_udiv_nonuniform4: ; SDAG: // %bb.0: ; SDAG-NEXT: adrp x8, .LCPI4_0 +; SDAG-NEXT: adrp x9, .LCPI4_3 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI4_0] ; SDAG-NEXT: adrp x8, .LCPI4_1 +; SDAG-NEXT: ldr q3, [x9, :lo12:.LCPI4_3] +; SDAG-NEXT: umull2 v2.8h, v0.16b, v1.16b +; SDAG-NEXT: umull v1.8h, v0.8b, v1.8b +; SDAG-NEXT: and v0.16b, v0.16b, v3.16b +; SDAG-NEXT: uzp2 v1.16b, v1.16b, v2.16b ; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI4_1] ; SDAG-NEXT: adrp x8, .LCPI4_2 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI4_2] -; SDAG-NEXT: adrp x8, .LCPI4_3 -; SDAG-NEXT: ldr q4, [x8, :lo12:.LCPI4_3] -; SDAG-NEXT: umull2 v5.8h, v0.16b, v1.16b -; SDAG-NEXT: umull v1.8h, v0.8b, v1.8b -; SDAG-NEXT: uzp2 v1.16b, v1.16b, v5.16b ; SDAG-NEXT: ushl v1.16b, v1.16b, v2.16b -; SDAG-NEXT: and v1.16b, v1.16b, v3.16b -; SDAG-NEXT: and v0.16b, v0.16b, v4.16b +; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI4_2] +; SDAG-NEXT: and v1.16b, v1.16b, v2.16b ; SDAG-NEXT: orr v0.16b, v0.16b, v1.16b ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform4: ; GISEL: // %bb.0: ; GISEL-NEXT: adrp x8, .LCPI4_3 +; GISEL-NEXT: adrp x9, .LCPI4_2 +; GISEL-NEXT: adrp x10, .LCPI4_1 ; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI4_3] ; GISEL-NEXT: adrp x8, .LCPI4_0 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI4_0] -; GISEL-NEXT: adrp x8, .LCPI4_2 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI4_2] -; GISEL-NEXT: adrp x8, .LCPI4_1 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI4_1] -; GISEL-NEXT: cmeq v1.16b, v1.16b, v2.16b -; GISEL-NEXT: umull2 v2.8h, v0.16b, v3.16b -; GISEL-NEXT: umull v3.8h, v0.8b, v3.8b -; GISEL-NEXT: neg v4.16b, v4.16b -; GISEL-NEXT: uzp2 v2.16b, v3.16b, v2.16b +; GISEL-NEXT: ldr q2, [x9, :lo12:.LCPI4_2] +; GISEL-NEXT: ldr q3, [x10, :lo12:.LCPI4_1] +; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI4_0] +; GISEL-NEXT: umull2 v5.8h, v0.16b, v2.16b +; GISEL-NEXT: umull v2.8h, v0.8b, v2.8b +; GISEL-NEXT: cmeq v1.16b, v1.16b, v4.16b +; GISEL-NEXT: neg v3.16b, v3.16b +; GISEL-NEXT: uzp2 v2.16b, v2.16b, v5.16b ; GISEL-NEXT: shl v1.16b, v1.16b, #7 -; GISEL-NEXT: ushl v2.16b, v2.16b, v4.16b +; GISEL-NEXT: ushl v2.16b, v2.16b, v3.16b ; GISEL-NEXT: sshr v1.16b, v1.16b, #7 ; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b ; GISEL-NEXT: ret @@ -236,55 +236,55 @@ define <8 x i16> @pr38477(<8 x i16> %a0) { </cut>

4 years, 9 months

[TCWG CI] 470.lbm slowed down by 15% after llvm: [AArch64] Make -mcpu=generic schedule for an in-order core

by ci_notify＠linaro.org

After llvm commit adec9223616477df023026b0269ccd008701cc94 Author: David Green <david.green(a)arm.com> [AArch64] Make -mcpu=generic schedule for an in-order core the following benchmarks slowed down by more than 2%: - 470.lbm slowed down by 15% from 16308 to 18676 perf samples - 433.milc slowed down by 9% from 12206 to 13270 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O2 -flto - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTO First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-adec9223616477df023026b0269ccd008701cc94 cd investigate-llvm-adec9223616477df023026b0269ccd008701cc94 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach adec9223616477df023026b0269ccd008701cc94 ../artifacts/test.sh # Reproduce last_good build git checkout --detach e2a2e5475cbd370044474e132a1b5c58e6a3d458 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit adec9223616477df023026b0269ccd008701cc94 Author: David Green <david.green(a)arm.com> Date: Sat Oct 9 15:58:31 2021 +0100 [AArch64] Make -mcpu=generic schedule for an in-order core We would like to start pushing -mcpu=generic towards enabling the set of features that improves performance for some CPUs, without hurting any others. A blend of the performance options hopefully beneficial to all CPUs. The largest part of that is enabling in-order scheduling using the Cortex-A55 schedule model. This is similar to the Arm backend change from eecb353d0e25ba which made -mcpu=generic perform in-order scheduling using the cortex-a8 schedule model. The idea is that in-order cpu's require the most help in instruction scheduling, whereas out-of-order cpus can for the most part out-of-order schedule around different codegen. Our benchmarking suggests that hypothesis holds. When running on an in-order core this improved performance by 3.8% geomean on a set of DSP workloads, 2% geomean on some other embedded benchmark and between 1% and 1.8% on a set of singlecore and multicore workloads, all running on a Cortex-A55 cluster. On an out-of-order cpu the results are a lot more noisy but show flat performance or an improvement. On the set of DSP and embedded benchmarks, run on a Cortex-A78 there was a very noisy 1% speed improvement. Using the most detailed results I could find, SPEC2006 runs on a Neoverse N1 show a small increase in instruction count (+0.127%), but a decrease in cycle counts (-0.155%, on average). The instruction count is very low noise, the cycle count is more noisy with a 0.15% decrease not being significant. SPEC2k17 shows a small decrease (-0.2%) in instruction count leading to a -0.296% decrease in cycle count. These results are within noise margins but tend to show a small improvement in general. When specifying an Apple target, clang will set "-target-cpu apple-a7" on the command line, so should not be affected by this change when running from clang. This also doesn't enable more runtime unrolling like -mcpu=cortex-a55 does, only changing the schedule used. A lot of existing tests have updated. This is a summary of the important differences: - Most changes are the same instructions in a different order. - Sometimes this leads to very minor inefficiencies, such as requiring an extra mov to move variables into r0/v0 for the return value of a test function. - misched-fusion.ll was no longer fusing the pairs of instructions it should, as per D110561. I've changed the schedule used in the test for now. - neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to the different latencies. This seems fine to me. - Some SVE tests do not always remove movprfx where they did before due to different register allocation giving different destructive forms. - The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll produce two LDR where they previously produced an LDP due to store-pair-suppress kicking in. - arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LPD. - Some tests such as arm64-neon-mul-div.ll and ragreedy-local-interval-cost.ll have more, less or just different spilling. - In aarch64_generated_funcs.ll.generated.expected one part of the function is no longer outlined. Interestingly if I switch this to use any other scheduled even less is outlined. Some of these are expected to happen, such as differences in outlining or register spilling. There will be places where these result in worse codegen, places where they are better, with the SPEC instruction counts suggesting it is not a decrease overall, on average. Differential Revision: https://reviews.llvm.org/D110830 --- llvm/lib/Target/AArch64/AArch64.td | 2 +- .../Analysis/CostModel/AArch64/shuffle-select.ll | 2 +- .../Analysis/CostModel/AArch64/vector-select.ll | 4 +- llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll | 2 +- .../CodeGen/AArch64/GlobalISel/arm64-atomic.ll | 68 +- llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll | 4 +- .../call-translator-variadic-musttail.ll | 26 +- .../CodeGen/AArch64/GlobalISel/combine-udiv.ll | 308 +- .../AArch64/GlobalISel/merge-stores-truncating.ll | 10 +- llvm/test/CodeGen/AArch64/GlobalISel/swifterror.ll | 86 +- llvm/test/CodeGen/AArch64/aarch64-addv.ll | 2 +- llvm/test/CodeGen/AArch64/aarch64-be-bv.ll | 40 +- .../CodeGen/AArch64/aarch64-dup-ext-scalable.ll | 40 +- llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll | 18 +- llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll | 12 +- llvm/test/CodeGen/AArch64/aarch64-load-ext.ll | 36 +- .../CodeGen/AArch64/aarch64-matrix-umull-smull.ll | 24 +- llvm/test/CodeGen/AArch64/aarch64-smull.ll | 124 +- llvm/test/CodeGen/AArch64/aarch64-tail-dup-size.ll | 6 +- .../test/CodeGen/AArch64/aarch64_win64cc_vararg.ll | 4 +- llvm/test/CodeGen/AArch64/addimm-mulimm.ll | 32 +- .../CodeGen/AArch64/addsub-constant-folding.ll | 18 +- llvm/test/CodeGen/AArch64/addsub.ll | 2 +- llvm/test/CodeGen/AArch64/align-down.ll | 10 +- llvm/test/CodeGen/AArch64/and-mask-removal.ll | 12 +- .../AArch64/argument-blocks-array-of-struct.ll | 51 +- llvm/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll | 24 +- .../CodeGen/AArch64/arm64-addr-type-promotion.ll | 37 +- llvm/test/CodeGen/AArch64/arm64-addrmode.ll | 6 +- .../test/CodeGen/AArch64/arm64-bitfield-extract.ll | 14 +- llvm/test/CodeGen/AArch64/arm64-collect-loh.ll | 2 +- llvm/test/CodeGen/AArch64/arm64-convert-v4f64.ll | 22 +- llvm/test/CodeGen/AArch64/arm64-csel.ll | 16 +- llvm/test/CodeGen/AArch64/arm64-dup.ll | 10 +- llvm/test/CodeGen/AArch64/arm64-fcopysign.ll | 18 +- llvm/test/CodeGen/AArch64/arm64-fmadd.ll | 4 +- .../arm64-homogeneous-prolog-epilog-no-helper.ll | 18 +- llvm/test/CodeGen/AArch64/arm64-indexed-memory.ll | 54 +- .../CodeGen/AArch64/arm64-indexed-vector-ldst.ll | 180 +- llvm/test/CodeGen/AArch64/arm64-inline-asm.ll | 8 +- .../AArch64/arm64-instruction-mix-remarks.ll | 20 +- llvm/test/CodeGen/AArch64/arm64-ldp.ll | 20 +- llvm/test/CodeGen/AArch64/arm64-memset-inline.ll | 4 +- llvm/test/CodeGen/AArch64/arm64-neon-3vdiff.ll | 64 +- llvm/test/CodeGen/AArch64/arm64-neon-aba-abd.ll | 6 +- llvm/test/CodeGen/AArch64/arm64-neon-copy.ll | 13 +- llvm/test/CodeGen/AArch64/arm64-neon-mul-div.ll | 1428 ++++---- llvm/test/CodeGen/AArch64/arm64-nvcast.ll | 10 +- llvm/test/CodeGen/AArch64/arm64-popcnt.ll | 198 +- .../arm64-promote-const-complex-initializers.ll | 8 +- .../test/CodeGen/AArch64/arm64-register-pairing.ll | 4 +- llvm/test/CodeGen/AArch64/arm64-rev.ll | 14 +- .../AArch64/arm64-setcc-int-to-fp-combine.ll | 20 +- llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll | 92 +- llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll | 30 +- llvm/test/CodeGen/AArch64/arm64-srl-and.ll | 2 +- .../test/CodeGen/AArch64/arm64-subvector-extend.ll | 630 ++-- llvm/test/CodeGen/AArch64/arm64-tls-dynamics.ll | 8 +- llvm/test/CodeGen/AArch64/arm64-tls-local-exec.ll | 8 +- llvm/test/CodeGen/AArch64/arm64-trunc-store.ll | 4 +- llvm/test/CodeGen/AArch64/arm64-vabs.ll | 446 ++- llvm/test/CodeGen/AArch64/arm64-vhadd.ll | 32 +- llvm/test/CodeGen/AArch64/arm64-vmul.ll | 226 +- llvm/test/CodeGen/AArch64/arm64-windows-calls.ll | 19 +- .../CodeGen/AArch64/arm64-zero-cycle-zeroing.ll | 8 +- llvm/test/CodeGen/AArch64/arm64_32-addrs.ll | 6 +- llvm/test/CodeGen/AArch64/arm64_32-atomics.ll | 2 +- llvm/test/CodeGen/AArch64/atomic-ops-lse.ll | 17 +- .../CodeGen/AArch64/atomic-ops-not-barriers.ll | 2 +- llvm/test/CodeGen/AArch64/bcmp-inline-small.ll | 4 +- llvm/test/CodeGen/AArch64/bitcast-promote-widen.ll | 8 +- llvm/test/CodeGen/AArch64/bitfield-insert.ll | 34 +- llvm/test/CodeGen/AArch64/build-one-lane.ll | 9 +- llvm/test/CodeGen/AArch64/build-vector-extract.ll | 126 +- llvm/test/CodeGen/AArch64/cgp-usubo.ll | 24 +- llvm/test/CodeGen/AArch64/cmp-select-sign.ll | 44 +- llvm/test/CodeGen/AArch64/cmpxchg-idioms.ll | 16 +- .../CodeGen/AArch64/combine-comparisons-by-cse.ll | 50 +- llvm/test/CodeGen/AArch64/cond-sel-value-prop.ll | 12 +- llvm/test/CodeGen/AArch64/consthoist-gep.ll | 32 +- llvm/test/CodeGen/AArch64/csr-split.ll | 4 +- llvm/test/CodeGen/AArch64/ctpop-nonean.ll | 30 +- llvm/test/CodeGen/AArch64/dag-combine-select.ll | 2 +- .../CodeGen/AArch64/dag-combine-trunc-build-vec.ll | 14 +- llvm/test/CodeGen/AArch64/dag-numsignbits.ll | 12 +- .../AArch64/div-rem-pair-recomposition-signed.ll | 210 +- .../AArch64/div-rem-pair-recomposition-unsigned.ll | 210 +- llvm/test/CodeGen/AArch64/emutls.ll | 6 +- llvm/test/CodeGen/AArch64/expand-select.ll | 50 +- llvm/test/CodeGen/AArch64/expand-vector-rot.ll | 12 +- llvm/test/CodeGen/AArch64/extract-bits.ll | 484 +-- llvm/test/CodeGen/AArch64/extract-lowbits.ll | 116 +- llvm/test/CodeGen/AArch64/f16-instructions.ll | 18 +- llvm/test/CodeGen/AArch64/fabs.ll | 8 +- llvm/test/CodeGen/AArch64/fadd-combines.ll | 14 +- llvm/test/CodeGen/AArch64/faddp-half.ll | 8 +- .../CodeGen/AArch64/fast-isel-addressing-modes.ll | 6 +- .../CodeGen/AArch64/fast-isel-branch-cond-split.ll | 4 +- llvm/test/CodeGen/AArch64/fast-isel-gep.ll | 6 +- llvm/test/CodeGen/AArch64/fast-isel-memcpy.ll | 6 +- llvm/test/CodeGen/AArch64/fast-isel-shift.ll | 24 +- llvm/test/CodeGen/AArch64/fdiv_combine.ll | 6 +- llvm/test/CodeGen/AArch64/fold-global-offsets.ll | 10 +- llvm/test/CodeGen/AArch64/fp16-v8-instructions.ll | 1441 ++++---- llvm/test/CodeGen/AArch64/fp16-vector-shuffle.ll | 2 +- llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll | 198 +- llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll | 958 +++--- llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll | 114 +- llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll | 708 ++-- .../CodeGen/AArch64/framelayout-frame-record.mir | 3 +- .../CodeGen/AArch64/framelayout-unaligned-fp.ll | 4 +- llvm/test/CodeGen/AArch64/func-calls.ll | 2 +- llvm/test/CodeGen/AArch64/funnel-shift-rot.ll | 30 +- llvm/test/CodeGen/AArch64/funnel-shift.ll | 108 +- llvm/test/CodeGen/AArch64/global-merge-3.ll | 24 +- llvm/test/CodeGen/AArch64/half.ll | 10 +- .../hoist-and-by-const-from-lshr-in-eqcmp-zero.ll | 6 +- .../test/CodeGen/AArch64/hwasan-check-memaccess.ll | 2 +- .../CodeGen/AArch64/i128_volatile_load_store.ll | 36 +- llvm/test/CodeGen/AArch64/implicit-null-check.ll | 12 +- .../AArch64/insert-subvector-res-legalization.ll | 70 +- llvm/test/CodeGen/AArch64/isinf.ll | 2 +- llvm/test/CodeGen/AArch64/known-never-nan.ll | 16 +- llvm/test/CodeGen/AArch64/ldst-opt.ll | 5 +- llvm/test/CodeGen/AArch64/llvm-ir-to-intrinsic.ll | 163 +- llvm/test/CodeGen/AArch64/logical_shifted_reg.ll | 137 +- llvm/test/CodeGen/AArch64/lowerMUL-newload.ll | 24 +- .../CodeGen/AArch64/machine-licm-sink-instr.ll | 24 +- .../test/CodeGen/AArch64/machine-outliner-throw.ll | 4 +- .../AArch64/machine_cse_impdef_killflags.ll | 4 +- llvm/test/CodeGen/AArch64/madd-lohi.ll | 4 +- llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll | 50 +- llvm/test/CodeGen/AArch64/merge-trunc-store.ll | 72 +- llvm/test/CodeGen/AArch64/midpoint-int.ll | 308 +- llvm/test/CodeGen/AArch64/min-max.ll | 260 +- llvm/test/CodeGen/AArch64/minmax-of-minmax.ll | 256 +- llvm/test/CodeGen/AArch64/minmax.ll | 10 +- llvm/test/CodeGen/AArch64/misched-fusion-lit.ll | 5 +- llvm/test/CodeGen/AArch64/misched-fusion.ll | 4 +- .../CodeGen/AArch64/named-vector-shuffles-neon.ll | 18 +- .../CodeGen/AArch64/named-vector-shuffles-sve.ll | 408 +-- llvm/test/CodeGen/AArch64/neg-abs.ll | 8 +- llvm/test/CodeGen/AArch64/neg-imm.ll | 3 +- .../CodeGen/AArch64/neon-bitwise-instructions.ll | 6 +- llvm/test/CodeGen/AArch64/neon-dotpattern.ll | 4 +- llvm/test/CodeGen/AArch64/neon-dotreduce.ll | 88 +- llvm/test/CodeGen/AArch64/neon-mla-mls.ll | 30 +- llvm/test/CodeGen/AArch64/neon-mov.ll | 2 +- llvm/test/CodeGen/AArch64/neon-reverseshuffle.ll | 2 +- llvm/test/CodeGen/AArch64/neon-shift-neg.ll | 24 +- llvm/test/CodeGen/AArch64/neon-truncstore.ll | 30 +- llvm/test/CodeGen/AArch64/nontemporal.ll | 74 +- llvm/test/CodeGen/AArch64/overeager_mla_fusing.ll | 10 +- llvm/test/CodeGen/AArch64/pow.ll | 12 +- .../pull-conditional-binop-through-shift.ll | 6 +- llvm/test/CodeGen/AArch64/qmovn.ll | 8 +- .../AArch64/ragreedy-local-interval-cost.ll | 187 +- llvm/test/CodeGen/AArch64/rand.ll | 10 +- llvm/test/CodeGen/AArch64/reduce-and.ll | 348 +- llvm/test/CodeGen/AArch64/reduce-or.ll | 348 +- llvm/test/CodeGen/AArch64/reduce-xor.ll | 164 +- llvm/test/CodeGen/AArch64/regress-tblgen-chains.ll | 4 +- llvm/test/CodeGen/AArch64/rotate-extract.ll | 14 +- .../rvmarker-pseudo-expansion-and-outlining.mir | 4 +- llvm/test/CodeGen/AArch64/sadd_sat.ll | 12 +- llvm/test/CodeGen/AArch64/sadd_sat_plus.ll | 36 +- llvm/test/CodeGen/AArch64/sadd_sat_vec.ll | 68 +- llvm/test/CodeGen/AArch64/sat-add.ll | 30 +- llvm/test/CodeGen/AArch64/sdivpow2.ll | 2 +- llvm/test/CodeGen/AArch64/seh-finally.ll | 8 +- llvm/test/CodeGen/AArch64/select-with-and-or.ll | 32 +- llvm/test/CodeGen/AArch64/select_const.ll | 112 +- llvm/test/CodeGen/AArch64/select_fmf.ll | 32 +- llvm/test/CodeGen/AArch64/selectcc-to-shiftand.ll | 16 +- llvm/test/CodeGen/AArch64/settag-merge-order.ll | 4 +- llvm/test/CodeGen/AArch64/settag-merge.ll | 8 +- llvm/test/CodeGen/AArch64/settag.ll | 10 +- llvm/test/CodeGen/AArch64/shift-amount-mod.ll | 168 +- llvm/test/CodeGen/AArch64/shift-by-signext.ll | 20 +- llvm/test/CodeGen/AArch64/shift-mod.ll | 2 +- llvm/test/CodeGen/AArch64/shrink-wrapping-vla.ll | 4 +- llvm/test/CodeGen/AArch64/sibling-call.ll | 2 +- llvm/test/CodeGen/AArch64/signbit-shift.ll | 8 +- llvm/test/CodeGen/AArch64/sink-addsub-of-const.ll | 48 +- llvm/test/CodeGen/AArch64/sitofp-fixed-legal.ll | 18 +- .../CodeGen/AArch64/speculation-hardening-loads.ll | 4 +- .../test/CodeGen/AArch64/speculation-hardening.mir | 2 +- llvm/test/CodeGen/AArch64/split-vector-insert.ll | 70 +- llvm/test/CodeGen/AArch64/sqrt-fastmath.ll | 254 +- llvm/test/CodeGen/AArch64/srem-lkk.ll | 2 +- .../CodeGen/AArch64/srem-seteq-illegal-types.ll | 90 +- llvm/test/CodeGen/AArch64/srem-seteq-optsize.ll | 16 +- .../CodeGen/AArch64/srem-seteq-vec-nonsplat.ll | 382 +-- llvm/test/CodeGen/AArch64/srem-seteq-vec-splat.ll | 64 +- llvm/test/CodeGen/AArch64/srem-seteq.ll | 12 +- llvm/test/CodeGen/AArch64/srem-vector-lkk.ll | 446 +-- llvm/test/CodeGen/AArch64/ssub_sat.ll | 12 +- llvm/test/CodeGen/AArch64/ssub_sat_plus.ll | 36 +- llvm/test/CodeGen/AArch64/ssub_sat_vec.ll | 68 +- .../CodeGen/AArch64/stack-guard-remat-bitcast.ll | 12 +- llvm/test/CodeGen/AArch64/stack-guard-sysreg.ll | 30 +- .../CodeGen/AArch64/statepoint-call-lowering.ll | 6 +- .../AArch64/sve-calling-convention-mixed.ll | 16 +- llvm/test/CodeGen/AArch64/sve-expand-div.ll | 12 +- llvm/test/CodeGen/AArch64/sve-extract-element.ll | 4 +- .../CodeGen/AArch64/sve-extract-fixed-vector.ll | 64 +- .../CodeGen/AArch64/sve-extract-scalable-vector.ll | 60 +- llvm/test/CodeGen/AArch64/sve-fcopysign.ll | 18 +- llvm/test/CodeGen/AArch64/sve-fcvt.ll | 64 +- .../CodeGen/AArch64/sve-fixed-length-concat.ll | 28 +- .../AArch64/sve-fixed-length-extract-vector-elt.ll | 12 +- .../AArch64/sve-fixed-length-float-compares.ll | 28 +- .../AArch64/sve-fixed-length-fp-extend-trunc.ll | 54 +- .../CodeGen/AArch64/sve-fixed-length-fp-select.ll | 48 +- .../CodeGen/AArch64/sve-fixed-length-fp-to-int.ll | 54 +- .../CodeGen/AArch64/sve-fixed-length-fp-vselect.ll | 1716 +++++----- .../AArch64/sve-fixed-length-insert-vector-elt.ll | 148 +- .../CodeGen/AArch64/sve-fixed-length-int-div.ll | 216 +- .../AArch64/sve-fixed-length-int-extends.ll | 56 +- .../AArch64/sve-fixed-length-int-immediates.ll | 56 +- .../CodeGen/AArch64/sve-fixed-length-int-mulh.ll | 30 +- .../CodeGen/AArch64/sve-fixed-length-int-rem.ll | 282 +- .../CodeGen/AArch64/sve-fixed-length-int-select.ll | 144 +- .../CodeGen/AArch64/sve-fixed-length-int-to-fp.ll | 108 +- .../AArch64/sve-fixed-length-int-vselect.ll | 3584 ++++++++++---------- .../AArch64/sve-fixed-length-masked-gather.ll | 296 +- .../AArch64/sve-fixed-length-masked-loads.ll | 46 +- .../AArch64/sve-fixed-length-masked-scatter.ll | 342 +- .../AArch64/sve-fixed-length-masked-stores.ll | 82 +- .../AArch64/sve-fixed-length-vector-shuffle.ll | 78 +- llvm/test/CodeGen/AArch64/sve-forward-st-to-ld.ll | 7 +- llvm/test/CodeGen/AArch64/sve-fptrunc-store.ll | 4 +- llvm/test/CodeGen/AArch64/sve-gep.ll | 4 +- .../CodeGen/AArch64/sve-implicit-zero-filling.ll | 13 +- llvm/test/CodeGen/AArch64/sve-insert-element.ll | 192 +- llvm/test/CodeGen/AArch64/sve-insert-vector.ll | 80 +- llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll | 30 +- llvm/test/CodeGen/AArch64/sve-int-arith.ll | 2 +- llvm/test/CodeGen/AArch64/sve-intrinsics-index.ll | 10 +- .../CodeGen/AArch64/sve-intrinsics-int-arith.ll | 4 +- llvm/test/CodeGen/AArch64/sve-ld-post-inc.ll | 6 +- llvm/test/CodeGen/AArch64/sve-ld1r.ll | 2 +- .../sve-lsr-scaled-index-addressing-mode.ll | 1 + .../CodeGen/AArch64/sve-masked-gather-legalize.ll | 6 +- .../CodeGen/AArch64/sve-masked-scatter-legalize.ll | 2 +- llvm/test/CodeGen/AArch64/sve-masked-scatter.ll | 2 +- llvm/test/CodeGen/AArch64/sve-pred-arith.ll | 16 +- llvm/test/CodeGen/AArch64/sve-sext-zext.ll | 12 +- llvm/test/CodeGen/AArch64/sve-split-extract-elt.ll | 100 +- llvm/test/CodeGen/AArch64/sve-split-fcvt.ll | 40 +- llvm/test/CodeGen/AArch64/sve-split-fp-reduce.ll | 2 +- llvm/test/CodeGen/AArch64/sve-split-insert-elt.ll | 72 +- llvm/test/CodeGen/AArch64/sve-split-int-reduce.ll | 10 +- llvm/test/CodeGen/AArch64/sve-split-load.ll | 6 +- llvm/test/CodeGen/AArch64/sve-split-store.ll | 6 +- .../AArch64/sve-st1-addressing-mode-reg-imm.ll | 12 +- llvm/test/CodeGen/AArch64/sve-stepvector.ll | 22 +- llvm/test/CodeGen/AArch64/sve-trunc.ll | 30 +- llvm/test/CodeGen/AArch64/sve-vscale-attr.ll | 40 +- llvm/test/CodeGen/AArch64/sve-vscale.ll | 2 +- llvm/test/CodeGen/AArch64/sve-vselect-imm.ll | 12 +- llvm/test/CodeGen/AArch64/swift-async.ll | 20 +- llvm/test/CodeGen/AArch64/swift-return.ll | 2 +- llvm/test/CodeGen/AArch64/swifterror.ll | 6 +- llvm/test/CodeGen/AArch64/tiny-model-pic.ll | 12 +- llvm/test/CodeGen/AArch64/tiny-model-static.ll | 12 +- .../test/CodeGen/AArch64/typepromotion-overflow.ll | 136 +- llvm/test/CodeGen/AArch64/typepromotion-signed.ll | 38 +- llvm/test/CodeGen/AArch64/uadd_sat.ll | 6 +- llvm/test/CodeGen/AArch64/uadd_sat_plus.ll | 30 +- llvm/test/CodeGen/AArch64/uadd_sat_vec.ll | 72 +- .../AArch64/umulo-128-legalisation-lowering.ll | 27 +- ...old-masked-merge-scalar-constmask-innerouter.ll | 18 +- ...asked-merge-scalar-constmask-interleavedbits.ll | 12 +- ...merge-scalar-constmask-interleavedbytehalves.ll | 12 +- ...unfold-masked-merge-scalar-constmask-lowhigh.ll | 2 +- .../unfold-masked-merge-scalar-variablemask.ll | 98 +- llvm/test/CodeGen/AArch64/urem-lkk.ll | 20 +- .../CodeGen/AArch64/urem-seteq-illegal-types.ll | 28 +- llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll | 46 +- llvm/test/CodeGen/AArch64/urem-seteq-optsize.ll | 14 +- .../CodeGen/AArch64/urem-seteq-vec-nonsplat.ll | 340 +- .../test/CodeGen/AArch64/urem-seteq-vec-nonzero.ll | 56 +- llvm/test/CodeGen/AArch64/urem-seteq-vec-splat.ll | 38 +- .../CodeGen/AArch64/urem-seteq-vec-tautological.ll | 56 +- llvm/test/CodeGen/AArch64/urem-seteq.ll | 14 +- llvm/test/CodeGen/AArch64/urem-vector-lkk.ll | 330 +- .../AArch64/use-cr-result-of-dom-icmp-st.ll | 8 +- llvm/test/CodeGen/AArch64/usub_sat_plus.ll | 20 +- llvm/test/CodeGen/AArch64/usub_sat_vec.ll | 48 +- llvm/test/CodeGen/AArch64/vcvt-oversize.ll | 4 +- llvm/test/CodeGen/AArch64/vec-libcalls.ll | 34 +- llvm/test/CodeGen/AArch64/vec_cttz.ll | 8 +- llvm/test/CodeGen/AArch64/vec_uaddo.ll | 168 +- llvm/test/CodeGen/AArch64/vec_umulo.ll | 296 +- .../CodeGen/AArch64/vecreduce-and-legalization.ll | 36 +- .../AArch64/vecreduce-fadd-legalization-strict.ll | 96 +- .../CodeGen/AArch64/vecreduce-fadd-legalization.ll | 6 +- llvm/test/CodeGen/AArch64/vecreduce-fadd.ll | 188 +- .../CodeGen/AArch64/vecreduce-fmax-legalization.ll | 246 +- .../CodeGen/AArch64/vecreduce-fmin-legalization.ll | 246 +- .../CodeGen/AArch64/vecreduce-umax-legalization.ll | 14 +- llvm/test/CodeGen/AArch64/vector-fcopysign.ll | 346 +- llvm/test/CodeGen/AArch64/vector-gep.ll | 6 +- .../CodeGen/AArch64/vector-popcnt-128-ult-ugt.ll | 680 ++-- llvm/test/CodeGen/AArch64/vldn_shuffle.ll | 6 +- llvm/test/CodeGen/AArch64/vselect-constants.ll | 42 +- llvm/test/CodeGen/AArch64/win-tls.ll | 6 +- llvm/test/CodeGen/AArch64/win64_vararg.ll | 32 +- llvm/test/CodeGen/AArch64/win64_vararg_float.ll | 12 +- llvm/test/CodeGen/AArch64/win64_vararg_float_cc.ll | 12 +- llvm/test/CodeGen/AArch64/xor.ll | 8 +- llvm/test/MC/AArch64/elf-globaladdress.ll | 6 +- .../CanonicalizeFreezeInLoops/aarch64.ll | 2 +- .../CodeGenPrepare/AArch64/large-offset-gep.ll | 30 +- .../AArch64/lsr-pre-inc-offset-check.ll | 12 +- .../LoopStrengthReduce/AArch64/small-constant.ll | 2 +- .../aarch64_generated_funcs.ll.generated.expected | 30 +- ...aarch64_generated_funcs.ll.nogenerated.expected | 24 +- 319 files changed, 14045 insertions(+), 13817 deletions(-) diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td index 5c1bf783ba2a..cb52532343fe 100644 --- a/llvm/lib/Target/AArch64/AArch64.td +++ b/llvm/lib/Target/AArch64/AArch64.td @@ -1156,7 +1156,7 @@ def ProcTSV110 : SubtargetFeature<"tsv110", "ARMProcFamily", "TSV110", FeatureFP16FML, FeatureDotProd]>; -def : ProcessorModel<"generic", NoSchedModel, [ +def : ProcessorModel<"generic", CortexA55Model, [ FeatureFPARMv8, FeatureFuseAES, FeatureNEON, diff --git a/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll b/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll index 5008c7f5c847..cb8ec7ba6f21 100644 --- a/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll +++ b/llvm/test/Analysis/CostModel/AArch64/shuffle-select.ll @@ -4,7 +4,7 @@ ; COST-LABEL: sel.v8i8 ; COST: Found an estimated cost of 42 for instruction: %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15> ; CODE-LABEL: sel.v8i8 -; CODE: tbl v0.8b, { v0.16b }, v2.8b +; CODE: tbl v0.8b, { v0.16b }, v1.8b define <8 x i8> @sel.v8i8(<8 x i8> %v0, <8 x i8> %v1) { %tmp0 = shufflevector <8 x i8> %v0, <8 x i8> %v1, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15> ret <8 x i8> %tmp0 diff --git a/llvm/test/Analysis/CostModel/AArch64/vector-select.ll b/llvm/test/Analysis/CostModel/AArch64/vector-select.ll index f2271c4ed71f..6e77612815f4 100644 --- a/llvm/test/Analysis/CostModel/AArch64/vector-select.ll +++ b/llvm/test/Analysis/CostModel/AArch64/vector-select.ll @@ -119,15 +119,15 @@ define <2 x i64> @v2i64_select_sle(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) { ; CODE-LABEL: v3i64_select_sle ; CODE: bb.0 -; CODE: ldr ; CODE: mov +; CODE: ldr ; CODE: mov ; CODE: mov ; CODE: cmge ; CODE: cmge ; CODE: bif -; CODE: ext ; CODE: bif +; CODE: ext ; CODE: ret define <3 x i64> @v3i64_select_sle(<3 x i64> %a, <3 x i64> %b, <3 x i64> %c) { diff --git a/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll b/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll index 6fe73e067e1a..c2436ccecc75 100644 --- a/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll +++ b/llvm/test/CodeGen/AArch64/DAGCombine_vscale.ll @@ -51,8 +51,8 @@ define <vscale x 4 x i32> @ashr_add_shl_nxv4i8(<vscale x 4 x i32> %a) { ; CHECK-LABEL: ashr_add_shl_nxv4i8: ; CHECK: // %bb.0: ; CHECK-NEXT: mov w8, #16777216 -; CHECK-NEXT: mov z1.s, w8 ; CHECK-NEXT: lsl z0.s, z0.s, #24 +; CHECK-NEXT: mov z1.s, w8 ; CHECK-NEXT: add z0.s, z0.s, z1.s ; CHECK-NEXT: asr z0.s, z0.s, #24 ; CHECK-NEXT: ret diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll index fd3a0072d2a8..4385e3ede36f 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll @@ -705,14 +705,14 @@ define i32 @atomic_load(i32* %p) #0 { define i8 @atomic_load_relaxed_8(i8* %p, i32 %off32) #0 { ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_8: ; CHECK-NOLSE-O1: ; %bb.0: -; CHECK-NOLSE-O1-NEXT: ldrb w8, [x0, #4095] -; CHECK-NOLSE-O1-NEXT: ldrb w9, [x0, w1, sxtw] -; CHECK-NOLSE-O1-NEXT: ldurb w10, [x0, #-256] -; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936 -; CHECK-NOLSE-O1-NEXT: ldrb w11, [x11] -; CHECK-NOLSE-O1-NEXT: add w8, w8, w9 -; CHECK-NOLSE-O1-NEXT: add w8, w8, w10 -; CHECK-NOLSE-O1-NEXT: add w0, w8, w11 +; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936 +; CHECK-NOLSE-O1-NEXT: ldrb w9, [x0, #4095] +; CHECK-NOLSE-O1-NEXT: ldrb w10, [x0, w1, sxtw] +; CHECK-NOLSE-O1-NEXT: ldurb w11, [x0, #-256] +; CHECK-NOLSE-O1-NEXT: ldrb w8, [x8] +; CHECK-NOLSE-O1-NEXT: add w9, w9, w10 +; CHECK-NOLSE-O1-NEXT: add w9, w9, w11 +; CHECK-NOLSE-O1-NEXT: add w0, w9, w8 ; CHECK-NOLSE-O1-NEXT: ret ; ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_8: @@ -775,14 +775,14 @@ define i8 @atomic_load_relaxed_8(i8* %p, i32 %off32) #0 { define i16 @atomic_load_relaxed_16(i16* %p, i32 %off32) #0 { ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_16: ; CHECK-NOLSE-O1: ; %bb.0: -; CHECK-NOLSE-O1-NEXT: ldrh w8, [x0, #8190] -; CHECK-NOLSE-O1-NEXT: ldrh w9, [x0, w1, sxtw #1] -; CHECK-NOLSE-O1-NEXT: ldurh w10, [x0, #-256] -; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936 -; CHECK-NOLSE-O1-NEXT: ldrh w11, [x11] -; CHECK-NOLSE-O1-NEXT: add w8, w8, w9 -; CHECK-NOLSE-O1-NEXT: add w8, w8, w10 -; CHECK-NOLSE-O1-NEXT: add w0, w8, w11 +; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936 +; CHECK-NOLSE-O1-NEXT: ldrh w9, [x0, #8190] +; CHECK-NOLSE-O1-NEXT: ldrh w10, [x0, w1, sxtw #1] +; CHECK-NOLSE-O1-NEXT: ldurh w11, [x0, #-256] +; CHECK-NOLSE-O1-NEXT: ldrh w8, [x8] +; CHECK-NOLSE-O1-NEXT: add w9, w9, w10 +; CHECK-NOLSE-O1-NEXT: add w9, w9, w11 +; CHECK-NOLSE-O1-NEXT: add w0, w9, w8 ; CHECK-NOLSE-O1-NEXT: ret ; ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_16: @@ -845,14 +845,14 @@ define i16 @atomic_load_relaxed_16(i16* %p, i32 %off32) #0 { define i32 @atomic_load_relaxed_32(i32* %p, i32 %off32) #0 { ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_32: ; CHECK-NOLSE-O1: ; %bb.0: -; CHECK-NOLSE-O1-NEXT: ldr w8, [x0, #16380] -; CHECK-NOLSE-O1-NEXT: ldr w9, [x0, w1, sxtw #2] -; CHECK-NOLSE-O1-NEXT: ldur w10, [x0, #-256] -; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936 -; CHECK-NOLSE-O1-NEXT: ldr w11, [x11] -; CHECK-NOLSE-O1-NEXT: add w8, w8, w9 -; CHECK-NOLSE-O1-NEXT: add w8, w8, w10 -; CHECK-NOLSE-O1-NEXT: add w0, w8, w11 +; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936 +; CHECK-NOLSE-O1-NEXT: ldr w9, [x0, #16380] +; CHECK-NOLSE-O1-NEXT: ldr w10, [x0, w1, sxtw #2] +; CHECK-NOLSE-O1-NEXT: ldur w11, [x0, #-256] +; CHECK-NOLSE-O1-NEXT: ldr w8, [x8] +; CHECK-NOLSE-O1-NEXT: add w9, w9, w10 +; CHECK-NOLSE-O1-NEXT: add w9, w9, w11 +; CHECK-NOLSE-O1-NEXT: add w0, w9, w8 ; CHECK-NOLSE-O1-NEXT: ret ; ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_32: @@ -911,14 +911,14 @@ define i32 @atomic_load_relaxed_32(i32* %p, i32 %off32) #0 { define i64 @atomic_load_relaxed_64(i64* %p, i32 %off32) #0 { ; CHECK-NOLSE-O1-LABEL: atomic_load_relaxed_64: ; CHECK-NOLSE-O1: ; %bb.0: -; CHECK-NOLSE-O1-NEXT: ldr x8, [x0, #32760] -; CHECK-NOLSE-O1-NEXT: ldr x9, [x0, w1, sxtw #3] -; CHECK-NOLSE-O1-NEXT: ldur x10, [x0, #-256] -; CHECK-NOLSE-O1-NEXT: add x11, x0, #291, lsl #12 ; =1191936 -; CHECK-NOLSE-O1-NEXT: ldr x11, [x11] -; CHECK-NOLSE-O1-NEXT: add x8, x8, x9 -; CHECK-NOLSE-O1-NEXT: add x8, x8, x10 -; CHECK-NOLSE-O1-NEXT: add x0, x8, x11 +; CHECK-NOLSE-O1-NEXT: add x8, x0, #291, lsl #12 ; =1191936 +; CHECK-NOLSE-O1-NEXT: ldr x9, [x0, #32760] +; CHECK-NOLSE-O1-NEXT: ldr x10, [x0, w1, sxtw #3] +; CHECK-NOLSE-O1-NEXT: ldur x11, [x0, #-256] +; CHECK-NOLSE-O1-NEXT: ldr x8, [x8] +; CHECK-NOLSE-O1-NEXT: add x9, x9, x10 +; CHECK-NOLSE-O1-NEXT: add x9, x9, x11 +; CHECK-NOLSE-O1-NEXT: add x0, x9, x8 ; CHECK-NOLSE-O1-NEXT: ret ; ; CHECK-NOLSE-O0-LABEL: atomic_load_relaxed_64: @@ -2717,8 +2717,8 @@ define { i8, i1 } @cmpxchg_i8(i8* %ptr, i8 %desired, i8 %new) { ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0 ; CHECK-NOLSE-O1-NEXT: ret ; CHECK-NOLSE-O1-NEXT: LBB47_4: ; %cmpxchg.nostore -; CHECK-NOLSE-O1-NEXT: clrex ; CHECK-NOLSE-O1-NEXT: mov w1, wzr +; CHECK-NOLSE-O1-NEXT: clrex ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0 ; CHECK-NOLSE-O1-NEXT: ret ; @@ -2783,8 +2783,8 @@ define { i16, i1 } @cmpxchg_i16(i16* %ptr, i16 %desired, i16 %new) { ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0 ; CHECK-NOLSE-O1-NEXT: ret ; CHECK-NOLSE-O1-NEXT: LBB48_4: ; %cmpxchg.nostore -; CHECK-NOLSE-O1-NEXT: clrex ; CHECK-NOLSE-O1-NEXT: mov w1, wzr +; CHECK-NOLSE-O1-NEXT: clrex ; CHECK-NOLSE-O1-NEXT: ; kill: def $w0 killed $w0 killed $x0 ; CHECK-NOLSE-O1-NEXT: ret ; diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll b/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll index f8d4731d3249..651ca31ae555 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/byval-call.ll @@ -27,8 +27,8 @@ define void @call_byval_a64i32([64 x i32]* %incoming) { ; CHECK: // %bb.0: ; CHECK-NEXT: sub sp, sp, #288 ; CHECK-NEXT: stp x29, x30, [sp, #256] // 16-byte Folded Spill -; CHECK-NEXT: str x28, [sp, #272] // 8-byte Folded Spill ; CHECK-NEXT: add x29, sp, #256 +; CHECK-NEXT: str x28, [sp, #272] // 8-byte Folded Spill ; CHECK-NEXT: .cfi_def_cfa w29, 32 ; CHECK-NEXT: .cfi_offset w28, -16 ; CHECK-NEXT: .cfi_offset w30, -24 @@ -66,8 +66,8 @@ define void @call_byval_a64i32([64 x i32]* %incoming) { ; CHECK-NEXT: ldr q0, [x0, #240] ; CHECK-NEXT: str q0, [sp, #240] ; CHECK-NEXT: bl byval_a64i32 -; CHECK-NEXT: ldr x28, [sp, #272] // 8-byte Folded Reload ; CHECK-NEXT: ldp x29, x30, [sp, #256] // 16-byte Folded Reload +; CHECK-NEXT: ldr x28, [sp, #272] // 8-byte Folded Reload ; CHECK-NEXT: add sp, sp, #288 ; CHECK-NEXT: ret call void @byval_a64i32([64 x i32]* byval([64 x i32]) %incoming) diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll index 42e91f631822..44c0854ea03d 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/call-translator-variadic-musttail.ll @@ -63,15 +63,12 @@ define i32 @test_musttail_variadic_spill(i32 %arg0, ...) { ; CHECK-NEXT: mov x25, x6 ; CHECK-NEXT: mov x26, x7 ; CHECK-NEXT: stp q1, q0, [sp, #96] ; 32-byte Folded Spill +; CHECK-NEXT: mov x27, x8 ; CHECK-NEXT: stp q3, q2, [sp, #64] ; 32-byte Folded Spill ; CHECK-NEXT: stp q5, q4, [sp, #32] ; 32-byte Folded Spill ; CHECK-NEXT: stp q7, q6, [sp] ; 32-byte Folded Spill -; CHECK-NEXT: mov x27, x8 ; CHECK-NEXT: bl _puts ; CHECK-NEXT: ldp q1, q0, [sp, #96] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload ; CHECK-NEXT: mov w0, w19 ; CHECK-NEXT: mov x1, x20 ; CHECK-NEXT: mov x2, x21 @@ -81,6 +78,9 @@ define i32 @test_musttail_variadic_spill(i32 %arg0, ...) { ; CHECK-NEXT: mov x6, x25 ; CHECK-NEXT: mov x7, x26 ; CHECK-NEXT: mov x8, x27 +; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload +; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload +; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload ; CHECK-NEXT: ldp x29, x30, [sp, #208] ; 16-byte Folded Reload ; CHECK-NEXT: ldp x20, x19, [sp, #192] ; 16-byte Folded Reload ; CHECK-NEXT: ldp x22, x21, [sp, #176] ; 16-byte Folded Reload @@ -122,9 +122,8 @@ define void @f_thunk(i8* %this, ...) { ; CHECK-NEXT: .cfi_offset w26, -80 ; CHECK-NEXT: .cfi_offset w27, -88 ; CHECK-NEXT: .cfi_offset w28, -96 -; CHECK-NEXT: mov x27, x8 -; CHECK-NEXT: add x8, sp, #128 -; CHECK-NEXT: add x9, sp, #256 +; CHECK-NEXT: add x9, sp, #128 +; CHECK-NEXT: add x10, sp, #256 ; CHECK-NEXT: mov x19, x0 ; CHECK-NEXT: mov x20, x1 ; CHECK-NEXT: mov x21, x2 @@ -134,16 +133,14 @@ define void @f_thunk(i8* %this, ...) { ; CHECK-NEXT: mov x25, x6 ; CHECK-NEXT: mov x26, x7 ; CHECK-NEXT: stp q1, q0, [sp, #96] ; 32-byte Folded Spill +; CHECK-NEXT: mov x27, x8 ; CHECK-NEXT: stp q3, q2, [sp, #64] ; 32-byte Folded Spill ; CHECK-NEXT: stp q5, q4, [sp, #32] ; 32-byte Folded Spill ; CHECK-NEXT: stp q7, q6, [sp] ; 32-byte Folded Spill -; CHECK-NEXT: str x9, [x8] +; CHECK-NEXT: str x10, [x9] ; CHECK-NEXT: bl _get_f -; CHECK-NEXT: mov x9, x0 ; CHECK-NEXT: ldp q1, q0, [sp, #96] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload -; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload +; CHECK-NEXT: mov x9, x0 ; CHECK-NEXT: mov x0, x19 ; CHECK-NEXT: mov x1, x20 ; CHECK-NEXT: mov x2, x21 @@ -153,6 +150,9 @@ define void @f_thunk(i8* %this, ...) { ; CHECK-NEXT: mov x6, x25 ; CHECK-NEXT: mov x7, x26 ; CHECK-NEXT: mov x8, x27 +; CHECK-NEXT: ldp q3, q2, [sp, #64] ; 32-byte Folded Reload +; CHECK-NEXT: ldp q5, q4, [sp, #32] ; 32-byte Folded Reload +; CHECK-NEXT: ldp q7, q6, [sp] ; 32-byte Folded Reload ; CHECK-NEXT: ldp x29, x30, [sp, #240] ; 16-byte Folded Reload ; CHECK-NEXT: ldp x20, x19, [sp, #224] ; 16-byte Folded Reload ; CHECK-NEXT: ldp x22, x21, [sp, #208] ; 16-byte Folded Reload @@ -195,9 +195,9 @@ define void @h_thunk(%struct.Foo* %this, ...) { ; CHECK-NEXT: Lloh2: ; CHECK-NEXT: adrp x10, _g@GOTPAGE ; CHECK-NEXT: ldr x9, [x0, #16] +; CHECK-NEXT: mov w11, #42 ; CHECK-NEXT: Lloh3: ; CHECK-NEXT: ldr x10, [x10, _g@GOTPAGEOFF] -; CHECK-NEXT: mov w11, #42 ; CHECK-NEXT: Lloh4: ; CHECK-NEXT: str w11, [x10] ; CHECK-NEXT: br x9 diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll index 6d9dad450ef1..3dc45e4cf5a7 100644 --- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll +++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll @@ -18,20 +18,20 @@ define <8 x i16> @combine_vec_udiv_uniform(<8 x i16> %x) { ; ; GISEL-LABEL: combine_vec_udiv_uniform: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI0_1 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI0_1] -; GISEL-NEXT: adrp x8, .LCPI0_0 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI0_0] ; GISEL-NEXT: adrp x8, .LCPI0_2 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI0_2] -; GISEL-NEXT: sub v1.8h, v2.8h, v1.8h -; GISEL-NEXT: neg v1.8h, v1.8h -; GISEL-NEXT: umull2 v2.4s, v0.8h, v3.8h -; GISEL-NEXT: umull v3.4s, v0.4h, v3.4h -; GISEL-NEXT: uzp2 v2.8h, v3.8h, v2.8h -; GISEL-NEXT: sub v0.8h, v0.8h, v2.8h -; GISEL-NEXT: ushl v0.8h, v0.8h, v1.8h -; GISEL-NEXT: add v0.8h, v0.8h, v2.8h +; GISEL-NEXT: adrp x9, .LCPI0_0 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI0_2] +; GISEL-NEXT: adrp x8, .LCPI0_1 +; GISEL-NEXT: ldr q4, [x9, :lo12:.LCPI0_0] +; GISEL-NEXT: umull2 v2.4s, v0.8h, v1.8h +; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI0_1] +; GISEL-NEXT: umull v1.4s, v0.4h, v1.4h +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v2.8h +; GISEL-NEXT: sub v2.8h, v4.8h, v3.8h +; GISEL-NEXT: sub v0.8h, v0.8h, v1.8h +; GISEL-NEXT: neg v2.8h, v2.8h +; GISEL-NEXT: ushl v0.8h, v0.8h, v2.8h +; GISEL-NEXT: add v0.8h, v0.8h, v1.8h ; GISEL-NEXT: ushr v0.8h, v0.8h, #4 ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 23, i16 23, i16 23, i16 23, i16 23, i16 23, i16 23, i16 23> @@ -44,53 +44,53 @@ define <8 x i16> @combine_vec_udiv_nonuniform(<8 x i16> %x) { ; SDAG-NEXT: adrp x8, .LCPI1_0 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI1_0] ; SDAG-NEXT: adrp x8, .LCPI1_1 +; SDAG-NEXT: ushl v1.8h, v0.8h, v1.8h ; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI1_1] ; SDAG-NEXT: adrp x8, .LCPI1_2 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI1_2] -; SDAG-NEXT: ushl v1.8h, v0.8h, v1.8h -; SDAG-NEXT: umull2 v4.4s, v1.8h, v2.8h +; SDAG-NEXT: umull2 v3.4s, v1.8h, v2.8h ; SDAG-NEXT: umull v1.4s, v1.4h, v2.4h +; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI1_2] ; SDAG-NEXT: adrp x8, .LCPI1_3 -; SDAG-NEXT: uzp2 v1.8h, v1.8h, v4.8h -; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI1_3] +; SDAG-NEXT: uzp2 v1.8h, v1.8h, v3.8h ; SDAG-NEXT: sub v0.8h, v0.8h, v1.8h -; SDAG-NEXT: umull2 v4.4s, v0.8h, v3.8h -; SDAG-NEXT: umull v0.4s, v0.4h, v3.4h -; SDAG-NEXT: uzp2 v0.8h, v0.8h, v4.8h +; SDAG-NEXT: umull2 v3.4s, v0.8h, v2.8h +; SDAG-NEXT: umull v0.4s, v0.4h, v2.4h +; SDAG-NEXT: uzp2 v0.8h, v0.8h, v3.8h ; SDAG-NEXT: add v0.8h, v0.8h, v1.8h -; SDAG-NEXT: ushl v0.8h, v0.8h, v2.8h +; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI1_3] +; SDAG-NEXT: ushl v0.8h, v0.8h, v1.8h ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI1_5 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI1_5] ; GISEL-NEXT: adrp x8, .LCPI1_4 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI1_4] +; GISEL-NEXT: adrp x10, .LCPI1_0 +; GISEL-NEXT: adrp x9, .LCPI1_1 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI1_4] ; GISEL-NEXT: adrp x8, .LCPI1_3 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI1_3] -; GISEL-NEXT: adrp x8, .LCPI1_1 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI1_1] -; GISEL-NEXT: adrp x8, .LCPI1_0 -; GISEL-NEXT: ldr q5, [x8, :lo12:.LCPI1_0] +; GISEL-NEXT: ldr q5, [x10, :lo12:.LCPI1_0] +; GISEL-NEXT: ldr q6, [x9, :lo12:.LCPI1_1] +; GISEL-NEXT: neg v1.8h, v1.8h +; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI1_3] ; GISEL-NEXT: adrp x8, .LCPI1_2 -; GISEL-NEXT: neg v2.8h, v2.8h -; GISEL-NEXT: ldr q6, [x8, :lo12:.LCPI1_2] -; GISEL-NEXT: ushl v2.8h, v0.8h, v2.8h -; GISEL-NEXT: cmeq v1.8h, v1.8h, v5.8h -; GISEL-NEXT: umull2 v5.4s, v2.8h, v3.8h +; GISEL-NEXT: ushl v1.8h, v0.8h, v1.8h +; GISEL-NEXT: umull2 v3.4s, v1.8h, v2.8h +; GISEL-NEXT: umull v1.4s, v1.4h, v2.4h +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v3.8h +; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI1_2] +; GISEL-NEXT: adrp x8, .LCPI1_5 +; GISEL-NEXT: sub v2.8h, v0.8h, v1.8h +; GISEL-NEXT: umull2 v4.4s, v2.8h, v3.8h ; GISEL-NEXT: umull v2.4s, v2.4h, v3.4h -; GISEL-NEXT: uzp2 v2.8h, v2.8h, v5.8h -; GISEL-NEXT: sub v3.8h, v0.8h, v2.8h -; GISEL-NEXT: umull2 v5.4s, v3.8h, v6.8h -; GISEL-NEXT: umull v3.4s, v3.4h, v6.4h -; GISEL-NEXT: uzp2 v3.8h, v3.8h, v5.8h -; GISEL-NEXT: neg v4.8h, v4.8h -; GISEL-NEXT: shl v1.8h, v1.8h, #15 -; GISEL-NEXT: add v2.8h, v3.8h, v2.8h -; GISEL-NEXT: ushl v2.8h, v2.8h, v4.8h -; GISEL-NEXT: sshr v1.8h, v1.8h, #15 -; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b +; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI1_5] +; GISEL-NEXT: cmeq v3.8h, v3.8h, v5.8h +; GISEL-NEXT: uzp2 v2.8h, v2.8h, v4.8h +; GISEL-NEXT: neg v4.8h, v6.8h +; GISEL-NEXT: add v1.8h, v2.8h, v1.8h +; GISEL-NEXT: shl v2.8h, v3.8h, #15 +; GISEL-NEXT: ushl v1.8h, v1.8h, v4.8h +; GISEL-NEXT: sshr v2.8h, v2.8h, #15 +; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 23, i16 34, i16 -23, i16 56, i16 128, i16 -1, i16 -256, i16 -32768> ret <8 x i16> %1 @@ -100,41 +100,41 @@ define <8 x i16> @combine_vec_udiv_nonuniform2(<8 x i16> %x) { ; SDAG-LABEL: combine_vec_udiv_nonuniform2: ; SDAG: // %bb.0: ; SDAG-NEXT: adrp x8, .LCPI2_0 -; SDAG-NEXT: adrp x9, .LCPI2_1 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI2_0] -; SDAG-NEXT: ldr q2, [x9, :lo12:.LCPI2_1] +; SDAG-NEXT: adrp x8, .LCPI2_1 +; SDAG-NEXT: ushl v0.8h, v0.8h, v1.8h +; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI2_1] ; SDAG-NEXT: adrp x8, .LCPI2_2 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI2_2] +; SDAG-NEXT: umull2 v2.4s, v0.8h, v1.8h +; SDAG-NEXT: umull v0.4s, v0.4h, v1.4h +; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI2_2] +; SDAG-NEXT: uzp2 v0.8h, v0.8h, v2.8h ; SDAG-NEXT: ushl v0.8h, v0.8h, v1.8h -; SDAG-NEXT: umull2 v1.4s, v0.8h, v2.8h -; SDAG-NEXT: umull v0.4s, v0.4h, v2.4h -; SDAG-NEXT: uzp2 v0.8h, v0.8h, v1.8h -; SDAG-NEXT: ushl v0.8h, v0.8h, v3.8h ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform2: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI2_4 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI2_4] ; GISEL-NEXT: adrp x8, .LCPI2_3 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI2_3] -; GISEL-NEXT: adrp x8, .LCPI2_1 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI2_1] -; GISEL-NEXT: adrp x8, .LCPI2_0 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI2_0] +; GISEL-NEXT: adrp x9, .LCPI2_4 +; GISEL-NEXT: adrp x10, .LCPI2_0 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI2_3] ; GISEL-NEXT: adrp x8, .LCPI2_2 -; GISEL-NEXT: ldr q5, [x8, :lo12:.LCPI2_2] +; GISEL-NEXT: ldr q3, [x9, :lo12:.LCPI2_4] +; GISEL-NEXT: ldr q4, [x10, :lo12:.LCPI2_0] +; GISEL-NEXT: neg v1.8h, v1.8h +; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI2_2] +; GISEL-NEXT: adrp x8, .LCPI2_1 +; GISEL-NEXT: cmeq v3.8h, v3.8h, v4.8h +; GISEL-NEXT: ushl v1.8h, v0.8h, v1.8h +; GISEL-NEXT: shl v3.8h, v3.8h, #15 +; GISEL-NEXT: umull2 v5.4s, v1.8h, v2.8h +; GISEL-NEXT: umull v1.4s, v1.4h, v2.4h +; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI2_1] ; GISEL-NEXT: neg v2.8h, v2.8h -; GISEL-NEXT: ushl v2.8h, v0.8h, v2.8h -; GISEL-NEXT: cmeq v1.8h, v1.8h, v4.8h -; GISEL-NEXT: umull2 v4.4s, v2.8h, v5.8h -; GISEL-NEXT: umull v2.4s, v2.4h, v5.4h -; GISEL-NEXT: neg v3.8h, v3.8h -; GISEL-NEXT: shl v1.8h, v1.8h, #15 -; GISEL-NEXT: uzp2 v2.8h, v2.8h, v4.8h -; GISEL-NEXT: ushl v2.8h, v2.8h, v3.8h -; GISEL-NEXT: sshr v1.8h, v1.8h, #15 -; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v5.8h +; GISEL-NEXT: ushl v1.8h, v1.8h, v2.8h +; GISEL-NEXT: sshr v2.8h, v3.8h, #15 +; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 -34, i16 35, i16 36, i16 -37, i16 38, i16 -39, i16 40, i16 -41> ret <8 x i16> %1 @@ -146,43 +146,43 @@ define <8 x i16> @combine_vec_udiv_nonuniform3(<8 x i16> %x) { ; SDAG-NEXT: adrp x8, .LCPI3_0 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI3_0] ; SDAG-NEXT: adrp x8, .LCPI3_1 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI3_1] ; SDAG-NEXT: umull2 v2.4s, v0.8h, v1.8h ; SDAG-NEXT: umull v1.4s, v0.4h, v1.4h ; SDAG-NEXT: uzp2 v1.8h, v1.8h, v2.8h ; SDAG-NEXT: sub v0.8h, v0.8h, v1.8h ; SDAG-NEXT: usra v1.8h, v0.8h, #1 -; SDAG-NEXT: ushl v0.8h, v1.8h, v3.8h +; SDAG-NEXT: ldr q0, [x8, :lo12:.LCPI3_1] +; SDAG-NEXT: ushl v0.8h, v1.8h, v0.8h ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform3: ; GISEL: // %bb.0: -; GISEL-NEXT: adrp x8, .LCPI3_5 -; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI3_5] ; GISEL-NEXT: adrp x8, .LCPI3_4 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI3_4] -; GISEL-NEXT: adrp x8, .LCPI3_2 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI3_2] -; GISEL-NEXT: adrp x8, .LCPI3_1 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI3_1] -; GISEL-NEXT: adrp x8, .LCPI3_3 -; GISEL-NEXT: ldr q5, [x8, :lo12:.LCPI3_3] -; GISEL-NEXT: adrp x8, .LCPI3_0 -; GISEL-NEXT: ldr q6, [x8, :lo12:.LCPI3_0] -; GISEL-NEXT: sub v3.8h, v4.8h, v3.8h -; GISEL-NEXT: umull2 v4.4s, v0.8h, v2.8h -; GISEL-NEXT: umull v2.4s, v0.4h, v2.4h -; GISEL-NEXT: uzp2 v2.8h, v2.8h, v4.8h -; GISEL-NEXT: neg v3.8h, v3.8h -; GISEL-NEXT: sub v4.8h, v0.8h, v2.8h -; GISEL-NEXT: cmeq v1.8h, v1.8h, v6.8h -; GISEL-NEXT: ushl v3.8h, v4.8h, v3.8h -; GISEL-NEXT: neg v5.8h, v5.8h -; GISEL-NEXT: shl v1.8h, v1.8h, #15 -; GISEL-NEXT: add v2.8h, v3.8h, v2.8h -; GISEL-NEXT: ushl v2.8h, v2.8h, v5.8h -; GISEL-NEXT: sshr v1.8h, v1.8h, #15 -; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b +; GISEL-NEXT: adrp x9, .LCPI3_2 +; GISEL-NEXT: adrp x10, .LCPI3_1 +; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI3_4] +; GISEL-NEXT: adrp x8, .LCPI3_5 +; GISEL-NEXT: ldr q2, [x9, :lo12:.LCPI3_2] +; GISEL-NEXT: adrp x9, .LCPI3_3 +; GISEL-NEXT: ldr q3, [x10, :lo12:.LCPI3_1] +; GISEL-NEXT: adrp x10, .LCPI3_0 +; GISEL-NEXT: umull2 v4.4s, v0.8h, v1.8h +; GISEL-NEXT: umull v1.4s, v0.4h, v1.4h +; GISEL-NEXT: ldr q6, [x9, :lo12:.LCPI3_3] +; GISEL-NEXT: sub v2.8h, v3.8h, v2.8h +; GISEL-NEXT: ldr q5, [x10, :lo12:.LCPI3_0] +; GISEL-NEXT: uzp2 v1.8h, v1.8h, v4.8h +; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI3_5] +; GISEL-NEXT: neg v2.8h, v2.8h +; GISEL-NEXT: sub v3.8h, v0.8h, v1.8h +; GISEL-NEXT: ushl v2.8h, v3.8h, v2.8h +; GISEL-NEXT: cmeq v3.8h, v4.8h, v5.8h +; GISEL-NEXT: neg v4.8h, v6.8h +; GISEL-NEXT: add v1.8h, v2.8h, v1.8h +; GISEL-NEXT: shl v2.8h, v3.8h, #15 +; GISEL-NEXT: ushl v1.8h, v1.8h, v4.8h +; GISEL-NEXT: sshr v2.8h, v2.8h, #15 +; GISEL-NEXT: bif v0.16b, v1.16b, v2.16b ; GISEL-NEXT: ret %1 = udiv <8 x i16> %x, <i16 7, i16 23, i16 25, i16 27, i16 31, i16 47, i16 63, i16 127> ret <8 x i16> %1 @@ -192,39 +192,39 @@ define <16 x i8> @combine_vec_udiv_nonuniform4(<16 x i8> %x) { ; SDAG-LABEL: combine_vec_udiv_nonuniform4: ; SDAG: // %bb.0: ; SDAG-NEXT: adrp x8, .LCPI4_0 +; SDAG-NEXT: adrp x9, .LCPI4_3 ; SDAG-NEXT: ldr q1, [x8, :lo12:.LCPI4_0] ; SDAG-NEXT: adrp x8, .LCPI4_1 +; SDAG-NEXT: ldr q3, [x9, :lo12:.LCPI4_3] +; SDAG-NEXT: umull2 v2.8h, v0.16b, v1.16b +; SDAG-NEXT: umull v1.8h, v0.8b, v1.8b +; SDAG-NEXT: and v0.16b, v0.16b, v3.16b +; SDAG-NEXT: uzp2 v1.16b, v1.16b, v2.16b ; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI4_1] ; SDAG-NEXT: adrp x8, .LCPI4_2 -; SDAG-NEXT: ldr q3, [x8, :lo12:.LCPI4_2] -; SDAG-NEXT: adrp x8, .LCPI4_3 -; SDAG-NEXT: ldr q4, [x8, :lo12:.LCPI4_3] -; SDAG-NEXT: umull2 v5.8h, v0.16b, v1.16b -; SDAG-NEXT: umull v1.8h, v0.8b, v1.8b -; SDAG-NEXT: uzp2 v1.16b, v1.16b, v5.16b ; SDAG-NEXT: ushl v1.16b, v1.16b, v2.16b -; SDAG-NEXT: and v1.16b, v1.16b, v3.16b -; SDAG-NEXT: and v0.16b, v0.16b, v4.16b +; SDAG-NEXT: ldr q2, [x8, :lo12:.LCPI4_2] +; SDAG-NEXT: and v1.16b, v1.16b, v2.16b ; SDAG-NEXT: orr v0.16b, v0.16b, v1.16b ; SDAG-NEXT: ret ; ; GISEL-LABEL: combine_vec_udiv_nonuniform4: ; GISEL: // %bb.0: ; GISEL-NEXT: adrp x8, .LCPI4_3 +; GISEL-NEXT: adrp x9, .LCPI4_2 +; GISEL-NEXT: adrp x10, .LCPI4_1 ; GISEL-NEXT: ldr q1, [x8, :lo12:.LCPI4_3] ; GISEL-NEXT: adrp x8, .LCPI4_0 -; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI4_0] -; GISEL-NEXT: adrp x8, .LCPI4_2 -; GISEL-NEXT: ldr q3, [x8, :lo12:.LCPI4_2] -; GISEL-NEXT: adrp x8, .LCPI4_1 -; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI4_1] -; GISEL-NEXT: cmeq v1.16b, v1.16b, v2.16b -; GISEL-NEXT: umull2 v2.8h, v0.16b, v3.16b -; GISEL-NEXT: umull v3.8h, v0.8b, v3.8b -; GISEL-NEXT: neg v4.16b, v4.16b -; GISEL-NEXT: uzp2 v2.16b, v3.16b, v2.16b +; GISEL-NEXT: ldr q2, [x9, :lo12:.LCPI4_2] +; GISEL-NEXT: ldr q3, [x10, :lo12:.LCPI4_1] +; GISEL-NEXT: ldr q4, [x8, :lo12:.LCPI4_0] +; GISEL-NEXT: umull2 v5.8h, v0.16b, v2.16b +; GISEL-NEXT: umull v2.8h, v0.8b, v2.8b +; GISEL-NEXT: cmeq v1.16b, v1.16b, v4.16b +; GISEL-NEXT: neg v3.16b, v3.16b +; GISEL-NEXT: uzp2 v2.16b, v2.16b, v5.16b ; GISEL-NEXT: shl v1.16b, v1.16b, #7 -; GISEL-NEXT: ushl v2.16b, v2.16b, v4.16b +; GISEL-NEXT: ushl v2.16b, v2.16b, v3.16b ; GISEL-NEXT: sshr v1.16b, v1.16b, #7 ; GISEL-NEXT: bif v0.16b, v2.16b, v1.16b ; GISEL-NEXT: ret @@ -236,55 +236,55 @@ define <8 x i16> @pr38477(<8 x i16> %a0) { </cut>

4 years, 9 months

[TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

by ci_notify＠linaro.org

After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5 Author: Aldy Hernandez <aldyh(a)redhat.com> Avoid invalid loop transformations in jump threading registry. the following benchmarks slowed down by more than 2%: - 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: GCC + Glibc + GNU Linker - Version: all components were built from their tip of trunk - Target: arm-linux-gnueabihf - Compiler flags: -O3 -marm - Hardware: NVidia TK1 4x Cortex-A15 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3 First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… Reproduce builds: <cut> mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5 cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-ar… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 4a960d548b7d7d942f316c5295f6d849b74214f5 Author: Aldy Hernandez <aldyh(a)redhat.com> Date: Thu Sep 23 10:59:24 2021 +0200 Avoid invalid loop transformations in jump threading registry. My upcoming improvements to the forward jump threader make it thread more aggressively. In investigating some "regressions", I noticed that it has always allowed threading through empty latches and across loop boundaries. As we have discussed recently, this should be avoided until after loop optimizations have run their course. Note that this wasn't much of a problem before because DOM/VRP couldn't find these opportunities, but with a smarter solver, we trip over them more easily. Because the forward threader doesn't have an independent localized cost model like the new threader (profitable_path_p), it is difficult to catch these things at discovery. However, we can catch them at registration time, with the added benefit that all the threaders (forward and backward) can share the handcuffs. This patch is an adaptation of what we do in the backward threader, but it is not meant to catch everything we do there, as some of the restrictions there are due to limitations of the different block copiers (for example, the generic copier does not re-use existing threading paths). We could ideally remove the now redundant bits in profitable_path_p, but I would prefer not to for two reasons. First, the backward threader uses profitable_path_p as it discovers paths to avoid discovering paths in unprofitable directions. Second, I would like to merge all the forward cost restrictions into the profitability class in the backward threader, not the other way around. Alas, that reshuffling will have to wait for the next release. As usual, there are quite a few tests that needed adjustments. It seems we were quite happily threading improper scenarios. With most of them, as can be seen in pr77445-2.c, we're merely shifting the threading to after loop optimizations. Tested on x86-64 Linux. gcc/ChangeLog: * tree-ssa-threadupdate.c (jt_path_registry::cancel_invalid_paths): New. (jt_path_registry::register_jump_thread): Call cancel_invalid_paths. * tree-ssa-threadupdate.h (class jt_path_registry): Add cancel_invalid_paths. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/20030714-2.c: Adjust. * gcc.dg/tree-ssa/pr66752-3.c: Adjust. * gcc.dg/tree-ssa/pr77445-2.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-18.c: Adjust. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust. * gcc.dg/vect/bb-slp-16.c: Adjust. --- gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c | 7 ++- gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c | 19 ++++--- gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c | 4 +- gcc/testsuite/gcc.dg/vect/bb-slp-16.c | 7 --- gcc/tree-ssa-threadupdate.c | 67 ++++++++++++++++++----- gcc/tree-ssa-threadupdate.h | 1 + 8 files changed, 78 insertions(+), 35 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c index eb663f2ff5b..9585ff11307 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030714-2.c @@ -32,7 +32,8 @@ get_alias_set (t) } } -/* There should be exactly three IF conditionals if we thread jumps - properly. */ -/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */ +/* There should be exactly 4 IF conditionals if we thread jumps + properly. There used to be 3, but one thread was crossing + loops. */ +/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c index e1464e21170..922a331b217 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-dce2" } */ +/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3" } */ extern int status, pt; extern int count; @@ -32,10 +32,15 @@ foo (int N, int c, int b, int *a) pt--; } -/* There are 4 jump threading opportunities, all of which will be - realized, which will eliminate testing of FLAG, completely. */ -/* { dg-final { scan-tree-dump-times "Registering jump" 4 "thread1"} } */ +/* There are 2 jump threading opportunities (which don't cross loops), + all of which will be realized, which will eliminate testing of + FLAG, completely. */ +/* { dg-final { scan-tree-dump-times "Registering jump" 2 "thread1"} } */ -/* There should be no assignments or references to FLAG, verify they're - eliminated as early as possible. */ -/* { dg-final { scan-tree-dump-not "if .flag" "dce2"} } */ +/* We used to remove references to FLAG by DCE2, but this was + depending on early threaders threading through loop boundaries + (which we shouldn't do). However, the late threading passes, which + run after loop optimizations , can successfully eliminate the + references to FLAG. Verify that ther are no references by the late + threading passes. */ +/* { dg-final { scan-tree-dump-not "if .flag" "thread3"} } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c index f9fc212f49e..01a0f1f197d 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c @@ -123,8 +123,8 @@ enum STATES FMS( u8 **in , u32 *transitions) { aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough to change decisions in switch expansion which in turn can expose new jump threading opportunities. Skip the later tests on aarch64. */ -/* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */ -/* { dg-final { scan-tree-dump-times "Invalid sum" 4 "thread1" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread1" } } */ +/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */ /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c index 60d4f76f076..2d78d045516 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c @@ -21,5 +21,7 @@ condition. All the cases are picked up by VRP1 as jump threads. */ -/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */ + +/* There used to be 6 jump threads found by thread1, but they all + depended on threading through distinct loops in ethread. */ /* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c index e3d4b311c03..16abcde5053 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c @@ -1,8 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 18" "thread1" } } */ -/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 12" "thread1" } } */ +/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */ /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */ /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC. It's high enough diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c index 664e93e9b60..e68a9b62535 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c @@ -1,8 +1,5 @@ /* { dg-require-effective-target vect_int } */ -/* See note below as to why we disable threading. */ -/* { dg-additional-options "-fdisable-tree-thread1" } */ - #include <stdarg.h> #include "tree-vect.h" @@ -30,10 +27,6 @@ main1 (int dummy) *pout++ = *pin++ + a; *pout++ = *pin++ + a; *pout++ = *pin++ + a; - /* In some architectures like ppc64, jump threading may thread - the iteration where i==0 such that we no longer optimize the - BB. Another alternative to disable jump threading would be - to wrap the read from `i' into a function returning i. */ if (arr[i] = i) a = i; else diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index baac11280fa..2b9b8f81274 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2757,6 +2757,58 @@ fwd_jt_path_registry::update_cfg (bool may_peel_loop_headers) return retval; } +bool +jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path) +{ + gcc_checking_assert (!path.is_empty ()); + edge taken_edge = path[path.length () - 1]->e; + loop_p loop = taken_edge->src->loop_father; + bool seen_latch = false; + bool path_crosses_loops = false; + + for (unsigned int i = 0; i < path.length (); i++) + { + edge e = path[i]->e; + + if (e == NULL) + { + // NULL outgoing edges on a path can happen for jumping to a + // constant address. + cancel_thread (&path, "Found NULL edge in jump threading path"); + return true; + } + + if (loop->latch == e->src || loop->latch == e->dest) + seen_latch = true; + + // The first entry represents the block with an outgoing edge + // that we will redirect to the jump threading path. Thus we + // don't care about that block's loop father. + if ((i > 0 && e->src->loop_father != loop) + || e->dest->loop_father != loop) + path_crosses_loops = true; + + if (flag_checking && !m_backedge_threads) + gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0); + } + + if (cfun->curr_properties & PROP_loop_opts_done) + return false; + + if (seen_latch && empty_block_p (loop->latch)) + { + cancel_thread (&path, "Threading through latch before loop opts " + "would create non-empty latch"); + return true; + } + if (path_crosses_loops) + { + cancel_thread (&path, "Path crosses loops"); + return true; + } + return false; +} + /* Register a jump threading opportunity. We queue up all the jump threading opportunities discovered by a pass and update the CFG and SSA form all at once. @@ -2776,19 +2828,8 @@ jt_path_registry::register_jump_thread (vec<jump_thread_edge *> *path) return false; } - /* First make sure there are no NULL outgoing edges on the jump threading - path. That can happen for jumping to a constant address. */ - for (unsigned int i = 0; i < path->length (); i++) - { - if ((*path)[i]->e == NULL) - { - cancel_thread (path, "Found NULL edge in jump threading path"); - return false; - } - - if (flag_checking && !m_backedge_threads) - gcc_assert (((*path)[i]->e->flags & EDGE_DFS_BACK) == 0); - } + if (cancel_invalid_paths (*path)) + return false; if (dump_file && (dump_flags & TDF_DETAILS)) dump_jump_thread_path (dump_file, *path, true); diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h index 8b48a671212..d68795c9f27 100644 --- a/gcc/tree-ssa-threadupdate.h +++ b/gcc/tree-ssa-threadupdate.h @@ -75,6 +75,7 @@ protected: unsigned long m_num_threaded_edges; private: virtual bool update_cfg (bool peel_loop_headers) = 0; + bool cancel_invalid_paths (vec<jump_thread_edge *> &path); jump_thread_path_allocator m_allocator; // True if threading through back edges is allowed. This is only // allowed in the generic copier in the backward threader. </cut>

4 years, 9 months

[TCWG CI] Regression caused by gcc: tree-optimization/102570 - teach VN about internal functions

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: tree-optimization/102570 - teach VN about internal functions: commit 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 Author: Richard Biener <rguenther(a)suse.de> tree-optimization/102570 - teach VN about internal functions Results regressed to # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 18360 # First few build errors in logs: # 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized] # 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized] # 00:01:21 ./include/linux/arm-smccc.h:460:40: error: ‘res.a0’ is used uninitialized [-Werror=uninitialized] # 00:01:21 make[2]: *** [scripts/Makefile.build:288: arch/arm64/hyperv/hv_core.o] Error 1 # 00:01:22 crypto/asymmetric_keys/asymmetric_type.c:481:15: error: ‘restrict_method’ is used uninitialized [-Werror=uninitialized] # 00:01:22 make[2]: *** [scripts/Makefile.build:288: crypto/asymmetric_keys/asymmetric_type.o] Error 1 # 00:01:22 ./include/trace/perf.h:38:25: error: ‘entry’ is used uninitialized [-Werror=uninitialized] # 00:01:22 ./include/trace/perf.h:44:13: error: ‘__entry_size’ is used uninitialized [-Werror=uninitialized] # 00:01:23 security/keys/encrypted-keys/encrypted.c:660:19: error: ‘mkey’ is used uninitialized [-Werror=uninitialized] # 00:01:23 security/keys/encrypted-keys/encrypted.c:905:19: error: ‘epayload’ is used uninitialized [-Werror=uninitialized] from # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21404 THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_kernel/gnu-master-aarch64-next-allmodconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Reproduce builds: <cut> mkdir investigate-gcc-55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 cd investigate-gcc-55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 22d34a2a50651d01669b6fbcdb9677c18d2197c5 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 55a3be2f5255d69e740d61b2c5aaba1ccbc643b8 Author: Richard Biener <rguenther(a)suse.de> Date: Mon Oct 4 10:57:45 2021 +0200 tree-optimization/102570 - teach VN about internal functions We're now using internal functions for a lot of stuff but there's still missing VN support out of laziness. The following instantiates support and adds testcases for FRE and PRE (hoisting). 2021-10-04 Richard Biener <rguenther(a)suse.de> PR tree-optimization/102570 * tree-ssa-sccvn.h (vn_reference_op_struct): Document we are using clique for the internal function code. * tree-ssa-sccvn.c (vn_reference_op_eq): Compare the internal function code. (print_vn_reference_ops): Print the internal function code. (vn_reference_op_compute_hash): Hash it. (copy_reference_ops_from_call): Record it. (visit_stmt): Remove the restriction around internal function calls. (fully_constant_vn_reference_p): Use fold_const_call and handle internal functions. (vn_reference_eq): Compare call return types. * tree-ssa-pre.c (create_expression_by_pieces): Handle generating calls to internal functions. (compute_avail): Remove the restriction around internal function calls. * gcc.dg/tree-ssa/ssa-fre-96.c: New testcase. * gcc.dg/tree-ssa/ssa-pre-33.c: Likewise. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c | 14 +++++ gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c | 15 +++++ gcc/tree-ssa-pre.c | 27 +++++---- gcc/tree-ssa-sccvn.c | 91 ++++++++++++++++++------------ gcc/tree-ssa-sccvn.h | 3 +- 5 files changed, 103 insertions(+), 47 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c new file mode 100644 index 00000000000..fd1d5713b5f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-fre1" } */ + +_Bool f1(unsigned x, unsigned y, unsigned *res) +{ + _Bool t = __builtin_add_overflow(x, y, res); + unsigned res1; + _Bool t1 = __builtin_add_overflow(x, y, &res1); + *res -= res1; + return t==t1; +} + +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "fre1" } } */ +/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c new file mode 100644 index 00000000000..3b3bd629bc2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-33.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-pre" } */ + +_Bool f1(unsigned x, unsigned y, unsigned *res, int flag, _Bool *t) +{ + if (flag) + *t = __builtin_add_overflow(x, y, res); + unsigned res1; + _Bool t1 = __builtin_add_overflow(x, y, &res1); + *res -= res1; + return *t==t1; +} + +/* We should hoist the .ADD_OVERFLOW to before the check. */ +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 1 "pre" } } */ diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index 08755847f66..1cc1aae694f 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -2855,9 +2855,13 @@ create_expression_by_pieces (basic_block block, pre_expr expr, unsigned int operand = 1; vn_reference_op_t currop = &ref->operands[0]; tree sc = NULL_TREE; - tree fn = find_or_generate_expression (block, currop->op0, stmts); - if (!fn) - return NULL_TREE; + tree fn = NULL_TREE; + if (currop->op0) + { + fn = find_or_generate_expression (block, currop->op0, stmts); + if (!fn) + return NULL_TREE; + } if (currop->op1) { sc = find_or_generate_expression (block, currop->op1, stmts); @@ -2873,12 +2877,19 @@ create_expression_by_pieces (basic_block block, pre_expr expr, return NULL_TREE; args.quick_push (arg); } - gcall *call = gimple_build_call_vec (fn, args); + gcall *call; + if (currop->op0) + { + call = gimple_build_call_vec (fn, args); + gimple_call_set_fntype (call, currop->type); + } + else + call = gimple_build_call_internal_vec ((internal_fn)currop->clique, + args); gimple_set_location (call, expr->loc); - gimple_call_set_fntype (call, currop->type); if (sc) gimple_call_set_chain (call, sc); - tree forcedname = make_ssa_name (TREE_TYPE (currop->type)); + tree forcedname = make_ssa_name (ref->type); gimple_call_set_lhs (call, forcedname); /* There's no CCP pass after PRE which would re-compute alignment information so make sure we re-materialize this here. */ @@ -4004,10 +4015,6 @@ compute_avail (function *fun) vn_reference_s ref1; pre_expr result = NULL; - /* We can value number only calls to real functions. */ - if (gimple_call_internal_p (stmt)) - continue; - vn_reference_lookup_call (as_a <gcall *> (stmt), &ref, &ref1); /* There is no point to PRE a call without a value. */ if (!ref || !ref->result) diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c index 416a5252144..0d942218279 100644 --- a/gcc/tree-ssa-sccvn.c +++ b/gcc/tree-ssa-sccvn.c @@ -70,6 +70,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "tree-ssa-loop-niter.h" #include "builtins.h" +#include "fold-const-call.h" #include "tree-ssa-sccvn.h" /* This algorithm is based on the SCC algorithm presented by Keith @@ -212,7 +213,8 @@ vn_reference_op_eq (const void *p1, const void *p2) TYPE_MAIN_VARIANT (vro2->type)))) && expressions_equal_p (vro1->op0, vro2->op0) && expressions_equal_p (vro1->op1, vro2->op1) - && expressions_equal_p (vro1->op2, vro2->op2)); + && expressions_equal_p (vro1->op2, vro2->op2) + && (vro1->opcode != CALL_EXPR || vro1->clique == vro2->clique)); } /* Free a reference operation structure VP. */ @@ -264,15 +266,18 @@ print_vn_reference_ops (FILE *outfile, const vec<vn_reference_op_s> ops) && TREE_CODE_CLASS (vro->opcode) != tcc_declaration) { fprintf (outfile, "%s", get_tree_code_name (vro->opcode)); - if (vro->op0) + if (vro->op0 || vro->opcode == CALL_EXPR) { fprintf (outfile, "<"); closebrace = true; } } - if (vro->op0) + if (vro->op0 || vro->opcode == CALL_EXPR) { - print_generic_expr (outfile, vro->op0); + if (!vro->op0) + fprintf (outfile, internal_fn_name ((internal_fn)vro->clique)); + else + print_generic_expr (outfile, vro->op0); if (vro->op1) { fprintf (outfile, ","); @@ -684,6 +689,8 @@ static void vn_reference_op_compute_hash (const vn_reference_op_t vro1, inchash::hash &hstate) { hstate.add_int (vro1->opcode); + if (vro1->opcode == CALL_EXPR && !vro1->op0) + hstate.add_int (vro1->clique); if (vro1->op0) inchash::add_expr (vro1->op0, hstate); if (vro1->op1) @@ -769,11 +776,16 @@ vn_reference_eq (const_vn_reference_t const vr1, const_vn_reference_t const vr2) if (vr1->type != vr2->type) return false; } + else if (vr1->type == vr2->type) + ; else if (COMPLETE_TYPE_P (vr1->type) != COMPLETE_TYPE_P (vr2->type) || (COMPLETE_TYPE_P (vr1->type) && !expressions_equal_p (TYPE_SIZE (vr1->type), TYPE_SIZE (vr2->type)))) return false; + else if (vr1->operands[0].opcode == CALL_EXPR + && !types_compatible_p (vr1->type, vr2->type)) + return false; else if (INTEGRAL_TYPE_P (vr1->type) && INTEGRAL_TYPE_P (vr2->type)) { @@ -1270,6 +1282,8 @@ copy_reference_ops_from_call (gcall *call, temp.type = gimple_call_fntype (call); temp.opcode = CALL_EXPR; temp.op0 = gimple_call_fn (call); + if (gimple_call_internal_p (call)) + temp.clique = gimple_call_internal_fn (call); temp.op1 = gimple_call_chain (call); if (stmt_could_throw_p (cfun, call) && (lr = lookup_stmt_eh_lp (call)) > 0) temp.op2 = size_int (lr); @@ -1459,9 +1473,11 @@ fully_constant_vn_reference_p (vn_reference_t ref) a call to a builtin function with at most two arguments. */ op = &operands[0]; if (op->opcode == CALL_EXPR - && TREE_CODE (op->op0) == ADDR_EXPR - && TREE_CODE (TREE_OPERAND (op->op0, 0)) == FUNCTION_DECL - && fndecl_built_in_p (TREE_OPERAND (op->op0, 0)) + && (!op->op0 + || (TREE_CODE (op->op0) == ADDR_EXPR + && TREE_CODE (TREE_OPERAND (op->op0, 0)) == FUNCTION_DECL + && fndecl_built_in_p (TREE_OPERAND (op->op0, 0), + BUILT_IN_NORMAL))) && operands.length () >= 2 && operands.length () <= 3) { @@ -1481,13 +1497,17 @@ fully_constant_vn_reference_p (vn_reference_t ref) anyconst = true; if (anyconst) { - tree folded = build_call_expr (TREE_OPERAND (op->op0, 0), - arg1 ? 2 : 1, - arg0->op0, - arg1 ? arg1->op0 : NULL); - if (folded - && TREE_CODE (folded) == NOP_EXPR) - folded = TREE_OPERAND (folded, 0); + combined_fn fn; + if (op->op0) + fn = as_combined_fn (DECL_FUNCTION_CODE + (TREE_OPERAND (op->op0, 0))); + else + fn = as_combined_fn ((internal_fn) op->clique); + tree folded; + if (arg1) + folded = fold_const_call (fn, ref->type, arg0->op0, arg1->op0); + else + folded = fold_const_call (fn, ref->type, arg0->op0); if (folded && is_gimple_min_invariant (folded)) return folded; @@ -5648,28 +5668,27 @@ visit_stmt (gimple *stmt, bool backedges_varying_p = false) && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL) extra_fnflags = flags_from_decl_or_type (TREE_OPERAND (fn, 0)); } - if (!gimple_call_internal_p (call_stmt) - && (/* Calls to the same function with the same vuse - and the same operands do not necessarily return the same - value, unless they're pure or const. */ - ((gimple_call_flags (call_stmt) | extra_fnflags) - & (ECF_PURE | ECF_CONST)) - /* If calls have a vdef, subsequent calls won't have - the same incoming vuse. So, if 2 calls with vdef have the - same vuse, we know they're not subsequent. - We can value number 2 calls to the same function with the - same vuse and the same operands which are not subsequent - the same, because there is no code in the program that can - compare the 2 values... */ - || (gimple_vdef (call_stmt) - /* ... unless the call returns a pointer which does - not alias with anything else. In which case the - information that the values are distinct are encoded - in the IL. */ - && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS) - /* Only perform the following when being called from PRE - which embeds tail merging. */ - && default_vn_walk_kind == VN_WALK))) + if (/* Calls to the same function with the same vuse + and the same operands do not necessarily return the same + value, unless they're pure or const. */ + ((gimple_call_flags (call_stmt) | extra_fnflags) + & (ECF_PURE | ECF_CONST)) + /* If calls have a vdef, subsequent calls won't have + the same incoming vuse. So, if 2 calls with vdef have the + same vuse, we know they're not subsequent. + We can value number 2 calls to the same function with the + same vuse and the same operands which are not subsequent + the same, because there is no code in the program that can + compare the 2 values... */ + || (gimple_vdef (call_stmt) + /* ... unless the call returns a pointer which does + not alias with anything else. In which case the + information that the values are distinct are encoded + in the IL. */ + && !(gimple_call_return_flags (call_stmt) & ERF_NOALIAS) + /* Only perform the following when being called from PRE + which embeds tail merging. */ + && default_vn_walk_kind == VN_WALK)) changed = visit_reference_op_call (lhs, call_stmt); else changed = defs_to_varying (call_stmt); diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h index 96100596d2e..8a1b649c726 100644 --- a/gcc/tree-ssa-sccvn.h +++ b/gcc/tree-ssa-sccvn.h @@ -106,7 +106,8 @@ typedef const struct vn_phi_s *const_vn_phi_t; typedef struct vn_reference_op_struct { ENUM_BITFIELD(tree_code) opcode : 16; - /* Dependence info, used for [TARGET_]MEM_REF only. */ + /* Dependence info, used for [TARGET_]MEM_REF only. For internal + function calls clique is also used for the internal function code. */ unsigned short clique; unsigned short base; unsigned reverse : 1; </cut>

4 years, 9 months

[TCWG CI] 464.h264ref slowed down by 6% after llvm: [SCEV] Infer flags from add/gep in any block

by ci_notify＠linaro.org

After llvm commit 0658bab870c89d81678f1f37aac0396ddd0913b3 Author: Philip Reames <listmail(a)philipreames.com> [SCEV] Infer flags from add/gep in any block the following benchmarks slowed down by more than 2%: - 464.h264ref slowed down by 6% from 11124 to 11783 perf samples - 464.h264ref:[.] FastFullPelBlockMotionSearch slowed down by 41% from 1504 to 2116 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O2 - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2 First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-0658bab870c89d81678f1f37aac0396ddd0913b3 cd investigate-llvm-0658bab870c89d81678f1f37aac0396ddd0913b3 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach 0658bab870c89d81678f1f37aac0396ddd0913b3 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 2ced9a42be8aba4533225fdb8ed02fe6f50060b6 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 0658bab870c89d81678f1f37aac0396ddd0913b3 Author: Philip Reames <listmail(a)philipreames.com> Date: Wed Oct 6 10:35:01 2021 -0700 [SCEV] Infer flags from add/gep in any block This patch removes a compile time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*. Differential Revision: https://reviews.llvm.org/D111186 --- llvm/lib/Analysis/ScalarEvolution.cpp | 10 --------- .../Analysis/DependenceAnalysis/Preliminary.ll | 2 +- .../Analysis/ScalarEvolution/flags-from-poison.ll | 8 +++---- .../SLPVectorizer/X86/consecutive-access.ll | 25 +++++++++------------- 4 files changed, 15 insertions(+), 30 deletions(-) diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 6683b1a5205c..4fb2266b6e89 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -6645,16 +6645,6 @@ bool ScalarEvolution::isGuaranteedToTransferExecutionTo(const Instruction *A, bool ScalarEvolution::isSCEVExprNeverPoison(const Instruction *I) { - // Here we check that I is in the header of the innermost loop containing I, - // since we only deal with instructions in the loop header. The actual loop we - // need to check later will come from an add recurrence, but getting that - // requires computing the SCEV of the operands, which can be expensive. This - // check we can do cheaply to rule out some cases early. - Loop *InnermostContainingLoop = LI.getLoopFor(I->getParent()); - if (InnermostContainingLoop == nullptr || - InnermostContainingLoop->getHeader() != I->getParent()) - return false; - // Only proceed if we can prove that I does not yield poison. if (!programUndefinedIfPoison(I)) return false; diff --git a/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll b/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll index 0899f67d6914..91827f3231ba 100644 --- a/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll +++ b/llvm/test/Analysis/DependenceAnalysis/Preliminary.ll @@ -623,7 +623,7 @@ entry: ; CHECK-LABEL: p9 ; CHECK: da analyze - none! -; CHECK: da analyze - flow [|<]! +; CHECK: da analyze - none! ; CHECK: da analyze - confused! ; CHECK: da analyze - none! ; CHECK: da analyze - confused! diff --git a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll index f0bda26edb38..0423854bbc3b 100644 --- a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll +++ b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll @@ -1628,9 +1628,9 @@ define noundef i32 @add-basic(i32 %a, i32 %b) { ; CHECK-LABEL: 'add-basic' ; CHECK-NEXT: Classifying expressions for: @add-basic ; CHECK-NEXT: %res = add nuw nsw i32 %a, %b -; CHECK-NEXT: --> (%a + %b) U: full-set S: full-set +; CHECK-NEXT: --> (%a + %b)<nuw><nsw> U: full-set S: full-set ; CHECK-NEXT: %res2 = udiv i32 255, %res -; CHECK-NEXT: --> (255 /u (%a + %b)) U: [0,256) S: [0,256) +; CHECK-NEXT: --> (255 /u (%a + %b)<nuw><nsw>) U: [0,256) S: [0,256) ; CHECK-NEXT: Determining loop execution counts for: @add-basic ; %res = add nuw nsw i32 %a, %b @@ -1656,9 +1656,9 @@ define noundef i32 @mul-basic(i32 %a, i32 %b) { ; CHECK-LABEL: 'mul-basic' ; CHECK-NEXT: Classifying expressions for: @mul-basic ; CHECK-NEXT: %res = mul nuw nsw i32 %a, %b -; CHECK-NEXT: --> (%a * %b) U: full-set S: full-set +; CHECK-NEXT: --> (%a * %b)<nuw><nsw> U: full-set S: full-set ; CHECK-NEXT: %res2 = udiv i32 255, %res -; CHECK-NEXT: --> (255 /u (%a * %b)) U: [0,256) S: [0,256) +; CHECK-NEXT: --> (255 /u (%a * %b)<nuw><nsw>) U: [0,256) S: [0,256) ; CHECK-NEXT: Determining loop execution counts for: @mul-basic ; %res = mul nuw nsw i32 %a, %b diff --git a/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll b/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll index e4000b52c4a9..8f57fe6866bd 100644 --- a/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll +++ b/llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll @@ -8,10 +8,6 @@ target triple = "x86_64-apple-macosx10.9.0" @C = common global [2000 x float] zeroinitializer, align 16 @D = common global [2000 x float] zeroinitializer, align 16 -; Currently SCEV isn't smart enough to figure out that accesses -; A[3*i], A[3*i+1] and A[3*i+2] are consecutive, but in future -; that would hopefully be fixed. For now, check that this isn't -; vectorized. ; Function Attrs: nounwind ssp uwtable define void @foo_3double(i32 %u) #0 { ; CHECK-LABEL: @foo_3double( @@ -21,26 +17,25 @@ define void @foo_3double(i32 %u) #0 { ; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[U]], 3 ; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[MUL]] to i64 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]] -; CHECK-NEXT: [[TMP0:%.*]] = load double, double* [[ARRAYIDX]], align 8 ; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]] -; CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[ARRAYIDX4]], align 8 -; CHECK-NEXT: [[ADD5:%.*]] = fadd double [[TMP0]], [[TMP1]] -; CHECK-NEXT: store double [[ADD5]], double* [[ARRAYIDX]], align 8 ; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[MUL]], 1 ; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD11]] to i64 ; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]] -; CHECK-NEXT: [[TMP2:%.*]] = load double, double* [[ARRAYIDX13]], align 8 +; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>* +; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8 ; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]] -; CHECK-NEXT: [[TMP3:%.*]] = load double, double* [[ARRAYIDX17]], align 8 -; CHECK-NEXT: [[ADD18:%.*]] = fadd double [[TMP2]], [[TMP3]] -; CHECK-NEXT: store double [[ADD18]], double* [[ARRAYIDX13]], align 8 +; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>* +; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8 +; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]] +; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>* +; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8 ; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[MUL]], 2 ; CHECK-NEXT: [[IDXPROM25:%.*]] = sext i32 [[ADD24]] to i64 ; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM25]] -; CHECK-NEXT: [[TMP4:%.*]] = load double, double* [[ARRAYIDX26]], align 8 +; CHECK-NEXT: [[TMP6:%.*]] = load double, double* [[ARRAYIDX26]], align 8 ; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM25]] -; CHECK-NEXT: [[TMP5:%.*]] = load double, double* [[ARRAYIDX30]], align 8 -; CHECK-NEXT: [[ADD31:%.*]] = fadd double [[TMP4]], [[TMP5]] +; CHECK-NEXT: [[TMP7:%.*]] = load double, double* [[ARRAYIDX30]], align 8 +; CHECK-NEXT: [[ADD31:%.*]] = fadd double [[TMP6]], [[TMP7]] ; CHECK-NEXT: store double [[ADD31]], double* [[ARRAYIDX26]], align 8 ; CHECK-NEXT: ret void ; </cut>

4 years, 9 months

[TCWG CI] 464.h264ref slowed down by 6% after llvm: [SCEV] Use full logic when infering flags on add and gep

by ci_notify＠linaro.org

After llvm commit d02db32644b7360bcda54cdf739fa42abe450fcd Author: Philip Reames <listmail(a)philipreames.com> [SCEV] Use full logic when infering flags on add and gep the following benchmarks slowed down by more than 2%: - 464.h264ref slowed down by 6% from 10842 to 11545 perf samples Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: Clang + Glibc + LLVM Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -O3 -flto - Hardware: NVidia TX1 4x Cortex-A57 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTO First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… Reproduce builds: <cut> mkdir investigate-llvm-d02db32644b7360bcda54cdf739fa42abe450fcd cd investigate-llvm-d02db32644b7360bcda54cdf739fa42abe450fcd # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/ cd llvm # Reproduce first_bad build git checkout --detach d02db32644b7360bcda54cdf739fa42abe450fcd ../artifacts/test.sh # Reproduce last_good build git checkout --detach f39978b84f1d3a1da6c32db48f64c8daae64b3ad ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit d02db32644b7360bcda54cdf739fa42abe450fcd Author: Philip Reames <listmail(a)philipreames.com> Date: Sun Oct 3 15:32:15 2021 -0700 [SCEV] Use full logic when infering flags on add and gep This is a followon to D109845. With that landed, we will have fixed all known instances of pr51817, and can thus start inferring flags more aggressively with greatly reduced risk of miscompiles. This patch simply applies the same inference logic used in that patch to our other major flag inference path. We can still do much better here (on both paths), but this is our first step. Differential Revision: https://reviews.llvm.org/D111003 --- llvm/lib/Analysis/ScalarEvolution.cpp | 10 ++-------- .../Delinearization/multidim_ivs_and_integer_offsets_3d.ll | 2 +- .../Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll | 2 +- llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll | 4 ++-- llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll | 8 ++++---- llvm/test/Analysis/ScalarEvolution/load.ll | 2 +- llvm/test/Analysis/ScalarEvolution/ptrtoint.ll | 2 +- polly/test/IstAstInfo/simple-run-time-condition.ll | 2 +- 8 files changed, 13 insertions(+), 19 deletions(-) diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp index 75cecbf48c08..70bf9aee6e0a 100644 --- a/llvm/lib/Analysis/ScalarEvolution.cpp +++ b/llvm/lib/Analysis/ScalarEvolution.cpp @@ -6657,14 +6657,8 @@ bool ScalarEvolution::isSCEVExprNeverPoison(const Instruction *I) { // TODO: We can do better here in some cases. if (!isSCEVable(Op->getType())) return false; - // TODO: the following two lines should be: - // if (auto *DefI = getDefinedScopeRoot(getSCEV(Op))) - // if (isGuaranteedToTransferExecutionTo(DefI, I)) - // We use the following instead for the purposes of seperating a bugfix - // change from an optimization change. Once pr51817 is fully addressed, - // we should unlock this power. - if (auto *AddRecS = dyn_cast<SCEVAddRecExpr>(getSCEV(Op))) - if (isGuaranteedToExecuteForEveryIteration(I, AddRecS->getLoop())) + if (auto *DefI = getDefinedScopeRoot(getSCEV(Op))) + if (isGuaranteedToTransferExecutionTo(DefI, I)) return true; } return false; diff --git a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll index 712a52927dcb..77982c786e6e 100644 --- a/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll +++ b/llvm/test/Analysis/Delinearization/multidim_ivs_and_integer_offsets_3d.ll @@ -11,7 +11,7 @@ ; AddRec: {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+,(8 * %o)}<%for.j>,+,8}<%for.k> ; CHECK: Base offset: %A ; CHECK: ArrayDecl[UnknownSize][%m][%o] with elements of 8 bytes. -; CHECK: ArrayRef[{3,+,1}<nuw><%for.i>][{-4,+,1}<nw><%for.j>][{7,+,1}<nuw><nsw><%for.k>] +; CHECK: ArrayRef[{3,+,1}<nuw><%for.i>][{-4,+,1}<nsw><%for.j>][{7,+,1}<nuw><nsw><%for.k>] define void @foo(i64 %n, i64 %m, i64 %o, double* %A) { entry: diff --git a/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll b/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll index e3fdb0642211..8ecd498ea211 100644 --- a/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll +++ b/llvm/test/Analysis/Delinearization/multidim_ivs_and_parameteric_offsets_3d.ll @@ -11,7 +11,7 @@ ; AddRec: {{{((8 * ((((%m * %p) + %q) * %o) + %r)) + %A),+,(8 * %m * %o)}<%for.i>,+,(8 * %o)}<%for.j>,+,8}<%for.k> ; CHECK: Base offset: %A ; CHECK: ArrayDecl[UnknownSize][%m][%o] with elements of 8 bytes. -; CHECK: ArrayRef[{%p,+,1}<nw><%for.i>][{%q,+,1}<nw><%for.j>][{%r,+,1}<nsw><%for.k>] +; CHECK: ArrayRef[{%p,+,1}<nw><%for.i>][{%q,+,1}<nsw><%for.j>][{%r,+,1}<nsw><%for.k>] define void @foo(i64 %n, i64 %m, i64 %o, double* %A, i64 %p, i64 %q, i64 %r) { entry: diff --git a/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll b/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll index 821513199546..1f1515435e1a 100644 --- a/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll +++ b/llvm/test/Analysis/LoopCacheAnalysis/PowerPC/stencil.ll @@ -11,8 +11,8 @@ target triple = "powerpc64le-unknown-linux-gnu" ; } ; } -; CHECK-DAG: Loop 'for.i' has cost = 20300 -; CHECK-DAG: Loop 'for.j' has cost = 700 +; CHECK-DAG: Loop 'for.i' has cost = 20600 +; CHECK-DAG: Loop 'for.j' has cost = 800 define void @foo(i64 %n, i64 %m, i32* %A, i32* %B, i32* %C) { entry: diff --git a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll index c8d3137f8dc9..5ab24159c250 100644 --- a/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll +++ b/llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll @@ -273,9 +273,9 @@ define void @test-add-scope-bound-unkn-header(i32* %input, i32 %needle) { ; CHECK-NEXT: %offset = load i32, i32* %gep, align 4 ; CHECK-NEXT: --> %offset U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %i.next = add nuw i32 %i, %offset -; CHECK-NEXT: --> (%offset + %i) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> (%offset + %i)<nuw> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %gep2 = getelementptr i32, i32* %input, i32 %i.next -; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i) to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i)<nuw> to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: Determining loop execution counts for: @test-add-scope-bound-unkn-header ; CHECK-NEXT: Loop %loop: Unpredictable backedge-taken count. ; CHECK-NEXT: Loop %loop: Unpredictable max backedge-taken count. @@ -307,9 +307,9 @@ define void @test-add-scope-bound-unkn-header2(i32* %input, i32 %needle) { ; CHECK-NEXT: %offset = load i32, i32* %gep, align 4 ; CHECK-NEXT: --> %offset U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %i.next = add nuw i32 %i, %offset -; CHECK-NEXT: --> (%offset + %i) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> (%offset + %i)<nuw> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: %gep2 = getelementptr i32, i32* %input, i32 %i.next -; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i) to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } +; CHECK-NEXT: --> ((4 * (sext i32 (%offset + %i)<nuw> to i64))<nsw> + %input) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant } ; CHECK-NEXT: Determining loop execution counts for: @test-add-scope-bound-unkn-header2 ; CHECK-NEXT: Loop %loop: Unpredictable backedge-taken count. ; CHECK-NEXT: Loop %loop: Unpredictable max backedge-taken count. diff --git a/llvm/test/Analysis/ScalarEvolution/load.ll b/llvm/test/Analysis/ScalarEvolution/load.ll index c0d671342af7..e95a093b2a8b 100644 --- a/llvm/test/Analysis/ScalarEvolution/load.ll +++ b/llvm/test/Analysis/ScalarEvolution/load.ll @@ -73,7 +73,7 @@ define i32 @test2() nounwind uwtable readonly { ; CHECK-NEXT: %n.01 = phi %struct.ListNode* [ bitcast ({ %struct.ListNode*, i32, [4 x i8] }* @node5 to %struct.ListNode*), %entry ], [ %1, %for.body ] ; CHECK-NEXT: --> %n.01 U: full-set S: full-set Exits: @node1 LoopDispositions: { %for.body: Variant } ; CHECK-NEXT: %i = getelementptr inbounds %struct.ListNode, %struct.ListNode* %n.01, i64 0, i32 1 -; CHECK-NEXT: --> (4 + %n.01) U: full-set S: full-set Exits: (4 + @node1)<nuw><nsw> LoopDispositions: { %for.body: Variant } +; CHECK-NEXT: --> (4 + %n.01)<nuw> U: [4,0) S: [4,0) Exits: (4 + @node1)<nuw><nsw> LoopDispositions: { %for.body: Variant } ; CHECK-NEXT: %0 = load i32, i32* %i, align 4 ; CHECK-NEXT: --> %0 U: full-set S: full-set Exits: 0 LoopDispositions: { %for.body: Variant } ; CHECK-NEXT: %add = add nsw i32 %0, %sum.02 diff --git a/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll b/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll index 93d8782f373e..cb40ddda9369 100644 --- a/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll +++ b/llvm/test/Analysis/ScalarEvolution/ptrtoint.ll @@ -502,7 +502,7 @@ define void @pr46786_c26_int(i32* %arg, i32* %arg1, i32* %arg2) { ; X32-NEXT: %i11 = ashr exact i64 %i10, 2 ; X32-NEXT: --> %i11 U: [-2147483648,2147483648) S: [-2147483648,2147483648) Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } ; X32-NEXT: %i12 = getelementptr inbounds i32, i32* %arg2, i64 %i11 -; X32-NEXT: --> ((4 * (trunc i64 %i11 to i32)) + %arg2) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } +; X32-NEXT: --> ((4 * (trunc i64 %i11 to i32))<nsw> + %arg2) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } ; X32-NEXT: %i13 = load i32, i32* %i12, align 4 ; X32-NEXT: --> %i13 U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %bb6: Variant } ; X32-NEXT: %i14 = add nsw i32 %i13, %i8 diff --git a/polly/test/IstAstInfo/simple-run-time-condition.ll b/polly/test/IstAstInfo/simple-run-time-condition.ll index aba5d9e34f50..0d167566291b 100644 --- a/polly/test/IstAstInfo/simple-run-time-condition.ll +++ b/polly/test/IstAstInfo/simple-run-time-condition.ll @@ -20,7 +20,7 @@ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f3 ; for the delinearization is simplified such that conditions that would not ; cause any code to be executed are not generated. -; CHECK: if (((o >= 1 && q <= 0 && m + q >= 0) || (o <= 0 && m + q >= 100 && q <= 100)) && 0 == ((m >= 1 && n + p >= 9223372036854775809) || (o <= 0 && n >= 1 && m + q >= 9223372036854775909) || (o <= 0 && m >= 1 && n >= 1 && q <= -9223372036854775709))) +; CHECK: if (((o >= 1 && q <= 0 && m + q >= 0) || (o <= 0 && m + q >= 100 && q <= 100)) && 0 == ((o <= 0 && n >= 1 && m + q >= 9223372036854775909) || (o <= 0 && m >= 1 && n >= 1 && q <= -9223372036854775709))) ; CHECK: if (o <= 0) { ; CHECK: for (int c0 = 0; c0 < n; c0 += 1) </cut>

4 years, 9 months

[TCWG CI] Regression caused by gcc: Enhance -Waddress to detect more suspicious expressions [PR102103].

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: Enhance -Waddress to detect more suspicious expressions [PR102103].: commit 4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 Author: Martin Sebor <msebor(a)redhat.com> Enhance -Waddress to detect more suspicious expressions [PR102103]. Results regressed to # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21397 # First few build errors in logs: # 00:02:12 sound/core/oss/mixer_oss.c:1035:21: error: ‘slot’ is used uninitialized [-Werror=uninitialized] # 00:02:15 make[3]: *** [scripts/Makefile.build:288: sound/core/oss/mixer_oss.o] Error 1 # 00:02:15 sound/core/oss/pcm_oss.c:108:29: error: ‘t’ is used uninitialized [-Werror=uninitialized] # 00:02:15 sound/core/oss/pcm_oss.c:2475:34: error: ‘setup’ is used uninitialized [-Werror=uninitialized] # 00:02:15 sound/core/oss/pcm_oss.c:2985:51: error: ‘template’ is used uninitialized [-Werror=uninitialized] # 00:02:15 sound/core/seq/oss/seq_oss_init.c:350:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized] # 00:02:15 sound/core/seq/oss/seq_oss_init.c:370:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized] # 00:02:16 make[4]: *** [scripts/Makefile.build:288: sound/core/seq/oss/seq_oss_init.o] Error 1 # 00:02:22 make[3]: *** [scripts/Makefile.build:288: sound/core/oss/pcm_oss.o] Error 1 # 00:02:28 make[3]: *** [scripts/Makefile.build:571: sound/core/seq/oss] Error 2 from # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 21403 THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_kernel/gnu-master-aarch64-next-allmodconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… Reproduce builds: <cut> mkdir investigate-gcc-4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 cd investigate-gcc-4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-aarch64-next-al… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 ../artifacts/test.sh # Reproduce last_good build git checkout --detach f1710910087fb1f4a7706e9ce838163ffcbc50b4 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 4dc7ce6fb3917958d1a6036d8acf2953b9c1b868 Author: Martin Sebor <msebor(a)redhat.com> Date: Fri Oct 1 11:50:25 2021 -0600 Enhance -Waddress to detect more suspicious expressions [PR102103]. Resolves: PR c/102103 - missing warning comparing array address to null gcc/ChangeLog: PR c/102103 * doc/invoke.texi (-Waddress): Update. * gengtype.c (write_types): Avoid -Waddress. * poly-int.h (POLY_SET_COEFF): Avoid using null. gcc/c-family/ChangeLog: PR c/102103 * c-common.c (decl_with_nonnull_addr_p): Handle members. Check and perform warning suppression. (c_common_truthvalue_conversion): Enhance warning suppression. gcc/c/ChangeLog: PR c/102103 * c-typeck.c (maybe_warn_for_null_address): New function. (build_binary_op): Call it. gcc/cp/ChangeLog: PR c/102103 * typeck.c (warn_for_null_address): Enhance. (cp_build_binary_op): Call it also for member pointers. gcc/fortran/ChangeLog: PR c/102103 * array.c: Remove an unnecessary test. * trans-array.c: Same. gcc/testsuite/ChangeLog: PR c/102103 * g++.dg/cpp0x/constexpr-array-ptr10.C: Suppress a valid warning. * g++.dg/warn/Wreturn-local-addr-6.C: Correct a cast. * gcc.dg/Waddress.c: Expect a warning. * c-c++-common/Waddress-3.c: New test. * c-c++-common/Waddress-4.c: New test. * g++.dg/warn/Waddress-5.C: New test. * g++.dg/warn/Waddress-6.C: New test. * g++.dg/warn/pr101219.C: Expect a warning. * gcc.dg/Waddress-3.c: New test. --- gcc/c-family/c-common.c | 29 +++-- gcc/c/c-typeck.c | 140 ++++++++++++++++----- gcc/cp/typeck.c | 94 ++++++++++++-- gcc/doc/invoke.texi | 48 +++++-- gcc/fortran/array.c | 2 +- gcc/fortran/trans-array.c | 1 - gcc/gengtype.c | 4 +- gcc/poly-int.h | 4 +- gcc/testsuite/c-c++-common/Waddress-3.c | 125 ++++++++++++++++++ gcc/testsuite/c-c++-common/Waddress-4.c | 106 ++++++++++++++++ gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C | 5 +- gcc/testsuite/g++.dg/warn/Waddress-5.C | 115 +++++++++++++++++ gcc/testsuite/g++.dg/warn/Waddress-6.C | 79 ++++++++++++ gcc/testsuite/g++.dg/warn/Wreturn-local-addr-6.C | 4 +- gcc/testsuite/g++.dg/warn/pr101219.C | 4 +- gcc/testsuite/gcc.dg/Waddress-3.c | 35 ++++++ gcc/testsuite/gcc.dg/Waddress.c | 2 +- 17 files changed, 722 insertions(+), 75 deletions(-) diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 5845c675e85..9d19e352725 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -3393,14 +3393,16 @@ c_wrap_maybe_const (tree expr, bool non_const) return expr; } -/* Return whether EXPR is a declaration whose address can never be - NULL. */ +/* Return whether EXPR is a declaration whose address can never be NULL. + The address of the first struct member could be NULL only if it were + accessed through a NULL pointer, and such an access would be invalid. */ bool decl_with_nonnull_addr_p (const_tree expr) { return (DECL_P (expr) - && (TREE_CODE (expr) == PARM_DECL + && (TREE_CODE (expr) == FIELD_DECL + || TREE_CODE (expr) == PARM_DECL || TREE_CODE (expr) == LABEL_DECL || !DECL_WEAK (expr))); } @@ -3488,13 +3490,17 @@ c_common_truthvalue_conversion (location_t location, tree expr) case ADDR_EXPR: { tree inner = TREE_OPERAND (expr, 0); - if (decl_with_nonnull_addr_p (inner)) + if (decl_with_nonnull_addr_p (inner) + /* Check both EXPR and INNER for suppression. */ + && !warning_suppressed_p (expr, OPT_Waddress) + && !warning_suppressed_p (inner, OPT_Waddress)) { - /* Common Ada programmer's mistake. */ + /* Common Ada programmer's mistake. */ warning_at (location, OPT_Waddress, "the address of %qD will always evaluate as %<true%>", inner); + suppress_warning (inner, OPT_Waddress); return truthvalue_true_node; } break; @@ -3627,8 +3633,17 @@ c_common_truthvalue_conversion (location_t location, tree expr) break; /* If this isn't narrowing the argument, we can ignore it. */ if (TYPE_PRECISION (totype) >= TYPE_PRECISION (fromtype)) - return c_common_truthvalue_conversion (location, - TREE_OPERAND (expr, 0)); + { + tree op0 = TREE_OPERAND (expr, 0); + if ((TREE_CODE (fromtype) == POINTER_TYPE + && TREE_CODE (totype) == INTEGER_TYPE) + || warning_suppressed_p (expr, OPT_Waddress)) + /* Suppress -Waddress for casts to intptr_t, propagating + any suppression from the enclosing expression to its + operand. */ + suppress_warning (op0, OPT_Waddress); + return c_common_truthvalue_conversion (location, op0); + } } break; diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index c74f876e667..33963d7555a 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -11554,6 +11554,110 @@ build_vec_cmp (tree_code code, tree type, return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec); } +/* Possibly warn about an address of OP never being NULL in a comparison + operation CODE involving null. */ + +static void +maybe_warn_for_null_address (location_t loc, tree op, tree_code code) +{ + if (!warn_address || warning_suppressed_p (op, OPT_Waddress)) + return; + + if (TREE_CODE (op) == NOP_EXPR) + { + /* Allow casts to intptr_t to suppress the warning. */ + tree type = TREE_TYPE (op); + if (TREE_CODE (type) == INTEGER_TYPE) + return; + op = TREE_OPERAND (op, 0); + } + + if (TREE_CODE (op) == POINTER_PLUS_EXPR) + { + /* Allow a cast to void* to suppress the warning. */ + tree type = TREE_TYPE (TREE_TYPE (op)); + if (VOID_TYPE_P (type)) + return; + + /* Adding any value to a null pointer, including zero, is undefined + in C. This includes the expression &p[0] where p is the null + pointer, although &p[0] will have been folded to p by this point + and so not diagnosed. */ + if (code == EQ_EXPR) + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<false%> " + "for the pointer operand in %qE must not be NULL", + op); + else + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<true%> " + "for the pointer operand in %qE must not be NULL", + op); + + return; + } + + if (TREE_CODE (op) != ADDR_EXPR) + return; + + op = TREE_OPERAND (op, 0); + + if (TREE_CODE (op) == IMAGPART_EXPR + || TREE_CODE (op) == REALPART_EXPR) + { + /* The address of either complex part may not be null. */ + if (code == EQ_EXPR) + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<false%> " + "for the address of %qE will never be NULL", + op); + else + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<true%> " + "for the address of %qE will never be NULL", + op); + return; + } + + /* Set to true in the loop below if OP dereferences is operand. + In such a case the ultimate target need not be a decl for + the null [in]equality test to be constant. */ + bool deref = false; + + /* Get the outermost array or object, or member. */ + while (handled_component_p (op)) + { + if (TREE_CODE (op) == COMPONENT_REF) + { + /* Get the member (its address is never null). */ + op = TREE_OPERAND (op, 1); + break; + } + + /* Get the outer array/object to refer to in the warning. */ + op = TREE_OPERAND (op, 0); + deref = true; + } + + if ((!deref && !decl_with_nonnull_addr_p (op)) + || from_macro_expansion_at (loc)) + return; + + if (code == EQ_EXPR) + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<false%> " + "for the address of %qE will never be NULL", + op); + else + warning_at (loc, OPT_Waddress, + "the comparison will always evaluate as %<true%> " + "for the address of %qE will never be NULL", + op); + + if (DECL_P (op)) + inform (DECL_SOURCE_LOCATION (op), "%qD declared here", op); +} + /* Build a binary-operation expression without default conversions. CODE is the kind of expression to build. LOCATION is the operator's location. @@ -12189,44 +12293,12 @@ build_binary_op (location_t location, enum tree_code code, short_compare = 1; else if (code0 == POINTER_TYPE && null_pointer_constant_p (orig_op1)) { - if (TREE_CODE (op0) == ADDR_EXPR - && decl_with_nonnull_addr_p (TREE_OPERAND (op0, 0)) - && !from_macro_expansion_at (location)) - { - if (code == EQ_EXPR) - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<false%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op0, 0)); - else - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<true%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op0, 0)); - } + maybe_warn_for_null_address (location, op0, code); result_type = type0; } else if (code1 == POINTER_TYPE && null_pointer_constant_p (orig_op0)) { - if (TREE_CODE (op1) == ADDR_EXPR - && decl_with_nonnull_addr_p (TREE_OPERAND (op1, 0)) - && !from_macro_expansion_at (location)) - { - if (code == EQ_EXPR) - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<false%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op1, 0)); - else - warning_at (location, - OPT_Waddress, - "the comparison will always evaluate as %<true%> " - "for the address of %qD will never be NULL", - TREE_OPERAND (op1, 0)); - } + maybe_warn_for_null_address (location, op1, code); result_type = type1; } else if (code0 == POINTER_TYPE && code1 == POINTER_TYPE) diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c index cd130f16a66..e880d34dcfe 100644 --- a/gcc/cp/typeck.c +++ b/gcc/cp/typeck.c @@ -4603,25 +4603,93 @@ warn_for_null_address (location_t location, tree op, tsubst_flags_t complain) || warning_suppressed_p (op, OPT_Waddress)) return; + if (TREE_CODE (op) == NON_DEPENDENT_EXPR) + op = TREE_OPERAND (op, 0); + tree cop = fold_for_warn (op); - if (TREE_CODE (cop) == ADDR_EXPR - && decl_with_nonnull_addr_p (TREE_OPERAND (cop, 0)) - && !warning_suppressed_p (cop, OPT_Waddress)) - warning_at (location, OPT_Waddress, "the address of %qD will never " - "be NULL", TREE_OPERAND (cop, 0)); + if (TREE_CODE (cop) == NON_LVALUE_EXPR) + /* Unwrap the expression for C++ 98. */ + cop = TREE_OPERAND (cop, 0); - if (CONVERT_EXPR_P (op) + if (TREE_CODE (cop) == PTRMEM_CST) + { + /* The address of a nonstatic data member is never null. */ + warning_at (location, OPT_Waddress, + "the address %qE will never be NULL", + cop); + return; + } + + if (TREE_CODE (cop) == NOP_EXPR) + { + /* Allow casts to intptr_t to suppress the warning. */ + tree type = TREE_TYPE (cop); + if (TREE_CODE (type) == INTEGER_TYPE) + return; + + STRIP_NOPS (cop); + } + + bool warned = false; + if (TREE_CODE (cop) == ADDR_EXPR) + { + cop = TREE_OPERAND (cop, 0); + + /* Set to true in the loop below if OP dereferences its operand. + In such a case the ultimate target need not be a decl for + the null [in]equality test to be necessarily constant. */ + bool deref = false; + + /* Get the outermost array or object, or member. */ + while (handled_component_p (cop)) + { + if (TREE_CODE (cop) == COMPONENT_REF) + { + /* Get the member (its address is never null). */ + cop = TREE_OPERAND (cop, 1); + break; + } + + /* Get the outer array/object to refer to in the warning. */ + cop = TREE_OPERAND (cop, 0); + deref = true; + } + + if ((!deref && !decl_with_nonnull_addr_p (cop)) + || from_macro_expansion_at (location) + || warning_suppressed_p (cop, OPT_Waddress)) + return; + + warned = warning_at (location, OPT_Waddress, + "the address of %qD will never be NULL", cop); + op = cop; + } + else if (TREE_CODE (cop) == POINTER_PLUS_EXPR) + { + /* Adding zero to the null pointer is well-defined in C++. When + the offset is unknown (i.e., not a constant) warn anyway since + it's less likely that the pointer operand is null than not. */ + tree off = TREE_OPERAND (cop, 1); + if (!integer_zerop (off) + && !warning_suppressed_p (cop, OPT_Waddress)) + warning_at (location, OPT_Waddress, "comparing the result of pointer " + "addition %qE and NULL", cop); + return; + } + else if (CONVERT_EXPR_P (op) && TYPE_REF_P (TREE_TYPE (TREE_OPERAND (op, 0)))) { - tree inner_op = op; - STRIP_NOPS (inner_op); + STRIP_NOPS (op); - if (DECL_P (inner_op)) - warning_at (location, OPT_Waddress, - "the compiler can assume that the address of " - "%qD will never be NULL", inner_op); + if (DECL_P (op)) + warned = warning_at (location, OPT_Waddress, + "the compiler can assume that the address of " + "%qD will never be NULL", op); } + + if (warned && DECL_P (op)) + inform (DECL_SOURCE_LOCATION (op), "%qD declared here", op); } /* Warn about [expr.arith.conv]/2: If one operand is of enumeration type and @@ -5411,6 +5479,8 @@ cp_build_binary_op (const op_location_t &location, op1 = cp_convert (TREE_TYPE (op0), op1, complain); } result_type = TREE_TYPE (op0); + + warn_for_null_address (location, orig_op0, complain); } else if (TYPE_PTRMEMFUNC_P (type1) && null_ptr_cst_p (orig_op0)) return cp_build_binary_op (location, code, op1, op0, complain); diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 5f39b208049..d35114c0727 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -8551,17 +8551,43 @@ by @option{-Wall}. @item -Waddress @opindex Waddress @opindex Wno-address -Warn about suspicious uses of memory addresses. These include using -the address of a function in a conditional expression, such as -@code{void func(void); if (func)}, and comparisons against the memory -address of a string literal, such as @code{if (x == "abc")}. Such -uses typically indicate a programmer error: the address of a function -always evaluates to true, so their use in a conditional usually -indicate that the programmer forgot the parentheses in a function -call; and comparisons against string literals result in unspecified -behavior and are not portable in C, so they usually indicate that the -programmer intended to use @code{strcmp}. This warning is enabled by -@option{-Wall}. +Warn about suspicious uses of address expressions. These include comparing +the address of a function or a declared object to the null pointer constant +such as in +@smallexample +void f (void); +void g (void) +@{ + if (!func) // warning: expression evaluates to false + abort (); +@} +@end smallexample +comparisons of a pointer to a string literal, such as in +@smallexample +void f (const char *x) +@{ + if (x == "abc") // warning: expression evaluates to false + puts ("equal"); +@} +@end smallexample +and tests of the results of pointer addition or subtraction for equality +to null, such as in +@smallexample +void f (const int *p, int i) +@{ + return p + i == NULL; +@} +@end smallexample +Such uses typically indicate a programmer error: the address of most +functions and objects necessarily evaluates to true (the exception are +weak symbols), so their use in a conditional might indicate missing +parentheses in a function call or a missing dereference in an array +expression. The subset of the warning for object pointers can be +suppressed by casting the pointer operand to an integer type such +as @code{inptr_t} or @code{uinptr_t}. +Comparisons against string literals result in unspecified behavior +and are not portable, and suggest the intent was to call @code{strcmp}. +@option{-Waddress} warning is enabled by @option{-Wall}. @item -Wno-address-of-packed-member @opindex Waddress-of-packed-member diff --git a/gcc/fortran/array.c b/gcc/fortran/array.c index a4d1cb4c72d..6552eaf3b0c 100644 --- a/gcc/fortran/array.c +++ b/gcc/fortran/array.c @@ -2581,7 +2581,7 @@ gfc_array_dimen_size (gfc_expr *array, int dimen, mpz_t *result) } } - if (array->shape && array->shape[dimen]) + if (array->shape) { mpz_init_set (*result, array->shape[dimen]); return true; diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index b8061f37772..e2f59e0823c 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -5104,7 +5104,6 @@ set_loop_bounds (gfc_loopinfo *loop) if (info->shape) { - gcc_assert (info->shape[dim]); /* The frontend has worked out the size for us. */ if (!loopspec[n] || !specinfo->shape diff --git a/gcc/gengtype.c b/gcc/gengtype.c index 31d4bf4e5d0..a77cfd92bfa 100644 --- a/gcc/gengtype.c +++ b/gcc/gengtype.c @@ -3685,8 +3685,8 @@ write_types (outf_p output_header, type_p structures, output_mangled_typename (output_header, s); oprintf (output_header, "(X) do { \\\n"); oprintf (output_header, - " if (X != NULL) gt_%sx_%s (X);\\\n", wtd->prefix, - s_id_for_tag); + " if ((intptr_t)(X) != 0) gt_%sx_%s (X);\\\n", + wtd->prefix, s_id_for_tag); oprintf (output_header, " } while (0)\n"); for (opt = s->u.s.opt; opt; opt = opt->next) diff --git a/gcc/poly-int.h b/gcc/poly-int.h index f47f9e436a8..94e7b701f64 100644 --- a/gcc/poly-int.h +++ b/gcc/poly-int.h @@ -324,10 +324,10 @@ struct poly_result<T1, T2, 2> routine can take the address of RES rather than the address of a temporary. - The dummy comparison against a null C * is just a way of checking + The dummy self-comparison against C * is just a way of checking that C gives the right type. */ #define POLY_SET_COEFF(C, RES, I, VALUE) \ - ((void) (&(RES).coeffs[0] == (C *) 0), \ + ((void) (&(RES).coeffs[0] == (C *) (void *) &(RES).coeffs[0]), \ wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \ ? (void) ((RES).coeffs[I] = VALUE) \ : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE))) diff --git a/gcc/testsuite/c-c++-common/Waddress-3.c b/gcc/testsuite/c-c++-common/Waddress-3.c new file mode 100644 index 00000000000..9a13a444045 --- /dev/null +++ b/gcc/testsuite/c-c++-common/Waddress-3.c @@ -0,0 +1,125 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + { dg-options "-Wall" } */ + +typedef __INTPTR_TYPE__ intptr_t; +typedef __UINTPTR_TYPE__ uintptr_t; + +#ifndef __cplusplus +# define bool _Bool +#endif + +struct S { void *p, *a1[2], *a2[2][2]; } s, *p; + +extern const void *a1[2]; +extern void *a2[2][2], *ax[]; + +void T (bool); + +void test_array_eq_0 (int i) +{ + // Verify that casts intptr_t suppress the warning. + T ((intptr_t)a1 == 0); + T ((uintptr_t)a1 == 0); + T (a1 == 0); // { dg-warning "-Waddress" } + T (0 == &a1); // { dg-warning "-Waddress" } + // Verify that casts to other pointer types don't suppress it. + T ((void *)a1 == 0); // { dg-warning "-Waddress" } + T ((char *)a1 == 0); // { dg-warning "-Waddress" } + T (a1[0] == 0); + T (0 == (intptr_t)&a1[0]); + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (a1[i] == 0); + T (0 == (uintptr_t)&a1[i]); + T (0 == &a1[i]); // { dg-warning "-Waddress" } + + T ((intptr_t)a2 == 0); + T (a2 == 0); // { dg-warning "-Waddress" } + T (0 == &a2); // { dg-warning "-Waddress" } + T (a2[0] == 0); // { dg-warning "-Waddress" } + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (a2[i] == 0); // { dg-warning "-Waddress" } + T (0 == &a2[i]); // { dg-warning "-Waddress" } + T (a2[0][0] == 0); + T (0 == &a2[0][0]); // { dg-warning "-Waddress" } + T (&ax == 0); // { dg-warning "-Waddress" } + T (0 == &ax); // { dg-warning "-Waddress" } + T (&ax[0] == 0); // { dg-warning "-Waddress" } + T (0 == ax[0]); +} + + +void test_array_neq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((uintptr_t)a1); + + T (a1); // { dg-warning "-Waddress" } + T ((void *)a1); // { dg-warning "-Waddress" } + T (&a1 != 0); // { dg-warning "-Waddress" } + T (a1[0]); + T (&a1[0] != 0); // { dg-warning "-Waddress" } + T (a1[i]); + T (&a1[i] != 0); // { dg-warning "-Waddress" } + + T ((intptr_t)a2); + T (a2); // { dg-warning "-Waddress" } + T ((void *)a2); // { dg-warning "-Waddress" } + T ((char *)a2); // { dg-warning "-Waddress" } + T (&a2 != 0); // { dg-warning "-Waddress" } + T (a2[0]); // { dg-warning "-Waddress" } + T (&a1[0] != 0); // { dg-warning "-Waddress" } + T (a2[i]); // { dg-warning "-Waddress" } + T (&a2[i] != 0); // { dg-warning "-Waddress" } + T (a2[0][0]); + T (&a2[0][0] != 0); // { dg-warning "-Waddress" } +} + + +void test_member_array_eq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((intptr_t)s.a1 == 0); + T (s.a1 == 0); // { dg-warning "-Waddress" } + T (0 == &a1); // { dg-warning "-Waddress" } + T (s.a1[0] == 0); + T ((void*)s.a1); // { dg-warning "-Waddress" } + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (s.a1[i] == 0); + T (0 == &a1[i]); // { dg-warning "-Waddress" } + + T ((uintptr_t)s.a2 == 0); + T (s.a2 == 0); // { dg-warning "-Waddress" } + T (0 == &a2); // { dg-warning "-Waddress" } + T ((void *)s.a2 == 0);// { dg-warning "-Waddress" } + T (s.a2[0] == 0); // { dg-warning "-Waddress" } + T (0 == &a1[0]); // { dg-warning "-Waddress" } + T (s.a2[i] == 0); // { dg-warning "-Waddress" } + T (0 == &a2[i]); // { dg-warning "-Waddress" } + T (s.a2[0][0] == 0); + T (0 == &a2[0][0]); // { dg-warning "-Waddress" } +} + + +void test_member_array_neq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((uintptr_t)s.a1); + T (s.a1); // { dg-warning "-Waddress" } + T (&s.a1 != 0); // { dg-warning "-Waddress" } + T ((void *)&s.a1[0]); // { dg-warning "-Waddress" } + T (s.a1[0]); + T (&s.a1[0] != 0); // { dg-warning "-Waddress" } + T (s.a1[i]); + T (&s.a1[i] != 0); // { dg-warning "-Waddress" } + + T ((intptr_t)s.a2); + T (s.a2); // { dg-warning "-Waddress" } + T (&s.a2 != 0); // { dg-warning "-Waddress" } + T (s.a2[0]); // { dg-warning "-Waddress" } + T (&s.a1[0] != 0); // { dg-warning "-Waddress" } + T (s.a2[i]); // { dg-warning "-Waddress" } + T (&s.a2[i] != 0); // { dg-warning "-Waddress" } + T (s.a2[0][0]); + T (&s.a2[0][0] != 0); // { dg-warning "-Waddress" } +} diff --git a/gcc/testsuite/c-c++-common/Waddress-4.c b/gcc/testsuite/c-c++-common/Waddress-4.c new file mode 100644 index 00000000000..489a0cd717c --- /dev/null +++ b/gcc/testsuite/c-c++-common/Waddress-4.c @@ -0,0 +1,106 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + { dg-options "-Wall" } */ + +typedef __INTPTR_TYPE__ intptr_t; +typedef __INTPTR_TYPE__ uintptr_t; + +extern char *ax[], *a2[][2]; + +void T (int); + +void test_ax_plus_eq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((intptr_t)(ax + 0) == 0); + T ((uintptr_t)(ax + 1) == 0); + + T (ax + 0 == 0); // { dg-warning "-Waddress" } + T (&ax[0] == 0); // { dg-warning "-Waddress" } + T (ax - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &ax[-1]); // { dg-warning "-Waddress" } + T ((void *)(&ax[0] + 2) == 0); // { dg-warning "-Waddress" } + T (&ax[0] + 2 == 0); // { dg-warning "-Waddress" } + T (ax + 3 == 0); // { dg-warning "-Waddress" } + T (0 == &ax[-4]); // { dg-warning "-Waddress" } + T (ax - i == 0); // { dg-warning "-Waddress" } + T (&ax[i] == 0); // { dg-warning "-Waddress" } + T (0 == &ax[1] + i); // { dg-warning "-Waddress" } +} + +void test_a2_plus_eq_0 (int i) +{ + // Verify that casts to intptr_t suppress the warning. + T ((intptr_t)(a2 + 0) == 0); + T ((uintptr_t)(a2 + 1) == 0); + + T (a2 + 0 == 0); // { dg-warning "-Waddress" } + // Verify that a cast to another pointer type doesn't suppress it. + T ((void*)(a2 + 0) == 0); // { dg-warning "-Waddress" } + T ((char*)a2 + 1 == 0); // { dg-warning "-Waddress" } + T (&a2[0] == 0); // { dg-warning "-Waddress" } + T (a2 - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &a2[-1]); // { dg-warning "-Waddress" } + T (a2 + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &a2[-2]); // { dg-warning "-Waddress" } + T (a2 - i == 0); // { dg-warning "-Waddress" } + T (&a2[i] == 0); // { dg-warning "-Waddress" } +} + +// Exercise a pointer. +void test_p_plus_eq_0 (int *p, int i) +{ + /* P + 0 and equivalently &P[0] are invalid for a null P but they're + folded to p before the warning has a chance to trigger. */ + T (p + 0 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&p[0] == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + + T (p - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &p[-1]); // { dg-warning "-Waddress" } + T (p + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &p[-2]); // { dg-warning "-Waddress" } + T (p - i == 0); // { dg-warning "-Waddress" } + T (&p[i] == 0); // { dg-warning "-Waddress" } +} + +// Exercise pointer to array. +void test_pa_plus_eq_0 (int (*p)[], int (*p2)[][2], int i) +{ + // The array pointer may be null. + T (*p == 0); + /* &**P is equivalent to *P and might be the result od macro expansion. + Verify it doesn't cause a warning. */ + T (&**p == 0); + + /* *P + 0 is invalid but folded to *P before the warning has a chance + to trigger. */ + T (*p + 0 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + + T (&(*p)[0] == 0); // { dg-warning "-Waddress" } + T (*p - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p)[-1]); // { dg-warning "-Waddress" } + T (*p + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p)[-2]); // { dg-warning "-Waddress" } + T (*p - i == 0); // { dg-warning "-Waddress" } + T (&(*p)[i] == 0); // { dg-warning "-Waddress" } + + + /* Similar to the above but for a pointer to a two-dimensional array, + referring to the higher-level element (i.e., an array itself). */ + T (*p2 == 0); + T (**p2 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&**p2 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&***p2 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&**p2 == 0); + + T (*p2 + 0 == 0); // { dg-warning "-Waddress" "pr102555" { xfail *-*-* } } + T (&(*p2)[0] == 0); // { dg-warning "-Waddress" } + T (&(*p2)[0][1] == 0); // { dg-warning "-Waddress" } + T (*p2 - 1 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p2)[-1]); // { dg-warning "-Waddress" } + T (0 == &(*p2)[1][2]); // { dg-warning "-Waddress" } + T (*p2 + 2 == 0); // { dg-warning "-Waddress" } + T (0 == &(*p2)[-2]); // { dg-warning "-Waddress" } + T (*p2 - i == 0); // { dg-warning "-Waddress" } + T (&(*p2)[i] == 0); // { dg-warning "-Waddress" } +} diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C index 5224bb14234..63295230d51 100644 --- a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr10.C @@ -85,8 +85,11 @@ extern __attribute__ ((weak)) int i; constexpr int *p1 = &i + 1; #pragma GCC diagnostic push +// Suppress warning: ordered comparison of pointer with integer zero. #pragma GCC diagnostic ignored "-Wextra" -// Suppress warning: ordered comparison of pointer with integer zero +// Also suppress -Waddress for comparisons of constant addresses to +// to null. +#pragma GCC diagnostic ignored "-Waddress" constexpr bool b0 = p1; // { dg-error "not a constant expression" } constexpr bool b1 = p1 == 0; // { dg-error "not a constant expression" } diff --git a/gcc/testsuite/g++.dg/warn/Waddress-5.C b/gcc/testsuite/g++.dg/warn/Waddress-5.C new file mode 100644 index 00000000000..b1ad38a8112 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Waddress-5.C @@ -0,0 +1,115 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + { dg-options "-Wall" } */ + +#if __cplusplus < 201103L +# define nullptr __null +#endif + +struct A +{ + void f (); + virtual void vf (); + virtual void pvf () = 0; + + void sf (); + + int *p; + int a[2]; +}; + +void T (bool); + +void warn_memptr_if () +{ + // Exercise warnings for addresses of nonstatic member functions. + if (&A::f == 0) // { dg-warning "the address '&A::f'" } + T (0); + + if (&A::vf) // { dg-warning "-Waddress" } + T (0); + + if (&A::pvf != 0) // { dg-warning "-Waddress" } + T (0); + + // Exercise warnings for addresses of static member functions. + if (&A::sf == 0) // { dg-warning "-Waddress" } + T (0); + + if (&A::sf) // { dg-warning "-Waddress" } + T (0); + + // Exercise warnings for addresses of nonstatic data members. + if (&A::p == 0) // { dg-warning "the address '&A::p'" } + T (0); + + if (&A::a == nullptr) // { dg-warning "-Waddress" } + T (0); +} + +void warn_memptr_bool () +{ + // Exercise warnings for addresses of nonstatic member functions. + T (&A::f == 0); // { dg-warning "-Waddress" } + T (&A::vf); // { dg-warning "-Waddress" } + T (&A::pvf != 0); // { dg-warning "-Waddress" } + + // Exercise warnings for addresses of static member functions. + T (&A::sf == 0); // { dg-warning "-Waddress" } + T (&A::sf); // { dg-warning "-Waddress" } + + // Exercise warnings for addresses of nonstatic data members. + T (&A::p == 0); // { dg-warning "-Waddress" } + T (&A::a == nullptr); // { dg-warning "-Waddress" } +} + + +/* Verify that no warnings are issued for a dependent expression in + a template. */ + +template <int> +struct B +{ + // This is why. + struct F { void* operator& () const { return 0; } } f; +}; + +template <class Type, int N> +void nowarn_dependent (Type targ) +{ + T (&Type::x == 0); + T (&targ == 0); + + Type tarr[1]; + T (&tarr[0] == nullptr); + + T (&B<N>::f == 0); + + /* Like in the case above, the address-of operator could be a member + of B<N>::vf that returns zero. */ + T (&B<N>::vf); + T (&B<N>::pvf != 0); + T (&B<N>::p == 0); + T (&B<N>::a == 0); +} + + +/* Verify that in an uninstantiated template warnings are not issued + for dependent expressions but are issued otherwise. */ + +template <class Type> +void warn_non_dependent (Type targ, Type *tptr, int i) +{ + /* The address of a pointer to a dependent type cannot be null but + the warning doesn't have a chance to see it. */ + T (&tptr == 0); // { dg-warning "-Waddress" "pr102378" { xfail *-*-* } } + T (&i == 0); // { dg-warning "-Waddress" } + + int iarr[1]; + T (&iarr == 0); // { dg-warning "-Waddress" } + T (&*iarr != 0); // { dg-warning "-Waddress" "pr102378" { xfail *-*-* } } + T (&iarr[0] == 0); // { dg-warning "-Waddress" } + + Type tarr[1]; + T (&tarr == nullptr); // { dg-warning "-Waddress" "pr102378" { xfail *-*-* } } +} diff --git a/gcc/testsuite/g++.dg/warn/Waddress-6.C b/gcc/testsuite/g++.dg/warn/Waddress-6.C new file mode 100644 index 00000000000..c22a83a0dd7 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Waddress-6.C @@ -0,0 +1,79 @@ +/* PR c/102103 - missing warning comparing array address to null + { dg-do compile } + Verify -Waddress for member arrays of structs and notes. + { dg-options "-Wall" } */ + +#if __cplusplus < 201103L +# define nullptr __null +#endif + +void T (bool); + +struct A +{ + int n; + int ia[]; // { dg-message "'A::ia' declared here" } +}; + +struct B +{ + A a[3]; // { dg-message "'B::a' declared here" } +}; + +struct C +{ + B b[3]; // { dg-message "'C::b' declared here" } +}; + +struct D +{ + C c[3]; // { dg-message "'D::c' declared here" } +}; + + +void test_waddress_1d () +{ + D d[2]; // { dg-message "'d' declared here" } + + T (d); // { dg-warning "address of 'd'" } + T (d == nullptr); // { dg-warning "address of 'd'" } + T (&d); // { dg-warning "address of 'd'" } + T (d->c); // { dg-warning "address of 'D::c'" } + T (d->c != nullptr); // { dg-warning "address of 'D::c'" } + T (d->c->b); // { dg-warning "address of 'C::b'" } + T (d->c[1].b->a); // { dg-warning "address of 'B::a'" } + T (d->c->b[2].a->ia); // { dg-warning "address of 'A::ia'" } + + if (d->c->b[2].a[1].ia) // { dg-warning "address of 'A::ia'" } + T (0); + + if (bool b = d->c->b[1].a) // { dg-warning "address of 'B::a'" } + T (b); + + /* The following is represented as a declaration of P followed + by an if statement and so it isn't diagnosed. It's not clear + that it should be since the pointer is then used. + void *p = d->c->b[2].a; + if (p) ... + */ + if (void *p = d->c->b[2].a) // { dg-warning "address of 'A::ia'" "" { xfail *-*-* } } + T (p); +} + + +void test_waddress_2d (int i) +{ + D d[2][3]; // { dg-message "'d' declared here" } + + T (d); // { dg-warning "address of 'd'" } + T (d == nullptr); // { dg-warning "address of 'd'" } + T (&d); // { dg-warning "address of 'd'" } + T (*d); // { dg-warning "address of 'd'" } + T (d[1] != nullptr); // { dg-warning "address of 'd'" } + T (&d[1]->c); // { dg-warning "address of 'D::c'" } + T (d[1]->c); // { dg-warning "address of 'D::c'" } + T (d[1]->c == nullptr); // { dg-warning "address of 'D::c'" } + T (d[i]->c[1].b); // { dg-warning "address of 'C::b'" } + T ((*(d + i))->c->b->a); // { dg-warning "address of 'B::a'" } + T (d[1][2].c->b->a->ia); // { dg-warning "address of 'A::ia'" } +} </cut>

4 years, 9 months

[TCWG CI] 458.sjeng grew in size by 4% after gcc: aarch64: Improve size heuristic for cpymem expansion

by ci_notify＠linaro.org

After gcc commit a459ee44c0a74b0df0485ed7a56683816c02aae9 Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> aarch64: Improve size heuristic for cpymem expansion the following benchmarks grew in size by more than 1%: - 458.sjeng grew in size by 4% from 105780 to 109944 bytes - 459.GemsFDTD grew in size by 2% from 247504 to 251468 bytes Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI. For your convenience, we have uploaded tarballs with pre-processed source and assembly files at: - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Configuration: - Benchmark: SPEC CPU2006 - Toolchain: GCC + Glibc + GNU Linker - Version: all components were built from their tip of trunk - Target: aarch64-linux-gnu - Compiler flags: -Os -flto - Hardware: APM Mustang 8x X-Gene1 This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports. THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… Reproduce builds: <cut> mkdir investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9 cd investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach a459ee44c0a74b0df0485ed7a56683816c02aae9 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 8f95e3c04d659d541ca4937b3df2f1175a1c5f05 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit a459ee44c0a74b0df0485ed7a56683816c02aae9 Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> Date: Wed Sep 29 11:21:45 2021 +0100 aarch64: Improve size heuristic for cpymem expansion Similar to my previous patch for setmem this one does the same for the cpymem expansion. We count the number of ops emitted and compare it against the alternative of just calling the library function when optimising for size. For the code: void cpy_127 (char *out, char *in) { __builtin_memcpy (out, in, 127); } void cpy_128 (char *out, char *in) { __builtin_memcpy (out, in, 128); } we now emit a call to memcpy (with an extra MOV-immediate instruction for the size) instead of: cpy_127(char*, char*): ldp q0, q1, [x1] stp q0, q1, [x0] ldp q0, q1, [x1, 32] stp q0, q1, [x0, 32] ldp q0, q1, [x1, 64] stp q0, q1, [x0, 64] ldr q0, [x1, 96] str q0, [x0, 96] ldr q0, [x1, 111] str q0, [x0, 111] ret cpy_128(char*, char*): ldp q0, q1, [x1] stp q0, q1, [x0] ldp q0, q1, [x1, 32] stp q0, q1, [x0, 32] ldp q0, q1, [x1, 64] stp q0, q1, [x0, 64] ldp q0, q1, [x1, 96] stp q0, q1, [x0, 96] ret which is a clear code size win. Speed optimisation heuristics remain unchanged. 2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> * config/aarch64/aarch64.c (aarch64_expand_cpymem): Count number of emitted operations and adjust heuristic for code size. 2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> * gcc.target/aarch64/cpymem-size.c: New test. --- gcc/config/aarch64/aarch64.c | 36 ++++++++++++++++++-------- gcc/testsuite/gcc.target/aarch64/cpymem-size.c | 29 +++++++++++++++++++++ 2 files changed, 54 insertions(+), 11 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ac17c1c88fb..a9a1800af53 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -23390,7 +23390,8 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst, } /* Expand cpymem, as if from a __builtin_memcpy. Return true if - we succeed, otherwise return false. */ + we succeed, otherwise return false, indicating that a libcall to + memcpy should be emitted. */ bool aarch64_expand_cpymem (rtx *operands) @@ -23407,11 +23408,13 @@ aarch64_expand_cpymem (rtx *operands) unsigned HOST_WIDE_INT size = INTVAL (operands[2]); - /* Inline up to 256 bytes when optimizing for speed. */ + /* Try to inline up to 256 bytes. */ unsigned HOST_WIDE_INT max_copy_size = 256; - if (optimize_function_for_size_p (cfun)) - max_copy_size = 128; + bool size_p = optimize_function_for_size_p (cfun); + + if (size > max_copy_size) + return false; int copy_bits = 256; @@ -23421,13 +23424,14 @@ aarch64_expand_cpymem (rtx *operands) || !TARGET_SIMD || (aarch64_tune_params.extra_tuning_flags & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)) - { - copy_bits = 128; - max_copy_size = max_copy_size / 2; - } + copy_bits = 128; - if (size > max_copy_size) - return false; + /* Emit an inline load+store sequence and count the number of operations + involved. We use a simple count of just the loads and stores emitted + rather than rtx_insn count as all the pointer adjustments and reg copying + in this function will get optimized away later in the pipeline. */ + start_sequence (); + unsigned nops = 0; base = copy_to_mode_reg (Pmode, XEXP (dst, 0)); dst = adjust_automodify_address (dst, VOIDmode, base, 0); @@ -23456,7 +23460,8 @@ aarch64_expand_cpymem (rtx *operands) cur_mode = V4SImode; aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode); - + /* A single block copy is 1 load + 1 store. */ + nops += 2; n -= mode_bits; /* Emit trailing copies using overlapping unaligned accesses - this is @@ -23471,7 +23476,16 @@ aarch64_expand_cpymem (rtx *operands) n = n_bits; } } + rtx_insn *seq = get_insns (); + end_sequence (); + + /* A memcpy libcall in the worst case takes 3 instructions to prepare the + arguments + 1 for the call. */ + unsigned libcall_cost = 4; + if (size_p && libcall_cost < nops) + return false; + emit_insn (seq); return true; } diff --git a/gcc/testsuite/gcc.target/aarch64/cpymem-size.c b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c new file mode 100644 index 00000000000..4d488b74301 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ + +#include <stdlib.h> + +/* +** cpy_127: +** mov x2, 127 +** b memcpy +*/ +void +cpy_127 (char *out, char *in) +{ + __builtin_memcpy (out, in, 127); +} + +/* +** cpy_128: +** mov x2, 128 +** b memcpy +*/ +void +cpy_128 (char *out, char *in) +{ + __builtin_memcpy (out, in, 128); +} + +/* { dg-final { check-function-bodies "**" "" "" } } */ + </cut>

4 years, 9 months

[TCWG CI] Regression caused by gcc: [PR102546] X << Y being non-zero implies X is also non-zero.

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: [PR102546] X << Y being non-zero implies X is also non-zero.: commit 5f9ccf17de7f7581412c6bffd4a37beca9a79836 Author: Aldy Hernandez <aldyh(a)redhat.com> [PR102546] X << Y being non-zero implies X is also non-zero. Results regressed to # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 18603 # First few build errors in logs: # 00:01:53 arch/arm/vfp/vfpdouble.c:1206:1: internal compiler error: in upper_bound, at value-range.h:531 # 00:01:53 arch/arm/vfp/vfpsingle.c:1246:1: internal compiler error: in upper_bound, at value-range.h:531 # 00:01:53 make[2]: *** [scripts/Makefile.build:271: arch/arm/vfp/vfpdouble.o] Error 1 # 00:01:54 make[2]: *** [scripts/Makefile.build:271: arch/arm/vfp/vfpsingle.o] Error 1 # 00:01:55 make[1]: *** [scripts/Makefile.build:514: arch/arm/vfp] Error 2 # 00:01:56 arch/arm/nwfpe/softfloat.c:3432:1: internal compiler error: in upper_bound, at value-range.h:531 # 00:01:57 make[2]: *** [scripts/Makefile.build:271: arch/arm/nwfpe/softfloat.o] Error 1 # 00:01:57 make[1]: *** [scripts/Makefile.build:514: arch/arm/nwfpe] Error 2 # 00:02:14 arch/arm/kernel/smp.c:857:1: internal compiler error: in upper_bound, at value-range.h:531 # 00:02:15 make[2]: *** [scripts/Makefile.build:271: arch/arm/kernel/smp.o] Error 1 from # reset_artifacts: -10 # build_abe binutils: -9 # build_abe stage1: -5 # build_abe qemu: -2 # linux_n_obj: 19709 # linux build successful: all THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT. This commit has regressed these CI configurations: - tcwg_kernel/gnu-master-arm-stable-allyesconfig First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… Reproduce builds: <cut> mkdir investigate-gcc-5f9ccf17de7f7581412c6bffd4a37beca9a79836 cd investigate-gcc-5f9ccf17de7f7581412c6bffd4a37beca9a79836 # Fetch scripts git clone https://git.linaro.org/toolchain/jenkins-scripts # Fetch manifests and test.sh script mkdir -p artifacts/manifests curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail chmod +x artifacts/test.sh # Reproduce the baseline build (build all pre-requisites) ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh # Save baseline build state (which is then restored in artifacts/test.sh) mkdir -p ./bisect rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/ cd gcc # Reproduce first_bad build git checkout --detach 5f9ccf17de7f7581412c6bffd4a37beca9a79836 ../artifacts/test.sh # Reproduce last_good build git checkout --detach 257d2890a769a8aa564d079170377e637e07acb1 ../artifacts/test.sh cd .. </cut> Full commit (up to 1000 lines): <cut> commit 5f9ccf17de7f7581412c6bffd4a37beca9a79836 Author: Aldy Hernandez <aldyh(a)redhat.com> Date: Fri Oct 1 13:05:36 2021 +0200 [PR102546] X << Y being non-zero implies X is also non-zero. This patch teaches this to range-ops. Tested on x86-64 Linux. gcc/ChangeLog: PR tree-optimization/102546 * range-op.cc (operator_lshift::op1_range): Teach range-ops that X << Y is non-zero implies X is also non-zero. --- gcc/range-op.cc | 18 ++++++++++++++---- gcc/testsuite/gcc.dg/tree-ssa/pr102546.c | 23 +++++++++++++++++++++++ 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/gcc/range-op.cc b/gcc/range-op.cc index 5e37133026d..2baca4a197f 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -2078,6 +2078,12 @@ operator_lshift::op1_range (irange &r, relation_kind rel ATTRIBUTE_UNUSED) const { tree shift_amount; + + if (!lhs.contains_p (build_zero_cst (type))) + r.set_nonzero (type); + else + r.set_varying (type); + if (op2.singleton_p (&shift_amount)) { wide_int shift = wi::to_wide (shift_amount); @@ -2089,21 +2095,24 @@ operator_lshift::op1_range (irange &r, return false; if (shift == 0) { - r = lhs; + r.intersect (lhs); return true; } // Work completely in unsigned mode to start. tree utype = type; + int_range_max tmp_range; if (TYPE_SIGN (type) == SIGNED) { int_range_max tmp = lhs; utype = unsigned_type_for (type); range_cast (tmp, utype); - op_rshift.fold_range (r, utype, tmp, op2); + op_rshift.fold_range (tmp_range, utype, tmp, op2); } else - op_rshift.fold_range (r, utype, lhs, op2); + op_rshift.fold_range (tmp_range, utype, lhs, op2); + + r.intersect (tmp_range); // Start with ranges which can produce the LHS by right shifting the // result by the shift amount. @@ -2128,7 +2137,8 @@ operator_lshift::op1_range (irange &r, range_cast (r, type); return true; } - return false; + + return !r.varying_p (); } bool diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c new file mode 100644 index 00000000000..4bd98747732 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c @@ -0,0 +1,23 @@ +// { dg-do compile } +// { dg-options "-O3 -fdump-tree-optimized" } + +static int a; +static char b, c, d; +void bar(void); +void foo(void); + +int main() { + int f = 0; + for (; f <= 5; f++) { + bar(); + b = b && f; + d = f << f; + if (!(a >= d || f)) + foo(); + c = 1; + for (; c; c = 0) + ; + } +} + +// { dg-final { scan-tree-dump-not "foo" "optimized" } } </cut>

4 years, 9 months

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

linaro-toolchain