linaro-toolchain October 2021

linaro-toolchain@lists.linaro.org

16 participants
25 discussions

[TCWG CI] 458.sjeng grew in size by 4% after gcc: aarch64: Improve size heuristic for cpymem expansion

by ci_notify＠linaro.org

After gcc commit a459ee44c0a74b0df0485ed7a56683816c02aae9
Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com>

    aarch64: Improve size heuristic for cpymem expansion

the following benchmarks grew in size by more than 1%:
- 458.sjeng grew in size by 4% from 105780 to 109944 bytes
- 459.GemsFDTD grew in size by 2% from 247504 to 251468 bytes

Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os -flto
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa…

Reproduce builds:
<cut>
mkdir investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9
cd investigate-gcc-a459ee44c0a74b0df0485ed7a56683816c02aae9

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aa… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach a459ee44c0a74b0df0485ed7a56683816c02aae9
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 8f95e3c04d659d541ca4937b3df2f1175a1c5f05
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit a459ee44c0a74b0df0485ed7a56683816c02aae9 Author: Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> Date: Wed Sep 29 11:21:45 2021 +0100 aarch64: Improve size heuristic for cpymem expansion Similar to my previous patch for setmem this one does the same for the cpymem expansion. We count the number of ops emitted and compare it against the alternative of just calling the library function when optimising for size. For the code: void cpy_127 (char *out, char *in) { __builtin_memcpy (out, in, 127); } void cpy_128 (char *out, char *in) { __builtin_memcpy (out, in, 128); } we now emit a call to memcpy (with an extra MOV-immediate instruction for the size) instead of: cpy_127(char*, char*): ldp q0, q1, [x1] stp q0, q1, [x0] ldp q0, q1, [x1, 32] stp q0, q1, [x0, 32] ldp q0, q1, [x1, 64] stp q0, q1, [x0, 64] ldr q0, [x1, 96] str q0, [x0, 96] ldr q0, [x1, 111] str q0, [x0, 111] ret cpy_128(char*, char*): ldp q0, q1, [x1] stp q0, q1, [x0] ldp q0, q1, [x1, 32] stp q0, q1, [x0, 32] ldp q0, q1, [x1, 64] stp q0, q1, [x0, 64] ldp q0, q1, [x1, 96] stp q0, q1, [x0, 96] ret which is a clear code size win. Speed optimisation heuristics remain unchanged. 2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> * config/aarch64/aarch64.c (aarch64_expand_cpymem): Count number of emitted operations and adjust heuristic for code size. 2021-09-29 Kyrylo Tkachov <kyrylo.tkachov(a)arm.com> * gcc.target/aarch64/cpymem-size.c: New test. --- gcc/config/aarch64/aarch64.c | 36 ++++++++++++++++++-------- gcc/testsuite/gcc.target/aarch64/cpymem-size.c | 29 +++++++++++++++++++++ 2 files changed, 54 insertions(+), 11 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ac17c1c88fb..a9a1800af53 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -23390,7 +23390,8 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst, } /* Expand cpymem, as if from a __builtin_memcpy. 
Return true if - we succeed, otherwise return false. */ + we succeed, otherwise return false, indicating that a libcall to + memcpy should be emitted. */ bool aarch64_expand_cpymem (rtx *operands) @@ -23407,11 +23408,13 @@ aarch64_expand_cpymem (rtx *operands) unsigned HOST_WIDE_INT size = INTVAL (operands[2]); - /* Inline up to 256 bytes when optimizing for speed. */ + /* Try to inline up to 256 bytes. */ unsigned HOST_WIDE_INT max_copy_size = 256; - if (optimize_function_for_size_p (cfun)) - max_copy_size = 128; + bool size_p = optimize_function_for_size_p (cfun); + + if (size > max_copy_size) + return false; int copy_bits = 256; @@ -23421,13 +23424,14 @@ aarch64_expand_cpymem (rtx *operands) || !TARGET_SIMD || (aarch64_tune_params.extra_tuning_flags & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)) - { - copy_bits = 128; - max_copy_size = max_copy_size / 2; - } + copy_bits = 128; - if (size > max_copy_size) - return false; + /* Emit an inline load+store sequence and count the number of operations + involved. We use a simple count of just the loads and stores emitted + rather than rtx_insn count as all the pointer adjustments and reg copying + in this function will get optimized away later in the pipeline. */ + start_sequence (); + unsigned nops = 0; base = copy_to_mode_reg (Pmode, XEXP (dst, 0)); dst = adjust_automodify_address (dst, VOIDmode, base, 0); @@ -23456,7 +23460,8 @@ aarch64_expand_cpymem (rtx *operands) cur_mode = V4SImode; aarch64_copy_one_block_and_progress_pointers (&src, &dst, cur_mode); - + /* A single block copy is 1 load + 1 store. */ + nops += 2; n -= mode_bits; /* Emit trailing copies using overlapping unaligned accesses - this is @@ -23471,7 +23476,16 @@ aarch64_expand_cpymem (rtx *operands) n = n_bits; } } + rtx_insn *seq = get_insns (); + end_sequence (); + + /* A memcpy libcall in the worst case takes 3 instructions to prepare the + arguments + 1 for the call. 
*/ + unsigned libcall_cost = 4; + if (size_p && libcall_cost < nops) + return false; + emit_insn (seq); return true; } diff --git a/gcc/testsuite/gcc.target/aarch64/cpymem-size.c b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c new file mode 100644 index 00000000000..4d488b74301 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cpymem-size.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-Os" } */ + +#include <stdlib.h> + +/* +** cpy_127: +** mov x2, 127 +** b memcpy +*/ +void +cpy_127 (char *out, char *in) +{ + __builtin_memcpy (out, in, 127); +} + +/* +** cpy_128: +** mov x2, 128 +** b memcpy +*/ +void +cpy_128 (char *out, char *in) +{ + __builtin_memcpy (out, in, 128); +} + +/* { dg-final { check-function-bodies "**" "" "" } } */ + </cut>
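The size heuristic described in the commit message above can be modelled in a few lines. What follows is a simplified sketch in plain C, not the GCC implementation: it works in bytes rather than RTL modes, assumes power-of-two block sizes up to one 32-byte LDP/STP Q-register pair, and assumes a trailing remainder smaller than half the copy width is rounded up to a single overlapping access. The names `count_copy_ops` and `prefers_libcall` are hypothetical; the constant 4 is the libcall cost quoted in the patch.

```c
#include <assert.h>

/* Worst-case cost of a memcpy libcall as stated in the patch:
   3 instructions to prepare the arguments + 1 for the call.  */
#define LIBCALL_COST 4u

/* Largest power-of-two block size (in bytes) that does not exceed LIMIT.  */
static unsigned largest_block(unsigned limit)
{
    unsigned block = 1;
    while (block * 2 <= limit)
        block *= 2;
    return block;
}

/* Smallest power of two that is >= N (N must be non-zero).  */
static unsigned round_up_pow2(unsigned n)
{
    unsigned p = 1;
    while (p < n)
        p *= 2;
    return p;
}

/* Count loads + stores for an inline copy of SIZE bytes when the widest
   single access is COPY_BYTES wide (assumed model, see the text above).  */
unsigned count_copy_ops(unsigned size, unsigned copy_bytes)
{
    assert(copy_bytes > 0);
    unsigned nops = 0;
    unsigned n = size;
    while (n > 0) {
        unsigned limit = n < copy_bytes ? n : copy_bytes;
        n -= largest_block(limit);
        nops += 2;                      /* one block copy = 1 load + 1 store */
        /* A short tail becomes one overlapping access of the next
           power-of-two size instead of several smaller ones.  */
        if (n > 0 && n < copy_bytes / 2)
            n = round_up_pow2(n);
    }
    return nops;
}

/* Non-zero when, optimising for size, the libcall is cheaper than inlining.  */
int prefers_libcall(unsigned size, unsigned copy_bytes)
{
    return LIBCALL_COST < count_copy_ops(size, copy_bytes);
}
```

Under these assumptions, `count_copy_ops(127, 32)` gives 10 and `count_copy_ops(128, 32)` gives 8, matching the ten and eight load/store instructions in the listings quoted in the commit message. Both exceed 4, so the model chooses the libcall when optimising for size.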

4 years, 5 months

[TCWG CI] Regression caused by gcc: [PR102546] X << Y being non-zero implies X is also non-zero.

by ci_notify＠linaro.org

[TCWG CI] Regression caused by gcc: [PR102546] X << Y being non-zero implies X is also non-zero.:
commit 5f9ccf17de7f7581412c6bffd4a37beca9a79836
Author: Aldy Hernandez <aldyh(a)redhat.com>

    [PR102546] X << Y being non-zero implies X is also non-zero.

Results regressed to
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 18603
# First few build errors in logs:
# 00:01:53 arch/arm/vfp/vfpdouble.c:1206:1: internal compiler error: in upper_bound, at value-range.h:531
# 00:01:53 arch/arm/vfp/vfpsingle.c:1246:1: internal compiler error: in upper_bound, at value-range.h:531
# 00:01:53 make[2]: *** [scripts/Makefile.build:271: arch/arm/vfp/vfpdouble.o] Error 1
# 00:01:54 make[2]: *** [scripts/Makefile.build:271: arch/arm/vfp/vfpsingle.o] Error 1
# 00:01:55 make[1]: *** [scripts/Makefile.build:514: arch/arm/vfp] Error 2
# 00:01:56 arch/arm/nwfpe/softfloat.c:3432:1: internal compiler error: in upper_bound, at value-range.h:531
# 00:01:57 make[2]: *** [scripts/Makefile.build:271: arch/arm/nwfpe/softfloat.o] Error 1
# 00:01:57 make[1]: *** [scripts/Makefile.build:514: arch/arm/nwfpe] Error 2
# 00:02:14 arch/arm/kernel/smp.c:857:1: internal compiler error: in upper_bound, at value-range.h:531
# 00:02:15 make[2]: *** [scripts/Makefile.build:271: arch/arm/kernel/smp.o] Error 1

from
# reset_artifacts: -10
# build_abe binutils: -9
# build_abe stage1: -5
# build_abe qemu: -2
# linux_n_obj: 19709
# linux build successful: all

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_kernel/gnu-master-arm-stable-allyesconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally…
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally…
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally…
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally…

Reproduce builds:
<cut>
mkdir investigate-gcc-5f9ccf17de7f7581412c6bffd4a37beca9a79836
cd investigate-gcc-5f9ccf17de7f7581412c6bffd4a37beca9a79836

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-stable-ally… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 5f9ccf17de7f7581412c6bffd4a37beca9a79836
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 257d2890a769a8aa564d079170377e637e07acb1
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 5f9ccf17de7f7581412c6bffd4a37beca9a79836
Author: Aldy Hernandez <aldyh(a)redhat.com>
Date:   Fri Oct 1 13:05:36 2021 +0200

    [PR102546] X << Y being non-zero implies X is also non-zero.
This patch teaches this to range-ops. Tested on x86-64 Linux. gcc/ChangeLog: PR tree-optimization/102546 * range-op.cc (operator_lshift::op1_range): Teach range-ops that X << Y is non-zero implies X is also non-zero. --- gcc/range-op.cc | 18 ++++++++++++++---- gcc/testsuite/gcc.dg/tree-ssa/pr102546.c | 23 +++++++++++++++++++++++ 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/gcc/range-op.cc b/gcc/range-op.cc index 5e37133026d..2baca4a197f 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -2078,6 +2078,12 @@ operator_lshift::op1_range (irange &r, relation_kind rel ATTRIBUTE_UNUSED) const { tree shift_amount; + + if (!lhs.contains_p (build_zero_cst (type))) + r.set_nonzero (type); + else + r.set_varying (type); + if (op2.singleton_p (&shift_amount)) { wide_int shift = wi::to_wide (shift_amount); @@ -2089,21 +2095,24 @@ operator_lshift::op1_range (irange &r, return false; if (shift == 0) { - r = lhs; + r.intersect (lhs); return true; } // Work completely in unsigned mode to start. tree utype = type; + int_range_max tmp_range; if (TYPE_SIGN (type) == SIGNED) { int_range_max tmp = lhs; utype = unsigned_type_for (type); range_cast (tmp, utype); - op_rshift.fold_range (r, utype, tmp, op2); + op_rshift.fold_range (tmp_range, utype, tmp, op2); } else - op_rshift.fold_range (r, utype, lhs, op2); + op_rshift.fold_range (tmp_range, utype, lhs, op2); + + r.intersect (tmp_range); // Start with ranges which can produce the LHS by right shifting the // result by the shift amount. 
@@ -2128,7 +2137,8 @@ operator_lshift::op1_range (irange &r, range_cast (r, type); return true; } - return false; + + return !r.varying_p (); } bool diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c new file mode 100644 index 00000000000..4bd98747732 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102546.c @@ -0,0 +1,23 @@ +// { dg-do compile } +// { dg-options "-O3 -fdump-tree-optimized" } + +static int a; +static char b, c, d; +void bar(void); +void foo(void); + +int main() { + int f = 0; + for (; f <= 5; f++) { + bar(); + b = b && f; + d = f << f; + if (!(a >= d || f)) + foo(); + c = 1; + for (; c; c = 0) + ; + } +} + +// { dg-final { scan-tree-dump-not "foo" "optimized" } } </cut>
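The property named in the commit subject can be checked by brute force on a small type. The sketch below is plain C and is unrelated to the range-op.cc code itself; it assumes 8-bit unsigned values with shift amounts 0 to 7, and both function names are made up for illustration. It checks that a non-zero result of X << Y never comes from X == 0, and that the converse does not hold, because bits shifted out of the type are lost.

```c
#include <stdint.h>

/* Returns 1 if, for every 8-bit x and every shift amount 0..7,
   (x << y) != 0 implies x != 0; returns 0 if a counterexample exists.  */
int lshift_nonzero_implies_operand_nonzero(void)
{
    for (unsigned x = 0; x <= UINT8_MAX; x++)
        for (unsigned y = 0; y < 8; y++) {
            uint8_t result = (uint8_t)(x << y);   /* truncate to 8 bits */
            if (result != 0 && x == 0)
                return 0;
        }
    return 1;
}

/* Returns 1 if a non-zero operand can still produce a zero result,
   i.e. the implication only goes in one direction.  */
int nonzero_operand_can_shift_to_zero(void)
{
    uint8_t x = 0x80;
    uint8_t result = (uint8_t)(x << 1);           /* top bit is shifted out */
    return x != 0 && result == 0;
}
```

Both functions return 1 under these assumptions, which is why a range analysis may narrow X to non-zero when the shifted value is known to be non-zero, but cannot conclude anything about the result from X alone.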

4 years, 5 months

Re: [TCWG CI] 400.perlbench slowed down by 6% after llvm: [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest

by Maxim Kuvyrkov

Hi Arthur,

Thanks for looking into this!

The flags to compile regexec.c were:
-O3 --target=aarch64-linux-gnu -fgnu89-inline

Clang was configured with (on x86_64-linux-gnu host):
cmake -G Ninja ../llvm/llvm '-DLLVM_ENABLE_PROJECTS=clang;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=True -DCMAKE_INSTALL_PREFIX=../llvm-install -DLLVM_TARGETS_TO_BUILD=AArch64

Please let me know if the above doesn’t work for you.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 29 Sep 2021, at 20:47, Arthur Eubanks <aeubanks(a)google.com> wrote:
>
> Do you know the flags passed to Clang to compile the sources? I tried compiling the preprocessed sources but ran into the below, and couldn't find the flags in any of the logs.
>
> In file included from regexec.c:93:
> In file included from ./perl.h:384:
> In file included from /home/tcwg-buildslave/workspace/tcwg_bmk_0/abe/builds/destdir/x86_64-pc-linux-gnu/aarch64-linux-gnu/libc/usr/include/sys/types.h:144:
> /home/tcwg-buildslave/workspace/tcwg_bmk_0/llvm-install/lib/clang/14.0.0/include/stddef.h:46:27: error: typedef redefinition with different types ('unsigned long' vs 'unsigned long long')
> typedef long unsigned int size_t;
> ^
> 1 error generated.
>
>
>
> And yeah just moving the code around could cause major performance regressions, I've had other patches do the same for various benchmarks, there's not much we can do about that if that's actually the root cause. If I can compile the file I can check if the optimization actually created worse IR or not.
>
>
> On Wed, Sep 29, 2021 at 5:59 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote:
> Hi Arthur,
>
> Pre-processed source is in the save-temps tarballs linked below; S_regmatch() is in regexec.i .
>
> The save-temps also have .s assembly file for before and after your patch, and the only code-gen difference is in S_reginclass() function -- see the attached screenshot #1.
>
> Looking into profile of S_regmatch(), some of the extra cycles come from hot loop starting with "cbz w19,..." getting misaligned -- before your patch it was starting at "2bce10", and after it starts at "2bce6c".
>
> Maybe the added instructions in S_reginclass() pushed the loop in S_regmatch() in an unfortunate way?
>
> --
> Maxim Kuvyrkov
> https://www.linaro.org
>
>> On 27 Sep 2021, at 20:05, Arthur Eubanks <aeubanks(a)google.com> wrote:
>>
>> Could I get the source file with S_regmatch()?
>>
>> On Mon, Sep 27, 2021 at 6:07 AM Maxim Kuvyrkov <maxim.kuvyrkov(a)linaro.org> wrote:
>> Hi Arthur,
>>
>> Your patch seems to be slowing down 400.perlbench by 6% -- due to slowdown of its hot function S_regmatch() by 14%.
>>
>> Could you take a look if this is easily fixable, please?
>>
>> Regards,
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>> > On 24 Sep 2021, at 15:07, ci_notify(a)linaro.org wrote:
>> >
>> > After llvm commit e7249e4acf3cf9438d6d9e02edecebd5b622a4dc
>> > Author: Arthur Eubanks <aeubanks(a)google.com>
>> >
>> > [SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
>> >
>> > the following benchmarks slowed down by more than 2%:
>> > - 400.perlbench slowed down by 6% from 9730 to 10312 perf samples
>> > - 400.perlbench:[.] S_regmatch slowed down by 14% from 3660 to 4188 perf samples
>> >
>> > Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.
>> >
>> > For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
>> > - First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> > - Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> > - Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-…
>> >
>> > Configuration:
>> > - Benchmark: SPEC CPU2006
>> > - Toolchain: Clang + Glibc + LLVM Linker
>> > - Version: all components were built from their tip of trunk
>> > - Target: aarch64-linux-gnu
>> > - Compiler flags: -O3
>> > - Hardware: NVidia TX1 4x Cortex-A57
>> >
>> > This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.
>
> <2021-09-29_15-44-27.png><2021-09-29_15-53-20.png>
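The two loop start addresses quoted in the thread ("2bce10" before the patch, "2bce6c" after) can be compared by their offset within an aligned block. This is a small C sketch only: the 16-byte and 64-byte block sizes are illustrative assumptions rather than fetch or cache-line parameters stated anywhere in the thread, and `offset_in_block` is a hypothetical helper.

```c
#include <stdint.h>

/* Offset of ADDR within an aligned block of BLOCK bytes.
   BLOCK must be a power of two.  */
unsigned offset_in_block(uint64_t addr, unsigned block)
{
    return (unsigned)(addr & (uint64_t)(block - 1));
}

/* Loop start addresses as reported in the thread.  */
static const uint64_t LOOP_START_BEFORE = 0x2bce10;
static const uint64_t LOOP_START_AFTER  = 0x2bce6c;

/* Non-zero when the loop head moved off a boundary of BLOCK bytes.  */
int loop_lost_alignment(unsigned block)
{
    return offset_in_block(LOOP_START_BEFORE, block) == 0
        && offset_in_block(LOOP_START_AFTER, block) != 0;
}
```

With a 16-byte block the loop head moves from offset 0 to offset 12, which is consistent with the "getting misaligned" observation; whether that accounts for the measured slowdown is not something this arithmetic can show.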

4 years, 6 months

[TCWG CI] 456.hmmer slowed down by 5% after llvm: Revert "Allow rematerialization of virtual reg uses"

by ci_notify＠linaro.org

After llvm commit 08d7eec06e8cf5c15a96ce11f311f1480291a441
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com>

    Revert "Allow rematerialization of virtual reg uses"

the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 5% from 7649 to 8028 perf samples

Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection. Naturally, the scripts will fail when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: Clang + Glibc + LLVM Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O2 -flto -marm
- Hardware: NVidia TK1 4x Cortex-A15

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain(a)lists.linaro.org . Our improvement plans include adding support for SPEC CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2_LTO

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-…

Reproduce builds:
<cut>
mkdir investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441
cd investigate-llvm-08d7eec06e8cf5c15a96ce11f311f1480291a441

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-… --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach 08d7eec06e8cf5c15a96ce11f311f1480291a441
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach e8e2edd8ca88f8b0a7dba141349b2aa83284f3af
../artifacts/test.sh

cd ..
</cut> Full commit (up to 1000 lines): <cut> commit 08d7eec06e8cf5c15a96ce11f311f1480291a441 Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin(a)amd.com> Date: Fri Sep 24 09:53:51 2021 -0700 Revert "Allow rematerialization of virtual reg uses" Reverted due to two distcint performance regression reports. This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39. --- llvm/include/llvm/CodeGen/TargetInstrInfo.h | 12 +- llvm/lib/CodeGen/TargetInstrInfo.cpp | 9 +- llvm/test/CodeGen/AMDGPU/remat-sop.mir | 60 - llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll | 28 +- llvm/test/CodeGen/ARM/funnel-shift-rot.ll | 32 +- llvm/test/CodeGen/ARM/funnel-shift.ll | 30 +- .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll | 30 +- llvm/test/CodeGen/ARM/neon-copy.ll | 10 +- llvm/test/CodeGen/Mips/llvm-ir/ashr.ll | 227 +- llvm/test/CodeGen/Mips/llvm-ir/lshr.ll | 206 +- llvm/test/CodeGen/Mips/llvm-ir/shl.ll | 95 +- llvm/test/CodeGen/Mips/llvm-ir/sub.ll | 31 +- llvm/test/CodeGen/Mips/tls.ll | 4 +- llvm/test/CodeGen/RISCV/atomic-rmw.ll | 120 +- llvm/test/CodeGen/RISCV/atomic-signext.ll | 24 +- llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll | 96 +- llvm/test/CodeGen/RISCV/mul.ll | 72 +- llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll | 12 +- llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll | 270 +- llvm/test/CodeGen/RISCV/rv32zbb.ll | 94 +- llvm/test/CodeGen/RISCV/rv32zbp.ll | 262 +- llvm/test/CodeGen/RISCV/rv32zbt.ll | 206 +- .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll | 150 +- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll | 146 +- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll | 3584 ++++++++++---------- llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll | 664 ++-- llvm/test/CodeGen/RISCV/shifts.ll | 308 +- llvm/test/CodeGen/RISCV/srem-vector-lkk.ll | 208 +- llvm/test/CodeGen/RISCV/urem-vector-lkk.ll | 190 +- llvm/test/CodeGen/Thumb/dyn-stackalloc.ll | 7 +- .../tail-pred-disabled-in-loloops.ll | 14 +- .../LowOverheadLoops/varying-outer-2d-reduction.ll | 64 +- 
.../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll | 67 +- llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll | 30 +- llvm/test/CodeGen/Thumb2/mve-float16regloops.ll | 82 +- llvm/test/CodeGen/Thumb2/mve-float32regloops.ll | 98 +- llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll | 529 +-- llvm/test/CodeGen/X86/addcarry.ll | 20 +- llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll | 12 +- llvm/test/CodeGen/X86/dag-update-nodetomatch.ll | 17 +- .../X86/delete-dead-instrs-with-live-uses.mir | 4 +- llvm/test/CodeGen/X86/inalloca-invoke.ll | 2 +- llvm/test/CodeGen/X86/licm-regpressure.ll | 28 +- llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll | 40 +- llvm/test/CodeGen/X86/sdiv_fix.ll | 5 +- 45 files changed, 4093 insertions(+), 4106 deletions(-) diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h index a0c52e2f1a13..c394ac910be1 100644 --- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h +++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h @@ -117,11 +117,10 @@ public: const MachineFunction &MF) const; /// Return true if the instruction is trivially rematerializable, meaning it - /// has no side effects. Uses of constants and unallocatable physical - /// registers are always trivial to rematerialize so that the instructions - /// result is independent of the place in the function. Uses of virtual - /// registers are allowed but it is caller's responsility to ensure these - /// operands are valid at the point the instruction is beeing moved. + /// has no side effects and requires no operands that aren't always available. + /// This means the only allowed uses are constants and unallocatable physical + /// registers so that the instructions result is independent of the place + /// in the function. 
bool isTriviallyReMaterializable(const MachineInstr &MI, AAResults *AA = nullptr) const { return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF || @@ -141,7 +140,8 @@ protected: /// set, this hook lets the target specify whether the instruction is actually /// trivially rematerializable, taking into consideration its operands. This /// predicate must return false if the instruction has any side effects other - /// than producing a value. + /// than producing a value, or if it requres any address registers that are + /// not always available. /// Requirements must be check as stated in isTriviallyReMaterializable() . virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI, AAResults *AA) const { diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp index fe7d60e0b7e2..1eab8e7443a7 100644 --- a/llvm/lib/CodeGen/TargetInstrInfo.cpp +++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp @@ -921,8 +921,7 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( const MachineRegisterInfo &MRI = MF.getRegInfo(); // Remat clients assume operand 0 is the defined register. - if (!MI.getNumOperands() || !MI.getOperand(0).isReg() || - MI.getOperand(0).isTied()) + if (!MI.getNumOperands() || !MI.getOperand(0).isReg()) return false; Register DefReg = MI.getOperand(0).getReg(); @@ -984,6 +983,12 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric( // same virtual register, though. if (MO.isDef() && Reg != DefReg) return false; + + // Don't allow any virtual-register uses. Rematting an instruction with + // virtual register uses would length the live ranges of the uses, which + // is not necessarily a good idea, certainly not "trivial". + if (MO.isUse()) + return false; } // Everything checked out. 
diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir index c9915aaabfde..ed799bfca028 100644 --- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir +++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir @@ -51,66 +51,6 @@ body: | S_NOP 0, implicit %2 S_ENDPGM 0 ... -# The liverange of %0 covers a point of rematerialization, source value is -# availabe. ---- -name: test_remat_s_mov_b32_vreg_src_long_lr -tracksRegLiveness: true -machineFunctionInfo: - stackPtrOffsetReg: $sgpr32 -body: | - bb.0: - ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr - ; GCN: renamable $sgpr0 = IMPLICIT_DEF - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: S_NOP 0, implicit killed renamable $sgpr0 - ; GCN: S_ENDPGM 0 - %0:sreg_32 = IMPLICIT_DEF - %1:sreg_32 = S_MOV_B32 %0:sreg_32 - %2:sreg_32 = S_MOV_B32 %0:sreg_32 - %3:sreg_32 = S_MOV_B32 %0:sreg_32 - S_NOP 0, implicit %1 - S_NOP 0, implicit %2 - S_NOP 0, implicit %3 - S_NOP 0, implicit %0 - S_ENDPGM 0 -... -# The liverange of %0 does not cover a point of rematerialization, source value is -# unavailabe and we do not want to artificially extend the liverange. 
---- -name: test_no_remat_s_mov_b32_vreg_src_short_lr -tracksRegLiveness: true -machineFunctionInfo: - stackPtrOffsetReg: $sgpr32 -body: | - bb.0: - ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr - ; GCN: renamable $sgpr0 = IMPLICIT_DEF - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5) - ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0 - ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5) - ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0 - ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5) - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5) - ; GCN: S_NOP 0, implicit killed renamable $sgpr1 - ; GCN: S_NOP 0, implicit killed renamable $sgpr0 - ; GCN: S_ENDPGM 0 - %0:sreg_32 = IMPLICIT_DEF - %1:sreg_32 = S_MOV_B32 %0:sreg_32 - %2:sreg_32 = S_MOV_B32 %0:sreg_32 - %3:sreg_32 = S_MOV_B32 %0:sreg_32 - S_NOP 0, implicit %1 - S_NOP 0, implicit %2 - S_NOP 0, implicit %3 - S_ENDPGM 0 -... 
--- name: test_remat_s_mov_b64 tracksRegLiveness: true diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll index 175a2069a441..a4243276c70a 100644 --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; ENABLE-NEXT: pophs {r11, pc} ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader ; ENABLE-NEXT: movw r12, :lower16:skip -; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: sub r1, r1, #1 ; ENABLE-NEXT: movt r12, :upper16:skip ; ENABLE-NEXT: .LBB0_4: @ %while.body ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; ENABLE-NEXT: ldrb r1, [r0] -; ENABLE-NEXT: ldrb r1, [r12, r1] -; ENABLE-NEXT: add r0, r0, r1 -; ENABLE-NEXT: sub r1, r3, #1 -; ENABLE-NEXT: cmp r1, r3 +; ENABLE-NEXT: ldrb r3, [r0] +; ENABLE-NEXT: ldrb r3, [r12, r3] +; ENABLE-NEXT: add r0, r0, r3 +; ENABLE-NEXT: sub r3, r1, #1 +; ENABLE-NEXT: cmp r3, r1 ; ENABLE-NEXT: bhs .LBB0_6 ; ENABLE-NEXT: @ %bb.5: @ %while.body ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; ENABLE-NEXT: cmp r0, r2 -; ENABLE-NEXT: mov r3, r1 +; ENABLE-NEXT: mov r1, r3 ; ENABLE-NEXT: blo .LBB0_4 ; ENABLE-NEXT: .LBB0_6: @ %if.end29 ; ENABLE-NEXT: pop {r11, pc} @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; DISABLE-NEXT: pophs {r11, pc} ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader ; DISABLE-NEXT: movw r12, :lower16:skip -; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: sub r1, r1, #1 ; DISABLE-NEXT: movt r12, :upper16:skip ; DISABLE-NEXT: .LBB0_4: @ %while.body ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; DISABLE-NEXT: ldrb r1, [r0] -; DISABLE-NEXT: ldrb r1, [r12, r1] -; DISABLE-NEXT: add r0, r0, r1 -; DISABLE-NEXT: sub r1, r3, #1 -; DISABLE-NEXT: cmp r1, r3 +; DISABLE-NEXT: ldrb r3, [r0] +; DISABLE-NEXT: ldrb r3, [r12, r3] +; DISABLE-NEXT: add r0, 
r0, r3 +; DISABLE-NEXT: sub r3, r1, #1 +; DISABLE-NEXT: cmp r3, r1 ; DISABLE-NEXT: bhs .LBB0_6 ; DISABLE-NEXT: @ %bb.5: @ %while.body ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; DISABLE-NEXT: cmp r0, r2 -; DISABLE-NEXT: mov r3, r1 +; DISABLE-NEXT: mov r1, r3 ; DISABLE-NEXT: blo .LBB0_4 ; DISABLE-NEXT: .LBB0_6: @ %if.end29 ; DISABLE-NEXT: pop {r11, pc} diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll index ea15fcc5c824..55157875d355 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: push {r4, r5, r11, lr} ; SCALAR-NEXT: rsb r3, r2, #0 ; SCALAR-NEXT: and r4, r2, #63 -; SCALAR-NEXT: and r12, r3, #63 -; SCALAR-NEXT: rsb r3, r12, #32 +; SCALAR-NEXT: and lr, r3, #63 +; SCALAR-NEXT: rsb r3, lr, #32 ; SCALAR-NEXT: lsl r2, r0, r4 -; SCALAR-NEXT: lsr lr, r0, r12 -; SCALAR-NEXT: orr r3, lr, r1, lsl r3 -; SCALAR-NEXT: subs lr, r12, #32 -; SCALAR-NEXT: lsrpl r3, r1, lr +; SCALAR-NEXT: lsr r12, r0, lr +; SCALAR-NEXT: orr r3, r12, r1, lsl r3 +; SCALAR-NEXT: subs r12, lr, #32 +; SCALAR-NEXT: lsrpl r3, r1, r12 ; SCALAR-NEXT: subs r5, r4, #32 ; SCALAR-NEXT: movwpl r2, #0 ; SCALAR-NEXT: cmp r5, #0 @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: lsr r3, r0, r3 ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 ; SCALAR-NEXT: lslpl r3, r0, r5 -; SCALAR-NEXT: lsr r0, r1, r12 -; SCALAR-NEXT: cmp lr, #0 +; SCALAR-NEXT: lsr r0, r1, lr +; SCALAR-NEXT: cmp r12, #0 ; SCALAR-NEXT: movwpl r0, #0 ; SCALAR-NEXT: orr r1, r3, r0 ; SCALAR-NEXT: mov r0, r2 @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK: @ %bb.0: ; CHECK-NEXT: .save {r4, r5, r11, lr} ; CHECK-NEXT: push {r4, r5, r11, lr} -; CHECK-NEXT: and r12, r2, #63 +; CHECK-NEXT: and lr, r2, #63 ; CHECK-NEXT: rsb r2, r2, #0 -; CHECK-NEXT: rsb r3, r12, #32 +; CHECK-NEXT: rsb r3, lr, #32 ; CHECK-NEXT: and r4, r2, #63 -; CHECK-NEXT: lsr lr, 
r0, r12 -; CHECK-NEXT: orr r3, lr, r1, lsl r3 -; CHECK-NEXT: subs lr, r12, #32 +; CHECK-NEXT: lsr r12, r0, lr +; CHECK-NEXT: orr r3, r12, r1, lsl r3 +; CHECK-NEXT: subs r12, lr, #32 ; CHECK-NEXT: lsl r2, r0, r4 -; CHECK-NEXT: lsrpl r3, r1, lr +; CHECK-NEXT: lsrpl r3, r1, r12 ; CHECK-NEXT: subs r5, r4, #32 ; CHECK-NEXT: movwpl r2, #0 ; CHECK-NEXT: cmp r5, #0 @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK-NEXT: lsr r3, r0, r3 ; CHECK-NEXT: orr r3, r3, r1, lsl r4 ; CHECK-NEXT: lslpl r3, r0, r5 -; CHECK-NEXT: lsr r0, r1, r12 -; CHECK-NEXT: cmp lr, #0 +; CHECK-NEXT: lsr r0, r1, lr +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r0, #0 ; CHECK-NEXT: orr r1, r0, r3 ; CHECK-NEXT: mov r0, r2 diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll index 6372f9be2ca3..54c93b493c98 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { ; CHECK-NEXT: mov r3, #0 ; CHECK-NEXT: bl __aeabi_uldivmod ; CHECK-NEXT: add r0, r2, #27 -; CHECK-NEXT: lsl r2, r7, #27 -; CHECK-NEXT: and r12, r0, #63 ; CHECK-NEXT: lsl r6, r6, #27 +; CHECK-NEXT: and r1, r0, #63 +; CHECK-NEXT: lsl r2, r7, #27 ; CHECK-NEXT: orr r7, r6, r7, lsr #5 -; CHECK-NEXT: rsb r3, r12, #32 -; CHECK-NEXT: lsr r2, r2, r12 ; CHECK-NEXT: mov r6, #63 -; CHECK-NEXT: orr r2, r2, r7, lsl r3 -; CHECK-NEXT: subs r3, r12, #32 +; CHECK-NEXT: rsb r3, r1, #32 +; CHECK-NEXT: lsr r2, r2, r1 +; CHECK-NEXT: subs r12, r1, #32 ; CHECK-NEXT: bic r6, r6, r0 +; CHECK-NEXT: orr r2, r2, r7, lsl r3 ; CHECK-NEXT: lsl r5, r9, #1 -; CHECK-NEXT: lsrpl r2, r7, r3 -; CHECK-NEXT: subs r1, r6, #32 +; CHECK-NEXT: lsrpl r2, r7, r12 ; CHECK-NEXT: lsl r0, r5, r6 -; CHECK-NEXT: lsl r4, r8, #1 +; CHECK-NEXT: subs r4, r6, #32 +; CHECK-NEXT: lsl r3, r8, #1 ; CHECK-NEXT: movwpl r0, #0 -; CHECK-NEXT: orr r4, r4, r9, lsr #31 +; CHECK-NEXT: orr r3, r3, r9, lsr #31 ; CHECK-NEXT: orr r0, r0, r2 ; CHECK-NEXT: 
rsb r2, r6, #32 -; CHECK-NEXT: cmp r1, #0 +; CHECK-NEXT: cmp r4, #0 +; CHECK-NEXT: lsr r1, r7, r1 ; CHECK-NEXT: lsr r2, r5, r2 -; CHECK-NEXT: orr r2, r2, r4, lsl r6 -; CHECK-NEXT: lslpl r2, r5, r1 -; CHECK-NEXT: lsr r1, r7, r12 -; CHECK-NEXT: cmp r3, #0 +; CHECK-NEXT: orr r2, r2, r3, lsl r6 +; CHECK-NEXT: lslpl r2, r5, r4 +; CHECK-NEXT: cmp r12, #0 ; CHECK-NEXT: movwpl r1, #0 ; CHECK-NEXT: orr r1, r2, r1 ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll index 0a0bb62b0a09..2922e0ed5423 100644 --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { ; BE-LABEL: i56_or: ; BE: @ %bb.0: ; BE-NEXT: mov r1, r0 +; BE-NEXT: ldr r12, [r0] ; BE-NEXT: ldrh r2, [r1, #4]! ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: orr r2, r3, r2, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: strb r12, [r1, #2] -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: strb r2, [r1, #2] +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr %aa = load i56, i56* %a @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: strb r2, [r1, #2] ; BE-NEXT: orr r2, r3, r12, lsl #8 -; BE-NEXT: ldr r3, [r0] -; BE-NEXT: orr r2, r2, r3, lsl #24 -; BE-NEXT: orr r12, r2, #384 -; BE-NEXT: lsr r2, r12, #8 -; BE-NEXT: strh r2, [r1] -; BE-NEXT: bic r1, r3, #255 -; BE-NEXT: orr r1, r1, r12, lsr #24 +; BE-NEXT: ldr r12, [r0] +; BE-NEXT: orr r2, r2, r12, lsl #24 +; BE-NEXT: orr r2, r2, #384 +; BE-NEXT: lsr r3, r2, #8 +; BE-NEXT: strh r3, [r1] +; 
BE-NEXT: bic r1, r12, #255 +; BE-NEXT: orr r1, r1, r2, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll index 46490efb6631..09a991da2e59 100644 --- a/llvm/test/CodeGen/ARM/neon-copy.ll +++ b/llvm/test/CodeGen/ARM/neon-copy.ll @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { ; CHECK-NEXT: .pad #8 ; CHECK-NEXT: sub sp, sp, #8 ; CHECK-NEXT: vmov.u16 r1, d0[1] -; CHECK-NEXT: and r12, r0, #3 +; CHECK-NEXT: and r0, r0, #3 ; CHECK-NEXT: vmov.u16 r2, d0[2] -; CHECK-NEXT: mov r0, sp -; CHECK-NEXT: vmov.u16 r3, d0[3] -; CHECK-NEXT: orr r0, r0, r12, lsl #1 +; CHECK-NEXT: mov r3, sp +; CHECK-NEXT: vmov.u16 r12, d0[3] +; CHECK-NEXT: orr r0, r3, r0, lsl #1 ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] ; CHECK-NEXT: vldr d0, [sp] ; CHECK-NEXT: vmov.16 d0[1], r1 ; CHECK-NEXT: vmov.16 d0[2], r2 -; CHECK-NEXT: vmov.16 d0[3], r3 +; CHECK-NEXT: vmov.16 d0[3], r12 ; CHECK-NEXT: add sp, sp, #8 ; CHECK-NEXT: bx lr %tmp = extractelement <8 x i16> %x, i32 0 diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll index a125446b27c3..8be7100d368b 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll @@ -766,85 +766,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: move $2, $6 -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 76($sp) -; MMR3-NEXT: srlv $3, $7, $16 -; MMR3-NEXT: not16 $6, $16 -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill -; MMR3-NEXT: 
sll16 $2, $2, 1 -; MMR3-NEXT: sllv $2, $2, $6 -; MMR3-NEXT: li16 $6, 64 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: srlv $4, $4, $16 -; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill -; MMR3-NEXT: subu16 $7, $6, $16 +; MMR3-NEXT: srlv $4, $7, $16 +; MMR3-NEXT: not16 $3, $16 +; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: sll16 $2, $6, 1 +; MMR3-NEXT: sllv $3, $2, $3 +; MMR3-NEXT: li16 $2, 64 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: srlv $6, $6, $16 +; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR3-NEXT: subu16 $7, $2, $16 ; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: andi16 $5, $7, 32 -; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $6, $16, 32 -; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $3, $9 +; MMR3-NEXT: andi16 $2, $7, 32 +; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $5, $16, 32 +; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $3, $17, $5 -; MMR3-NEXT: movn $2, $4, $6 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $4, $17, $4 -; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $6, 1 -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: or16 $2, $3 -; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $3 -; MMR3-NEXT: addiu $3, $16, -64 -; MMR3-NEXT: srav $1, $6, $3 -; MMR3-NEXT: andi16 $3, $3, 32 -; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $3 -; MMR3-NEXT: sllv $3, $6, $7 -; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill -; MMR3-NEXT: not16 $3, $7 -; MMR3-NEXT: srl16 $4, $17, 1 -; MMR3-NEXT: srlv $3, $4, $3 +; MMR3-NEXT: movn $4, $17, $2 +; MMR3-NEXT: movn $3, $6, $5 +; MMR3-NEXT: addiu $2, $16, -64 +; MMR3-NEXT: lw 
$5, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $5, $5, $2 +; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $17, 1 +; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: not16 $5, $2 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: or16 $3, $4 +; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $4 +; MMR3-NEXT: srav $1, $17, $2 +; MMR3-NEXT: andi16 $2, $2, 32 +; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $2 +; MMR3-NEXT: sllv $2, $17, $7 +; MMR3-NEXT: not16 $4, $7 +; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $6, $7, 1 +; MMR3-NEXT: srlv $6, $6, $4 ; MMR3-NEXT: sltiu $10, $16, 64 -; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $5, $3, $10 +; MMR3-NEXT: or16 $6, $2 +; MMR3-NEXT: srlv $2, $7, $16 +; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: sllv $3, $4, $3 ; MMR3-NEXT: or16 $3, $2 -; MMR3-NEXT: srlv $2, $17, $16 -; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: sllv $17, $7, $4 -; MMR3-NEXT: or16 $17, $2 -; MMR3-NEXT: srav $11, $6, $16 -; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $17, $11, $2 -; MMR3-NEXT: sra $2, $6, 31 +; MMR3-NEXT: srav $11, $17, $16 +; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $11, $4 +; MMR3-NEXT: sra $2, $17, 31 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: move $4, $2 -; MMR3-NEXT: movn $4, $17, $10 -; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $9, $6 -; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $3 +; MMR3-NEXT: move $8, $2 +; MMR3-NEXT: movn $8, $3, $10 +; MMR3-NEXT: lw 
$3, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $6, $9, $3 +; MMR3-NEXT: li16 $3, 0 +; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $7, $3, $4 +; MMR3-NEXT: or16 $7, $6 ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload ; MMR3-NEXT: movn $1, $2, $3 ; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $3, $16 -; MMR3-NEXT: movn $11, $2, $6 +; MMR3-NEXT: movn $11, $2, $4 ; MMR3-NEXT: movn $2, $11, $10 -; MMR3-NEXT: move $3, $4 +; MMR3-NEXT: move $3, $8 ; MMR3-NEXT: move $4, $1 ; MMR3-NEXT: lwp $16, 40($sp) ; MMR3-NEXT: addiusp 48 @@ -858,80 +852,79 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 -; MMR6-NEXT: move $12, $7 +; MMR6-NEXT: move $1, $7 ; MMR6-NEXT: lw $3, 44($sp) ; MMR6-NEXT: li16 $2, 64 -; MMR6-NEXT: subu16 $16, $2, $3 -; MMR6-NEXT: sllv $1, $5, $16 -; MMR6-NEXT: andi16 $2, $16, 32 -; MMR6-NEXT: selnez $8, $1, $2 -; MMR6-NEXT: sllv $9, $4, $16 -; MMR6-NEXT: not16 $16, $16 -; MMR6-NEXT: srl16 $17, $5, 1 -; MMR6-NEXT: srlv $10, $17, $16 -; MMR6-NEXT: or $9, $9, $10 -; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $8, $8, $9 -; MMR6-NEXT: srlv $9, $7, $3 -; MMR6-NEXT: not16 $7, $3 -; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: subu16 $7, $2, $3 +; MMR6-NEXT: sllv $8, $5, $7 +; MMR6-NEXT: andi16 $2, $7, 32 +; MMR6-NEXT: selnez $9, $8, $2 +; MMR6-NEXT: sllv $10, $4, $7 +; MMR6-NEXT: not16 $7, $7 +; MMR6-NEXT: srl16 $16, $5, 1 +; MMR6-NEXT: srlv $7, $16, $7 +; MMR6-NEXT: or $7, $10, $7 +; MMR6-NEXT: seleqz $7, $7, $2 +; MMR6-NEXT: or $7, $9, $7 +; MMR6-NEXT: srlv $9, $1, $3 +; MMR6-NEXT: not16 $16, $3 +; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $17, $6, 1 -; MMR6-NEXT: sllv $10, $17, $7 +; MMR6-NEXT: sllv $10, $17, $16 ; MMR6-NEXT: or $9, $10, $9 ; MMR6-NEXT: andi16 $17, $3, 32 ; MMR6-NEXT: seleqz $9, $9, 
$17 ; MMR6-NEXT: srlv $10, $6, $3 ; MMR6-NEXT: selnez $11, $10, $17 ; MMR6-NEXT: seleqz $10, $10, $17 -; MMR6-NEXT: or $8, $10, $8 -; MMR6-NEXT: seleqz $1, $1, $2 -; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: or $10, $10, $7 +; MMR6-NEXT: seleqz $12, $8, $2 +; MMR6-NEXT: or $8, $11, $9 ; MMR6-NEXT: addiu $2, $3, -64 -; MMR6-NEXT: srlv $10, $5, $2 +; MMR6-NEXT: srlv $9, $5, $2 ; MMR6-NEXT: sll16 $7, $4, 1 ; MMR6-NEXT: not16 $16, $2 ; MMR6-NEXT: sllv $11, $7, $16 ; MMR6-NEXT: sltiu $13, $3, 64 -; MMR6-NEXT: or $1, $9, $1 -; MMR6-NEXT: selnez $8, $8, $13 -; MMR6-NEXT: or $9, $11, $10 -; MMR6-NEXT: srav $10, $4, $2 +; MMR6-NEXT: or $8, $8, $12 +; MMR6-NEXT: selnez $10, $10, $13 +; MMR6-NEXT: or $9, $11, $9 +; MMR6-NEXT: srav $11, $4, $2 ; MMR6-NEXT: andi16 $2, $2, 32 -; MMR6-NEXT: seleqz $11, $10, $2 +; MMR6-NEXT: seleqz $12, $11, $2 ; MMR6-NEXT: sra $14, $4, 31 ; MMR6-NEXT: selnez $15, $14, $2 ; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $11, $15, $11 -; MMR6-NEXT: seleqz $11, $11, $13 -; MMR6-NEXT: selnez $2, $10, $2 -; MMR6-NEXT: seleqz $10, $14, $13 -; MMR6-NEXT: or $8, $8, $11 -; MMR6-NEXT: selnez $8, $8, $3 -; MMR6-NEXT: selnez $1, $1, $13 +; MMR6-NEXT: or $12, $15, $12 +; MMR6-NEXT: seleqz $12, $12, $13 +; MMR6-NEXT: selnez $2, $11, $2 +; MMR6-NEXT: seleqz $11, $14, $13 +; MMR6-NEXT: or $10, $10, $12 +; MMR6-NEXT: selnez $10, $10, $3 +; MMR6-NEXT: selnez $8, $8, $13 ; MMR6-NEXT: or $2, $2, $9 ; MMR6-NEXT: srav $9, $4, $3 ; MMR6-NEXT: seleqz $4, $9, $17 -; MMR6-NEXT: selnez $11, $14, $17 -; MMR6-NEXT: or $4, $11, $4 -; MMR6-NEXT: selnez $11, $4, $13 +; MMR6-NEXT: selnez $12, $14, $17 +; MMR6-NEXT: or $4, $12, $4 +; MMR6-NEXT: selnez $12, $4, $13 ; MMR6-NEXT: seleqz $2, $2, $13 ; MMR6-NEXT: seleqz $4, $6, $3 -; MMR6-NEXT: seleqz $6, $12, $3 +; MMR6-NEXT: seleqz $1, $1, $3 +; MMR6-NEXT: or $2, $8, $2 +; MMR6-NEXT: selnez $2, $2, $3 ; MMR6-NEXT: or $1, $1, $2 -; MMR6-NEXT: selnez $1, $1, $3 -; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: or $4, $4, $8 -; MMR6-NEXT: or $6, 
$11, $10 -; MMR6-NEXT: srlv $2, $5, $3 -; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $3, $7, $3 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: seleqz $2, $2, $17 -; MMR6-NEXT: selnez $3, $9, $17 -; MMR6-NEXT: or $2, $3, $2 -; MMR6-NEXT: selnez $2, $2, $13 -; MMR6-NEXT: or $3, $2, $10 -; MMR6-NEXT: move $2, $6 +; MMR6-NEXT: or $4, $4, $10 +; MMR6-NEXT: or $2, $12, $11 +; MMR6-NEXT: srlv $3, $5, $3 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $5, $7, $5 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: seleqz $3, $3, $17 +; MMR6-NEXT: selnez $5, $9, $17 +; MMR6-NEXT: or $3, $5, $3 +; MMR6-NEXT: selnez $3, $3, $13 +; MMR6-NEXT: or $3, $3, $11 ; MMR6-NEXT: move $5, $1 ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $17, 12($sp) # 4-byte Folded Reload diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll index e4b4b3ae1d0f..ed2bfc9fcf60 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll @@ -776,77 +776,76 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 68($sp) ; MMR3-NEXT: li16 $2, 64 -; MMR3-NEXT: subu16 $17, $2, $16 -; MMR3-NEXT: sllv $9, $5, $17 -; MMR3-NEXT: andi16 $3, $17, 32 +; MMR3-NEXT: subu16 $7, $2, $16 +; MMR3-NEXT: sllv $9, $5, $7 +; MMR3-NEXT: move $17, $5 +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $3, $7, 32 ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill ; MMR3-NEXT: li16 $2, 0 ; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: movn $4, $2, $3 -; MMR3-NEXT: srlv $5, $7, $16 +; MMR3-NEXT: srlv $5, $8, $16 ; MMR3-NEXT: not16 $3, $16 ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill ; MMR3-NEXT: sll16 $2, $6, 1 -; MMR3-NEXT: 
sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sllv $2, $2, $3 ; MMR3-NEXT: or16 $2, $5 -; MMR3-NEXT: srlv $7, $6, $16 +; MMR3-NEXT: srlv $5, $6, $16 +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill ; MMR3-NEXT: andi16 $3, $16, 32 ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $2, $7, $3 +; MMR3-NEXT: movn $2, $5, $3 ; MMR3-NEXT: addiu $3, $16, -64 ; MMR3-NEXT: or16 $2, $4 -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $3, $6, $3 -; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $4, $3, 1 -; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: addiu $5, $16, -64 -; MMR3-NEXT: not16 $5, $5 -; MMR3-NEXT: sllv $5, $4, $5 -; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $4 -; MMR3-NEXT: addiu $4, $16, -64 -; MMR3-NEXT: srlv $1, $3, $4 -; MMR3-NEXT: andi16 $4, $4, 32 +; MMR3-NEXT: srlv $4, $17, $3 ; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $4 +; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $6, $4, 1 +; MMR3-NEXT: not16 $5, $3 +; MMR3-NEXT: sllv $5, $6, $5 +; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $17 +; MMR3-NEXT: srlv $1, $4, $3 +; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $3 ; MMR3-NEXT: sltiu $10, $16, 64 ; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: sllv $2, $3, $17 -; MMR3-NEXT: not16 $3, $17 -; MMR3-NEXT: srl16 $4, $6, 1 +; MMR3-NEXT: sllv $2, $4, $7 +; MMR3-NEXT: not16 $3, $7 +; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload +; MMR3-NEXT: srl16 $4, $7, 1 ; MMR3-NEXT: srlv $4, $4, $3 ; MMR3-NEXT: or16 $4, $2 -; MMR3-NEXT: srlv $2, $6, $16 +; MMR3-NEXT: srlv $2, $7, $16 ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload ; MMR3-NEXT: sllv $3, $6, $3 ; MMR3-NEXT: or16 $3, $2 ; MMR3-NEXT: lw $2, 28($sp) # 4-byte 
Folded Reload ; MMR3-NEXT: srlv $2, $2, $16 -; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $2, $6 +; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $2, $17 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movz $3, $17, $10 -; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $4, $9, $17 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $7, $17, $6 -; MMR3-NEXT: or16 $7, $4 +; MMR3-NEXT: li16 $6, 0 +; MMR3-NEXT: movz $3, $6, $10 +; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $4, $9, $7 +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $6, $7, $17 +; MMR3-NEXT: or16 $6, $4 ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $1, $17, $4 -; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $1, $7, $10 +; MMR3-NEXT: movn $1, $7, $4 +; MMR3-NEXT: li16 $7, 0 +; MMR3-NEXT: movn $1, $6, $10 ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $4, $16 -; MMR3-NEXT: movn $2, $17, $6 +; MMR3-NEXT: movn $2, $7, $17 ; MMR3-NEXT: li16 $4, 0 ; MMR3-NEXT: movz $2, $4, $10 ; MMR3-NEXT: move $4, $1 @@ -856,91 +855,98 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; ; MMR6-LABEL: lshr_i128: ; MMR6: # %bb.0: # %entry -; MMR6-NEXT: addiu $sp, $sp, -24 -; MMR6-NEXT: .cfi_def_cfa_offset 24 -; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill -; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill +; MMR6-NEXT: addiu $sp, $sp, -32 +; MMR6-NEXT: .cfi_def_cfa_offset 32 +; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill +; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 ; MMR6-NEXT: move $1, $7 -; MMR6-NEXT: move $7, $4 -; MMR6-NEXT: lw $3, 52($sp) +; MMR6-NEXT: move $7, $5 +; MMR6-NEXT: lw $3, 60($sp) ; MMR6-NEXT: srlv $2, $1, $3 -; MMR6-NEXT: not16 $16, $3 -; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill -; MMR6-NEXT: move 
$4, $6 -; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $5, $3 +; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill +; MMR6-NEXT: move $17, $6 +; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $6, $6, 1 -; MMR6-NEXT: sllv $6, $6, $16 +; MMR6-NEXT: sllv $6, $6, $5 ; MMR6-NEXT: or $8, $6, $2 -; MMR6-NEXT: addiu $6, $3, -64 -; MMR6-NEXT: srlv $9, $5, $6 -; MMR6-NEXT: sll16 $2, $7, 1 -; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill -; MMR6-NEXT: not16 $16, $6 +; MMR6-NEXT: addiu $5, $3, -64 +; MMR6-NEXT: srlv $9, $7, $5 +; MMR6-NEXT: move $6, $4 +; MMR6-NEXT: sll16 $2, $4, 1 +; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $5 ; MMR6-NEXT: sllv $10, $2, $16 ; MMR6-NEXT: andi16 $16, $3, 32 ; MMR6-NEXT: seleqz $8, $8, $16 ; MMR6-NEXT: or $9, $10, $9 -; MMR6-NEXT: srlv $10, $4, $3 +; MMR6-NEXT: srlv $10, $17, $3 ; MMR6-NEXT: selnez $11, $10, $16 ; MMR6-NEXT: li16 $17, 64 ; MMR6-NEXT: subu16 $2, $17, $3 -; MMR6-NEXT: sllv $12, $5, $2 +; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: move $17, $7 ; MMR6-NEXT: andi16 $4, $2, 32 -; MMR6-NEXT: andi16 $17, $6, 32 -; MMR6-NEXT: seleqz $9, $9, $17 +; MMR6-NEXT: andi16 $7, $5, 32 +; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill +; MMR6-NEXT: seleqz $9, $9, $7 ; MMR6-NEXT: seleqz $13, $12, $4 ; MMR6-NEXT: or $8, $11, $8 ; MMR6-NEXT: selnez $11, $12, $4 -; MMR6-NEXT: sllv $12, $7, $2 +; MMR6-NEXT: sllv $12, $6, $2 +; MMR6-NEXT: move $7, $6 +; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: not16 $2, $2 -; MMR6-NEXT: srl16 $6, $5, 1 +; MMR6-NEXT: srl16 $6, $17, 1 ; MMR6-NEXT: srlv $2, $6, $2 ; MMR6-NEXT: or $2, $12, $2 ; MMR6-NEXT: seleqz $2, $2, $4 -; MMR6-NEXT: addiu $4, $3, -64 -; MMR6-NEXT: srlv $4, $7, $4 -; MMR6-NEXT: or $12, $11, $2 -; MMR6-NEXT: or $6, $8, $13 -; MMR6-NEXT: srlv $5, $5, $3 -; MMR6-NEXT: selnez $8, $4, $17 -; MMR6-NEXT: sltiu $11, $3, 64 -; MMR6-NEXT: selnez $13, $6, $11 -; MMR6-NEXT: or $8, $8, $9 +; MMR6-NEXT: srlv $4, $7, $5 +; 
MMR6-NEXT: or $11, $11, $2 +; MMR6-NEXT: or $5, $8, $13 +; MMR6-NEXT: srlv $6, $17, $3 +; MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: selnez $7, $4, $2 +; MMR6-NEXT: sltiu $8, $3, 64 +; MMR6-NEXT: selnez $12, $5, $8 +; MMR6-NEXT: or $7, $7, $9 +; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $9, $6, $2 +; MMR6-NEXT: sllv $9, $2, $5 ; MMR6-NEXT: seleqz $10, $10, $16 -; MMR6-NEXT: li16 $2, 0 -; MMR6-NEXT: or $10, $10, $12 -; MMR6-NEXT: or $9, $9, $5 -; MMR6-NEXT: seleqz $5, $8, $11 -; MMR6-NEXT: seleqz $8, $2, $11 -; MMR6-NEXT: srlv $7, $7, $3 -; MMR6-NEXT: seleqz $2, $7, $16 -; MMR6-NEXT: selnez $2, $2, $11 +; MMR6-NEXT: li16 $5, 0 +; MMR6-NEXT: or $10, $10, $11 +; MMR6-NEXT: or $6, $9, $6 +; MMR6-NEXT: seleqz $2, $7, $8 +; MMR6-NEXT: seleqz $7, $5, $8 +; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: srlv $9, $5, $3 +; MMR6-NEXT: seleqz $11, $9, $16 +; MMR6-NEXT: selnez $11, $11, $8 ; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $5, $13, $5 -; MMR6-NEXT: selnez $5, $5, $3 -; MMR6-NEXT: or $5, $1, $5 -; MMR6-NEXT: or $2, $8, $2 -; MMR6-NEXT: seleqz $1, $9, $16 -; MMR6-NEXT: selnez $6, $7, $16 -; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $7, $7, $3 -; MMR6-NEXT: selnez $9, $10, $11 -; MMR6-NEXT: seleqz $4, $4, $17 -; MMR6-NEXT: seleqz $4, $4, $11 -; MMR6-NEXT: or $4, $9, $4 +; MMR6-NEXT: or $2, $12, $2 +; MMR6-NEXT: selnez $2, $2, $3 +; MMR6-NEXT: or $5, $1, $2 +; MMR6-NEXT: or $2, $7, $11 +; MMR6-NEXT: seleqz $1, $6, $16 +; MMR6-NEXT: selnez $6, $9, $16 +; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $9, $16, $3 +; MMR6-NEXT: selnez $10, $10, $8 +; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $4, $4, $16 +; MMR6-NEXT: seleqz $4, $4, $8 +; MMR6-NEXT: or $4, $10, $4 ; MMR6-NEXT: selnez $3, $4, $3 -; MMR6-NEXT: or $4, $7, $3 +; 
MMR6-NEXT: or $4, $9, $3 ; MMR6-NEXT: or $1, $6, $1 -; MMR6-NEXT: selnez $1, $1, $11 -; MMR6-NEXT: or $3, $8, $1 -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload -; MMR6-NEXT: lw $17, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: addiu $sp, $sp, 24 +; MMR6-NEXT: selnez $1, $1, $8 +; MMR6-NEXT: or $3, $7, $1 +; MMR6-NEXT: lw $16, 24($sp) # 4-byte Folded Reload +; MMR6-NEXT: lw $17, 28($sp) # 4-byte Folded Reload +; MMR6-NEXT: addiu $sp, $sp, 32 ; MMR6-NEXT: jrc $ra </cut>

4 years, 6 months

[ACTIVITY] report week ending 1 Oct

by Peter Maydell

Progress
 * UM-2 [QEMU upstream maintainership]
   + Worked through my code-review backlog
   + Noticed that we never got round to making our emulated GICv3 support having redistributors in more than one contiguous region; this prevents using more than 123 CPUs with the virt board. Sent out a patchset which adds the necessary handling.
   + Generally trying to tie off loose ends pre-holiday :-)

-- PMM
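For readers wondering where the 123-CPU figure comes from, here is a rough sketch of the arithmetic. The region size and per-CPU frame layout are assumptions recalled from QEMU's virt board memory map, not values stated in the report, and the helper name `max_cpus` is hypothetical.

```python
# Assumed values (not from the report): the first GICv3 redistributor
# region on the virt board is 0x00F60000 bytes, and each CPU needs two
# 64 KiB frames (RD_base and SGI_base) in that region.
REDIST_REGION_SIZE = 0x00F60000
FRAME_SIZE = 64 * 1024
FRAMES_PER_CPU = 2


def max_cpus(region_sizes, frames_per_cpu=FRAMES_PER_CPU, frame_size=FRAME_SIZE):
    """Return how many CPU redistributors fit in the given regions."""
    per_cpu = frames_per_cpu * frame_size
    # Each region is contiguous, so CPUs are counted per region and summed.
    return sum(size // per_cpu for size in region_sizes)


# With only one contiguous region, the limit matches the report.
print(max_cpus([REDIST_REGION_SIZE]))  # → 123
```

Under these assumptions, supporting a second redistributor region simply means passing more than one size to the helper; the actual sizes of any additional regions are board-specific and are not given in the report.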

4 years, 6 months
