Hi Stanislav,
FYI, your patch seems to be slowing down two of the SPEC CPU2006 tests on 32-bit ARM at the -O2 and -O3 optimization levels.
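The commit (quoted in full below) relaxes the trivial-rematerialization rule so that virtual-register uses no longer disqualify an instruction. For reference, here is a minimal, illustrative C++ sketch of that rule as I read it -- a hypothetical standalone helper, not the actual LLVM code; the real check lives in TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(), and availability of virtual-register uses is verified separately by LiveRangeEdit::allUsesAvailableAt():

<cut>
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"

using namespace llvm;

// Hypothetical helper summarizing the relaxed operand rule: tied defs are
// still rejected (a read-modify-write is not trivial to rematerialize),
// while virtual-register uses are now accepted and proving their
// availability at the remat point is left to the caller.
static bool operandsAllowRemat(const MachineInstr &MI) {
  // Remat clients assume operand 0 is the defined register.
  if (!MI.getNumOperands() || !MI.getOperand(0).isReg() ||
      MI.getOperand(0).isTied())
    return false;

  for (const MachineOperand &MO : MI.uses()) {
    if (!MO.isReg())
      continue;
    // Before this commit, any virtual-register use returned false here.
    if (MO.getReg().isVirtual())
      continue; // allowed; LiveRangeEdit::allUsesAvailableAt() guards it
    // Physical-register uses remain limited to constant / unallocatable
    // registers (handled elsewhere in the generic implementation).
  }
  return true;
}
</cut>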
--
Maxim Kuvyrkov
https://www.linaro.org
On 15 Sep 2021, at 12:54, ci_notify@linaro.org wrote:
After llvm commit 92c1fd19abb15bc68b1127a26137a69e033cdb39
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Allow rematerialization of virtual reg uses
the following benchmarks slowed down by more than 2%:
- 456.hmmer slowed down by 6%
- 482.sphinx3 slowed down by 3%
Benchmark:
Toolchain: Clang + Glibc + LLVM Linker
Version: all components were built from their tip of trunk
Target: arm-linux-gnueabihf
Compiler flags: -O3 -marm
Hardware: NVidia TK1 4x Cortex-A15
This commit has regressed these CI configurations:
- tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2
- tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3
First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a...
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a...
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a...
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a...
Reproduce builds:
<cut>
mkdir investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
cd investigate-llvm-92c1fd19abb15bc68b1127a26137a69e033cdb39
# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts
# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a... --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a... --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tk1-llvm-master-a... --fail
chmod +x artifacts/test.sh
# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/
cd llvm
# Reproduce first_bad build
git checkout --detach 92c1fd19abb15bc68b1127a26137a69e033cdb39
../artifacts/test.sh
# Reproduce last_good build
git checkout --detach 1d02a8bcd393ea9c50f0212797059888efc78002
../artifacts/test.sh
cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit 92c1fd19abb15bc68b1127a26137a69e033cdb39
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Date:   Thu Aug 19 11:42:09 2021 -0700
Allow rematerialization of virtual reg uses
Currently the isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend liveranges.
It appears that the LRE logic does not attempt to extend the liverange of a source register for rematerialization, so that is not an issue. This is checked in LiveRangeEdit::allUsesAvailableAt().
The only non-trivial aspect is accounting for tied-defs, which normally represent a read-modify-write operation and are not rematerializable.
The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists.
Differential Revision: https://reviews.llvm.org/D106408
 llvm/include/llvm/CodeGen/TargetInstrInfo.h        |   12 +-
 llvm/lib/CodeGen/TargetInstrInfo.cpp               |    9 +-
 llvm/test/CodeGen/AMDGPU/remat-sop.mir             |   60 +
 llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll |   28 +-
 llvm/test/CodeGen/ARM/funnel-shift-rot.ll          |   32 +-
 llvm/test/CodeGen/ARM/funnel-shift.ll              |   30 +-
 .../test/CodeGen/ARM/illegal-bitfield-loadstore.ll |   30 +-
 llvm/test/CodeGen/ARM/neon-copy.ll                 |   10 +-
 llvm/test/CodeGen/Mips/llvm-ir/ashr.ll             |  227 +-
 llvm/test/CodeGen/Mips/llvm-ir/lshr.ll             |  206 +-
 llvm/test/CodeGen/Mips/llvm-ir/shl.ll              |   95 +-
 llvm/test/CodeGen/Mips/llvm-ir/sub.ll              |   31 +-
 llvm/test/CodeGen/Mips/tls.ll                      |    4 +-
 llvm/test/CodeGen/RISCV/atomic-rmw.ll              |  120 +-
 llvm/test/CodeGen/RISCV/atomic-signext.ll          |   24 +-
 llvm/test/CodeGen/RISCV/bswap-ctlz-cttz-ctpop.ll   |   96 +-
 llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll        |   12 +-
 llvm/test/CodeGen/RISCV/rv32zbb-zbp.ll             |  526 +--
 llvm/test/CodeGen/RISCV/rv32zbb.ll                 |   94 +-
 llvm/test/CodeGen/RISCV/rv32zbp.ll                 |  282 +-
 llvm/test/CodeGen/RISCV/rv32zbt.ll                 |  348 +-
 .../CodeGen/RISCV/rvv/fixed-vectors-bitreverse.ll  |  324 +-
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap.ll |  146 +-
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ctlz.ll  | 3540 ++++++++++----------
 llvm/test/CodeGen/RISCV/rvv/fixed-vectors-cttz.ll  |  720 ++--
 llvm/test/CodeGen/RISCV/srem-vector-lkk.ll         |  208 +-
 llvm/test/CodeGen/RISCV/urem-vector-lkk.ll         |  190 +-
 llvm/test/CodeGen/Thumb/dyn-stackalloc.ll          |    7 +-
 .../tail-pred-disabled-in-loloops.ll               |   14 +-
 .../LowOverheadLoops/varying-outer-2d-reduction.ll |   64 +-
 .../CodeGen/Thumb2/LowOverheadLoops/while-loops.ll |   67 +-
 llvm/test/CodeGen/Thumb2/ldr-str-imm12.ll          |   30 +-
 llvm/test/CodeGen/Thumb2/mve-float16regloops.ll    |   82 +-
 llvm/test/CodeGen/Thumb2/mve-float32regloops.ll    |   98 +-
 llvm/test/CodeGen/Thumb2/mve-postinc-dct.ll        |  529 ++-
 llvm/test/CodeGen/X86/addcarry.ll                  |   20 +-
 llvm/test/CodeGen/X86/callbr-asm-blockplacement.ll |   12 +-
 llvm/test/CodeGen/X86/dag-update-nodetomatch.ll    |   17 +-
 llvm/test/CodeGen/X86/inalloca-invoke.ll           |    2 +-
 llvm/test/CodeGen/X86/licm-regpressure.ll          |   28 +-
 llvm/test/CodeGen/X86/ragreedy-hoist-spill.ll      |   40 +-
 llvm/test/CodeGen/X86/sdiv_fix.ll                  |    5 +-
 42 files changed, 4217 insertions(+), 4202 deletions(-)
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index 2f853a2c6f9f..1c05afba730d 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -117,10 +117,11 @@ public:
                                          const MachineFunction &MF) const;
 
   /// Return true if the instruction is trivially rematerializable, meaning it
-  /// has no side effects and requires no operands that aren't always available.
-  /// This means the only allowed uses are constants and unallocatable physical
-  /// registers so that the instructions result is independent of the place
-  /// in the function.
+  /// has no side effects. Uses of constants and unallocatable physical
+  /// registers are always trivial to rematerialize so that the instructions
+  /// result is independent of the place in the function. Uses of virtual
+  /// registers are allowed but it is caller's responsility to ensure these
+  /// operands are valid at the point the instruction is beeing moved.
   bool isTriviallyReMaterializable(const MachineInstr &MI,
                                    AAResults *AA = nullptr) const {
     return MI.getOpcode() == TargetOpcode::IMPLICIT_DEF ||
@@ -140,8 +141,7 @@ protected:
   /// set, this hook lets the target specify whether the instruction is actually
   /// trivially rematerializable, taking into consideration its operands. This
   /// predicate must return false if the instruction has any side effects other
-  /// than producing a value, or if it requres any address registers that are
-  /// not always available.
+  /// than producing a value. /// Requirements must be check as stated in isTriviallyReMaterializable() .
   virtual bool isReallyTriviallyReMaterializable(const MachineInstr &MI,
                                                  AAResults *AA) const {
diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 1eab8e7443a7..fe7d60e0b7e2 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -921,7 +921,8 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(
   const MachineRegisterInfo &MRI = MF.getRegInfo();
 
   // Remat clients assume operand 0 is the defined register.
-  if (!MI.getNumOperands() || !MI.getOperand(0).isReg())
+  if (!MI.getNumOperands() || !MI.getOperand(0).isReg() ||
+      MI.getOperand(0).isTied())
     return false;
   Register DefReg = MI.getOperand(0).getReg();
 
@@ -983,12 +984,6 @@ bool TargetInstrInfo::isReallyTriviallyReMaterializableGeneric(
     // same virtual register, though.
     if (MO.isDef() && Reg != DefReg)
       return false;
-
-    // Don't allow any virtual-register uses. Rematting an instruction with
-    // virtual register uses would length the live ranges of the uses, which
-    // is not necessarily a good idea, certainly not "trivial".
-    if (MO.isUse())
-      return false;
   }
 
   // Everything checked out.
diff --git a/llvm/test/CodeGen/AMDGPU/remat-sop.mir b/llvm/test/CodeGen/AMDGPU/remat-sop.mir index ed799bfca028..c9915aaabfde 100644 --- a/llvm/test/CodeGen/AMDGPU/remat-sop.mir +++ b/llvm/test/CodeGen/AMDGPU/remat-sop.mir @@ -51,6 +51,66 @@ body: | S_NOP 0, implicit %2 S_ENDPGM 0 ... +# The liverange of %0 covers a point of rematerialization, source value is +# availabe. +--- +name: test_remat_s_mov_b32_vreg_src_long_lr +tracksRegLiveness: true +machineFunctionInfo:
- stackPtrOffsetReg: $sgpr32
+body: |
- bb.0:
- ; GCN-LABEL: name: test_remat_s_mov_b32_vreg_src_long_lr
- ; GCN: renamable $sgpr0 = IMPLICIT_DEF
- ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
- ; GCN: S_NOP 0, implicit killed renamable $sgpr1
- ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
- ; GCN: S_NOP 0, implicit killed renamable $sgpr1
- ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
- ; GCN: S_NOP 0, implicit killed renamable $sgpr1
- ; GCN: S_NOP 0, implicit killed renamable $sgpr0
- ; GCN: S_ENDPGM 0
- %0:sreg_32 = IMPLICIT_DEF
- %1:sreg_32 = S_MOV_B32 %0:sreg_32
- %2:sreg_32 = S_MOV_B32 %0:sreg_32
- %3:sreg_32 = S_MOV_B32 %0:sreg_32
- S_NOP 0, implicit %1
- S_NOP 0, implicit %2
- S_NOP 0, implicit %3
- S_NOP 0, implicit %0
- S_ENDPGM 0
+... +# The liverange of %0 does not cover a point of rematerialization, source value is +# unavailabe and we do not want to artificially extend the liverange. +--- +name: test_no_remat_s_mov_b32_vreg_src_short_lr +tracksRegLiveness: true +machineFunctionInfo:
- stackPtrOffsetReg: $sgpr32
+body: |
- bb.0:
- ; GCN-LABEL: name: test_no_remat_s_mov_b32_vreg_src_short_lr
- ; GCN: renamable $sgpr0 = IMPLICIT_DEF
- ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
- ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.1, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.1, addrspace 5)
- ; GCN: renamable $sgpr1 = S_MOV_B32 renamable $sgpr0
- ; GCN: SI_SPILL_S32_SAVE killed renamable $sgpr1, %stack.0, implicit $exec, implicit $sgpr32 :: (store (s32) into %stack.0, addrspace 5)
- ; GCN: renamable $sgpr0 = S_MOV_B32 killed renamable $sgpr0
- ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.1, addrspace 5)
- ; GCN: S_NOP 0, implicit killed renamable $sgpr1
- ; GCN: renamable $sgpr1 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr32 :: (load (s32) from %stack.0, addrspace 5)
- ; GCN: S_NOP 0, implicit killed renamable $sgpr1
- ; GCN: S_NOP 0, implicit killed renamable $sgpr0
- ; GCN: S_ENDPGM 0
- %0:sreg_32 = IMPLICIT_DEF
- %1:sreg_32 = S_MOV_B32 %0:sreg_32
- %2:sreg_32 = S_MOV_B32 %0:sreg_32
- %3:sreg_32 = S_MOV_B32 %0:sreg_32
- S_NOP 0, implicit %1
- S_NOP 0, implicit %2
- S_NOP 0, implicit %3
- S_ENDPGM 0
+...
name: test_remat_s_mov_b64 tracksRegLiveness: true diff --git a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll index a4243276c70a..175a2069a441 100644 --- a/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll +++ b/llvm/test/CodeGen/ARM/arm-shrink-wrapping-linux.ll @@ -29,20 +29,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; ENABLE-NEXT: pophs {r11, pc} ; ENABLE-NEXT: .LBB0_3: @ %while.body.preheader ; ENABLE-NEXT: movw r12, :lower16:skip -; ENABLE-NEXT: sub r1, r1, #1 +; ENABLE-NEXT: sub r3, r1, #1 ; ENABLE-NEXT: movt r12, :upper16:skip ; ENABLE-NEXT: .LBB0_4: @ %while.body ; ENABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; ENABLE-NEXT: ldrb r3, [r0] -; ENABLE-NEXT: ldrb r3, [r12, r3] -; ENABLE-NEXT: add r0, r0, r3 -; ENABLE-NEXT: sub r3, r1, #1 -; ENABLE-NEXT: cmp r3, r1 +; ENABLE-NEXT: ldrb r1, [r0] +; ENABLE-NEXT: ldrb r1, [r12, r1] +; ENABLE-NEXT: add r0, r0, r1 +; ENABLE-NEXT: sub r1, r3, #1 +; ENABLE-NEXT: cmp r1, r3 ; ENABLE-NEXT: bhs .LBB0_6 ; ENABLE-NEXT: @ %bb.5: @ %while.body ; ENABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; ENABLE-NEXT: cmp r0, r2 -; ENABLE-NEXT: mov r1, r3 +; ENABLE-NEXT: mov r3, r1 ; ENABLE-NEXT: blo .LBB0_4 ; ENABLE-NEXT: .LBB0_6: @ %if.end29 ; ENABLE-NEXT: pop {r11, pc} @@ -119,20 +119,20 @@ define fastcc i8* @wrongUseOfPostDominate(i8* readonly %s, i32 %off, i8* readnon ; DISABLE-NEXT: pophs {r11, pc} ; DISABLE-NEXT: .LBB0_3: @ %while.body.preheader ; DISABLE-NEXT: movw r12, :lower16:skip -; DISABLE-NEXT: sub r1, r1, #1 +; DISABLE-NEXT: sub r3, r1, #1 ; DISABLE-NEXT: movt r12, :upper16:skip ; DISABLE-NEXT: .LBB0_4: @ %while.body ; DISABLE-NEXT: @ =>This Inner Loop Header: Depth=1 -; DISABLE-NEXT: ldrb r3, [r0] -; DISABLE-NEXT: ldrb r3, [r12, r3] -; DISABLE-NEXT: add r0, r0, r3 -; DISABLE-NEXT: sub r3, r1, #1 -; DISABLE-NEXT: cmp r3, r1 +; DISABLE-NEXT: ldrb r1, [r0] +; DISABLE-NEXT: ldrb r1, [r12, r1] +; DISABLE-NEXT: add r0, r0, r1 +; DISABLE-NEXT: sub r1, r3, #1 +; DISABLE-NEXT: cmp r1, r3 ; DISABLE-NEXT: bhs .LBB0_6 ; DISABLE-NEXT: @ %bb.5: @ %while.body ; DISABLE-NEXT: @ in Loop: Header=BB0_4 Depth=1 ; DISABLE-NEXT: cmp r0, r2 -; DISABLE-NEXT: mov r1, r3 +; DISABLE-NEXT: mov r3, r1 ; DISABLE-NEXT: blo .LBB0_4 ; DISABLE-NEXT: .LBB0_6: @ %if.end29 ; DISABLE-NEXT: pop {r11, pc} diff --git a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll index 55157875d355..ea15fcc5c824 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift-rot.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift-rot.ll @@ -73,13 +73,13 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: push {r4, r5, r11, lr} ; SCALAR-NEXT: rsb r3, r2, #0 ; SCALAR-NEXT: and r4, r2, #63 -; SCALAR-NEXT: and lr, r3, #63 -; SCALAR-NEXT: rsb r3, lr, #32 +; SCALAR-NEXT: and r12, r3, #63 +; SCALAR-NEXT: rsb r3, r12, #32 ; SCALAR-NEXT: lsl r2, r0, r4 -; SCALAR-NEXT: lsr r12, r0, lr -; SCALAR-NEXT: orr r3, r12, r1, lsl r3 -; SCALAR-NEXT: subs r12, lr, #32 -; SCALAR-NEXT: lsrpl r3, r1, r12 +; SCALAR-NEXT: lsr lr, r0, r12 +; SCALAR-NEXT: orr r3, lr, r1, lsl r3 +; SCALAR-NEXT: subs lr, r12, #32 +; SCALAR-NEXT: lsrpl r3, r1, lr ; SCALAR-NEXT: subs r5, r4, #32 ; SCALAR-NEXT: movwpl r2, #0 ; SCALAR-NEXT: cmp r5, #0 @@ -88,8 +88,8 @@ define i64 @rotl_i64(i64 %x, i64 %z) { ; SCALAR-NEXT: lsr r3, r0, r3 ; SCALAR-NEXT: orr r3, r3, r1, lsl r4 ; SCALAR-NEXT: lslpl r3, r0, r5 -; SCALAR-NEXT: lsr r0, r1, lr -; SCALAR-NEXT: cmp r12, #0 +; SCALAR-NEXT: lsr r0, r1, r12 +; SCALAR-NEXT: cmp lr, #0 ; 
SCALAR-NEXT: movwpl r0, #0 ; SCALAR-NEXT: orr r1, r3, r0 ; SCALAR-NEXT: mov r0, r2 @@ -245,15 +245,15 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK: @ %bb.0: ; CHECK-NEXT: .save {r4, r5, r11, lr} ; CHECK-NEXT: push {r4, r5, r11, lr} -; CHECK-NEXT: and lr, r2, #63 +; CHECK-NEXT: and r12, r2, #63 ; CHECK-NEXT: rsb r2, r2, #0 -; CHECK-NEXT: rsb r3, lr, #32 +; CHECK-NEXT: rsb r3, r12, #32 ; CHECK-NEXT: and r4, r2, #63 -; CHECK-NEXT: lsr r12, r0, lr -; CHECK-NEXT: orr r3, r12, r1, lsl r3 -; CHECK-NEXT: subs r12, lr, #32 +; CHECK-NEXT: lsr lr, r0, r12 +; CHECK-NEXT: orr r3, lr, r1, lsl r3 +; CHECK-NEXT: subs lr, r12, #32 ; CHECK-NEXT: lsl r2, r0, r4 -; CHECK-NEXT: lsrpl r3, r1, r12 +; CHECK-NEXT: lsrpl r3, r1, lr ; CHECK-NEXT: subs r5, r4, #32 ; CHECK-NEXT: movwpl r2, #0 ; CHECK-NEXT: cmp r5, #0 @@ -262,8 +262,8 @@ define i64 @rotr_i64(i64 %x, i64 %z) { ; CHECK-NEXT: lsr r3, r0, r3 ; CHECK-NEXT: orr r3, r3, r1, lsl r4 ; CHECK-NEXT: lslpl r3, r0, r5 -; CHECK-NEXT: lsr r0, r1, lr -; CHECK-NEXT: cmp r12, #0 +; CHECK-NEXT: lsr r0, r1, r12 +; CHECK-NEXT: cmp lr, #0 ; CHECK-NEXT: movwpl r0, #0 ; CHECK-NEXT: orr r1, r0, r3 ; CHECK-NEXT: mov r0, r2 diff --git a/llvm/test/CodeGen/ARM/funnel-shift.ll b/llvm/test/CodeGen/ARM/funnel-shift.ll index 54c93b493c98..6372f9be2ca3 100644 --- a/llvm/test/CodeGen/ARM/funnel-shift.ll +++ b/llvm/test/CodeGen/ARM/funnel-shift.ll @@ -224,31 +224,31 @@ define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) { ; CHECK-NEXT: mov r3, #0 ; CHECK-NEXT: bl __aeabi_uldivmod ; CHECK-NEXT: add r0, r2, #27 -; CHECK-NEXT: lsl r6, r6, #27 -; CHECK-NEXT: and r1, r0, #63 ; CHECK-NEXT: lsl r2, r7, #27 +; CHECK-NEXT: and r12, r0, #63 +; CHECK-NEXT: lsl r6, r6, #27 ; CHECK-NEXT: orr r7, r6, r7, lsr #5 +; CHECK-NEXT: rsb r3, r12, #32 +; CHECK-NEXT: lsr r2, r2, r12 ; CHECK-NEXT: mov r6, #63 -; CHECK-NEXT: rsb r3, r1, #32 -; CHECK-NEXT: lsr r2, r2, r1 -; CHECK-NEXT: subs r12, r1, #32 -; CHECK-NEXT: bic r6, r6, r0 ; CHECK-NEXT: orr r2, r2, r7, lsl r3 +; CHECK-NEXT: subs r3, r12, #32 +; CHECK-NEXT: bic r6, r6, r0 ; CHECK-NEXT: lsl r5, r9, #1 -; CHECK-NEXT: lsrpl r2, r7, r12 +; CHECK-NEXT: lsrpl r2, r7, r3 +; CHECK-NEXT: subs r1, r6, #32 ; CHECK-NEXT: lsl r0, r5, r6 -; CHECK-NEXT: subs r4, r6, #32 -; CHECK-NEXT: lsl r3, r8, #1 +; CHECK-NEXT: lsl r4, r8, #1 ; CHECK-NEXT: movwpl r0, #0 -; CHECK-NEXT: orr r3, r3, r9, lsr #31 +; CHECK-NEXT: orr r4, r4, r9, lsr #31 ; CHECK-NEXT: orr r0, r0, r2 ; CHECK-NEXT: rsb r2, r6, #32 -; CHECK-NEXT: cmp r4, #0 -; CHECK-NEXT: lsr r1, r7, r1 +; CHECK-NEXT: cmp r1, #0 ; CHECK-NEXT: lsr r2, r5, r2 -; CHECK-NEXT: orr r2, r2, r3, lsl r6 -; CHECK-NEXT: lslpl r2, r5, r4 -; CHECK-NEXT: cmp r12, #0 +; CHECK-NEXT: orr r2, r2, r4, lsl r6 +; CHECK-NEXT: lslpl r2, r5, r1 +; CHECK-NEXT: lsr r1, r7, r12 +; CHECK-NEXT: cmp r3, #0 ; CHECK-NEXT: movwpl r1, #0 ; CHECK-NEXT: orr r1, r2, r1 ; CHECK-NEXT: pop {r4, r5, r6, r7, r8, r9, r11, pc} diff --git a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll index 2922e0ed5423..0a0bb62b0a09 100644 --- a/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll +++ b/llvm/test/CodeGen/ARM/illegal-bitfield-loadstore.ll @@ -91,17 +91,17 @@ define void @i56_or(i56* %a) { ; BE-LABEL: i56_or: ; BE: @ %bb.0: ; BE-NEXT: mov r1, r0 -; BE-NEXT: ldr r12, [r0] ; BE-NEXT: ldrh r2, [r1, #4]! 
; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: orr r2, r3, r2, lsl #8 -; BE-NEXT: orr r2, r2, r12, lsl #24 -; BE-NEXT: orr r2, r2, #384 -; BE-NEXT: strb r2, [r1, #2] -; BE-NEXT: lsr r3, r2, #8 -; BE-NEXT: strh r3, [r1] -; BE-NEXT: bic r1, r12, #255 -; BE-NEXT: orr r1, r1, r2, lsr #24 +; BE-NEXT: ldr r3, [r0] +; BE-NEXT: orr r2, r2, r3, lsl #24 +; BE-NEXT: orr r12, r2, #384 +; BE-NEXT: strb r12, [r1, #2] +; BE-NEXT: lsr r2, r12, #8 +; BE-NEXT: strh r2, [r1] +; BE-NEXT: bic r1, r3, #255 +; BE-NEXT: orr r1, r1, r12, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr %aa = load i56, i56* %a @@ -127,13 +127,13 @@ define void @i56_and_or(i56* %a) { ; BE-NEXT: ldrb r3, [r1, #2] ; BE-NEXT: strb r2, [r1, #2] ; BE-NEXT: orr r2, r3, r12, lsl #8 -; BE-NEXT: ldr r12, [r0] -; BE-NEXT: orr r2, r2, r12, lsl #24 -; BE-NEXT: orr r2, r2, #384 -; BE-NEXT: lsr r3, r2, #8 -; BE-NEXT: strh r3, [r1] -; BE-NEXT: bic r1, r12, #255 -; BE-NEXT: orr r1, r1, r2, lsr #24 +; BE-NEXT: ldr r3, [r0] +; BE-NEXT: orr r2, r2, r3, lsl #24 +; BE-NEXT: orr r12, r2, #384 +; BE-NEXT: lsr r2, r12, #8 +; BE-NEXT: strh r2, [r1] +; BE-NEXT: bic r1, r3, #255 +; BE-NEXT: orr r1, r1, r12, lsr #24 ; BE-NEXT: str r1, [r0] ; BE-NEXT: mov pc, lr
diff --git a/llvm/test/CodeGen/ARM/neon-copy.ll b/llvm/test/CodeGen/ARM/neon-copy.ll index 09a991da2e59..46490efb6631 100644 --- a/llvm/test/CodeGen/ARM/neon-copy.ll +++ b/llvm/test/CodeGen/ARM/neon-copy.ll @@ -1340,16 +1340,16 @@ define <4 x i16> @test_extracts_inserts_varidx_insert(<8 x i16> %x, i32 %idx) { ; CHECK-NEXT: .pad #8 ; CHECK-NEXT: sub sp, sp, #8 ; CHECK-NEXT: vmov.u16 r1, d0[1] -; CHECK-NEXT: and r0, r0, #3 +; CHECK-NEXT: and r12, r0, #3 ; CHECK-NEXT: vmov.u16 r2, d0[2] -; CHECK-NEXT: mov r3, sp -; CHECK-NEXT: vmov.u16 r12, d0[3] -; CHECK-NEXT: orr r0, r3, r0, lsl #1 +; CHECK-NEXT: mov r0, sp +; CHECK-NEXT: vmov.u16 r3, d0[3] +; CHECK-NEXT: orr r0, r0, r12, lsl #1 ; CHECK-NEXT: vst1.16 {d0[0]}, [r0:16] ; CHECK-NEXT: vldr d0, [sp] ; CHECK-NEXT: vmov.16 d0[1], r1 ; CHECK-NEXT: vmov.16 d0[2], r2 -; CHECK-NEXT: vmov.16 d0[3], r12 +; CHECK-NEXT: vmov.16 d0[3], r3 ; CHECK-NEXT: add sp, sp, #8 ; CHECK-NEXT: bx lr %tmp = extractelement <8 x i16> %x, i32 0 diff --git a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll index 8be7100d368b..a125446b27c3 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/ashr.ll @@ -766,79 +766,85 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $6, 32($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $5, 36($sp) # 4-byte Folded Spill -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $2, $6 +; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $4, 12($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 76($sp) -; MMR3-NEXT: srlv $4, $7, $16 -; MMR3-NEXT: not16 $3, $16 -; MMR3-NEXT: sw $3, 24($sp) # 4-byte Folded Spill -; MMR3-NEXT: sll16 $2, $6, 1 -; MMR3-NEXT: sllv $3, $2, $3 -; MMR3-NEXT: li16 $2, 64 -; MMR3-NEXT: or16 $3, $4 -; MMR3-NEXT: srlv $6, $6, $16 -; MMR3-NEXT: sw $6, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: subu16 $7, $2, $16 +; MMR3-NEXT: srlv $3, $7, $16 +; MMR3-NEXT: not16 $6, $16 +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $4, $2 +; MMR3-NEXT: sw $2, 32($sp) # 4-byte Folded Spill +; MMR3-NEXT: sll16 $2, $2, 1 +; MMR3-NEXT: sllv $2, $2, $6 +; MMR3-NEXT: li16 $6, 64 +; MMR3-NEXT: or16 $2, $3 +; MMR3-NEXT: srlv $4, $4, $16 +; MMR3-NEXT: sw $4, 16($sp) # 4-byte Folded Spill +; MMR3-NEXT: subu16 $7, $6, $16 ; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: andi16 $2, $7, 32 -; MMR3-NEXT: sw $2, 28($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $5, $16, 32 -; MMR3-NEXT: sw $5, 16($sp) # 4-byte Folded Spill -; MMR3-NEXT: move $4, $9 +; MMR3-NEXT: andi16 $5, $7, 32 +; MMR3-NEXT: sw $5, 28($sp) # 4-byte Folded Spill +; MMR3-NEXT: andi16 $6, $16, 32 +; MMR3-NEXT: sw $6, 36($sp) # 4-byte Folded Spill +; MMR3-NEXT: move $3, $9 ; MMR3-NEXT: li16 $17, 0 -; MMR3-NEXT: movn $4, $17, $2 -; MMR3-NEXT: movn $3, $6, $5 -; MMR3-NEXT: addiu $2, $16, -64 -; MMR3-NEXT: lw $5, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: srlv $5, $5, $2 -; MMR3-NEXT: sw $5, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $6, $17, 1 -; MMR3-NEXT: sw $6, 4($sp) # 4-byte Folded Spill -; MMR3-NEXT: not16 $5, $2 -; MMR3-NEXT: sllv $5, $6, $5 -; MMR3-NEXT: or16 $3, $4 -; MMR3-NEXT: lw $4, 20($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $4 -; MMR3-NEXT: srav $1, $17, $2 -; MMR3-NEXT: andi16 $2, $2, 32 -; MMR3-NEXT: sw $2, 20($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $2 -; MMR3-NEXT: 
sllv $2, $17, $7 -; MMR3-NEXT: not16 $4, $7 -; MMR3-NEXT: lw $7, 36($sp) # 4-byte Folded Reload -; MMR3-NEXT: srl16 $6, $7, 1 -; MMR3-NEXT: srlv $6, $6, $4 +; MMR3-NEXT: movn $3, $17, $5 +; MMR3-NEXT: movn $2, $4, $6 +; MMR3-NEXT: addiu $4, $16, -64 +; MMR3-NEXT: lw $17, 0($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $4, $17, $4 +; MMR3-NEXT: sw $4, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $4, $6, 1 +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: addiu $5, $16, -64 +; MMR3-NEXT: not16 $5, $5 +; MMR3-NEXT: sllv $5, $4, $5 +; MMR3-NEXT: or16 $2, $3 +; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $3 +; MMR3-NEXT: addiu $3, $16, -64 +; MMR3-NEXT: srav $1, $6, $3 +; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $3 +; MMR3-NEXT: sllv $3, $6, $7 +; MMR3-NEXT: sw $3, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: not16 $3, $7 +; MMR3-NEXT: srl16 $4, $17, 1 +; MMR3-NEXT: srlv $3, $4, $3 ; MMR3-NEXT: sltiu $10, $16, 64 -; MMR3-NEXT: movn $5, $3, $10 -; MMR3-NEXT: or16 $6, $2 -; MMR3-NEXT: srlv $2, $7, $16 -; MMR3-NEXT: lw $3, 24($sp) # 4-byte Folded Reload -; MMR3-NEXT: lw $4, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: sllv $3, $4, $3 +; MMR3-NEXT: movn $5, $2, $10 +; MMR3-NEXT: lw $2, 4($sp) # 4-byte Folded Reload ; MMR3-NEXT: or16 $3, $2 -; MMR3-NEXT: srav $11, $17, $16 -; MMR3-NEXT: lw $4, 16($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $11, $4 -; MMR3-NEXT: sra $2, $17, 31 +; MMR3-NEXT: srlv $2, $17, $16 +; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $7, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: sllv $17, $7, $4 +; MMR3-NEXT: or16 $17, $2 +; MMR3-NEXT: srav $11, $6, $16 +; MMR3-NEXT: lw $2, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $17, $11, $2 +; MMR3-NEXT: sra $2, $6, 31 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: move $8, $2 -; MMR3-NEXT: movn $8, $3, $10 -; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $6, $9, $3 -; MMR3-NEXT: li16 $3, 0 -; MMR3-NEXT: lw $7, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $7, $3, $4 -; MMR3-NEXT: or16 $7, $6 +; MMR3-NEXT: move $4, $2 +; MMR3-NEXT: movn $4, $17, $10 +; MMR3-NEXT: lw $6, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $9, $6 +; MMR3-NEXT: lw $6, 36($sp) # 4-byte Folded Reload +; MMR3-NEXT: li16 $17, 0 +; MMR3-NEXT: lw $7, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $7, $17, $6 +; MMR3-NEXT: or16 $7, $3 ; MMR3-NEXT: lw $3, 20($sp) # 4-byte Folded Reload ; MMR3-NEXT: movn $1, $2, $3 ; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $3, 32($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $3, $16 -; MMR3-NEXT: movn $11, $2, $4 +; MMR3-NEXT: movn $11, $2, $6 ; MMR3-NEXT: movn $2, $11, $10 -; MMR3-NEXT: move $3, $8 +; MMR3-NEXT: move $3, $4 ; MMR3-NEXT: move $4, $1 ; MMR3-NEXT: lwp $16, 40($sp) ; MMR3-NEXT: addiusp 48 @@ -852,79 +858,80 @@ define signext i128 @ashr_i128(i128 signext %a, i128 signext %b) { ; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 -; MMR6-NEXT: move $1, $7 +; MMR6-NEXT: move $12, $7 ; MMR6-NEXT: lw $3, 44($sp) ; MMR6-NEXT: li16 $2, 64 -; MMR6-NEXT: subu16 $7, $2, $3 -; MMR6-NEXT: sllv $8, $5, $7 -; MMR6-NEXT: andi16 $2, $7, 32 -; MMR6-NEXT: selnez $9, $8, $2 -; MMR6-NEXT: sllv $10, $4, $7 -; MMR6-NEXT: not16 $7, $7 -; MMR6-NEXT: srl16 $16, $5, 1 -; MMR6-NEXT: srlv $7, $16, $7 -; MMR6-NEXT: or $7, $10, $7 -; MMR6-NEXT: seleqz $7, $7, 
$2 -; MMR6-NEXT: or $7, $9, $7 -; MMR6-NEXT: srlv $9, $1, $3 -; MMR6-NEXT: not16 $16, $3 -; MMR6-NEXT: sw $16, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: subu16 $16, $2, $3 +; MMR6-NEXT: sllv $1, $5, $16 +; MMR6-NEXT: andi16 $2, $16, 32 +; MMR6-NEXT: selnez $8, $1, $2 +; MMR6-NEXT: sllv $9, $4, $16 +; MMR6-NEXT: not16 $16, $16 +; MMR6-NEXT: srl16 $17, $5, 1 +; MMR6-NEXT: srlv $10, $17, $16 +; MMR6-NEXT: or $9, $9, $10 +; MMR6-NEXT: seleqz $9, $9, $2 +; MMR6-NEXT: or $8, $8, $9 +; MMR6-NEXT: srlv $9, $7, $3 +; MMR6-NEXT: not16 $7, $3 +; MMR6-NEXT: sw $7, 4($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $17, $6, 1 -; MMR6-NEXT: sllv $10, $17, $16 +; MMR6-NEXT: sllv $10, $17, $7 ; MMR6-NEXT: or $9, $10, $9 ; MMR6-NEXT: andi16 $17, $3, 32 ; MMR6-NEXT: seleqz $9, $9, $17 ; MMR6-NEXT: srlv $10, $6, $3 ; MMR6-NEXT: selnez $11, $10, $17 ; MMR6-NEXT: seleqz $10, $10, $17 -; MMR6-NEXT: or $10, $10, $7 -; MMR6-NEXT: seleqz $12, $8, $2 -; MMR6-NEXT: or $8, $11, $9 +; MMR6-NEXT: or $8, $10, $8 +; MMR6-NEXT: seleqz $1, $1, $2 +; MMR6-NEXT: or $9, $11, $9 ; MMR6-NEXT: addiu $2, $3, -64 -; MMR6-NEXT: srlv $9, $5, $2 +; MMR6-NEXT: srlv $10, $5, $2 ; MMR6-NEXT: sll16 $7, $4, 1 ; MMR6-NEXT: not16 $16, $2 ; MMR6-NEXT: sllv $11, $7, $16 ; MMR6-NEXT: sltiu $13, $3, 64 -; MMR6-NEXT: or $8, $8, $12 -; MMR6-NEXT: selnez $10, $10, $13 -; MMR6-NEXT: or $9, $11, $9 -; MMR6-NEXT: srav $11, $4, $2 +; MMR6-NEXT: or $1, $9, $1 +; MMR6-NEXT: selnez $8, $8, $13 +; MMR6-NEXT: or $9, $11, $10 +; MMR6-NEXT: srav $10, $4, $2 ; MMR6-NEXT: andi16 $2, $2, 32 -; MMR6-NEXT: seleqz $12, $11, $2 +; MMR6-NEXT: seleqz $11, $10, $2 ; MMR6-NEXT: sra $14, $4, 31 ; MMR6-NEXT: selnez $15, $14, $2 ; MMR6-NEXT: seleqz $9, $9, $2 -; MMR6-NEXT: or $12, $15, $12 -; MMR6-NEXT: seleqz $12, $12, $13 -; MMR6-NEXT: selnez $2, $11, $2 -; MMR6-NEXT: seleqz $11, $14, $13 -; MMR6-NEXT: or $10, $10, $12 -; MMR6-NEXT: selnez $10, $10, $3 -; MMR6-NEXT: selnez $8, $8, $13 +; MMR6-NEXT: or $11, $15, $11 +; MMR6-NEXT: seleqz $11, $11, $13 +; MMR6-NEXT: selnez $2, $10, $2 +; MMR6-NEXT: seleqz $10, $14, $13 +; MMR6-NEXT: or $8, $8, $11 +; MMR6-NEXT: selnez $8, $8, $3 +; MMR6-NEXT: selnez $1, $1, $13 ; MMR6-NEXT: or $2, $2, $9 ; MMR6-NEXT: srav $9, $4, $3 ; MMR6-NEXT: seleqz $4, $9, $17 -; MMR6-NEXT: selnez $12, $14, $17 -; MMR6-NEXT: or $4, $12, $4 -; MMR6-NEXT: selnez $12, $4, $13 +; MMR6-NEXT: selnez $11, $14, $17 +; MMR6-NEXT: or $4, $11, $4 +; MMR6-NEXT: selnez $11, $4, $13 ; MMR6-NEXT: seleqz $2, $2, $13 ; MMR6-NEXT: seleqz $4, $6, $3 -; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $2, $8, $2 -; MMR6-NEXT: selnez $2, $2, $3 +; MMR6-NEXT: seleqz $6, $12, $3 ; MMR6-NEXT: or $1, $1, $2 -; MMR6-NEXT: or $4, $4, $10 -; MMR6-NEXT: or $2, $12, $11 -; MMR6-NEXT: srlv $3, $5, $3 -; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $5, $7, $5 -; MMR6-NEXT: or $3, $5, $3 -; MMR6-NEXT: seleqz $3, $3, $17 -; MMR6-NEXT: selnez $5, $9, $17 -; MMR6-NEXT: or $3, $5, $3 -; MMR6-NEXT: selnez $3, $3, $13 -; MMR6-NEXT: or $3, $3, $11 +; MMR6-NEXT: selnez $1, $1, $3 +; MMR6-NEXT: or $1, $6, $1 +; MMR6-NEXT: or $4, $4, $8 +; MMR6-NEXT: or $6, $11, $10 +; MMR6-NEXT: srlv $2, $5, $3 +; MMR6-NEXT: lw $3, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $3, $7, $3 +; MMR6-NEXT: or $2, $3, $2 +; MMR6-NEXT: seleqz $2, $2, $17 +; MMR6-NEXT: selnez $3, $9, $17 +; MMR6-NEXT: or $2, $3, $2 +; MMR6-NEXT: selnez $2, $2, $13 +; MMR6-NEXT: or $3, $2, $10 +; MMR6-NEXT: move $2, $6 ; MMR6-NEXT: move $5, $1 ; MMR6-NEXT: lw $16, 8($sp) # 4-byte Folded Reload ; MMR6-NEXT: lw $17, 
12($sp) # 4-byte Folded Reload diff --git a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll index ed2bfc9fcf60..e4b4b3ae1d0f 100644 --- a/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll +++ b/llvm/test/CodeGen/Mips/llvm-ir/lshr.ll @@ -776,76 +776,77 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; MMR3-NEXT: .cfi_offset 17, -4 ; MMR3-NEXT: .cfi_offset 16, -8 ; MMR3-NEXT: move $8, $7 -; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill +; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill ; MMR3-NEXT: sw $4, 28($sp) # 4-byte Folded Spill ; MMR3-NEXT: lw $16, 68($sp) ; MMR3-NEXT: li16 $2, 64 -; MMR3-NEXT: subu16 $7, $2, $16 -; MMR3-NEXT: sllv $9, $5, $7 -; MMR3-NEXT: move $17, $5 -; MMR3-NEXT: sw $5, 0($sp) # 4-byte Folded Spill -; MMR3-NEXT: andi16 $3, $7, 32 +; MMR3-NEXT: subu16 $17, $2, $16 +; MMR3-NEXT: sllv $9, $5, $17 +; MMR3-NEXT: andi16 $3, $17, 32 ; MMR3-NEXT: sw $3, 20($sp) # 4-byte Folded Spill ; MMR3-NEXT: li16 $2, 0 ; MMR3-NEXT: move $4, $9 ; MMR3-NEXT: movn $4, $2, $3 -; MMR3-NEXT: srlv $5, $8, $16 +; MMR3-NEXT: srlv $5, $7, $16 ; MMR3-NEXT: not16 $3, $16 ; MMR3-NEXT: sw $3, 16($sp) # 4-byte Folded Spill ; MMR3-NEXT: sll16 $2, $6, 1 +; MMR3-NEXT: sw $6, 24($sp) # 4-byte Folded Spill ; MMR3-NEXT: sllv $2, $2, $3 ; MMR3-NEXT: or16 $2, $5 -; MMR3-NEXT: srlv $5, $6, $16 -; MMR3-NEXT: sw $5, 4($sp) # 4-byte Folded Spill +; MMR3-NEXT: srlv $7, $6, $16 ; MMR3-NEXT: andi16 $3, $16, 32 ; MMR3-NEXT: sw $3, 12($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $2, $5, $3 +; MMR3-NEXT: movn $2, $7, $3 ; MMR3-NEXT: addiu $3, $16, -64 ; MMR3-NEXT: or16 $2, $4 -; MMR3-NEXT: srlv $4, $17, $3 -; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: lw $4, 28($sp) # 4-byte Folded Reload -; MMR3-NEXT: sll16 $6, $4, 1 -; MMR3-NEXT: not16 $5, $3 -; MMR3-NEXT: sllv $5, $6, $5 -; MMR3-NEXT: lw $17, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: or16 $5, $17 -; MMR3-NEXT: srlv $1, $4, $3 -; MMR3-NEXT: andi16 $3, $3, 32 +; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR3-NEXT: srlv $3, $6, $3 ; MMR3-NEXT: sw $3, 8($sp) # 4-byte Folded Spill -; MMR3-NEXT: movn $5, $1, $3 +; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload +; MMR3-NEXT: sll16 $4, $3, 1 +; MMR3-NEXT: sw $4, 0($sp) # 4-byte Folded Spill +; MMR3-NEXT: addiu $5, $16, -64 +; MMR3-NEXT: not16 $5, $5 +; MMR3-NEXT: sllv $5, $4, $5 +; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload +; MMR3-NEXT: or16 $5, $4 +; MMR3-NEXT: addiu $4, $16, -64 +; MMR3-NEXT: srlv $1, $3, $4 +; MMR3-NEXT: andi16 $4, $4, 32 +; MMR3-NEXT: sw $4, 8($sp) # 4-byte Folded Spill +; MMR3-NEXT: movn $5, $1, $4 ; MMR3-NEXT: sltiu $10, $16, 64 ; MMR3-NEXT: movn $5, $2, $10 -; MMR3-NEXT: sllv $2, $4, $7 -; MMR3-NEXT: not16 $3, $7 -; MMR3-NEXT: lw $7, 0($sp) # 4-byte Folded Reload -; MMR3-NEXT: srl16 $4, $7, 1 +; MMR3-NEXT: sllv $2, $3, $17 +; MMR3-NEXT: not16 $3, $17 +; MMR3-NEXT: srl16 $4, $6, 1 ; MMR3-NEXT: srlv $4, $4, $3 ; MMR3-NEXT: or16 $4, $2 -; MMR3-NEXT: srlv $2, $7, $16 +; MMR3-NEXT: srlv $2, $6, $16 ; MMR3-NEXT: lw $3, 16($sp) # 4-byte Folded Reload +; MMR3-NEXT: lw $6, 0($sp) # 4-byte Folded Reload ; MMR3-NEXT: sllv $3, $6, $3 ; MMR3-NEXT: or16 $3, $2 ; MMR3-NEXT: lw $2, 28($sp) # 4-byte Folded Reload ; MMR3-NEXT: srlv $2, $2, $16 -; MMR3-NEXT: lw $17, 12($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $3, $2, $17 +; MMR3-NEXT: lw $6, 12($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $3, $2, $6 ; MMR3-NEXT: movz $5, $8, $16 -; MMR3-NEXT: li16 $6, 0 -; MMR3-NEXT: movz $3, $6, $10 -; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded 
Reload -; MMR3-NEXT: movn $4, $9, $7 -; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload -; MMR3-NEXT: li16 $7, 0 -; MMR3-NEXT: movn $6, $7, $17 -; MMR3-NEXT: or16 $6, $4 +; MMR3-NEXT: li16 $17, 0 +; MMR3-NEXT: movz $3, $17, $10 +; MMR3-NEXT: lw $17, 20($sp) # 4-byte Folded Reload +; MMR3-NEXT: movn $4, $9, $17 +; MMR3-NEXT: li16 $17, 0 +; MMR3-NEXT: movn $7, $17, $6 +; MMR3-NEXT: or16 $7, $4 ; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload -; MMR3-NEXT: movn $1, $7, $4 -; MMR3-NEXT: li16 $7, 0 -; MMR3-NEXT: movn $1, $6, $10 +; MMR3-NEXT: movn $1, $17, $4 +; MMR3-NEXT: li16 $17, 0 +; MMR3-NEXT: movn $1, $7, $10 ; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload ; MMR3-NEXT: movz $1, $4, $16 -; MMR3-NEXT: movn $2, $7, $17 +; MMR3-NEXT: movn $2, $17, $6 ; MMR3-NEXT: li16 $4, 0 ; MMR3-NEXT: movz $2, $4, $10 ; MMR3-NEXT: move $4, $1 @@ -855,98 +856,91 @@ define signext i128 @lshr_i128(i128 signext %a, i128 signext %b) { ; ; MMR6-LABEL: lshr_i128: ; MMR6: # %bb.0: # %entry -; MMR6-NEXT: addiu $sp, $sp, -32 -; MMR6-NEXT: .cfi_def_cfa_offset 32 -; MMR6-NEXT: sw $17, 28($sp) # 4-byte Folded Spill -; MMR6-NEXT: sw $16, 24($sp) # 4-byte Folded Spill +; MMR6-NEXT: addiu $sp, $sp, -24 +; MMR6-NEXT: .cfi_def_cfa_offset 24 +; MMR6-NEXT: sw $17, 20($sp) # 4-byte Folded Spill +; MMR6-NEXT: sw $16, 16($sp) # 4-byte Folded Spill ; MMR6-NEXT: .cfi_offset 17, -4 ; MMR6-NEXT: .cfi_offset 16, -8 ; MMR6-NEXT: move $1, $7 -; MMR6-NEXT: move $7, $5 -; MMR6-NEXT: lw $3, 60($sp) +; MMR6-NEXT: move $7, $4 +; MMR6-NEXT: lw $3, 52($sp) ; MMR6-NEXT: srlv $2, $1, $3 -; MMR6-NEXT: not16 $5, $3 -; MMR6-NEXT: sw $5, 12($sp) # 4-byte Folded Spill -; MMR6-NEXT: move $17, $6 -; MMR6-NEXT: sw $6, 16($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $3 +; MMR6-NEXT: sw $16, 8($sp) # 4-byte Folded Spill +; MMR6-NEXT: move $4, $6 +; MMR6-NEXT: sw $6, 12($sp) # 4-byte Folded Spill ; MMR6-NEXT: sll16 $6, $6, 1 -; MMR6-NEXT: sllv $6, $6, $5 +; MMR6-NEXT: sllv $6, $6, $16 ; MMR6-NEXT: or $8, $6, $2 -; MMR6-NEXT: addiu $5, $3, -64 -; MMR6-NEXT: srlv $9, $7, $5 -; MMR6-NEXT: move $6, $4 -; MMR6-NEXT: sll16 $2, $4, 1 -; MMR6-NEXT: sw $2, 8($sp) # 4-byte Folded Spill -; MMR6-NEXT: not16 $16, $5 +; MMR6-NEXT: addiu $6, $3, -64 +; MMR6-NEXT: srlv $9, $5, $6 +; MMR6-NEXT: sll16 $2, $7, 1 +; MMR6-NEXT: sw $2, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: not16 $16, $6 ; MMR6-NEXT: sllv $10, $2, $16 ; MMR6-NEXT: andi16 $16, $3, 32 ; MMR6-NEXT: seleqz $8, $8, $16 ; MMR6-NEXT: or $9, $10, $9 -; MMR6-NEXT: srlv $10, $17, $3 +; MMR6-NEXT: srlv $10, $4, $3 ; MMR6-NEXT: selnez $11, $10, $16 ; MMR6-NEXT: li16 $17, 64 ; MMR6-NEXT: subu16 $2, $17, $3 -; MMR6-NEXT: sllv $12, $7, $2 -; MMR6-NEXT: move $17, $7 +; MMR6-NEXT: sllv $12, $5, $2 ; MMR6-NEXT: andi16 $4, $2, 32 -; MMR6-NEXT: andi16 $7, $5, 32 -; MMR6-NEXT: sw $7, 20($sp) # 4-byte Folded Spill -; MMR6-NEXT: seleqz $9, $9, $7 +; MMR6-NEXT: andi16 $17, $6, 32 +; MMR6-NEXT: seleqz $9, $9, $17 ; MMR6-NEXT: seleqz $13, $12, $4 ; MMR6-NEXT: or $8, $11, $8 ; MMR6-NEXT: selnez $11, $12, $4 -; MMR6-NEXT: sllv $12, $6, $2 -; MMR6-NEXT: move $7, $6 -; MMR6-NEXT: sw $6, 4($sp) # 4-byte Folded Spill +; MMR6-NEXT: sllv $12, $7, $2 ; MMR6-NEXT: not16 $2, $2 -; MMR6-NEXT: srl16 $6, $17, 1 +; MMR6-NEXT: srl16 $6, $5, 1 ; MMR6-NEXT: srlv $2, $6, $2 ; MMR6-NEXT: or $2, $12, $2 ; MMR6-NEXT: seleqz $2, $2, $4 -; MMR6-NEXT: srlv $4, $7, $5 -; MMR6-NEXT: or $11, $11, $2 -; MMR6-NEXT: or $5, $8, $13 -; MMR6-NEXT: srlv $6, $17, $3 -; MMR6-NEXT: lw $2, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: selnez $7, $4, $2 -; 
MMR6-NEXT: sltiu $8, $3, 64 -; MMR6-NEXT: selnez $12, $5, $8 -; MMR6-NEXT: or $7, $7, $9 -; MMR6-NEXT: lw $5, 12($sp) # 4-byte Folded Reload +; MMR6-NEXT: addiu $4, $3, -64 +; MMR6-NEXT: srlv $4, $7, $4 +; MMR6-NEXT: or $12, $11, $2 +; MMR6-NEXT: or $6, $8, $13 +; MMR6-NEXT: srlv $5, $5, $3 +; MMR6-NEXT: selnez $8, $4, $17 +; MMR6-NEXT: sltiu $11, $3, 64 +; MMR6-NEXT: selnez $13, $6, $11 +; MMR6-NEXT: or $8, $8, $9 ; MMR6-NEXT: lw $2, 8($sp) # 4-byte Folded Reload -; MMR6-NEXT: sllv $9, $2, $5 +; MMR6-NEXT: lw $6, 4($sp) # 4-byte Folded Reload +; MMR6-NEXT: sllv $9, $6, $2 ; MMR6-NEXT: seleqz $10, $10, $16 -; MMR6-NEXT: li16 $5, 0 -; MMR6-NEXT: or $10, $10, $11 -; MMR6-NEXT: or $6, $9, $6 -; MMR6-NEXT: seleqz $2, $7, $8 -; MMR6-NEXT: seleqz $7, $5, $8 -; MMR6-NEXT: lw $5, 4($sp) # 4-byte Folded Reload -; MMR6-NEXT: srlv $9, $5, $3 -; MMR6-NEXT: seleqz $11, $9, $16 -; MMR6-NEXT: selnez $11, $11, $8 +; MMR6-NEXT: li16 $2, 0 +; MMR6-NEXT: or $10, $10, $12 +; MMR6-NEXT: or $9, $9, $5 +; MMR6-NEXT: seleqz $5, $8, $11 +; MMR6-NEXT: seleqz $8, $2, $11 +; MMR6-NEXT: srlv $7, $7, $3 +; MMR6-NEXT: seleqz $2, $7, $16 +; MMR6-NEXT: selnez $2, $2, $11 ; MMR6-NEXT: seleqz $1, $1, $3 -; MMR6-NEXT: or $2, $12, $2 -; MMR6-NEXT: selnez $2, $2, $3 -; MMR6-NEXT: or $5, $1, $2 -; MMR6-NEXT: or $2, $7, $11 -; MMR6-NEXT: seleqz $1, $6, $16 -; MMR6-NEXT: selnez $6, $9, $16 -; MMR6-NEXT: lw $16, 16($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $9, $16, $3 -; MMR6-NEXT: selnez $10, $10, $8 -; MMR6-NEXT: lw $16, 20($sp) # 4-byte Folded Reload -; MMR6-NEXT: seleqz $4, $4, $16 -; MMR6-NEXT: seleqz $4, $4, $8 -; MMR6-NEXT: or $4, $10, $4 +; MMR6-NEXT: or $5, $13, $5 +; MMR6-NEXT: selnez $5, $5, $3 +; MMR6-NEXT: or $5, $1, $5 +; MMR6-NEXT: or $2, $8, $2 +; MMR6-NEXT: seleqz $1, $9, $16 +; MMR6-NEXT: selnez $6, $7, $16 +; MMR6-NEXT: lw $7, 12($sp) # 4-byte Folded Reload +; MMR6-NEXT: seleqz $7, $7, $3 +; MMR6-NEXT: selnez $9, $10, $11 +; MMR6-NEXT: seleqz $4, $4, $17 +; MMR6-NEXT: seleqz $4, $4, $11
</cut>