[PATCH stable 5.4 00/11] ARM 32-bit build with Clang IAS

List overview All Threads
Download

newer

older

[PATCH v3 2/2] powerpc/kvm: don't...

stable-rc/queue/5.10 build: 168...

Florian Fainelli

29 Jun 2022 29 Jun '22

6:02 p.m.

Hi,

This patch series is a collection of clean cherry picks into the 5.4 kernel allowing us to use the Clang integrated assembler to build the ARM 32-bit kernel.

This is useful in order to have proper build and runtime coverage of the stable kernel(s).

Ard Biesheuvel (3): crypto: arm/sha256-neon - avoid ADRL pseudo instruction crypto: arm/sha512-neon - avoid ADRL pseudo instruction crypto: arm - use Kconfig based compiler checks for crypto opcodes

Jian Cai (2): ARM: 8971/1: replace the sole use of a symbol with its definition ARM: 9029/1: Make iwmmxt.S support Clang's integrated assembler

Nick Desaulniers (1): ARM: 8933/1: replace Sun/Solaris style flag on section directive

Stefan Agner (5): ARM: 8989/1: use .fpu assembler directives instead of assembler arguments ARM: 8990/1: use VFP assembler mnemonics in register load/store macros ARM: 8929/1: use APSR_nzcv instead of r15 as mrc operand ARM: OMAP2+: drop unnecessary adrl crypto: arm/ghash-ce - define fpu before fpu registers are referenced

-- 2.25.1

Show replies by date

Florian Fainelli

29 Jun 29 Jun

6:02 p.m.

New subject: [PATCH stable 5.4 01/11] ARM: 8989/1: use .fpu assembler directives instead of assembler arguments

From: Stefan Agner stefan@agner.ch

commit a6c30873ee4a5cc0549c1973668156381ab2c1c4 upstream

Explicit FPU selection has been introduced in commit 1a6be26d5b1a ("[ARM] Enable VFP to be built when non-VFP capable CPUs are selected") to make use of assembler mnemonics for VFP instructions.

However, clang currently does not support passing assembler flags like this and errors out with: clang-10: error: the clang compiler does not support '-Wa,-mfpu=softvfp+vfp'

Make use of the .fpu assembler directives to select the floating point hardware selectively. Also use the new unified assembler language mnemonics. This allows to build these procedures with Clang.

Link: https://github.com/ClangBuiltLinux/linux/issues/762

Signed-off-by: Stefan Agner stefan@agner.ch Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Florian Fainelli f.fainelli@gmail.com --- arch/arm/vfp/Makefile | 2 -- arch/arm/vfp/vfphw.S | 30 ++++++++++++++++++++---------- 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/arch/arm/vfp/Makefile b/arch/arm/vfp/Makefile index 9975b63ac3b0..749901a72d6d 100644 --- a/arch/arm/vfp/Makefile +++ b/arch/arm/vfp/Makefile @@ -8,6 +8,4 @@ # ccflags-y := -DDEBUG # asflags-y := -DDEBUG

-KBUILD_AFLAGS :=$(KBUILD_AFLAGS:-msoft-float=-Wa,-mfpu=softvfp+vfp -mfloat-abi=soft) - obj-y += vfpmodule.o entry.o vfphw.o vfpsingle.o vfpdouble.o diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S index b530db8f2c6c..772c6a3b1f72 100644 --- a/arch/arm/vfp/vfphw.S +++ b/arch/arm/vfp/vfphw.S @@ -253,11 +253,14 @@ vfp_current_hw_state_address:

ENTRY(vfp_get_float) tbl_branch r0, r3, #3 + .fpu vfpv2 .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 -1: mrc p10, 0, r0, c\dr, c0, 0 @ fmrs r0, s0 +1: vmov r0, s\dr ret lr .org 1b + 8 -1: mrc p10, 0, r0, c\dr, c0, 4 @ fmrs r0, s1 + .endr + .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 +1: vmov r0, s\dr ret lr .org 1b + 8 .endr @@ -265,11 +268,14 @@ ENDPROC(vfp_get_float)

ENTRY(vfp_put_float) tbl_branch r1, r3, #3 + .fpu vfpv2 .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 -1: mcr p10, 0, r0, c\dr, c0, 0 @ fmsr r0, s0 +1: vmov s\dr, r0 ret lr .org 1b + 8 -1: mcr p10, 0, r0, c\dr, c0, 4 @ fmsr r0, s1 + .endr + .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 +1: vmov s\dr, r0 ret lr .org 1b + 8 .endr @@ -277,15 +283,17 @@ ENDPROC(vfp_put_float)

ENTRY(vfp_get_double) tbl_branch r0, r3, #3 + .fpu vfpv2 .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 -1: fmrrd r0, r1, d\dr +1: vmov r0, r1, d\dr ret lr .org 1b + 8 .endr #ifdef CONFIG_VFPv3 @ d16 - d31 registers - .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 -1: mrrc p11, 3, r0, r1, c\dr @ fmrrd r0, r1, d\dr + .fpu vfpv3 + .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 +1: vmov r0, r1, d\dr ret lr .org 1b + 8 .endr @@ -299,15 +307,17 @@ ENDPROC(vfp_get_double)

ENTRY(vfp_put_double) tbl_branch r2, r3, #3 + .fpu vfpv2 .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 -1: fmdrr d\dr, r0, r1 +1: vmov d\dr, r0, r1 ret lr .org 1b + 8 .endr #ifdef CONFIG_VFPv3 + .fpu vfpv3 @ d16 - d31 registers - .irp dr,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 -1: mcrr p11, 3, r0, r1, c\dr @ fmdrr r0, r1, d\dr + .irp dr,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 +1: vmov d\dr, r0, r1 ret lr .org 1b + 8 .endr

-- 2.25.1

Florian Fainelli

6:02 p.m.

New subject: [PATCH stable 5.4 02/11] ARM: 8990/1: use VFP assembler mnemonics in register load/store macros

From: Stefan Agner stefan@agner.ch

commit ee440336e5ef977c397afdb72cbf9c6b8effc8ea upstream

The integrated assembler of Clang 10 and earlier do not allow to access the VFP registers through the coprocessor load/store instructions: <instantiation>:4:6: error: invalid operand for instruction LDC p11, cr0, [r10],#32*4 @ FLDMIAD r10!, {d0-d15} ^

This has been addressed with Clang 11 [0]. However, to support earlier versions of Clang and for better readability use of VFP assembler mnemonics still is preferred.

Replace the coprocessor load/store instructions with explicit assembler mnemonics to accessing the floating point coprocessor registers. Use assembler directives to select the appropriate FPU version.

This allows to build these macros with GNU assembler as well as with Clang's built-in assembler.

[0] https://reviews.llvm.org/D59733

Link: https://github.com/ClangBuiltLinux/linux/issues/905

Signed-off-by: Stefan Agner stefan@agner.ch Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Florian Fainelli f.fainelli@gmail.com --- arch/arm/include/asm/vfpmacros.h | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/vfpmacros.h b/arch/arm/include/asm/vfpmacros.h index 628c336e8e3b..947ee5395e1f 100644 --- a/arch/arm/include/asm/vfpmacros.h +++ b/arch/arm/include/asm/vfpmacros.h @@ -19,23 +19,25 @@

@ read all the working registers back into the VFP .macro VFPFLDMIA, base, tmp + .fpu vfpv2 #if __LINUX_ARM_ARCH__ < 6 - LDC p11, cr0, [\base],#33*4 @ FLDMIAX \base!, {d0-d15} + fldmiax \base!, {d0-d15} #else - LDC p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d0-d15} + vldmia \base!, {d0-d15} #endif #ifdef CONFIG_VFPv3 + .fpu vfpv3 #if __LINUX_ARM_ARCH__ <= 6 ldr \tmp, =elf_hwcap @ may not have MVFR regs ldr \tmp, [\tmp, #0] tst \tmp, #HWCAP_VFPD32 - ldclne p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31} + vldmiane \base!, {d16-d31} addeq \base, \base, #32*4 @ step over unused register space #else VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0 and \tmp, \tmp, #MVFR0_A_SIMD_MASK @ A_SIMD field cmp \tmp, #2 @ 32 x 64bit registers? - ldcleq p11, cr0, [\base],#32*4 @ FLDMIAD \base!, {d16-d31} + vldmiaeq \base!, {d16-d31} addne \base, \base, #32*4 @ step over unused register space #endif #endif @@ -44,22 +46,23 @@ @ write all the working registers out of the VFP .macro VFPFSTMIA, base, tmp #if __LINUX_ARM_ARCH__ < 6 - STC p11, cr0, [\base],#33*4 @ FSTMIAX \base!, {d0-d15} + fstmiax \base!, {d0-d15} #else - STC p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d0-d15} + vstmia \base!, {d0-d15} #endif #ifdef CONFIG_VFPv3 + .fpu vfpv3 #if __LINUX_ARM_ARCH__ <= 6 ldr \tmp, =elf_hwcap @ may not have MVFR regs ldr \tmp, [\tmp, #0] tst \tmp, #HWCAP_VFPD32 - stclne p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31} + vstmiane \base!, {d16-d31} addeq \base, \base, #32*4 @ step over unused register space #else VFPFMRX \tmp, MVFR0 @ Media and VFP Feature Register 0 and \tmp, \tmp, #MVFR0_A_SIMD_MASK @ A_SIMD field cmp \tmp, #2 @ 32 x 64bit registers? - stcleq p11, cr0, [\base],#32*4 @ FSTMIAD \base!, {d16-d31} + vstmiaeq \base!, {d16-d31} addne \base, \base, #32*4 @ step over unused register space #endif #endif

-- 2.25.1

Florian Fainelli

6:02 p.m.

New subject: [PATCH stable 5.4 03/11] ARM: 8971/1: replace the sole use of a symbol with its definition

From: Jian Cai caij2003@gmail.com

commit a780e485b5768e78aef087502499714901b68cc4 upstream

ALT_UP_B macro sets symbol up_b_offset via .equ to an expression involving another symbol. The macro gets expanded twice when arch/arm/kernel/sleep.S is assembled, creating a scenario where up_b_offset is set to another expression involving symbols while its current value is based on symbols. LLVM integrated assembler does not allow such cases, and based on the documentation of binutils, "Values that are based on expressions involving other symbols are allowed, but some targets may restrict this to only being done once per assembly", so it may be better to avoid such cases as it is not clearly stated which targets should support or disallow them. The fix in this case is simple, as up_b_offset has only one use, so we can replace the use with the definition and get rid of up_b_offset.

Link:https://github.com/ClangBuiltLinux/linux/issues/920

Reviewed-by: Stefan Agner stefan@agner.ch

Reviewed-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Jian Cai caij2003@gmail.com Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Florian Fainelli f.fainelli@gmail.com --- arch/arm/include/asm/assembler.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index 6b3e64e19fb6..70e1c23feedb 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -279,10 +279,9 @@ .endif ;\ .popsection #define ALT_UP_B(label) \ - .equ up_b_offset, label - 9998b ;\ .pushsection ".alt.smp.init", "a" ;\ .long 9998b ;\ - W(b) . + up_b_offset ;\ + W(b) . + (label - 9998b) ;\ .popsection #else #define ALT_SMP(instr...)

-- 2.25.1

Florian Fainelli

6:02 p.m.

New subject: [PATCH stable 5.4 04/11] crypto: arm/sha256-neon - avoid ADRL pseudo instruction

From: Ard Biesheuvel ardb@kernel.org

commit 54781938ec342cadbe2d76669ef8d3294d909974 upstream

The ADRL pseudo instruction is not an architectural construct, but a convenience macro that was supported by the ARM proprietary assembler and adopted by binutils GAS as well, but only when assembling in 32-bit ARM mode. Therefore, it can only be used in assembler code that is known to assemble in ARM mode only, but as it turns out, the Clang assembler does not implement ADRL at all, and so it is better to get rid of it entirely.

So replace the ADRL instruction with a ADR instruction that refers to a nearer symbol, and apply the delta explicitly using an additional instruction.

Signed-off-by: Ard Biesheuvel ardb@kernel.org Tested-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Florian Fainelli f.fainelli@gmail.com --- arch/arm/crypto/sha256-armv4.pl | 4 ++-- arch/arm/crypto/sha256-core.S_shipped | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/crypto/sha256-armv4.pl b/arch/arm/crypto/sha256-armv4.pl index a03cf4dfb781..d927483985c2 100644 --- a/arch/arm/crypto/sha256-armv4.pl +++ b/arch/arm/crypto/sha256-armv4.pl @@ -175,7 +175,6 @@ $code=<<___; #else .syntax unified # ifdef __thumb2__ -# define adrl adr .thumb # else .code 32 @@ -471,7 +470,8 @@ sha256_block_data_order_neon: stmdb sp!,{r4-r12,lr}

sub $H,sp,#16*4+16 - adrl $Ktbl,K256 + adr $Ktbl,.Lsha256_block_data_order + sub $Ktbl,$Ktbl,#.Lsha256_block_data_order-K256 bic $H,$H,#15 @ align for 128-bit stores mov $t2,sp mov sp,$H @ alloca diff --git a/arch/arm/crypto/sha256-core.S_shipped b/arch/arm/crypto/sha256-core.S_shipped index 054aae0edfce..9deb515f3c9f 100644 --- a/arch/arm/crypto/sha256-core.S_shipped +++ b/arch/arm/crypto/sha256-core.S_shipped @@ -56,7 +56,6 @@ #else .syntax unified # ifdef __thumb2__ -# define adrl adr .thumb # else .code 32 @@ -1885,7 +1884,8 @@ sha256_block_data_order_neon: stmdb sp!,{r4-r12,lr}

sub r11,sp,#16*4+16 - adrl r14,K256 + adr r14,.Lsha256_block_data_order + sub r14,r14,#.Lsha256_block_data_order-K256 bic r11,r11,#15 @ align for 128-bit stores mov r12,sp mov sp,r11 @ alloca

-- 2.25.1

Florian Fainelli

6:02 p.m.

New subject: [PATCH stable 5.4 05/11] crypto: arm/sha512-neon - avoid ADRL pseudo instruction

From: Ard Biesheuvel ardb@kernel.org

commit 0f5e8323777bfc1c1d2cba71242db6a361de03b6 upstream

So replace the ADRL instruction with a ADR instruction that refers to a nearer symbol, and apply the delta explicitly using an additional instruction.

Signed-off-by: Ard Biesheuvel ardb@kernel.org Tested-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Florian Fainelli f.fainelli@gmail.com --- arch/arm/crypto/sha512-armv4.pl | 4 ++-- arch/arm/crypto/sha512-core.S_shipped | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/crypto/sha512-armv4.pl b/arch/arm/crypto/sha512-armv4.pl index 788c17b56ecc..2a0bdf7dd87c 100644 --- a/arch/arm/crypto/sha512-armv4.pl +++ b/arch/arm/crypto/sha512-armv4.pl @@ -212,7 +212,6 @@ $code=<<___; #else .syntax unified # ifdef __thumb2__ -# define adrl adr .thumb # else .code 32 @@ -602,7 +601,8 @@ sha512_block_data_order_neon: dmb @ errata #451034 on early Cortex A8 add $len,$inp,$len,lsl#7 @ len to point at the end of inp VFP_ABI_PUSH - adrl $Ktbl,K512 + adr $Ktbl,.Lsha512_block_data_order + sub $Ktbl,$Ktbl,.Lsha512_block_data_order-K512 vldmia $ctx,{$A-$H} @ load context .Loop_neon: ___ diff --git a/arch/arm/crypto/sha512-core.S_shipped b/arch/arm/crypto/sha512-core.S_shipped index 710ea309769e..cf5a7a70ff00 100644 --- a/arch/arm/crypto/sha512-core.S_shipped +++ b/arch/arm/crypto/sha512-core.S_shipped @@ -79,7 +79,6 @@ #else .syntax unified # ifdef __thumb2__ -# define adrl adr .thumb # else .code 32 @@ -543,7 +542,8 @@ sha512_block_data_order_neon: dmb @ errata #451034 on early Cortex A8 add r2,r1,r2,lsl#7 @ len to point at the end of inp VFP_ABI_PUSH - adrl r3,K512 + adr r3,.Lsha512_block_data_order + sub r3,r3,.Lsha512_block_data_order-K512 vldmia r0,{d16-d23} @ load context .Loop_neon: vshr.u64 d24,d20,#14 @ 0

-- 2.25.1

Florian Fainelli

6:02 p.m.

New subject: [PATCH stable 5.4 06/11] ARM: 8933/1: replace Sun/Solaris style flag on section directive

From: Nick Desaulniers ndesaulniers@google.com

commit 790756c7e0229dedc83bf058ac69633045b1000e upstream

It looks like a section directive was using "Solaris style" to declare the section flags. Replace this with the GNU style so that Clang's integrated assembler can assemble this directive.

The modified instances were identified via: $ ag .section | grep #

Link: https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html#SEC119 Link: https://github.com/ClangBuiltLinux/linux/issues/744 Link: https://bugs.llvm.org/show_bug.cgi?id=43759 Link: https://reviews.llvm.org/D69296

Acked-by: Nicolas Pitre nico@fluxnic.net Reviewed-by: Ard Biesheuvel ardb@kernel.org Reviewed-by: Stefan Agner stefan@agner.ch Signed-off-by: Nick Desaulniers ndesaulniers@google.com Suggested-by: Fangrui Song maskray@google.com Suggested-by: Jian Cai jiancai@google.com Suggested-by: Peter Smith peter.smith@linaro.org Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Florian Fainelli f.fainelli@gmail.com --- arch/arm/boot/bootp/init.S | 2 +- arch/arm/boot/compressed/big-endian.S | 2 +- arch/arm/boot/compressed/head.S | 2 +- arch/arm/boot/compressed/piggy.S | 2 +- arch/arm/mm/proc-arm1020.S | 2 +- arch/arm/mm/proc-arm1020e.S | 2 +- arch/arm/mm/proc-arm1022.S | 2 +- arch/arm/mm/proc-arm1026.S | 2 +- arch/arm/mm/proc-arm720.S | 2 +- arch/arm/mm/proc-arm740.S | 2 +- arch/arm/mm/proc-arm7tdmi.S | 2 +- arch/arm/mm/proc-arm920.S | 2 +- arch/arm/mm/proc-arm922.S | 2 +- arch/arm/mm/proc-arm925.S | 2 +- arch/arm/mm/proc-arm926.S | 2 +- arch/arm/mm/proc-arm940.S | 2 +- arch/arm/mm/proc-arm946.S | 2 +- arch/arm/mm/proc-arm9tdmi.S | 2 +- arch/arm/mm/proc-fa526.S | 2 +- arch/arm/mm/proc-feroceon.S | 2 +- arch/arm/mm/proc-mohawk.S | 2 +- arch/arm/mm/proc-sa110.S | 2 +- arch/arm/mm/proc-sa1100.S | 2 +- arch/arm/mm/proc-v6.S | 2 +- arch/arm/mm/proc-v7.S | 2 +- arch/arm/mm/proc-v7m.S | 4 ++-- arch/arm/mm/proc-xsc3.S | 2 +- arch/arm/mm/proc-xscale.S | 2 +- 28 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/arm/boot/bootp/init.S b/arch/arm/boot/bootp/init.S index 5c476bd2b4ce..b562da2f7040 100644 --- a/arch/arm/boot/bootp/init.S +++ b/arch/arm/boot/bootp/init.S @@ -13,7 +13,7 @@ * size immediately following the kernel, we could build this into * a binary blob, and concatenate the zImage using the cat command. */ - .section .start,#alloc,#execinstr + .section .start, "ax" .type _start, #function .globl _start

diff --git a/arch/arm/boot/compressed/big-endian.S b/arch/arm/boot/compressed/big-endian.S index 88e2a88d324b..0e092c36da2f 100644 --- a/arch/arm/boot/compressed/big-endian.S +++ b/arch/arm/boot/compressed/big-endian.S @@ -6,7 +6,7 @@ * Author: Nicolas Pitre */

- .section ".start", #alloc, #execinstr + .section ".start", "ax"

mrc p15, 0, r0, c1, c0, 0 @ read control reg orr r0, r0, #(1 << 7) @ enable big endian mode diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index 0a2410adc25b..cdaf94027d3b 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -140,7 +140,7 @@ #endif .endm

- .section ".start", #alloc, #execinstr + .section ".start", "ax" /* * sort out different calling conventions */ diff --git a/arch/arm/boot/compressed/piggy.S b/arch/arm/boot/compressed/piggy.S index 0284f84dcf38..27577644ee72 100644 --- a/arch/arm/boot/compressed/piggy.S +++ b/arch/arm/boot/compressed/piggy.S @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: GPL-2.0 */ - .section .piggydata,#alloc + .section .piggydata, "a" .globl input_data input_data: .incbin "arch/arm/boot/compressed/piggy_data" diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S index 4fa5371bc662..2785da387c91 100644 --- a/arch/arm/mm/proc-arm1020.S +++ b/arch/arm/mm/proc-arm1020.S @@ -491,7 +491,7 @@ cpu_arm1020_name:

.align