This patch series adds support for the Zalasr ISA extension, which provides native load-acquire/store-release instructions.
The specification can be found here: https://github.com/riscv/riscv-zalasr/blob/main/chapter2.adoc
This patch series has been tested with LTP on QEMU with Brendan's Zalasr support patch [1].
checkpatch.pl reports some false-positive spacing errors for this series, so I have CCed the checkpatch.pl maintainers as well.
[1] https://lore.kernel.org/all/CAGPSXwJEdtqW=nx71oufZp64nK6tK=0rytVEcz4F-gfvCOX...
v3:
- Apply acquire/release semantics to arch_xchg/arch_cmpxchg operations so
  as to ensure FENCE.TSO ordering between operations which precede the
  UNLOCK+LOCK sequence and operations which follow the sequence. Thanks
  to Andrea.
- Support hwprobe of Zalasr.
- Allow Zalasr extensions for Guest/VM.
v2:
- Adjust the order of Zalasr and Zalrsc in dt-bindings. Thanks to Conor.
Xu Lu (8):
  riscv: add ISA extension parsing for Zalasr
  dt-bindings: riscv: Add Zalasr ISA extension description
  riscv: hwprobe: Export Zalasr extension
  riscv: Introduce Zalasr instructions
  riscv: Use Zalasr for smp_load_acquire/smp_store_release
  riscv: Apply acquire/release semantics to arch_xchg/arch_cmpxchg operations
  RISC-V: KVM: Allow Zalasr extensions for Guest/VM
  KVM: riscv: selftests: Add Zalasr extensions to get-reg-list test
 Documentation/arch/riscv/hwprobe.rst          |   5 +-
 .../devicetree/bindings/riscv/extensions.yaml |   5 +
 arch/riscv/include/asm/atomic.h               |   6 -
 arch/riscv/include/asm/barrier.h              |  91 ++++++++++--
 arch/riscv/include/asm/cmpxchg.h              | 136 ++++++++----------
 arch/riscv/include/asm/hwcap.h                |   1 +
 arch/riscv/include/asm/insn-def.h             |  79 ++++++++++
 arch/riscv/include/uapi/asm/hwprobe.h         |   1 +
 arch/riscv/include/uapi/asm/kvm.h             |   1 +
 arch/riscv/kernel/cpufeature.c                |   1 +
 arch/riscv/kernel/sys_hwprobe.c               |   1 +
 arch/riscv/kvm/vcpu_onereg.c                  |   2 +
 .../selftests/kvm/riscv/get-reg-list.c        |   4 +
 13 files changed, 242 insertions(+), 91 deletions(-)
Add parsing for the Zalasr ISA extension.
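For context, an illustrative sketch (hypothetical, not part of this patch) of how kernel code can then gate on the parsed extension at runtime, using the existing riscv_has_extension_unlikely() helper that later patches in this series also rely on:

  if (riscv_has_extension_unlikely(RISCV_ISA_EXT_ZALASR))
          /* select Zalasr-based load-acquire/store-release sequences */;
  else
          /* fall back to fence-based ordering */;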
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 arch/riscv/include/asm/hwcap.h | 1 +
 arch/riscv/kernel/cpufeature.c | 1 +
 2 files changed, 2 insertions(+)
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index affd63e11b0a3..ae3852c4f2ca2 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -106,6 +106,7 @@
 #define RISCV_ISA_EXT_ZAAMO		97
 #define RISCV_ISA_EXT_ZALRSC		98
 #define RISCV_ISA_EXT_ZICBOP		99
+#define RISCV_ISA_EXT_ZALASR		100
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 743d53415572e..bf9d3d92bf372 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -472,6 +472,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
 	__RISCV_ISA_EXT_DATA(zaamo, RISCV_ISA_EXT_ZAAMO),
 	__RISCV_ISA_EXT_DATA(zabha, RISCV_ISA_EXT_ZABHA),
 	__RISCV_ISA_EXT_DATA(zacas, RISCV_ISA_EXT_ZACAS),
+	__RISCV_ISA_EXT_DATA(zalasr, RISCV_ISA_EXT_ZALASR),
 	__RISCV_ISA_EXT_DATA(zalrsc, RISCV_ISA_EXT_ZALRSC),
 	__RISCV_ISA_EXT_DATA(zawrs, RISCV_ISA_EXT_ZAWRS),
 	__RISCV_ISA_EXT_DATA(zfa, RISCV_ISA_EXT_ZFA),
Add a description for the Zalasr ISA extension.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 Documentation/devicetree/bindings/riscv/extensions.yaml | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index ede6a58ccf534..100fe53fb0731 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -242,6 +242,11 @@ properties:
             is supported as ratified at commit 5059e0ca641c ("update to
             ratified") of the riscv-zacas.
+        - const: zalasr
+          description: |
+            The standard Zalasr extension for load-acquire/store-release as frozen
+            at commit 194f0094 ("Version 0.9 for freeze") of riscv-zalasr.
+
         - const: zalrsc
           description: |
             The standard Zalrsc extension for load-reserved/store-conditional as
Export the Zalasr extension to userspace using hwprobe.
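As a usage illustration (not part of this patch), userspace can then test the new bit roughly as follows. The sketch assumes a kernel and headers that provide __NR_riscv_hwprobe, <asm/hwprobe.h>, and the RISCV_HWPROBE_KEY_IMA_EXT_0 key:

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <asm/hwprobe.h>

  int main(void)
  {
          struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 };

          /* One key/value pair, all CPUs (cpusetsize = 0, cpus = NULL). */
          if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0))
                  return 1;

          printf("Zalasr %ssupported\n",
                 (pair.value & RISCV_HWPROBE_EXT_ZALASR) ? "" : "not ");
          return 0;
  }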
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 Documentation/arch/riscv/hwprobe.rst  | 5 ++++-
 arch/riscv/include/uapi/asm/hwprobe.h | 1 +
 arch/riscv/kernel/sys_hwprobe.c       | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
index 2aa9be272d5de..067a3595fb9d5 100644
--- a/Documentation/arch/riscv/hwprobe.rst
+++ b/Documentation/arch/riscv/hwprobe.rst
@@ -249,6 +249,9 @@ The following keys are defined:
     defined in the in the RISC-V ISA manual starting from commit e87412e621f1
     ("integrate Zaamo and Zalrsc text (#1304)").
+  * :c:macro:`RISCV_HWPROBE_EXT_ZALASR`: The Zalasr extension is supported as
+    frozen at commit 194f0094 ("Version 0.9 for freeze") of riscv-zalasr.
+
   * :c:macro:`RISCV_HWPROBE_EXT_ZALRSC`: The Zalrsc extension is supported as
     defined in the in the RISC-V ISA manual starting from commit e87412e621f1
     ("integrate Zaamo and Zalrsc text (#1304)").
@@ -360,4 +363,4 @@ The following keys are defined:
   * :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XSFVFWMACCQQQ`: The Xsfvfwmaccqqq vendor
     extension is supported in version 1.0 of Matrix Multiply Accumulate
-    Instruction Extensions Specification.
\ No newline at end of file
+    Instruction Extensions Specification.
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
index aaf6ad9704993..d3a65f8ff7da4 100644
--- a/arch/riscv/include/uapi/asm/hwprobe.h
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -82,6 +82,7 @@ struct riscv_hwprobe {
 #define		RISCV_HWPROBE_EXT_ZAAMO		(1ULL << 56)
 #define		RISCV_HWPROBE_EXT_ZALRSC	(1ULL << 57)
 #define		RISCV_HWPROBE_EXT_ZABHA		(1ULL << 58)
+#define		RISCV_HWPROBE_EXT_ZALASR	(1ULL << 59)
 #define RISCV_HWPROBE_KEY_CPUPERF_0	5
 #define		RISCV_HWPROBE_MISALIGNED_UNKNOWN	(0 << 0)
 #define		RISCV_HWPROBE_MISALIGNED_EMULATED	(1 << 0)
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index 0b170e18a2beb..0529e692b1173 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -99,6 +99,7 @@ static void hwprobe_isa_ext0(struct riscv_hwprobe *pair,
 		EXT_KEY(ZAAMO);
 		EXT_KEY(ZABHA);
 		EXT_KEY(ZACAS);
+		EXT_KEY(ZALASR);
 		EXT_KEY(ZALRSC);
 		EXT_KEY(ZAWRS);
 		EXT_KEY(ZBA);
Introduce l{b|h|w|d}.{aq|aqrl} and s{b|h|w|d}.{rl|aqrl} instruction encodings.
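A note for readers decoding the FUNC7 values below: in the AMO major opcode, funct7 is funct5 followed by the aq and rl bits, so FUNC7(26)/FUNC7(27) are the aq and aq+rl load forms and FUNC7(29)/FUNC7(31) are the rl and aq+rl store forms. The following standalone sketch (illustrative only, not part of the patch) hand-assembles one of these encodings; it should correspond to what LD_AQ() emits via .insn:

  #include <stdint.h>
  #include <stdio.h>

  /* R-type layout: funct7 | rs2 | rs1 | funct3 | rd | opcode */
  static uint32_t rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                        uint32_t funct3, uint32_t rd, uint32_t opcode)
  {
          return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
                 (funct3 << 12) | (rd << 7) | opcode;
  }

  int main(void)
  {
          /* ld.aq a0, (a1): opcode AMO (47), funct3 3 (doubleword),
           * funct7 26 (funct5 0b00110, aq=1, rl=0), rd=x10, rs1=x11, rs2=x0 */
          printf("0x%08x\n", rtype(26, 0, 11, 3, 10, 47));
          return 0;
  }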
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 arch/riscv/include/asm/insn-def.h | 79 +++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
diff --git a/arch/riscv/include/asm/insn-def.h b/arch/riscv/include/asm/insn-def.h
index d5adbaec1d010..3fec7e66ce50f 100644
--- a/arch/riscv/include/asm/insn-def.h
+++ b/arch/riscv/include/asm/insn-def.h
@@ -179,6 +179,7 @@
 #define RV___RS1(v)	__RV_REG(v)
 #define RV___RS2(v)	__RV_REG(v)
+#define RV_OPCODE_AMO		RV_OPCODE(47)
 #define RV_OPCODE_MISC_MEM	RV_OPCODE(15)
 #define RV_OPCODE_OP_IMM	RV_OPCODE(19)
 #define RV_OPCODE_SYSTEM	RV_OPCODE(115)
@@ -208,6 +209,84 @@
 	__ASM_STR(.error "hlv.d requires 64-bit support")
 #endif
+#define LB_AQ(dest, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(0), FUNC7(26),			\
+	       RD(dest), RS1(addr), __RS2(0))
+
+#define LB_AQRL(dest, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(0), FUNC7(27),			\
+	       RD(dest), RS1(addr), __RS2(0))
+
+#define LH_AQ(dest, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(1), FUNC7(26),			\
+	       RD(dest), RS1(addr), __RS2(0))
+
+#define LH_AQRL(dest, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(1), FUNC7(27),			\
+	       RD(dest), RS1(addr), __RS2(0))
+
+#define LW_AQ(dest, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(2), FUNC7(26),			\
+	       RD(dest), RS1(addr), __RS2(0))
+
+#define LW_AQRL(dest, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(2), FUNC7(27),			\
+	       RD(dest), RS1(addr), __RS2(0))
+
+#define SB_RL(src, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(0), FUNC7(29),			\
+	       __RD(0), RS1(addr), RS2(src))
+
+#define SB_AQRL(src, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(0), FUNC7(31),			\
+	       __RD(0), RS1(addr), RS2(src))
+
+#define SH_RL(src, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(1), FUNC7(29),			\
+	       __RD(0), RS1(addr), RS2(src))
+
+#define SH_AQRL(src, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(1), FUNC7(31),			\
+	       __RD(0), RS1(addr), RS2(src))
+
+#define SW_RL(src, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(2), FUNC7(29),			\
+	       __RD(0), RS1(addr), RS2(src))
+
+#define SW_AQRL(src, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(2), FUNC7(31),			\
+	       __RD(0), RS1(addr), RS2(src))
+
+#ifdef CONFIG_64BIT
+#define LD_AQ(dest, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(3), FUNC7(26),			\
+	       RD(dest), RS1(addr), __RS2(0))
+
+#define LD_AQRL(dest, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(3), FUNC7(27),			\
+	       RD(dest), RS1(addr), __RS2(0))
+
+#define SD_RL(src, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(3), FUNC7(29),			\
+	       __RD(0), RS1(addr), RS2(src))
+
+#define SD_AQRL(src, addr)					\
+	INSN_R(OPCODE_AMO, FUNC3(3), FUNC7(31),			\
+	       __RD(0), RS1(addr), RS2(src))
+#else
+#define LD_AQ(dest, addr)					\
+	__ASM_STR(.error "ld.aq requires 64-bit support")
+
+#define LD_AQRL(dest, addr)					\
+	__ASM_STR(.error "ld.aqrl requires 64-bit support")
+
+#define SD_RL(dest, addr)					\
+	__ASM_STR(.error "sd.rl requires 64-bit support")
+
+#define SD_AQRL(dest, addr)					\
+	__ASM_STR(.error "sd.aqrl requires 64-bit support")
+#endif
+
 #define SINVAL_VMA(vaddr, asid)					\
 	INSN_R(OPCODE_SYSTEM, FUNC3(0), FUNC7(11),		\
 	       __RD(0), RS1(vaddr), RS2(asid))
Replace fence instructions with Zalasr instructions in the smp_load_acquire() and smp_store_release() operations.
|----------------------------------|
|    |     __smp_store_release     |
|    |-----------------------------|
|    |    zalasr     |   !zalasr   |
| rl |-----------------------------|
|    | s{b|h|w|d}.rl | fence rw, w |
|    |               | s{b|h|w|d}  |
|----------------------------------|
|    |     __smp_load_acquire      |
|    |-----------------------------|
|    |    zalasr     |   !zalasr   |
| aq |-----------------------------|
|    | l{b|h|w|d}.aq | l{b|h|w|d}  |
|    |               | fence r, rw |
|----------------------------------|
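For reference, the ordering both columns of the table must preserve is the usual message-passing pattern; an illustrative kernel-style sketch (not part of the patch), assuming 64-bit data and flag objects:

  /* CPU 0 (producer) */
  WRITE_ONCE(data, 42);                 /* plain store */
  smp_store_release(&flag, 1);          /* sd.rl, or: fence rw,w; sd */

  /* CPU 1 (consumer) */
  if (smp_load_acquire(&flag))          /* ld.aq, or: ld; fence r,rw */
          r = READ_ONCE(data);          /* guaranteed to observe 42 */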
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 arch/riscv/include/asm/barrier.h | 91 ++++++++++++++++++++++++++++----
 1 file changed, 80 insertions(+), 11 deletions(-)
diff --git a/arch/riscv/include/asm/barrier.h b/arch/riscv/include/asm/barrier.h index b8c5726d86acb..9eaf94a028096 100644 --- a/arch/riscv/include/asm/barrier.h +++ b/arch/riscv/include/asm/barrier.h @@ -51,19 +51,88 @@ */ #define smp_mb__after_spinlock() RISCV_FENCE(iorw, iorw)
-#define __smp_store_release(p, v) \ -do { \ - compiletime_assert_atomic_type(*p); \ - RISCV_FENCE(rw, w); \ - WRITE_ONCE(*p, v); \ +extern void __bad_size_call_parameter(void); + +#define __smp_store_release(p, v) \ +do { \ + typeof(p) __p = (p); \ + union { typeof(*p) __val; char __c[1]; } __u = \ + { .__val = (__force typeof(*p)) (v) }; \ + compiletime_assert_atomic_type(*p); \ + switch (sizeof(*p)) { \ + case 1: \ + asm volatile(ALTERNATIVE("fence rw, w;\t\nsb %0, 0(%1)\t\n", \ + SB_RL(%0, %1) "\t\nnop\t\n", \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ + : : "r" (*(__u8 *)__u.__c), "r" (__p) \ + : "memory"); \ + break; \ + case 2: \ + asm volatile(ALTERNATIVE("fence rw, w;\t\nsh %0, 0(%1)\t\n", \ + SH_RL(%0, %1) "\t\nnop\t\n", \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ + : : "r" (*(__u16 *)__u.__c), "r" (__p) \ + : "memory"); \ + break; \ + case 4: \ + asm volatile(ALTERNATIVE("fence rw, w;\t\nsw %0, 0(%1)\t\n", \ + SW_RL(%0, %1) "\t\nnop\t\n", \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ + : : "r" (*(__u32 *)__u.__c), "r" (__p) \ + : "memory"); \ + break; \ + case 8: \ + asm volatile(ALTERNATIVE("fence rw, w;\t\nsd %0, 0(%1)\t\n", \ + SD_RL(%0, %1) "\t\nnop\t\n", \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ + : : "r" (*(__u64 *)__u.__c), "r" (__p) \ + : "memory"); \ + break; \ + default: \ + __bad_size_call_parameter(); \ + break; \ + } \ } while (0)
-#define __smp_load_acquire(p) \ -({ \ - typeof(*p) ___p1 = READ_ONCE(*p); \ - compiletime_assert_atomic_type(*p); \ - RISCV_FENCE(r, rw); \ - ___p1; \ +#define __smp_load_acquire(p) \ +({ \ + union { typeof(*p) __val; char __c[1]; } __u; \ + typeof(p) __p = (p); \ + compiletime_assert_atomic_type(*p); \ + switch (sizeof(*p)) { \ + case 1: \ + asm volatile(ALTERNATIVE("lb %0, 0(%1)\t\nfence r, rw\t\n", \ + LB_AQ(%0, %1) "\t\nnop\t\n", \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ + : "=r" (*(__u8 *)__u.__c) : "r" (__p) \ + : "memory"); \ + break; \ + case 2: \ + asm volatile(ALTERNATIVE("lh %0, 0(%1)\t\nfence r, rw\t\n", \ + LH_AQ(%0, %1) "\t\nnop\t\n", \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ + : "=r" (*(__u16 *)__u.__c) : "r" (__p) \ + : "memory"); \ + break; \ + case 4: \ + asm volatile(ALTERNATIVE("lw %0, 0(%1)\t\nfence r, rw\t\n", \ + LW_AQ(%0, %1) "\t\nnop\t\n", \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ + : "=r" (*(__u32 *)__u.__c) : "r" (__p) \ + : "memory"); \ + break; \ + case 8: \ + asm volatile(ALTERNATIVE("ld %0, 0(%1)\t\nfence r, rw\t\n", \ + LD_AQ(%0, %1) "\t\nnop\t\n", \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ + : "=r" (*(__u64 *)__u.__c) : "r" (__p) \ + : "memory"); \ + break; \ + default: \ + __bad_size_call_parameter(); \ + break; \ + } \ + __u.__val; \ })
#ifdef CONFIG_RISCV_ISA_ZAWRS
The existing arch_xchg/arch_cmpxchg operations are implemented by inserting fence instructions before or after the atomic instructions. This commit replaces those fences with real acquire/release semantics.
|----------------------------------------------------------------|
|    |     arch_xchg_release      |     arch_cmpxchg_release     |
|    |-----------------------------------------------------------|
|    |   zabha    |    !zabha     | zabha+zacas | !(zabha+zacas) |
| rl |-----------------------------------------------------------|
|    |            | (fence rw, w) |             | (fence rw, w)  |
|    | amoswap.rl | lr.w          | amocas.rl   | lr.w           |
|    |            | sc.w.rl       |             | sc.w.rl        |
|----------------------------------------------------------------|
|    |     arch_xchg_acquire      |     arch_cmpxchg_acquire     |
|    |-----------------------------------------------------------|
|    |   zabha    |    !zabha     | zabha+zacas | !(zabha+zacas) |
| aq |-----------------------------------------------------------|
|    |            | lr.w.aq       |             | lr.w.aq        |
|    | amoswap.aq | sc.w          | amocas.aq   | sc.w           |
|    |            | (fence r, rw) |             | (fence r, rw)  |
|----------------------------------------------------------------|
The parenthesized (fence rw, w) and (fence r, rw) entries mean that those fences are only inserted when Zalasr is not implemented.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 arch/riscv/include/asm/atomic.h  |   6 --
 arch/riscv/include/asm/cmpxchg.h | 136 ++++++++++++++-----------
 2 files changed, 63 insertions(+), 79 deletions(-)
diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h index 5b96c2f61adb5..b79a4f889f339 100644 --- a/arch/riscv/include/asm/atomic.h +++ b/arch/riscv/include/asm/atomic.h @@ -18,12 +18,6 @@
#include <asm/cmpxchg.h>
-#define __atomic_acquire_fence() \ - __asm__ __volatile__(RISCV_ACQUIRE_BARRIER "" ::: "memory") - -#define __atomic_release_fence() \ - __asm__ __volatile__(RISCV_RELEASE_BARRIER "" ::: "memory"); - static __always_inline int arch_atomic_read(const atomic_t *v) { return READ_ONCE(v->counter); diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h index 0b749e7102162..207fdba38d1fc 100644 --- a/arch/riscv/include/asm/cmpxchg.h +++ b/arch/riscv/include/asm/cmpxchg.h @@ -15,15 +15,23 @@ #include <asm/cpufeature-macros.h> #include <asm/processor.h>
-#define __arch_xchg_masked(sc_sfx, swap_sfx, prepend, sc_append, \ - swap_append, r, p, n) \ +/* + * These macros are here to improve the readability of the arch_xchg_XXX() + * and arch_cmpxchg_XXX() macros. + */ +#define LR_SFX(x) x +#define SC_SFX(x) x +#define CAS_SFX(x) x +#define SC_PREPEND(x) x +#define SC_APPEND(x) x + +#define __arch_xchg_masked(lr_sfx, sc_sfx, swap_sfx, sc_prepend, sc_append, \ + r, p, n) \ ({ \ if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) && \ riscv_has_extension_unlikely(RISCV_ISA_EXT_ZABHA)) { \ __asm__ __volatile__ ( \ - prepend \ " amoswap" swap_sfx " %0, %z2, %1\n" \ - swap_append \ : "=&r" (r), "+A" (*(p)) \ : "rJ" (n) \ : "memory"); \ @@ -37,14 +45,16 @@ ulong __rc; \ \ __asm__ __volatile__ ( \ - prepend \ PREFETCHW_ASM(%5) \ + ALTERNATIVE(__nops(1), sc_prepend, \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ "0: lr.w %0, %2\n" \ " and %1, %0, %z4\n" \ " or %1, %1, %z3\n" \ " sc.w" sc_sfx " %1, %1, %2\n" \ " bnez %1, 0b\n" \ - sc_append \ + ALTERNATIVE(__nops(1), sc_append, \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b)) \ : "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b) \ : "memory"); \ @@ -53,19 +63,17 @@ } \ })
-#define __arch_xchg(sfx, prepend, append, r, p, n) \ +#define __arch_xchg(sfx, r, p, n) \ ({ \ __asm__ __volatile__ ( \ - prepend \ " amoswap" sfx " %0, %2, %1\n" \ - append \ : "=r" (r), "+A" (*(p)) \ : "r" (n) \ : "memory"); \ })
-#define _arch_xchg(ptr, new, sc_sfx, swap_sfx, prepend, \ - sc_append, swap_append) \ +#define _arch_xchg(ptr, new, lr_sfx, sc_sfx, swap_sfx, \ + sc_prepend, sc_append) \ ({ \ __typeof__(ptr) __ptr = (ptr); \ __typeof__(*(__ptr)) __new = (new); \ @@ -73,22 +81,20 @@ \ switch (sizeof(*__ptr)) { \ case 1: \ - __arch_xchg_masked(sc_sfx, ".b" swap_sfx, \ - prepend, sc_append, swap_append, \ + __arch_xchg_masked(lr_sfx, sc_sfx, ".b" swap_sfx, \ + sc_prepend, sc_append, \ __ret, __ptr, __new); \ break; \ case 2: \ - __arch_xchg_masked(sc_sfx, ".h" swap_sfx, \ - prepend, sc_append, swap_append, \ + __arch_xchg_masked(lr_sfx, sc_sfx, ".h" swap_sfx, \ + sc_prepend, sc_append, \ __ret, __ptr, __new); \ break; \ case 4: \ - __arch_xchg(".w" swap_sfx, prepend, swap_append, \ - __ret, __ptr, __new); \ + __arch_xchg(".w" swap_sfx, __ret, __ptr, __new); \ break; \ case 8: \ - __arch_xchg(".d" swap_sfx, prepend, swap_append, \ - __ret, __ptr, __new); \ + __arch_xchg(".d" swap_sfx, __ret, __ptr, __new); \ break; \ default: \ BUILD_BUG(); \ @@ -97,17 +103,23 @@ })
#define arch_xchg_relaxed(ptr, x) \ - _arch_xchg(ptr, x, "", "", "", "", "") + _arch_xchg(ptr, x, LR_SFX(""), SC_SFX(""), CAS_SFX(""), \ + SC_PREPEND(__nops(1)), SC_APPEND(__nops(1)))
#define arch_xchg_acquire(ptr, x) \ - _arch_xchg(ptr, x, "", "", "", \ - RISCV_ACQUIRE_BARRIER, RISCV_ACQUIRE_BARRIER) + _arch_xchg(ptr, x, LR_SFX(".aq"), SC_SFX(""), CAS_SFX(".aq"), \ + SC_PREPEND(__nops(1)), \ + SC_APPEND(RISCV_ACQUIRE_BARRIER))
#define arch_xchg_release(ptr, x) \ - _arch_xchg(ptr, x, "", "", RISCV_RELEASE_BARRIER, "", "") + _arch_xchg(ptr, x, LR_SFX(""), SC_SFX(".rl"), CAS_SFX(".rl"), \ + SC_PREPEND(RISCV_RELEASE_BARRIER), \ + SC_APPEND(__nops(1)))
#define arch_xchg(ptr, x) \ - _arch_xchg(ptr, x, ".rl", ".aqrl", "", RISCV_FULL_BARRIER, "") + _arch_xchg(ptr, x, LR_SFX(""), SC_SFX(".aqrl"), \ + CAS_SFX(".aqrl"), SC_PREPEND(__nops(1)), \ + SC_APPEND(__nops(1)))
#define xchg32(ptr, x) \ ({ \ @@ -126,9 +138,7 @@ * store NEW in MEM. Return the initial value in MEM. Success is * indicated by comparing RETURN with OLD. */ -#define __arch_cmpxchg_masked(sc_sfx, cas_sfx, \ - sc_prepend, sc_append, \ - cas_prepend, cas_append, \ +#define __arch_cmpxchg_masked(lr_sfx, sc_sfx, cas_sfx, sc_prepend, sc_append, \ r, p, o, n) \ ({ \ if (IS_ENABLED(CONFIG_RISCV_ISA_ZABHA) && \ @@ -138,9 +148,7 @@ r = o; \ \ __asm__ __volatile__ ( \ - cas_prepend \ " amocas" cas_sfx " %0, %z2, %1\n" \ - cas_append \ : "+&r" (r), "+A" (*(p)) \ : "rJ" (n) \ : "memory"); \ @@ -155,15 +163,17 @@ ulong __rc; \ \ __asm__ __volatile__ ( \ - sc_prepend \ - "0: lr.w %0, %2\n" \ + ALTERNATIVE(__nops(1), sc_prepend, \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ + "0: lr.w" lr_sfx " %0, %2\n" \ " and %1, %0, %z5\n" \ " bne %1, %z3, 1f\n" \ " and %1, %0, %z6\n" \ " or %1, %1, %z4\n" \ " sc.w" sc_sfx " %1, %1, %2\n" \ " bnez %1, 0b\n" \ - sc_append \ + ALTERNATIVE(__nops(1), sc_append, \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ "1:\n" \ : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b)) \ : "rJ" ((long)__oldx), "rJ" (__newx), \ @@ -174,9 +184,7 @@ } \ })
-#define __arch_cmpxchg(lr_sfx, sc_sfx, cas_sfx, \ - sc_prepend, sc_append, \ - cas_prepend, cas_append, \ +#define __arch_cmpxchg(lr_sfx, sc_sfx, cas_sfx, sc_prepend, sc_append, \ r, p, co, o, n) \ ({ \ if (IS_ENABLED(CONFIG_RISCV_ISA_ZACAS) && \ @@ -184,9 +192,7 @@ r = o; \ \ __asm__ __volatile__ ( \ - cas_prepend \ " amocas" cas_sfx " %0, %z2, %1\n" \ - cas_append \ : "+&r" (r), "+A" (*(p)) \ : "rJ" (n) \ : "memory"); \ @@ -194,12 +200,14 @@ register unsigned int __rc; \ \ __asm__ __volatile__ ( \ - sc_prepend \ + ALTERNATIVE(__nops(1), sc_prepend, \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ "0: lr" lr_sfx " %0, %2\n" \ " bne %0, %z3, 1f\n" \ " sc" sc_sfx " %1, %z4, %2\n" \ " bnez %1, 0b\n" \ - sc_append \ + ALTERNATIVE(__nops(1), sc_append, \ + 0, RISCV_ISA_EXT_ZALASR, 1) \ "1:\n" \ : "=&r" (r), "=&r" (__rc), "+A" (*(p)) \ : "rJ" (co o), "rJ" (n) \ @@ -207,9 +215,8 @@ } \ })
-#define _arch_cmpxchg(ptr, old, new, sc_sfx, cas_sfx, \ - sc_prepend, sc_append, \ - cas_prepend, cas_append) \ +#define _arch_cmpxchg(ptr, old, new, lr_sfx, sc_sfx, cas_sfx, \ + sc_prepend, sc_append) \ ({ \ __typeof__(ptr) __ptr = (ptr); \ __typeof__(*(__ptr)) __old = (old); \ @@ -218,27 +225,23 @@ \ switch (sizeof(*__ptr)) { \ case 1: \ - __arch_cmpxchg_masked(sc_sfx, ".b" cas_sfx, \ + __arch_cmpxchg_masked(lr_sfx, sc_sfx, ".b" cas_sfx, \ sc_prepend, sc_append, \ - cas_prepend, cas_append, \ __ret, __ptr, __old, __new); \ break; \ case 2: \ - __arch_cmpxchg_masked(sc_sfx, ".h" cas_sfx, \ + __arch_cmpxchg_masked(lr_sfx, sc_sfx, ".h" cas_sfx, \ sc_prepend, sc_append, \ - cas_prepend, cas_append, \ __ret, __ptr, __old, __new); \ break; \ case 4: \ - __arch_cmpxchg(".w", ".w" sc_sfx, ".w" cas_sfx, \ + __arch_cmpxchg(".w" lr_sfx, ".w" sc_sfx, ".w" cas_sfx, \ sc_prepend, sc_append, \ - cas_prepend, cas_append, \ __ret, __ptr, (long)(int)(long), __old, __new); \ break; \ case 8: \ - __arch_cmpxchg(".d", ".d" sc_sfx, ".d" cas_sfx, \ + __arch_cmpxchg(".d" lr_sfx, ".d" sc_sfx, ".d" cas_sfx, \ sc_prepend, sc_append, \ - cas_prepend, cas_append, \ __ret, __ptr, /**/, __old, __new); \ break; \ default: \ @@ -247,40 +250,27 @@ (__typeof__(*(__ptr)))__ret; \ })
-/* - * These macros are here to improve the readability of the arch_cmpxchg_XXX() - * macros. - */ -#define SC_SFX(x) x -#define CAS_SFX(x) x -#define SC_PREPEND(x) x -#define SC_APPEND(x) x -#define CAS_PREPEND(x) x -#define CAS_APPEND(x) x - #define arch_cmpxchg_relaxed(ptr, o, n) \ _arch_cmpxchg((ptr), (o), (n), \ - SC_SFX(""), CAS_SFX(""), \ - SC_PREPEND(""), SC_APPEND(""), \ - CAS_PREPEND(""), CAS_APPEND("")) + LR_SFX(""), SC_SFX(""), CAS_SFX(""), \ + SC_PREPEND(__nops(1)), SC_APPEND(__nops(1)))
#define arch_cmpxchg_acquire(ptr, o, n) \ _arch_cmpxchg((ptr), (o), (n), \ - SC_SFX(""), CAS_SFX(""), \ - SC_PREPEND(""), SC_APPEND(RISCV_ACQUIRE_BARRIER), \ - CAS_PREPEND(""), CAS_APPEND(RISCV_ACQUIRE_BARRIER)) + LR_SFX(".aq"), SC_SFX(""), CAS_SFX(".aq"), \ + SC_PREPEND(__nops(1)), \ + SC_APPEND(RISCV_ACQUIRE_BARRIER))
#define arch_cmpxchg_release(ptr, o, n) \ _arch_cmpxchg((ptr), (o), (n), \ - SC_SFX(""), CAS_SFX(""), \ - SC_PREPEND(RISCV_RELEASE_BARRIER), SC_APPEND(""), \ - CAS_PREPEND(RISCV_RELEASE_BARRIER), CAS_APPEND("")) + LR_SFX(""), SC_SFX(".rl"), CAS_SFX(".rl"), \ + SC_PREPEND(RISCV_RELEASE_BARRIER), \ + SC_APPEND(__nops(1)))
#define arch_cmpxchg(ptr, o, n) \ _arch_cmpxchg((ptr), (o), (n), \ - SC_SFX(".rl"), CAS_SFX(".aqrl"), \ - SC_PREPEND(""), SC_APPEND(RISCV_FULL_BARRIER), \ - CAS_PREPEND(""), CAS_APPEND("")) + LR_SFX(""), SC_SFX(".aqrl"), CAS_SFX(".aqrl"), \ + SC_PREPEND(__nops(1)), SC_APPEND(__nops(1)))
#define arch_cmpxchg_local(ptr, o, n) \ arch_cmpxchg_relaxed((ptr), (o), (n))
Hi Xu,
kernel test robot noticed the following build errors:
[auto build test ERROR on robh/for-next]
[also build test ERROR on kvm/queue kvm/next linus/master v6.17-rc6]
[cannot apply to kvm/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url:    https://github.com/intel-lab-lkp/linux/commits/Xu-Lu/riscv-add-ISA-extension...
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
patch link:    https://lore.kernel.org/r/20250919073714.83063-7-luxu.kernel%40bytedance.com
patch subject: [PATCH v3 6/8] riscv: Apply acquire/release semantics to arch_xchg/arch_cmpxchg operations
config: riscv-randconfig-002-20250920 (https://download.01.org/0day-ci/archive/20250920/202509202249.rOR3GJbT-lkp@i...)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 7c861bcedf61607b6c087380ac711eb7ff918ca6)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250920/202509202249.rOR3GJbT-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509202249.rOR3GJbT-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from lib/objpool.c:3:
include/linux/objpool.h:156:7: error: invalid .org offset '1528' (at offset '1532')
156 | if (try_cmpxchg_release(&slot->head, &head, head + 1)) | ^ include/linux/atomic/atomic-instrumented.h:4899:2: note: expanded from macro 'try_cmpxchg_release' 4899 | raw_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:228:9: note: expanded from macro 'raw_try_cmpxchg_release' 228 | ___r = raw_cmpxchg_release((_ptr), ___o, (_new)); \ | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ note: (skipping 4 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ In file included from lib/objpool.c:3:
include/linux/objpool.h:156:7: error: invalid .org offset '1528' (at offset '1532')
156 | if (try_cmpxchg_release(&slot->head, &head, head + 1)) | ^ include/linux/atomic/atomic-instrumented.h:4899:2: note: expanded from macro 'try_cmpxchg_release' 4899 | raw_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:228:9: note: expanded from macro 'raw_try_cmpxchg_release' 228 | ___r = raw_cmpxchg_release((_ptr), ___o, (_new)); \ | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ note: (skipping 4 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ 2 errors generated. --
lib/generic-radix-tree.c:53:12: error: invalid .org offset '1850' (at offset '1854')
53 | if ((v = cmpxchg_release(&radix->root, r, new_root)) == r) { | ^ include/linux/atomic/atomic-instrumented.h:4803:2: note: expanded from macro 'cmpxchg_release' 4803 | raw_cmpxchg_release(__ai_ptr, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ note: (skipping 3 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ lib/generic-radix-tree.c:74:14: error: invalid .org offset '1862' (at offset '1866') 74 | if (!(n = cmpxchg_release(p, NULL, new_node))) | ^ include/linux/atomic/atomic-instrumented.h:4803:2: note: expanded from macro 'cmpxchg_release' 4803 | raw_cmpxchg_release(__ai_ptr, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ note: (skipping 3 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^
lib/generic-radix-tree.c:53:12: error: invalid .org offset '1850' (at offset '1854')
53 | if ((v = cmpxchg_release(&radix->root, r, new_root)) == r) { | ^ include/linux/atomic/atomic-instrumented.h:4803:2: note: expanded from macro 'cmpxchg_release' 4803 | raw_cmpxchg_release(__ai_ptr, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ note: (skipping 3 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ lib/generic-radix-tree.c:74:14: error: invalid .org offset '1862' (at offset '1866') 74 | if (!(n = cmpxchg_release(p, NULL, new_node))) | ^ include/linux/atomic/atomic-instrumented.h:4803:2: note: expanded from macro 'cmpxchg_release' 4803 | raw_cmpxchg_release(__ai_ptr, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ note: (skipping 3 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ 4 errors generated. -- In file included from lib/refcount.c:6: In file included from include/linux/mutex.h:17: In file included from include/linux/lockdep.h:14: In file included from include/linux/smp.h:13: In file included from include/linux/cpumask.h:14: In file included from include/linux/atomic.h:80:
include/linux/atomic/atomic-arch-fallback.h:2083:9: error: invalid .org offset '528' (at offset '532')
2083 | return raw_cmpxchg_release(&v->counter, old, new); | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ arch/riscv/include/asm/cmpxchg.h:238:3: note: expanded from macro '_arch_cmpxchg' 238 | __arch_cmpxchg(".w" lr_sfx, ".w" sc_sfx, ".w" cas_sfx, \ | ^ note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ In file included from lib/refcount.c:6: In file included from include/linux/mutex.h:17: In file included from include/linux/lockdep.h:14: In file included from include/linux/smp.h:13: In file included from include/linux/cpumask.h:14: In file included from include/linux/atomic.h:80: include/linux/atomic/atomic-arch-fallback.h:2083:9: error: invalid .org offset '540' (at offset '544') 2083 | return raw_cmpxchg_release(&v->counter, old, new); | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ arch/riscv/include/asm/cmpxchg.h:238:3: note: expanded from macro '_arch_cmpxchg' 238 | __arch_cmpxchg(".w" lr_sfx, ".w" sc_sfx, ".w" cas_sfx, \ | ^ note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ In file included from lib/refcount.c:6: In file included from include/linux/mutex.h:17: In file included from include/linux/lockdep.h:14: In file included from include/linux/smp.h:13: In file included from include/linux/cpumask.h:14: In file included from include/linux/atomic.h:80:
include/linux/atomic/atomic-arch-fallback.h:2083:9: error: invalid .org offset '528' (at offset '532')
2083 | return raw_cmpxchg_release(&v->counter, old, new); | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ arch/riscv/include/asm/cmpxchg.h:238:3: note: expanded from macro '_arch_cmpxchg' 238 | __arch_cmpxchg(".w" lr_sfx, ".w" sc_sfx, ".w" cas_sfx, \ | ^ note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ In file included from lib/refcount.c:6: In file included from include/linux/mutex.h:17: In file included from include/linux/lockdep.h:14: In file included from include/linux/smp.h:13: In file included from include/linux/cpumask.h:14: In file included from include/linux/atomic.h:80: include/linux/atomic/atomic-arch-fallback.h:2083:9: error: invalid .org offset '540' (at offset '544') 2083 | return raw_cmpxchg_release(&v->counter, old, new); | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ arch/riscv/include/asm/cmpxchg.h:238:3: note: expanded from macro '_arch_cmpxchg' 238 | __arch_cmpxchg(".w" lr_sfx, ".w" sc_sfx, ".w" cas_sfx, \ | ^ note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ 4 errors generated. --
fs/overlayfs/file.c:147:10: error: invalid .org offset '3640' (at offset '3644')
147 | old = cmpxchg_release(&of->upperfile, NULL, upperfile); | ^ include/linux/atomic/atomic-instrumented.h:4803:2: note: expanded from macro 'cmpxchg_release' 4803 | raw_cmpxchg_release(__ai_ptr, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ note: (skipping 3 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^
fs/overlayfs/file.c:147:10: error: invalid .org offset '3640' (at offset '3644')
147 | old = cmpxchg_release(&of->upperfile, NULL, upperfile); | ^ include/linux/atomic/atomic-instrumented.h:4803:2: note: expanded from macro 'cmpxchg_release' 4803 | raw_cmpxchg_release(__ai_ptr, __VA_ARGS__); \ | ^ include/linux/atomic/atomic-arch-fallback.h:77:29: note: expanded from macro 'raw_cmpxchg_release' 77 | #define raw_cmpxchg_release arch_cmpxchg_release | ^ arch/riscv/include/asm/cmpxchg.h:265:2: note: expanded from macro 'arch_cmpxchg_release' 265 | _arch_cmpxchg((ptr), (o), (n), \ | ^ note: (skipping 3 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all) arch/riscv/include/asm/alternative-macros.h:104:2: note: expanded from macro '_ALTERNATIVE_CFG' 104 | __ALTERNATIVE_CFG(old_c, new_c, vendor_id, patch_id, IS_ENABLED(CONFIG_k)) | ^ arch/riscv/include/asm/alternative-macros.h:94:2: note: expanded from macro '__ALTERNATIVE_CFG' 94 | ALT_NEW_CONTENT(vendor_id, patch_id, enable, new_c) | ^ arch/riscv/include/asm/alternative-macros.h:81:3: note: expanded from macro 'ALT_NEW_CONTENT' 81 | ".org . - (887b - 886b) + (889b - 888b)\n" \ | ^ <inline asm>:27:6: note: instantiated into assembly here 27 | .org . - (887b - 886b) + (889b - 888b) | ^ 2 errors generated. ..
vim +156 include/linux/objpool.h
b4edb8d2d4647a wuqiang.matt 2023-10-17 100 b4edb8d2d4647a wuqiang.matt 2023-10-17 101 /** b4edb8d2d4647a wuqiang.matt 2023-10-17 102 * objpool_init() - initialize objpool and pre-allocated objects b4edb8d2d4647a wuqiang.matt 2023-10-17 103 * @pool: the object pool to be initialized, declared by caller b4edb8d2d4647a wuqiang.matt 2023-10-17 104 * @nr_objs: total objects to be pre-allocated by this object pool b4edb8d2d4647a wuqiang.matt 2023-10-17 105 * @object_size: size of an object (should be > 0) b4edb8d2d4647a wuqiang.matt 2023-10-17 106 * @gfp: flags for memory allocation (via kmalloc or vmalloc) b4edb8d2d4647a wuqiang.matt 2023-10-17 107 * @context: user context for object initialization callback b4edb8d2d4647a wuqiang.matt 2023-10-17 108 * @objinit: object initialization callback for extra setup b4edb8d2d4647a wuqiang.matt 2023-10-17 109 * @release: cleanup callback for extra cleanup task b4edb8d2d4647a wuqiang.matt 2023-10-17 110 * b4edb8d2d4647a wuqiang.matt 2023-10-17 111 * return value: 0 for success, otherwise error code b4edb8d2d4647a wuqiang.matt 2023-10-17 112 * b4edb8d2d4647a wuqiang.matt 2023-10-17 113 * All pre-allocated objects are to be zeroed after memory allocation. b4edb8d2d4647a wuqiang.matt 2023-10-17 114 * Caller could do extra initialization in objinit callback. objinit() b4edb8d2d4647a wuqiang.matt 2023-10-17 115 * will be called just after slot allocation and called only once for b4edb8d2d4647a wuqiang.matt 2023-10-17 116 * each object. After that the objpool won't touch any content of the b4edb8d2d4647a wuqiang.matt 2023-10-17 117 * objects. It's caller's duty to perform reinitialization after each b4edb8d2d4647a wuqiang.matt 2023-10-17 118 * pop (object allocation) or do clearance before each push (object b4edb8d2d4647a wuqiang.matt 2023-10-17 119 * reclamation). 
b4edb8d2d4647a wuqiang.matt 2023-10-17 120 */ b4edb8d2d4647a wuqiang.matt 2023-10-17 121 int objpool_init(struct objpool_head *pool, int nr_objs, int object_size, b4edb8d2d4647a wuqiang.matt 2023-10-17 122 gfp_t gfp, void *context, objpool_init_obj_cb objinit, b4edb8d2d4647a wuqiang.matt 2023-10-17 123 objpool_fini_cb release); b4edb8d2d4647a wuqiang.matt 2023-10-17 124 a3b00f10da808b Andrii Nakryiko 2024-04-24 125 /* try to retrieve object from slot */ a3b00f10da808b Andrii Nakryiko 2024-04-24 126 static inline void *__objpool_try_get_slot(struct objpool_head *pool, int cpu) a3b00f10da808b Andrii Nakryiko 2024-04-24 127 { a3b00f10da808b Andrii Nakryiko 2024-04-24 128 struct objpool_slot *slot = pool->cpu_slots[cpu]; a3b00f10da808b Andrii Nakryiko 2024-04-24 129 /* load head snapshot, other cpus may change it */ a3b00f10da808b Andrii Nakryiko 2024-04-24 130 uint32_t head = smp_load_acquire(&slot->head); a3b00f10da808b Andrii Nakryiko 2024-04-24 131 a3b00f10da808b Andrii Nakryiko 2024-04-24 132 while (head != READ_ONCE(slot->last)) { a3b00f10da808b Andrii Nakryiko 2024-04-24 133 void *obj; a3b00f10da808b Andrii Nakryiko 2024-04-24 134 a3b00f10da808b Andrii Nakryiko 2024-04-24 135 /* a3b00f10da808b Andrii Nakryiko 2024-04-24 136 * data visibility of 'last' and 'head' could be out of a3b00f10da808b Andrii Nakryiko 2024-04-24 137 * order since memory updating of 'last' and 'head' are a3b00f10da808b Andrii Nakryiko 2024-04-24 138 * performed in push() and pop() independently a3b00f10da808b Andrii Nakryiko 2024-04-24 139 * a3b00f10da808b Andrii Nakryiko 2024-04-24 140 * before any retrieving attempts, pop() must guarantee a3b00f10da808b Andrii Nakryiko 2024-04-24 141 * 'last' is behind 'head', that is to say, there must a3b00f10da808b Andrii Nakryiko 2024-04-24 142 * be available objects in slot, which could be ensured a3b00f10da808b Andrii Nakryiko 2024-04-24 143 * by condition 'last != head && last - head <= nr_objs' a3b00f10da808b Andrii Nakryiko 2024-04-24 144 * that is equivalent to 'last - head - 1 < nr_objs' as a3b00f10da808b Andrii Nakryiko 2024-04-24 145 * 'last' and 'head' are both unsigned int32 a3b00f10da808b Andrii Nakryiko 2024-04-24 146 */ a3b00f10da808b Andrii Nakryiko 2024-04-24 147 if (READ_ONCE(slot->last) - head - 1 >= pool->nr_objs) { a3b00f10da808b Andrii Nakryiko 2024-04-24 148 head = READ_ONCE(slot->head); a3b00f10da808b Andrii Nakryiko 2024-04-24 149 continue; a3b00f10da808b Andrii Nakryiko 2024-04-24 150 } a3b00f10da808b Andrii Nakryiko 2024-04-24 151 a3b00f10da808b Andrii Nakryiko 2024-04-24 152 /* obj must be retrieved before moving forward head */ a3b00f10da808b Andrii Nakryiko 2024-04-24 153 obj = READ_ONCE(slot->entries[head & slot->mask]); a3b00f10da808b Andrii Nakryiko 2024-04-24 154 a3b00f10da808b Andrii Nakryiko 2024-04-24 155 /* move head forward to mark it's consumption */ a3b00f10da808b Andrii Nakryiko 2024-04-24 @156 if (try_cmpxchg_release(&slot->head, &head, head + 1)) a3b00f10da808b Andrii Nakryiko 2024-04-24 157 return obj; a3b00f10da808b Andrii Nakryiko 2024-04-24 158 } a3b00f10da808b Andrii Nakryiko 2024-04-24 159 a3b00f10da808b Andrii Nakryiko 2024-04-24 160 return NULL; a3b00f10da808b Andrii Nakryiko 2024-04-24 161 } a3b00f10da808b Andrii Nakryiko 2024-04-24 162
Extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable the Zalasr extension for Guest/VM.
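As a rough userspace sketch (not part of this patch), a VMM running on a 64-bit host could enable the new register for a vCPU through the generic ONE_REG interface roughly as below; vcpu_fd is assumed to be an existing KVM vCPU file descriptor and error handling is omitted:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>
  #include <asm/kvm.h>

  static int enable_zalasr(int vcpu_fd)
  {
          uint64_t val = 1;     /* 1 = enable the extension for the guest */
          struct kvm_one_reg reg = {
                  .id = KVM_REG_RISCV | KVM_REG_SIZE_U64 |
                        KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE |
                        KVM_RISCV_ISA_EXT_ZALASR,
                  .addr = (unsigned long)&val,
          };

          return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
  }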
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 arch/riscv/include/uapi/asm/kvm.h | 1 +
 arch/riscv/kvm/vcpu_onereg.c      | 2 ++
 2 files changed, 3 insertions(+)
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index ef27d4289da11..4fbc32ef888fa 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -185,6 +185,7 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_ZICCRSE,
 	KVM_RISCV_ISA_EXT_ZAAMO,
 	KVM_RISCV_ISA_EXT_ZALRSC,
+	KVM_RISCV_ISA_EXT_ZALASR,
 	KVM_RISCV_ISA_EXT_MAX,
 };
diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c
index cce6a38ea54f2..6ae5f9859f25b 100644
--- a/arch/riscv/kvm/vcpu_onereg.c
+++ b/arch/riscv/kvm/vcpu_onereg.c
@@ -50,6 +50,7 @@ static const unsigned long kvm_isa_ext_arr[] = {
 	KVM_ISA_EXT_ARR(ZAAMO),
 	KVM_ISA_EXT_ARR(ZABHA),
 	KVM_ISA_EXT_ARR(ZACAS),
+	KVM_ISA_EXT_ARR(ZALASR),
 	KVM_ISA_EXT_ARR(ZALRSC),
 	KVM_ISA_EXT_ARR(ZAWRS),
 	KVM_ISA_EXT_ARR(ZBA),
@@ -184,6 +185,7 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
 	case KVM_RISCV_ISA_EXT_ZAAMO:
 	case KVM_RISCV_ISA_EXT_ZABHA:
 	case KVM_RISCV_ISA_EXT_ZACAS:
+	case KVM_RISCV_ISA_EXT_ZALASR:
 	case KVM_RISCV_ISA_EXT_ZALRSC:
 	case KVM_RISCV_ISA_EXT_ZAWRS:
 	case KVM_RISCV_ISA_EXT_ZBA:
KVM RISC-V now allows the Zalasr extension for Guest/VM, so add it to the get-reg-list test.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
 tools/testing/selftests/kvm/riscv/get-reg-list.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/kvm/riscv/get-reg-list.c b/tools/testing/selftests/kvm/riscv/get-reg-list.c
index a0b7dabb50406..3020e37f621ba 100644
--- a/tools/testing/selftests/kvm/riscv/get-reg-list.c
+++ b/tools/testing/selftests/kvm/riscv/get-reg-list.c
@@ -65,6 +65,7 @@ bool filter_reg(__u64 reg)
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZAAMO:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZABHA:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZACAS:
+	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZALASR:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZALRSC:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZAWRS:
 	case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBA:
@@ -517,6 +518,7 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
 	KVM_ISA_EXT_ARR(ZAAMO),
 	KVM_ISA_EXT_ARR(ZABHA),
 	KVM_ISA_EXT_ARR(ZACAS),
+	KVM_ISA_EXT_ARR(ZALASR),
 	KVM_ISA_EXT_ARR(ZALRSC),
 	KVM_ISA_EXT_ARR(ZAWRS),
 	KVM_ISA_EXT_ARR(ZBA),
@@ -1112,6 +1114,7 @@ KVM_ISA_EXT_SIMPLE_CONFIG(svvptc, SVVPTC);
 KVM_ISA_EXT_SIMPLE_CONFIG(zaamo, ZAAMO);
 KVM_ISA_EXT_SIMPLE_CONFIG(zabha, ZABHA);
 KVM_ISA_EXT_SIMPLE_CONFIG(zacas, ZACAS);
+KVM_ISA_EXT_SIMPLE_CONFIG(zalasr, ZALASR);
 KVM_ISA_EXT_SIMPLE_CONFIG(zalrsc, ZALRSC);
 KVM_ISA_EXT_SIMPLE_CONFIG(zawrs, ZAWRS);
 KVM_ISA_EXT_SIMPLE_CONFIG(zba, ZBA);
@@ -1187,6 +1190,7 @@ struct vcpu_reg_list *vcpu_configs[] = {
 	&config_zabha,
 	&config_zacas,
 	&config_zalrsc,
+	&config_zalasr,
 	&config_zawrs,
 	&config_zba,
 	&config_zbb,
On Fri, Sep 19, 2025 at 03:37:06PM +0800, Xu Lu wrote:
> [cover letter and diffstat snipped]
I wouldn't have rushed this submission while the discussion on v2 seems
so much alive; IAC, to add and link to that discussion, this version
(not a review, just looking at this diff stat) is changing the fastpath

  read_unlock()          read_lock()

from something like

  fence rw,w             amoadd.w
  amoadd.w               fence r,rw

to

  fence rw,rw            amoadd.w
  amoadd.w               fence rw,rw

no matter Zalasr or !Zalasr. Similarly for other atomic operations with
release or acquire semantics. I guess the change was not intentional?
If it was intentional, it should be properly mentioned in the changelog.
Andrea
Hi Andrea,
On Fri, Sep 19, 2025 at 6:04 PM Andrea Parri parri.andrea@gmail.com wrote:
> On Fri, Sep 19, 2025 at 03:37:06PM +0800, Xu Lu wrote:
> > [cover letter snipped]
>
> I wouldn't have rushed this submission while the discussion on v2 seems
> so much alive; IAC, to add and link to that discussion, this version
Thanks. This version was sent out to show my solution to the FENCE.TSO problem you pointed out earlier. I will continue to improve it and look forward to more suggestions from you.
> (not a review, just looking at this diff stat) is changing the fastpath
>
>   read_unlock()          read_lock()
>
> from something like
>
>   fence rw,w             amoadd.w
>   amoadd.w               fence r,rw
>
> to
>
>   fence rw,rw            amoadd.w
>   amoadd.w               fence rw,rw
>
> no matter Zalasr or !Zalasr. Similarly for other atomic operations with
> release or acquire semantics. I guess the change was not intentional?
> If it was intentional, it should be properly mentioned in the changelog.
Sorry about that. It is intentional. The atomic operation before __atomic_acquire_fence or after __atomic_release_fence can be just a single ld or sd instruction instead of an amocas or amoswap. In such cases, when the store-release operation becomes 'sd.rl', the __atomic_acquire_fence via 'fence r, rw' cannot ensure FENCE.TSO anymore. Thus I replaced it with 'fence rw, rw'.

This is also the common implementation on other architectures that use aq/rl instructions, like ARM. And you certainly already knew it~

I will make it a separate commit and explain it in more detail in the changelog. Maybe the ALTERNATIVE mechanism can be applied to speed it up.
Best Regards, Xu Lu
On Fri, Sep 19, 2025 at 6:39 PM Xu Lu luxu.kernel@bytedance.com wrote:
> [earlier quoted text snipped]
>
> Sorry about that. It is intentional. The atomic operation before
> __atomic_acquire_fence or after __atomic_release_fence can be just a
> single ld or sd instruction instead of an amocas or amoswap. In such
> cases, when the store-release operation becomes 'sd.rl', the
> __atomic_acquire_fence via 'fence r, rw' cannot ensure FENCE.TSO
> anymore. Thus I replaced it with 'fence rw, rw'.
But you could apply similar changes you performed for xchg & cmpxchg: use .AQ and .RL for other atomic RMW operations as well, no? AFAICS, that is what arm64 actually does in arch/arm64/include/asm/atomic_{ll_sc,lse}.h .
Andrea
On Fri, Sep 19, 2025 at 7:06 PM Andrea Parri parri.andrea@gmail.com wrote:
> [earlier quoted text snipped]
>
> But you could apply similar changes you performed for xchg & cmpxchg:
> use .AQ and .RL for other atomic RMW operations as well, no? AFAICS,
> that is what arm64 actually does in
> arch/arm64/include/asm/atomic_{ll_sc,lse}.h.
I see. I will study the implementation of ARM and refine my patch. Thanks a lot.
Best regards, Xu Lu