v22: fixing build error due to -march=zicfiss being picked in gcc-13 and above but not actually doing any codegen or recognizing instruction for zicfiss. Change in v22 makes dependence on `-fcf-protection=full` compiler flag to ensure that toolchain has support and then only CONFIG_RISCV_USER_CFI will be visible in menuconfig.
v21: fixed build errors.
Basics and overview ===================
Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffer from memory corruption issues which can be utilized by attackers to bend control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad` else cpu will raise software check exception (a new cpu exception cause code on riscv). Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack - `sspopchk` instruction to pop return address from shadow stack and compare with input operand (i.e. return address on stack) - `sspopchk` to raise software check exception if comparision above was a mismatch - Protection mechanism using which shadow stack is not writeable via regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel CET [3] and branch target identification (BTI) [4] on arm. Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity ================================================
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel doesn't enable control flow integrity cpu extensions on binary by default. Instead exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (loader) to enumerate if all objects in its address space are compiled with shadow stack and landing pad support and accordingly enable the feature. Additionally if a subsequent `dlopen` happens on a library, user mode can take a decision again to disable the feature (if incoming library is not compiled with support) OR terminate the task (if user mode policy is strict to have all objects in address space to be compiled with control flow integirty cpu feature). prctl to enable shadow stack results in allocating shadow stack from virtual memory and activating for user address space. x86 and arm64 are also following same direction due to similar reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is part of virtual memory and is a writeable memory from kernel perspective (writeable via a restricted set of instructions aka shadow stack instructions) Thus kernel changes ensure that this memory is converted into read-only when fork/clone happens and COWed when fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space. It is useful to allocate shadow for different contexts managed by a single thread (green threads or contexts) risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that original execution can be resumed. Even though resume context is prepared by kernel, it is in user space memory and is subject to memory corruption and corruption bugs can be utilized by attacker in this race window to perform arbitrary sigreturn and eventually bypass cfi mechanism. Another issue is how to ensure that cfi related state on sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place it on shadow stack before signal delivery and places address of token in sigcontext structure. During sigreturn, kernel obtains address of token from sigcontext struture, reads token from shadow stack and validates it and only then allow sigreturn to succeed. Compatiblity issue is solved by adopting dynamic sigcontext management introduced for vector extension. This series re-factor the code little bit to allow future sigcontext management easy (as proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This optin is presented only if toolchain has shadow stack and landing pad support. And is on purpose guarded by toolchain support. Reason being that eventually vDSO also needs to be compiled in with shadow stack and landing pad support. vDSO compile patches are not included as of now because landing pad labeling scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to zicfilp and zicfiss, patch series adds documentation for `zicfilp` and `zicfiss` in following: Documentation/arch/riscv/zicfiss.rst Documentation/arch/riscv/zicfilp.rst
How to test this series =======================
Toolchain --------- $ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev $ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static" $ make -j$(nproc)
Qemu ---- Get the lastest qemu $ cd qemu $ mkdir build $ cd build $ ../configure --target-list=riscv64-softmmu $ make -j$(nproc)
Opensbi ------- $ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi $ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux ----- Running defconfig is fine. CFI is enabled by default if the toolchain supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
Running -------
Modify your qemu command to have: -bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin -cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
References ========== [1] - https://github.com/riscv/riscv-cfi [2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.co... [3] - https://lwn.net/Articles/889475/ [4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identifica... [5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-in... [6] - https://lwn.net/Articles/940403/
To: Thomas Gleixner tglx@linutronix.de To: Ingo Molnar mingo@redhat.com To: Borislav Petkov bp@alien8.de To: Dave Hansen dave.hansen@linux.intel.com To: x86@kernel.org To: H. Peter Anvin hpa@zytor.com To: Andrew Morton akpm@linux-foundation.org To: Liam R. Howlett Liam.Howlett@oracle.com To: Vlastimil Babka vbabka@suse.cz To: Lorenzo Stoakes lorenzo.stoakes@oracle.com To: Paul Walmsley paul.walmsley@sifive.com To: Palmer Dabbelt palmer@dabbelt.com To: Albert Ou aou@eecs.berkeley.edu To: Conor Dooley conor@kernel.org To: Rob Herring robh@kernel.org To: Krzysztof Kozlowski krzk+dt@kernel.org To: Arnd Bergmann arnd@arndb.de To: Christian Brauner brauner@kernel.org To: Peter Zijlstra peterz@infradead.org To: Oleg Nesterov oleg@redhat.com To: Eric Biederman ebiederm@xmission.com To: Kees Cook kees@kernel.org To: Jonathan Corbet corbet@lwn.net To: Shuah Khan shuah@kernel.org To: Jann Horn jannh@google.com To: Conor Dooley conor+dt@kernel.org To: Miguel Ojeda ojeda@kernel.org To: Alex Gaynor alex.gaynor@gmail.com To: Boqun Feng boqun.feng@gmail.com To: Gary Guo gary@garyguo.net To: Björn Roy Baron bjorn3_gh@protonmail.com To: Benno Lossin benno.lossin@proton.me To: Andreas Hindborg a.hindborg@kernel.org To: Alice Ryhl aliceryhl@google.com To: Trevor Gross tmgross@umich.edu Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-riscv@lists.infradead.org Cc: devicetree@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: alistair.francis@wdc.com Cc: richard.henderson@linaro.org Cc: jim.shu@sifive.com Cc: andybnac@gmail.com Cc: kito.cheng@sifive.com Cc: charlie@rivosinc.com Cc: atishp@rivosinc.com Cc: evan@rivosinc.com Cc: cleger@rivosinc.com Cc: alexghiti@rivosinc.com Cc: samitolvanen@google.com Cc: broonie@kernel.org Cc: rick.p.edgecombe@intel.com Cc: rust-for-linux@vger.kernel.org
changelog --------- v22: - CONFIG_RISCV_USER_CFI was by default "n". With dual vdso support it is default "y" (if toolchain supports it). Fixing build error due to "-march=zicfiss" being picked in gcc-13 partially. gcc-13 only recognizes the flag but not actually doing any codegen or recognizing instruction for zicfiss. Change in v22 makes dependence on `-fcf-protection=full` compiler flag to ensure that toolchain has support and then only CONFIG_RISCV_USER_CFI will be visible in menuconfig. - picked up tags and some cosmetic changes in commit message for dual vdso patch.
v21: - Fixing build errors due to changes in arch/riscv/include/asm/vdso.h Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI is selected.
v20: - rebased on v6.18-rc1. - Added two vDSO support. If `CONFIG_RISCV_USER_CFI` is selected two vDSOs are compiled (one for hardware prior to RVA23 and one for RVA23 onwards). Kernel exposes RVA23 vDSO if hardware/cpu implements zimop else exposes existing vDSO to userspace. - default selection for `CONFIG_RISCV_USER_CFI` is "Yes". - replaced "__ASSEMBLY__" with "__ASSEMBLER__"
v19: - riscv_nousercfi was `int`. changed it to unsigned long. Thanks to Alex Ghiti for reporting it. It was a bug. - ELP is cleared on trap entry only when CONFIG_64BIT. - restore ssp back on return to usermode was being done before `riscv_v_context_nesting_end` on trap exit path. If kernel shadow stack were enabled this would result in kernel operating on user shadow stack and panic (as I found in my testing of kcfi patch series). So fixed that.
v18: - rebased on 6.16-rc1 - uprobe handling clears ELP in sstatus image in pt_regs - vdso was missing shadow stack elf note for object files. added that. Additional asm file for vdso needed the elf marker flag. toolchain should complain if `-fcf-protection=full` and marker is missing for object generated from asm file. Asked toolchain folks to fix this. Although no reason to gate the merge on that. - Split up compile options for march and fcf-protection in vdso Makefile - CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu Added `arch/riscv/configs/hardening.config` fragment which selects CONFIG_RISCV_USER_CFI
v17: - fixed warnings due to empty macros in usercfi.h (reported by alexg) - fixed prefixes in commit titles reported by alexg - took below uprobe with fcfi v2 patch from Zong Li and squashed it with "riscv/traps: Introduce software check exception and uprobe handling" https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/
v16: - If FWFT is not implemented or returns error for shadow stack activation, then no_usercfi is set to disable shadow stack. Although this should be picked up by extension validation and activation. Fixed this bug for zicfilp and zicfiss both. Thanks to Charlie Jenkins for reporting this. - If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by Charlie Jenkins. - Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to keep it off till we have more hardware availibility with RVA23 profile and zimop/zcmop implemented. Else this will start breaking people's workflow - Includes the fix if "!RV64 and !SBI" then definitions for FWFT in asm-offsets.c error.
v15: - Toolchain has been updated to include `-fcf-protection` flag. This exists for x86 as well. Updated kernel patches to compile vDSO and selftest to compile with `fcf-protection=full` flag. - selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI. - Patch to enable shadow stack for kernel wasn't hidden behind CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that.
v14: - rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants. - Took Radim's suggestions on bitfields. - Placed cfi_state at the end of thread_info block so that current situation is not disturbed with respect to member fields of thread_info in single cacheline.
v13: - cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses riscv_has_extension_unlikely() - uses nops(count) to create nop slide - RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it - changed ternaries to simply use implicit casting to convert to bool. - kernel command line allows to disable zicfilp and zicfiss independently. updated kernel-parameters.txt. - ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace kselftest. - cosmetic and grammatical changes to documentation.
v12: - It seems like I had accidently squashed arch agnostic indirect branch tracking prctl and riscv implementation of those prctls. Split them again. - set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li. - Some minor clean up in kselftests as suggested by Zong Li.
v11: - patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to to `lpad 0`. v10: - dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree. - Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config. - Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects.
v9: - rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion") - dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs) - dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8: - rebased on palmer/for-next - dropped samuel holland's `envcfg` context switch patches. they are in parlmer/for-next
v7: - Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv" Instead using `deactivate_mm` flow to clean up. see here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.c... - Changed the header include in `kselftest`. Hopefully this fixes compile issue faced by Zong Li at SiFive. - Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE" - Lock interfaces for shadow stack and indirect branch tracking expect arg == 0 Any future evolution of this interface should accordingly define how arg should be setup. - `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`. - Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@rivo...
v6: - Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info` - fixed unaligned newline escapes in kselftest - cleaned up messages in kselftest and included test output in commit message - fixed a bug in clone path reported by Zong Li - fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code)
v5: - rebased on v6.12-rc1 - Fixed schema related issues in device tree file - Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index) - added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack. - Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected. - Adopted context header based signal handling as proposed by Andy Chiu - Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware-...) - Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@rivo... (Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches)
v4: - rebased on 6.11-rc6 - envcfg: Converged with Samuel Holland's patches for envcfg management on per- thread basis. - vma_is_shadow_stack is renamed to is_vma_shadow_stack - picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch - signal context: using extended context management to maintain compatibility. - fixed `-Wmissing-prototypes` compiler warnings for prctl functions - Documentation fixes and amending typos. - Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3: - envcfg logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series.
- dt-bindings As suggested, split into separate commit. fixed the messaging that spec is in public review
- arch_is_shadow_stack change arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests As suggested, added object and binary filenames to .gitignore Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2: - Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv.
--- Changes in v22: - Link to v21: https://lore.kernel.org/r/20251015-v5_user_cfi_series-v21-0-6a07856e90e7@riv...
Changes in v21: - Link to v20: https://lore.kernel.org/r/20251013-v5_user_cfi_series-v20-0-b9de4be9912e@riv...
Changes in v20: - Link to v19: https://lore.kernel.org/r/20250731-v5_user_cfi_series-v19-0-09b468d7beab@riv...
Changes in v19: - Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@riv...
Changes in v18: - Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@riv...
Changes in v17: - Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@riv...
Changes in v16: - Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@riv...
Changes in v15: - changelog posted just below cover letter - Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@riv...
Changes in v14:
- changelog posted just below cover letter - Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@riv...
Changes in v13: - changelog posted just below cover letter - Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@riv...
Changes in v12: - changelog posted just below cover letter - Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@riv...
Changes in v11: - changelog posted just below cover letter - Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@riv...
--- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext
Deepak Gupta (26): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv/mm: manufacture shadow stack pte riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs riscv/mm: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception and uprobe handling riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call arch/riscv: dual vdso creation logic and select vdso based on hw riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi
Jim Shu (1): arch/riscv: compile vdso with landing pad and shadow stack note
Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 22 + arch/riscv/Makefile | 8 +- arch/riscv/configs/hardening.config | 4 + arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + arch/riscv/include/asm/mman.h | 26 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 95 ++++ arch/riscv/include/asm/vdso.h | 13 +- arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/asm-offsets.c | 10 + arch/riscv/kernel/cpufeature.c | 27 + arch/riscv/kernel/entry.S | 38 ++ arch/riscv/kernel/head.S | 27 + arch/riscv/kernel/process.c | 27 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 148 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 54 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso.c | 7 + arch/riscv/kernel/vdso/Makefile | 40 +- arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +- arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/note.S | 3 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +- arch/riscv/kernel/vdso_cfi/Makefile | 25 + arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 16 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 16 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 62 files changed, 2475 insertions(+), 41 deletions(-) --- base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug
VM_HIGH_ARCH_5 is used for riscv
Reviewed-by: Zong Li zong.li@sifive.com Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Acked-by: David Hildenbrand david@redhat.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- include/linux/mm.h | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h index d16b33bacc32..2032d3f195f1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -380,6 +380,13 @@ extern unsigned int kobjsize(const void *objp); # define VM_SHADOW_STACK VM_HIGH_ARCH_6 #endif
+#if defined(CONFIG_RISCV_USER_CFI) +/* + * Following x86 and picking up the same bitpos. + */ +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 +#endif + #ifndef VM_SHADOW_STACK # define VM_SHADOW_STACK VM_NONE #endif
Make an entry for cfi extensions in extensions.yaml.
Signed-off-by: Deepak Gupta debug@rivosinc.com Acked-by: Rob Herring (Arm) robh@kernel.org --- Documentation/devicetree/bindings/riscv/extensions.yaml | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml index 543ac94718e8..3222326e32eb 100644 --- a/Documentation/devicetree/bindings/riscv/extensions.yaml +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml @@ -444,6 +444,20 @@ properties: The standard Zicboz extension for cache-block zeroing as ratified in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
+ - const: zicfilp + description: | + The standard Zicfilp extension for enforcing forward edge + control-flow integrity as ratified in commit 3f8e450 ("merge + pull request #227 from ved-rivos/0709") of riscv-cfi + github repo. + + - const: zicfiss + description: | + The standard Zicfiss extension for enforcing backward edge + control-flow integrity as ratified in commit 3f8e450 ("merge + pull request #227 from ved-rivos/0709") of riscv-cfi + github repo. + - const: zicntr description: The standard Zicntr extension for base counters and timers, as
This patch adds support for detecting zicfiss and zicfilp. zicfiss and zicfilp stands for unprivleged integer spec extension for shadow stack and branch tracking on indirect branches, respectively.
This patch looks for zicfiss and zicfilp in device tree and accordinlgy lights up bit in cpu feature bitmap. Furthermore this patch adds detection utility functions to return whether shadow stack or landing pads are supported by cpu.
Reviewed-by: Zong Li zong.li@sifive.com Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/cpufeature.h | 12 ++++++++++++ arch/riscv/include/asm/hwcap.h | 2 ++ arch/riscv/kernel/cpufeature.c | 22 ++++++++++++++++++++++ 3 files changed, 36 insertions(+)
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h index fbd0e4306c93..481f483ebf15 100644 --- a/arch/riscv/include/asm/cpufeature.h +++ b/arch/riscv/include/asm/cpufeature.h @@ -150,4 +150,16 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi return __riscv_isa_extension_available(hart_isa[cpu].isa, ext); }
+static inline bool cpu_supports_shadow_stack(void) +{ + return (IS_ENABLED(CONFIG_RISCV_USER_CFI) && + riscv_has_extension_unlikely(RISCV_ISA_EXT_ZICFISS)); +} + +static inline bool cpu_supports_indirect_br_lp_instr(void) +{ + return (IS_ENABLED(CONFIG_RISCV_USER_CFI) && + riscv_has_extension_unlikely(RISCV_ISA_EXT_ZICFILP)); +} + #endif diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h index affd63e11b0a..7c4619a6d70d 100644 --- a/arch/riscv/include/asm/hwcap.h +++ b/arch/riscv/include/asm/hwcap.h @@ -106,6 +106,8 @@ #define RISCV_ISA_EXT_ZAAMO 97 #define RISCV_ISA_EXT_ZALRSC 98 #define RISCV_ISA_EXT_ZICBOP 99 +#define RISCV_ISA_EXT_ZICFILP 100 +#define RISCV_ISA_EXT_ZICFISS 101
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c index 67b59699357d..5a1a194e1180 100644 --- a/arch/riscv/kernel/cpufeature.c +++ b/arch/riscv/kernel/cpufeature.c @@ -274,6 +274,24 @@ static int riscv_ext_svadu_validate(const struct riscv_isa_ext_data *data, return 0; }
+static int riscv_cfilp_validate(const struct riscv_isa_ext_data *data, + const unsigned long *isa_bitmap) +{ + if (!IS_ENABLED(CONFIG_RISCV_USER_CFI)) + return -EINVAL; + + return 0; +} + +static int riscv_cfiss_validate(const struct riscv_isa_ext_data *data, + const unsigned long *isa_bitmap) +{ + if (!IS_ENABLED(CONFIG_RISCV_USER_CFI)) + return -EINVAL; + + return 0; +} + static const unsigned int riscv_a_exts[] = { RISCV_ISA_EXT_ZAAMO, RISCV_ISA_EXT_ZALRSC, @@ -461,6 +479,10 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = { __RISCV_ISA_EXT_DATA_VALIDATE(zicbop, RISCV_ISA_EXT_ZICBOP, riscv_ext_zicbop_validate), __RISCV_ISA_EXT_SUPERSET_VALIDATE(zicboz, RISCV_ISA_EXT_ZICBOZ, riscv_xlinuxenvcfg_exts, riscv_ext_zicboz_validate), __RISCV_ISA_EXT_DATA(ziccrse, RISCV_ISA_EXT_ZICCRSE), + __RISCV_ISA_EXT_SUPERSET_VALIDATE(zicfilp, RISCV_ISA_EXT_ZICFILP, riscv_xlinuxenvcfg_exts, + riscv_cfilp_validate), + __RISCV_ISA_EXT_SUPERSET_VALIDATE(zicfiss, RISCV_ISA_EXT_ZICFISS, riscv_xlinuxenvcfg_exts, + riscv_cfiss_validate), __RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR), __RISCV_ISA_EXT_DATA(zicond, RISCV_ISA_EXT_ZICOND), __RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
zicfiss and zicfilp extension gets enabled via b3 and b2 in *envcfg CSR. menvcfg controls enabling for S/HS mode. henvcfg control enabling for VS while senvcfg controls enabling for U/VU mode.
zicfilp extension extends *status CSR to hold `expected landing pad` bit. A trap or interrupt can occur between an indirect jmp/call and target instr. `expected landing pad` bit from CPU is recorded into xstatus CSR so that when supervisor performs xret, `expected landing pad` state of CPU can be restored.
zicfiss adds one new CSR - CSR_SSP: CSR_SSP contains current shadow stack pointer.
Signed-off-by: Deepak Gupta debug@rivosinc.com Reviewed-by: Charlie Jenkins charlie@rivosinc.com --- arch/riscv/include/asm/csr.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h index 4a37a98398ad..78f573ab4c53 100644 --- a/arch/riscv/include/asm/csr.h +++ b/arch/riscv/include/asm/csr.h @@ -18,6 +18,15 @@ #define SR_MPP _AC(0x00001800, UL) /* Previously Machine */ #define SR_SUM _AC(0x00040000, UL) /* Supervisor User Memory Access */
+/* zicfilp landing pad status bit */ +#define SR_SPELP _AC(0x00800000, UL) +#define SR_MPELP _AC(0x020000000000, UL) +#ifdef CONFIG_RISCV_M_MODE +#define SR_ELP SR_MPELP +#else +#define SR_ELP SR_SPELP +#endif + #define SR_FS _AC(0x00006000, UL) /* Floating-point Status */ #define SR_FS_OFF _AC(0x00000000, UL) #define SR_FS_INITIAL _AC(0x00002000, UL) @@ -212,6 +221,8 @@ #define ENVCFG_PMM_PMLEN_16 (_AC(0x3, ULL) << 32) #define ENVCFG_CBZE (_AC(1, UL) << 7) #define ENVCFG_CBCFE (_AC(1, UL) << 6) +#define ENVCFG_LPE (_AC(1, UL) << 2) +#define ENVCFG_SSE (_AC(1, UL) << 3) #define ENVCFG_CBIE_SHIFT 4 #define ENVCFG_CBIE (_AC(0x3, UL) << ENVCFG_CBIE_SHIFT) #define ENVCFG_CBIE_ILL _AC(0x0, UL) @@ -230,6 +241,11 @@ #define SMSTATEEN0_HSENVCFG (_ULL(1) << SMSTATEEN0_HSENVCFG_SHIFT) #define SMSTATEEN0_SSTATEEN0_SHIFT 63 #define SMSTATEEN0_SSTATEEN0 (_ULL(1) << SMSTATEEN0_SSTATEEN0_SHIFT) +/* + * zicfiss user mode csr + * CSR_SSP holds current shadow stack pointer. + */ +#define CSR_SSP 0x011
/* mseccfg bits */ #define MSECCFG_PMM ENVCFG_PMM
Carves out space in arch specific thread struct for cfi status and shadow stack in usermode on riscv.
This patch does following - defines a new structure cfi_status with status bit for cfi feature - defines shadow stack pointer, base and size in cfi_status structure - defines offsets to new member fields in thread in asm-offsets.c - Saves and restore shadow stack pointer on trap entry (U --> S) and exit (S --> U)
Shadow stack save/restore is gated on feature availiblity and implemented using alternative. CSR can be context switched in `switch_to` as well but soon as kernel shadow stack support gets rolled in, shadow stack pointer will need to be switched at trap entry/exit point (much like `sp`). It can be argued that kernel using shadow stack deployment scenario may not be as prevalant as user mode using this feature. But even if there is some minimal deployment of kernel shadow stack, that means that it needs to be supported. And thus save/restore of shadow stack pointer in entry.S instead of in `switch_to.h`.
Reviewed-by: Charlie Jenkins charlie@rivosinc.com Reviewed-by: Zong Li zong.li@sifive.com Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 +++ arch/riscv/include/asm/usercfi.h | 23 +++++++++++++++++++++++ arch/riscv/kernel/asm-offsets.c | 4 ++++ arch/riscv/kernel/entry.S | 31 +++++++++++++++++++++++++++++++ 5 files changed, 62 insertions(+)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h index da5426122d28..4c3dd94d0f63 100644 --- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -16,6 +16,7 @@ #include <asm/insn-def.h> #include <asm/alternative-macros.h> #include <asm/hwcap.h> +#include <asm/usercfi.h>
#define arch_get_mmap_end(addr, len, flags) \ ({ \ diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h index 836d80dd2921..36918c9200c9 100644 --- a/arch/riscv/include/asm/thread_info.h +++ b/arch/riscv/include/asm/thread_info.h @@ -73,6 +73,9 @@ struct thread_info { */ unsigned long a0, a1, a2; #endif +#ifdef CONFIG_RISCV_USER_CFI + struct cfi_state user_cfi_state; +#endif };
#ifdef CONFIG_SHADOW_CALL_STACK diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h new file mode 100644 index 000000000000..4c5233e8f3f9 --- /dev/null +++ b/arch/riscv/include/asm/usercfi.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 + * Copyright (C) 2024 Rivos, Inc. + * Deepak Gupta debug@rivosinc.com + */ +#ifndef _ASM_RISCV_USERCFI_H +#define _ASM_RISCV_USERCFI_H + +#ifndef __ASSEMBLER__ +#include <linux/types.h> + +#ifdef CONFIG_RISCV_USER_CFI +struct cfi_state { + unsigned long ubcfi_en : 1; /* Enable for backward cfi. */ + unsigned long user_shdw_stk; /* Current user shadow stack pointer */ + unsigned long shdw_stk_base; /* Base address of shadow stack */ + unsigned long shdw_stk_size; /* size of shadow stack */ +}; + +#endif /* CONFIG_RISCV_USER_CFI */ + +#endif /* __ASSEMBLER__ */ + +#endif /* _ASM_RISCV_USERCFI_H */ diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c index 7d42d3b8a32a..8a2b2656cb2f 100644 --- a/arch/riscv/kernel/asm-offsets.c +++ b/arch/riscv/kernel/asm-offsets.c @@ -51,6 +51,10 @@ void asm_offsets(void) #endif
OFFSET(TASK_TI_CPU_NUM, task_struct, thread_info.cpu); +#ifdef CONFIG_RISCV_USER_CFI + OFFSET(TASK_TI_CFI_STATE, task_struct, thread_info.user_cfi_state); + OFFSET(TASK_TI_USER_SSP, task_struct, thread_info.user_cfi_state.user_shdw_stk); +#endif OFFSET(TASK_THREAD_F0, task_struct, thread.fstate.f[0]); OFFSET(TASK_THREAD_F1, task_struct, thread.fstate.f[1]); OFFSET(TASK_THREAD_F2, task_struct, thread.fstate.f[2]); diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S index d3d92a4becc7..8410850953d6 100644 --- a/arch/riscv/kernel/entry.S +++ b/arch/riscv/kernel/entry.S @@ -92,6 +92,35 @@ REG_L a0, TASK_TI_A0(tp) .endm
+/* + * If previous mode was U, capture shadow stack pointer and save it away + * Zero CSR_SSP at the same time for sanitization. + */ +.macro save_userssp tmp, status + ALTERNATIVE("nops(4)", + __stringify( \ + andi \tmp, \status, SR_SPP; \ + bnez \tmp, skip_ssp_save; \ + csrrw \tmp, CSR_SSP, x0; \ + REG_S \tmp, TASK_TI_USER_SSP(tp); \ + skip_ssp_save:), + 0, + RISCV_ISA_EXT_ZICFISS, + CONFIG_RISCV_USER_CFI) +.endm + +.macro restore_userssp tmp, status + ALTERNATIVE("nops(4)", + __stringify( \ + andi \tmp, \status, SR_SPP; \ + bnez \tmp, skip_ssp_restore; \ + REG_L \tmp, TASK_TI_USER_SSP(tp); \ + csrw CSR_SSP, \tmp; \ + skip_ssp_restore:), + 0, + RISCV_ISA_EXT_ZICFISS, + CONFIG_RISCV_USER_CFI) +.endm
SYM_CODE_START(handle_exception) /* @@ -148,6 +177,7 @@ SYM_CODE_START(handle_exception)
REG_L s0, TASK_TI_USER_SP(tp) csrrc s1, CSR_STATUS, t0 + save_userssp s2, s1 csrr s2, CSR_EPC csrr s3, CSR_TVAL csrr s4, CSR_CAUSE @@ -243,6 +273,7 @@ SYM_CODE_START_NOALIGN(ret_from_exception) call riscv_v_context_nesting_end #endif REG_L a0, PT_STATUS(sp) + restore_userssp s3, a0 /* * The current load reservation is effectively part of the processor's * state, in the sense that load reservations cannot be shared between
`arch_calc_vm_prot_bits` is implemented on risc-v to return VM_READ | VM_WRITE if PROT_WRITE is specified. Similarly `riscv_sys_mmap` is updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ). This is to make sure that any existing apps using PROT_WRITE still work.
Earlier `protection_map[VM_WRITE]` used to pick read-write PTE encodings. Now `protection_map[VM_WRITE]` will always pick PAGE_SHADOWSTACK PTE encodings for shadow stack. Above changes ensure that existing apps continue to work because underneath kernel will be picking `protection_map[VM_WRITE|VM_READ]` PTE encodings.
Reviewed-by: Zong Li zong.li@sifive.com Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/mman.h | 26 ++++++++++++++++++++++++++ arch/riscv/include/asm/pgtable.h | 1 + arch/riscv/kernel/sys_riscv.c | 10 ++++++++++ arch/riscv/mm/init.c | 2 +- 4 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/mman.h b/arch/riscv/include/asm/mman.h new file mode 100644 index 000000000000..0ad1d19832eb --- /dev/null +++ b/arch/riscv/include/asm/mman.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_MMAN_H__ +#define __ASM_MMAN_H__ + +#include <linux/compiler.h> +#include <linux/types.h> +#include <linux/mm.h> +#include <uapi/asm/mman.h> + +static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, + unsigned long pkey __always_unused) +{ + unsigned long ret = 0; + + /* + * If PROT_WRITE was specified, force it to VM_READ | VM_WRITE. + * Only VM_WRITE means shadow stack. + */ + if (prot & PROT_WRITE) + ret = (VM_READ | VM_WRITE); + return ret; +} + +#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey) + +#endif /* ! __ASM_MMAN_H__ */ diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 29e994a9afb6..4c4057a2550e 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -182,6 +182,7 @@ extern struct pt_alloc_ops pt_ops __meminitdata; #define PAGE_READ_EXEC __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC) #define PAGE_WRITE_EXEC __pgprot(_PAGE_BASE | _PAGE_READ | \ _PAGE_EXEC | _PAGE_WRITE) +#define PAGE_SHADOWSTACK __pgprot(_PAGE_BASE | _PAGE_WRITE)
#define PAGE_COPY PAGE_READ #define PAGE_COPY_EXEC PAGE_READ_EXEC diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c index 795b2e815ac9..22fc9b3268be 100644 --- a/arch/riscv/kernel/sys_riscv.c +++ b/arch/riscv/kernel/sys_riscv.c @@ -7,6 +7,7 @@
#include <linux/syscalls.h> #include <asm/cacheflush.h> +#include <asm-generic/mman-common.h>
static long riscv_sys_mmap(unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, @@ -16,6 +17,15 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len, if (unlikely(offset & (~PAGE_MASK >> page_shift_offset))) return -EINVAL;
+ /* + * If PROT_WRITE is specified then extend that to PROT_READ + * protection_map[VM_WRITE] is now going to select shadow stack encodings. + * So specifying PROT_WRITE actually should select protection_map [VM_WRITE | VM_READ] + * If user wants to create shadow stack then they should use `map_shadow_stack` syscall. + */ + if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ))) + prot |= PROT_READ; + return ksys_mmap_pgoff(addr, len, prot, flags, fd, offset >> (PAGE_SHIFT - page_shift_offset)); } diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index d85efe74a4b6..62ab2c7de7c8 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -376,7 +376,7 @@ pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); static const pgprot_t protection_map[16] = { [VM_NONE] = PAGE_NONE, [VM_READ] = PAGE_READ, - [VM_WRITE] = PAGE_COPY, + [VM_WRITE] = PAGE_SHADOWSTACK, [VM_WRITE | VM_READ] = PAGE_COPY, [VM_EXEC] = PAGE_EXEC, [VM_EXEC | VM_READ] = PAGE_READ_EXEC,
This patch implements creating shadow stack pte (on riscv). Creating shadow stack PTE on riscv means that clearing RWX and then setting W=1.
Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Zong Li zong.li@sifive.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/pgtable.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 4c4057a2550e..e4eb4657e1b6 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -425,6 +425,11 @@ static inline pte_t pte_mkwrite_novma(pte_t pte) return __pte(pte_val(pte) | _PAGE_WRITE); }
+static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + return __pte((pte_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE); +} + /* static inline pte_t pte_mkexec(pte_t pte) */
static inline pte_t pte_mkdirty(pte_t pte) @@ -765,6 +770,11 @@ static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))); }
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pte) +{ + return __pmd((pmd_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE); +} + static inline pmd_t pmd_wrprotect(pmd_t pmd) { return pte_pmd(pte_wrprotect(pmd_pte(pmd)));
pte_mkwrite creates PTEs with WRITE encodings for underlying arch. Underlying arch can have two types of writeable mappings. One that can be written using regular store instructions. Another one that can only be written using specialized store instructions (like shadow stack stores). pte_mkwrite can select write PTE encoding based on VMA range (i.e. VM_SHADOW_STACK)
Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Zong Li zong.li@sifive.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/pgtable.h | 7 +++++++ arch/riscv/mm/pgtable.c | 16 ++++++++++++++++ 2 files changed, 23 insertions(+)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index e4eb4657e1b6..b03e8f85221f 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -420,6 +420,10 @@ static inline pte_t pte_wrprotect(pte_t pte)
/* static inline pte_t pte_mkread(pte_t pte) */
+struct vm_area_struct; +pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma); +#define pte_mkwrite pte_mkwrite + static inline pte_t pte_mkwrite_novma(pte_t pte) { return __pte(pte_val(pte) | _PAGE_WRITE); @@ -765,6 +769,9 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return pte_pmd(pte_mkyoung(pmd_pte(pmd))); }
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); +#define pmd_mkwrite pmd_mkwrite + static inline pmd_t pmd_mkwrite_novma(pmd_t pmd) { return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd))); diff --git a/arch/riscv/mm/pgtable.c b/arch/riscv/mm/pgtable.c index 8b6c0a112a8d..17a4bd05a02f 100644 --- a/arch/riscv/mm/pgtable.c +++ b/arch/riscv/mm/pgtable.c @@ -165,3 +165,19 @@ pud_t pudp_invalidate(struct vm_area_struct *vma, unsigned long address, return old; } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + +pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_SHADOW_STACK) + return pte_mkwrite_shstk(pte); + + return pte_mkwrite_novma(pte); +} + +pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_SHADOW_STACK) + return pmd_mkwrite_shstk(pmd); + + return pmd_mkwrite_novma(pmd); +}
`fork` implements copy on write (COW) by making pages readonly in child and parent both.
ptep_set_wrprotect and pte_wrprotect clears _PAGE_WRITE in PTE. Assumption is that page is readable and on fault copy on write happens.
To implement COW on shadow stack pages, clearing up W bit makes them XWR = 000. This will result in wrong PTE setting which says no perms but V=1 and PFN field pointing to final page. Instead desired behavior is to turn it into a readable page, take an access (load/store) fault on sspush/sspop (shadow stack) and then perform COW on such pages. This way regular reads would still be allowed and not lead to COW maintaining current behavior of COW on non-shadow stack but writeable memory.
On the other hand it doesn't interfere with existing COW for read-write memory. Assumption is always that _PAGE_READ must have been set and thus setting _PAGE_READ is harmless.
Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Reviewed-by: Zong Li zong.li@sifive.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/pgtable.h | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index b03e8f85221f..df4a04b64944 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -415,7 +415,7 @@ static inline int pte_special(pte_t pte)
static inline pte_t pte_wrprotect(pte_t pte) { - return __pte(pte_val(pte) & ~(_PAGE_WRITE)); + return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ)); }
/* static inline pte_t pte_mkread(pte_t pte) */ @@ -611,7 +611,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep) { - atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep); + pte_t read_pte = READ_ONCE(*ptep); + /* + * ptep_set_wrprotect can be called for shadow stack ranges too. + * shadow stack memory is XWR = 010 and thus clearing _PAGE_WRITE will lead to + * encoding 000b which is wrong encoding with V = 1. This should lead to page fault + * but we dont want this wrong configuration to be set in page tables. + */ + atomic_long_set((atomic_long_t *)ptep, + ((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ)); }
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
As discussed extensively in the changelog for the addition of this syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the existing mmap() and madvise() syscalls do not map entirely well onto the security requirements for shadow stack memory since they lead to windows where memory is allocated but not yet protected or stacks which are not properly and safely initialised. Instead a new syscall map_shadow_stack() has been defined which allocates and initialises a shadow stack page.
This patch implements this syscall for riscv. riscv doesn't require token to be setup by kernel because user mode can do that by itself. However to provide compatibility and portability with other architectues, user mode can specify token set flag.
Reviewed-by: Zong Li zong.li@sifive.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/usercfi.c | 143 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 144 insertions(+)
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index f60fce69b725..2d0e0dcedbd3 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -125,3 +125,4 @@ obj-$(CONFIG_ACPI) += acpi.o obj-$(CONFIG_ACPI_NUMA) += acpi_numa.o
obj-$(CONFIG_GENERIC_CPU_VULNERABILITIES) += bugs.o +obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c new file mode 100644 index 000000000000..0b3bbb41490a --- /dev/null +++ b/arch/riscv/kernel/usercfi.c @@ -0,0 +1,143 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024 Rivos, Inc. + * Deepak Gupta debug@rivosinc.com + */ + +#include <linux/sched.h> +#include <linux/bitops.h> +#include <linux/types.h> +#include <linux/mm.h> +#include <linux/mman.h> +#include <linux/uaccess.h> +#include <linux/sizes.h> +#include <linux/user.h> +#include <linux/syscalls.h> +#include <linux/prctl.h> +#include <asm/csr.h> +#include <asm/usercfi.h> + +#define SHSTK_ENTRY_SIZE sizeof(void *) + +/* + * Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen + * implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to + * shadow stack. To keep it simple, we plan to use `ssamoswap` to perform writes on shadow + * stack. + */ +static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned long val) +{ + /* + * Never expect -1 on shadow stack. Expect return addresses and zero + */ + unsigned long swap = -1; + + __enable_user_access(); + asm goto( + ".option push\n" + ".option arch, +zicfiss\n" + "1: ssamoswap.d %[swap], %[val], %[addr]\n" + _ASM_EXTABLE(1b, %l[fault]) + ".option pop\n" + : [swap] "=r" (swap), [addr] "+A" (*addr) + : [val] "r" (val) + : "memory" + : fault + ); + __disable_user_access(); + return swap; +fault: + __disable_user_access(); + return -1; +} + +/* + * Create a restore token on the shadow stack. A token is always XLEN wide + * and aligned to XLEN. + */ +static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) +{ + unsigned long addr; + + /* Token must be aligned */ + if (!IS_ALIGNED(ssp, SHSTK_ENTRY_SIZE)) + return -EINVAL; + + /* On RISC-V we're constructing token to be function of address itself */ + addr = ssp - SHSTK_ENTRY_SIZE; + + if (amo_user_shstk((unsigned long __user *)addr, (unsigned long)ssp) == -1) + return -EFAULT; + + if (token_addr) + *token_addr = addr; + + return 0; +} + +static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size, + unsigned long token_offset, bool set_tok) +{ + int flags = MAP_ANONYMOUS | MAP_PRIVATE; + struct mm_struct *mm = current->mm; + unsigned long populate, tok_loc = 0; + + if (addr) + flags |= MAP_FIXED_NOREPLACE; + + mmap_write_lock(mm); + addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL); + mmap_write_unlock(mm); + + if (!set_tok || IS_ERR_VALUE(addr)) + goto out; + + if (create_rstor_token(addr + token_offset, &tok_loc)) { + vm_munmap(addr, size); + return -EINVAL; + } + + addr = tok_loc; + +out: + return addr; +} + +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{ + bool set_tok = flags & SHADOW_STACK_SET_TOKEN; + unsigned long aligned_size = 0; + + if (!cpu_supports_shadow_stack()) + return -EOPNOTSUPP; + + /* Anything other than set token should result in invalid param */ + if (flags & ~SHADOW_STACK_SET_TOKEN) + return -EINVAL; + + /* + * Unlike other architectures, on RISC-V, SSP pointer is held in CSR_SSP and is available + * CSR in all modes. CSR accesses are performed using 12bit index programmed in instruction + * itself. This provides static property on register programming and writes to CSR can't + * be unintentional from programmer's perspective. As long as programmer has guarded areas + * which perform writes to CSR_SSP properly, shadow stack pivoting is not possible. Since + * CSR_SSP is writeable by user mode, it itself can setup a shadow stack token subsequent + * to allocation. Although in order to provide portablity with other architecture (because + * `map_shadow_stack` is arch agnostic syscall), RISC-V will follow expectation of a token + * flag in flags and if provided in flags, setup a token at the base. + */ + + /* If there isn't space for a token */ + if (set_tok && size < SHSTK_ENTRY_SIZE) + return -ENOSPC; + + if (addr && (addr & (PAGE_SIZE - 1))) + return -EINVAL; + + aligned_size = PAGE_ALIGN(size); + if (aligned_size < size) + return -EOVERFLOW; + + return allocate_shadow_stack(addr, aligned_size, size, set_tok); +}
Userspace specifies CLONE_VM to share address space and spawn new thread. `clone` allow userspace to specify a new stack for new thread. However there is no way to specify new shadow stack base address without changing API. This patch allocates a new shadow stack whenever CLONE_VM is given.
In case of CLONE_VFORK, parent is suspended until child finishes and thus can child use parent shadow stack. In case of !CLONE_VM, COW kicks in because entire address space is copied from parent to child.
`clone3` is extensible and can provide mechanisms using which shadow stack as an input parameter can be provided. This is not settled yet and being extensively discussed on mailing list. Once that's settled, this commit will adapt to that.
Reviewed-by: Zong Li zong.li@sifive.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/mmu_context.h | 7 ++ arch/riscv/include/asm/usercfi.h | 25 ++++++++ arch/riscv/kernel/process.c | 10 +++ arch/riscv/kernel/usercfi.c | 120 +++++++++++++++++++++++++++++++++++ 4 files changed, 162 insertions(+)
diff --git a/arch/riscv/include/asm/mmu_context.h b/arch/riscv/include/asm/mmu_context.h index 8c4bc49a3a0f..dbf27a78df6c 100644 --- a/arch/riscv/include/asm/mmu_context.h +++ b/arch/riscv/include/asm/mmu_context.h @@ -48,6 +48,13 @@ static inline unsigned long mm_untag_mask(struct mm_struct *mm) } #endif
+#define deactivate_mm deactivate_mm +static inline void deactivate_mm(struct task_struct *tsk, + struct mm_struct *mm) +{ + shstk_release(tsk); +} + #include <asm-generic/mmu_context.h>
#endif /* _ASM_RISCV_MMU_CONTEXT_H */ diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h index 4c5233e8f3f9..a16a5dff8b0e 100644 --- a/arch/riscv/include/asm/usercfi.h +++ b/arch/riscv/include/asm/usercfi.h @@ -8,6 +8,9 @@ #ifndef __ASSEMBLER__ #include <linux/types.h>
+struct task_struct; +struct kernel_clone_args; + #ifdef CONFIG_RISCV_USER_CFI struct cfi_state { unsigned long ubcfi_en : 1; /* Enable for backward cfi. */ @@ -16,6 +19,28 @@ struct cfi_state { unsigned long shdw_stk_size; /* size of shadow stack */ };
+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, + const struct kernel_clone_args *args); +void shstk_release(struct task_struct *tsk); +void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size); +unsigned long get_shstk_base(struct task_struct *task, unsigned long *size); +void set_active_shstk(struct task_struct *task, unsigned long shstk_addr); +bool is_shstk_enabled(struct task_struct *task); + +#else + +#define shstk_alloc_thread_stack(tsk, args) 0 + +#define shstk_release(tsk) + +#define get_shstk_base(task, size) 0UL + +#define set_shstk_base(task, shstk_addr, size) do {} while (0) + +#define set_active_shstk(task, shstk_addr) do {} while (0) + +#define is_shstk_enabled(task) false + #endif /* CONFIG_RISCV_USER_CFI */
#endif /* __ASSEMBLER__ */ diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c index 31a392993cb4..72d35adc6e0e 100644 --- a/arch/riscv/kernel/process.c +++ b/arch/riscv/kernel/process.c @@ -31,6 +31,7 @@ #include <asm/vector.h> #include <asm/cpufeature.h> #include <asm/exec.h> +#include <asm/usercfi.h>
#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK) #include <linux/stackprotector.h> @@ -226,6 +227,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) u64 clone_flags = args->flags; unsigned long usp = args->stack; unsigned long tls = args->tls; + unsigned long ssp = 0; struct pt_regs *childregs = task_pt_regs(p);
/* Ensure all threads in this mm have the same pointer masking mode. */ @@ -245,11 +247,19 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) p->thread.s[1] = (unsigned long)args->fn_arg; p->thread.ra = (unsigned long)ret_from_fork_kernel_asm; } else { + /* allocate new shadow stack if needed. In case of CLONE_VM we have to */ + ssp = shstk_alloc_thread_stack(p, args); + if (IS_ERR_VALUE(ssp)) + return PTR_ERR((void *)ssp); + *childregs = *(current_pt_regs()); /* Turn off status.VS */ riscv_v_vstate_off(childregs); if (usp) /* User fork */ childregs->sp = usp; + /* if needed, set new ssp */ + if (ssp) + set_active_shstk(p, ssp); if (clone_flags & CLONE_SETTLS) childregs->tp = tls; childregs->a0 = 0; /* Return value of fork() */ diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c index 0b3bbb41490a..ec3d78efd6f3 100644 --- a/arch/riscv/kernel/usercfi.c +++ b/arch/riscv/kernel/usercfi.c @@ -19,6 +19,41 @@
#define SHSTK_ENTRY_SIZE sizeof(void *)
+bool is_shstk_enabled(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.ubcfi_en; +} + +void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size) +{ + task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr; + task->thread_info.user_cfi_state.shdw_stk_size = size; +} + +unsigned long get_shstk_base(struct task_struct *task, unsigned long *size) +{ + if (size) + *size = task->thread_info.user_cfi_state.shdw_stk_size; + return task->thread_info.user_cfi_state.shdw_stk_base; +} + +void set_active_shstk(struct task_struct *task, unsigned long shstk_addr) +{ + task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr; +} + +/* + * If size is 0, then to be compatible with regular stack we want it to be as big as + * regular stack. Else PAGE_ALIGN it and return back + */ +static unsigned long calc_shstk_size(unsigned long size) +{ + if (size) + return PAGE_ALIGN(size); + + return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); +} + /* * Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen * implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to @@ -141,3 +176,88 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi
return allocate_shadow_stack(addr, aligned_size, size, set_tok); } + +/* + * This gets called during clone/clone3/fork. And is needed to allocate a shadow stack for + * cases where CLONE_VM is specified and thus a different stack is specified by user. We + * thus need a separate shadow stack too. How does separate shadow stack is specified by + * user is still being debated. Once that's settled, remove this part of the comment. + * This function simply returns 0 if shadow stack are not supported or if separate shadow + * stack allocation is not needed (like in case of !CLONE_VM) + */ +unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, + const struct kernel_clone_args *args) +{ + unsigned long addr, size; + + /* If shadow stack is not supported, return 0 */ + if (!cpu_supports_shadow_stack()) + return 0; + + /* + * If shadow stack is not enabled on the new thread, skip any + * switch to a new shadow stack. + */ + if (!is_shstk_enabled(tsk)) + return 0; + + /* + * For CLONE_VFORK the child will share the parents shadow stack. + * Set base = 0 and size = 0, this is special means to track this state + * so the freeing logic run for child knows to leave it alone. + */ + if (args->flags & CLONE_VFORK) { + set_shstk_base(tsk, 0, 0); + return 0; + } + + /* + * For !CLONE_VM the child will use a copy of the parents shadow + * stack. + */ + if (!(args->flags & CLONE_VM)) + return 0; + + /* + * reaching here means, CLONE_VM was specified and thus a separate shadow + * stack is needed for new cloned thread. Note: below allocation is happening + * using current mm. + */ + size = calc_shstk_size(args->stack_size); + addr = allocate_shadow_stack(0, size, 0, false); + if (IS_ERR_VALUE(addr)) + return addr; + + set_shstk_base(tsk, addr, size); + + return addr + size; +} + +void shstk_release(struct task_struct *tsk) +{ + unsigned long base = 0, size = 0; + /* If shadow stack is not supported or not enabled, nothing to release */ + if (!cpu_supports_shadow_stack() || !is_shstk_enabled(tsk)) + return; + + /* + * When fork() with CLONE_VM fails, the child (tsk) already has a + * shadow stack allocated, and exit_thread() calls this function to + * free it. In this case the parent (current) and the child share + * the same mm struct. Move forward only when they're same. + */ + if (!tsk->mm || tsk->mm != current->mm) + return; + + /* + * We know shadow stack is enabled but if base is NULL, then + * this task is not managing its own shadow stack (CLONE_VFORK). So + * skip freeing it. + */ + base = get_shstk_base(tsk, &size); + if (!base) + return; + + vm_munmap(base, size); + set_shstk_base(tsk, 0, 0); +}
Implement architecture agnostic prctls() interface for setting and getting shadow stack status.
prctls implemented are PR_GET_SHADOW_STACK_STATUS, PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS.
As part of PR_SET_SHADOW_STACK_STATUS/PR_GET_SHADOW_STACK_STATUS, only PR_SHADOW_STACK_ENABLE is implemented because RISCV allows each mode to write to their own shadow stack using `sspush` or `ssamoswap`.
PR_LOCK_SHADOW_STACK_STATUS locks current configuration of shadow stack enabling.
Reviewed-by: Zong Li zong.li@sifive.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/usercfi.h | 16 ++++++ arch/riscv/kernel/process.c | 8 +++ arch/riscv/kernel/usercfi.c | 110 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 134 insertions(+)
diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h index a16a5dff8b0e..d71093f414df 100644 --- a/arch/riscv/include/asm/usercfi.h +++ b/arch/riscv/include/asm/usercfi.h @@ -7,6 +7,7 @@
#ifndef __ASSEMBLER__ #include <linux/types.h> +#include <linux/prctl.h>
struct task_struct; struct kernel_clone_args; @@ -14,6 +15,7 @@ struct kernel_clone_args; #ifdef CONFIG_RISCV_USER_CFI struct cfi_state { unsigned long ubcfi_en : 1; /* Enable for backward cfi. */ + unsigned long ubcfi_locked : 1; unsigned long user_shdw_stk; /* Current user shadow stack pointer */ unsigned long shdw_stk_base; /* Base address of shadow stack */ unsigned long shdw_stk_size; /* size of shadow stack */ @@ -26,6 +28,12 @@ void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned unsigned long get_shstk_base(struct task_struct *task, unsigned long *size); void set_active_shstk(struct task_struct *task, unsigned long shstk_addr); bool is_shstk_enabled(struct task_struct *task); +bool is_shstk_locked(struct task_struct *task); +bool is_shstk_allocated(struct task_struct *task); +void set_shstk_lock(struct task_struct *task); +void set_shstk_status(struct task_struct *task, bool enable); + +#define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK (PR_SHADOW_STACK_ENABLE)
#else
@@ -41,6 +49,14 @@ bool is_shstk_enabled(struct task_struct *task);
#define is_shstk_enabled(task) false
+#define is_shstk_locked(task) false + +#define is_shstk_allocated(task) false + +#define set_shstk_lock(task) do {} while (0) + +#define set_shstk_status(task, enable) do {} while (0) + #endif /* CONFIG_RISCV_USER_CFI */
#endif /* __ASSEMBLER__ */ diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c index 72d35adc6e0e..a137d3483646 100644 --- a/arch/riscv/kernel/process.c +++ b/arch/riscv/kernel/process.c @@ -156,6 +156,14 @@ void start_thread(struct pt_regs *regs, unsigned long pc, regs->epc = pc; regs->sp = sp;
+ /* + * clear shadow stack state on exec. + * libc will set it later via prctl. + */ + set_shstk_status(current, false); + set_shstk_base(current, 0, 0); + set_active_shstk(current, 0); + #ifdef CONFIG_64BIT regs->status &= ~SR_UXL;
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c index ec3d78efd6f3..08620bdae696 100644 --- a/arch/riscv/kernel/usercfi.c +++ b/arch/riscv/kernel/usercfi.c @@ -24,6 +24,16 @@ bool is_shstk_enabled(struct task_struct *task) return task->thread_info.user_cfi_state.ubcfi_en; }
+bool is_shstk_allocated(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.shdw_stk_base; +} + +bool is_shstk_locked(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.ubcfi_locked; +} + void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size) { task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr; @@ -42,6 +52,26 @@ void set_active_shstk(struct task_struct *task, unsigned long shstk_addr) task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr; }
+void set_shstk_status(struct task_struct *task, bool enable) +{ + if (!cpu_supports_shadow_stack()) + return; + + task->thread_info.user_cfi_state.ubcfi_en = enable ? 1 : 0; + + if (enable) + task->thread.envcfg |= ENVCFG_SSE; + else + task->thread.envcfg &= ~ENVCFG_SSE; + + csr_write(CSR_ENVCFG, task->thread.envcfg); +} + +void set_shstk_lock(struct task_struct *task) +{ + task->thread_info.user_cfi_state.ubcfi_locked = 1; +} + /* * If size is 0, then to be compatible with regular stack we want it to be as big as * regular stack. Else PAGE_ALIGN it and return back @@ -261,3 +291,83 @@ void shstk_release(struct task_struct *tsk) vm_munmap(base, size); set_shstk_base(tsk, 0, 0); } + +int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status) +{ + unsigned long bcfi_status = 0; + + if (!cpu_supports_shadow_stack()) + return -EINVAL; + + /* this means shadow stack is enabled on the task */ + bcfi_status |= (is_shstk_enabled(t) ? PR_SHADOW_STACK_ENABLE : 0); + + return copy_to_user(status, &bcfi_status, sizeof(bcfi_status)) ? -EFAULT : 0; +} + +int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status) +{ + unsigned long size = 0, addr = 0; + bool enable_shstk = false; + + if (!cpu_supports_shadow_stack()) + return -EINVAL; + + /* Reject unknown flags */ + if (status & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK) + return -EINVAL; + + /* bcfi status is locked and further can't be modified by user */ + if (is_shstk_locked(t)) + return -EINVAL; + + enable_shstk = status & PR_SHADOW_STACK_ENABLE; + /* Request is to enable shadow stack and shadow stack is not enabled already */ + if (enable_shstk && !is_shstk_enabled(t)) { + /* shadow stack was allocated and enable request again + * no need to support such usecase and return EINVAL. + */ + if (is_shstk_allocated(t)) + return -EINVAL; + + size = calc_shstk_size(0); + addr = allocate_shadow_stack(0, size, 0, false); + if (IS_ERR_VALUE(addr)) + return -ENOMEM; + set_shstk_base(t, addr, size); + set_active_shstk(t, addr + size); + } + + /* + * If a request to disable shadow stack happens, let's go ahead and release it + * Although, if CLONE_VFORKed child did this, then in that case we will end up + * not releasing the shadow stack (because it might be needed in parent). Although + * we will disable it for VFORKed child. And if VFORKed child tries to enable again + * then in that case, it'll get entirely new shadow stack because following condition + * are true + * - shadow stack was not enabled for vforked child + * - shadow stack base was anyways pointing to 0 + * This shouldn't be a big issue because we want parent to have availability of shadow + * stack whenever VFORKed child releases resources via exit or exec but at the same + * time we want VFORKed child to break away and establish new shadow stack if it desires + * + */ + if (!enable_shstk) + shstk_release(t); + + set_shstk_status(t, enable_shstk); + return 0; +} + +int arch_lock_shadow_stack_status(struct task_struct *task, + unsigned long arg) +{ + /* If shtstk not supported or not enabled on task, nothing to lock here */ + if (!cpu_supports_shadow_stack() || + !is_shstk_enabled(task) || arg != 0) + return -EINVAL; + + set_shstk_lock(task); + + return 0; +}
Three architectures (x86, aarch64, riscv) have support for indirect branch tracking feature in a very similar fashion. On a very high level, indirect branch tracking is a CPU feature where CPU tracks branches which uses memory operand to perform control transfer in program. As part of this tracking on indirect branches, CPU goes in a state where it expects a landing pad instr on target and if not found then CPU raises some fault (architecture dependent)
x86 landing pad instr - `ENDBRANCH` arch64 landing pad instr - `BTI` riscv landing instr - `lpad`
Given that three major arches have support for indirect branch tracking, This patch makes `prctl` for indirect branch tracking arch agnostic.
To allow userspace to enable this feature for itself, following prtcls are defined: - PR_GET_INDIR_BR_LP_STATUS: Gets current configured status for indirect branch tracking. - PR_SET_INDIR_BR_LP_STATUS: Sets a configuration for indirect branch tracking. Following status options are allowed - PR_INDIR_BR_LP_ENABLE: Enables indirect branch tracking on user thread. - PR_INDIR_BR_LP_DISABLE; Disables indirect branch tracking on user thread. - PR_LOCK_INDIR_BR_LP_STATUS: Locks configured status for indirect branch tracking for user thread.
Reviewed-by: Mark Brown broonie@kernel.org Reviewed-by: Zong Li zong.li@sifive.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- include/linux/cpu.h | 4 ++++ include/uapi/linux/prctl.h | 27 +++++++++++++++++++++++++++ kernel/sys.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 61 insertions(+)
diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 487b3bf2e1ea..8239cd95a005 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -229,4 +229,8 @@ static inline bool cpu_attack_vector_mitigated(enum cpu_attack_vectors v) #define smt_mitigations SMT_MITIGATIONS_OFF #endif
+int arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status); +int arch_set_indir_br_lp_status(struct task_struct *t, unsigned long status); +int arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long status); + #endif /* _LINUX_CPU_H_ */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 51c4e8c82b1e..9b4afdc85099 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -386,4 +386,31 @@ struct prctl_mm_map { # define PR_FUTEX_HASH_SET_SLOTS 1 # define PR_FUTEX_HASH_GET_SLOTS 2
+/* + * Get the current indirect branch tracking configuration for the current + * thread, this will be the value configured via PR_SET_INDIR_BR_LP_STATUS. + */ +#define PR_GET_INDIR_BR_LP_STATUS 79 + +/* + * Set the indirect branch tracking configuration. PR_INDIR_BR_LP_ENABLE will + * enable cpu feature for user thread, to track all indirect branches and ensure + * they land on arch defined landing pad instruction. + * x86 - If enabled, an indirect branch must land on `ENDBRANCH` instruction. + * arch64 - If enabled, an indirect branch must land on `BTI` instruction. + * riscv - If enabled, an indirect branch must land on `lpad` instruction. + * PR_INDIR_BR_LP_DISABLE will disable feature for user thread and indirect + * branches will no more be tracked by cpu to land on arch defined landing pad + * instruction. + */ +#define PR_SET_INDIR_BR_LP_STATUS 80 +# define PR_INDIR_BR_LP_ENABLE (1UL << 0) + +/* + * Prevent further changes to the specified indirect branch tracking + * configuration. All bits may be locked via this call, including + * undefined bits. + */ +#define PR_LOCK_INDIR_BR_LP_STATUS 81 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/sys.c b/kernel/sys.c index 8b58eece4e58..9071422c1609 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2388,6 +2388,21 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st return -EINVAL; }
+int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status) +{ + return -EINVAL; +} + +int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long status) +{ + return -EINVAL; +} + +int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long status) +{ + return -EINVAL; +} + #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
static int prctl_set_vma(unsigned long opt, unsigned long addr, @@ -2868,6 +2883,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, case PR_FUTEX_HASH: error = futex_hash_prctl(arg2, arg3, arg4); break; + case PR_GET_INDIR_BR_LP_STATUS: + if (arg3 || arg4 || arg5) + return -EINVAL; + error = arch_get_indir_br_lp_status(me, (unsigned long __user *)arg2); + break; + case PR_SET_INDIR_BR_LP_STATUS: + if (arg3 || arg4 || arg5) + return -EINVAL; + error = arch_set_indir_br_lp_status(me, arg2); + break; + case PR_LOCK_INDIR_BR_LP_STATUS: + if (arg3 || arg4 || arg5) + return -EINVAL; + error = arch_lock_indir_br_lp_status(me, arg2); + break; default: trace_task_prctl_unknown(option, arg2, arg3, arg4, arg5); error = -EINVAL;
prctls implemented are: PR_SET_INDIR_BR_LP_STATUS, PR_GET_INDIR_BR_LP_STATUS and PR_LOCK_INDIR_BR_LP_STATUS
Reviewed-by: Zong Li zong.li@sifive.com Signed-off-by: Deepak Gupta debug@rivosinc.com --- arch/riscv/include/asm/usercfi.h | 14 +++++++ arch/riscv/kernel/entry.S | 4 ++ arch/riscv/kernel/process.c | 5 +++ arch/riscv/kernel/usercfi.c | 79 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 102 insertions(+)
diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h index d71093f414df..4501d741a609 100644 --- a/arch/riscv/include/asm/usercfi.h +++ b/arch/riscv/include/asm/usercfi.h @@ -16,6 +16,8 @@ struct kernel_clone_args; struct cfi_state { unsigned long ubcfi_en : 1; /* Enable for backward cfi. */ unsigned long ubcfi_locked : 1; + unsigned long ufcfi_en : 1; /* Enable for forward cfi. Note that ELP goes in sstatus */ + unsigned long ufcfi_locked : 1; unsigned long user_shdw_stk; /* Current user shadow stack pointer */ unsigned long shdw_stk_base; /* Base address of shadow stack */ unsigned long shdw_stk_size; /* size of shadow stack */ @@ -32,6 +34,10 @@ bool is_shstk_locked(struct task_struct *task); bool is_shstk_allocated(struct task_struct *task); void set_shstk_lock(struct task_struct *task); void set_shstk_status(struct task_struct *task, bool enable); +bool is_indir_lp_enabled(struct task_struct *task); +bool is_indir_lp_locked(struct task_struct *task); +void set_indir_lp_status(struct task_struct *task, bool enable); +void set_indir_lp_lock(struct task_struct *task);
#define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK (PR_SHADOW_STACK_ENABLE)
@@ -57,6 +63,14 @@ void set_shstk_status(struct task_struct *task, bool enable);
#define set_shstk_status(task, enable) do {} while (0)
+#define is_indir_lp_enabled(task) false + +#define is_indir_lp_locked(task) false + +#define set_indir_lp_status(task, enable) do {} while (0) + +#define set_indir_lp_lock(task) do {} while (0) + #endif /* CONFIG_RISCV_USER_CFI */
#endif /* __ASSEMBLER__ */ diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S index 8410850953d6..036a6ca7641f 100644 --- a/arch/riscv/kernel/entry.S +++ b/arch/riscv/kernel/entry.S @@ -174,6 +174,10 @@ SYM_CODE_START(handle_exception) * or vector in kernel space. */ li t0, SR_SUM | SR_FS_VS +#ifdef CONFIG_64BIT + li t1, SR_ELP + or t0, t0, t1 +#endif
REG_L s0, TASK_TI_USER_SP(tp) csrrc s1, CSR_STATUS, t0 diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c index a137d3483646..49f527e3acfd 100644 --- a/arch/riscv/kernel/process.c +++ b/arch/riscv/kernel/process.c @@ -163,6 +163,11 @@ void start_thread(struct pt_regs *regs, unsigned long pc, set_shstk_status(current, false); set_shstk_base(current, 0, 0); set_active_shstk(current, 0); + /* + * disable indirect branch tracking on exec. + * libc will enable it later via prctl. + */ + set_indir_lp_status(current, false);
#ifdef CONFIG_64BIT regs->status &= ~SR_UXL; diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c index 08620bdae696..2ebe789caa6b 100644 --- a/arch/riscv/kernel/usercfi.c +++ b/arch/riscv/kernel/usercfi.c @@ -72,6 +72,35 @@ void set_shstk_lock(struct task_struct *task) task->thread_info.user_cfi_state.ubcfi_locked = 1; }
+bool is_indir_lp_enabled(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.ufcfi_en; +} + +bool is_indir_lp_locked(struct task_struct *task) +{ + return task->thread_info.user_cfi_state.ufcfi_locked; +} + +void set_indir_lp_status(struct task_struct *task, bool enable) +{ + if (!cpu_supports_indirect_br_lp_instr()) + return; + + task->thread_info.user_cfi_state.ufcfi_en = enable ? 1 : 0; + + if (enable) + task->thread.envcfg |= ENVCFG_LPE; + else + task->thread.envcfg &= ~ENVCFG_LPE; + + csr_write(CSR_ENVCFG, task->thread.envcfg); +} + +void set_indir_lp_lock(struct task_struct *task) +{ + task->thread_info.user_cfi_state.ufcfi_locked = 1; +} /* * If size is 0, then to be compatible with regular stack we want it to be as big as * regular stack. Else PAGE_ALIGN it and return back @@ -371,3 +400,53 @@ int arch_lock_shadow_stack_status(struct task_struct *task,
return 0; } + +int arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status) +{ + unsigned long fcfi_status = 0; + + if (!cpu_supports_indirect_br_lp_instr()) + return -EINVAL; + + /* indirect branch tracking is enabled on the task or not */ + fcfi_status |= (is_indir_lp_enabled(t) ? PR_INDIR_BR_LP_ENABLE : 0); + + return copy_to_user(status, &fcfi_status, sizeof(fcfi_status)) ? -EFAULT : 0; +} + +int arch_set_indir_br_lp_status(struct task_struct *t, unsigned long status) +{ + bool enable_indir_lp = false; + + if (!cpu_supports_indirect_br_lp_instr()) + return -EINVAL; + + /* indirect branch tracking is locked and further can't be modified by user */ + if (is_indir_lp_locked(t)) + return -EINVAL; + + /* Reject unknown flags */ + if (status & ~PR_INDIR_BR_LP_ENABLE) + return -EINVAL; + + enable_indir_lp = (status & PR_INDIR_BR_LP_ENABLE); + set_indir_lp_status(t, enable_indir_lp); + + return 0; +} + +int arch_lock_indir_br_lp_status(struct task_struct *task, + unsigned long arg) +{ + /* + * If indirect branch tracking is not supported or not enabled on task, + * nothing to lock here + */ + if (!cpu_supports_indirect_br_lp_instr() || + !is_indir_lp_enabled(task) || arg != 0) + return -EINVAL; + + set_indir_lp_lock(task); + + return 0; +}
I am getting below error on my end for all recipients. That's the reason for partial delivery. I have to figure out issue and then I'll re-send.
""" The user you are trying to contact is receiving mail at a rate that\n 4.2.1 prevents additional messages from being delivered. Please resend your\n 4.2.1 message at a later time. If the user is able to receive mail at that\n 4.2.1 time, your message will be delivered. For more information, go to\n 4.2.1 https://support.google.com/mail/?p=ReceivingRate 98e67ed59e1d1-33dfb7f8310sm153460a91.5 - gsmtp') """
On Mon, Oct 20, 2025 at 01:22:29PM -0700, Deepak Gupta wrote:
v22: fixing build error due to -march=zicfiss being picked in gcc-13 and above but not actually doing any codegen or recognizing instruction for zicfiss. Change in v22 makes dependence on `-fcf-protection=full` compiler flag to ensure that toolchain has support and then only CONFIG_RISCV_USER_CFI will be visible in menuconfig.
v21: fixed build errors.
Basics and overview
Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffer from memory corruption issues which can be utilized by attackers to bend control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad` else cpu will raise software check exception (a new cpu exception cause code on riscv). Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if comparision above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel CET [3] and branch target identification (BTI) [4] on arm. Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel doesn't enable control flow integrity cpu extensions on binary by default. Instead exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (loader) to enumerate if all objects in its address space are compiled with shadow stack and landing pad support and accordingly enable the feature. Additionally if a subsequent `dlopen` happens on a library, user mode can take a decision again to disable the feature (if incoming library is not compiled with support) OR terminate the task (if user mode policy is strict to have all objects in address space to be compiled with control flow integirty cpu feature). prctl to enable shadow stack results in allocating shadow stack from virtual memory and activating for user address space. x86 and arm64 are also following same direction due to similar reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is part of virtual memory and is a writeable memory from kernel perspective (writeable via a restricted set of instructions aka shadow stack instructions) Thus kernel changes ensure that this memory is converted into read-only when fork/clone happens and COWed when fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space. It is useful to allocate shadow for different contexts managed by a single thread (green threads or contexts) risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that original execution can be resumed. Even though resume context is prepared by kernel, it is in user space memory and is subject to memory corruption and corruption bugs can be utilized by attacker in this race window to perform arbitrary sigreturn and eventually bypass cfi mechanism. Another issue is how to ensure that cfi related state on sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place it on shadow stack before signal delivery and places address of token in sigcontext structure. During sigreturn, kernel obtains address of token from sigcontext struture, reads token from shadow stack and validates it and only then allow sigreturn to succeed. Compatiblity issue is solved by adopting dynamic sigcontext management introduced for vector extension. This series re-factor the code little bit to allow future sigcontext management easy (as proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This optin is presented only if toolchain has shadow stack and landing pad support. And is on purpose guarded by toolchain support. Reason being that eventually vDSO also needs to be compiled in with shadow stack and landing pad support. vDSO compile patches are not included as of now because landing pad labeling scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to zicfilp and zicfiss, patch series adds documentation for `zicfilp` and `zicfiss` in following: Documentation/arch/riscv/zicfiss.rst Documentation/arch/riscv/zicfilp.rst
How to test this series
Toolchain
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev $ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static" $ make -j$(nproc)
Qemu
Get the lastest qemu $ cd qemu $ mkdir build $ cd build $ ../configure --target-list=riscv64-softmmu $ make -j$(nproc)
Opensbi
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi $ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
Running defconfig is fine. CFI is enabled by default if the toolchain supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
Running
Modify your qemu command to have: -bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin -cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
References
[1] - https://github.com/riscv/riscv-cfi [2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.co... [3] - https://lwn.net/Articles/889475/ [4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identifica... [5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-in... [6] - https://lwn.net/Articles/940403/
To: Thomas Gleixner tglx@linutronix.de To: Ingo Molnar mingo@redhat.com To: Borislav Petkov bp@alien8.de To: Dave Hansen dave.hansen@linux.intel.com To: x86@kernel.org To: H. Peter Anvin hpa@zytor.com To: Andrew Morton akpm@linux-foundation.org To: Liam R. Howlett Liam.Howlett@oracle.com To: Vlastimil Babka vbabka@suse.cz To: Lorenzo Stoakes lorenzo.stoakes@oracle.com To: Paul Walmsley paul.walmsley@sifive.com To: Palmer Dabbelt palmer@dabbelt.com To: Albert Ou aou@eecs.berkeley.edu To: Conor Dooley conor@kernel.org To: Rob Herring robh@kernel.org To: Krzysztof Kozlowski krzk+dt@kernel.org To: Arnd Bergmann arnd@arndb.de To: Christian Brauner brauner@kernel.org To: Peter Zijlstra peterz@infradead.org To: Oleg Nesterov oleg@redhat.com To: Eric Biederman ebiederm@xmission.com To: Kees Cook kees@kernel.org To: Jonathan Corbet corbet@lwn.net To: Shuah Khan shuah@kernel.org To: Jann Horn jannh@google.com To: Conor Dooley conor+dt@kernel.org To: Miguel Ojeda ojeda@kernel.org To: Alex Gaynor alex.gaynor@gmail.com To: Boqun Feng boqun.feng@gmail.com To: Gary Guo gary@garyguo.net To: Björn Roy Baron bjorn3_gh@protonmail.com To: Benno Lossin benno.lossin@proton.me To: Andreas Hindborg a.hindborg@kernel.org To: Alice Ryhl aliceryhl@google.com To: Trevor Gross tmgross@umich.edu Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-riscv@lists.infradead.org Cc: devicetree@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: alistair.francis@wdc.com Cc: richard.henderson@linaro.org Cc: jim.shu@sifive.com Cc: andybnac@gmail.com Cc: kito.cheng@sifive.com Cc: charlie@rivosinc.com Cc: atishp@rivosinc.com Cc: evan@rivosinc.com Cc: cleger@rivosinc.com Cc: alexghiti@rivosinc.com Cc: samitolvanen@google.com Cc: broonie@kernel.org Cc: rick.p.edgecombe@intel.com Cc: rust-for-linux@vger.kernel.org
changelog
v22:
- CONFIG_RISCV_USER_CFI was by default "n". With dual vdso support it is
default "y" (if toolchain supports it). Fixing build error due to "-march=zicfiss" being picked in gcc-13 partially. gcc-13 only recognizes the flag but not actually doing any codegen or recognizing instruction for zicfiss. Change in v22 makes dependence on `-fcf-protection=full` compiler flag to ensure that toolchain has support and then only CONFIG_RISCV_USER_CFI will be visible in menuconfig.
- picked up tags and some cosmetic changes in commit message for dual vdso
patch.
v21:
- Fixing build errors due to changes in arch/riscv/include/asm/vdso.h
Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI is selected.
v20:
- rebased on v6.18-rc1.
- Added two vDSO support. If `CONFIG_RISCV_USER_CFI` is selected
two vDSOs are compiled (one for hardware prior to RVA23 and one for RVA23 onwards). Kernel exposes RVA23 vDSO if hardware/cpu implements zimop else exposes existing vDSO to userspace.
- default selection for `CONFIG_RISCV_USER_CFI` is "Yes".
- replaced "__ASSEMBLY__" with "__ASSEMBLER__"
v19:
- riscv_nousercfi was `int`. changed it to unsigned long.
Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restore ssp back on return to usermode was being done
before `riscv_v_context_nesting_end` on trap exit path. If kernel shadow stack were enabled this would result in kernel operating on user shadow stack and panic (as I found in my testing of kcfi patch series). So fixed that.
v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in sstatus image in pt_regs
- vdso was missing shadow stack elf note for object files.
added that. Additional asm file for vdso needed the elf marker flag. toolchain should complain if `-fcf-protection=full` and marker is missing for object generated from asm file. Asked toolchain folks to fix this. Although no reason to gate the merge on that.
- Split up compile options for march and fcf-protection in vdso
Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu
Added `arch/riscv/configs/hardening.config` fragment which selects CONFIG_RISCV_USER_CFI
v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took below uprobe with fcfi v2 patch from Zong Li and squashed it with
"riscv/traps: Introduce software check exception and uprobe handling" https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/
v16:
- If FWFT is not implemented or returns error for shadow stack activation, then
no_usercfi is set to disable shadow stack. Although this should be picked up by extension validation and activation. Fixed this bug for zicfilp and zicfiss both. Thanks to Charlie Jenkins for reporting this.
- If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by
Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to
keep it off till we have more hardware availibility with RVA23 profile and zimop/zcmop implemented. Else this will start breaking people's workflow
- Includes the fix if "!RV64 and !SBI" then definitions for FWFT in
asm-offsets.c error.
v15:
- Toolchain has been updated to include `-fcf-protection` flag. This
exists for x86 as well. Updated kernel patches to compile vDSO and selftest to compile with `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind
CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that.
v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches
Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of thread_info block so that current situation
is not disturbed with respect to member fields of thread_info in single cacheline.
v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses
riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently.
updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace
kselftest.
- cosmetic and grammatical changes to documentation.
v12:
- It seems like I had accidently squashed arch agnostic indirect branch
tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU
support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.
v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally
selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to to `lpad 0`. v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch
is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to
validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure
we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches.
they are in parlmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up. see here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.c...
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware-...)
(Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit.
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
Enabling of control flow integrity for user programs is left to user runtime
This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
Changes in v22:
Changes in v21:
Changes in v20:
Changes in v19:
Changes in v18:
Changes in v17:
Changes in v16:
Changes in v15:
- changelog posted just below cover letter
- Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@riv...
Changes in v14:
- changelog posted just below cover letter
- Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@riv...
Changes in v13:
- changelog posted just below cover letter
- Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@riv...
Changes in v12:
- changelog posted just below cover letter
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@riv...
Changes in v11:
- changelog posted just below cover letter
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@riv...
Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext
Deepak Gupta (26): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv/mm: manufacture shadow stack pte riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs riscv/mm: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception and uprobe handling riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call arch/riscv: dual vdso creation logic and select vdso based on hw riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi
Jim Shu (1): arch/riscv: compile vdso with landing pad and shadow stack note
Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 22 + arch/riscv/Makefile | 8 +- arch/riscv/configs/hardening.config | 4 + arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + arch/riscv/include/asm/mman.h | 26 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 95 ++++ arch/riscv/include/asm/vdso.h | 13 +- arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/asm-offsets.c | 10 + arch/riscv/kernel/cpufeature.c | 27 + arch/riscv/kernel/entry.S | 38 ++ arch/riscv/kernel/head.S | 27 + arch/riscv/kernel/process.c | 27 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 148 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 54 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso.c | 7 + arch/riscv/kernel/vdso/Makefile | 40 +- arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +- arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/note.S | 3 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +- arch/riscv/kernel/vdso_cfi/Makefile | 25 + arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 16 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 16 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 62 files changed, 2475 insertions(+), 41 deletions(-)
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 --
- debug
On Mon, Oct 20, 2025 at 01:53:06PM -0700, Deepak Gupta wrote:
I am getting below error on my end for all recipients. That's the reason for partial delivery. I have to figure out issue and then I'll re-send.
""" The user you are trying to contact is receiving mail at a rate that\n 4.2.1 prevents additional messages from being delivered. Please resend your\n 4.2.1 message at a later time. If the user is able to receive mail at that\n 4.2.1 time, your message will be delivered. For more information, go to\n 4.2.1 https://support.google.com/mail/?p=ReceivingRate 98e67ed59e1d1-33dfb7f8310sm153460a91.5 - gsmtp') """
A MTA says "You're sending too much email too fast, please stop" and your response is to send everyone the entire patchset A SECOND TIME?!
Please stop.
--D
On Mon, Oct 20, 2025 at 01:22:29PM -0700, Deepak Gupta wrote:
v22: fixing build error due to -march=zicfiss being picked in gcc-13 and above but not actually doing any codegen or recognizing instruction for zicfiss. Change in v22 makes dependence on `-fcf-protection=full` compiler flag to ensure that toolchain has support and then only CONFIG_RISCV_USER_CFI will be visible in menuconfig.
v21: fixed build errors.
Basics and overview
Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffer from memory corruption issues which can be utilized by attackers to bend control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad` else cpu will raise software check exception (a new cpu exception cause code on riscv). Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if comparision above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel CET [3] and branch target identification (BTI) [4] on arm. Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel doesn't enable control flow integrity cpu extensions on binary by default. Instead exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (loader) to enumerate if all objects in its address space are compiled with shadow stack and landing pad support and accordingly enable the feature. Additionally if a subsequent `dlopen` happens on a library, user mode can take a decision again to disable the feature (if incoming library is not compiled with support) OR terminate the task (if user mode policy is strict to have all objects in address space to be compiled with control flow integirty cpu feature). prctl to enable shadow stack results in allocating shadow stack from virtual memory and activating for user address space. x86 and arm64 are also following same direction due to similar reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is part of virtual memory and is a writeable memory from kernel perspective (writeable via a restricted set of instructions aka shadow stack instructions) Thus kernel changes ensure that this memory is converted into read-only when fork/clone happens and COWed when fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space. It is useful to allocate shadow for different contexts managed by a single thread (green threads or contexts) risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that original execution can be resumed. Even though resume context is prepared by kernel, it is in user space memory and is subject to memory corruption and corruption bugs can be utilized by attacker in this race window to perform arbitrary sigreturn and eventually bypass cfi mechanism. Another issue is how to ensure that cfi related state on sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place it on shadow stack before signal delivery and places address of token in sigcontext structure. During sigreturn, kernel obtains address of token from sigcontext struture, reads token from shadow stack and validates it and only then allow sigreturn to succeed. Compatiblity issue is solved by adopting dynamic sigcontext management introduced for vector extension. This series re-factor the code little bit to allow future sigcontext management easy (as proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This optin is presented only if toolchain has shadow stack and landing pad support. And is on purpose guarded by toolchain support. Reason being that eventually vDSO also needs to be compiled in with shadow stack and landing pad support. vDSO compile patches are not included as of now because landing pad labeling scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to zicfilp and zicfiss, patch series adds documentation for `zicfilp` and `zicfiss` in following: Documentation/arch/riscv/zicfiss.rst Documentation/arch/riscv/zicfilp.rst
How to test this series
Toolchain
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev $ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static" $ make -j$(nproc)
Qemu
Get the lastest qemu $ cd qemu $ mkdir build $ cd build $ ../configure --target-list=riscv64-softmmu $ make -j$(nproc)
Opensbi
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi $ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
Running defconfig is fine. CFI is enabled by default if the toolchain supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
Running
Modify your qemu command to have: -bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin -cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
References
[1] - https://github.com/riscv/riscv-cfi [2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.co... [3] - https://lwn.net/Articles/889475/ [4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identifica... [5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-in... [6] - https://lwn.net/Articles/940403/
To: Thomas Gleixner tglx@linutronix.de To: Ingo Molnar mingo@redhat.com To: Borislav Petkov bp@alien8.de To: Dave Hansen dave.hansen@linux.intel.com To: x86@kernel.org To: H. Peter Anvin hpa@zytor.com To: Andrew Morton akpm@linux-foundation.org To: Liam R. Howlett Liam.Howlett@oracle.com To: Vlastimil Babka vbabka@suse.cz To: Lorenzo Stoakes lorenzo.stoakes@oracle.com To: Paul Walmsley paul.walmsley@sifive.com To: Palmer Dabbelt palmer@dabbelt.com To: Albert Ou aou@eecs.berkeley.edu To: Conor Dooley conor@kernel.org To: Rob Herring robh@kernel.org To: Krzysztof Kozlowski krzk+dt@kernel.org To: Arnd Bergmann arnd@arndb.de To: Christian Brauner brauner@kernel.org To: Peter Zijlstra peterz@infradead.org To: Oleg Nesterov oleg@redhat.com To: Eric Biederman ebiederm@xmission.com To: Kees Cook kees@kernel.org To: Jonathan Corbet corbet@lwn.net To: Shuah Khan shuah@kernel.org To: Jann Horn jannh@google.com To: Conor Dooley conor+dt@kernel.org To: Miguel Ojeda ojeda@kernel.org To: Alex Gaynor alex.gaynor@gmail.com To: Boqun Feng boqun.feng@gmail.com To: Gary Guo gary@garyguo.net To: Björn Roy Baron bjorn3_gh@protonmail.com To: Benno Lossin benno.lossin@proton.me To: Andreas Hindborg a.hindborg@kernel.org To: Alice Ryhl aliceryhl@google.com To: Trevor Gross tmgross@umich.edu Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-riscv@lists.infradead.org Cc: devicetree@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: alistair.francis@wdc.com Cc: richard.henderson@linaro.org Cc: jim.shu@sifive.com Cc: andybnac@gmail.com Cc: kito.cheng@sifive.com Cc: charlie@rivosinc.com Cc: atishp@rivosinc.com Cc: evan@rivosinc.com Cc: cleger@rivosinc.com Cc: alexghiti@rivosinc.com Cc: samitolvanen@google.com Cc: broonie@kernel.org Cc: rick.p.edgecombe@intel.com Cc: rust-for-linux@vger.kernel.org
changelog
v22:
- CONFIG_RISCV_USER_CFI was by default "n". With dual vdso support it is
default "y" (if toolchain supports it). Fixing build error due to "-march=zicfiss" being picked in gcc-13 partially. gcc-13 only recognizes the flag but not actually doing any codegen or recognizing instruction for zicfiss. Change in v22 makes dependence on `-fcf-protection=full` compiler flag to ensure that toolchain has support and then only CONFIG_RISCV_USER_CFI will be visible in menuconfig.
- picked up tags and some cosmetic changes in commit message for dual vdso
patch.
v21:
- Fixing build errors due to changes in arch/riscv/include/asm/vdso.h
Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI is selected.
v20:
- rebased on v6.18-rc1.
- Added two vDSO support. If `CONFIG_RISCV_USER_CFI` is selected
two vDSOs are compiled (one for hardware prior to RVA23 and one for RVA23 onwards). Kernel exposes RVA23 vDSO if hardware/cpu implements zimop else exposes existing vDSO to userspace.
- default selection for `CONFIG_RISCV_USER_CFI` is "Yes".
- replaced "__ASSEMBLY__" with "__ASSEMBLER__"
v19:
- riscv_nousercfi was `int`. changed it to unsigned long.
Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restore ssp back on return to usermode was being done
before `riscv_v_context_nesting_end` on trap exit path. If kernel shadow stack were enabled this would result in kernel operating on user shadow stack and panic (as I found in my testing of kcfi patch series). So fixed that.
v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in sstatus image in pt_regs
- vdso was missing shadow stack elf note for object files.
added that. Additional asm file for vdso needed the elf marker flag. toolchain should complain if `-fcf-protection=full` and marker is missing for object generated from asm file. Asked toolchain folks to fix this. Although no reason to gate the merge on that.
- Split up compile options for march and fcf-protection in vdso
Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu
Added `arch/riscv/configs/hardening.config` fragment which selects CONFIG_RISCV_USER_CFI
v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took below uprobe with fcfi v2 patch from Zong Li and squashed it with
"riscv/traps: Introduce software check exception and uprobe handling" https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/
v16:
- If FWFT is not implemented or returns error for shadow stack activation, then
no_usercfi is set to disable shadow stack. Although this should be picked up by extension validation and activation. Fixed this bug for zicfilp and zicfiss both. Thanks to Charlie Jenkins for reporting this.
- If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by
Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to
keep it off till we have more hardware availibility with RVA23 profile and zimop/zcmop implemented. Else this will start breaking people's workflow
- Includes the fix if "!RV64 and !SBI" then definitions for FWFT in
asm-offsets.c error.
v15:
- Toolchain has been updated to include `-fcf-protection` flag. This
exists for x86 as well. Updated kernel patches to compile vDSO and selftest to compile with `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind
CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that.
v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches
Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of thread_info block so that current situation
is not disturbed with respect to member fields of thread_info in single cacheline.
v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses
riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently.
updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace
kselftest.
- cosmetic and grammatical changes to documentation.
v12:
- It seems like I had accidently squashed arch agnostic indirect branch
tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU
support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.
v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally
selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to to `lpad 0`. v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch
is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to
validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure
we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches.
they are in parlmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up. see here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.c...
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware-...)
(Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit.
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
Enabling of control flow integrity for user programs is left to user runtime
This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
Changes in v22:
Changes in v21:
Changes in v20:
Changes in v19:
Changes in v18:
Changes in v17:
Changes in v16:
Changes in v15:
- changelog posted just below cover letter
- Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@riv...
Changes in v14:
- changelog posted just below cover letter
- Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@riv...
Changes in v13:
- changelog posted just below cover letter
- Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@riv...
Changes in v12:
- changelog posted just below cover letter
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@riv...
Changes in v11:
- changelog posted just below cover letter
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@riv...
Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext
Deepak Gupta (26): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv/mm: manufacture shadow stack pte riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs riscv/mm: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception and uprobe handling riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call arch/riscv: dual vdso creation logic and select vdso based on hw riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi
Jim Shu (1): arch/riscv: compile vdso with landing pad and shadow stack note
Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 22 + arch/riscv/Makefile | 8 +- arch/riscv/configs/hardening.config | 4 + arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + arch/riscv/include/asm/mman.h | 26 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 95 ++++ arch/riscv/include/asm/vdso.h | 13 +- arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/asm-offsets.c | 10 + arch/riscv/kernel/cpufeature.c | 27 + arch/riscv/kernel/entry.S | 38 ++ arch/riscv/kernel/head.S | 27 + arch/riscv/kernel/process.c | 27 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 148 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 54 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso.c | 7 + arch/riscv/kernel/vdso/Makefile | 40 +- arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +- arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/note.S | 3 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +- arch/riscv/kernel/vdso_cfi/Makefile | 25 + arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 16 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 16 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 62 files changed, 2475 insertions(+), 41 deletions(-)
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 --
- debug
linux-kselftest-mirror@lists.linaro.org