From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find DUALPI2 iproute2 patch v12.
For more details of DualPI2, please refer IETF RFC9332
(https://datatracker.ietf.org/doc/html/rfc9332).
Best Regards,
Chia-Yu
---
v12 (04-Aug-2025)
- Split into 3 patches: one move get_float(), one add get_float_min_max(), one for dualpi2 (David Ahern <dsahern(a)kernel.org>)
- Repalce matches() with strcmp() within get_packets() (David Ahern <dsahern(a)kernel.org>)
- Apply reverse xmas tree listing of variables (David Ahern <dsahern(a)kernel.org>)
v11 (18-Jul-2025)
- Replace TCA_DUALPI2 prefix with TC_DUALPI2 prefix for enums (Jakub Kicinski <kuba(a)kernel.org>)
v10 (02-Jul-2025)
- Replace STEP_THRESH and STEP_PACKETS w/ STEP_THRESH_PKTS and STEP_THRESH_US of net-next patch (Jakub Kicinski <kuba(a)kernel.org>)
v9 (13-Jun-2025)
- Fix space issue and typos (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Change 'rtt_typical' to 'typical_rtt' in tc/q_dualpi2.c (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Add the num of enum used by DualPI2 in pkt_sched.h
v8 (09-May-2025)
- Update pkt_sched.h with the one in nex-next
- Correct a typo in the comment within pkt_sched.h (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update manual content in man/man8/tc-dualpi2.8 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update tc/q_dualpi2.c to fix missing blank lines and add missing case (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
v7 (05-May-2025)
- Align pkt_sched.h with the v14 version of net-next due to spec modification in tc.yaml
- Reorganize dualpi2_print_opt() to match the order in tc.yaml
- Remove credit-queue in PRINT_JSON
v6 (26-Apr-2025)
- Update JSON file output due to spec modification in tc.yaml of net-next
v5 (25-Mar-2025)
- Use matches() to replace current strcmp() (Stephen Hemminger <stephen(a)networkplumber.org>)
- Use general parse_percent() for handling scaled percentage values (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 stats (Stephen Hemminger <stephen(a)networkplumber.org>)
v4 (16-Mar-2025)
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step marking.
v3 (21-Feb-2025)
- Add memlimit to the dualpi2 attribute, and add memory_used, max_memory_used, and memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update the manual to align with the latest implementation and clarify the queue naming and default unit
- Use common "get_scaled_alpha_beta" and clean print_opt for Dualpi2
v2 (23-Oct-2024)
- Rename get_float in dualpi2 to get_float_min_max in utils.c
- Move get_float from iplink_can.c in utils.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 (Stephen Hemminger <stephen(a)networkplumber.org>)
---
Chia-Yu Chang (3):
Move get_float() from ip/iplink_can.c to lib/utils.c
Add get_float_min_max() in lib/utils.c
tc: add dualpi2 scheduler module
bash-completion/tc | 11 +-
include/utils.h | 2 +
ip/iplink_can.c | 14 --
lib/utils.c | 30 +++
man/man8/tc-dualpi2.8 | 249 ++++++++++++++++++++
tc/Makefile | 1 +
tc/q_dualpi2.c | 535 ++++++++++++++++++++++++++++++++++++++++++
7 files changed, 827 insertions(+), 15 deletions(-)
create mode 100644 man/man8/tc-dualpi2.8
create mode 100644 tc/q_dualpi2.c
--
2.34.1
Basics and overview
===================
Software with larger attack surfaces (e.g. network facing apps like databases,
browsers or apps relying on browser runtimes) suffer from memory corruption
issues which can be utilized by attackers to bend control flow of the program
to eventually gain control (by making their payload executable). Attackers are
able to perform such attacks by leveraging call-sites which rely on indirect
calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect
calls must land on a landing pad instruction `lpad` else cpu will raise software
check exception (a new cpu exception cause code on riscv).
Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if comparision above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel
CET [3] and branch target identification (BTI) [4] on arm.
Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control
stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
================================================
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are
being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel
doesn't enable control flow integrity cpu extensions on binary by default.
Instead exposes a prctl interface to enable, disable and lock the shadow stack
or landing pad feature for a task. This allows userspace (loader) to enumerate
if all objects in its address space are compiled with shadow stack and landing
pad support and accordingly enable the feature. Additionally if a subsequent
`dlopen` happens on a library, user mode can take a decision again to disable
the feature (if incoming library is not compiled with support) OR terminate the
task (if user mode policy is strict to have all objects in address space to be
compiled with control flow integirty cpu feature). prctl to enable shadow stack
results in allocating shadow stack from virtual memory and activating for user
address space. x86 and arm64 are also following same direction due to similar
reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is
part of virtual memory and is a writeable memory from kernel perspective
(writeable via a restricted set of instructions aka shadow stack instructions)
Thus kernel changes ensure that this memory is converted into read-only when
fork/clone happens and COWed when fault is taken due to sspush, sspopchk or
ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled,
kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly
map shadow stack memory in its address space. It is useful to allocate shadow
for different contexts managed by a single thread (green threads or contexts)
risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control
flow diversion to deliver the signal and eventually expects userspace to issue
sigreturn so that original execution can be resumed. Even though resume context
is prepared by kernel, it is in user space memory and is subject to memory
corruption and corruption bugs can be utilized by attacker in this race window
to perform arbitrary sigreturn and eventually bypass cfi mechanism.
Another issue is how to ensure that cfi related state on sigcontext area is not
trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place
it on shadow stack before signal delivery and places address of token in
sigcontext structure. During sigreturn, kernel obtains address of token from
sigcontext struture, reads token from shadow stack and validates it and only
then allow sigreturn to succeed. Compatiblity issue is solved by adopting
dynamic sigcontext management introduced for vector extension. This series
re-factor the code little bit to allow future sigcontext management easy (as
proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this
config option picks the kernel support for user control flow integrity. This
optin is presented only if toolchain has shadow stack and landing pad support.
And is on purpose guarded by toolchain support. Reason being that eventually
vDSO also needs to be compiled in with shadow stack and landing pad support.
vDSO compile patches are not included as of now because landing pad labeling
scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to
zicfilp and zicfiss, patch series adds documentation for
`zicfilp` and `zicfiss` in following:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst
How to test this series
=======================
Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)
Qemu
----
Get the lastest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)
Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain
supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
In case you're building your own rootfs using toolchain, please make sure you
pick following patch to ensure that vDSO compiled with lpad and shadow stack.
"arch/riscv: compile vdso with landing pad"
Branch where above patch can be picked
https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1
Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
vDSO related Opens (in the flux)
=================================
I am listing these opens for laying out plan and what to expect in future
patch sets. And of course for the sake of discussion.
Shadow stack and landing pad enabling in vDSO
----------------------------------------------
vDSO must have shadow stack and landing pad support compiled in for task
to have shadow stack and landing pad support. This patch series doesn't
enable that (yet). Enabling shadow stack support in vDSO should be
straight forward (intend to do that in next versions of patch set). Enabling
landing pad support in vDSO requires some collaboration with toolchain folks
to follow a single label scheme for all object binaries. This is necessary to
ensure that all indirect call-sites are setting correct label and target landing
pads are decorated with same label scheme.
How many vDSOs
---------------
Shadow stack instructions are carved out of zimop (may be operations) and if CPU
doesn't implement zimop, they're illegal instructions. Kernel could be running on
a CPU which may or may not implement zimop. And thus kernel will have to carry 2
different vDSOs and expose the appropriate one depending on whether CPU implements
zimop or not.
References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/
To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Paul Walmsley <paul.walmsley(a)sifive.com>
To: Palmer Dabbelt <palmer(a)dabbelt.com>
To: Albert Ou <aou(a)eecs.berkeley.edu>
To: Conor Dooley <conor(a)kernel.org>
To: Rob Herring <robh(a)kernel.org>
To: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Christian Brauner <brauner(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Oleg Nesterov <oleg(a)redhat.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Kees Cook <kees(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
To: Jann Horn <jannh(a)google.com>
To: Conor Dooley <conor+dt(a)kernel.org>
To: Miguel Ojeda <ojeda(a)kernel.org>
To: Alex Gaynor <alex.gaynor(a)gmail.com>
To: Boqun Feng <boqun.feng(a)gmail.com>
To: Gary Guo <gary(a)garyguo.net>
To: Björn Roy Baron <bjorn3_gh(a)protonmail.com>
To: Benno Lossin <benno.lossin(a)proton.me>
To: Andreas Hindborg <a.hindborg(a)kernel.org>
To: Alice Ryhl <aliceryhl(a)google.com>
To: Trevor Gross <tmgross(a)umich.edu>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-riscv(a)lists.infradead.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: alistair.francis(a)wdc.com
Cc: richard.henderson(a)linaro.org
Cc: jim.shu(a)sifive.com
Cc: andybnac(a)gmail.com
Cc: kito.cheng(a)sifive.com
Cc: charlie(a)rivosinc.com
Cc: atishp(a)rivosinc.com
Cc: evan(a)rivosinc.com
Cc: cleger(a)rivosinc.com
Cc: alexghiti(a)rivosinc.com
Cc: samitolvanen(a)google.com
Cc: broonie(a)kernel.org
Cc: rick.p.edgecombe(a)intel.com
Cc: rust-for-linux(a)vger.kernel.org
changelog
---------
v19:
- riscv_nousercfi was `int`. changed it to unsigned long.
Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restore ssp back on return to usermode was being done
before `riscv_v_context_nesting_end` on trap exit path.
If kernel shadow stack were enabled this would result in
kernel operating on user shadow stack and panic (as I found
in my testing of kcfi patch series). So fixed that.
v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in sstatus image in pt_regs
- vdso was missing shadow stack elf note for object files.
added that. Additional asm file for vdso needed the elf marker
flag. toolchain should complain if `-fcf-protection=full` and
marker is missing for object generated from asm file. Asked
toolchain folks to fix this. Although no reason to gate the merge
on that.
- Split up compile options for march and fcf-protection in vdso
Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu
Added `arch/riscv/configs/hardening.config` fragment which selects
CONFIG_RISCV_USER_CFI
v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took below uprobe with fcfi v2 patch from Zong Li and squashed it with
"riscv/traps: Introduce software check exception and uprobe handling"
https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/
v16:
- If FWFT is not implemented or returns error for shadow stack activation, then
no_usercfi is set to disable shadow stack. Although this should be picked up
by extension validation and activation. Fixed this bug for zicfilp and zicfiss
both. Thanks to Charlie Jenkins for reporting this.
- If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by
Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to
keep it off till we have more hardware availibility with RVA23 profile and
zimop/zcmop implemented. Else this will start breaking people's workflow
- Includes the fix if "!RV64 and !SBI" then definitions for FWFT in
asm-offsets.c error.
v15:
- Toolchain has been updated to include `-fcf-protection` flag. This
exists for x86 as well. Updated kernel patches to compile vDSO and
selftest to compile with `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind
CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that.
v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches
Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of thread_info block so that current situation
is not disturbed with respect to member fields of thread_info in single
cacheline.
v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses
riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently.
updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace
kselftest.
- cosmetic and grammatical changes to documentation.
v12:
- It seems like I had accidently squashed arch agnostic indirect branch
tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU
support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.
v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally
selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to
to `lpad 0`.
v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch
is not that interesting to this patch series for risc-v. There are instances in
arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch
to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to
validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure
we add single vdso object with cfi enabled. But a vdso object with scheme of
zero labeled landing pad is least common denominator and should work with all
objects of zero labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches.
they are in parlmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up.
see here for more context
https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should
be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context
management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT
(https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
(Note: I had an issue in my workflow due to which version number wasn't
picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked on per task basis, even though CPU didn't implement it. Fixed in
this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is
in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore
Selftest binary anyways need to be compiled with cfi enabled compiler which
will make sure that landing pad and shadow stack are enabled. Thus removed
separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
---
Changes in v19:
- Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@ri…
Changes in v18:
- Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@ri…
Changes in v17:
- Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri…
Changes in v16:
- Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri…
Changes in v15:
- changelog posted just below cover letter
- Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri…
Changes in v14:
- changelog posted just below cover letter
- Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri…
Changes in v13:
- changelog posted just below cover letter
- Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri…
Changes in v12:
- changelog posted just below cover letter
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri…
Changes in v11:
- changelog posted just below cover letter
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri…
---
Andy Chiu (1):
riscv: signal: abstract header saving for setup_sigcontext
Deepak Gupta (25):
mm: VM_SHADOW_STACK definition for riscv
dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
riscv: zicfiss / zicfilp enumeration
riscv: zicfiss / zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv/mm: manufacture shadow stack pte
riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs
riscv/mm: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
riscv: Implements arch agnostic shadow stack prctls
prctl: arch-agnostic prctl for indirect branch tracking
riscv: Implements arch agnostic indirect branch tracking prctls
riscv/traps: Introduce software check exception and uprobe handling
riscv/signal: save and restore of shadow stack for signal
riscv/kernel: update __show_regs to print shadow stack register
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
riscv: kernel command line option to opt out of user cfi
riscv: enable kernel access to shadow stack memory via FWFT sbi call
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Jim Shu (1):
arch/riscv: compile vdso with landing pad and shadow stack note
Documentation/admin-guide/kernel-parameters.txt | 8 +
Documentation/arch/riscv/index.rst | 2 +
Documentation/arch/riscv/zicfilp.rst | 115 +++++
Documentation/arch/riscv/zicfiss.rst | 179 +++++++
.../devicetree/bindings/riscv/extensions.yaml | 14 +
arch/riscv/Kconfig | 21 +
arch/riscv/Makefile | 5 +-
arch/riscv/configs/hardening.config | 4 +
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/assembler.h | 44 ++
arch/riscv/include/asm/cpufeature.h | 12 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/entry-common.h | 2 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 26 +
arch/riscv/include/asm/mmu_context.h | 7 +
arch/riscv/include/asm/pgtable.h | 30 +-
arch/riscv/include/asm/processor.h | 1 +
arch/riscv/include/asm/thread_info.h | 3 +
arch/riscv/include/asm/usercfi.h | 95 ++++
arch/riscv/include/asm/vector.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 2 +
arch/riscv/include/uapi/asm/ptrace.h | 34 ++
arch/riscv/include/uapi/asm/sigcontext.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 10 +
arch/riscv/kernel/cpufeature.c | 27 +
arch/riscv/kernel/entry.S | 38 ++
arch/riscv/kernel/head.S | 27 +
arch/riscv/kernel/process.c | 27 +-
arch/riscv/kernel/ptrace.c | 95 ++++
arch/riscv/kernel/signal.c | 148 +++++-
arch/riscv/kernel/sys_hwprobe.c | 2 +
arch/riscv/kernel/sys_riscv.c | 10 +
arch/riscv/kernel/traps.c | 54 ++
arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++
arch/riscv/kernel/vdso/Makefile | 11 +-
arch/riscv/kernel/vdso/flush_icache.S | 4 +
arch/riscv/kernel/vdso/getcpu.S | 4 +
arch/riscv/kernel/vdso/rt_sigreturn.S | 4 +
arch/riscv/kernel/vdso/sys_hwprobe.S | 4 +
arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +-
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 16 +
include/linux/cpu.h | 4 +
include/linux/mm.h | 7 +
include/uapi/linux/elf.h | 2 +
include/uapi/linux/prctl.h | 27 +
kernel/sys.c | 30 ++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/.gitignore | 3 +
tools/testing/selftests/riscv/cfi/Makefile | 16 +
tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++
tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++
tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++
tools/testing/selftests/riscv/cfi/shadowstack.h | 27 +
56 files changed, 2389 insertions(+), 30 deletions(-)
---
base-commit: a2a05801de77ca5122fc34e3eb84d6359ef70389
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2
--
- debug
The "nohz_full" and "rcu_nocbs" boot command parameters can be used to
remove a lot of kernel overhead on a specific set of isolated CPUs which
can be used to run some latency/bandwidth sensitive workloads with as
little kernel disturbance/noise as possible. The problem with this mode
of operation is the fact that it is a static configuration which cannot
be changed after boot to adjust for changes in application loading.
There is always a desire to enable runtime modification of the number
of isolated CPUs that can be dedicated to this type of demanding
workloads. This patchset is an attempt to do just that with an amount of
CPU isolation close to what can be done with the nohz_full and rcu_nocbs
boot kernel parameters.
This patch series provides the ability to change the set of housekeeping
CPUs at run time via the cpuset isolated partition functionality.
Currently, the cpuset isolated partition is able to disable scheduler
load balancing and the CPU affinity of the unbound workqueue to avoid the
isolated CPUs. This patch series will extend that with other kernel noises
associated with the nohz_full boot command line parameter which has the
following sub-categories:
- tick
- timer
- RCU
- MISC
- WQ
- kthread
The rcu_nocbs is actually a subset of nohz_full focusing just on the
RCU part of the kernel noises. The WQ part has already been handled by
the current cpuset code.
This series focuses on the tick and RCU part of the kernel noises by
actively changing their internal data structures to track changes in
the list of isolated CPUs used by cpuset isolated partitions.
The dynamic update of the lists of housekeeping CPUs at run time will
also have impact on the other part of the kernel noises that reference
the lists of housekeeping CPUs at run time.
The pending patch series on timer migration[1], when properly integrated
will support the timer part too.
The CPU hotplug functionality of the Linux kernel is used to facilitate
the runtime change of the nohz_full isolated CPUs with minimal code
changes. The CPUs that need to be switched from non-isolated to
isolated or vice versa will be brought offline first, making the
necessary changes and then brought back online afterward.
The use of CPU hotplug, however, does have a slight drawback of
freezing all the other CPUs in part of the offlining process using
the stop machine feature of the kernel. That will cause a noticeable
latency spikes in other running applications which may be significant
to sensitive applications running on isolated CPUs in other isolated
partitions at the time. Hopefully we can find a way to solve this
problem in the future.
One possible workaround for this is to reserve a set of nohz_full
isolated CPUs at boot time using the nohz_full boot command parameter.
The bringing of those nohz_full reserved CPUs into and out of isolated
partitions will not invoke CPU hotplug and hence will not cause
unexpected latency spikes. These reserved CPUs will only be needed
if there are other existing isolated partitions running critical
applications at the time when an isolated partition needs to be created.
Patches 1-4 updates the CPU isolation code at kernel/sched/isolation.c
to enable dynamic update of the lists of housekeeping CPUs.
Patch 5 introduces a new cpuhp_offline_cb() API for shutting down the
given set of CPUs, running the given callback method and then bringing
those CPUs back online again. This new API will block any incoming
hotplug events from interfering this operation.
Patches 6-9 updates the cpuset partition code to use the new cpuhp API
to shut down the affect CPUs, making changes to the housekeeping
cpumasks and then bring those CPUs online afterward.
Patch 10 works around an issue in the DL server code that block the
hotplug operation under certain configurations.
Patch 11-14 updates the timer tick and related code to enable proper
updates to the set of CPUs requiring nohz_full dynticks support.
Patch 15 enables runtime modification to the set of isolated CPUs
requiring RCU NO-CB CPU support with minor changes to the RCU code.
Patches 16-18 includes other miscellaneous updates to cpuset code and
documentation.
This patch series is applied on top of some other cpuset patches[1]
posted upstream recently.
[1] https://lore.kernel.org/lkml/20250806093855.86469-1-gmonaco@redhat.com/
[2] https://lore.kernel.org/lkml/20250806172430.1155133-1-longman@redhat.com/
Waiman Long (18):
sched/isolation: Enable runtime update of housekeeping cpumasks
sched/isolation: Call sched_tick_offload_init() when
HK_FLAG_KERNEL_NOISE is first set
sched/isolation: Use RCU to delay successive housekeeping cpumask
updates
sched/isolation: Add a debugfs file to dump housekeeping cpumasks
cpu/hotplug: Add a new cpuhp_offline_cb() API
cgroup/cpuset: Introduce a new top level isolcpus_update_mutex
cgroup/cpuset: Allow overwriting HK_TYPE_DOMAIN housekeeping cpumask
cgroup/cpuset: Use CPU hotplug to enable runtime nohz_full
modification
cgroup/cpuset: Revert "Include isolated cpuset CPUs in
cpu_is_isolated() check"
sched/core: Ignore DL BW deactivation error if in
cpuhp_offline_cb_mode
tick/nohz: Make nohz_full parameter optional
tick/nohz: Introduce tick_nohz_full_update_cpus() to update
tick_nohz_full_mask
tick/nohz: Allow runtime changes in full dynticks CPUs
tick: Pass timer tick job to an online HK CPU in tick_cpu_dying()
cgroup/cpuset: Enable RCU NO-CB CPU offloading of newly isolated CPUs
cgroup/cpuset: Don't set have_boot_nohz_full without any boot time
nohz_full CPU
cgroup/cpuset: Documentation updates & don't use CPU 0 for isolated
partition
cgroup/cpuset: Add pr_debug() statements for cpuhp_offline_cb() call
Documentation/admin-guide/cgroup-v2.rst | 33 +-
.../admin-guide/kernel-parameters.txt | 19 +-
include/linux/context_tracking.h | 8 +-
include/linux/cpuhplock.h | 9 +
include/linux/cpuset.h | 6 -
include/linux/rcupdate.h | 2 +
include/linux/sched/isolation.h | 9 +-
include/linux/tick.h | 2 +
kernel/cgroup/cpuset.c | 344 ++++++++++++------
kernel/context_tracking.c | 21 +-
kernel/cpu.c | 47 +++
kernel/rcu/tree_nocb.h | 7 +-
kernel/sched/core.c | 8 +-
kernel/sched/debug.c | 32 ++
kernel/sched/isolation.c | 151 +++++++-
kernel/sched/sched.h | 2 +-
kernel/time/tick-common.c | 15 +-
kernel/time/tick-sched.c | 24 +-
.../selftests/cgroup/test_cpuset_prs.sh | 15 +-
19 files changed, 583 insertions(+), 171 deletions(-)
--
2.50.0
Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs mount is
auto-selected based on the mounting process's active pidns, and the
pidns itself is basically hidden once the mount has been constructed).
/* pidns mount option for procfs */
This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns of a
procfs mount as desired. Examples include:
* In order to bypass the mnt_too_revealing() check, Kubernetes creates
a procfs mount from an empty pidns so that user namespaced containers
can be nested (without this, the nested containers would fail to
mount procfs). But this requires forking off a helper process because
you cannot just one-shot this using mount(2).
* Container runtimes in general need to fork into a container before
configuring its mounts, which can lead to security issues in the case
of shared-pidns containers (a privileged process in the pidns can
interact with your container runtime process). While
SUID_DUMP_DISABLE and user namespaces make this less of an issue, the
strict need for this due to a minor uAPI wart is kind of unfortunate.
Things would be much easier if there was a way for userspace to just
specify the pidns they want. Patch 1 implements a new "pidns" argument
which can be set using fsconfig(2):
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
The initial security model I have in this RFC is to be as conservative
as possible and just mirror the security model for setns(2) -- which
means that you can only set pidns=... to pid namespaces that your
current pid namespace is a direct ancestor of and you have CAP_SYS_ADMIN
privileges over the pid namespace. This fulfils the requirements of
container runtimes, but I suspect that this may be too strict for some
usecases.
The pidns argument is not displayed in mountinfo -- it's not clear to me
what value it would make sense to show (maybe we could just use ns_dname
to provide an identifier for the namespace, but this number would be
fairly useless to userspace). I'm open to suggestions. Note that
PROCFS_GET_PID_NAMESPACE (see below) does at least let userspace get
information about this outside of mountinfo.
Note that you cannot change the pidns of an already-created procfs
instance. The primary reason is that allowing this to be changed would
require RCU-protecting proc_pid_ns(sb) and thus auditing all of
fs/proc/* and some of the users in fs/* to make sure they wouldn't UAF
the pid namespace. Since creating procfs instances is very cheap, it
seems unnecessary to overcomplicate this upfront. Trying to reconfigure
procfs this way errors out with -EBUSY.
/* ioctl(PROCFS_GET_PID_NAMESPACE) */
In addition, being able to figure out what pid namespace is being used
by a procfs mount is quite useful when you have an administrative
process (such as a container runtime) which wants to figure out the
correct way of mapping PIDs between its own namespace and the namespace
for procfs (using NS_GET_{PID,TGID}_{IN,FROM}_PIDNS). There are
alternative ways to do this, but they all rely on ancillary information
that third-party libraries and tools do not necessarily have access to.
To make this easier, add a new ioctl (PROCFS_GET_PID_NAMESPACE) which
can be used to get a reference to the pidns that a procfs is using.
Rather than copying the (fairly strict) security model for setns(2),
apply a slightly looser model to better match what userspace can already
do:
* Make the ioctl only valid on the root (meaning that a process without
access to the procfs root -- such as only having an fd to a procfs
file or some open_tree(2)-like subset -- cannot use this API). This
means that the process already has some level of access to the
/proc/$pid directories.
* If the calling process is in an ancestor pidns, then they can already
create pidfd for processes inside the pidns, which is morally
equivalent to a pidns file descriptor according to setns(2). So it
seems reasonable to just allow it in this case. (The justification
for this model was suggested by Christian.)
* If the process has access to /proc/1/ns/pid already (i.e. has
ptrace-read access to the pidns pid1), then this ioctl is equivalent
to just opening a handle to it that way.
Ideally we would check for ptrace-read access against all processes
in the pidns (which is very likely to be true for at least one
process, as SUID_DUMP_DISABLE is cleared on exec(2) and is rarely set
by most programs), but this would obviously not scale.
I'm open to suggestions for whether we need to make this stricter (or
possibly allow more cases).
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v4:
- Remove unneeded EXPORT_SYMBOL_GPL. [Christian Brauner]
- Return -EOPNOTSUPP for new APIs for CONFIG_PID_NS=n rather than
pretending they don't exist entirely. [Christian Brauner]
- PROCFS_IOCTL_MAGIC conflicts with XSDFEC_MAGIC, so we need to allocate
subvalues more carefully (switch to _IO(PROCFS_IOCTL_MAGIC, 32)).
- Add some more selftests for PROCFS_GET_PID_NAMESPACE.
- Reword argument for PROCFS_GET_PID_NAMESPACE security model based on
Christian's suggestion, and remove CAP_SYS_ADMIN edge-case (in most
cases, such a process would also have ptrace-read credentials over the
pidns pid1).
- v3: <https://lore.kernel.org/r/20250724-procfs-pidns-api-v3-0-4c685c910923@cypha…>
Changes in v3:
- Disallow changing pidns for existing procfs instances, as we'd
probably have to RCU-protect everything that touches the pinned pidns
reference.
- Improve tests with slightly nicer ASSERT_ERRNO* macros.
- v2: <https://lore.kernel.org/r/20250723-procfs-pidns-api-v2-0-621e7edd8e40@cypha…>
Changes in v2:
- #ifdef CONFIG_PID_NS
- Improve cover letter wording to make it clear we're talking about two
separate features with different permission models. [Andy Lutomirski]
- Fix build warnings in pidns_is_ancestor() patch. [kernel test robot]
- v1: <https://lore.kernel.org/r/20250721-procfs-pidns-api-v1-0-5cd9007e512d@cypha…>
---
Aleksa Sarai (4):
pidns: move is-ancestor logic to helper
procfs: add "pidns" mount option
procfs: add PROCFS_GET_PID_NAMESPACE ioctl
selftests/proc: add tests for new pidns APIs
Documentation/filesystems/proc.rst | 12 ++
fs/proc/root.c | 166 +++++++++++++++-
include/linux/pid_namespace.h | 9 +
include/uapi/linux/fs.h | 4 +
kernel/pid_namespace.c | 22 ++-
tools/testing/selftests/proc/.gitignore | 1 +
tools/testing/selftests/proc/Makefile | 1 +
tools/testing/selftests/proc/proc-pidns.c | 315 ++++++++++++++++++++++++++++++
8 files changed, 514 insertions(+), 16 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250717-procfs-pidns-api-8ed1583431f0
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
David asked me if there is a way of checking split_huge_page_test
results instead of the existing smap check[1]. This patchset uses
kpageflags to get after-split folio orders for a better
split_huge_page_test result check. The added gather_folio_orders() scans
through a VPN range and collects the numbers of folios at different orders.
check_folio_orders() compares the result of gather_folio_orders() to
a given list of numbers of different orders.
split_huge_page_test needs the FORCE_READ fix in [2] to work correctly.
This patchset also:
1. added new order and in folio offset to the split huge page debugfs's
pr_debug()s;
2. changed split_huge_pages_pid() to skip the rest of a folio if it is
split by folio_split() (not changing split_folio_to_order() part
since split_pte_mapped_thp test relies on its behavior).
[1] https://lore.kernel.org/linux-mm/e2f32bdb-e4a4-447c-867c-31405cbba151@redha…
[2] https://lore.kernel.org/linux-mm/20250805175140.241656-1-ziy@nvidia.com/
Zi Yan (4):
mm/huge_memory: add new_order and offset to split_huge_pages*()
pr_debug.
mm/huge_memory: move to next folio after folio_split() succeeds.
selftests/mm: add check_folio_orders() helper.
selftests/mm: check after-split folio orders in split_huge_page_test.
mm/huge_memory.c | 22 +--
.../selftests/mm/split_huge_page_test.c | 67 ++++++---
tools/testing/selftests/mm/vm_util.c | 139 ++++++++++++++++++
tools/testing/selftests/mm/vm_util.h | 2 +
4 files changed, 200 insertions(+), 30 deletions(-)
--
2.47.2
Userspace generally expects APIs that return -EMSGSIZE to allow for them
to adjust their buffer size and retry the operation. However, the
fscontext log would previously clear the message even in the -EMSGSIZE
case.
Given that it is very cheap for us to check whether the buffer is too
small before we remove the message from the ring buffer, let's just do
that instead. While we're at it, refactor some fscontext_read() into a
separate helper to make the ring buffer logic a bit easier to read.
Fixes: 007ec26cdc9f ("vfs: Implement logging through fs_context")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v3:
- selftests: use EXPECT_STREQ()
- v2: <https://lore.kernel.org/r/20250806-fscontext-log-cleanups-v2-0-88e9d34d142f…>
Changes in v2:
- Refactor message fetching to fetch_message_locked() which returns
ERR_PTR() in error cases. [Al Viro]
- v1: <https://lore.kernel.org/r/20250806-fscontext-log-cleanups-v1-0-880597d42a5a…>
---
Aleksa Sarai (2):
fscontext: do not consume log entries when returning -EMSGSIZE
selftests/filesystems: add basic fscontext log tests
fs/fsopen.c | 54 +++++-----
tools/testing/selftests/filesystems/.gitignore | 1 +
tools/testing/selftests/filesystems/Makefile | 2 +-
tools/testing/selftests/filesystems/fclog.c | 130 +++++++++++++++++++++++++
4 files changed, 162 insertions(+), 25 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250806-fscontext-log-cleanups-50f0143674ae
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
As described in commit 7a54947e727b ('Merge patch series "fs: allow
changing idmappings"'), open_tree_attr(2) was necessary in order to
allow for a detached mount to be created and have its idmappings changed
without the risk of any racing threads operating on it. For this reason,
mount_setattr(2) still does not allow for id-mappings to be changed.
However, there was a bug in commit 2462651ffa76 ("fs: allow changing
idmappings") which allowed users to bypass this restriction by calling
open_tree_attr(2) *without* OPEN_TREE_CLONE.
can_idmap_mount() prevented this bug from allowing an attached
mountpoint's id-mapping from being modified (thanks to an is_anon_ns()
check), but this still allows for detached (but visible) mounts to have
their be id-mapping changed. This risks the same UAF and locking issues
as described in the merge commit, and was likely unintentional.
For what it's worth, I found this while working on the open_tree_attr(2)
man page, and was trying to figure out what open_tree_attr(2)'s
behaviour was in the (slightly fruity) ~OPEN_TREE_CLONE case.
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Aleksa Sarai (2):
open_tree_attr: do not allow id-mapping changes without OPEN_TREE_CLONE
selftests/mount_setattr: add smoke tests for open_tree_attr(2) bug
fs/namespace.c | 3 +-
.../selftests/mount_setattr/mount_setattr_test.c | 77 ++++++++++++++++++----
2 files changed, 66 insertions(+), 14 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250808-open_tree_attr-bugfix-idmap-bb741166dc04
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
Hello!
KUnit offers a parameterized testing framework, where tests can be
run multiple times with different inputs.
Currently, the same `struct kunit` is used for each parameter
execution. After each run, the test instance gets cleaned up.
This creates the following limitations:
a. There is no way to store resources that are accessible across
the individual parameter test executions.
b. It's not possible to pass additional context besides the
previous parameter to `generate_params()` to get the next
parameter.
c. Test users are restricted to using pre-defined static arrays
of parameter objects or `generate_params()` to define their
parameters. There is no flexibility to pass a custom dynamic
array without using `generate_params()`, which can be complex
if generating the next parameter depends on more than just
the single previous parameter (e.g., two or more previous
parameters).
This patch series resolves these limitations by:
1. [P 1] Giving each parameterized test execution its own
`struct kunit`. This aligns more with the definition of a
`struct kunit` as a running instance of a test. It will also
remove the need to manage state, such as resetting the
`test->priv` field or the `test->status_comment` after every
parameter run.
2. [P 1] Introducing a parent pointer of type `struct kunit`.
Behind the scenes, a parent instance for the parameterized
tests will be created. It won't be used to execute any test
logic, but will instead be used as a context for shared
resources. Each individual running instance of a test will
now have a reference to that parent instance and thus, have
access to those resources.
3. [P 2] Introducing `param_init()` and `param_exit()` functions
that can set up and clean up the parent instance of the
parameterized tests. They will run once before and after the
parameterized series and provide a way for the user to
access the parent instance to add the parameter array or any
other resources to it, including custom ones to the
`test->parent->priv` field or to `test->parent->resources`
via the Resource API (link below).
https://elixir.bootlin.com/linux/v6.16-rc7/source/include/kunit/resource.h
4. [P 3, 4 & 5] Passing the parent `struct kunit` as an additional
parameter to `generate_params()`. This provides
`generate_params()` with more available context, making
parameter generation much more flexible. The
`generate_params()` implementations in the KCSAN and drm/xe
tests have been adapted to match the new function pointer
signature.
5. [P 6] Introducing a `params_data` field in `struct kunit`.
This will allow the parent instance of a test to have direct
storage of the parameter array, enabling features like using
dynamic parameter arrays or using context beyond just the
previous parameter.
Thank you!
-Marie
Marie Zhussupova (9):
kunit: Add parent kunit for parameterized test context
kunit: Introduce param_init/exit for parameterized test shared context
management
kunit: Pass additional context to generate_params for parameterized
testing
kcsan: test: Update parameter generator to new signature
drm/xe: Update parameter generator to new signature
kunit: Enable direct registration of parameter arrays to a KUnit test
kunit: Add example parameterized test with shared resources and direct
static parameter array setup
kunit: Add example parameterized test with direct dynamic parameter
array setup
Documentation: kunit: Document new parameterized test features
Documentation/dev-tools/kunit/usage.rst | 455 +++++++++++++++++++++++-
drivers/gpu/drm/xe/tests/xe_pci.c | 2 +-
include/kunit/test.h | 98 ++++-
kernel/kcsan/kcsan_test.c | 2 +-
lib/kunit/kunit-example-test.c | 207 +++++++++++
lib/kunit/test.c | 82 ++++-
6 files changed, 818 insertions(+), 28 deletions(-)
--
2.50.1.552.g942d659e1b-goog