While taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds access
in get_imix_entries' ([2], [3]) and doing some tests and code review I
detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).
This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):
$ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0
$ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 123 max_pkt_size: 0
Result: OK: min_pkt_size=123
So fix the out-of-bounds access (and two minor findings) and add a simple
proc_net_pktgen selftest...
Regards,
Peter
Changes v1 -> v2:
- new patch: 'net: pktgen: fix hex32_arg parsing for short reads'
- new patch: 'net: pktgen: fix 'rate 0' error handling (return -EINVAL)'
- new patch: 'net: pktgen: fix 'ratep 0' error handling (return -EINVAL)'
- net/core/pktgen.c: additional fix get_imix_entries() and get_labels()
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix tyop not vs. nod (suggested by Jakub Kicinski)
- fix misaligned line (suggested by Jakub Kicinski)
- enable fomerly commented out CONFIG_XFRM dependent test (command spi),
as CONFIG_XFRM is enabled via tools/testing/selftests/net/config
CONFIG_XFRM_INTERFACE/CONFIG_XFRM_USER (suggestex by Jakub Kicinski)
- add CONFIG_NET_PKTGEN=m to tools/testing/selftests/net/config
(suggested by Jakub Kicinski)
- add modprobe pktgen to FIXTURE_SETUP() (suggested by Jakub Kicinski)
- fix some checkpatch warnings (Missing a blank line after declarations)
- shrink line length by re-naming some variables (command -> cmd,
device -> dev)
- add 'rate 0' testcase
- add 'ratep 0' testcase
[1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@re…
[2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Peter Seiderer (8):
net: pktgen: replace ENOTSUPP with EOPNOTSUPP
net: pktgen: enable 'param=value' parsing
net: pktgen: fix hex32_arg parsing for short reads
net: pktgen: fix 'rate 0' error handling (return -EINVAL)
net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
net: pktgen: fix access outside of user given buffer in
pktgen_thread_write()
net: pktgen: fix access outside of user given buffer in
pktgen_if_write()
selftest: net: add proc_net_pktgen
net/core/pktgen.c | 238 ++++---
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/proc_net_pktgen.c | 605 ++++++++++++++++++
4 files changed, 761 insertions(+), 84 deletions(-)
create mode 100644 tools/testing/selftests/net/proc_net_pktgen.c
--
2.48.1
While taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds access
in get_imix_entries' ([2], [3]) and doing some tests and code review I
detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).
This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):
$ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0
$ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 123 max_pkt_size: 0
Result: OK: min_pkt_size=123
So fix the out-of-bounds access (and two minor findings) and add a simple
proc_net_pktgen selftest...
Regards,
Peter
[1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@re…
[2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Peter Seiderer (5):
net: pktgen: replace ENOTSUPP with EOPNOTSUPP
net: pktgen: enable 'param=value' parsing
net: pktgen: fix access outside of user given buffer in
pktgen_thread_write()
net: pktgen: fix access outside of user given buffer in
pktgen_if_write()
selftest: net: add proc_net_pktgen
net/core/pktgen.c | 210 ++++---
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/proc_net_pktgen.c | 575 ++++++++++++++++++
3 files changed, 712 insertions(+), 74 deletions(-)
create mode 100644 tools/testing/selftests/net/proc_net_pktgen.c
--
2.48.0
Hi all
This patchset adds a new buddy allocator like large folio split to the total
number of resulting folios, the amount of memory needed for multi-index xarray
split, and keep more large folios after a split. It is on top of
mm-everything-2025-01-16-06-37. It is ready to be merged.
Instead of duplicating existing split_huge_page*() code, __folio_split()
is introduced as the shared backend code for both
split_huge_page_to_list_to_order() and folio_split(). __folio_split()
can support both uniform split and buddy allocator like split. All
existing split_huge_page*() users can be gradually converted to use
folio_split() if possible. In this patchset, I converted
truncate_inode_partial_folio() to use folio_split().
xfstests quick group passed for both tmpfs and xfs.
Changelog
===
From V4[6]:
1. Enabled shmem support in both uniform and buddy allocator like split
and added selftests for it.
2. Added functions to check if uniform split and buddy allocator like
split are supported for the given folio and order.
3. Made truncate fall back to uniform split if buddy allocator split is
not supported (CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio).
4. Added the missing folio_clear_has_hwpoisoned() to
__split_unmapped_folio().
From V3[5]:
1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra
operations inside xas_split_alloc() are needed for correctness.
2. Enabled folio_split() for shmem and no issue was found with xfstests
quick test group.
3. Split both ends of a truncate range in truncate_inode_partial_folio()
to avoid wasting memory in shmem truncate (per David Hildenbrand).
4. Removed page_in_folio_offset() since page_folio() does the same
thing.
5. Finished truncate related tests from xfstests quick test group on XFS and
tmpfs without issues.
6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS
and FS without large folio. This check was missed in the prior
versions.
From V2[3]:
1. Incorporated all the feedback from Kirill[4].
2. Used GFP_NOWAIT for xas_nomem().
3. Tested the code path when xas_nomem() fails.
4. Added selftests for folio_split().
5. Fixed no THP config build error.
From V1[2]:
1. Split the original patch 1 into multiple ones for easy review (per
Kirill).
2. Added xas_destroy() to avoid memory leak.
3. Fixed nr_dropped not used error (per kernel test robot).
4. Added proper error handling when xas_nomem() fails to allocate memory
for xas_split() during buddy allocator like split.
From RFC[1]:
1. Merged backend code of split_huge_page_to_list_to_order() and
folio_split(). The same code is used for both uniform split and buddy
allocator like split.
2. Use xas_nomem() instead of xas_split_alloc() for folio_split().
3. folio_split() now leaves the first after-split folio unlocked,
instead of the one containing the given page, since
the caller of truncate_inode_partial_folio() locks and unlocks the
first folio.
4. Extended split_huge_page debugfs to use folio_split().
5. Added truncate_inode_partial_folio() as first user of folio_split().
Design
===
folio_split() splits a large folio in the same way as buddy allocator
splits a large free page for allocation. The purpose is to minimize the
number of folios after the split. For example, if user wants to free the
3rd subpage in a order-9 folio, folio_split() will split the order-9 folio
as:
O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon
O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-9 if it is pagecache
Since anon folio does not support order-1 yet.
The split process is similar to existing approach:
1. Unmap all page mappings (split PMD mappings if exist);
2. Split meta data like memcg, page owner, page alloc tag;
3. Copy meta data in struct folio to sub pages, but instead of spliting
the whole folio into multiple smaller ones with the same order in a
shot, this approach splits the folio iteratively. Taking the example
above, this approach first splits the original order-9 into two order-8,
then splits left part of order-8 to two order-7 and so on;
4. Post-process split folios, like write mapping->i_pages for pagecache,
adjust folio refcounts, add split folios to corresponding list;
5. Remap split folios
6. Unlock split folios.
__split_unmapped_folio() and __split_folio_to_order() replace
__split_huge_page() and __split_huge_page_tail() respectively.
__split_unmapped_folio() uses different approaches to perform
uniform split and buddy allocator like split:
1. uniform split: one single call to __split_folio_to_order() is used to
uniformly split the given folio. All resulting folios are put back to
the list after split. The folio containing the given page is left to
caller to unlock and others are unlocked.
2. buddy allocator like split: old_order - new_order calls to
__split_folio_to_order() are used to split the given folio at order N to
order N-1. After each call, the target folio is changed to the one
containing the page, which is given via folio_split() parameters.
After each call, folios not containing the page are put back to the list.
The folio containing the page is put back to the list when its order
is new_order. All folios are unlocked except the first folio, which
is left to caller to unlock.
Patch Overview
===
1. Patch 1 made file-backed THP split work in split_huge_page_test,
patch 2 enabled shmem large folio split to any lower order, and
patch 3 added tests for splitting file-backed THP to any lower order.
They can be picked independent of this patchset.
2. Patch 4 added __split_unmapped_folio() and __split_folio_to_order() to
prepare for moving to new backend split code.
3. Patch 5 moved common code in split_huge_page_to_list_to_order() to
__folio_split().
4. Patch 6 added new folio_split() and made
split_huge_page_to_list_to_order() share the new
__split_unmapped_folio() with folio_split().
5. Patch 7 removed no longer used __split_huge_page() and
__split_huge_page_tail().
6. Patch 8 added a new in_folio_offset to split_huge_page debugfs for
folio_split() test.
7. Patch 9 used folio_split() for truncate operation.
8. Patch 10 added folio_split() tests.
Any comments and/or suggestions are welcome. Thanks.
[1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/
[3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/
[4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43nc…
[5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/
[6] https://lore.kernel.org/linux-mm/20250106165513.104899-1-ziy@nvidia.com/
Zi Yan (10):
selftests/mm: make file-backed THP split work by setting force option
mm/huge_memory: allow split shmem large folio to any lower order
selftests/mm: test splitting file-backed THP to any lower order.
mm/huge_memory: add two new (not yet used) functions for folio_split()
mm/huge_memory: move folio split common code to __folio_split()
mm/huge_memory: add buddy allocator like folio_split()
mm/huge_memory: remove the old, unused __split_huge_page()
mm/huge_memory: add folio_split() to debugfs testing interface.
mm/truncate: use folio_split() for truncate operation.
selftests/mm: add tests for folio_split(), buddy allocator like split.
include/linux/huge_mm.h | 24 +
mm/huge_memory.c | 755 ++++++++++++------
mm/truncate.c | 31 +-
.../selftests/mm/split_huge_page_test.c | 105 ++-
4 files changed, 633 insertions(+), 282 deletions(-)
--
2.45.2
Given the time of year and point in the release cycle this is an RFC
series, there's a few areas where I'm particularly expecting that people
might have feedback:
- The userspace ABI, in particular:
- The vector length used for the SVE registers, access to the SVE
registers and access to ZA and (if available) ZT0 depending on
the current state of PSTATE.{SM,ZA}.
- The use of a single finalisation for both SVE and SME.
- The addition of control for enabling fine grained traps in a similar
manner to FGU but without the UNDEF, I'm not clear if this is desired
at all and at present this requires symmetric read and write traps like
FGU. That seemed like it might be desired from an implementation
point of view but we already have one case where we enable an
asymmetric trap (for ARM64_WORKAROUND_AMPERE_AC03_CPU_38) and it
seems generally useful to enable asymmetrically.
There is some nested virtualisation support in the code but it is not
enabled or complete, this will be completed before the RFC tag is
removed. I am anticipating having a vastly better test environment soon
which will make this much easier to complete and there is no SME
specific ABI for nested virtualisation.
This series implements support for SME use in non-protected KVM guests.
Much of this is very similar to SVE, the main additional challenge that
SME presents is that it introduces a new vector length similar to the
SVE vector length and two new controls which change the registers seen
by guests:
- PSTATE.ZA enables the ZA matrix register and, if SME2 is supported,
the ZT0 LUT register.
- PSTATE.SM enables streaming mode, a new floating point mode which
uses the SVE register set with the separately configured SME vector
length. In streaming mode implementation of the FFR register is
optional.
It is also permitted to build systems which support SME without SVE, in
this case when not in streaming mode no SVE registers or instructions
are available. Further, there is no requirement that there be any
overlap in the set of vector lengths supported by SVE and SME in a
system, this is expected to be a common situation in practical systems.
Since there is a new vector length to configure we introduce a new
feature parallel to the existing SVE one with a new pseudo register for
the streaming mode vector length. Due to the overlap with SVE caused by
streaming mode rather than finalising SME as a separate feature we use
the existing SVE finalisation to also finalise SME, a new define
KVM_ARM_VCPU_VEC is provided to help make user code clearer. Finalising
SVE and SME separately would introduce complication with register access
since finalising SVE makes the SVE regsiters writeable by userspace and
doing multiple finalisations results in an error being reported.
Dealing with a state where the SVE registers are writeable due to one of
SVE or SME being finalised but may have their VL changed by the other
being finalised seems like needless complexity with minimal practical
utility, it seems clearer to just express directly that only one
finalisation can be done in the ABI.
Access to the floating point registers follows the architecture:
- When both SVE and SME are present:
- If PSTATE.SM == 0 the vector length used for the Z and P registers
is the SVE vector length.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
is the SME vector length.
- If only SME is present:
- If PSTATE.SM == 0 the Z and P registers are inaccessible and the
floating point state accessed via the encodings for the V registers.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
- The SME specific ZA and ZT0 registers are only accessible if SVCR.ZA is 1.
The VMM must understand this, in particular when loading state SVCR
should be configured before other state.
There are a large number of subfeatures for SME, most of which only
offer additional instructions but some of which (SME2 and FA64) add
architectural state. These are configured via the ID registers as per
usual.
The new KVM_ARM_VCPU_VEC feature and ZA and ZT0 registers have not been
added to the get-reg-list selftest, the idea of supporting additional
features there without restructuring the program to generate all
possible feature combinations has been rejected. I will post a separate
series which does that restructuring.
No support is present for protected guests, this is expected to be added
separately.
The series is based on Fuad's series:
https://lore.kernel.org/r/20241216105057.579031-1-tabba@google.com/
It will need a rebase on:
https://lore.kernel.org/r/20241219173351.1123087-1-maz@kernel.org
(as will Fuad's.)
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Rebase onto v6.12-rc2.
- Link to v2: https://lore.kernel.org/r/20231222-kvm-arm64-sme-v2-0-da226cb180bb@kernel.o…
Changes in v2:
- Rebase onto v6.7-rc3.
- Configure subfeatures based on host system only.
- Complete nVHE support.
- There was some snafu with sending v1 out, it didn't make it to the
lists but in case it hit people's inboxes I'm sending as v2.
---
Mark Brown (27):
arm64/fpsimd: Update FA64 and ZT0 enables when loading SME state
arm64/fpsimd: Decide to save ZT0 and streaming mode FFR at bind time
arm64/fpsimd: Check enable bit for FA64 when saving EFI state
arm64/fpsimd: Determine maximum virtualisable SME vector length
KVM: arm64: Introduce non-UNDEF FGT control
KVM: arm64: Pull ctxt_has_ helpers to start of sysreg-sr.h
KVM: arm64: Convert cpacr_clear_set() to a static inline
KVM: arm64: Move SVE state access macros after feature test macros
KVM: arm64: Factor SVE guest exit handling out into a function
KVM: arm64: Rename SVE finalization constants to be more general
KVM: arm64: Document the KVM ABI for SME
KVM: arm64: Define internal features for SME
KVM: arm64: Rename sve_state_reg_region
KVM: arm64: Store vector lengths in an array
KVM: arm64: Implement SME vector length configuration
KVM: arm64: Add definitions for SME control register
KVM: arm64: Support TPIDR2_EL0
KVM: arm64: Support SMIDR_EL1 for guests
KVM: arm64: Support SME priority registers
KVM: arm64: Provide assembly for SME state restore
KVM: arm64: Support Z and P registers in streaming mode
KVM: arm64: Expose SME specific state to userspace
KVM: arm64: Context switch SME state for normal guests
KVM: arm64: Handle SME exceptions
KVM: arm64: Provide interface for configuring and enabling SME for guests
KVM: arm64: selftests: Add SME system registers to get-reg-list
KVM: arm64: selftests: Add SME to set_id_regs test
Documentation/virt/kvm/api.rst | 117 ++++++---
arch/arm64/include/asm/fpsimd.h | 22 ++
arch/arm64/include/asm/kvm_emulate.h | 37 ++-
arch/arm64/include/asm/kvm_host.h | 135 ++++++++---
arch/arm64/include/asm/kvm_hyp.h | 4 +-
arch/arm64/include/asm/kvm_pkvm.h | 2 +-
arch/arm64/include/asm/vncr_mapping.h | 2 +
arch/arm64/include/uapi/asm/kvm.h | 33 +++
arch/arm64/kernel/cpufeature.c | 2 -
arch/arm64/kernel/fpsimd.c | 86 ++++---
arch/arm64/kvm/arm.c | 10 +
arch/arm64/kvm/fpsimd.c | 156 +++++++-----
arch/arm64/kvm/guest.c | 262 ++++++++++++++++++---
arch/arm64/kvm/handle_exit.c | 14 ++
arch/arm64/kvm/hyp/fpsimd.S | 16 ++
arch/arm64/kvm/hyp/include/hyp/switch.h | 104 ++++++--
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 47 ++--
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 17 +-
arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 +-
arch/arm64/kvm/hyp/nvhe/switch.c | 11 +-
arch/arm64/kvm/hyp/vhe/switch.c | 21 +-
arch/arm64/kvm/reset.c | 154 +++++++++---
arch/arm64/kvm/sys_regs.c | 118 +++++++++-
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 32 ++-
tools/testing/selftests/kvm/aarch64/set_id_regs.c | 29 ++-
26 files changed, 1117 insertions(+), 319 deletions(-)
---
base-commit: e32a80927434907f973f38a88cd19d7e51991d24
change-id: 20230301-kvm-arm64-sme-06a1246d3636
prerequisite-message-id: 20241216105057.579031-1-tabba(a)google.com
prerequisite-patch-id: 10a23279fc1aa942c363d66df0e95414342b614b
prerequisite-patch-id: 670db72b1987d2591e23db072fd27db7f65ffb0f
prerequisite-patch-id: c6bc6f799cebe5010bf3d734eb06e39d5dfab0d6
prerequisite-patch-id: 5555cde0b025483c2318d006a0324fd95bd06268
prerequisite-patch-id: a73738d5bbc5e694c92b7a5654f78eb79ed23c09
prerequisite-patch-id: 6194857db22ccaefe13e88b3155b6e761c9b7692
prerequisite-patch-id: 5dca3992c2ffa5bf2edb45f68be45edfae9b41b3
prerequisite-patch-id: b048e799d816c9c6750ed4f264fd38cb6e31f968
prerequisite-patch-id: 07fea6c2207f8cd2d35d4c171a97d28397db9a79
prerequisite-patch-id: f330e82665af9f223e838511bd4a95faad56e3ac
prerequisite-patch-id: 060a6061eaedb7fd02c18e898bfd9652c991b9af
prerequisite-patch-id: fc31d9f0e7812a8f962876fdb311414122895389
prerequisite-patch-id: ae675f63215a211c42a497789ee5e092fd461279
prerequisite-patch-id: ff3c533043a1fa3a13827ea5c70459b228aa95ee
prerequisite-patch-id: de489d2d73f49d74b75c628828a6b56dbac751e2
prerequisite-patch-id: 92f4a1249e3a1ff32eb16c25af56930762c5697d
prerequisite-patch-id: ac1248b4e10dce15672e02b366a359d634297877
Best regards,
--
Mark Brown <broonie(a)kernel.org>
A previous commit described in this topic
http://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
directly updated 'sk->copied_seq' in the tcp_eat_skb() function when the
action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
the update logic for 'sk->copied_seq' was moved to
tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
That commit works for a single stream_verdict scenario, as it also
modified 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
to remove updating 'sk->copied_seq'.
However, for programs where both stream_parser and stream_verdict are
active(strparser purpose), tcp_read_sock() was used instead of
tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock)
tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated
updates.
In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
The issue causes incorrect copied_seq calculations, which prevent
correct data reads from the recv() interface in user-land.
Also we added test cases for bpf + strparser and separated them from
sockmap_basic, as strparser has more encapsulation and parsing
capabilities compared to sockmap.
Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
---
V8 -> V7:
https://lore.kernel.org/bpf/20250116140531.108636-1-mrpre@163.com/
Avoid using add read_sock to psock. (Jakub Sitnicki)
Avoid using warpper function to check whether strparser is supported.
V3 -> V7:
https://lore.kernel.org/bpf/20250109094402.50838-1-mrpre@163.com/https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com/
Avoid introducing new proto_ops. (Jakub Sitnicki).
Add more edge test cases for strparser + bpf.
Fix patchwork fail of test cases code.
Fix psock fetch without rcu lock.
Move code of modifying to tcp_bpf.c.
V1 -> V3:
https://lore.kernel.org/bpf/20241209152740.281125-1-mrpre@163.com/
Fix patchwork fail by adding Fixes tag.
Save skb data offset for ENOMEM. (John Fastabend)
---
Jiayuan Chen (5):
strparser: add read_sock callback
bpf: fix wrong copied_seq calculation
bpf: disable non stream socket for strparser
selftests/bpf: fix invalid flag of recv()
selftests/bpf: add strparser test for bpf
Documentation/networking/strparser.rst | 9 +-
include/linux/skmsg.h | 2 +
include/net/strparser.h | 2 +
include/net/tcp.h | 8 +
net/core/skmsg.c | 7 +
net/core/sock_map.c | 5 +-
net/ipv4/tcp.c | 29 +-
net/ipv4/tcp_bpf.c | 42 ++
net/strparser/strparser.c | 11 +-
.../selftests/bpf/prog_tests/sockmap_basic.c | 59 +--
.../selftests/bpf/prog_tests/sockmap_strp.c | 452 ++++++++++++++++++
.../selftests/bpf/progs/test_sockmap_strp.c | 53 ++
12 files changed, 614 insertions(+), 65 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_strp.c
--
2.43.5
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Integrity detection and protection has long been a desirable feature, to
reach a large user base and mitigate the risk of flaws in the software
and attacks.
However, while solutions exist, they struggle to reach a large user base,
due to requiring higher than desired constraints on performance,
flexibility and configurability, that only security conscious people are
willing to accept.
For example, IMA measurement requires the target platform to collect
integrity measurements, and to protect them with the TPM, which introduces
a noticeable overhead (up to 10x slower in a microbenchmark) on frequently
used system calls, like the open().
IMA Appraisal currently requires individual files to be signed and
verified, and Linux distributions to rebuild all packages to include file
signatures (this approach has been adopted from Fedora 39+). Like a TPM,
also signature verification introduces a significant overhead, especially
if it is used to check the integrity of many files.
This is where the new Integrity Digest Cache comes into play, it offers
additional support for new and existing integrity solutions, to make
them faster and easier to deploy.
The Integrity Digest Cache can help IMA to reduce the number of TPM
operations and to make them happen in a deterministic way. If IMA knows
that a file comes from a Linux distribution, it can measure files in a
different way: measure the list of digests coming from the distribution
(e.g. RPM package headers), and subsequently measure a file if it is not
found in that list.
The performance improvement comes at the cost of IMA not reporting which
files from installed packages were accessed, and in which temporal
sequence. This approach might not be suitable for all use cases.
The Integrity Digest Cache can also help IMA for appraisal. IMA can simply
lookup the calculated digest of an accessed file in the list of digests
extracted from package headers, after verifying the header signature. It is
sufficient to verify only one signature for all files in the package, as
opposed to verifying a signature for each file.
The same approach can be followed by other LSMs, such as Integrity Policy
Enforcement (IPE), and BPF LSM.
The Integrity Digest Cache is not tied to a specific package format. The
kernel supports a TLV-based digest list format. More can be added through
third-party kernel modules. The TLV parser has been verified for memory
safety with the Frama-C static analyzer. The version with the Frama-C
assertions is available here:
https://github.com/robertosassu/rpm-formal/blob/main/validate_tlv.c
Integrating the Integrity Digest Cache in IMA brings significant
performance improvements: up to 67% and 79% for measurement respectively in
sequential and parallel file reads; up to 65% and 43% for appraisal
respectively in sequential and parallel file reads.
The performance can be further enhanced by using fsverity digests instead
of conventional file digests, which would make IMA verify only the portion
of the file to be read. However, at the moment, fsverity digests are not
included in RPM packages. In this case, once rpm is extended to include
them, Linux distributions still have to rebuild their packages.
The Integrity Digest Cache can support both digest types, so that the
functionality is immediately available without waiting for Linux
distributions to do the transition.
This patch set only includes the patches necessary to extract digests from
a TLV-based data format, and exposes an API for LSMs to query them. A
separate patch set will be provided to integrate it in IMA.
This patch set and the follow-up IMA integration can be tested by following
the instructions at:
https://github.com/linux-integrity/digest-cache-tools
This patch set applies on top of:
https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git/l…
with commit 08ae3e5f5fc8 ("integrity: Use static_assert() to check struct
sizes").
Changelog
v5:
- Remove the RPM parser and selftests (suggested by Linus)
- Return digest cache pointer from digest_cache_lookup()
- Export new Parser API, and allow registration of third-party digest list
parsers (suggested by Mimi)
- Reduce sizes in TLV format and remove TLV header (suggested by Jani
Nikula)
- Introduce new DIGEST_LIST_NUM_ENTRIES TLV field
- Pass file descriptor instead of dentry in digest_cache_get() to properly
detect potential deadlocks
- Introduce digest_cache_opened_fd() to tell lockdep when it is safe to
nest a mutex if digest_cache_get() is called with that mutex held
- Add new patch to introduce ksys_finit_module()
- Make the TLV parser as configurable (Y/N/m) with Kconfig (suggested by
Mimi)
- Don't store the path structure in the digest cache and pass it between
creation and initialization of the digest cache
- Remove digest_cache_dir_update_dig_user() and keep the digest cache
retrieved during digest_cache_get()
- Fail with an error pointer in digest_cache_dir_lookup_digest() if the
current and passed directory digest cache don't match, or the digest
cache was reset
- Handle num_digest = 0 in digest_cache_htable_init()
- Accept -EOPNOTSUPP error in digest_cache_new()
- Implement inode_free_security_rcu LSM hook instead of inode_free_security
- Move reservation of file descriptor security blob inside the #ifdef in
init_ima_lsm()
- Add test file_reset_again to check the error pointer returned by
digest_cache_lookup()
- Remove TLV_FAILURE_HDR_LEN TLV error test
- Add missing MODULE_DESCRIPTION in kselftest kernel module (suggested by
Jeff Johnson)
- Replace dentry_open() with kernel_file_open() in populate.c and dir.c
- Skip affected tests when CONFIG_DYNAMIC_FTRACE_WITH_ARGS=n
v4:
- Rename digest_cache LSM to Integrity Digest Cache (suggested by Paul
Moore)
- Update documentation
- Remove forward declaration of struct digest_cache in
include/linux/digest_cache.h (suggested by Jarkko)
- Add DIGEST_CACHE_FREE digest cache event for notification
- Remove digest_cache_found_t typedef and use uintptr_t instead
- Add header callback in TLV parser and unexport tlv_parse_hdr() and
tlv_parse_data()
- Plug the Integrity Digest Cache into the 'ima' LSM
- Switch from constructor to zeroing the object cache
- Remove notifier and detect digest cache changes by comparing pointers
- Rename digest_cache_dir_create() to digest_cache_dir_add_entries()
- Introduce digest_cache_dir_create() to create and initialize a directory
digest cache
- Introduce digest_cache_dir_update_dig_user() to update dig_user with a
file digest cache on positive digest lookup
- Use up to date directory digest cache, to take into account possible
inode eviction for the old ones
- Introduce digest_cache_dir_prefetch() to prefetch digest lists
- Adjust component name in debug messages (suggested by Jarkko)
- Add FILE_PREFETCH and FILE_READ digest cache flags, remove RESET_USER
- Reintroduce spin lock for digest cache verification data (needed for the
selftests)
- Get inode and file descriptor security blob offsets from outside (IMA)
- Avoid user-after-free in digest_cache_unref() by decrementing the ref.
count after printing the debug message
- Check for digest list lookup loops also for the parent directory
- Put and clear dig_owner directly in digest_cache_reset_clear_owner()
- Move digest cache initialization code from digest_cache_create() to
digest_cache_init()
- Hold the digest list path until the digest cache is initialized (to avoid
premature inode eviction)
- Avoid race condition on setting DIR_PREFETCH in the directory digest
cache
- Introduce digest_cache_dir_prefetch() and do it between digest cache
creation and initialization (to avoid lock inversion)
- Avoid unnecessary length check in digest_list_parse_rpm()
- Declare arrays of strings in tlv parser as static
- Emit reset for parent directory on directory entry modification
- Rename digest_cache_reset_owner() to digest_cache_reset_clear_owner()
and digest_cache_reset_user() to digest_cache_clear_user()
- Execute digest_cache_file_release() either if FMODE_WRITE or
FMODE_CREATED are set in the file descriptor f_mode
- Determine in digest_cache_verif_set() which gfp flag to use depending on
verifier ID
- Update selftests
v3:
- Rewrite documentation, and remove the installation instructions since
they are now included in the README of digest-cache-tools
- Add digest cache event notifier
- Drop digest_cache_was_reset(), and send instead to asynchronous
notifications
- Fix digest_cache LSM Kconfig style issues (suggested by Randy Dunlap)
- Propagate digest cache reset to directory entries
- Destroy per directory entry mutex
- Introduce RESET_USER bit, to clear the dig_user pointer on
set/removexattr
- Replace 'file content' with 'file data' (suggested by Mimi)
- Introduce per digest cache mutex and replace verif_data_lock spinlock
- Track changes of security.digest_list xattr
- Stop tracking file_open and use file_release instead also for file writes
- Add error messages in digest_cache_create()
- Load/unload testing kernel module automatically during execution of test
- Add tests for digest cache event notifier
- Add test for ftruncate()
- Remove DIGEST_CACHE_RESET_PREFETCH_BUF command in test and clear the
buffer on read instead
v2:
- Include the TLV parser in this patch set (from user asymmetric keys and
signatures)
- Move from IMA and make an independent LSM
- Remove IMA-specific stuff from this patch set
- Add per algorithm hash table
- Expect all digest lists to be in the same directory and allow changing
the default directory
- Support digest lookup on directories, when there is no
security.digest_list xattr
- Add seq num to digest list file name, to impose ordering on directory
iteration
- Add a new data type DIGEST_LIST_ENTRY_DATA for the nested data in the
tlv digest list format
- Add the concept of verification data attached to digest caches
- Add the reset mechanism to track changes on digest lists and directory
containing the digest lists
- Add kernel selftests
v1:
- Add documentation in Documentation/security/integrity-digest-cache.rst
- Pass the mask of IMA actions to digest_cache_alloc()
- Add a reference count to the digest cache
- Remove the path parameter from digest_cache_get(), and rely on the
reference count to avoid the digest cache disappearing while being used
- Rename the dentry_to_check parameter of digest_cache_get() to dentry
- Rename digest_cache_get() to digest_cache_new() and add
digest_cache_get() to set the digest cache in the iint of the inode for
which the digest cache was requested
- Add dig_owner and dig_user to the iint, to distinguish from which inode
the digest cache was created from, and which is using it; consequently it
makes the digest cache usable to measure/appraise other digest caches
(support not yet enabled)
- Add dig_owner_mutex and dig_user_mutex to serialize accesses to dig_owner
and dig_user until they are initialized
- Enforce strong synchronization and make the contenders wait until
dig_owner and dig_user are assigned to the iint the first time
- Move checking IMA actions on the digest list earlier, and fail if no
action were performed (digest cache not usable)
- Remove digest_cache_put(), not needed anymore with the introduction of
the reference count
- Fail immediately in digest_cache_lookup() if the digest algorithm is
not set in the digest cache
- Use 64 bit mask for IMA actions on the digest list instead of 8 bit
- Return NULL in the inline version of digest_cache_get()
- Use list_add_tail() instead of list_add() in the iterator
- Copy the digest list path to a separate buffer in digest_cache_iter_dir()
- Use digest list parsers verified with Frama-C
- Explicitly disable (for now) the possibility in the IMA policy to use the
digest cache to measure/appraise other digest lists
- Replace exit(<value>) with return <value> in manage_digest_lists.c
Roberto Sassu (15):
lib: Add TLV parser
module: Introduce ksys_finit_module()
integrity: Introduce the Integrity Digest Cache
digest_cache: Initialize digest caches
digest_cache: Add securityfs interface
digest_cache: Add hash tables and operations
digest_cache: Allow registration of digest list parsers
digest_cache: Parse tlv digest lists
digest_cache: Populate the digest cache from a digest list
digest_cache: Add management of verification data
digest_cache: Add support for directories
digest cache: Prefetch digest lists if requested
digest_cache: Reset digest cache on file/directory change
selftests/digest_cache: Add selftests for the Integrity Digest Cache
docs: Add documentation of the Integrity Digest Cache
Documentation/security/digest_cache.rst | 850 ++++++++++++++++++
Documentation/security/index.rst | 1 +
MAINTAINERS | 10 +
include/linux/digest_cache.h | 131 +++
include/linux/kernel_read_file.h | 1 +
include/linux/syscalls.h | 10 +
include/linux/tlv_parser.h | 32 +
include/uapi/linux/tlv_digest_list.h | 47 +
include/uapi/linux/tlv_parser.h | 41 +
include/uapi/linux/xattr.h | 6 +
kernel/module/main.c | 43 +-
lib/Kconfig | 3 +
lib/Makefile | 2 +
lib/tlv_parser.c | 87 ++
lib/tlv_parser.h | 18 +
security/integrity/Kconfig | 1 +
security/integrity/Makefile | 1 +
security/integrity/digest_cache/Kconfig | 43 +
security/integrity/digest_cache/Makefile | 11 +
security/integrity/digest_cache/dir.c | 400 +++++++++
security/integrity/digest_cache/htable.c | 260 ++++++
security/integrity/digest_cache/internal.h | 283 ++++++
security/integrity/digest_cache/main.c | 597 ++++++++++++
security/integrity/digest_cache/modsig.c | 66 ++
security/integrity/digest_cache/parsers.c | 257 ++++++
security/integrity/digest_cache/parsers/tlv.c | 341 +++++++
security/integrity/digest_cache/populate.c | 104 +++
security/integrity/digest_cache/reset.c | 227 +++++
security/integrity/digest_cache/secfs.c | 104 +++
security/integrity/digest_cache/verif.c | 135 +++
security/integrity/ima/ima.h | 1 +
security/integrity/ima/ima_fs.c | 6 +
security/integrity/ima/ima_main.c | 10 +-
tools/testing/selftests/Makefile | 1 +
.../testing/selftests/digest_cache/.gitignore | 3 +
tools/testing/selftests/digest_cache/Makefile | 24 +
.../testing/selftests/digest_cache/all_test.c | 769 ++++++++++++++++
tools/testing/selftests/digest_cache/common.c | 78 ++
tools/testing/selftests/digest_cache/common.h | 93 ++
.../selftests/digest_cache/common_user.c | 33 +
.../selftests/digest_cache/common_user.h | 15 +
tools/testing/selftests/digest_cache/config | 2 +
.../selftests/digest_cache/generators.c | 130 +++
.../selftests/digest_cache/generators.h | 16 +
.../selftests/digest_cache/testmod/Makefile | 16 +
.../selftests/digest_cache/testmod/kern.c | 551 ++++++++++++
46 files changed, 5849 insertions(+), 11 deletions(-)
create mode 100644 Documentation/security/digest_cache.rst
create mode 100644 include/linux/digest_cache.h
create mode 100644 include/linux/tlv_parser.h
create mode 100644 include/uapi/linux/tlv_digest_list.h
create mode 100644 include/uapi/linux/tlv_parser.h
create mode 100644 lib/tlv_parser.c
create mode 100644 lib/tlv_parser.h
create mode 100644 security/integrity/digest_cache/Kconfig
create mode 100644 security/integrity/digest_cache/Makefile
create mode 100644 security/integrity/digest_cache/dir.c
create mode 100644 security/integrity/digest_cache/htable.c
create mode 100644 security/integrity/digest_cache/internal.h
create mode 100644 security/integrity/digest_cache/main.c
create mode 100644 security/integrity/digest_cache/modsig.c
create mode 100644 security/integrity/digest_cache/parsers.c
create mode 100644 security/integrity/digest_cache/parsers/tlv.c
create mode 100644 security/integrity/digest_cache/populate.c
create mode 100644 security/integrity/digest_cache/reset.c
create mode 100644 security/integrity/digest_cache/secfs.c
create mode 100644 security/integrity/digest_cache/verif.c
create mode 100644 tools/testing/selftests/digest_cache/.gitignore
create mode 100644 tools/testing/selftests/digest_cache/Makefile
create mode 100644 tools/testing/selftests/digest_cache/all_test.c
create mode 100644 tools/testing/selftests/digest_cache/common.c
create mode 100644 tools/testing/selftests/digest_cache/common.h
create mode 100644 tools/testing/selftests/digest_cache/common_user.c
create mode 100644 tools/testing/selftests/digest_cache/common_user.h
create mode 100644 tools/testing/selftests/digest_cache/config
create mode 100644 tools/testing/selftests/digest_cache/generators.c
create mode 100644 tools/testing/selftests/digest_cache/generators.h
create mode 100644 tools/testing/selftests/digest_cache/testmod/Makefile
create mode 100644 tools/testing/selftests/digest_cache/testmod/kern.c
--
2.47.0.118.gfd3785337b
Extend pmu_counters_test to AMD CPUs.
As the AMD PMU is quite different from Intel with different events and
feature sets, this series introduces a new code path to test it,
specifically focusing on the core counters including the
PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache
counters exist, but are not as important and can be deferred to a
later series.
The first patch is a bug fix that could be submitted separately.
The series has been tested on both Intel and AMD machines, but I have
not found an AMD machine old enough to lack PerfCtrExtCore. I have
made efforts that no part of the code has any dependency on its
presence.
I am aware of similar work in this direction done by Jinrong Liang
[1]. He told me he is not working on it currently and I am not
intruding by making my own submission.
[1] https://lore.kernel.org/kvm/20231121115457.76269-1-cloudliang@tencent.com/
v2:
* Test all combinations of VM setup rather than only the maximum
allowed by hardware
* Add fixes tag to bug fix in patch 1
* Refine some names
v1:
https://lore.kernel.org/kvm/20240813164244.751597-1-coltonlewis@google.com/
Colton Lewis (6):
KVM: x86: selftests: Fix typos in macro variable use
KVM: x86: selftests: Define AMD PMU CPUID leaves
KVM: x86: selftests: Set up AMD VM in pmu_counters_test
KVM: x86: selftests: Test read/write core counters
KVM: x86: selftests: Test core events
KVM: x86: selftests: Test PerfMonV2
.../selftests/kvm/include/x86_64/processor.h | 7 +
.../selftests/kvm/x86_64/pmu_counters_test.c | 304 ++++++++++++++++--
2 files changed, 277 insertions(+), 34 deletions(-)
base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
--
2.46.0.662.g92d0881bb0-goog
Hi,
This series carries forward the effort to add Kselftest for PCI Endpoint
Subsystem started by Aman Gupta [1] a while ago. I reworked the initial version
based on another patch that fixes the return values of IOCTLs in
pci_endpoint_test driver and did many cleanups. Since the resulting work
modified the initial version substantially, I took over the authorship.
This series also incorporates the review comment by Shuah Khan [2] to move the
existing tests from 'tools/pci' to 'tools/testing/kselftest/pci_endpoint' before
migrating to Kselftest framework. I made sure that the tests are executable in
each commit and updated documentation accordingly.
- Mani
[1] https://lore.kernel.org/linux-pci/20221007053934.5188-1-aman1.gupta@samsung…
[2] https://lore.kernel.org/linux-pci/b2a5db97-dc59-33ab-71cd-f591e0b1b34d@linu…
Changes in v6:
* Fixed the documentation to pass max MSI and MSI-X count to configfs
* Collected tags
Changes in v5:
* Incorporated comments from Niklas
* Added a patch to fix the DMA MEMCPY check in pci-epf-test driver
* Collected tags
* Rebased on top of pci/next 0333f56dbbf7ef6bb46d2906766c3e1b2a04a94d
Changes in v4:
* Dropped the BAR fix patches and submitted them separately:
https://lore.kernel.org/linux-pci/20241231130224.38206-1-manivannan.sadhasi…
* Rebased on top of pci/next 9e1b45d7a5bc0ad20f6b5267992da422884b916e
Changes in v3:
* Collected tags.
* Added a note about failing testcase 10 and command to skip it in
documentation.
* Removed Aman Gupta and Padmanabhan Rajanbabu from CC as their addresses are
bouncing.
Changes in v2:
* Added a patch that fixes return values of IOCTL in pci_endpoint_test driver
* Moved the existing tests to new location before migrating
* Added a fix for BARs on Qcom devices
* Updated documentation and also added fixture variants for memcpy & DMA modes
Manivannan Sadhasivam (4):
PCI: endpoint: pci-epf-test: Fix the check for DMA MEMCPY test
misc: pci_endpoint_test: Fix the return value of IOCTL
selftests: Move PCI Endpoint tests from tools/pci to Kselftests
selftests: pci_endpoint: Migrate to Kselftest framework
Documentation/PCI/endpoint/pci-test-howto.rst | 174 +++++-------
MAINTAINERS | 2 +-
drivers/misc/pci_endpoint_test.c | 255 +++++++++--------
drivers/pci/endpoint/functions/pci-epf-test.c | 4 +-
tools/pci/Build | 1 -
tools/pci/Makefile | 58 ----
tools/pci/pcitest.c | 264 ------------------
tools/pci/pcitest.sh | 73 -----
tools/testing/selftests/Makefile | 1 +
.../testing/selftests/pci_endpoint/.gitignore | 2 +
tools/testing/selftests/pci_endpoint/Makefile | 7 +
tools/testing/selftests/pci_endpoint/config | 4 +
.../pci_endpoint/pci_endpoint_test.c | 221 +++++++++++++++
13 files changed, 437 insertions(+), 629 deletions(-)
delete mode 100644 tools/pci/Build
delete mode 100644 tools/pci/Makefile
delete mode 100644 tools/pci/pcitest.c
delete mode 100644 tools/pci/pcitest.sh
create mode 100644 tools/testing/selftests/pci_endpoint/.gitignore
create mode 100644 tools/testing/selftests/pci_endpoint/Makefile
create mode 100644 tools/testing/selftests/pci_endpoint/config
create mode 100644 tools/testing/selftests/pci_endpoint/pci_endpoint_test.c
--
2.25.1