February 2025 - Linux-kselftest-mirror

[PATCH v15 00/15] PCI: EP: Add RC-to-EP doorbell with platform MSI controller

by Frank Li

┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐ │ │ │ │ │ │ │ │ │ PCI Endpoint │ │ PCI Host │ │ │ │ │ │ │ │ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │ │ │ │ │ │ │ │ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │ │ Controller │ │ update doorbell register address│ │ │ │ │ │ for BAR │ │ │ │ │ │ │ │ 3. Write BAR<n>│ │ │◄──┼───────────────────────────────────┼───┤ │ │ │ │ │ │ │ │ ├──►│ 4.Irq Handle │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ └────────────┘ └───────────────────────────────────┘ └────────────────┘ This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/ Original patch only target to vntb driver. But actually it is common method. This patches add new API to pci-epf-core, so any EP driver can use it. Previous v2 discussion here. https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/ Changes in v15: - rebase to v6.14-rc1 - fix build issue find by kernel test robot - Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com Changes in v14: Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As a result, the approach has been reverted to the v9 method. However, there are several improvements: MSI now supports msi-map in addition to msi-parent. - The struct device: id is used as the endpoint function (EPF) device identity to map to the stream ID (sideband information). - The EPC device tree source (DTS) utilizes msi-map to provide such information. - The EPF device's of_node is set to the EPC controller’s node. This approach is commonly used for multi-function device (MFD) platform child devices, allowing them to inherit properties from the MFD device’s DTS, such as reset-cells and gpio-cells. This method is well-suited for the current case, as the EPF is inherently created/binded to the EPC and should inherit the EPC’s DTS node properties. Additionally: Since the basic IMX95 LUT support has already been merged into the mainline, a DTS and driver increment patch is added to complete the solution. The patch is rebased onto the latest linux-next tree and aligned with the new pcitest framework. - Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com Changes in v13: - Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI - Change request id as func | vfunc << 3 - Remove IRQ_DOMAIN_MSI_IMMUTABLE Thomas Gleixner: I hope capture all your points in review comments. If missed, let me know. - Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com Changes in v12: - Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function irq_domain_msi_is_immuatble(). - split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches - Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com Changes in v11: - Change to use MSI_FLAG_MSG_IMMUTABLE - Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com Changes in v10: Thomas Gleixner: There are big change in pci-ep-msi.c. I am sure if go on the corrent path. The key improvement is remove only 1 function devices's limitation. I use new patch for imutable check, which relative additional feature compared to base enablement patch. - Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() - Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info - Remove only support 1 endpoint function limiation. - Create one MSI domain for each endpoint function devices. - Use "msi-map" in pci ep controler node, instead of of msi-parent. first argument is (func_no << 8 | vfunc_no) - Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com Changes in v9 - Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() - Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering - Remove API pci_epf_align_inbound_addr_lo_hi - Move doorbell_alloc in to doorbell_enable function. - Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com Changes in v8: - update helper function name to pci_epf_align_inbound_addr() - Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com Changes in v7: - Add helper function pci_epf_align_addr(); - Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com Changes in v6: - change doorbell_addr to doorbell_offset - use round_down() - add Niklas's test by tag - rebase to pci/endpoint - Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com Changes in v5: - Move request_irq to epf test function driver for more flexiable user case - Add fixed size bar handler - Some minor improvememtn to see each patches's changelog. - Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com Changes in v4: - Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs() - Use new method to avoid compatible problem. Add new command DOORBELL_ENABLE and DOORBELL_DISABLE. pcitest -B send DOORBELL_ENABLE first, EP test function driver try to remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old driver don't support new command, so failure return, not side effect. After test, DOORBELL_DISABLE command send out to recover original map, so pcitest bar test can pass as normal. - Other detail change see each patches's change log - Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com Change from v2 to v3 - Fixed manivannan's comments - Move common part to pci-ep-msi.c and pci-ep-msi.h - rebase to 6.12-rc1 - use RevID to distingiush old version mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid ^^^^^^ to enable platform msi support. ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep - use new device ID, which identify support doorbell to avoid broken compatility. Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices keep the same behavior as before. EP side RC with old driver RC with new driver PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled Other device ID doorbell disabled* doorbell disabled* * Behavior remains unchanged. Change from v1 to v2 - Add missed patch for endpont/pci-epf-test.c - Move alloc and free to epc driver from epf. - Provide general help function for EPC driver to alloc platform msi irq. - Fixed manivannan's comments. Signed-off-by: Frank Li <Frank.Li(a)nxp.com> --- Frank Li (15): platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable() irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask PCI: endpoint: Set ID and of_node for function driver PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment PCI: endpoint: pci-epf-test: Add doorbell test support misc: pci_endpoint_test: Add doorbell test case selftests: pci_endpoint: Add doorbell test case pci: imx6: Add helper function imx_pcie_add_lut_by_rid() pci: imx6: Add LUT setting for MSI/IOMMU in Endpoint mode arm64: dts: imx95: Add msi-map for pci-ep device arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file arch/arm64/boot/dts/freescale/Makefile | 3 + .../dts/freescale/imx95-19x19-evk-pcie1-ep.dtso | 21 ++++ arch/arm64/boot/dts/freescale/imx95.dtsi | 1 + drivers/base/platform-msi.c | 1 + drivers/irqchip/irq-gic-v3-its-msi-parent.c | 8 ++ drivers/irqchip/irq-gic-v3-its.c | 2 +- drivers/misc/pci_endpoint_test.c | 81 +++++++++++++ drivers/pci/controller/dwc/pci-imx6.c | 25 ++-- drivers/pci/endpoint/Makefile | 1 + drivers/pci/endpoint/functions/pci-epf-test.c | 132 +++++++++++++++++++++ drivers/pci/endpoint/pci-ep-msi.c | 90 ++++++++++++++ drivers/pci/endpoint/pci-epf-core.c | 48 ++++++++ include/linux/irqdomain.h | 7 ++ include/linux/pci-ep-msi.h | 28 +++++ include/linux/pci-epf.h | 21 ++++ include/uapi/linux/pcitest.h | 1 + .../selftests/pci_endpoint/pci_endpoint_test.c | 25 ++++ 17 files changed, 486 insertions(+), 9 deletions(-) --- base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b change-id: 20241010-ep-msi-8b4cab33b1be Best regards, --- Frank Li <Frank.Li(a)nxp.com>

7 months, 1 week

4
28
0 0

[PATCH v9 0/8] Buddy allocator like (or non-uniform) folio split

by Zi Yan

Hi all, This patchset adds a new buddy allocator like (or non-uniform) large folio split from a order-n folio to order-m with m < n. It reduces 1. the total number of after-split folios from 2^(n-m) to n-m+1; 2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to n/6-m/6, assuming XA_CHUNK_SHIFT=6; 3. keep more large folios after a split from all order-m folios to order-(n-1) to order-m folios. For example, to split an order-9 to order-0, folio split generates 10 (or 11 for anonymous memory) folios instead of 512, allocates 1 xa_node instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2 order-0 folios (or 4 order-0 for anonymous memory) instead of 512 order-0 folios. Instead of duplicating existing split_huge_page*() code, __folio_split() is introduced as the shared backend code for both split_huge_page_to_list_to_order() and folio_split(). __folio_split() can support both uniform split and buddy allocator like (or non-uniform) split. All existing split_huge_page*() users can be gradually converted to use folio_split() if possible. In this patchset, I converted truncate_inode_partial_folio() to use folio_split(). xfstests quick group passed for both tmpfs and xfs. It is on top of mm-everything-2025-02-26-03-56 with V8 reverted. It is ready to be merged. Changelog === From V8[11]: 1. Removed gfp parameter from xas_try_split() and GFP_NOWAIT is used all the time. (per Baolin Wang) 2. Used __xas_init_node_for_split() instead of __xas_alloc_node_for_split() and moved node allocation out. It fixed a bug when xa_node is pre-allocated by xas_nomem() before xas_try_split() is called without being initialized for split. From V7[9]: 1. Fixed a wrong function name in lib/test_xarray.c. 2. Made __split_folio_to_order() never fail, since the old order check is already done in __folio_split(). (per David Hildenbrand) 3. Fixed an issue reported by syzbot[10] by not dropping the original folio during truncate. 4. Fixed a WARNING when READ_ONLY_THP_FOR_FS is enabled. (Thank David Hildenbrand for reporting the issue) 5. Used two separate struct page* parameters, split_at and lock_at, to specify at which subpage the non-uniform split happens and which subpage to keep locked after the split, respectively. It improves code readability. From V6[8]: 1. Added an xarray function xas_try_split() to support iterative folio split, removing the need of using xas_split_alloc() and xas_split(). The function guarantees that at most one xa_node is allocated for each call. 2. Added concrete numbers of after-split folios and xa_node savings to cover letter, commit log. (per Andrew) From V5[7]: 1. Split shmem to any lower order patches are in mm tree, so dropped from this series. 2. Rename split_folio_at() to try_folio_split() to clarify that non-uniform split will not be used if it is not supported. From V4[6]: 1. Enabled shmem support in both uniform and buddy allocator like split and added selftests for it. 2. Added functions to check if uniform split and buddy allocator like split are supported for the given folio and order. 3. Made truncate fall back to uniform split if buddy allocator split is not supported (CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio). 4. Added the missing folio_clear_has_hwpoisoned() to __split_unmapped_folio(). From V3[5]: 1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra operations inside xas_split_alloc() are needed for correctness. 2. Enabled folio_split() for shmem and no issue was found with xfstests quick test group. 3. Split both ends of a truncate range in truncate_inode_partial_folio() to avoid wasting memory in shmem truncate (per David Hildenbrand). 4. Removed page_in_folio_offset() since page_folio() does the same thing. 5. Finished truncate related tests from xfstests quick test group on XFS and tmpfs without issues. 6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio. This check was missed in the prior versions. From V2[3]: 1. Incorporated all the feedback from Kirill[4]. 2. Used GFP_NOWAIT for xas_nomem(). 3. Tested the code path when xas_nomem() fails. 4. Added selftests for folio_split(). 5. Fixed no THP config build error. From V1[2]: 1. Split the original patch 1 into multiple ones for easy review (per Kirill). 2. Added xas_destroy() to avoid memory leak. 3. Fixed nr_dropped not used error (per kernel test robot). 4. Added proper error handling when xas_nomem() fails to allocate memory for xas_split() during buddy allocator like split. From RFC[1]: 1. Merged backend code of split_huge_page_to_list_to_order() and folio_split(). The same code is used for both uniform split and buddy allocator like split. 2. Use xas_nomem() instead of xas_split_alloc() for folio_split(). 3. folio_split() now leaves the first after-split folio unlocked, instead of the one containing the given page, since the caller of truncate_inode_partial_folio() locks and unlocks the first folio. 4. Extended split_huge_page debugfs to use folio_split(). 5. Added truncate_inode_partial_folio() as first user of folio_split(). Design === folio_split() splits a large folio in the same way as buddy allocator splits a large free page for allocation. The purpose is to minimize the number of folios after the split. For example, if user wants to free the 3rd subpage in a order-9 folio, folio_split() will split the order-9 folio as: O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-9 if it is pagecache Since anon folio does not support order-1 yet. The split process is similar to existing approach: 1. Unmap all page mappings (split PMD mappings if exist); 2. Split meta data like memcg, page owner, page alloc tag; 3. Copy meta data in struct folio to sub pages, but instead of spliting the whole folio into multiple smaller ones with the same order in a shot, this approach splits the folio iteratively. Taking the example above, this approach first splits the original order-9 into two order-8, then splits left part of order-8 to two order-7 and so on; 4. Post-process split folios, like write mapping->i_pages for pagecache, adjust folio refcounts, add split folios to corresponding list; 5. Remap split folios 6. Unlock split folios. __split_unmapped_folio() and __split_folio_to_order() replace __split_huge_page() and __split_huge_page_tail() respectively. __split_unmapped_folio() uses different approaches to perform uniform split and buddy allocator like split: 1. uniform split: one single call to __split_folio_to_order() is used to uniformly split the given folio. All resulting folios are put back to the list after split. The folio containing the given page is left to caller to unlock and others are unlocked. 2. buddy allocator like (or non-uniform) split: (old_order - new_order) calls to __split_folio_to_order() are used to split the given folio at order N to order N-1. After each call, the target folio is changed to the one containing the page, which is given as a folio_split() parameter. After each call, folios not containing the page are put back to the list. The folio containing the page is put back to the list when its order is new_order. All folios are unlocked except the first folio, which is left to caller to unlock. Patch Overview === 1. Patch 1 added a new xarray function xas_try_split() to perform iterative xarray split. 2. Patch 2 added __split_unmapped_folio() and __split_folio_to_order() to prepare for moving to new backend split code. 3. Patch 3 moved common code in split_huge_page_to_list_to_order() to __folio_split(). 4. Patch 4 added new folio_split() and made split_huge_page_to_list_to_order() share the new __split_unmapped_folio() with folio_split(). 5. Patch 5 removed no longer used __split_huge_page() and __split_huge_page_tail(). 6. Patch 6 added a new in_folio_offset to split_huge_page debugfs for folio_split() test. 7. Patch 7 used try_folio_split() for truncate operation. 8. Patch 8 added folio_split() tests. Any comments and/or suggestions are welcome. Thanks. [1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/ [2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/ [3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/ [4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43nc… [5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/ [6] https://lore.kernel.org/linux-mm/20250106165513.104899-1-ziy@nvidia.com/ [7] https://lore.kernel.org/linux-mm/20250116211042.741543-1-ziy@nvidia.com/ [8] https://lore.kernel.org/linux-mm/20250205031417.1771278-1-ziy@nvidia.com/ [9] https://lore.kernel.org/linux-mm/20250211155034.268962-1-ziy@nvidia.com/ [10] https://lore.kernel.org/all/67af65cb.050a0220.21dd3.004a.GAE@google.com/ [11] https://lore.kernel.org/linux-mm/20250218235012.1542225-1-ziy@nvidia.com/ Zi Yan (8): xarray: add xas_try_split() to split a multi-index entry mm/huge_memory: add two new (not yet used) functions for folio_split() mm/huge_memory: move folio split common code to __folio_split() mm/huge_memory: add buddy allocator like (non-uniform) folio_split() mm/huge_memory: remove the old, unused __split_huge_page() mm/huge_memory: add folio_split() to debugfs testing interface mm/truncate: use buddy allocator like folio split for truncate operation selftests/mm: add tests for folio_split(), buddy allocator like split Documentation/core-api/xarray.rst | 14 +- include/linux/huge_mm.h | 36 + include/linux/xarray.h | 6 + lib/test_xarray.c | 52 ++ lib/xarray.c | 131 ++- mm/huge_memory.c | 755 ++++++++++++------ mm/truncate.c | 31 +- tools/testing/radix-tree/Makefile | 1 + .../selftests/mm/split_huge_page_test.c | 34 +- 9 files changed, 783 insertions(+), 277 deletions(-) -- 2.47.2

7 months, 1 week

5
30
0 0

[PATCH bpf-next v2 0/3] bpf: Fix use-after-free of sockmap

by Jiayuan Chen

1. Issue Syzkaller reported this issue [1]. 2. Reproduce We can reproduce this issue by using the test_sockmap_with_close_on_write() test I provided in selftest, also you need to apply the following patch to ensure 100% reproducibility (sleep after checking sock): ''' static void sk_psock_verdict_data_ready(struct sock *sk) { ....... if (unlikely(!sock)) return; + if (!strcmp("test_progs", current->comm)) { + printk("sleep 2s to wait socket freed\n"); + mdelay(2000); + printk("sleep end\n"); + } ops = READ_ONCE(sock->ops); if (!ops || !ops->read_skb) return; } ''' Then running './test_progs -v sockmap_basic', and if the kernel has KASAN enabled [2], you will see the following warning: ''' BUG: KASAN: slab-use-after-free in sk_psock_verdict_data_ready+0x29b/0x2d0 Read of size 8 at addr ffff88813a777020 by task test_progs/47055 Tainted: [O]=OOT_MODULE Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x30/0x420 ? sk_psock_verdict_data_ready+0x29b/0x2d0 print_report+0xb7/0x270 ? sk_psock_verdict_data_ready+0x29b/0x2d0 ? kasan_addr_to_slab+0xd/0xa0 ? sk_psock_verdict_data_ready+0x29b/0x2d0 kasan_report+0xca/0x100 ? sk_psock_verdict_data_ready+0x29b/0x2d0 sk_psock_verdict_data_ready+0x29b/0x2d0 unix_stream_sendmsg+0x4a6/0xa40 ? __pfx_unix_stream_sendmsg+0x10/0x10 ? fdget+0x2c1/0x3a0 __sys_sendto+0x39c/0x410 ''' 3. Reason ''' CPU0 CPU1 unix_stream_sendmsg(sk): other = unix_peer(sk) other->sk_data_ready(other): socket *sock = sk->sk_socket if (unlikely(!sock)) return; close(other): ... other->close() free(socket) READ_ONCE(sock->ops) ^ use 'sock' after free ''' For TCP, UDP, or other protocols, we have already performed rcu_read_lock() when the network stack receives packets in ip_input.c: ''' ip_local_deliver_finish(): rcu_read_lock() ip_protocol_deliver_rcu() xxx_rcv rcu_read_unlock() ''' However, for Unix sockets, sk_data_ready is called directly from the process context without rcu_read_lock() protection. 4. Solution Based on the fact that the 'struct socket' is released using call_rcu(), We add rcu_read_{un}lock() at the entrance and exit of our sk_data_ready. It will not increase performance overhead, at least for TCP and UDP, they are already in a relatively large critical section. Of course, we can also add a custom callback for Unix sockets and call rcu_read_lock() before calling _verdict_data_ready like this: ''' if (sk_is_unix(sk)) sk->sk_data_ready = sk_psock_verdict_data_ready_rcu; else sk->sk_data_ready = sk_psock_verdict_data_ready; sk_psock_verdict_data_ready_rcu(): rcu_read_lock() sk_psock_verdict_data_ready() rcu_read_unlock() ''' However, this will cause too many branches, and it's not suitable to distinguish network protocols in skmsg.c. [1] https://syzkaller.appspot.com/bug?extid=dd90a702f518e0eac072 [2] https://syzkaller.appspot.com/text?tag=KernelConfig&x=1362a5aee630ff34 --- v1 -> v2: 1. Add Fixes tag. 2. Extend selftest of edge case for TCP/UDP sockets. 3. Add Reviewed-by and Acked-by tag. https://lore.kernel.org/bpf/20250226132242.52663-1-jiayuan.chen@linux.dev/T… Jiayuan Chen (3): bpf, sockmap: avoid using sk_socket after free selftests/bpf: Add socketpair to create_pair to support unix socket selftests/bpf: Add edge case tests for sockmap net/core/skmsg.c | 18 ++++-- .../selftests/bpf/prog_tests/socket_helpers.h | 13 +++- .../selftests/bpf/prog_tests/sockmap_basic.c | 59 +++++++++++++++++++ 3 files changed, 84 insertions(+), 6 deletions(-) -- 2.47.1

7 months, 1 week

2
7
0 0

[PATCH bpf-next v12 0/5] xsk: TX metadata Launch Time support

by Song Yoong Siang

This series expands the XDP TX metadata framework to allow user applications to pass per packet 64-bit launch time directly to the kernel driver, requesting launch time hardware offload support. The XDP TX metadata framework will not perform any clock conversion or packet reordering. Please note that the role of Tx metadata is just to pass the launch time, not to enable the offload feature. Users will need to enable the launch time hardware offload feature of the device by using the respective command, such as the tc-etf command. Although some devices use the tc-etf command to enable their launch time hardware offload feature, xsk packets will not go through the etf qdisc. Therefore, in my opinion, the launch time should always be based on the PTP Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the clock source. To simplify the test steps, I modified the xdp_hw_metadata bpf self-test tool in such a way that it will set the launch time based on the offset provided by the user and the value of the Receive Hardware Timestamp, which is against the PHC. This will eliminate the need to discipline System Clock with the PHC and then use clock_gettime() to get the time. Please note that AF_XDP lacks a feedback mechanism to inform the application if the requested launch time is invalid. So, users are expected to familiar with the horizon of the launch time of the device they use and not request a launch time that is beyond the horizon. Otherwise, the driver might interpret the launch time incorrectly and react wrongly. For stmmac and igc, where modulo computation is used, a launch time larger than the horizon will cause the device to transmit the packet earlier that the requested launch time. Although there is no feedback mechanism for the launch time request for now, user still can check whether the requested launch time is working or not, by requesting the Transmit Completion Hardware Timestamp. v12: - Fix the comment in include/uapi/linux/if_xdp.h to allign with what is generated by ./tools/net/ynl/ynl-regen.sh to avoid dirty tree error in the netdev/ynl checks. v11: https://lore.kernel.org/netdev/20250216074302.956937-1-yoong.siang.song@int… - regenerate netdev_xsk_flags based on latest netdev.yaml (Jakub) v10: https://lore.kernel.org/netdev/20250207021943.814768-1-yoong.siang.song@int… - use net_err_ratelimited(), instead of net_ratelimit() (Maciej) - accumulate the amount of used descs in local variable and update the igc_metadata_request::used_desc once (Maciej) - Ensure reverse christmas tree rule (Maciej) V9: https://lore.kernel.org/netdev/20250206060408.808325-1-yoong.siang.song@int… - Remove the igc_desc_unused() checking (Maciej) - Ensure that skb allocation and DMA mapping work before proceeding to fill in igc_tx_buffer info, context desc, and data desc (Maciej) - Rate limit the error messages (Maciej) - Update the comment to indicate that the 2 descriptors needed by the empty frame are already taken into consideration (Maciej) - Handle the case where the insertion of an empty frame fails and explain the reason behind (Maciej) - put self SOB tag as last tag (Maciej) V8: https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.song@int… - check the number of used descriptor in xsk_tx_metadata_request() by using used_desc of struct igc_metadata_request, and then decreases the budget with it (Maciej) - submit another bug fix patch to set the buffer type for empty frame (Maciej): https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.song@int… V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int… - split the refactoring code of igc empty packet insertion into a separate commit (Faizal) - add explanation on why the value "4" is used as igc transmit budget (Faizal) - perform a stress test by sending 1000 packets with 10ms interval and launch time set to 500us in the future (Faizal & Yong Liang) V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int… - fix selftest build errors by using asprintf() and realloc(), instead of managing the buffer sizes manually (Daniel, Stanislav) V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int… - change netdev feature name from tx-launch-time to tx-launch-time-fifo to explicitly state the FIFO behaviour (Stanislav) - improve the looping of xdp_hw_metadata app to wait for packet tx completion to be more readable by using clock_gettime() (Stanislav) - add launch time setup steps into xdp_hw_metadata app (Stanislav) V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel… - added XDP launch time support to the igc driver (Jesper & Florian) - added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper) - added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub) - added step to enable launch time in the commit message (Jesper & Willem) - explicitly documented the type of launch_time and which clock source it is against (Willem) V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in… - renamed to use launch time (Jesper & Willem) - changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s because some NICs do not support such a large future time. V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in… - renamed to use Earliest TxTime First (Willem) - renamed to use txtime (Willem) V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int… Song Yoong Siang (5): xsk: Add launch time hardware offload support to XDP Tx metadata selftests/bpf: Add launch time request to xdp_hw_metadata net: stmmac: Add launch time support to XDP ZC igc: Refactor empty frame insertion for launch time support igc: Add launch time support to XDP ZC Documentation/netlink/specs/netdev.yaml | 4 + Documentation/networking/xsk-tx-metadata.rst | 62 +++++++ drivers/net/ethernet/intel/igc/igc.h | 1 + drivers/net/ethernet/intel/igc/igc_main.c | 143 +++++++++++---- drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 + .../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++ include/net/xdp_sock.h | 10 ++ include/net/xdp_sock_drv.h | 1 + include/uapi/linux/if_xdp.h | 10 ++ include/uapi/linux/netdev.h | 3 + net/core/netdev-genl.c | 2 + net/xdp/xsk.c | 3 + tools/include/uapi/linux/if_xdp.h | 10 ++ tools/include/uapi/linux/netdev.h | 3 + tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++- 15 files changed, 396 insertions(+), 39 deletions(-) -- 2.34.1

7 months, 1 week

5
10
0 0

[PATCH v3 0/2] scanf: convert self-test to KUnit

by Tamir Duberstein

This is one of just 3 remaining "Test Module" kselftests (the others being bitmap and printf), the rest having been converted to KUnit. In addition to the enclosed patch, please consider this an RFC on the removal of the "Test Module" kselftest machinery. I tested this using: $ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 scanf Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v3: - Reduce diff noise in lib/Makefile. (Petr Mladek) - Split `scanf_test` into a few test cases. New output: : =================== scanf (10 subtests) ==================== : [PASSED] numbers_simple : ====================== numbers_list ======================= : [PASSED] delim=" " : [PASSED] delim=":" : [PASSED] delim="," : [PASSED] delim="-" : [PASSED] delim="/" : ================== [PASSED] numbers_list =================== : ============ numbers_list_field_width_typemax ============= : [PASSED] delim=" " : [PASSED] delim=":" : [PASSED] delim="," : [PASSED] delim="-" : [PASSED] delim="/" : ======== [PASSED] numbers_list_field_width_typemax ========= : =========== numbers_list_field_width_val_width ============ : [PASSED] delim=" " : [PASSED] delim=":" : [PASSED] delim="," : [PASSED] delim="-" : [PASSED] delim="/" : ======= [PASSED] numbers_list_field_width_val_width ======== : [PASSED] numbers_slice : [PASSED] numbers_prefix_overflow : [PASSED] test_simple_strtoull : [PASSED] test_simple_strtoll : [PASSED] test_simple_strtoul : [PASSED] test_simple_strtol : ====================== [PASSED] scanf ====================== : ============================================================ : Testing complete. Ran 22 tests: passed: 22 : Elapsed time: 5.517s total, 0.001s configuring, 5.440s building, 0.067s running - Link to v2: https://lore.kernel.org/r/20250203-scanf-kunit-convert-v2-1-277a618d804e@gm… Changes in v2: - Rename lib/{test_scanf.c => scanf_kunit.c}. (Andy Shevchenko) - Link to v1: https://lore.kernel.org/r/20250131-scanf-kunit-convert-v1-1-0976524f0eba@gm… --- Tamir Duberstein (2): scanf: convert self-test to KUnit scanf: break kunit into test cases MAINTAINERS | 2 +- arch/m68k/configs/amiga_defconfig | 1 - arch/m68k/configs/apollo_defconfig | 1 - arch/m68k/configs/atari_defconfig | 1 - arch/m68k/configs/bvme6000_defconfig | 1 - arch/m68k/configs/hp300_defconfig | 1 - arch/m68k/configs/mac_defconfig | 1 - arch/m68k/configs/multi_defconfig | 1 - arch/m68k/configs/mvme147_defconfig | 1 - arch/m68k/configs/mvme16x_defconfig | 1 - arch/m68k/configs/q40_defconfig | 1 - arch/m68k/configs/sun3_defconfig | 1 - arch/m68k/configs/sun3x_defconfig | 1 - arch/powerpc/configs/ppc64_defconfig | 1 - lib/Kconfig.debug | 20 +- lib/Makefile | 2 +- lib/scanf_kunit.c | 800 ++++++++++++++++++++++++++++++++++ lib/test_scanf.c | 814 ----------------------------------- tools/testing/selftests/lib/Makefile | 2 +- tools/testing/selftests/lib/config | 1 - tools/testing/selftests/lib/scanf.sh | 4 - 21 files changed, 820 insertions(+), 838 deletions(-) --- base-commit: a86bf2283d2c9769205407e2b54777c03d012939 change-id: 20250131-scanf-kunit-convert-f70dc33bb34c Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

7 months, 1 week

3
6
0 0

[PATCH v5 0/3] printf: convert self-test to KUnit

by Tamir Duberstein

This is one of just 3 remaining "Test Module" kselftests (the others being bitmap and scanf), the rest having been converted to KUnit. I tested this using: $ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 printf I have also sent out a series converting scanf[0]. Link: https://lore.kernel.org/all/20250204-scanf-kunit-convert-v3-0-386d7c3ee714@… [0] Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v5: - Update `do_test` `__printf` annotation (Rasmus Villemoes). - Link to v4: https://lore.kernel.org/r/20250214-printf-kunit-convert-v4-0-c254572f1565@g… Changes in v4: - Add patch "implicate test line in failure messages". - Rebase on linux-next, move scanf_kunit.c into lib/tests/. - Link to v3: https://lore.kernel.org/r/20250210-printf-kunit-convert-v3-0-ee6ac5500f5e@g… Changes in v3: - Remove extraneous trailing newlines from failure messages. - Replace `pr_warn` with `kunit_warn`. - Drop arch changes. - Remove KUnit boilerplate from CONFIG_PRINTF_KUNIT_TEST help text. - Restore `total_tests` counting. - Remove tc_fail macro in last patch. - Link to v2: https://lore.kernel.org/r/20250207-printf-kunit-convert-v2-0-057b23860823@g… Changes in v2: - Incorporate code review from prior work[0] by Arpitha Raghunandan. - Link to v1: https://lore.kernel.org/r/20250204-printf-kunit-convert-v1-0-ecf1b846a4de@g… Link: https://lore.kernel.org/lkml/20200817043028.76502-1-98.arpi@gmail.com/t/#u [0] --- Tamir Duberstein (3): printf: convert self-test to KUnit printf: break kunit into test cases printf: implicate test line in failure messages Documentation/core-api/printk-formats.rst | 4 +- MAINTAINERS | 2 +- lib/Kconfig.debug | 12 +- lib/Makefile | 1 - lib/tests/Makefile | 1 + lib/{test_printf.c => tests/printf_kunit.c} | 437 ++++++++++++---------------- tools/testing/selftests/lib/config | 1 - tools/testing/selftests/lib/printf.sh | 4 - 8 files changed, 200 insertions(+), 262 deletions(-) --- base-commit: d4b0fd87ff0d4338b259dc79b2b3c6f7e70e8afa change-id: 20250131-printf-kunit-convert-fd4012aa2ec6 Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

7 months, 1 week

2
16
0 0

[PATCH net-next v6 0/8] Device memory TCP TX

by Mina Almasry

v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c… === v6 has no major changes. Addressed a few issues from Paolo and David, and collected Acks from Stan. Thank you everyone for the review! Changes: - retain behavior to process MSG_FASTOPEN even if the provided cmsg is invalid (Paolo). - Rework the freeing of tx_vec slightly (it now has its own err label). (Paolo). - Squash the commit that makes dmabuf unbinding scheduled work into the same one which implements the TX path so we don't run into future errors on bisecting (Paolo). - Fix/add comments to explain how dmabuf binding refcounting works (David). v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c… === v5 has no major changes; it clears up the relatively minor issues pointed out to in v4, and rebases the series on top of net-next to resolve the conflict with a patch that raced to the tree. It also collects the review tags from v4. Changes: - Rebase to net-next - Fix issues in selftest (Stan). - Address comments in the devmem and netmem driver docs (Stan and Bagas) - Fix zerocopy_fill_skb_from_devmem return error code (Stan). v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.… === v4 mainly addresses the critical driver support issue surfaced in v3 by Paolo and Stan. Drivers aiming to support netmem_tx should make sure not to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs may come from dma-bufs. Additionally other feedback from v3 is addressed. Major changes: - Add helpers to handle netmem dma-addrs. Add GVE support for netmem_tx. - Fix binding->tx_vec not being freed on error paths during the tx binding. - Add a minimal devmem_tx test to devmem.py. - Clean up everything obsolete from the cover letter (Paolo). v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=* === Address minor comments from RFCv2 and fix a few build warnings and ynl-regen issues. No major changes. RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=* ======= RFC v2 addresses much of the feedback from RFC v1. I plan on sending something close to this as net-next reopens, sending it slightly early to get feedback if any. Major changes: -------------- - much improved UAPI as suggested by Stan. We now interpret the iov_base of the passed in iov from userspace as the offset into the dmabuf to send from. This removes the need to set iov.iov_base = NULL which may be confusing to users, and enables us to send multiple iovs in the same sendmsg() call. ncdevmem and the docs show a sample use of that. - Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think this is good improvment as it was confusing to keep track of 2 iterators for the same sendmsg, and mistracking both iterators caused a couple of bugs reported in the last iteration that are now resolved with this streamlining. - Improved test coverage in ncdevmem. Now multiple sendmsg() are tested, and sending multiple iovs in the same sendmsg() is tested. - Fixed issue where dmabuf unmapping was happening in invalid context (Stan). ==================================================================== The TX path had been dropped from the Device Memory TCP patch series post RFCv1 [1], to make that series slightly easier to review. This series rebases the implementation of the TX path on top of the net_iov/netmem framework agreed upon and merged. The motivation for the feature is thoroughly described in the docs & cover letter of the original proposal, so I don't repeat the lengthy descriptions here, but they are available in [1]. Full outline on usage of the TX path is detailed in the documentation included with this series. Test example is available via the kselftest included in the series as well. The series is relatively small, as the TX path for this feature largely piggybacks on the existing MSG_ZEROCOPY implementation. Patch Overview: --------------- 1. Documentation & tests to give high level overview of the feature being added. 1. Add netmem refcounting needed for the TX path. 2. Devmem TX netlink API. 3. Devmem TX net stack implementation. 4. Make dma-buf unbinding scheduled work to handle TX cases where it gets freed from contexts where we can't sleep. 5. Add devmem TX documentation. 6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver feature flag, and docs to enable drivers to declare netmem_tx support. 7. Guard netmem_tx against being enabled against drivers that don't support it. 8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to devmem.py. Testing: -------- Testing is very similar to devmem TCP RX path. The ncdevmem test used for the RX path is now augemented with client functionality to test TX path. * Test Setup: Kernel: net-next with this RFC and memory provider API cherry-picked locally. Hardware: Google Cloud A3 VMs. NIC: GVE with header split & RSS & flow steering support. Performance results are not included with this version, unfortunately. I'm having issues running the dma-buf exporter driver against the upstream kernel on my test setup. The issues are specific to that dma-buf exporter and do not affect this patch series. I plan to follow up this series with perf fixes if the tests point to issues once they're up and running. Special thanks to Stan who took a stab at rebasing the TX implementation on top of the netmem/net_iov framework merged. Parts of his proposal [2] that are reused as-is are forked off into their own patches to give full credit. [1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.… [2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#… Cc: sdf(a)fomichev.me Cc: asml.silence(a)gmail.com Cc: dw(a)davidwei.uk Cc: Jamal Hadi Salim <jhs(a)mojatatu.com> Cc: Victor Nogueira <victor(a)mojatatu.com> Cc: Pedro Tammela <pctammela(a)mojatatu.com> Cc: Samiullah Khawaja <skhawaja(a)google.com> Mina Almasry (7): net: add get_netmem/put_netmem support net: devmem: Implement TX path net: add devmem TCP TX documentation net: enable driver support for netmem TX gve: add netmem TX support to GVE DQO-RDA mode net: check for driver support in netmem TX selftests: ncdevmem: Implement devmem TCP TX Stanislav Fomichev (1): net: devmem: TCP tx netlink api Documentation/netlink/specs/netdev.yaml | 12 + Documentation/networking/devmem.rst | 150 ++++++++- .../networking/net_cachelines/net_device.rst | 1 + Documentation/networking/netdev-features.rst | 5 + Documentation/networking/netmem.rst | 23 +- drivers/net/ethernet/google/gve/gve_main.c | 4 + drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +- include/linux/netdevice.h | 2 + include/linux/skbuff.h | 17 +- include/linux/skbuff_ref.h | 4 +- include/net/netmem.h | 23 ++ include/net/sock.h | 1 + include/uapi/linux/netdev.h | 1 + net/core/datagram.c | 48 ++- net/core/dev.c | 3 + net/core/devmem.c | 115 ++++++- net/core/devmem.h | 77 ++++- net/core/netdev-genl-gen.c | 13 + net/core/netdev-genl-gen.h | 1 + net/core/netdev-genl.c | 73 ++++- net/core/skbuff.c | 48 ++- net/core/sock.c | 6 + net/ipv4/ip_output.c | 3 +- net/ipv4/tcp.c | 50 ++- net/ipv6/ip6_output.c | 3 +- net/vmw_vsock/virtio_transport_common.c | 5 +- tools/include/uapi/linux/netdev.h | 1 + .../selftests/drivers/net/hw/devmem.py | 26 +- .../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++- 29 files changed, 950 insertions(+), 73 deletions(-) base-commit: 80c4a0015ce249cf0917a04dbb3cc652a6811079 -- 2.48.1.658.g4767266eb4-goog

7 months, 2 weeks

6
24
0 0

[PATCH v2] selftests: livepatch: test if ftrace can trace a livepatched function

by Filipe Xavier

This new test makes sure that ftrace can trace a function that was introduced by a livepatch. Signed-off-by: Filipe Xavier <felipeaggger(a)gmail.com> --- Changes in v2: - functions.sh: added reset tracing on push and pop_config. - test-ftrace.sh: enabled tracing_on before test init. - nitpick: added double quotations on filenames and fixed some wording. - Link to v1: https://lore.kernel.org/r/20250102-ftrace-selftest-livepatch-v1-1-84880baef… --- tools/testing/selftests/livepatch/functions.sh | 14 ++++++++++ tools/testing/selftests/livepatch/test-ftrace.sh | 33 ++++++++++++++++++++++++ 2 files changed, 47 insertions(+) diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh index e5d06fb402335d85959bafe099087effc6ddce12..e6c13514002dae5f8d7461f90b8241ab43024ea4 100644 --- a/tools/testing/selftests/livepatch/functions.sh +++ b/tools/testing/selftests/livepatch/functions.sh @@ -62,6 +62,9 @@ function push_config() { awk -F'[: ]' '{print "file " $1 " line " $2 " " $4}') FTRACE_ENABLED=$(sysctl --values kernel.ftrace_enabled) KPROBE_ENABLED=$(cat "$SYSFS_KPROBES_DIR/enabled") + TRACING_ON=$(cat "$SYSFS_DEBUG_DIR/tracing/tracing_on") + CURRENT_TRACER=$(cat "$SYSFS_DEBUG_DIR/tracing/current_tracer") + FTRACE_FILTER=$(cat "$SYSFS_DEBUG_DIR/tracing/set_ftrace_filter") } function pop_config() { @@ -74,6 +77,17 @@ function pop_config() { if [[ -n "$KPROBE_ENABLED" ]]; then echo "$KPROBE_ENABLED" > "$SYSFS_KPROBES_DIR/enabled" fi + if [[ -n "$TRACING_ON" ]]; then + echo "$TRACING_ON" > "$SYSFS_DEBUG_DIR/tracing/tracing_on" + fi + if [[ -n "$CURRENT_TRACER" ]]; then + echo "$CURRENT_TRACER" > "$SYSFS_DEBUG_DIR/tracing/current_tracer" + fi + if [[ "$FTRACE_FILTER" == *"#"* ]]; then + echo > "$SYSFS_DEBUG_DIR/tracing/set_ftrace_filter" + elif [[ -n "$FTRACE_FILTER" ]]; then + echo "$FTRACE_FILTER" > "$SYSFS_DEBUG_DIR/tracing/set_ftrace_filter" + fi } function set_dynamic_debug() { diff --git a/tools/testing/selftests/livepatch/test-ftrace.sh b/tools/testing/selftests/livepatch/test-ftrace.sh index fe14f248913acbec46fb6c0fec38a2fc84209d39..66af5d726c52e48e5177804e182b4ff31784d5ac 100755 --- a/tools/testing/selftests/livepatch/test-ftrace.sh +++ b/tools/testing/selftests/livepatch/test-ftrace.sh @@ -61,4 +61,37 @@ livepatch: '$MOD_LIVEPATCH': unpatching complete % rmmod $MOD_LIVEPATCH" +# - verify livepatch can load +# - check if traces have a patched function +# - unload livepatch and reset trace + +start_test "trace livepatched function and check that the live patch remains in effect" + +TRACE_FILE="$SYSFS_DEBUG_DIR/tracing/trace" +FUNCTION_NAME="livepatch_cmdline_proc_show" + +load_lp $MOD_LIVEPATCH + +echo 1 > "$SYSFS_DEBUG_DIR/tracing/tracing_on" +echo $FUNCTION_NAME > "$SYSFS_DEBUG_DIR/tracing/set_ftrace_filter" +echo "function" > "$SYSFS_DEBUG_DIR/tracing/current_tracer" +echo "" > "$TRACE_FILE" + +if [[ "$(cat /proc/cmdline)" != "$MOD_LIVEPATCH: this has been live patched" ]] ; then + echo -e "FAIL\n\n" + die "livepatch kselftest(s) failed" +fi + +grep -q $FUNCTION_NAME "$TRACE_FILE" +FOUND=$? + +disable_lp $MOD_LIVEPATCH +unload_lp $MOD_LIVEPATCH + +if [ "$FOUND" -eq 1 ]; then + echo -e "FAIL\n\n" + die "livepatch kselftest(s) failed" +fi + + exit 0 --- base-commit: fc033cf25e612e840e545f8d5ad2edd6ba613ed5 change-id: 20250101-ftrace-selftest-livepatch-161fb77dbed8 Best regards, -- Filipe Xavier <felipeaggger(a)gmail.com>

7 months, 2 weeks

4
4
0 0

[PATCH v8 0/4] scanf: convert self-test to KUnit

by Tamir Duberstein

This is one of just 3 remaining "Test Module" kselftests (the others being bitmap and printf), the rest having been converted to KUnit. In addition to the enclosed patch, please consider this an RFC on the removal of the "Test Module" kselftest machinery. I tested this using: $ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 scanf Failure output before this series: [ 383.100048] test_scanf: vsscanf("1574 9 64ca 935b 7 142d ff58 0", "%4hx %1hx %4hx %4hx %1hx %4hx %4hx %1hx", ...) expected 2472240330 got 1690959881 [ 383.102843] test_scanf: vsscanf("f12:2:d:2:c166:1:36b:1906", "%3hx:%1hx:%1hx:%1hx:%4hx:%1hx:%3hx:%4hx", ...) expected 131085 got 851970 [ 383.105376] test_scanf: vsscanf("4,b2fe,3,593,6,0,3bde,0", "%1hx,%4hx,%1hx,%3hx,%1hx,%1hx,%4hx,%1hx", ...) expected 93519875 got 242430 [ 383.105659] test_scanf: vsscanf("6-1-2-1-d9e6-f-93e-e567", "%1hx-%1hx-%1hx-%1hx-%4hx-%1hx-%3hx-%4hx", ...) expected 65538 got 131073 [ 383.106127] test_scanf: vsscanf("72d6/35/e88d/1/0/6c8c/7/1", "%4hx/%2hx/%4hx/%1hx/%1hx/%4hx/%1hx/%1hx", ...) expected 125069 got 3901554741 [ 383.106235] test_scanf: vsscanf("c9bea1b8122113e9a168df573", "%4hx%4hx%1hx%4hx%4hx%1hx%4hx%3hx", ...) expected 571539457 got 106936 ... [ 383.106398] test_scanf: failed 6 out of 2545 tests Failure output after this series: # numbers_list_field_width_val_width: ASSERTION FAILED at lib/scanf_kunit.c:94 lib/scanf_kunit.c:555: vsscanf("0 1e 3e43 31f0 0 0 5797 9c70", "%1hx %2hx %4hx %4hx %1hx %1hx %4hx %4hx", ...) expected 837828163 got 1044578334 not ok 1 " " # numbers_list_field_width_val_width: ASSERTION FAILED at lib/scanf_kunit.c:94 lib/scanf_kunit.c:555: vsscanf("dc2:1c:0:3531:2621:5172:1:7", "%3hx:%2hx:%1hx:%4hx:%4hx:%4hx:%1hx:%1hx", ...) expected 892403712 got 28 not ok 2 ":" # numbers_list_field_width_val_width: ASSERTION FAILED at lib/scanf_kunit.c:94 lib/scanf_kunit.c:555: vsscanf("e083,8f6e,b,70ca,1,1,aab1,10e4", "%4hx,%4hx,%1hx,%4hx,%1hx,%1hx,%4hx,%4hx", ...) expected 1892286475 got 757614 not ok 3 "," # numbers_list_field_width_val_width: ASSERTION FAILED at lib/scanf_kunit.c:94 lib/scanf_kunit.c:555: vsscanf("2e72-8435-1-2fc-7cbd-c2f1-7158-2b41", "%4hx-%4hx-%1hx-%3hx-%4hx-%4hx-%4hx-%4hx", ...) expected 50069505 got 99381 not ok 4 "-" # numbers_list_field_width_val_width: ASSERTION FAILED at lib/scanf_kunit.c:94 lib/scanf_kunit.c:555: vsscanf("403/0/17/1/11e7/1/1fe8/34ba", "%3hx/%1hx/%2hx/%1hx/%4hx/%1hx/%4hx/%4hx", ...) expected 65559 got 1507328 not ok 5 "/" Signed-off-by: Tamir Duberstein <tamird(a)gmail.com> --- Changes in v8: - Expand "scanf: remove redundant debug logs" commit message. (Andy Shevchenko) - Add patch "implicate test line in failure messages". - Rebase on linux-next, move scanf_kunit.c into lib/tests/. - Link to v7: https://lore.kernel.org/r/20250211-scanf-kunit-convert-v7-0-c057f0a3d9d8@gm… Changes in v7: - Remove redundant debug logs. (Petr Mladek) - Drop Petr's Acked-by. - Use original test assertions as KUNIT_*_EQ_MSG produces hard-to-parse messages. The new failure output is: - Link to v6: https://lore.kernel.org/r/20250210-scanf-kunit-convert-v6-0-4d583d07f92d@gm… Changes in v6: - s/at boot/at runtime/ for consistency with the printf series. - Go back to kmalloc. (Geert Uytterhoeven) - Link to v5: https://lore.kernel.org/r/20250210-scanf-kunit-convert-v5-0-8e64f3a7de99@gm… Changes in v5: - Remove extraneous trailing newlines from failure messages. - Replace `pr_debug` with `kunit_printk`. - Use static char arrays instead of kmalloc. - Drop KUnit boilerplate from CONFIG_SCANF_KUNIT_TEST help text. - Drop arch changes. - Link to v4: https://lore.kernel.org/r/20250207-scanf-kunit-convert-v4-0-a23e2afaede8@gm… Changes in v4: - Bake `test` into various macros, greatly reducing diff noise. - Revert control flow changes. - Link to v3: https://lore.kernel.org/r/20250204-scanf-kunit-convert-v3-0-386d7c3ee714@gm… Changes in v3: - Reduce diff noise in lib/Makefile. (Petr Mladek) - Split `scanf_test` into a few test cases. New output: : =================== scanf (10 subtests) ==================== : [PASSED] numbers_simple : ====================== numbers_list ======================= : [PASSED] delim=" " : [PASSED] delim=":" : [PASSED] delim="," : [PASSED] delim="-" : [PASSED] delim="/" : ================== [PASSED] numbers_list =================== : ============ numbers_list_field_width_typemax ============= : [PASSED] delim=" " : [PASSED] delim=":" : [PASSED] delim="," : [PASSED] delim="-" : [PASSED] delim="/" : ======== [PASSED] numbers_list_field_width_typemax ========= : =========== numbers_list_field_width_val_width ============ : [PASSED] delim=" " : [PASSED] delim=":" : [PASSED] delim="," : [PASSED] delim="-" : [PASSED] delim="/" : ======= [PASSED] numbers_list_field_width_val_width ======== : [PASSED] numbers_slice : [PASSED] numbers_prefix_overflow : [PASSED] test_simple_strtoull : [PASSED] test_simple_strtoll : [PASSED] test_simple_strtoul : [PASSED] test_simple_strtol : ====================== [PASSED] scanf ====================== : ============================================================ : Testing complete. Ran 22 tests: passed: 22 : Elapsed time: 5.517s total, 0.001s configuring, 5.440s building, 0.067s running - Link to v2: https://lore.kernel.org/r/20250203-scanf-kunit-convert-v2-1-277a618d804e@gm… Changes in v2: - Rename lib/{test_scanf.c => scanf_kunit.c}. (Andy Shevchenko) - Link to v1: https://lore.kernel.org/r/20250131-scanf-kunit-convert-v1-1-0976524f0eba@gm… --- Tamir Duberstein (4): scanf: implicate test line in failure messages scanf: remove redundant debug logs scanf: convert self-test to KUnit scanf: break kunit into test cases MAINTAINERS | 2 +- lib/Kconfig.debug | 12 +- lib/Makefile | 1 - lib/tests/Makefile | 1 + lib/{test_scanf.c => tests/scanf_kunit.c} | 299 +++++++++++++++--------------- tools/testing/selftests/lib/Makefile | 2 +- tools/testing/selftests/lib/config | 1 - tools/testing/selftests/lib/scanf.sh | 4 - 8 files changed, 160 insertions(+), 162 deletions(-) --- base-commit: 7b7a883c7f4de1ee5040bd1c32aabaafde54d209 change-id: 20250131-scanf-kunit-convert-f70dc33bb34c Best regards, -- Tamir Duberstein <tamird(a)gmail.com>

7 months, 2 weeks

4
19
0 0

[PATCH net-next v7 0/6] tun: Introduce virtio-net hashing feature

by Akihiko Odaki

virtio-net have two usage of hashes: one is RSS and another is hash reporting. Conventionally the hash calculation was done by the VMM. However, computing the hash after the queue was chosen defeats the purpose of RSS. Another approach is to use eBPF steering program. This approach has another downside: it cannot report the calculated hash due to the restrictive nature of eBPF. Introduce the code to compute hashes to the kernel in order to overcome thse challenges. An alternative solution is to extend the eBPF steering program so that it will be able to report to the userspace, but it is based on context rewrites, which is in feature freeze. We can adopt kfuncs, but they will not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM and vhost_net). The patches for QEMU to use this new feature was submitted as RFC and is available at: https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/ This work was presented at LPC 2024: https://lpc.events/event/18/contributions/1963/ V1 -> V2: Changed to introduce a new BPF program type. Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com> --- Changes in v7: - Ensured to set hash_report to VIRTIO_NET_HASH_REPORT_NONE for VHOST_NET_F_VIRTIO_NET_HDR. - s/4/sizeof(u32)/ in patch "virtio_net: Add functions for hashing". - Added tap_skb_cb type. - Rebased. - Link to v6: https://lore.kernel.org/r/20250109-rss-v6-0-b1c90ad708f6@daynix.com Changes in v6: - Extracted changes to fill vnet header holes into another series. - Squashed patches "skbuff: Introduce SKB_EXT_TUN_VNET_HASH", "tun: Introduce virtio-net hash reporting feature", and "tun: Introduce virtio-net RSS" into patch "tun: Introduce virtio-net hash feature". - Dropped the RFC tag. - Link to v5: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com Changes in v5: - Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE. - Optimized the calculation of the hash value according to: https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca… - Added patch "tun: Unify vnet implementation". - Dropped patch "tap: Pad virtio header with zero". - Added patch "selftest: tun: Test vnet ioctls without device". - Reworked selftests to skip for older kernels. - Documented the case when the underlying device is deleted and packets have queue_mapping set by TC. - Reordered test harness arguments. - Added code to handle fragmented packets. - Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com Changes in v4: - Moved tun_vnet_hash_ext to if_tun.h. - Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc(). - Replaced htons() with cpu_to_be16(). - Changed virtio_net_hash_rss() to return void. - Reordered variable declarations in virtio_net_hash_rss(). - Removed virtio_net_hdr_v1_hash_from_skb(). - Updated messages of "tap: Pad virtio header with zero" and "tun: Pad virtio header with zero". - Fixed vnet_hash allocation size. - Ensured to free vnet_hash when destructing tun_struct. - Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com Changes in v3: - Reverted back to add ioctl. - Split patch "tun: Introduce virtio-net hashing feature" into "tun: Introduce virtio-net hash reporting feature" and "tun: Introduce virtio-net RSS". - Changed to reuse hash values computed for automq instead of performing RSS hashing when hash reporting is requested but RSS is not. - Extracted relevant data from struct tun_struct to keep it minimal. - Added kernel-doc. - Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF. - Initialized num_buffers with 1. - Added a test case for unclassified packets. - Fixed error handling in tests. - Changed tests to verify that the queue index will not overflow. - Rebased. - Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com --- Akihiko Odaki (6): virtio_net: Add functions for hashing net: flow_dissector: Export flow_keys_dissector_symmetric tun: Introduce virtio-net hash feature selftest: tun: Test vnet ioctls without device selftest: tun: Add tests for virtio-net hashing vhost/net: Support VIRTIO_NET_F_HASH_REPORT Documentation/networking/tuntap.rst | 7 + drivers/net/Kconfig | 1 + drivers/net/tap.c | 62 +++- drivers/net/tun.c | 89 ++++- drivers/net/tun_vnet.h | 180 +++++++++- drivers/vhost/net.c | 49 +-- include/linux/if_tap.h | 2 + include/linux/skbuff.h | 3 + include/linux/virtio_net.h | 188 +++++++++++ include/net/flow_dissector.h | 1 + include/uapi/linux/if_tun.h | 75 +++++ net/core/flow_dissector.c | 3 +- net/core/skbuff.c | 4 + tools/testing/selftests/net/Makefile | 2 +- tools/testing/selftests/net/tun.c | 627 ++++++++++++++++++++++++++++++++++- 15 files changed, 1231 insertions(+), 62 deletions(-) --- base-commit: dd83757f6e686a2188997cb58b5975f744bb7786 change-id: 20240403-rss-e737d89efa77 prerequisite-change-id: 20241230-tun-66e10a49b0c7:v6 prerequisite-patch-id: 871dc5f146fb6b0e3ec8612971a8e8190472c0fb prerequisite-patch-id: 2797ed249d32590321f088373d4055ff3f430a0e prerequisite-patch-id: ea3370c72d4904e2f0536ec76ba5d26784c0cede prerequisite-patch-id: 837e4cf5d6b451424f9b1639455e83a260c4440d prerequisite-patch-id: ea701076f57819e844f5a35efe5cbc5712d3080d prerequisite-patch-id: 701646fb43ad04cc64dd2bf13c150ccbe6f828ce prerequisite-patch-id: 53176dae0c003f5b6c114d43f936cf7140d31bb5 prerequisite-change-id: 20250116-buffers-96e14bf023fc:v2 prerequisite-patch-id: 25fd4f99d4236a05a5ef16ab79f3e85ee57e21cc Best regards, -- Akihiko Odaki <akihiko.odaki(a)daynix.com>

7 months, 2 weeks

4
12
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror February 2025