November 2023 - Linux-kselftest-mirror

by Mina Almasry

Changes in RFC v3:
------------------

1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the series reviewable and mergeable.
2. Implemented multi-rx-queue binding, which was a TODO in v2.
3. Fixed cmsg handling.

The sticking point in RFC v2[2] was the device reset required to refill the device rx-queues after the dmabuf bind/unbind. The solution suggested, as I understand it, is a subset of the per-queue management ops Jakub suggested, or similar:

https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/

This is not addressed in this revision, because:

1. This point was discussed at netconf & netdev and there is openness to using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my test bed with GVE. My prototype to test this ran into issues with the rx-queues not coming back up properly if reset individually. At the moment I'm unsure if it's a mistake in the POC or a genuine issue in the virtualization stack behind GVE, which currently doesn't test individual rx-queue restart.
3. Our use cases are not bothered by requiring a device reset to refill the buffer queues, and we'd like to support NICs that run into this limitation with resetting individual queues.

My thought is that drivers that have trouble with per-queue configs can use the support in this series, while drivers that support new netdev ops to reset individual queues can automatically reset the queue as part of the dma-buf bind/unbind.

The same approach with device resets is presented again for consideration with other sticking points addressed.

This proposal includes only the rx devmem path, which is what is proposed for merge.
For a snapshot of my entire tree, which includes the GVE POC page pool support & device memory support:

https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3

[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…

Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>

Changes in RFC v2:
------------------

The sticking point in RFC v1[1] was the dma-buf pages approach we used to deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept that attempts to resolve this by implementing scatterlist support in the networking stack, such that we can import the dma-buf scatterlist directly. This is the approach proposed at a high level here[2].

Detailed changes:

1. Replaced dma-buf pages approach with importing scatterlist into the page pool.
2. Replaced the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with implementing the TX path with the scatterlist approach, but leaving out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted perf testing yet. I'm not sure there are regressions, but I removed perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.

Any feedback welcome, but specifically the biggest pending questions needing feedback IMO are:

1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…

----------------------

* TL;DR:

Device memory TCP (devmem TCP) is a proposal for transferring data to and/or from device memory efficiently, without bouncing the data to a host memory buffer.

* Problem:

A large number of data transfers have device memory as the source and/or destination. Accelerators have drastically increased the volume of such transfers. Some examples include:

- ML accelerators transferring large amounts of training data from storage into GPU/TPU memory. In some cases ML training setup time can be as long as 50% of TPU compute time; improving data transfer throughput & efficiency can help improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts, exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with remote SSDs; much of this data does not require host processing.

Today, the majority of the Device-to-Device data transfers over the network are implemented as the following low level operations: Device-to-Host copy, Host-to-Host network transfer, and Host-to-Device copy.

The implementation is suboptimal, especially for bulk data transfers, and can put significant strain on system resources, such as host memory bandwidth, PCIe bandwidth, etc. One important reason behind the current state is the kernel’s lack of semantics to express device to network transfers.

* Proposal:

In this patch series we attempt to optimize this use case by implementing socket APIs that enable the user to:

1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.

Packet _payloads_ go directly from the NIC to device memory for receive and from device memory to NIC for transmit. Packet _headers_ go to/from host memory and are processed by the TCP/IP stack normally.
The NIC _must_ support header split to achieve this.

Advantages:

- Alleviate host memory bandwidth pressure, compared to existing network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level of the PCIe tree, compared to the traditional path which sends data through the root complex.

* Patch overview:

** Part 1: netlink API

Gives the user the ability to bind a dma-buf to an RX queue.

** Part 2: scatterlist support

Currently the standard for device memory sharing is DMABUF, which doesn't generate struct pages. On the other hand, the networking stack (skbs, drivers, and page pool) operates on pages. We have 2 options:

1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.

Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.

** Part 3: page pool support

We piggy back on the page pool memory providers proposal:

https://github.com/kuba-moo/linux/tree/pp-providers

It allows the page pool to define a memory provider that provides the page allocation and freeing. It helps abstract most of the device memory TCP changes from the driver.

** Part 4: support for unreadable skb frags

Page pool iovs are not accessible by the host; we implement changes throughout the networking stack to correctly handle skbs with unreadable frags.

** Part 5: recvmsg() APIs

We define user APIs for the user to send and receive device memory.

Not included with this RFC is the GVE devmem TCP support, just to simplify the review. Code available here if desired:

https://github.com/mina/linux/tree/tcpdevmem

This RFC is built on top of net-next with Jakub's pp-providers changes cherry-picked.

* NIC dependencies:

1. (strict) Devmem TCP requires the NIC to support header split, i.e. the capability to split incoming packets into a header + payload and to put each into a separate buffer. Devmem TCP works by using device memory for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support, i.e. the NIC's ability to steer flows into certain rx queues. This allows the sysadmin to enable devmem TCP on a subset of the rx queues, and steer devmem TCP traffic onto these queues and non devmem TCP elsewhere.

The NIC I have access to with these properties is the GVE with DQO support running in Google Cloud, but any NIC that supports these features would suffice. I may be able to help reviewers bring up devmem TCP on their NICs.

* Testing:

The series includes a udmabuf kselftest that shows a simple use case of devmem TCP and validates the entire data path end to end without a dependency on a specific dmabuf provider.

** Test Setup

Kernel: net-next with this RFC and memory provider API cherry-picked locally.

Hardware: Google Cloud A3 VMs.

NIC: GVE with header split & RSS & flow steering support.

Jakub Kicinski (2):
  net: page_pool: factor out releasing DMA from releasing the page
  net: page_pool: create hooks for custom page providers

Mina Almasry (10):
  net: netdev netlink api to bind dma-buf to a net device
  netdev: support binding dma-buf to netdevice
  netdev: netdevice devmem allocator
  memory-provider: dmabuf devmem memory provider
  page-pool: device memory support
  net: support non paged skb frags
  net: add support for skbs with unreadable frags
  tcp: RX path for devmem TCP
  net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages
  selftests: add ncdevmem, netcat for devmem TCP

 Documentation/netlink/specs/netdev.yaml | 28 ++
 include/linux/netdevice.h | 93 ++++
 include/linux/skbuff.h | 56 ++-
 include/linux/socket.h | 1 +
 include/net/netdev_rx_queue.h | 1 +
 include/net/page_pool/helpers.h | 151 ++++++-
 include/net/page_pool/types.h | 55 +++
 include/net/sock.h | 2 +
 include/net/tcp.h | 5 +-
 include/uapi/asm-generic/socket.h | 6 +
 include/uapi/linux/netdev.h | 10 +
 include/uapi/linux/uio.h | 10 +
 net/core/datagram.c | 6 +
 net/core/dev.c | 240 +++++++++++
 net/core/gro.c | 7 +-
 net/core/netdev-genl-gen.c | 14 +
 net/core/netdev-genl-gen.h | 1 +
 net/core/netdev-genl.c | 118 +++++
 net/core/page_pool.c | 209 +++++++--
 net/core/skbuff.c | 80 +++-
 net/core/sock.c | 36 ++
 net/ipv4/tcp.c | 205 ++++++++-
 net/ipv4/tcp_input.c | 13 +-
 net/ipv4/tcp_ipv4.c | 7 +
 net/ipv4/tcp_output.c | 5 +-
 net/packet/af_packet.c | 4 +-
 tools/include/uapi/linux/netdev.h | 10 +
 tools/net/ynl/generated/netdev-user.c | 42 ++
 tools/net/ynl/generated/netdev-user.h | 47 ++
 tools/testing/selftests/net/.gitignore | 1 +
 tools/testing/selftests/net/Makefile | 5 +
 tools/testing/selftests/net/ncdevmem.c | 546 ++++++++++++++++++++++++
 32 files changed, 1950 insertions(+), 64 deletions(-)
 create mode 100644 tools/testing/selftests/net/ncdevmem.c

--
2.42.0.869.gea05f2083d-goog
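
To make the header-split dependency described above concrete, here is a minimal user-space sketch in Python. It is not code from the series: the class `SplitRxQueue`, its fields, and the fixed `header_len` argument are all hypothetical, and a plain `bytearray` stands in for the dma-buf. It only illustrates the idea that headers land in host memory while payloads land in device memory, so the host handles references to the payload rather than the payload bytes.

```python
from dataclasses import dataclass, field


@dataclass
class SplitRxQueue:
    """Toy model of an rx queue with header split enabled (hypothetical)."""

    host_headers: list = field(default_factory=list)               # host memory
    device_payload: bytearray = field(default_factory=bytearray)   # stands in for a dma-buf

    def receive(self, packet: bytes, header_len: int) -> tuple:
        """Place the header in host memory and the payload in device memory.

        Returns (offset, length) of the payload inside the device buffer,
        which is all the host needs in order to reference the data.
        """
        if not 0 <= header_len <= len(packet):
            raise ValueError("header_len out of range")
        header, payload = packet[:header_len], packet[header_len:]
        self.host_headers.append(header)
        offset = len(self.device_payload)
        self.device_payload += payload
        return offset, len(payload)


queue = SplitRxQueue()
first = queue.receive(b"HDR1" + b"payload-a", header_len=4)
second = queue.receive(b"HDR2" + b"payload-bb", header_len=4)
print(first, second)
```

In this toy model the TCP/IP stack would only ever look at `host_headers`; the payload is described by offset and length, which mirrors why a NIC without header split cannot be used this way.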

[PATCH v3 0/3] Add a test to catch unprobed Devicetree devices

by Nícolas F. R. A. Prado

Regressions that cause a device to no longer be probed by a driver can have a big impact on the platform's functionality, and despite being relatively common there isn't currently any generic test to detect them. As an example, bootrr [1] does test for device probe, but it requires defining the expected probed devices for each platform.

Given that the Devicetree already provides a static description of devices on the system, it is a good basis for building such a test on top.

This series introduces a test to catch regressions that prevent devices from probing.

Patches 1 and 2 extend the existing dt-extract-compatibles script to be able to output only the compatibles that can be expected to match a Devicetree node to a driver. Patch 3 adds a kselftest that walks over the Devicetree nodes on the current platform and compares the compatibles to the ones on the list, and on an ignore list, to point out devices that failed to be probed.

A compatible list is needed because not all compatibles that can show up in a Devicetree node can be used to match to a driver, for example the code for that compatible might use "OF_DECLARE" type macros and avoid the driver framework, or the node might be controlled by a driver that was bound to a different node.

An ignore list is needed for the few cases where it's common for a driver to match a device but not probe, like for the "simple-mfd" compatible, where the driver only probes if that compatible is the node's first compatible.

The reason for parsing the kernel source instead of relying on information exposed by the kernel at runtime (say, looking at modaliases or introducing some other mechanism), is to be able to catch issues where a config was renamed or a driver moved across configs, and the .config used by the kernel not updated accordingly. We need to parse the source to find all compatibles present in the kernel independent of the current config being run.
[1] https://github.com/kernelci/bootrr

Changes in v3:
- Added DT selftest path to MAINTAINERS
- Enabled device probe test for nodes with 'status = "ok"'
- Added pass/fail/skip totals to end of test output

Changes in v2:
- Extended dt-extract-compatibles script to be able to extract driver matching compatibles, instead of adding a new one in Coccinelle
- Made kselftest output in the KTAP format

Nícolas F. R. A. Prado (3):
  dt: dt-extract-compatibles: Handle cfile arguments in generator function
  dt: dt-extract-compatibles: Add flag for driver matching compatibles
  kselftest: Add new test for detecting unprobed Devicetree devices

 MAINTAINERS | 1 +
 scripts/dtc/dt-extract-compatibles | 74 +++++++++++++----
 tools/testing/selftests/Makefile | 1 +
 tools/testing/selftests/dt/.gitignore | 1 +
 tools/testing/selftests/dt/Makefile | 21 +++++
 .../selftests/dt/compatible_ignore_list | 1 +
 tools/testing/selftests/dt/ktap_helpers.sh | 70 ++++++++++++++++
 .../selftests/dt/test_unprobed_devices.sh | 83 +++++++++++++++++++
 8 files changed, 236 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/dt/.gitignore
 create mode 100644 tools/testing/selftests/dt/Makefile
 create mode 100644 tools/testing/selftests/dt/compatible_ignore_list
 create mode 100644 tools/testing/selftests/dt/ktap_helpers.sh
 create mode 100755 tools/testing/selftests/dt/test_unprobed_devices.sh

--
2.42.0
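
The decision the test makes for each node, as described in the cover letter, can be sketched roughly as follows. This is an illustration in Python rather than the shell script from the series; the function name `classify_nodes`, the tuple layout of a node, and the example node paths and "vendor,..." compatibles are all made up for the example.

```python
def classify_nodes(nodes, driver_compatibles, ignore_list):
    """Return {node_path: "pass" | "fail" | "skip"}.

    nodes: iterable of (path, compatibles, is_probed) tuples (made-up shape).
    driver_compatibles: compatibles expected to match a node to a driver.
    ignore_list: compatibles that commonly match a device without probing.
    """
    results = {}
    for path, compatibles, is_probed in nodes:
        candidates = [c for c in compatibles
                      if c in driver_compatibles and c not in ignore_list]
        if not candidates:
            results[path] = "skip"   # no driver is expected to bind here
        elif is_probed:
            results[path] = "pass"
        else:
            results[path] = "fail"   # a driver should have probed this node
    return results


# Hypothetical input data, for illustration only.
nodes = [
    ("/soc/serial@1000", ["vendor,uart"], True),
    ("/soc/timer@2000", ["vendor,timer"], False),
    ("/soc/syscon@3000", ["vendor,syscon", "simple-mfd"], False),
    ("/chosen", [], False),
]
driver_compatibles = {"vendor,uart", "vendor,timer", "simple-mfd"}
ignore_list = {"simple-mfd"}
results = classify_nodes(nodes, driver_compatibles, ignore_list)
print(results)
```

Only the timer node is reported as a failure here: it has a compatible that a driver is expected to match, yet it was not probed. The syscon node is skipped because its only driver-matching compatible is on the ignore list.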

[PATCH v8 0/6] workload-specific and memory pressure-driven zswap writeback

by Nhat Pham

Changelog:
v8:
   * Fixed a couple of build errors in the case of !CONFIG_MEMCG
   * Simplified the online memcg selection scheme for the zswap global limit reclaim (suggested by Michal Hocko and Johannes Weiner) (patch 2 and patch 3)
   * Added a new kconfig to allow users to enable zswap shrinker by default. (suggested by Johannes Weiner) (patch 6)
v7:
   * Added the mem_cgroup_iter_online() function to the API for the new behavior (suggested by Andrew Morton) (patch 2)
   * Fixed a missing list_lru_del -> list_lru_del_obj (patch 1)
v6:
   * Rebase on top of latest mm-unstable.
   * Fix/improve the in-code documentation of the new list_lru manipulation functions (patch 1)
v5:
   * Replace reference getting with an rcu_read_lock() section for zswap lru modifications (suggested by Yosry)
   * Add a new prep patch that allows mem_cgroup_iter() to return online cgroup.
   * Add a callback that updates pool->next_shrink when the cgroup is offlined (suggested by Yosry Ahmed, Johannes Weiner)
v4:
   * Rename list_lru_add to list_lru_add_obj and __list_lru_add to list_lru_add (patch 1) (suggested by Johannes Weiner and Yosry Ahmed)
   * Some cleanups on the memcg aware LRU patch (patch 2) (suggested by Yosry Ahmed)
   * Use event interface for the new per-cgroup writeback counters. (patch 3) (suggested by Yosry Ahmed)
   * Abstract zswap's lruvec states and handling into zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
   * Add a patch to export per-cgroup zswap writeback counters
   * Add a patch to update zswap's kselftest
   * Separate the new list_lru functions into their own prep patch
   * Do not start from the top of the hierarchy when encountering a memcg that is not online for the global limit zswap writeback (patch 2) (suggested by Yosry Ahmed)
   * Do not remove the swap entry from list_lru in __read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
   * Removed a redundant zswap pool getting (patch 2) (reported by Ryan Roberts)
   * Use atomic for the nr_zswap_protected (instead of lruvec's lock) (patch 5) (suggested by Yosry Ahmed)
   * Remove the per-cgroup zswap shrinker knob (patch 5) (suggested by Yosry Ahmed)
v2:
   * Fix loongarch compiler errors
   * Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap, making it impossible to perform workload-specific shrinking - a memcg under memory pressure cannot determine which pages in the pool it owns, and often ends up writing pages from other memcgs. This issue has been previously observed in practice and mitigated by simply disabling memcg-initiated shrinking:

https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u

But this solution leaves a lot to be desired, as we still do not have an avenue for a memcg to free up its own memory locked up in the zswap pool.

2. We only shrink the zswap pool when the user-defined limit is hit. This means that if we set the limit too high, cold data that are unlikely to be used again will reside in the pool, wasting precious memory. It is hard to predict how much zswap space will be needed ahead of time, as this depends on the workload (specifically, on factors such as memory access patterns and compressibility of the memory pages).
This patch series solves these issues by separating the global zswap LRU into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e. memcg- and NUMA-aware) zswap writeback under memory pressure. The new shrinker does not have any parameter that must be tuned by the user, and can be opted in or out on a per-memcg basis.

As a proof of concept, we ran the following synthetic benchmark: build the linux kernel in a memory-limited cgroup, and allocate some cold data in tmpfs to see if the shrinker could write them out and improve the overall performance. Depending on the amount of cold data generated, we observe a 14% to 35% reduction in kernel CPU time used in the kernel builds.

Domenico Cerasuolo (3):
  zswap: make shrinking memcg-aware
  mm: memcg: add per-memcg zswap writeback stat
  selftests: cgroup: update per-memcg zswap writeback selftest

Nhat Pham (3):
  list_lru: allows explicit memcg and NUMA node selection
  memcontrol: implement mem_cgroup_tryget_online()
  zswap: shrinks zswap pool based on memory pressure

 Documentation/admin-guide/mm/zswap.rst | 10 +
 drivers/android/binder_alloc.c | 7 +-
 fs/dcache.c | 8 +-
 fs/gfs2/quota.c | 6 +-
 fs/inode.c | 4 +-
 fs/nfs/nfs42xattr.c | 8 +-
 fs/nfsd/filecache.c | 4 +-
 fs/xfs/xfs_buf.c | 6 +-
 fs/xfs/xfs_dquot.c | 2 +-
 fs/xfs/xfs_qm.c | 2 +-
 include/linux/list_lru.h | 54 ++-
 include/linux/memcontrol.h | 15 +
 include/linux/mmzone.h | 2 +
 include/linux/vm_event_item.h | 1 +
 include/linux/zswap.h | 27 +-
 mm/Kconfig | 14 +
 mm/list_lru.c | 48 ++-
 mm/memcontrol.c | 3 +
 mm/mmzone.c | 1 +
 mm/swap.h | 3 +-
 mm/swap_state.c | 26 +-
 mm/vmstat.c | 1 +
 mm/workingset.c | 4 +-
 mm/zswap.c | 456 +++++++++++++++++---
 tools/testing/selftests/cgroup/test_zswap.c | 74 ++--
 25 files changed, 661 insertions(+), 125 deletions(-)

base-commit: 5cdba94229e58a39ca389ad99763af29e6b0c5a5
--
2.34.1
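
The effect of splitting one global LRU into per-memcg, per-NUMA-node lists can be sketched in a few lines of Python. This is a toy model, not the kernel's list_lru: the class `PerMemcgLru` and its methods are hypothetical names, and "writeback" is reduced to removing entries from a list. The point it illustrates is that shrinking on behalf of one memcg only ever touches entries that memcg owns.

```python
from collections import OrderedDict, defaultdict


class PerMemcgLru:
    """Toy per-(memcg, NUMA node) LRU; hypothetical, for illustration only."""

    def __init__(self):
        # (memcg, numa_node) -> entries ordered from coldest to hottest
        self._lists = defaultdict(OrderedDict)

    def add(self, memcg, node, entry):
        self._lists[(memcg, node)][entry] = True

    def count(self, memcg, node):
        return len(self._lists[(memcg, node)])

    def shrink(self, memcg, node, nr_to_write):
        """Write back up to nr_to_write of the coldest entries owned by memcg on node."""
        lru = self._lists[(memcg, node)]
        written = []
        while lru and len(written) < nr_to_write:
            entry, _ = lru.popitem(last=False)   # oldest entry first
            written.append(entry)
        return written


lru = PerMemcgLru()
for i in range(3):
    lru.add("memcg-a", 0, f"a{i}")
    lru.add("memcg-b", 0, f"b{i}")

written = lru.shrink("memcg-a", 0, nr_to_write=2)
print(written, lru.count("memcg-a", 0), lru.count("memcg-b", 0))
```

With a single global list, the same shrink request could have evicted entries belonging to `memcg-b`; here its three entries are left alone.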

[PATCH v2 00/26] selftests/resctrl: CAT test improvements & generalized test framework

by Ilpo Järvinen

Hi all,

Here's the v2 series to improve resctrl selftests with a generalized test framework and a rewritten CAT test. In contrast to v1, this version does not include the L2 CAT test because it needs further work. In general, I feel that v2 is in much better shape than v1 because I ended up addressing a few small things beyond what came up during v1 review.

The series contains the following improvements:

- Excludes shareable bits from CAT test allocation to avoid interference
- Replaces file "sink" with a volatile variable
- Alters read pattern to defeat HW prefetcher optimizations
- Rewrites CAT test to make the CAT test reliable and truly measure if CAT is working or not
- Introduces generalized test framework making it easier to add new tests
- Lots of other cleanups & refactoring

This series has been tested across a large number of systems from different generations.

v2:
- Postpone adding L2 CAT test as more investigations are necessary
- Add patch to remove ctrlc_handler() from wrong place
- Improvements to changelogs
- Function comments improvements & comment cleanups
- Move some parts of the changes into more logical patch
- If checks: buf == NULL -> !buf
- Variable naming:
	- p -> buf
	- cbm_mask_path -> cbm_path
- Function naming:
	- get_cbm_mask() -> get_full_cbm()
	- cache_size() -> cache_portion_size()
- Use PATH_MAX
- Improved cache_portion_size() parameter names
- int count -> unsigned int
- Pass filename to measurement taking functions instead of resctrl_val_param
- !lines ? : reversal
- Removed bogus static from function local variable
- Open perf fd only once, reset & enable in the innermost test loop
- Add perf fd ioctl() error handling
- Add patch to change compiler optimization prevention "sink" from file to volatile variable
- Remove cpu_no and resource (the latter was added in v1) members from resctrl_val_param (pass uparams and test where those are needed)
- Removed ARRAY_SIZE() macro
- Add patch to rename "resource_id" to "domain_id"

Ilpo Järvinen (26):
  selftests/resctrl: Don't use ctrlc_handler() outside signal handling
  selftests/resctrl: Split fill_buf to allow tests finer-grained control
  selftests/resctrl: Refactor fill_buf functions
  selftests/resctrl: Refactor get_cbm_mask() and rename to get_full_cbm()
  selftests/resctrl: Mark get_cache_size() cache_type const
  selftests/resctrl: Create cache_portion_size() helper
  selftests/resctrl: Exclude shareable bits from schemata in CAT test
  selftests/resctrl: Split measure_cache_vals()
  selftests/resctrl: Split show_cache_info() to test specific and generic parts
  selftests/resctrl: Remove unnecessary __u64 -> unsigned long conversion
  selftests/resctrl: Remove nested calls in perf event handling
  selftests/resctrl: Consolidate naming of perf event related things
  selftests/resctrl: Improve perf init
  selftests/resctrl: Convert perf related globals to locals
  selftests/resctrl: Move cat_val() to cat_test.c and rename to cat_test()
  selftests/resctrl: Open perf fd before start & add error handling
  selftests/resctrl: Replace file write with volatile variable
  selftests/resctrl: Read in less obvious order to defeat prefetch optimizations
  selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test
  selftests/resctrl: Create struct for input parameters
  selftests/resctrl: Introduce generalized test framework
  selftests/resctrl: Pass write_schemata() resource instead of test name
  selftests/resctrl: Add helper to convert L2/3 to integer
  selftests/resctrl: Rename resource ID to domain ID
  selftests/resctrl: Get domain id from cache id
  selftests/resctrl: Add test groups and name L3 CAT test L3_CAT

 tools/testing/selftests/resctrl/cache.c | 274 +++++----------
 tools/testing/selftests/resctrl/cat_test.c | 328 +++++++++++-------
 tools/testing/selftests/resctrl/cmt_test.c | 76 +++-
 tools/testing/selftests/resctrl/fill_buf.c | 132 ++++---
 tools/testing/selftests/resctrl/mba_test.c | 26 +-
 tools/testing/selftests/resctrl/mbm_test.c | 28 +-
 tools/testing/selftests/resctrl/resctrl.h | 117 +++++--
 .../testing/selftests/resctrl/resctrl_tests.c | 205 +++++------
 tools/testing/selftests/resctrl/resctrl_val.c | 56 +--
 tools/testing/selftests/resctrl/resctrlfs.c | 246 ++++++++-----
 10 files changed, 821 insertions(+), 667 deletions(-)

--
2.30.2
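
The first improvement in the list, excluding shareable bits from the CAT test allocation, comes down to a bitmask operation. The sketch below is Python rather than the selftest's C, and the helper name `exclusive_cbm` and the example mask values are assumptions made for illustration; it is not the code from the series.

```python
def exclusive_cbm(full_cbm: int, shareable_bits: int) -> int:
    """Return the capacity bitmask with the shareable bits cleared.

    full_cbm: mask with every available cache portion set.
    shareable_bits: mask of portions that other agents may also use.
    """
    return full_cbm & ~shareable_bits


# Example values chosen for illustration only.
full_cbm = 0xFFF         # 12 cache portions in total
shareable_bits = 0xC00   # the top two portions may be shared

usable = exclusive_cbm(full_cbm, shareable_bits)
print(hex(usable), bin(usable).count("1"))
```

Running the test only within `usable` keeps the measurement away from portions where unrelated traffic could interfere with the result.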

WARNING: CPU: 6 PID: 474 at include/linux/maple_tree.h:712 mmap_region (include/linux/maple_tree.h:556 include/linux/maple_tree.h:731

by Naresh Kamboju

The following kernel warning was noticed while running selftests: exec: load_address on Fastmodels (FVP) running Linux next-20231109.

Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>

log:
---
# timeout set to 45
# selftests: exec: load_address_16777216
[ 238.405168] ------------[ cut here ]------------
[ 238.405244] WARNING: CPU: 6 PID: 474 at include/linux/maple_tree.h:712 mmap_region (include/linux/maple_tree.h:556 include/linux/maple_tree.h:731 include/linux/maple_tree.h:747 include/linux/mm.h:1033 mm/mmap.c:2828)
[ 238.405432] Modules linked in: arm_spe_pmu crct10dif_ce panel_simple pl111_drm drm_dma_helper drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables
[ 238.405932] CPU: 6 PID: 474 Comm: load_address_16 Not tainted 6.6.0-next-20231109 #1
[ 238.406070] Hardware name: FVP Base RevC (DT)
[ 238.406151] pstate: 123402009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 238.406294] pc : mmap_region (include/linux/maple_tree.h:556 include/linux/maple_tree.h:731 include/linux/maple_tree.h:747 include/linux/mm.h:1033 mm/mmap.c:2828)
[ 238.406424] lr : mmap_region (mm/mmap.c:2836)
[ 238.406554] sp : ffff8000819639b0
[ 238.406629] x29: ffff8000819639c0 x28: ffff000806f79000 x27: 0000000002002000
[ 238.406829] x26: ffff000806f798f0 x25: ffff000806f790b0 x24: 0000000000000006
[ 238.407029] x23: 0000000000000ffc x22: ffff000805d6e100 x21: ffff0008016adf00
[ 238.407229] x20: 0000000000100073 x19: 0000000001ffc000 x18: ffffffffffffffff
[ 238.407425] x17: 0000000000000000 x16: ffffd7c64ceb7c10 x15: ffffffffffffffff
[ 238.407627] x14: 0000000000000000 x13: 1fffe001002bc9a1 x12: ffff0008015e4d0c
[ 238.407825] x11: ffff800081963a48 x10: ffff0008015e4d00 x9 : ffffd7c64b49c9f0
[ 238.408028] x8 : ffff800081963778 x7 : 0000000000000000 x6 : 0000000000000000
[ 238.408223] x5 : ffffd7c64e35f000 x4 : ffffd7c64e35f278 x3 : 0000000000000000
[ 238.408420] x2 : ffffd7c64e92fd78 x1 : 0000000002001fff x0 : 0000000000479fff
[ 238.408618] Call trace:
[ 238.408681] mmap_region (include/linux/maple_tree.h:556 include/linux/maple_tree.h:731 include/linux/maple_tree.h:747 include/linux/mm.h:1033 mm/mmap.c:2828)
[ 238.408812] do_mmap (arch/arm64/include/asm/mman.h:18 include/linux/mman.h:147 mm/mmap.c:1274)
[ 238.408940] vm_mmap_pgoff (mm/util.c:546)
[ 238.409088] vm_mmap (mm/util.c:559)
[ 238.409229] elf_load (fs/binfmt_elf.c:385 fs/binfmt_elf.c:408)
[ 238.409337] load_elf_binary (fs/binfmt_elf.c:1134 (discriminator 1))
[ 238.409454] bprm_execve (fs/exec.c:1940)
[ 238.409598] do_execveat_common.isra.0 (fs/exec.c:1938)
[ 238.409757] __arm64_sys_execve (fs/exec.c:2106)
[ 238.409910] invoke_syscall (arch/arm64/kernel/syscall.c:46 (discriminator 19))
[ 238.410058] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:136)
[ 238.410218] do_el0_svc (arch/arm64/kernel/syscall.c:155)
[ 238.410363] el0_svc (arch/arm64/include/asm/daifflags.h:75 arch/arm64/kernel/entry-common.c:677)
[ 238.410508] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
[ 238.410623] el0t_64_sync (arch/arm64/kernel/entry.S:595)
[ 238.410735] ---[ end trace 0000000000000000 ]---

Links:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231109/te…
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231109/te…
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2Xv9wca3SP…
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2Xv9vEpjybxlDA4IvgDB…

--
Linaro LKFT
https://lkft.linaro.org

[PATCH v1 0/9] x86/resctrl: Use soft RMIDs for reliable MBM on AMD

by Peter Newman

Hi Reinette, Fenghua,

This series introduces a new mount option enabling an alternate mode for MBM to work around an issue on present AMD implementations and any other resctrl implementation where there are more RMIDs (or equivalent) than hardware counters.

The L3 External Bandwidth Monitoring feature of the AMD PQoS extension[1] only guarantees that RMIDs currently assigned to a processor will be tracked by hardware. The counters of any other RMIDs which are no longer being tracked will be reset to zero. The MBM event counters return "Unavailable" to indicate when this has happened.

An interval for effectively measuring memory bandwidth typically needs to be multiple seconds long. In Google's workloads, it is not feasible to bound the number of jobs with different RMIDs which will run in a cache domain over any period of time. Consequently, on a fully-committed system where all RMIDs are allocated, few groups' counters return non-zero values.

To demonstrate the underlying issue, the first patch provides a test case in tools/testing/selftests/resctrl/test_rmids.sh.

On an AMD EPYC 7B12 64-Core Processor with the default behavior:

# ./test_rmids.sh
Created 255 monitoring groups.
g1: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g2: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g3: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
[..]
g238: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g239: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g240: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g241: mbm_total_bytes: Unavailable -> 660497472
g242: mbm_total_bytes: Unavailable -> 660793344
g243: mbm_total_bytes: Unavailable -> 660477312
g244: mbm_total_bytes: Unavailable -> 660495360
g245: mbm_total_bytes: Unavailable -> 660775360
g246: mbm_total_bytes: Unavailable -> 660645504
g247: mbm_total_bytes: Unavailable -> 660696128
g248: mbm_total_bytes: Unavailable -> 660605248
g249: mbm_total_bytes: Unavailable -> 660681280
g250: mbm_total_bytes: Unavailable -> 660834240
g251: mbm_total_bytes: Unavailable -> 660440064
g252: mbm_total_bytes: Unavailable -> 660501504
g253: mbm_total_bytes: Unavailable -> 660590720
g254: mbm_total_bytes: Unavailable -> 660548352
g255: mbm_total_bytes: Unavailable -> 660607296
255 groups, 0 returned counts in first pass, 15 in second
successfully measured bandwidth from 15/255 groups

To compare, here is the output from an Intel(R) Xeon(R) Platinum 8173M CPU:

# ./test_rmids.sh
Created 223 monitoring groups.
g1: mbm_total_bytes: 0 -> 606126080
g2: mbm_total_bytes: 0 -> 613236736
g3: mbm_total_bytes: 0 -> 610254848
[..]
g221: mbm_total_bytes: 0 -> 584679424
g222: mbm_total_bytes: 0 -> 588808192
g223: mbm_total_bytes: 0 -> 587317248
223 groups, 223 returned counts in first pass, 223 in second
successfully measured bandwidth from 223/223 groups

To make better use of the hardware in such a use case, this patchset introduces a "soft" RMID implementation, where each CPU is permanently assigned a "hard" RMID. On context switches which change the current soft RMID, the difference between each CPU's current event counts and most recent counts is added to the totals for the current or outgoing soft RMID.

This technique does not work for cache occupancy counters, so this patch series disables cache occupancy events when soft RMIDs are enabled.
This series adds the "mbm_soft_rmid" mount option to allow users to opt in to the functionality when they deem it helpful.

When the same system from the earlier AMD example enables the mbm_soft_rmid mount option:

# ./test_rmids.sh
Created 255 monitoring groups.
g1: mbm_total_bytes: 0 -> 686560576
g2: mbm_total_bytes: 0 -> 668204416
[..]
g252: mbm_total_bytes: 0 -> 672651200
g253: mbm_total_bytes: 0 -> 666956800
g254: mbm_total_bytes: 0 -> 665917056
g255: mbm_total_bytes: 0 -> 671049600
255 groups, 255 returned counts in first pass, 255 in second
successfully measured bandwidth from 255/255 groups

(patches are based on tip/master)

[1] https://www.amd.com/system/files/TechDocs/56375_1.03_PUB.pdf

Peter Newman (8):
  selftests/resctrl: Verify all RMIDs count together
  x86/resctrl: Add resctrl_mbm_flush_cpu() to collect CPUs' MBM events
  x86/resctrl: Flush MBM event counts on soft RMID change
  x86/resctrl: Call mon_event_count() directly for soft RMIDs
  x86/resctrl: Create soft RMID version of __mon_event_count()
  x86/resctrl: Assign HW RMIDs to CPUs for soft RMID
  x86/resctrl: Use mbm_update() to push soft RMID counts
  x86/resctrl: Add mount option to enable soft RMID

Stephane Eranian (1):
  x86/resctrl: Hold a spinlock in __rmid_read() on AMD

 arch/x86/include/asm/resctrl.h | 29 +++-
 arch/x86/kernel/cpu/resctrl/core.c | 80 ++++++++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 9 +-
 arch/x86/kernel/cpu/resctrl/internal.h | 19 ++-
 arch/x86/kernel/cpu/resctrl/monitor.c | 158 +++++++++++++++++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++
 tools/testing/selftests/resctrl/test_rmids.sh | 93 +++++++++++
 7 files changed, 425 insertions(+), 15 deletions(-)
 create mode 100755 tools/testing/selftests/resctrl/test_rmids.sh

base-commit: dd806e2f030e57dd5bac973372aa252b6c175b73
--
2.40.0.634.g4ca3ef3211-goog
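
The soft RMID accounting described in the cover letter (a per-CPU hardware counter whose delta is credited to the outgoing soft RMID on a context switch) can be modelled in a few lines of Python. This is only a sketch of the bookkeeping under simplifying assumptions: the class `SoftRmidCpu` and its method names are hypothetical, a single CPU is modelled, and the hardware counter is assumed to increase monotonically without overflow.

```python
from collections import defaultdict


class SoftRmidCpu:
    """One CPU with a permanently assigned hardware counter (toy model)."""

    def __init__(self, totals):
        self.totals = totals            # shared: soft RMID -> accumulated bytes
        self.hw_count = 0               # hardware counter of this CPU's hard RMID
        self.last_read = 0              # counter value at the previous flush
        self.current_soft_rmid = None

    def traffic(self, nbytes):
        """Simulate memory traffic observed by the hardware counter."""
        self.hw_count += nbytes

    def flush(self):
        """Credit bytes counted since the last flush to the current soft RMID."""
        delta = self.hw_count - self.last_read
        self.last_read = self.hw_count
        if self.current_soft_rmid is not None:
            self.totals[self.current_soft_rmid] += delta

    def context_switch(self, next_soft_rmid):
        """Flush only when the switch actually changes the soft RMID."""
        if next_soft_rmid != self.current_soft_rmid:
            self.flush()
            self.current_soft_rmid = next_soft_rmid


totals = defaultdict(int)
cpu = SoftRmidCpu(totals)
cpu.context_switch(1)
cpu.traffic(100)
cpu.context_switch(2)   # 100 bytes credited to soft RMID 1
cpu.traffic(50)
cpu.context_switch(1)   # 50 bytes credited to soft RMID 2
cpu.traffic(25)
cpu.flush()             # 25 more bytes credited to soft RMID 1
print(dict(totals))
```

Because the totals live in software, the number of groups that can be measured no longer depends on how many RMIDs the hardware can track at once. The same delta-based scheme has no meaningful equivalent for an occupancy value, which is consistent with the series disabling cache occupancy events in this mode.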


[PATCH v2] kunit: run test suites only after module initialization completes

by Marco Pagani

Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()") fixed a wild-memory-access bug that could have happened during the loading phase of test suites built and executed as loadable modules. However, it also introduced a problematic side effect that causes test suite modules to crash when they attempt to register fake devices.

When a module is loaded, it traverses the MODULE_STATE_UNFORMED and MODULE_STATE_COMING states before reaching the normal operating state MODULE_STATE_LIVE. Finally, when the module is removed, it moves to MODULE_STATE_GOING before being released. However, if the loading function load_module() fails between complete_formation() and do_init_module(), the module goes directly from MODULE_STATE_COMING to MODULE_STATE_GOING without passing through MODULE_STATE_LIVE.

This behavior was causing kunit_module_exit() to be called without having first executed kunit_module_init(). Since kunit_module_exit() is responsible for freeing the memory allocated by kunit_module_init() through kunit_filter_suites(), this behavior was resulting in a wild-memory-access bug.

Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()") fixed this issue by running the tests when the module is still in MODULE_STATE_COMING. However, modules in that state are not fully initialized, lacking sysfs kobjects. Therefore, if a test module attempts to register a fake device, it will inevitably crash.

This patch proposes a different approach to fix the original wild-memory-access bug while restoring the normal module execution flow by making kunit_module_exit() able to detect if kunit_module_init() has previously initialized the test suite set. In this way, test modules can once again register fake devices without crashing.

This behavior is achieved by checking, with virt_addr_valid(), whether mod->kunit_suites is a virtual or direct mapping address. If the check passes, then kunit_module_init() has allocated the suite_set in kunit_filter_suites() using kmalloc_array(). On the contrary, if mod->kunit_suites is still pointing to the original address that was set when looking up the .kunit_test_suites section of the module, then the loading phase has failed and there's no memory to be freed.

v2:
- add include <linux/mm.h>

Fixes: 2810c1e99867 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()")
Signed-off-by: Marco Pagani <marpagan(a)redhat.com>
---
 lib/kunit/test.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index f2eb71f1a66c..0e829b9f8ce5 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -16,6 +16,7 @@
 #include <linux/panic.h>
 #include <linux/sched/debug.h>
 #include <linux/sched.h>
+#include <linux/mm.h>
 
 #include "debugfs.h"
 #include "hooks-impl.h"
@@ -737,12 +738,14 @@ static void kunit_module_exit(struct module *mod)
 	};
 	const char *action = kunit_action();
 
+	if (!suite_set.start || !virt_addr_valid(suite_set.start))
+		return;
+
 	if (!action)
 		__kunit_test_suites_exit(mod->kunit_suites,
 					 mod->num_kunit_suites);
 
-	if (suite_set.start)
-		kunit_free_suite_set(suite_set);
+	kunit_free_suite_set(suite_set);
 }
 
 static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
@@ -752,12 +755,12 @@ static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
 
 	switch (val) {
 	case MODULE_STATE_LIVE:
+		kunit_module_init(mod);
 		break;
 	case MODULE_STATE_GOING:
 		kunit_module_exit(mod);
 		break;
 	case MODULE_STATE_COMING:
-		kunit_module_init(mod);
 		break;
 	case MODULE_STATE_UNFORMED:
 		break;

base-commit: 2cc14f52aeb78ce3f29677c2de1f06c0e91471ab
--
2.42.0
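The state handling in this patch can be modeled in a few lines of userspace code. This is a sketch under stated assumptions, not kernel code: the Module class, the SECTION/HEAP markers and run() are hypothetical stand-ins, and the string comparison merely plays the role of the virt_addr_valid() check that tells the kmalloc'ed copy apart from the original section address.

```python
SECTION = "module-section"  # stands in for the .kunit_test_suites section address
HEAP = "kmalloc"            # stands in for the copy made by kunit_filter_suites()


class Module:
    """Hypothetical stand-in for struct module, reduced to what the model needs."""

    def __init__(self):
        self.kunit_suites = SECTION  # set when the section is looked up at load time
        self.tests_ran = False
        self.frees = 0


def kunit_module_init(mod):
    # Filtering the suites replaces the section pointer with an allocated copy.
    mod.kunit_suites = HEAP
    mod.tests_ran = True


def kunit_module_exit(mod):
    # Models the new guard: free only what kunit_module_init() allocated.
    if mod.kunit_suites != HEAP:
        return
    mod.frees += 1
    mod.kunit_suites = None


def notify(mod, state):
    # Mirrors the patched notifier: init on LIVE, exit on GOING.
    if state == "LIVE":
        kunit_module_init(mod)
    elif state == "GOING":
        kunit_module_exit(mod)
    # COMING and UNFORMED: nothing to do


def run(states):
    mod = Module()
    for state in states:
        notify(mod, state)
    return mod


loaded = run(["UNFORMED", "COMING", "LIVE", "GOING"])  # normal load and unload
failed = run(["UNFORMED", "COMING", "GOING"])          # load_module() failed early
print(loaded.tests_ran, loaded.frees, failed.tests_ran, failed.frees)
# prints: True 1 False 0
```

In the failed-load sequence the exit path runs without a preceding init, and the guard makes it return without freeing anything, which is the case the original wild-memory-access bug came from.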


[PATCH v5 0/5] userfaultfd move option

by Suren Baghdasaryan

This patch series introduces the UFFDIO_MOVE feature to userfaultfd, which has long been implemented and maintained by Andrea in his local tree [1], but was not upstreamed due to lack of use cases where this approach would be better than allocating a new page and copying the contents. Previous upstreaming attempts can be found at [6] and [7].

UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application needs pages to be allocated [2]. However, with UFFDIO_MOVE, if pages are available (in userspace) for recycling, as is usually the case in heap compaction algorithms, then we can avoid the page allocation and memcpy (done by UFFDIO_COPY). Also, since the pages are recycled in userspace, we avoid the need to release (via madvise) the pages back to the kernel [3].

We see over 40% reduction (on a Google Pixel 6 device) in the compacting thread’s completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was measured using a benchmark that emulates a heap compaction implementation using userfaultfd (to allow concurrent accesses by application threads). More details of the usecase are explained in [3].

Furthermore, UFFDIO_MOVE enables moving swapped-out pages without touching them within the same vma. Today, this can only be done with mremap, which forces splitting the vma.

TODOs for follow-up improvements:
- cross-mm support. Known differences from single-mm and missing pieces:
    - memcg recharging (might need to isolate pages in the process)
    - mm counters
    - cross-mm deposit table moves
    - cross-mm test
    - document the address space where src and dest reside in struct uffdio_move
- TLB flush batching. Will require extensive changes to PTL locking in move_pages_pte(). OTOH that might let us reuse parts of mremap code.

Changes since v4 [9]:
- added Acked-by in patch 1, per Peter Xu
- added description for ctx, mm and mode parameters of move_pages(), per kernel test robot
- added Reviewed-by's, per Peter Xu and Axel Rasmussen
- removed unused operations in uffd_test_case_ops
- refactored uffd-unit-test changes to avoid using global variables and handle pmd moves without page size overrides, per Peter Xu

Changes since v3 [8]:
- changed retry path in folio_lock_anon_vma_read() to unlock and then relock RCU, per Peter Xu
- removed cross-mm support from initial patchset, per David Hildenbrand
- replaced BUG_ONs with VM_WARN_ON or WARN_ON_ONCE, per David Hildenbrand
- added missing cache flushing, per Lokesh Gidra and Peter Xu
- updated manpage text in the patch description, per Peter Xu
- renamed internal functions from "remap" to "move", per Peter Xu
- added mmap_changing check after taking mmap_lock, per Peter Xu
- changed uffd context check to ensure dst_mm is registered onto uffd we are operating on, per Peter Xu and David Hildenbrand
- changed to non-maybe variants of maybe*_mkwrite(), per David Hildenbrand
- fixed warning for CONFIG_TRANSPARENT_HUGEPAGE=n, per kernel test robot
- comments cleanup, per David Hildenbrand and Peter Xu
- checks for VM_IO,VM_PFNMAP,VM_HUGETLB,..., per David Hildenbrand
- prevent moving pinned pages, per Peter Xu
- changed uffd tests to call uffd_test_ctx_clear() at the end of the test run instead of at the beginning of the next run
- added support for testcase-specific ops
- added test for moving PMD-aligned blocks

Changes since v2 [5]:
- renamed UFFDIO_REMAP to UFFDIO_MOVE, per David Hildenbrand
- rebase over mm-unstable to use folio_move_anon_rmap(), per David Hildenbrand
- added text for manpage explaining DONTFORK and KSM requirements for this feature, per David Hildenbrand
- check for anon_vma changes in the fast path of folio_lock_anon_vma_read, per Peter Xu
- updated the title and description of the first patch, per David Hildenbrand
- updated comments in folio_lock_anon_vma_read() explaining the need for anon_vma checks, per David Hildenbrand
- changed all mapcount checks to PageAnonExclusive, per Jann Horn and David Hildenbrand
- changed counters in remap_swap_pte() from MM_ANONPAGES to MM_SWAPENTS, per Jann Horn
- added a check for PTE change after folio is locked in remap_pages_pte(), per Jann Horn
- added handling of PMD migration entries and bailout when pmd_devmap(), per Jann Horn
- added checks to ensure both src and dst VMAs are writable, per Peter Xu
- added UFFD_FEATURE_MOVE, per Peter Xu
- removed obsolete comments, per Peter Xu
- renamed remap_anon_pte to remap_present_pte, per Peter Xu
- added a comment for folio_get_anon_vma() explaining the need for anon_vma checks, per Peter Xu
- changed error handling in remap_pages() to make it more clear, per Peter Xu
- changed EFAULT to EAGAIN to retry when a hugepage appears or disappears from under us, per Peter Xu
- added links to previous upstreaming attempts, per David Hildenbrand

Changes since v1 [4]:
- add mmget_not_zero in userfaultfd_remap, per Jann Horn
- removed extern from function definitions, per Matthew Wilcox
- converted to folios in remap_pages_huge_pmd, per Matthew Wilcox
- use PageAnonExclusive in remap_pages_huge_pmd, per David Hildenbrand
- handle pgtable transfers between MMs, per Jann Horn
- ignore concurrent A/D pte bit changes, per Jann Horn
- split functions into smaller units, per David Hildenbrand
- test for folio_test_large in remap_anon_pte, per Matthew Wilcox
- use pte_swp_exclusive for swapcount check, per David Hildenbrand
- eliminated use of mmu_notifier_invalidate_range_start_nonblock, per Jann Horn
- simplified THP alignment checks, per Jann Horn
- refactored the loop inside remap_pages, per Jann Horn
- additional clarifying comments, per Jann Horn

Main changes since Andrea's last version [1]:
- Trivial translations from page to folio, mmap_sem to mmap_lock
- Replace pmd_trans_unstable() with pte_offset_map_nolock() and handle its possible failure
- Move pte mapping into remap_pages_pte to allow for retries when source page or anon_vma is contended. Since pte_offset_map_nolock() starts an RCU read section, we can't block anymore after mapping a pte, so we have to unmap the ptes, do the locking and retry.
- Add and use anon_vma_trylock_write() to avoid blocking while in RCU read section.
- Accommodate changes in mmu_notifier_range_init() API, switch to mmu_notifier_invalidate_range_start_nonblock() to avoid blocking while in RCU read section.
- Open-code the now-removed __swp_swapcount()
- Replace pmd_read_atomic() with pmdp_get_lockless()
- Add new selftest for UFFDIO_MOVE

[1] https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc…
[2] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redha…
[3] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyj…
[4] https://lore.kernel.org/all/20230914152620.2743033-1-surenb@google.com/
[5] https://lore.kernel.org/all/20230923013148.1390521-1-surenb@google.com/
[6] https://lore.kernel.org/all/1425575884-2574-21-git-send-email-aarcange@redh…
[7] https://lore.kernel.org/all/cover.1547251023.git.blake.caldwell@colorado.ed…
[8] https://lore.kernel.org/all/20231009064230.2952396-1-surenb@google.com/
[9] https://lore.kernel.org/all/20231028003819.652322-1-surenb@google.com/

Andrea Arcangeli (2):
  mm/rmap: support move to different root anon_vma in folio_move_anon_rmap()
  userfaultfd: UFFDIO_MOVE uABI

Suren Baghdasaryan (3):
  selftests/mm: call uffd_test_ctx_clear at the end of the test
  selftests/mm: add uffd_test_case_ops to allow test case-specific operations
  selftests/mm: add UFFDIO_MOVE ioctl test

 Documentation/admin-guide/mm/userfaultfd.rst |   3 +
 fs/userfaultfd.c                             |  72 +++
 include/linux/rmap.h                         |   5 +
 include/linux/userfaultfd_k.h                |  11 +
 include/uapi/linux/userfaultfd.h             |  29 +-
 mm/huge_memory.c                             | 122 ++++
 mm/khugepaged.c                              |   3 +
 mm/rmap.c                                    |  30 +
 mm/userfaultfd.c                             | 599 +++++++++++++++++++
 tools/testing/selftests/mm/uffd-common.c     |  39 +-
 tools/testing/selftests/mm/uffd-common.h     |   9 +
 tools/testing/selftests/mm/uffd-stress.c     |   5 +-
 tools/testing/selftests/mm/uffd-unit-tests.c | 192 ++++++
 13 files changed, 1115 insertions(+), 4 deletions(-)

--
2.43.0.rc1.413.gea7ed67945-goog


[PATCH v3 0/9] RISCV: Add kvm Sstc timer selftests

by Haibo Xu

The RISC-V arch_timer selftest validates Sstc timer functionality in a guest: it sets up periodic timer interrupts and checks the basic interrupt status upon receipt.

This KVM selftest was ported from the aarch64 arch_timer test and tested with Linux v6.6-rc1 on a QEMU riscv64 virt machine.

---
Changed since v2:
* Rebase to Linux 6.6-rc1
* Add separate patch for kvm/Makefile improvement
* Move aarch64 specific macros to aarch64/arch_timer.c
* Add -DCONFIG_64BIT to kvm/Makefile CFLAGS to ensure only 64bit registers are available in csr.h
* Avoid some #ifdef in kvm/arch_timer.c by setting some aarch64 specific variables to 0 on RISC-V

Haibo Xu (9):
  KVM: selftests: Unify the codes for guest exception handling
  KVM: selftests: Unify the makefile rule for split targets
  KVM: arm64: selftests: Split arch_timer test code
  tools: riscv: Add header file csr.h
  KVM: riscv: selftests: Switch to use macro from csr.h
  KVM: riscv: selftests: Add exception handling support
  KVM: riscv: selftests: Add guest helper to get vcpu id
  KVM: riscv: selftests: Change vcpu_has_ext to a common function
  KVM: riscv: selftests: Add sstc timer test

 tools/arch/riscv/include/asm/csr.h            | 521 ++++++++++++++++++
 tools/testing/selftests/kvm/Makefile          |  14 +-
 .../selftests/kvm/aarch64/arch_timer.c        | 291 +---------
 .../selftests/kvm/aarch64/debug-exceptions.c  |   4 +-
 .../selftests/kvm/aarch64/page_fault_test.c   |   4 +-
 .../testing/selftests/kvm/aarch64/vgic_irq.c  |   4 +-
 tools/testing/selftests/kvm/arch_timer.c      | 250 +++++++++
 .../selftests/kvm/include/aarch64/processor.h |  12 +-
 .../selftests/kvm/include/kvm_util_base.h     |   9 +
 .../selftests/kvm/include/riscv/arch_timer.h  |  80 +++
 .../selftests/kvm/include/riscv/processor.h   |  63 ++-
 .../testing/selftests/kvm/include/test_util.h |   2 +
 .../selftests/kvm/include/timer_test.h        |  43 ++
 .../selftests/kvm/include/x86_64/processor.h  |   5 -
 .../selftests/kvm/lib/aarch64/processor.c     |   6 +-
 .../selftests/kvm/lib/riscv/handlers.S        | 101 ++++
 .../selftests/kvm/lib/riscv/processor.c       |  86 +++
 .../selftests/kvm/lib/x86_64/processor.c      |   4 +-
 .../testing/selftests/kvm/riscv/arch_timer.c  | 107 ++++
 .../selftests/kvm/riscv/get-reg-list.c        |  16 +-
 tools/testing/selftests/kvm/x86_64/amx_test.c |   4 +-
 .../selftests/kvm/x86_64/fix_hypercall_test.c |   4 +-
 .../selftests/kvm/x86_64/hyperv_evmcs.c       |   4 +-
 .../selftests/kvm/x86_64/hyperv_features.c    |   8 +-
 .../testing/selftests/kvm/x86_64/hyperv_ipi.c |   6 +-
 .../selftests/kvm/x86_64/kvm_pv_test.c        |   4 +-
 .../selftests/kvm/x86_64/monitor_mwait_test.c |   4 +-
 .../kvm/x86_64/pmu_event_filter_test.c        |   8 +-
 .../smaller_maxphyaddr_emulation_test.c       |   4 +-
 .../selftests/kvm/x86_64/svm_int_ctl_test.c   |   4 +-
 .../kvm/x86_64/svm_nested_shutdown_test.c     |   4 +-
 .../kvm/x86_64/svm_nested_soft_inject_test.c  |   4 +-
 .../kvm/x86_64/ucna_injection_test.c          |   8 +-
 .../kvm/x86_64/userspace_msr_exit_test.c      |   4 +-
 .../vmx_exception_with_invalid_guest_state.c  |   4 +-
 .../selftests/kvm/x86_64/vmx_pmu_caps_test.c  |   4 +-
 .../selftests/kvm/x86_64/xapic_ipi_test.c     |   4 +-
 .../selftests/kvm/x86_64/xcr0_cpuid_test.c    |   4 +-
 .../selftests/kvm/x86_64/xen_shinfo_test.c    |   4 +-
 39 files changed, 1338 insertions(+), 374 deletions(-)
 create mode 100644 tools/arch/riscv/include/asm/csr.h
 create mode 100644 tools/testing/selftests/kvm/arch_timer.c
 create mode 100644 tools/testing/selftests/kvm/include/riscv/arch_timer.h
 create mode 100644 tools/testing/selftests/kvm/include/timer_test.h
 create mode 100644 tools/testing/selftests/kvm/lib/riscv/handlers.S
 create mode 100644 tools/testing/selftests/kvm/riscv/arch_timer.c

--
2.34.1


[PATCH 0/3] sysctl: Fix out of bounds access for empty syscl ctl_tables

by Joel Granados via B4 Relay

Fix an out of bounds access reported in https://lore.kernel.org/oe-lkp/202311201431.57aae8f3-oliver.sang@intel.com

Make sure that the ctl_table header size is greater than 0 before evaluating if a ctl_table is permanently empty; this evaluation accesses the first element regardless of the size.

Adjusted the ctl_table_size of permanently empty ctl_table registers to 1, as the check now requires them to have a size greater than 0.

Added tests for empty directory handling, in response to the path followed by empty ctl_tables changing slightly.

Clarified the results of sysctl self tests to more easily identify which ones are OK, Skipped and Failed.

Comments are greatly appreciated

Best

Signed-off-by: Joel Granados <j.granados(a)samsung.com>
---
Joel Granados (3):
      sysctl: Fix out of bounds access for empty sysctl registers
      sysctl: Add a selftest for handling empty dirs
      sysclt: Clarify the results of selftest run

 fs/proc/proc_sysctl.c                    |   9 +-
 lib/test_sysctl.c                        |  29 ++++++
 tools/testing/selftests/sysctl/sysctl.sh | 146 ++++++++++++++++++-------------
 3 files changed, 121 insertions(+), 63 deletions(-)
---
base-commit: 8b793bcda61f6c3ed4f5b2ded7530ef6749580cb
change-id: 20231121-jag-fix_out_of_bounds_insert-86380d1bc95e

Best regards,
--
Joel Granados <j.granados(a)samsung.com>
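The ordering issue behind the fix (test the size before looking at element 0) can be illustrated with a tiny model. This is a sketch only: the function names, the dictionary layout and the PERMANENTLY_EMPTY marker are hypothetical and do not mirror the kernel's helpers. In C the unchecked read silently goes out of bounds, whereas Python raises IndexError, which makes the problem visible.

```python
PERMANENTLY_EMPTY = "permanently-empty"  # hypothetical marker for the table type


def is_perm_empty_unchecked(table, size):
    # Pre-fix shape: reads element 0 regardless of how many entries exist.
    return table[0]["type"] == PERMANENTLY_EMPTY


def is_perm_empty(table, size):
    # Post-fix shape: only look at element 0 when the table has entries.
    return size > 0 and table[0]["type"] == PERMANENTLY_EMPTY


empty_table = []                                 # registered with size 0
sentinel_table = [{"type": PERMANENTLY_EMPTY}]   # registered with size 1

print(is_perm_empty(empty_table, 0), is_perm_empty(sentinel_table, 1))
# prints: False True

try:
    is_perm_empty_unchecked(empty_table, 0)
except IndexError:
    print("unchecked variant read past the end of an empty table")
```

The model also shows why permanently empty registrations have to carry a size of 1 after the change: with a size of 0 the guarded check can never reach the marker in the first element.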
