March 2025 - Linux-kselftest-mirror

by Suren Baghdasaryan

This patch series introduces the UFFDIO_MOVE feature to userfaultfd. It has long been implemented and maintained by Andrea in his local tree [1], but was not upstreamed due to a lack of use cases where this approach would be better than allocating a new page and copying the contents. Previous upstreaming attempts can be found at [6] and [7].

UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application needs pages to be allocated [2]. However, with UFFDIO_MOVE, if pages are available (in userspace) for recycling, as is usually the case in heap compaction algorithms, then we can avoid the page allocation and memcpy (done by UFFDIO_COPY). Also, since the pages are recycled in userspace, we avoid the need to release (via madvise) the pages back to the kernel [3].

We see over 40% reduction (on a Google Pixel 6 device) in the compacting thread’s completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was measured using a benchmark that emulates a heap compaction implementation using userfaultfd (to allow concurrent accesses by application threads). More details of the use case are explained in [3].

Furthermore, UFFDIO_MOVE enables moving swapped-out pages without touching them within the same vma. Today, this can only be done by mremap, which however forces splitting the vma.

TODOs for follow-up improvements:
- cross-mm support. Known differences from single-mm and missing pieces:
  - memcg recharging (might need to isolate pages in the process)
  - mm counters
  - cross-mm deposit table moves
  - cross-mm test
  - document the address space where src and dest reside in struct uffdio_move
- TLB flush batching. Will require extensive changes to PTL locking in move_pages_pte(). OTOH that might let us reuse parts of mremap code.

Changes since v5 [10]:
- added logic to split large folios in move_pages_pte(), per David Hildenbrand
- added check for PAE before split_huge_pmd() to avoid the split if the move operation can't be done
- replaced calls to default_huge_page_size() with read_pmd_pagesize() in uffd_move_pmd test, per David Hildenbrand
- fixed the condition in uffd_move_test_common() checking if area alignment is needed

Changes since v4 [9]:
- added Acked-by in patch 1, per Peter Xu
- added description for ctx, mm and mode parameters of move_pages(), per kernel test robot
- added Reviewed-by's, per Peter Xu and Axel Rasmussen
- removed unused operations in uffd_test_case_ops
- refactored uffd-unit-test changes to avoid using global variables and handle pmd moves without page size overrides, per Peter Xu

Changes since v3 [8]:
- changed retry path in folio_lock_anon_vma_read() to unlock and then relock RCU, per Peter Xu
- removed cross-mm support from initial patchset, per David Hildenbrand
- replaced BUG_ONs with VM_WARN_ON or WARN_ON_ONCE, per David Hildenbrand
- added missing cache flushing, per Lokesh Gidra and Peter Xu
- updated manpage text in the patch description, per Peter Xu
- renamed internal functions from "remap" to "move", per Peter Xu
- added mmap_changing check after taking mmap_lock, per Peter Xu
- changed uffd context check to ensure dst_mm is registered onto uffd we are operating on, per Peter Xu and David Hildenbrand
- changed to non-maybe variants of maybe*_mkwrite(), per David Hildenbrand
- fixed warning for CONFIG_TRANSPARENT_HUGEPAGE=n, per kernel test robot
- comments cleanup, per David Hildenbrand and Peter Xu
- checks for VM_IO,VM_PFNMAP,VM_HUGETLB,..., per David Hildenbrand
- prevent moving pinned pages, per Peter Xu
- changed uffd tests to call uffd_test_ctx_clear() at the end of the test run instead of in the beginning of the next run
- added support for testcase-specific ops
- added test for moving PMD-aligned blocks

Changes since v2 [5]:
- renamed UFFDIO_REMAP to UFFDIO_MOVE, per David Hildenbrand
- rebase over mm-unstable to use folio_move_anon_rmap(), per David Hildenbrand
- added text for manpage explaining DONTFORK and KSM requirements for this feature, per David Hildenbrand
- check for anon_vma changes in the fast path of folio_lock_anon_vma_read, per Peter Xu
- updated the title and description of the first patch, per David Hildenbrand
- updated comments in folio_lock_anon_vma_read() explaining the need for anon_vma checks, per David Hildenbrand
- changed all mapcount checks to PageAnonExclusive, per Jann Horn and David Hildenbrand
- changed counters in remap_swap_pte() from MM_ANONPAGES to MM_SWAPENTS, per Jann Horn
- added a check for PTE change after folio is locked in remap_pages_pte(), per Jann Horn
- added handling of PMD migration entries and bailout when pmd_devmap(), per Jann Horn
- added checks to ensure both src and dst VMAs are writable, per Peter Xu
- added UFFD_FEATURE_MOVE, per Peter Xu
- removed obsolete comments, per Peter Xu
- renamed remap_anon_pte to remap_present_pte, per Peter Xu
- added a comment for folio_get_anon_vma() explaining the need for anon_vma checks, per Peter Xu
- changed error handling in remap_pages() to make it more clear, per Peter Xu
- changed EFAULT to EAGAIN to retry when a hugepage appears or disappears from under us, per Peter Xu
- added links to previous upstreaming attempts, per David Hildenbrand

Changes since v1 [4]:
- add mmget_not_zero in userfaultfd_remap, per Jann Horn
- removed extern from function definitions, per Matthew Wilcox
- converted to folios in remap_pages_huge_pmd, per Matthew Wilcox
- use PageAnonExclusive in remap_pages_huge_pmd, per David Hildenbrand
- handle pgtable transfers between MMs, per Jann Horn
- ignore concurrent A/D pte bit changes, per Jann Horn
- split functions into smaller units, per David Hildenbrand
- test for folio_test_large in remap_anon_pte, per Matthew Wilcox
- use pte_swp_exclusive for swapcount check, per David Hildenbrand
- eliminated use of mmu_notifier_invalidate_range_start_nonblock, per Jann Horn
- simplified THP alignment checks, per Jann Horn
- refactored the loop inside remap_pages, per Jann Horn
- additional clarifying comments, per Jann Horn

Main changes since Andrea's last version [1]:
- Trivial translations from page to folio, mmap_sem to mmap_lock
- Replace pmd_trans_unstable() with pte_offset_map_nolock() and handle its possible failure
- Move pte mapping into remap_pages_pte to allow for retries when source page or anon_vma is contended. Since pte_offset_map_nolock() starts an RCU read section, we can't block anymore after mapping a pte, so we have to unmap the ptes, do the locking and retry.
- Add and use anon_vma_trylock_write() to avoid blocking while in RCU read section.
- Accommodate changes in mmu_notifier_range_init() API, switch to mmu_notifier_invalidate_range_start_nonblock() to avoid blocking while in RCU read section.
- Open-code now removed __swp_swapcount()
- Replace pmd_read_atomic() with pmdp_get_lockless()
- Add new selftest for UFFDIO_MOVE

[1] https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc…
[2] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redha…
[3] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyj…
[4] https://lore.kernel.org/all/20230914152620.2743033-1-surenb@google.com/
[5] https://lore.kernel.org/all/20230923013148.1390521-1-surenb@google.com/
[6] https://lore.kernel.org/all/1425575884-2574-21-git-send-email-aarcange@redh…
[7] https://lore.kernel.org/all/cover.1547251023.git.blake.caldwell@colorado.ed…
[8] https://lore.kernel.org/all/20231009064230.2952396-1-surenb@google.com/
[9] https://lore.kernel.org/all/20231028003819.652322-1-surenb@google.com/
[10] https://lore.kernel.org/all/20231121171643.3719880-1-surenb@google.com/

Andrea Arcangeli (2):
  mm/rmap: support move to different root anon_vma in folio_move_anon_rmap()
  userfaultfd: UFFDIO_MOVE uABI

Suren Baghdasaryan (3):
  selftests/mm: call uffd_test_ctx_clear at the end of the test
  selftests/mm: add uffd_test_case_ops to allow test case-specific operations
  selftests/mm: add UFFDIO_MOVE ioctl test

 Documentation/admin-guide/mm/userfaultfd.rst |   3 +
 fs/userfaultfd.c                             |  72 +++
 include/linux/rmap.h                         |   5 +
 include/linux/userfaultfd_k.h                |  11 +
 include/uapi/linux/userfaultfd.h             |  29 +-
 mm/huge_memory.c                             | 122 ++++
 mm/khugepaged.c                              |   3 +
 mm/rmap.c                                    |  30 +
 mm/userfaultfd.c                             | 614 +++++++++++++++++++
 tools/testing/selftests/mm/uffd-common.c     |  39 +-
 tools/testing/selftests/mm/uffd-common.h     |   9 +
 tools/testing/selftests/mm/uffd-stress.c     |   5 +-
 tools/testing/selftests/mm/uffd-unit-tests.c | 192 ++++++
 13 files changed, 1130 insertions(+), 4 deletions(-)

--
2.43.0.rc2.451.g8631bc7472-goog
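
The page-recycling use of UFFDIO_MOVE described in this cover letter can be sketched from the userspace side as follows. This is a minimal sketch, not code from the series: the helper names `move_args_ok()` and `uffd_move_range()` are hypothetical, the alignment check is a userspace sanity check assumed from the ioctl's documented EINVAL conditions, and the `#ifdef` fallback exists only so the sketch still builds against kernel headers that predate `UFFDIO_MOVE`.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical userspace sanity check (an assumption of this sketch):
 * src, dst and len must be page-size multiples and len must be non-zero.
 */
static int move_args_ok(uint64_t dst, uint64_t src, uint64_t len,
			uint64_t page_size)
{
	if (page_size == 0 || len == 0)
		return 0;
	return dst % page_size == 0 && src % page_size == 0 &&
	       len % page_size == 0;
}

/*
 * Hypothetical wrapper: asks the kernel to move len bytes of pages from
 * src to dst through an already registered userfaultfd.
 * Returns the number of bytes moved, or a negative errno value.
 */
static int64_t uffd_move_range(int uffd, uint64_t dst, uint64_t src,
			       uint64_t len)
{
#ifdef UFFDIO_MOVE
	struct uffdio_move mv = {
		.dst = dst,
		.src = src,
		.len = len,
		.mode = 0,	/* no DONTWAKE / ALLOW_SRC_HOLES flags */
	};

	if (ioctl(uffd, UFFDIO_MOVE, &mv) == -1)
		return -errno;
	return mv.move;		/* filled in by the kernel */
#else
	/* Headers without UFFDIO_MOVE: report "not supported". */
	(void)uffd; (void)dst; (void)src; (void)len;
	return -ENOSYS;
#endif
}
```

A compacting thread would call `uffd_move_range()` with a source page it wants to recycle and the faulting destination address; no page allocation or memcpy is requested from userspace in that path.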

3 months, 1 week


[PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings

by Lorenzo Stoakes

The guard regions feature was initially implemented to support anonymous mappings only, excluding shmem. This was done so as to introduce the feature carefully and incrementally and to be conservative when considering the various caveats and corner cases that are applicable to file-backed mappings but not to anonymous ones.

Now this feature has landed in 6.13, it is time to revisit this and to extend this functionality to file-backed and shmem mappings.

In order to make this maximally useful, and since one may map file-backed mappings read-only (for instance ELF images), we also remove the restriction on read-only mappings and permit the establishment of guard regions in any non-hugetlb, non-mlock()'d mapping.

It is safe to permit the establishment of guard regions in read-only mappings because the guard regions only reduce access to the mapping, and when removed simply reinstate the existing attributes of the underlying VMA, meaning no access violations can occur.

While the change in kernel code introduced in this series is small, the majority of the effort here is spent in extending the testing to assert that the feature works correctly across numerous file-backed mapping scenarios.

Every single guard region self-test performed against anonymous memory (which is relevant and not anon-only) has now been updated to also be performed against shmem and a mapping of a file in the working directory. This confirms that all cases also function correctly for file-backed guard regions. In addition a number of other tests are added for specific file-backed mapping scenarios.

There are a number of other concerns that one might have with regard to guard regions, addressed below:

Readahead
~~~~~~~~~

Readahead is a process through which the page cache is populated on the assumption that sequential reads will occur, thus amortising I/O and, through a clever use of the PG_readahead folio flag established during major fault and checked upon minor fault, provides for asynchronous I/O to occur as data is processed, reducing I/O stalls as data is faulted in.

Guard regions do not alter this mechanism, which operates at the folio and fault level, but do of course prevent the faulting of folios that would otherwise be mapped.

In the instance of a major fault prior to a guard region, synchronous readahead will occur including populating folios in the page cache which the guard regions will, in the case of the mapping in question, prevent access to.

In addition, if PG_readahead is placed in a folio that is now inaccessible, this will prevent asynchronous readahead from occurring as it would otherwise do.

However, there are mechanisms for heuristically resetting this within readahead regardless, which will 'recover' correct readahead behaviour.

Readahead presumes sequential data access; the presence of a guard region clearly indicates that, at least in the guard region, no such sequential access will occur, as it cannot occur there.

So this should have very little impact on any real workload. The far more important point is whether readahead causes incorrect or inappropriate mapping of ranges disallowed by the presence of guard regions - this is not the case, as readahead does not 'pre-fault' memory in this fashion.

At any rate, any mechanism which would attempt to do so would hit the usual page fault paths, which correctly handle PTE markers as with anonymous mappings.

Fault-Around
~~~~~~~~~~~~

The fault-around logic, in a similar vein to readahead, attempts to improve efficiency with regard to file-backed memory mappings, however it differs in that it does not try to fetch folios into the page cache that are about to be accessed, but rather pre-maps a range of folios around the faulting address.

Guard regions making use of PTE markers makes this relatively trivial, as this case is already handled - see filemap_map_folio_range() and filemap_map_order0_folio() - in both instances, the solution is to simply keep the established page table mappings and let the fault handler take care of PTE markers, as per the comment:

	/*
	 * NOTE: If there're PTE markers, we'll leave them to be
	 * handled in the specific fault path, and it'll prohibit
	 * the fault-around logic.
	 */

This works, as establishing guard regions results in page table mappings with PTE markers, and clearing them removes them.

Truncation
~~~~~~~~~~

File truncation will not eliminate existing guard regions, as the truncation operation will ultimately zap the range via unmap_mapping_range(), which specifically excludes PTE markers.

Zapping
~~~~~~~

Zapping is, as with anonymous mappings, handled by zap_nonpresent_ptes(), which specifically deals with guard entries, leaving them intact except in instances such as process teardown or munmap() where they need to be removed.

Reclaim
~~~~~~~

When reclaim is performed on file-backed folios, it ultimately invokes try_to_unmap_one() via the rmap. If the folio is non-large, then map_pte() will ultimately abort the operation for the guard region mapping. If large, then check_pte() will determine that this is a non-device private entry/device-exclusive entry 'swap' PTE and thus abort the operation in that instance.

Therefore, no odd things happen in the instance of reclaim being attempted upon a file-backed guard region.

Hole Punching
~~~~~~~~~~~~~

This updates the page cache and ultimately invokes unmap_mapping_range(), which explicitly leaves PTE markers in place.

Because the establishment of guard regions zapped any existing mappings to file-backed folios, once the guard regions are removed then the hole-punched region will be faulted in as usual and everything will behave as expected.

Lorenzo Stoakes (4):
  mm: allow guard regions in file-backed and read-only mappings
  selftests/mm: rename guard-pages to guard-regions
  tools/selftests: expand all guard region tests to file-backed
  tools/selftests: add file/shmem-backed mapping guard region tests

 mm/madvise.c                                  |   8 +-
 tools/testing/selftests/mm/.gitignore         |   2 +-
 tools/testing/selftests/mm/Makefile           |   2 +-
 .../mm/{guard-pages.c => guard-regions.c}     | 921 ++++++++++++++++--
 4 files changed, 821 insertions(+), 112 deletions(-)
 rename tools/testing/selftests/mm/{guard-pages.c => guard-regions.c} (58%)

--
2.48.1
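
What this series permits, a guard region inside a read-only file-backed mapping, can be sketched from userspace like this. It is a minimal sketch under stated assumptions: the function name `guard_file_mapping_demo()` is hypothetical, the fallback values 102 and 103 for `MADV_GUARD_INSTALL` and `MADV_GUARD_REMOVE` are taken from the uapi headers as of 6.13 and only used when the libc headers lack them, and on kernels without this series the install call is expected to fail with EINVAL.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumed uapi values, see lead-in). */
#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102
#endif
#ifndef MADV_GUARD_REMOVE
#define MADV_GUARD_REMOVE 103
#endif

/*
 * Hypothetical demo: map a 4-page temporary file read-only, install a
 * guard region over the second page, then remove it again.
 * Returns 0 on success or a negative errno value.
 */
static int guard_file_mapping_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	char *p;
	int ret = 0;

	if (!f)
		return -errno;
	if (ftruncate(fileno(f), 4 * page) != 0) {
		ret = -errno;
		fclose(f);
		return ret;
	}

	p = mmap(NULL, 4 * page, PROT_READ, MAP_PRIVATE, fileno(f), 0);
	if (p == MAP_FAILED) {
		ret = -errno;
		fclose(f);
		return ret;
	}

	/* Accessing p[page] .. p[2 * page - 1] would now raise SIGSEGV. */
	if (madvise(p + page, page, MADV_GUARD_INSTALL) != 0)
		ret = -errno;
	/* Removal reinstates the ordinary read-only file mapping. */
	else if (madvise(p + page, page, MADV_GUARD_REMOVE) != 0)
		ret = -errno;

	munmap(p, 4 * page);
	fclose(f);
	return ret;
}
```

The sketch deliberately never touches the guarded page, so it reports only whether the kernel accepted the request.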

3 months, 1 week


[PATCH v2 1/2] time/timekeeping: Fix possible inconsistencies in _COARSE clockids

by John Stultz

Lei Chen raised an issue with CLOCK_MONOTONIC_COARSE seeing time inconsistencies. Lei tracked down that this was being caused by the adjustment

  tk->tkr_mono.xtime_nsec -= offset;

which is made to compensate for the unaccumulated cycles in offset when the mult value is adjusted forward, so that the non-_COARSE clockids don't see inconsistencies.

However, the _COARSE clockids don't use the mult*offset value in their calculations, so this subtraction can cause the _COARSE clock ids to jump back a bit.

Now, by design, this negative adjustment should be fine, because the logic run from timekeeping_adjust() is done after we accumulate approx mult*interval_cycles into xtime_nsec. The accumulated (mult*interval_cycles) will be larger than the (mult_adj*offset) value subtracted from xtime_nsec, and both operations are done together under the tk_core.lock, so the net change to xtime_nsec should always be positive.

However, do_adjtimex() calls into timekeeping_advance() as well, since we want to apply the ntp freq adjustment immediately. In this case, we don't return early when the offset is smaller than interval_cycles, so we don't end up accumulating any time into xtime_nsec. But we do go on to call timekeeping_adjust(), which modifies the mult value, and subtracts from xtime_nsec to correct for the new mult value.

Here because we did not accumulate anything, we have a window where the _COARSE clockids that don't utilize the mult*offset value, can see an inconsistency.

So to fix this, rework the timekeeping_advance() logic a bit so that when we are called from do_adjtimex(), we call timekeeping_forward(), to first accumulate the sub-interval time into xtime_nsec. Then with no unaccumulated cycles in offset, we can do the mult adjustment without worry of the subtraction having an impact.

Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: Anna-Maria Behnsen <anna-maria(a)linutronix.de>
Cc: Frederic Weisbecker <frederic(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Miroslav Lichvar <mlichvar(a)redhat.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: kernel-team(a)android.com
Cc: Lei Chen <lei.chen(a)smartx.com>
Fixes: da15cfdae033 ("time: Introduce CLOCK_REALTIME_COARSE")
Reported-by: Lei Chen <lei.chen(a)smartx.com>
Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
Diagnosed-by: Thomas Gleixner <tglx(a)linutronix.de>
Additional-fixes-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: John Stultz <jstultz(a)google.com>
---
v2: Include fixes from Thomas, dropping the unnecessary clock_set
    setting, and instead clearing ntp_error, along with some other
    minor tweaks.
---
 kernel/time/timekeeping.c | 94 ++++++++++++++++++++++++++++-----------
 1 file changed, 69 insertions(+), 25 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 1e67d076f1955..929846b8b45ab 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -682,20 +682,19 @@ static void timekeeping_update_from_shadow(struct tk_data *tkd, unsigned int act
 }
 
 /**
- * timekeeping_forward_now - update clock to the current time
+ * timekeeping_forward - update clock to given cycle now value
  * @tk:		Pointer to the timekeeper to update
+ * @cycle_now:  Current clocksource read value
  *
  * Forward the current clock to update its state since the last call to
  * update_wall_time(). This is useful before significant clock changes,
  * as it avoids having to deal with this time offset explicitly.
  */
-static void timekeeping_forward_now(struct timekeeper *tk)
+static void timekeeping_forward(struct timekeeper *tk, u64 cycle_now)
 {
-	u64 cycle_now, delta;
+	u64 delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+				      tk->tkr_mono.clock->max_raw_delta);
 
-	cycle_now = tk_clock_read(&tk->tkr_mono);
-	delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
-				  tk->tkr_mono.clock->max_raw_delta);
 	tk->tkr_mono.cycle_last = cycle_now;
 	tk->tkr_raw.cycle_last = cycle_now;
 
@@ -710,6 +709,21 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 	}
 }
 
+/**
+ * timekeeping_forward_now - update clock to the current time
+ * @tk:		Pointer to the timekeeper to update
+ *
+ * Forward the current clock to update its state since the last call to
+ * update_wall_time(). This is useful before significant clock changes,
+ * as it avoids having to deal with this time offset explicitly.
+ */
+static void timekeeping_forward_now(struct timekeeper *tk)
+{
+	u64 cycle_now = tk_clock_read(&tk->tkr_mono);
+
+	timekeeping_forward(tk, cycle_now);
+}
+
 /**
  * ktime_get_real_ts64 - Returns the time of day in a timespec64.
  * @ts:		pointer to the timespec to be set
@@ -2151,6 +2165,54 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
 	return offset;
 }
 
+static u64 timekeeping_accumulate(struct timekeeper *tk, u64 offset,
+				  enum timekeeping_adv_mode mode,
+				  unsigned int *clock_set)
+{
+	int shift = 0, maxshift;
+
+	/*
+	 * TK_ADV_FREQ indicates that adjtimex(2) directly set the
+	 * frequency or the tick length.
+	 *
+	 * Accumulate the offset, so that the new multiplier starts from
+	 * now. This is required as otherwise for offsets, which are
+	 * smaller than tk::cycle_interval, timekeeping_adjust() could set
+	 * xtime_nsec backwards, which subsequently causes time going
+	 * backwards in the coarse time getters. But even for the case
+	 * where offset is greater than tk::cycle_interval the periodic
+	 * accumulation does not have much value.
+	 *
+	 * Also reset tk::ntp_error as it does not make sense to keep the
+	 * old accumulated error around in this case.
+	 */
+	if (mode == TK_ADV_FREQ) {
+		timekeeping_forward(tk, tk->tkr_mono.cycle_last + offset);
+		tk->ntp_error = 0;
+		return 0;
+	}
+
+	/*
+	 * With NO_HZ we may have to accumulate many cycle_intervals
+	 * (think "ticks") worth of time at once. To do this efficiently,
+	 * we calculate the largest doubling multiple of cycle_intervals
+	 * that is smaller than the offset. We then accumulate that
+	 * chunk in one go, and then try to consume the next smaller
+	 * doubled multiple.
+	 */
+	shift = ilog2(offset) - ilog2(tk->cycle_interval);
+	shift = max(0, shift);
+	/* Bound shift to one less than what overflows tick_length */
+	maxshift = (64 - (ilog2(ntp_tick_length()) + 1)) - 1;
+	shift = min(shift, maxshift);
+	while (offset >= tk->cycle_interval) {
+		offset = logarithmic_accumulation(tk, offset, shift, clock_set);
+		if (offset < tk->cycle_interval << shift)
+			shift--;
+	}
+	return offset;
+}
+
 /*
  * timekeeping_advance - Updates the timekeeper to the current time and
  * current NTP tick length
@@ -2160,7 +2222,6 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
 	struct timekeeper *tk = &tk_core.shadow_timekeeper;
 	struct timekeeper *real_tk = &tk_core.timekeeper;
 	unsigned int clock_set = 0;
-	int shift = 0, maxshift;
 	u64 offset;
 
 	guard(raw_spinlock_irqsave)(&tk_core.lock);
@@ -2177,24 +2238,7 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
 	if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
 		return false;
 
-	/*
-	 * With NO_HZ we may have to accumulate many cycle_intervals
-	 * (think "ticks") worth of time at once. To do this efficiently,
-	 * we calculate the largest doubling multiple of cycle_intervals
-	 * that is smaller than the offset. We then accumulate that
-	 * chunk in one go, and then try to consume the next smaller
-	 * doubled multiple.
-	 */
-	shift = ilog2(offset) - ilog2(tk->cycle_interval);
-	shift = max(0, shift);
-	/* Bound shift to one less than what overflows tick_length */
-	maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
-	shift = min(shift, maxshift);
-	while (offset >= tk->cycle_interval) {
-		offset = logarithmic_accumulation(tk, offset, shift, &clock_set);
-		if (offset < tk->cycle_interval<<shift)
-			shift--;
-	}
+	offset = timekeeping_accumulate(tk, offset, mode, &clock_set);
 
 	/* Adjust the multiplier to correct NTP error */
 	timekeeping_adjust(tk, offset);
--
2.49.0.395.g12beb8f557-goog
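
The inconsistency described in this patch can be reproduced with a toy model of the arithmetic. This is an illustrative sketch only, not kernel code: `struct toy_tk` and the `toy_*()` helpers are hypothetical stand-ins that keep just the `xtime_nsec`, `mult` and `shift` fields and ignore locking, NTP state and the raw clock.

```c
#include <stdint.h>

/* Toy stand-in for the timekeeper fields involved (hypothetical). */
struct toy_tk {
	uint64_t xtime_nsec;	/* accumulated time, in ns << shift */
	uint32_t mult;		/* cycles-to-shifted-ns multiplier */
	uint32_t shift;
};

/* _COARSE style read: accumulated time only, no mult * offset term. */
static uint64_t toy_coarse_ns(const struct toy_tk *tk)
{
	return tk->xtime_nsec >> tk->shift;
}

/* Fine-grained read: also accounts for the unaccumulated cycles. */
static uint64_t toy_fine_ns(const struct toy_tk *tk, uint64_t offset)
{
	return (tk->xtime_nsec + offset * tk->mult) >> tk->shift;
}

/*
 * Multiplier adjustment: raise mult and subtract mult_adj * offset so
 * that the fine-grained clock does not jump.
 */
static void toy_adjust_mult(struct toy_tk *tk, uint32_t mult_adj,
			    uint64_t offset)
{
	tk->mult += mult_adj;
	tk->xtime_nsec -= (uint64_t)mult_adj * offset;
}

/*
 * The fix in miniature: fold the pending cycles into xtime_nsec first.
 * Returns the remaining unaccumulated offset, which is always 0.
 */
static uint64_t toy_forward(struct toy_tk *tk, uint64_t offset)
{
	tk->xtime_nsec += offset * tk->mult;
	return 0;
}
```

With `xtime_nsec = 1000 << 8`, `mult = 256`, `shift = 8` and 100 pending cycles, adjusting `mult` by 4 without forwarding leaves the fine-grained read at 1100 ns but drops the coarse read from 1000 ns to 998 ns. Calling `toy_forward()` first makes the subtraction a no-op and the coarse read advances to 1100 ns instead.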

3 months, 2 weeks


selftests: cgroup: Failures – Timeouts & OOM Issues Analysis

by Naresh Kamboju

As part of LKFT’s re-validation of known issues, we have observed that the selftests: cgroup suite is consistently failing across almost all LKFT-supported devices due to:
- Test timeouts (45 seconds limit reached)
- OOM-killer invocation

## Key Questions for Discussion:
- Would it be beneficial to increase the test timeout to ~180 seconds to allow sufficient execution time?
- Should we enhance logging to explicitly print failure reasons when a test fails?
- Are there any missing dependencies that could be causing these failures?

Note: The required selftests/cgroup/config options were included in LKFT's build and test plans.

## Devices Affected:
The following DUTs consistently experience these failures:
- dragonboard-410c (arm64)
- dragonboard-845c (arm64)
- e850-96 (arm64)
- juno-r2 (arm64)
- qemu-arm64 (arm64)
- qemu-armv7
- qemu-x86_64
- rk3399-rock-pi-4b (arm64)
- x15 (arm)
- x86_64

Regression Analysis:
- New regression? No (these failures have been observed for months/years).
- Reproducibility? Yes, the failures occur consistently.
- Test suite affected? selftests: cgroup (timeouts and OOM-related failures).

Test regression: selftests cgroup fails timeout and oom-killer

Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>

## Test log:
# selftests: cgroup: test_cpu
# ok 1 test_cpucg_subtree_control
# ok 2 test_cpucg_stats
# ok 3 test_cpucg_nice
# not ok 4 test_cpucg_weight_overprovisioned
# ok 5 test_cpucg_weight_underprovisioned
# ok 6 test_cpucg_nested_weight_overprovisioned
# ok 7 test_cpucg_nested_weight_underprovisioned
# not ok 2 selftests: cgroup: test_cpu # TIMEOUT 45 seconds
<trim>
# selftests: cgroup: test_freezer
# ok 1 test_cgfreezer_simple
# ok 2 test_cgfreezer_tree
# ok 3 test_cgfreezer_forkbomb
# ok 4 test_cgfreezer_mkdir
# ok 5 test_cgfreezer_rmdir
# ok 6 test_cgfreezer_migrate
# Cgroup /sys/fs/cgroup/cg_test_ptrace isn't frozen
# not ok 7 test_cgfreezer_ptrace
# ok 8 test_cgfreezer_stopped
# ok 9 test_cgfreezer_ptraced
# ok 10 test_cgfreezer_vfork
not ok 4 selftests: cgroup: test_freezer # exit=1
<trim>
selftests: cgroup: test_kmem
# not ok 7 selftests: cgroup: test_kmem # TIMEOUT 45 seconds
<trim>
# selftests: cgroup: test_memcontrol
# ok 1 test_memcg_subtree_control
# not ok 2 test_memcg_current_peak
# not ok 3 test_memcg_min
# not ok 4 test_memcg_low
# not ok 5 test_memcg_high
# ok 6 test_memcg_high_sync
[ 270.699078] test_memcontrol invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[ 270.699921] CPU: 1 UID: 0 PID: 946 Comm: test_memcontrol Not tainted 6.14.0-rc5-next-20250303 #1
[ 270.699930] Hardware name: Radxa ROCK Pi 4B (DT)
<trim>
[ 270.729527] Memory cgroup out of memory: Killed process 946 (test_memcontrol) total-vm:104840kB, anon-rss:30596kB, file-rss:1056kB, shmem-rss:0kB, UID:0 pgtables:104kB oom_score_adj:0
# not ok 7 test_memcg_max
# not ok 8 test_memcg_reclaim
<trim>
not ok 8 selftests: cgroup: test_memcontrol # exit=1

## Source
* Kernel version: 6.14.0-rc5-next-20250303
* Git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
* Git sha: cd3215bbcb9d4321def93fea6cfad4d5b42b9d1d
* Git describe: 6.14.0-rc5-next-20250303
* Project details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250303/

## Test data
* Test log: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250303/te…
* Test history: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250303/te…
* Test details: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250303/te…
* Test logs rock pi: https://lkft.validation.linaro.org/scheduler/job/8148789#L1774
* Test logs x86: https://lkft.validation.linaro.org/scheduler/job/8148731#L1948

--
Linaro LKFT
https://lkft.linaro.org

3 months, 2 weeks


[PATCH] kunit: cs_dsp: Depend on FW_CS_DSP rather then enabling it

by Nico Pache

FW_CS_DSP gets enabled if KUNIT is enabled. The test should instead depend on whether the feature is enabled. Fix this by moving FW_CS_DSP to the depends on clause, and set CONFIG_FW_CS_DSP=y in the kunit tooling.

Fixes: dd0b6b1f29b9 ("firmware: cs_dsp: Add KUnit testing of bin file download")
Signed-off-by: Nico Pache <npache(a)redhat.com>
---
 drivers/firmware/cirrus/Kconfig              | 3 +--
 tools/testing/kunit/configs/all_tests.config | 2 ++
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/cirrus/Kconfig b/drivers/firmware/cirrus/Kconfig
index 0a883091259a..989568ab5712 100644
--- a/drivers/firmware/cirrus/Kconfig
+++ b/drivers/firmware/cirrus/Kconfig
@@ -11,9 +11,8 @@ config FW_CS_DSP_KUNIT_TEST_UTILS
 
 config FW_CS_DSP_KUNIT_TEST
 	tristate "KUnit tests for Cirrus Logic cs_dsp" if !KUNIT_ALL_TESTS
-	depends on KUNIT && REGMAP
+	depends on KUNIT && REGMAP && FW_CS_DSP
 	default KUNIT_ALL_TESTS
-	select FW_CS_DSP
 	select FW_CS_DSP_KUNIT_TEST_UTILS
 	help
 	  This builds KUnit tests for cs_dsp.
diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index b0049be00c70..96c6b4aca87d 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -49,3 +49,5 @@ CONFIG_SOUND=y
 CONFIG_SND=y
 CONFIG_SND_SOC=y
 CONFIG_SND_SOC_TOPOLOGY_BUILD=y
+
+CONFIG_FW_CS_DSP=y
\ No newline at end of file
--
2.48.1

3 months, 3 weeks


[PATCH 0/4] sysctl: Move the u8 range check test to lib/test_sysctl.c

by Joel Granados

Originally introduced to sysctl-test.c by commit b5ffbd139688 ("sysctl: move the extra1/2 boundary check of u8 to sysctl_check_table_array"), it has been shown to lead to a panic under certain conditions related to a dangling registration.

This series moves the u8 test to lib/test_sysctl.c where the registration calls are kept and correctly removed on module exit. An additional 0012 test is added to selftests/sysctl/sysctl.sh in order to visualize the registration calls done in test_sysctl.c.

Very much related to adding tests to sysctl, the last two patches of this series reduce the places that need to be changed when tests are added by managing the initialization and closing of sysctl tables with a for loop.

Comments are greatly appreciated.

Signed-off-by: Joel Granados <joel.granados(a)kernel.org>
---
Joel Granados (4):
  sysctl: move u8 register test to lib/test_sysctl.c
  sysctl: Add 0012 to test the u8 range check
  sysctl: call sysctl tests with a for loop
  sysctl: Close test ctl_headers with a for loop

 kernel/sysctl-test.c                     |  49 ------------
 lib/test_sysctl.c                        | 133 +++++++++++++++++++++----------
 tools/testing/selftests/sysctl/sysctl.sh |  30 +++++++
 3 files changed, 122 insertions(+), 90 deletions(-)
---
base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
change-id: 20250321-jag-test_extra_val-40954050a1f6

Best regards,
--
Joel Granados <joel.granados(a)kernel.org>

3 months, 3 weeks


[PATCH bpf-next v2 0/6] selftests/bpf: Various sockmap-related fixes

by Michal Luczaj

This series takes care of a few bugs and missing features with the aim of improving the test coverage of sockmap/sockhash.

The last patch is a create_pair() rewrite making use of __attribute__((cleanup)) to handle socket fd lifetime.

Signed-off-by: Michal Luczaj <mhal(a)rbox.co>
---
Changes in v2:
- Rebase on bpf-next (Jakub)
- Use cleanup helpers from kernel's cleanup.h (Jakub)
- Fix subject of patch 3, rephrase patch 4, use correct prefix
- Link to v1: https://lore.kernel.org/r/20240724-sockmap-selftest-fixes-v1-0-46165d224712…

Changes in v1:
- No declarations in function body (Jakub)
- Don't touch output arguments until function succeeds (Jakub)
- Link to v0: https://lore.kernel.org/netdev/027fdb41-ee11-4be0-a493-22f28a1abd7c@rbox.co/

---
Michal Luczaj (6):
  selftests/bpf: Support more socket types in create_pair()
  selftests/bpf: Socket pair creation, cleanups
  selftests/bpf: Simplify inet_socketpair() and vsock_socketpair_connectible()
  selftests/bpf: Honour the sotype of af_unix redir tests
  selftests/bpf: Exercise SOCK_STREAM unix_inet_redir_to_connected()
  selftests/bpf: Introduce __attribute__((cleanup)) in create_pair()

 .../selftests/bpf/prog_tests/sockmap_basic.c       |  28 ++--
 .../selftests/bpf/prog_tests/sockmap_helpers.h     | 149 ++++++++++++++-------
 .../selftests/bpf/prog_tests/sockmap_listen.c      | 117 ++--------------
 3 files changed, 124 insertions(+), 170 deletions(-)
---
base-commit: 92cc2456e9775dc4333fb4aa430763ae4ac2f2d9
change-id: 20240729-selftest-sockmap-fixes-bcca996e143b

Best regards,
--
Michal Luczaj <mhal(a)rbox.co>
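
The fd-lifetime technique this series mentions, `__attribute__((cleanup))`, can be sketched in isolation as below. This is a minimal userspace sketch, not the selftest code: `close_fd_ptr()`, the `auto_close` macro, `make_unix_pair()` and `fd_closed_after_scope()` are hypothetical names, and the attribute is a GCC/Clang extension rather than standard C.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Cleanup handler: runs when the annotated variable leaves scope. */
static void close_fd_ptr(int *fd)
{
	if (*fd >= 0)
		close(*fd);
}

/* Hypothetical convenience macro for this sketch. */
#define auto_close __attribute__((cleanup(close_fd_ptr)))

/*
 * Creates an AF_UNIX socket pair. Output arguments are written only on
 * success; on any early return both sockets are closed automatically.
 */
static int make_unix_pair(int sotype, int *out0, int *out1)
{
	int sv[2];

	if (socketpair(AF_UNIX, sotype, 0, sv) != 0)
		return -1;

	auto_close int s0 = sv[0];
	auto_close int s1 = sv[1];

	if (fcntl(s0, F_SETFD, FD_CLOEXEC) != 0 ||
	    fcntl(s1, F_SETFD, FD_CLOEXEC) != 0)
		return -1;	/* s0 and s1 are closed here */

	*out0 = s0;
	*out1 = s1;
	s0 = s1 = -1;		/* disarm: ownership moves to the caller */
	return 0;
}

/* Returns 1 if an fd opened in an inner scope is closed on scope exit. */
static int fd_closed_after_scope(void)
{
	int number;

	{
		auto_close int fd = open("/dev/null", O_RDONLY);

		if (fd < 0)
			return -1;
		number = fd;
	}
	return fcntl(number, F_GETFD) == -1;
}
```

Resetting the variables to -1 before returning is what hands the descriptors to the caller; without that step the handler would close them on the success path as well.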

3 months, 3 weeks


[PATCH bpf-next v2 0/2] bpf: fix ktls panic with sockmap and add tests

by Jiayuan Chen

We can reproduce the issue using the existing test program:
'./test_sockmap --ktls'
Or use the selftest I provided, which will cause a panic:

------------[ cut here ]------------
kernel BUG at lib/iov_iter.c:629!
PKRU: 55555554
Call Trace:
 <TASK>
 ? die+0x36/0x90
 ? do_trap+0xdd/0x100
 ? iov_iter_revert+0x178/0x180
 ? iov_iter_revert+0x178/0x180
 ? do_error_trap+0x7d/0x110
 ? iov_iter_revert+0x178/0x180
 ? exc_invalid_op+0x50/0x70
 ? iov_iter_revert+0x178/0x180
 ? asm_exc_invalid_op+0x1a/0x20
 ? iov_iter_revert+0x178/0x180
 ? iov_iter_revert+0x5c/0x180
 tls_sw_sendmsg_locked.isra.0+0x794/0x840
 tls_sw_sendmsg+0x52/0x80
 ? inet_sendmsg+0x1f/0x70
 __sys_sendto+0x1cd/0x200
 ? find_held_lock+0x2b/0x80
 ? syscall_trace_enter+0x140/0x270
 ? __lock_release.isra.0+0x5e/0x170
 ? find_held_lock+0x2b/0x80
 ? syscall_trace_enter+0x140/0x270
 ? lockdep_hardirqs_on_prepare+0xda/0x190
 ? ktime_get_coarse_real_ts64+0xc2/0xd0
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x90/0x170

1. It looks like the issue started occurring after bpf was introduced to ktls; the later addition of assertions to iov_iter then turned it into a panic. If my Fixes tag is incorrect, please help me correct it.
2. I make minimal changes for now; they are enough to make ktls work correctly.

---
v1->v2: Added more content to the commit message
https://lore.kernel.org/all/20250123171552.57345-1-mrpre@163.com/#r
---

Jiayuan Chen (2):
  bpf: fix ktls panic with sockmap
  selftests/bpf: add ktls selftest

 net/tls/tls_sw.c                              |   8 +-
 .../selftests/bpf/prog_tests/sockmap_ktls.c   | 174 +++++++++++++++++-
 .../selftests/bpf/progs/test_sockmap_ktls.c   |  26 +++
 3 files changed, 205 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_ktls.c

--
2.47.1


[PATCH] selftests/bpf: close the file descriptor to avoid resource leaks

by Malaya Kumar Rout

Static analysis of bench_htab_mem.c and sk_assign.c with cppcheck reports:

tools/testing/selftests/bpf/benchs/bench_htab_mem.c:284:3: error: Resource leak: fd [resourceLeak]
tools/testing/selftests/bpf/prog_tests/sk_assign.c:41:3: error: Resource leak: tc [resourceLeak]

Fix the issues by closing the file descriptor (fd) and the stream (tc) when the read() and fgets() calls fail.

Signed-off-by: Malaya Kumar Rout <malayarout91(a)gmail.com>
---
 tools/testing/selftests/bpf/benchs/bench_htab_mem.c | 1 +
 tools/testing/selftests/bpf/prog_tests/sk_assign.c  | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
index 926ee822143e..59746fd2c23a 100644
--- a/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
+++ b/tools/testing/selftests/bpf/benchs/bench_htab_mem.c
@@ -281,6 +281,7 @@ static void htab_mem_read_mem_cgrp_file(const char *name, unsigned long *value)
 	got = read(fd, buf, sizeof(buf) - 1);
 	if (got <= 0) {
 		*value = 0;
+		close(fd);
 		return;
 	}
 	buf[got] = 0;
diff --git a/tools/testing/selftests/bpf/prog_tests/sk_assign.c b/tools/testing/selftests/bpf/prog_tests/sk_assign.c
index 0b9bd1d6f7cc..10a0ab954b8a 100644
--- a/tools/testing/selftests/bpf/prog_tests/sk_assign.c
+++ b/tools/testing/selftests/bpf/prog_tests/sk_assign.c
@@ -37,8 +37,10 @@ configure_stack(void)
 	tc = popen("tc -V", "r");
 	if (CHECK_FAIL(!tc))
 		return false;
-	if (CHECK_FAIL(!fgets(tc_version, sizeof(tc_version), tc)))
+	if (CHECK_FAIL(!fgets(tc_version, sizeof(tc_version), tc))) {
+		pclose(tc);
 		return false;
+	}
 	if (strstr(tc_version, ", libbpf "))
 		prog = "test_sk_assign_libbpf.bpf.o";
 	else
--
2.43.0
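The patch above adds a close() on each failing path. An alternative shape for the same fix, shown here only as a sketch and not as what the patch does, is a single exit label so that every path after a successful open() releases the descriptor. The function name read_ulong_file() and its return convention are assumptions made for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical reader: parses an unsigned decimal number from a file. */
static int read_ulong_file(const char *path, unsigned long *value)
{
	char buf[32];
	ssize_t got;
	int ret = -1;
	int fd;

	assert(path && value);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;	/* nothing to release yet */

	got = read(fd, buf, sizeof(buf) - 1);
	if (got <= 0)
		goto out;	/* the kind of early exit that used to leak fd */

	buf[got] = '\0';
	*value = strtoul(buf, NULL, 10);
	ret = 0;
out:
	close(fd);		/* single release point for every path */
	return ret;
}
```

With one release point, adding a new failure check later cannot reintroduce the leak, at the cost of a goto that some code bases prefer to avoid in short functions.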


[PATCH] selftests/x86/lam: fix memory leak and resource leak in lam.c

by Malaya Kumar Rout

Static analysis of lam.c with cppcheck reports:

tools/testing/selftests/x86/lam.c:585:3: error: Resource leak: file_fd [resourceLeak]
tools/testing/selftests/x86/lam.c:593:3: error: Resource leak: file_fd [resourceLeak]
tools/testing/selftests/x86/lam.c:600:3: error: Memory leak: fi [memleak]
tools/testing/selftests/x86/lam.c:1066:2: error: Resource leak: fd [resourceLeak]

Fix the issues by closing the file descriptors and releasing the allocated memory.

Signed-off-by: Malaya Kumar Rout <malayarout91(a)gmail.com>
---
 tools/testing/selftests/x86/lam.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 4d4a76532dc9..0b43b83ad142 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -581,24 +581,28 @@ int do_uring(unsigned long lam)
 	if (file_fd < 0)
 		return 1;
 
-	if (fstat(file_fd, &st) < 0)
+	if (fstat(file_fd, &st) < 0) {
+		close(file_fd);
 		return 1;
-
+	}
 	off_t file_sz = st.st_size;
 
 	int blocks = (int)(file_sz + URING_BLOCK_SZ - 1) / URING_BLOCK_SZ;
 
 	fi = malloc(sizeof(*fi) + sizeof(struct iovec) * blocks);
-	if (!fi)
+	if (!fi) {
+		close(file_fd);
 		return 1;
-
+	}
 	fi->file_sz = file_sz;
 	fi->file_fd = file_fd;
 
 	ring = malloc(sizeof(*ring));
-	if (!ring)
+	if (!ring) {
+		close(file_fd);
+		free(fi);
 		return 1;
-
+	}
 	memset(ring, 0, sizeof(struct io_ring));
 
 	if (setup_io_uring(ring))
@@ -1060,8 +1064,10 @@ void *allocate_dsa_pasid(void)
 
 	wq = mmap(NULL, 0x1000, PROT_WRITE,
 		  MAP_SHARED | MAP_POPULATE, fd, 0);
-	if (wq == MAP_FAILED)
+	if (wq == MAP_FAILED) {
+		close(fd);
 		perror("mmap");
+	}
 
 	return wq;
 }
--
2.43.0
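One detail in the last hunk may deserve a second look: close() runs before perror("mmap"), and a library call is generally allowed to change errno even when it succeeds, so a cautious variant saves errno around the close(). The sketch below illustrates that pattern only; map_file() is a hypothetical helper, not a function from lam.c, and it closes the descriptor on success as well because a mapping remains valid after its fd is closed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps the first len bytes of a file read-only. */
static void *map_file(const char *path, size_t len)
{
	void *p;
	int saved;
	int fd;

	assert(path);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return MAP_FAILED;

	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

	/* Keep mmap()'s errno intact for the caller's perror(). */
	saved = errno;
	close(fd);	/* the mapping, if any, outlives the descriptor */
	errno = saved;

	return p;
}
```

A caller can then test the result against MAP_FAILED and call perror("mmap") knowing that errno still describes the mmap() failure rather than anything close() may have done.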

