- Linux-stable-mirror - lists.linaro.org

[merged mm-hotfixes-stable] mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
has been removed from the -mm tree.  Its filename was
     mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock.patch

This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Breno Leitao <leitao(a)debian.org>
Subject: mm/kmemleak: avoid deadlock by moving pr_warn() outside kmemleak_lock
Date: Thu, 31 Jul 2025 02:57:18 -0700

When netpoll is enabled, calling pr_warn_once() while holding
kmemleak_lock in mem_pool_alloc() can cause a deadlock due to lock
inversion with the netconsole subsystem.  This occurs because
pr_warn_once() may trigger netpoll, which eventually leads to
__alloc_skb() and back into kmemleak code, attempting to reacquire
kmemleak_lock.

This is the path for the deadlock.

mem_pool_alloc()
  -> raw_spin_lock_irqsave(&kmemleak_lock, flags);
      -> pr_warn_once()
          -> netconsole subsystem
             -> netpoll
                 -> __alloc_skb
                   -> __create_object
                     -> raw_spin_lock_irqsave(&kmemleak_lock, flags);

Fix this by setting a flag and issuing the pr_warn_once() after
kmemleak_lock is released.

Link: https://lkml.kernel.org/r/20250731-kmemleak_lock-v1-1-728fd470198f@debian.o…
Fixes: c5665868183f ("mm: kmemleak: use the memory pool for early allocations")
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/kmemleak.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/mm/kmemleak.c~mm-kmemleak-avoid-deadlock-by-moving-pr_warn-outside-kmemleak_lock
+++ a/mm/kmemleak.c
@@ -470,6 +470,7 @@ static struct kmemleak_object *mem_pool_
 {
 	unsigned long flags;
 	struct kmemleak_object *object;
+	bool warn = false;
 
 	/* try the slab allocator first */
 	if (object_cache) {
@@ -488,8 +489,10 @@ static struct kmemleak_object *mem_pool_
 	else if (mem_pool_free_count)
 		object = &mem_pool[--mem_pool_free_count];
 	else
-		pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
+		warn = true;
 	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
+	if (warn)
+		pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
 
 	return object;
 }
_

Patches currently in -mm which might be from leitao(a)debian.org are
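
For readers who want to experiment with the pattern outside the kernel, the fix above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `pool_lock_held` is a hypothetical boolean standing in for kmemleak_lock, and `warn_pool_empty()` is a hypothetical stand-in for pr_warn_once() that counts every call instead of printing once. None of these names come from mm/kmemleak.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 2

static int pool[POOL_SIZE];
static size_t pool_free_count = POOL_SIZE;
static bool pool_lock_held;	/* hypothetical stand-in for kmemleak_lock */
static int warnings_emitted;

/*
 * Hypothetical stand-in for pr_warn_once(). The assertion encodes the rule
 * the patch enforces: the warning path may re-enter the allocator, so it
 * must never run while the lock is held.
 */
void warn_pool_empty(void)
{
	assert(!pool_lock_held);
	warnings_emitted++;
}

int *pool_alloc(void)
{
	int *object = NULL;
	bool warn = false;

	pool_lock_held = true;		/* "lock" */
	if (pool_free_count)
		object = &pool[--pool_free_count];
	else
		warn = true;		/* only record the condition here */
	pool_lock_held = false;		/* "unlock" */

	if (warn)			/* report after the lock is dropped */
		warn_pool_empty();

	return object;
}
```

Calling `pool_alloc()` a third time exhausts the two-entry pool and triggers the deferred warning. Had the warning been issued inside the locked region, the assertion in `warn_pool_empty()` would fire, which is the userspace analogue of the lock inversion described in the commit message.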

3 weeks, 2 days

[merged mm-hotfixes-stable] kasan-test-fix-protection-against-compiler-elision.patch removed from -mm tree

by Andrew Morton

The quilt patch titled
     Subject: kasan/test: fix protection against compiler elision
has been removed from the -mm tree.  Its filename was
     kasan-test-fix-protection-against-compiler-elision.patch

This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: kasan/test: fix protection against compiler elision
Date: Mon, 28 Jul 2025 22:11:54 +0200

The kunit test is using assignments to
"static volatile void *kasan_ptr_result" to prevent elision of memory
loads, but that's not working: In this variable definition, the
"volatile" applies to the "void", not to the pointer.  To make "volatile"
apply to the pointer as intended, it must follow after the "*".

This makes the kasan_memchr test pass again on my system.  The
kasan_strings test is still failing because all the definitions of
load_unaligned_zeropad() are lacking explicit instrumentation hooks and
ASAN does not instrument asm() memory operands.

Link: https://lkml.kernel.org/r/20250728-kasan-kunit-fix-volatile-v1-1-e7157c9af8…
Fixes: 5f1c8108e7ad ("mm:kasan: fix sparse warnings: Should it be static?")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Nihar Chaithanya <niharchaithanya(a)gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/kasan/kasan_test_c.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/kasan/kasan_test_c.c~kasan-test-fix-protection-against-compiler-elision
+++ a/mm/kasan/kasan_test_c.c
@@ -47,7 +47,7 @@ static struct {
  * Some tests use these global variables to store return values from function
  * calls that could otherwise be eliminated by the compiler as dead code.
  */
-static volatile void *kasan_ptr_result;
+static void *volatile kasan_ptr_result;
 static volatile int kasan_int_result;
 
 /* Probe for console output: obtains test_status lines of interest. */
_

Patches currently in -mm which might be from jannh(a)google.com are

kasan-add-test-for-slab_typesafe_by_rcu-quarantine-skipping.patch
kasan-add-test-for-slab_typesafe_by_rcu-quarantine-skipping-v2.patch
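
The qualifier placement discussed above is easy to misread, so here is a small standalone C sketch of the two declarations side by side. The variable and function names (`ptr_to_volatile`, `volatile_ptr`, `keep_result`, `read_result`) are made up for illustration and do not appear in mm/kasan/kasan_test_c.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pointer to volatile data: only accesses *through* the pointer are
 * volatile. The pointer object itself is an ordinary variable, so a store
 * to it that is never read back may be removed as dead code.
 */
static volatile void *ptr_to_volatile;

/*
 * Volatile pointer: the pointer object itself is volatile, so every store
 * to it has to be carried out, which keeps the computation that produced
 * the stored value alive.
 */
static void *volatile volatile_ptr;

void keep_result(void *p)
{
	ptr_to_volatile = p;	/* an optimizing compiler may drop this store */
	volatile_ptr = p;	/* this store must be emitted */
}

void *read_result(void)
{
	return volatile_ptr;
}
```

Reading the declaration right to left helps: `void *volatile p` is "p is a volatile pointer to void", whereas `volatile void *p` is "p is a pointer to volatile void".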

3 weeks, 2 days

[PATCH] NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul()

by Thorsten Blum

Commit 5304877936c0 ("NFSD: Fix strncpy() fortify warning") replaced
strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy()
already guaranteed NUL-termination of the destination buffer and
subtracting one byte potentially truncated the source string.

The incorrect size was then carried over in commit 72f78ae00a8e ("NFSD:
move from strlcpy with unused retval to strscpy") when switching from
strlcpy() to strscpy().

Fix this off-by-one error by using the full size of the destination
buffer again.

Cc: stable(a)vger.kernel.org
Fixes: 5304877936c0 ("NFSD: Fix strncpy() fortify warning")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
 fs/nfsd/nfs4proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 71b428efcbb5..32be002a248f 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1469,7 +1469,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr,
 		return 0;
 	}
 	if (work) {
-		strscpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr) - 1);
+		strscpy(work->nsui_ipaddr, ipaddr);
 		refcount_set(&work->nsui_refcnt, 2);
 		work->nsui_busy = true;
 		list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list);
-- 
2.50.1
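
To make the off-by-one concrete, the following userspace sketch mimics the truncation behaviour described above. `bounded_copy()` is a hypothetical stand-in written for this example, not the kernel's strscpy(): it returns -1 on truncation where the kernel helper returns -E2BIG, and it always takes an explicit size. The two-argument strscpy() call in the patch appears to rely on the destination being an array whose size can be deduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for a strscpy()-style copy: copies at
 * most size - 1 bytes, always NUL-terminates when size > 0, and returns
 * the number of bytes copied, or -1 if the source had to be truncated.
 */
long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;

	len = strlen(src);
	if (len >= size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}

	memcpy(dst, src, len + 1);	/* includes the terminating NUL */
	return (long)len;
}
```

With an 8-byte destination, a 7-character source fits exactly when the full `sizeof(buf)` is passed. Passing `sizeof(buf) - 1` leaves room for only 6 characters, so the last one is silently lost, which is the truncation the commit message describes.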

3 weeks, 2 days

[PATCH AUTOSEL 6.16-6.6] mfd: axp20x: Set explicit ID for AXP313 regulator

by Sasha Levin

From: Chen-Yu Tsai <wens(a)csie.org>

[ Upstream commit 88828c7e940dd45d139ad4a39d702b23840a37c5 ]

On newer boards featuring the A523 SoC, the AXP323 (related to the
AXP313) is paired with the AXP717 and serves as a secondary PMIC
providing additional regulator outputs. However the MFD cells are all
registered with PLATFORM_DEVID_NONE, which causes the regulator cells
to conflict with each other.

Commit e37ec3218870 ("mfd: axp20x: Allow multiple regulators") attempted
to fix this by switching to PLATFORM_DEVID_AUTO so that the device names
would all be different, however that broke IIO channel mapping, which is
also tied to the device names. As a result the change was later reverted.

Instead, here we attempt to make sure the AXP313/AXP323 regulator cell
does not conflict by explicitly giving it an ID number. This was
previously done for the AXP809+AXP806 pair used with the A80 SoC.

Signed-off-by: Chen-Yu Tsai <wens(a)csie.org>
Link: https://lore.kernel.org/r/20250619173207.3367126-1-wens@kernel.org
Signed-off-by: Lee Jones <lee(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---

LLM Generated explanations, may be completely bogus:

Based on my analysis, here's my assessment:

**Backport Status: YES**

This commit is suitable for backporting to stable kernel trees for the
following reasons:

1. **Fixes a real bug**: The commit addresses a device naming conflict
   issue when using AXP313/AXP323 PMICs alongside AXP717 as a secondary
   PMIC. Without this fix, the kernel produces a sysfs duplicate
   filename error and fails to properly register the secondary regulator
   device.

2. **Small and contained change**: The fix is minimal - it only changes
   one line of code from `MFD_CELL_NAME("axp20x-regulator")` to
   `MFD_CELL_BASIC("axp20x-regulator", NULL, NULL, 0, 1)`, which
   explicitly sets an ID of 1 for the AXP313 regulator cell.

3. **Follows established pattern**: The commit follows an existing
   pattern already used in the same driver for the AXP806 PMIC (lines
   1173-1174 in axp806_cells), which also sets an explicit ID (2) to
   avoid conflicts when paired with AXP809.

4. **Minimal risk of regression**: The change only affects AXP313/AXP323
   devices and doesn't touch other PMIC configurations. The explicit ID
   assignment is a safe approach that doesn't break existing IIO channel
   mappings (which was the problem with the previous PLATFORM_DEVID_AUTO
   approach mentioned in the commit message).

5. **Clear problem and solution**: The commit message clearly explains
   the issue (sysfs duplicate filename error) and references the history
   of previous attempts to fix similar issues (commit e37ec3218870 and
   its revert). The solution is targeted and doesn't introduce
   architectural changes.

6. **Hardware enablement fix**: This fix enables proper functioning of
   boards with the A523 SoC that use dual PMIC configurations (AXP323 +
   AXP717), which would otherwise fail to initialize properly.

The commit meets the stable tree criteria of being an important bugfix
with minimal risk and contained scope. It fixes a specific hardware
configuration issue without introducing new features or making broad
architectural changes.

 drivers/mfd/axp20x.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mfd/axp20x.c b/drivers/mfd/axp20x.c
index e9914e8a29a3..25c639b348cd 100644
--- a/drivers/mfd/axp20x.c
+++ b/drivers/mfd/axp20x.c
@@ -1053,7 +1053,8 @@ static const struct mfd_cell axp152_cells[] = {
 };
 
 static struct mfd_cell axp313a_cells[] = {
-	MFD_CELL_NAME("axp20x-regulator"),
+	/* AXP323 is sometimes paired with AXP717 as sub-PMIC */
+	MFD_CELL_BASIC("axp20x-regulator", NULL, NULL, 0, 1),
 	MFD_CELL_RES("axp313a-pek", axp313a_pek_resources),
 };
 
-- 
2.39.5

3 weeks, 2 days

[PATCH v3 0/4] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT

by Yunseong Kim

This patch series resolves a "sleeping function called from invalid
context" bug that occurs when fuzzing USB with syzkaller on a PREEMPT_RT
kernel.

The regression was introduced by the interaction of two separate patches:
one that made kcov's internal locks sleep on PREEMPT_RT for better latency
(d5d2c51f1e5f), and another that wrapped a kcov call in the USB softirq
path with local_irq_save() to prevent re-entrancy (f85d39dd7ed8).
This combination resulted in an attempt to acquire a sleeping lock from
within an atomic context, causing a kernel BUG.

To resolve this, this series makes the kcov remote path fully compatible
with atomic contexts by converting all its internal locking primitives to
non-sleeping variants. This approach is more robust than conditional
compilation as it creates a single, unified codebase that works correctly
on both RT and non-RT kernels.

The series is structured as follows:

Patch 1 converts the global kcov locks (kcov->lock and kcov_remote_lock)
to use the non-sleeping raw_spinlock_t.

Patch 2 replaces the PREEMPT_RT-specific per-CPU local_lock_t with the
original local_irq_save/restore primitives, making the per-CPU protection
non-sleeping as well.

Patches 3 and 4 are preparatory refactoring. They move the memory
allocation for remote handles out of the locked sections in the
KCOV_REMOTE_ENABLE ioctl path, which is a prerequisite for safely
using raw_spinlock_t as it forbids sleeping functions like kmalloc within
its critical section.

With these changes, I have been able to run syzkaller fuzzing on a
PREEMPT_RT kernel for a full day with no issues reported.

Reproduction details are here:
Link: https://lore.kernel.org/all/20250725201400.1078395-2-ysk@kzalloc.com/t/#u

Signed-off-by: Yunseong Kim <ysk(a)kzalloc.com>
---
Changes from v2:
1. Updated kcov_remote_reset() to use raw_spin_lock_irqsave() /
   raw_spin_unlock_irqrestore() instead of raw_spin_lock() /
   raw_spin_unlock(), following the interrupt disabling pattern
   used in the original function that guards kcov_remote_lock.

Changes from v1:
1. Dropped the #ifdef-based PREEMPT_RT branching.
2. Convert kcov->lock and kcov_remote_lock from spinlock_t to
   raw_spinlock_t. This ensures they remain true, non-sleeping
   spinlocks even on PREEMPT_RT kernels.
3. Remove the local_lock_t protection for kcov_percpu_data in
   kcov_remote_start/stop(). Since local_lock_t can also sleep under
   RT, and the required protection is against local interrupts when
   accessing per-CPU data, it is replaced with explicit
   local_irq_save/restore().
4. Refactor the KCOV_REMOTE_ENABLE path to move memory allocations
   out of the critical section.
5. Modify the ioctl handling logic to utilize these pre-allocated
   structures within the critical section. kcov_remote_add() is
   modified to accept a pre-allocated structure instead of allocating
   one internally. All necessary struct kcov_remote structures are now
   pre-allocated individually in kcov_ioctl() using GFP_KERNEL
   (allowing sleep) before acquiring the raw spinlocks.

Changes from v0:
1. On PREEMPT_RT, separated the handling of
   kcov_remote_start_usb_softirq() and kcov_remote_stop_usb_softirq()
   to allow sleeping when entering kcov_remote_start_usb() /
   kcov_remote_stop().

Yunseong Kim (4):
  kcov: Use raw_spinlock_t for kcov->lock and kcov_remote_lock
  kcov: Replace per-CPU local_lock with local_irq_save/restore
  kcov: Separate KCOV_REMOTE_ENABLE ioctl helper function
  kcov: move remote handle allocation outside raw spinlock

 kernel/kcov.c | 248 +++++++++++++++++++++++++++-----------------------
 1 file changed, 134 insertions(+), 114 deletions(-)

base-commit: 186f3edfdd41f2ae87fc40a9ccba52a3bf930994
-- 
2.50.0
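
The "allocate first, then lock" restructuring described for patches 3 and 4 can be illustrated with a small userspace C sketch. Everything here is an assumption made for the example: `struct remote`, `remote_prealloc()`, `remote_add()` and `remote_enable()` are hypothetical names, `malloc()` stands in for a GFP_KERNEL allocation that may sleep, and the `lock_held` boolean stands in for a raw spinlock. It is not the code from kernel/kcov.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct remote {
	unsigned long handle;
	struct remote *next;
};

static struct remote *remote_list;
static bool lock_held;	/* hypothetical stand-in for a raw spinlock */

/*
 * Stand-in for an allocation that may sleep. The assertion encodes the
 * rule from the cover letter: no sleeping calls inside the critical
 * section.
 */
struct remote *remote_prealloc(unsigned long handle)
{
	struct remote *r;

	assert(!lock_held);
	r = malloc(sizeof(*r));
	if (r) {
		r->handle = handle;
		r->next = NULL;
	}
	return r;
}

/*
 * Links a pre-allocated entry into the list under the "lock". Takes
 * ownership of r on success; returns false if the handle already exists.
 */
bool remote_add(struct remote *r)
{
	struct remote *it;
	bool added = true;

	lock_held = true;
	for (it = remote_list; it; it = it->next) {
		if (it->handle == r->handle) {
			added = false;
			break;
		}
	}
	if (added) {
		r->next = remote_list;
		remote_list = r;
	}
	lock_held = false;

	return added;
}

bool remote_enable(unsigned long handle)
{
	struct remote *r = remote_prealloc(handle);	/* outside the lock */

	if (!r)
		return false;
	if (!remote_add(r)) {
		free(r);	/* duplicate: release the unused entry */
		return false;
	}
	return true;
}
```

The trade-off of this shape is that an entry may be allocated and then thrown away when the handle turns out to be a duplicate, in exchange for a critical section that only manipulates pointers.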

3 weeks, 2 days

[PATCH v2] drm/xe: Defer buffer object shrinker write-backs and GPU waits

by Thomas Hellström

When the xe buffer-object shrinker allows GPU waits and write-back,
(typically from kswapd), perform multiple passes, skipping
subsequent passes if the shrinker number of scanned objects target
is reached.

1) Without GPU waits and write-back
2) Without write-back
3) With both GPU-waits and write-back

This is to avoid stalls and costly write- and readbacks unless they
are really necessary.

v2:
- Don't test for scan completion twice. (Stuart Summers)
- Update tags.

Reported-by: melvyn <melvyn2(a)dnsense.pub>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5557
Cc: Summers Stuart <stuart.summers(a)intel.com>
Fixes: 00c8efc3180f ("drm/xe: Add a shrinker for xe bos")
Cc: <stable(a)vger.kernel.org> # v6.15+
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
---
 drivers/gpu/drm/xe/xe_shrinker.c | 51 +++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_shrinker.c b/drivers/gpu/drm/xe/xe_shrinker.c
index 1c3c04d52f55..90244fe59b59 100644
--- a/drivers/gpu/drm/xe/xe_shrinker.c
+++ b/drivers/gpu/drm/xe/xe_shrinker.c
@@ -54,10 +54,10 @@ xe_shrinker_mod_pages(struct xe_shrinker *shrinker, long shrinkable, long purgea
 	write_unlock(&shrinker->lock);
 }
 
-static s64 xe_shrinker_walk(struct xe_device *xe,
-			    struct ttm_operation_ctx *ctx,
-			    const struct xe_bo_shrink_flags flags,
-			    unsigned long to_scan, unsigned long *scanned)
+static s64 __xe_shrinker_walk(struct xe_device *xe,
+			      struct ttm_operation_ctx *ctx,
+			      const struct xe_bo_shrink_flags flags,
+			      unsigned long to_scan, unsigned long *scanned)
 {
 	unsigned int mem_type;
 	s64 freed = 0, lret;
@@ -93,6 +93,48 @@ static s64 xe_shrinker_walk(struct xe_device *xe,
 	return freed;
 }
 
+/*
+ * Try shrinking idle objects without writeback first, then if not sufficient,
+ * try also non-idle objects and finally if that's not sufficient either,
+ * add writeback. This avoids stalls and explicit writebacks with light or
+ * moderate memory pressure.
+ */
+static s64 xe_shrinker_walk(struct xe_device *xe,
+			    struct ttm_operation_ctx *ctx,
+			    const struct xe_bo_shrink_flags flags,
+			    unsigned long to_scan, unsigned long *scanned)
+{
+	bool no_wait_gpu = true;
+	struct xe_bo_shrink_flags save_flags = flags;
+	s64 lret, freed;
+
+	swap(no_wait_gpu, ctx->no_wait_gpu);
+	save_flags.writeback = false;
+	lret = __xe_shrinker_walk(xe, ctx, save_flags, to_scan, scanned);
+	swap(no_wait_gpu, ctx->no_wait_gpu);
+	if (lret < 0 || *scanned >= to_scan)
+		return lret;
+
+	freed = lret;
+	if (!ctx->no_wait_gpu) {
+		lret = __xe_shrinker_walk(xe, ctx, save_flags, to_scan, scanned);
+		if (lret < 0)
+			return lret;
+		freed += lret;
+		if (*scanned >= to_scan)
+			return freed;
+	}
+
+	if (flags.writeback) {
+		lret = __xe_shrinker_walk(xe, ctx, flags, to_scan, scanned);
+		if (lret < 0)
+			return lret;
+		freed += lret;
+	}
+
+	return freed;
+}
+
 static unsigned long
 xe_shrinker_count(struct shrinker *shrink, struct shrink_control *sc)
 {
@@ -199,6 +241,7 @@ static unsigned long xe_shrinker_scan(struct shrinker *shrink, struct shrink_con
 	runtime_pm = xe_shrinker_runtime_pm_get(shrinker, true, 0, can_backup);
 
 	shrink_flags.purge = false;
+
 	lret = xe_shrinker_walk(shrinker->xe, &ctx, shrink_flags,
 				nr_to_scan, &nr_scanned);
 	if (lret >= 0)
-- 
2.50.1

3 weeks, 2 days

[PATCH v2] RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages

by Pedro Falcato

Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
we have been doing this:

static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
			     size_t size)
[...]
	/* Calculate the number of bytes we need to push, for this page
	 * specifically */
	size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
	/* If we can't splice it, then copy it in, as normal */
	if (!sendpage_ok(page[i]))
		msg.msg_flags &= ~MSG_SPLICE_PAGES;
	/* Set the bvec pointing to the page, with len $bytes */
	bvec_set_page(&bvec, page[i], bytes, offset);
	/* Set the iter to $size, aka the size of the whole sendpages (!!!) */
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
	lock_sock(sk);
	/* Sendmsg with $size size (!!!) */
	rv = tcp_sendmsg_locked(sk, &msg, size);

This means we've been sending oversized iov_iters and tcp_sendmsg calls
for a while. This has been a benign bug because sendpage_ok() always
returned true. With the recent slab allocator changes being slowly
introduced into next (that disallow sendpage on large kmalloc
allocations), we have recently hit out-of-bounds crashes, due to slight
differences in iov_iter behavior between the MSG_SPLICE_PAGES and
"regular" copy paths:

(MSG_SPLICE_PAGES)
skb_splice_from_iter
  iov_iter_extract_pages
    iov_iter_extract_bvec_pages
      uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
  skb_splice_from_iter gets a "short" read

(!MSG_SPLICE_PAGES)
skb_copy_to_page_nocache copy=iov_iter_count
 [...]
   copy_from_iter
        /* this doesn't help */
        if (unlikely(iter->count < len))
                len = iter->count;
          iterate_bvec
            ... and we run off the bvecs

Fix this by properly setting the iov_iter's byte count, plus sending the
correct byte count to tcp_sendmsg_locked.

Cc: stable(a)vger.kernel.org
Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
Reviewed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Pedro Falcato <pfalcato(a)suse.de>
---

v2:
 - Add David Howells's Rb on the original patch
 - Remove the offset increment, since it's dead code

 drivers/infiniband/sw/siw/siw_qp_tx.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 3a08f57d2211..f7dd32c6e5ba 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -340,18 +340,17 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 		if (!sendpage_ok(page[i]))
 			msg.msg_flags &= ~MSG_SPLICE_PAGES;
 		bvec_set_page(&bvec, page[i], bytes, offset);
-		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
 
 try_page_again:
 		lock_sock(sk);
-		rv = tcp_sendmsg_locked(sk, &msg, size);
+		rv = tcp_sendmsg_locked(sk, &msg, bytes);
 		release_sock(sk);
 
 		if (rv > 0) {
 			size -= rv;
 			sent += rv;
 			if (rv != bytes) {
-				offset += rv;
 				bytes -= rv;
 				goto try_page_again;
 			}
-- 
2.50.1
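
The difference between `size` (the whole request) and `bytes` (the part that lives in the current page) is the heart of this fix, and a small userspace C sketch may help. The helper below is hypothetical and written only for this example: `SKETCH_PAGE_SIZE` is assumed to be 4096, and the sketch assumes that only the first page carries a non-zero offset, which the quoted excerpt does not show explicitly.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Walks a (offset, size) byte range page by page and records how many
 * bytes belong to each page. Each recorded value corresponds to the
 * "bytes" that a single one-segment iterator can legitimately describe;
 * "size" is only the total still to be sent.
 * Assumes offset < SKETCH_PAGE_SIZE. Returns the number of chunks written.
 */
size_t split_into_page_chunks(size_t offset, size_t size,
			      size_t *chunks, size_t max_chunks)
{
	size_t n = 0;

	while (size > 0 && n < max_chunks) {
		size_t bytes = min_size(SKETCH_PAGE_SIZE - offset, size);

		chunks[n++] = bytes;
		size -= bytes;
		offset = 0;	/* assumption: later pages start at 0 */
	}

	return n;
}
```

For a 10000-byte request starting at offset 100, the first page contributes 3996 bytes, the second 4096 and the third 1908. Advertising the remaining total instead of the per-page value for a single-page segment is what let the copy path run past the end of the bvec.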

3 weeks, 2 days

[PATCH v3] vhost/net: Protect ubufs with rcu read lock in vhost_net_ubuf_put()

by Nikolay Kuratov

When operating on struct vhost_net_ubuf_ref, the following execution
sequence is theoretically possible:

CPU0 is finalizing DMA operation                   CPU1 is doing VHOST_NET_SET_BACKEND
                             // ubufs->refcount == 2
vhost_net_ubuf_put()                               vhost_net_ubuf_put_wait_and_free(oldubufs)
                                                     vhost_net_ubuf_put_and_wait()
                                                       vhost_net_ubuf_put()
                                                         int r = atomic_sub_return(1, &ubufs->refcount);
                                                         // r = 1
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 0
                                                      wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
                                                      // no wait occurs here because condition is already true
                                                    kfree(ubufs);
if (unlikely(!r))
  wake_up(&ubufs->wait);  // use-after-free

This leads to use-after-free on ubufs access. This happens because CPU1
skips waiting for wake_up() when refcount is already zero.

To prevent that use a read-side RCU critical section in vhost_net_ubuf_put(),
as suggested by Hillf Danton. For this lock to take effect, free ubufs with
kfree_rcu().

Cc: stable(a)vger.kernel.org
Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
Reported-by: Andrey Ryabinin <arbn(a)yandex-team.com>
Suggested-by: Hillf Danton <hdanton(a)sina.com>
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
v2:
* move reinit_completion() into vhost_net_flush(), thanks
  to Hillf Danton
* add Tested-by: Lei Yang
* check that usages of put_and_wait() are consistent across
  LTS kernels
v3:
* use rcu_read_lock() with kfree_rcu() instead of completion,
  as suggested by Hillf Danton

 drivers/vhost/net.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 6edac0c1ba9b..c6508fe0d5c8 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -99,6 +99,7 @@ struct vhost_net_ubuf_ref {
 	atomic_t refcount;
 	wait_queue_head_t wait;
 	struct vhost_virtqueue *vq;
+	struct rcu_head rcu;
 };
 
 #define VHOST_NET_BATCH 64
@@ -250,9 +251,13 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
 
 static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
 {
-	int r = atomic_sub_return(1, &ubufs->refcount);
+	int r;
+
+	rcu_read_lock();
+	r = atomic_sub_return(1, &ubufs->refcount);
 	if (unlikely(!r))
 		wake_up(&ubufs->wait);
+	rcu_read_unlock();
 	return r;
 }
 
@@ -265,7 +270,7 @@ static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
 static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
 {
 	vhost_net_ubuf_put_and_wait(ubufs);
-	kfree(ubufs);
+	kfree_rcu(ubufs, rcu);
 }
 
 static void vhost_net_clear_ubuf_info(struct vhost_net *n)
-- 
2.34.1

3 weeks, 2 days

[PATCH v3 1/3] sched_ext: Mark scx_bpf_cpu_rq as NULL returnable

by Christian Loehle

scx_bpf_cpu_rq() obviously returns NULL on invalid cpu.
Mark it as such.
While kf_cpu_valid() will trigger scx_ops_error() that leads
to the BPF scheduler exiting, this isn't guaranteed to be immediate,
allowing for a dereference of a NULL scx_bpf_cpu_rq() return value.

Cc: stable(a)vger.kernel.org
Fixes: 6203ef73fa5c ("sched/ext: Add BPF function to fetch rq")
Signed-off-by: Christian Loehle <christian.loehle(a)arm.com>
Acked-by: Andrea Righi <arighi(a)nvidia.com>
---
 kernel/sched/ext.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 7dedc9a16281..3ea3f0f18030 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -7589,7 +7589,7 @@ BTF_ID_FLAGS(func, scx_bpf_get_online_cpumask, KF_ACQUIRE)
 BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
 BTF_ID_FLAGS(func, scx_bpf_task_running, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
-BTF_ID_FLAGS(func, scx_bpf_cpu_rq)
+BTF_ID_FLAGS(func, scx_bpf_cpu_rq, KF_RET_NULL)
 #ifdef CONFIG_CGROUP_SCHED
 BTF_ID_FLAGS(func, scx_bpf_task_cgroup, KF_RCU | KF_ACQUIRE)
 #endif
-- 
2.34.1

3 weeks, 3 days

[PATCH] drm/amdgpu: Raven: don't allow mixing GTT and VRAM

by Brian Geffon

Commit 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
allowed for newer ASICs to mix GTT and VRAM, this change also noted that
some older boards, such as Stoney and Carrizo do not support this.
It appears that at least one additional ASIC does not support this which
is Raven.

We observed this issue when migrating a device from a 5.4 to 6.6 kernel
and have confirmed that Raven also needs to be excluded from mixing GTT
and VRAM.

Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)")
Cc: Luben Tuikov <luben.tuikov(a)amd.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.1+
Tested-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
Signed-off-by: Brian Geffon <bgeffon(a)google.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 73403744331a..5d7f13e25b7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1545,7 +1545,8 @@ uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev,
 					    uint32_t domain)
 {
 	if ((domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) &&
-	    ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY))) {
+	    ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY) ||
+	     (adev->asic_type == CHIP_RAVEN))) {
 		domain = AMDGPU_GEM_DOMAIN_VRAM;
 		if (adev->gmc.real_vram_size <= AMDGPU_SG_THRESHOLD)
 			domain = AMDGPU_GEM_DOMAIN_GTT;
-- 
2.50.0.727.gbf7dc18ff4-goog

3 weeks, 3 days

[PATCH v2] vhost/net: Replace wait_queue with completion in ubufs reference

by Nikolay Kuratov

When operating on struct vhost_net_ubuf_ref, the following execution
sequence is theoretically possible:

CPU0 is finalizing DMA operation                   CPU1 is doing VHOST_NET_SET_BACKEND
                             // &ubufs->refcount == 2
vhost_net_ubuf_put()                               vhost_net_ubuf_put_wait_and_free(oldubufs)
                                                     vhost_net_ubuf_put_and_wait()
                                                       vhost_net_ubuf_put()
                                                         int r = atomic_sub_return(1, &ubufs->refcount);
                                                         // r = 1
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 0
                                                      wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
                                                      // no wait occurs here because condition is already true
                                                    kfree(ubufs);
if (unlikely(!r))
  wake_up(&ubufs->wait);  // use-after-free

This leads to use-after-free on ubufs access. This happens because CPU1
skips waiting for wake_up() when refcount is already zero.

To prevent that use a completion instead of wait_queue as the ubufs
notification mechanism. wait_for_completion() guarantees that there will
be complete() call prior to its return.

We also need to reinit completion in vhost_net_flush(), because
refcnt == 0 does not mean freeing in that case.

Cc: stable(a)vger.kernel.org
Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
Reported-by: Andrey Ryabinin <arbn(a)yandex-team.com>
Suggested-by: Andrey Smetanin <asmetanin(a)yandex-team.ru>
Suggested-by: Hillf Danton <hdanton(a)sina.com>
Tested-by: Lei Yang <leiyang(a)redhat.com> (v1)
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
v2:
* move reinit_completion() into vhost_net_flush(), thanks
  to Hillf Danton
* add Tested-by: Lei Yang
* check that usages of put_and_wait() are consistent across
  LTS kernels

 drivers/vhost/net.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7cbfc7d718b3..69e1bfb9627e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
 	 * >1: outstanding ubufs
 	 */
 	atomic_t refcount;
-	wait_queue_head_t wait;
+	struct completion wait;
 	struct vhost_virtqueue *vq;
 };
 
@@ -240,7 +240,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
 	if (!ubufs)
 		return ERR_PTR(-ENOMEM);
 	atomic_set(&ubufs->refcount, 1);
-	init_waitqueue_head(&ubufs->wait);
+	init_completion(&ubufs->wait);
 	ubufs->vq = vq;
 	return ubufs;
 }
@@ -249,14 +249,14 @@ static int vhost_net_ubuf_put(struct vhost_net_ubuf_ref *ubufs)
 {
 	int r = atomic_sub_return(1, &ubufs->refcount);
 	if (unlikely(!r))
-		wake_up(&ubufs->wait);
+		complete_all(&ubufs->wait);
 	return r;
 }
 
 static void vhost_net_ubuf_put_and_wait(struct vhost_net_ubuf_ref *ubufs)
 {
 	vhost_net_ubuf_put(ubufs);
-	wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
+	wait_for_completion(&ubufs->wait);
 }
 
 static void vhost_net_ubuf_put_wait_and_free(struct vhost_net_ubuf_ref *ubufs)
@@ -1381,6 +1381,7 @@ static void vhost_net_flush(struct vhost_net *n)
 		mutex_lock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
 		n->tx_flush = false;
 		atomic_set(&n->vqs[VHOST_NET_VQ_TX].ubufs->refcount, 1);
+		reinit_completion(&n->vqs[VHOST_NET_VQ_TX].ubufs->wait);
 		mutex_unlock(&n->vqs[VHOST_NET_VQ_TX].vq.mutex);
 	}
 }
-- 
2.34.1

3 weeks, 3 days

[PATCH] rust: faux: fix C header link

by Miguel Ojeda

Starting with Rust 1.91.0 (expected 2025-10-30), `rustdoc` has improved
some false negatives around intra-doc links [1], and it found a broken
intra-doc link we currently have:

    error: unresolved link to `include/linux/device/faux.h`
     --> rust/kernel/faux.rs:7:17
      |
    7 | //! C header: [`include/linux/device/faux.h`]
      |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^ no item named `include/linux/device/faux.h` in scope
      |
      = help: to escape `[` and `]` characters, add '\' before them like `\[` or `\]`
      = note: `-D rustdoc::broken-intra-doc-links` implied by `-D warnings`
      = help: to override `-D warnings` add `#[allow(rustdoc::broken_intra_doc_links)]`

Our `srctree/` C header links are not intra-doc links, thus they need
the link destination.

Thus fix it.

Cc: stable(a)vger.kernel.org
Link: https://github.com/rust-lang/rust/pull/132748 [1]
Fixes: 78418f300d39 ("rust/kernel: Add faux device bindings")
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
It may have been in 1.90, but the beta branch does not have it, and the
rollup PR says 1.91, unlike the PR itself, so I picked 1.91. It happened
just after the version bump to 1.91, so it may have to do with that.

 rust/kernel/faux.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rust/kernel/faux.rs b/rust/kernel/faux.rs
index 7a906099993f..7fe2dd197e37 100644
--- a/rust/kernel/faux.rs
+++ b/rust/kernel/faux.rs
@@ -4,7 +4,7 @@
 //!
 //! This module provides bindings for working with faux devices in kernel modules.
 //!
-//! C header: [`include/linux/device/faux.h`]
+//! C header: [`include/linux/device/faux.h`](srctree/include/linux/device/faux.h)
 
 use crate::{bindings, device, error::code::*, prelude::*};
 use core::ptr::{addr_of_mut, null, null_mut, NonNull};

base-commit: d2eedaa3909be9102d648a4a0a50ccf64f96c54f
-- 
2.50.1

3 weeks, 3 days

[PATCH v6] riscv: hwprobe: Fix stale vDSO data for late-initialized keys at boot

by Jingwei Wang

The hwprobe vDSO data for some keys, like MISALIGNED_VECTOR_PERF,
is determined by an asynchronous kthread. This can create a race
condition where the kthread finishes after the vDSO data has
already been populated, causing userspace to read stale values.

To fix this race, a new 'ready' flag is added to the vDSO data,
initialized to 'false' during late_initcall. This flag is checked
by both the vDSO's user-space code and the riscv_hwprobe syscall.
The syscall serves as a one-time gate, using a completion to wait
for any pending probes before populating the data and setting the
flag to 'true', thus ensuring userspace reads fresh values on its
first request.

Reported-by: Tsukasa OI <research_trasio(a)irq.a4lg.com>
Closes: https://lore.kernel.org/linux-riscv/760d637b-b13b-4518-b6bf-883d55d44e7f@ir…
Fixes: e7c9d66e313b ("RISC-V: Report vector unaligned access speed hwprobe")
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Cc: Olof Johansson <olof(a)lixom.net>
Cc: stable(a)vger.kernel.org
Co-developed-by: Palmer Dabbelt <palmer(a)dabbelt.com>
Signed-off-by: Jingwei Wang <wangjingwei(a)iscas.ac.cn>
---
Changes in v6:
	- Based on Palmer's feedback, reworked the synchronization to be
	  on-demand, deferring the wait until the first hwprobe syscall via
	  a 'ready' flag. This avoids the boot-time regression from v5's
	  approach.

Changes in v5:
	- Reworked the synchronization logic to a robust "sentinel-count"
	  pattern based on feedback from Alexandre.
	- Fixed a "multiple definition" linker error for nommu builds by
	  changing the header-file stub functions to `static inline`, as
	  pointed out by Olof.
	- Updated the commit message to better explain the rationale for
	  moving the vDSO initialization to `late_initcall`.

Changes in v4:
	- Reworked the synchronization mechanism based on feedback from
	  Palmer and Alexandre.
	- Instead of a post-hoc refresh, this version introduces a robust
	  completion-based framework using an atomic counter to ensure
	  async probes are finished before populating the vDSO.
	- Moved the vdso data initialization to a late_initcall to avoid
	  impacting boot time.

Changes in v3:
	- Retained existing blank line.

Changes in v2:
	- Addressed feedback from Yixun regarding #ifdef CONFIG_MMU usage.
	- Updated commit message to provide a high-level summary.
	- Added Fixes tag for commit e7c9d66e313b.

v1: https://lore.kernel.org/linux-riscv/20250521052754.185231-1-wangjingwei@isc…

 arch/riscv/include/asm/hwprobe.h           |  8 ++-
 arch/riscv/include/asm/vdso/arch_data.h    |  6 ++
 arch/riscv/kernel/sys_hwprobe.c            | 71 ++++++++++++++++++----
 arch/riscv/kernel/unaligned_access_speed.c |  9 ++-
 arch/riscv/kernel/vdso/hwprobe.c           |  2 +-
 5 files changed, 79 insertions(+), 17 deletions(-)

diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 7fe0a379474ae2c6..3b2888126e659ea1 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -40,5 +40,11 @@ static inline bool riscv_hwprobe_pair_cmp(struct riscv_hwprobe *pair,
 	return pair->value == other_pair->value;
 }
 
-
+#ifdef CONFIG_MMU
+void riscv_hwprobe_register_async_probe(void);
+void riscv_hwprobe_complete_async_probe(void);
+#else
+static inline void riscv_hwprobe_register_async_probe(void) {}
+static inline void riscv_hwprobe_complete_async_probe(void) {}
+#endif
 #endif
diff --git a/arch/riscv/include/asm/vdso/arch_data.h b/arch/riscv/include/asm/vdso/arch_data.h
index da57a3786f7a53c8..88b37af55175129b 100644
--- a/arch/riscv/include/asm/vdso/arch_data.h
+++ b/arch/riscv/include/asm/vdso/arch_data.h
@@ -12,6 +12,12 @@ struct vdso_arch_data {
 
 	/* Boolean indicating all CPUs have the same static hwprobe values. */
 	__u8 homogeneous_cpus;
+
+	/*
+	 * A gate to check and see if the hwprobe data is actually ready, as
+	 * probing is deferred to avoid boot slowdowns.
+	 */
+	__u8 ready;
 };
 
 #endif /* __RISCV_ASM_VDSO_ARCH_DATA_H */
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index 0b170e18a2beba57..fecb6790fa88e96c 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -5,6 +5,8 @@
  * more details.
  */
 #include <linux/syscalls.h>
+#include <linux/completion.h>
+#include <linux/atomic.h>
 #include <asm/cacheflush.h>
 #include <asm/cpufeature.h>
 #include <asm/hwprobe.h>
@@ -452,28 +454,36 @@ static int hwprobe_get_cpus(struct riscv_hwprobe __user *pairs,
 	return 0;
 }
 
-static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
-			    size_t pair_count, size_t cpusetsize,
-			    unsigned long __user *cpus_user,
-			    unsigned int flags)
-{
-	if (flags & RISCV_HWPROBE_WHICH_CPUS)
-		return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
-					cpus_user, flags);
+#ifdef CONFIG_MMU
 
-	return hwprobe_get_values(pairs, pair_count, cpusetsize,
-				  cpus_user, flags);
+static DECLARE_COMPLETION(boot_probes_done);
+static atomic_t pending_boot_probes = ATOMIC_INIT(1);
+
+void riscv_hwprobe_register_async_probe(void)
+{
+	atomic_inc(&pending_boot_probes);
 }
 
-#ifdef CONFIG_MMU
+void riscv_hwprobe_complete_async_probe(void)
+{
+	if (atomic_dec_and_test(&pending_boot_probes))
+		complete(&boot_probes_done);
+}
 
-static int __init init_hwprobe_vdso_data(void)
+static int complete_hwprobe_vdso_data(void)
 {
 	struct vdso_arch_data *avd = vdso_k_arch_data;
 	u64 id_bitsmash = 0;
 	struct riscv_hwprobe pair;
 	int key;
 
+	/* We've probably already produced these values. */
+	if (likely(avd->ready))
+		return 0;
+
+	if (unlikely(!atomic_dec_and_test(&pending_boot_probes)))
+		wait_for_completion(&boot_probes_done);
+
 	/*
 	 * Initialize vDSO data with the answers for the "all CPUs" case, to
 	 * save a syscall in the common case.
@@ -501,13 +511,48 @@ static int __init init_hwprobe_vdso_data(void)
 	 * vDSO should defer to the kernel for exotic cpu masks.
*/ avd->homogeneous_cpus = id_bitsmash != 0 && id_bitsmash != -1; + + /* + * Make sure all the VDSO values are visible before we look at them. + * This pairs with the implicit "no speculativly visible accesses" + * barrier in the VDSO hwprobe code. + */ + smp_wmb(); + avd->ready = true; + return 0; +} + +static int __init init_hwprobe_vdso_data(void) +{ + struct vdso_arch_data *avd = vdso_k_arch_data; + + /* + * Prevent the vDSO cached values from being used, as they're not ready + * yet. + */ + avd->ready = false; return 0; } -arch_initcall_sync(init_hwprobe_vdso_data); +late_initcall(init_hwprobe_vdso_data); #endif /* CONFIG_MMU */ +static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, + size_t pair_count, size_t cpusetsize, + unsigned long __user *cpus_user, + unsigned int flags) +{ + complete_hwprobe_vdso_data(); + + if (flags & RISCV_HWPROBE_WHICH_CPUS) + return hwprobe_get_cpus(pairs, pair_count, cpusetsize, + cpus_user, flags); + + return hwprobe_get_values(pairs, pair_count, cpusetsize, + cpus_user, flags); +} + SYSCALL_DEFINE5(riscv_hwprobe, struct riscv_hwprobe __user *, pairs, size_t, pair_count, size_t, cpusetsize, unsigned long __user *, cpus, unsigned int, flags) diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c index ae2068425fbcd207..4b8ad2673b0f7470 100644 --- a/arch/riscv/kernel/unaligned_access_speed.c +++ b/arch/riscv/kernel/unaligned_access_speed.c @@ -379,6 +379,7 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __always_unused) { schedule_on_each_cpu(check_vector_unaligned_access); + riscv_hwprobe_complete_async_probe(); return 0; } @@ -473,8 +474,12 @@ static int __init check_unaligned_access_all_cpus(void) per_cpu(vector_misaligned_access, cpu) = unaligned_vector_speed_param; } else if (!check_vector_unaligned_access_emulated_all_cpus() && 
IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) { - kthread_run(vec_check_unaligned_access_speed_all_cpus, - NULL, "vec_check_unaligned_access_speed_all_cpus"); + riscv_hwprobe_register_async_probe(); + if (IS_ERR(kthread_run(vec_check_unaligned_access_speed_all_cpus, + NULL, "vec_check_unaligned_access_speed_all_cpus"))) { + pr_warn("Failed to create vec_unalign_check kthread\n"); + riscv_hwprobe_complete_async_probe(); + } } /* diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c index 2ddeba6c68dda09b..bf77b4c1d2d8e803 100644 --- a/arch/riscv/kernel/vdso/hwprobe.c +++ b/arch/riscv/kernel/vdso/hwprobe.c @@ -27,7 +27,7 @@ static int riscv_vdso_get_values(struct riscv_hwprobe *pairs, size_t pair_count, * homogeneous, then this function can handle requests for arbitrary * masks. */ - if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus)) + if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus) || unlikely(!avd->ready)) return riscv_hwprobe(pairs, pair_count, cpusetsize, cpus, flags); /* This is something we can handle, fill out the pairs. */ -- 2.50.1
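For readers who want the counting scheme in isolation, here is a minimal user-space sketch of the "sentinel count" idea used by the patch above. It is not the kernel code: C11 atomics replace atomic_t, a plain `done` flag stands in for the struct completion, and all names (`register_probe`, `complete_probe`, `consumer_must_wait`) are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Starts at 1: the extra count is the sentinel held by the consumer. */
static atomic_int pending = 1;

/* Stand-in for the kernel's struct completion (assumption of this sketch). */
static bool done;

/* Called before an asynchronous probe is started. */
static void register_probe(void)
{
	atomic_fetch_add(&pending, 1);
}

/* Called when a probe finishes, or when starting it failed. */
static void complete_probe(void)
{
	/* atomic_fetch_sub returns the old value; old == 1 means we hit zero. */
	if (atomic_fetch_sub(&pending, 1) == 1)
		done = true;
}

/*
 * Called once by the consumer: drop the sentinel. Returns true if probes
 * are still outstanding, i.e. the caller would have to wait for `done`.
 */
static bool consumer_must_wait(void)
{
	return atomic_fetch_sub(&pending, 1) != 1;
}
```

Because the counter starts at one, it cannot reach zero before the consumer has dropped its sentinel, so a probe that finishes early does not signal completion prematurely.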

3 weeks, 3 days


[PATCH v2] fs/adfs: bigdir: Restore EIO errno return when checkbyte mismatch

by Zhen Ni

The error path in adfs_fplus_read() prints an error message via adfs_error() but incorrectly returns success (0) to the caller. This occurs because the 'ret' variable remains set to 0 from the earlier successful call to adfs_fplus_validate_tail(). Fix by setting 'ret = -EIO' before jumping to the error exit. This issue was detected by smatch static analysis: warning: fs/adfs/dir_fplus.c:146: adfs_fplus_read() warn: missing error code 'ret'. Fixes: d79288b4f61b ("fs/adfs: bigdir: calculate and validate directory checkbyte") Cc: stable(a)vger.kernel.org Signed-off-by: Zhen Ni <zhen.ni(a)easystack.cn> --- Changes in v2: - Add tags of 'Fixes' and 'Cc' in commit message - Added error description and the corresponding fix in commit message --- fs/adfs/dir_fplus.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/adfs/dir_fplus.c b/fs/adfs/dir_fplus.c index 4a15924014da..4334279409b2 100644 --- a/fs/adfs/dir_fplus.c +++ b/fs/adfs/dir_fplus.c @@ -143,6 +143,7 @@ static int adfs_fplus_read(struct super_block *sb, u32 indaddr, if (adfs_fplus_checkbyte(dir) != t->bigdircheckbyte) { adfs_error(sb, "dir %06x checkbyte mismatch\n", indaddr); + ret = -EIO; goto out; } -- 2.20.1
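The bug class fixed here (a shared `ret` that still holds 0 from an earlier successful step when a later check fails) can be sketched outside the kernel as follows. The helper names are hypothetical stand-ins and are not the adfs functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an earlier validation step that sets ret. */
static int validate_tail(bool tail_ok)
{
	return tail_ok ? 0 : -EIO;
}

static int read_dir_sketch(bool tail_ok, bool checkbyte_ok)
{
	int ret;

	ret = validate_tail(tail_ok);
	if (ret)
		goto out;

	if (!checkbyte_ok) {
		/*
		 * ret is still 0 from the successful call above. Without
		 * this assignment the error path would report success.
		 */
		ret = -EIO;
		goto out;
	}

	return 0;

out:
	/* Cleanup shared by all error paths would go here. */
	return ret;
}
```

With the assignment removed, `read_dir_sketch(true, false)` would return 0, which is the kind of missing error code that smatch flagged.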

3 weeks, 3 days


[PATCH] drm/xe: Defer buffer object shrinker write-backs and GPU waits

by Thomas Hellström

When the xe buffer-object shrinker allows GPU waits and write-back (typically from kswapd), perform multiple passes, skipping subsequent passes once the shrinker's target number of scanned objects is reached: 1) Without GPU waits and write-back 2) Without write-back 3) With both GPU waits and write-back. This is to avoid stalls and costly write- and readbacks unless they are really necessary. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5557#note_3035136 Fixes: 00c8efc3180f ("drm/xe: Add a shrinker for xe bos") Cc: <stable(a)vger.kernel.org> # v6.15+ Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com> --- drivers/gpu/drm/xe/xe_shrinker.c | 51 +++++++++++++++++++++++++++++--- 1 file changed, 47 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_shrinker.c b/drivers/gpu/drm/xe/xe_shrinker.c index 1c3c04d52f55..bc3439bd4450 100644 --- a/drivers/gpu/drm/xe/xe_shrinker.c +++ b/drivers/gpu/drm/xe/xe_shrinker.c @@ -54,10 +54,10 @@ xe_shrinker_mod_pages(struct xe_shrinker *shrinker, long shrinkable, long purgea write_unlock(&shrinker->lock); } -static s64 xe_shrinker_walk(struct xe_device *xe, - struct ttm_operation_ctx *ctx, - const struct xe_bo_shrink_flags flags, - unsigned long to_scan, unsigned long *scanned) +static s64 __xe_shrinker_walk(struct xe_device *xe, + struct ttm_operation_ctx *ctx, + const struct xe_bo_shrink_flags flags, + unsigned long to_scan, unsigned long *scanned) { unsigned int mem_type; s64 freed = 0, lret; @@ -93,6 +93,48 @@ static s64 xe_shrinker_walk(struct xe_device *xe, return freed; } +/* + * Try shrinking idle objects without writeback first, then if not sufficient, + * try also non-idle objects and finally if that's not sufficient either, + * add writeback. This avoids stalls and explicit writebacks with light or + * moderate memory pressure.
+ */ +static s64 xe_shrinker_walk(struct xe_device *xe, + struct ttm_operation_ctx *ctx, + const struct xe_bo_shrink_flags flags, + unsigned long to_scan, unsigned long *scanned) +{ + bool no_wait_gpu = true; + struct xe_bo_shrink_flags save_flags = flags; + s64 lret, freed; + + swap(no_wait_gpu, ctx->no_wait_gpu); + save_flags.writeback = false; + lret = __xe_shrinker_walk(xe, ctx, save_flags, to_scan, scanned); + swap(no_wait_gpu, ctx->no_wait_gpu); + if (lret < 0 || *scanned >= to_scan) + return lret; + + freed = lret; + if (!ctx->no_wait_gpu) { + lret = __xe_shrinker_walk(xe, ctx, save_flags, to_scan, scanned); + if (lret < 0) + return lret; + freed += lret; + } + if (*scanned >= to_scan) + return freed; + + if (flags.writeback) { + lret = __xe_shrinker_walk(xe, ctx, flags, to_scan, scanned); + if (lret < 0) + return lret; + freed += lret; + } + + return freed; +} + static unsigned long xe_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) { @@ -199,6 +241,7 @@ static unsigned long xe_shrinker_scan(struct shrinker *shrink, struct shrink_con runtime_pm = xe_shrinker_runtime_pm_get(shrinker, true, 0, can_backup); shrink_flags.purge = false; + lret = xe_shrinker_walk(shrinker->xe, &ctx, shrink_flags, nr_to_scan, &nr_scanned); if (lret >= 0) -- 2.50.1
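The escalation order described in this patch can be modelled in user space roughly as below. This is only a sketch under simplifying assumptions: the `pool` counters, `take()` and `walk_once()` are invented for illustration, and error returns from the walk are left out.

```c
#include <stdbool.h>

struct pass {
	bool wait_gpu;	/* may wait for busy objects */
	bool writeback;	/* may write objects back */
};

/* Hypothetical pool: objects grouped by how expensive they are to reclaim. */
struct pool {
	unsigned long idle, busy, dirty;
};

/* Reclaim from one class without exceeding the scan target. */
static unsigned long take(unsigned long *avail, unsigned long to_scan,
			  unsigned long *scanned)
{
	unsigned long room = *scanned < to_scan ? to_scan - *scanned : 0;
	unsigned long n = *avail < room ? *avail : room;

	*avail -= n;
	*scanned += n;
	return n;
}

/* One walk over the pool with a fixed set of permissions. */
static long walk_once(struct pool *p, struct pass f, unsigned long to_scan,
		      unsigned long *scanned)
{
	long freed = (long)take(&p->idle, to_scan, scanned);

	if (f.wait_gpu)
		freed += (long)take(&p->busy, to_scan, scanned);
	if (f.writeback)
		freed += (long)take(&p->dirty, to_scan, scanned);
	return freed;
}

/* Cheapest pass first; escalate only while the target has not been met. */
static long walk_escalating(struct pool *p, struct pass allowed,
			    unsigned long to_scan, unsigned long *scanned)
{
	struct pass f = { false, false };
	long freed = walk_once(p, f, to_scan, scanned);

	if (*scanned >= to_scan)
		return freed;
	if (allowed.wait_gpu) {
		f.wait_gpu = true;
		freed += walk_once(p, f, to_scan, scanned);
	}
	if (*scanned >= to_scan)
		return freed;
	if (allowed.writeback)
		freed += walk_once(p, allowed, to_scan, scanned);
	return freed;
}
```

Under light pressure only the first pass runs, so busy and dirty objects are left alone even when the caller would have permitted waiting and write-back.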

3 weeks, 3 days


[PATCH v4] x86/cpu/intel: Fix the constant_tsc model check for Pentium 4

by Suchit Karunakaran

Pentium 4's which are INTEL_P4_PRESCOTT (model 0x03) and later have a constant TSC. This was correctly captured until commit fadb6f569b10 ("x86/cpu/intel: Limit the non-architectural constant_tsc model checks"). In that commit, an error was introduced while selecting the last P4 model (0x06) as the upper bound. Model 0x06 was transposed to INTEL_P4_WILLAMETTE, which is just plain wrong. That was presumably a simple typo, probably just copying and pasting the wrong P4 model. Fix the constant TSC logic to cover all later P4 models. End at INTEL_P4_CEDARMILL which accurately corresponds to the last P4 model. Fixes: fadb6f569b10 ("x86/cpu/intel: Limit the non-architectural constant_tsc model checks") Cc: <stable(a)vger.kernel.org> # v6.15 Signed-off-by: Suchit Karunakaran <suchitkarunakaran(a)gmail.com> Changes since v3: - Refined changelog Changes since v2: - Improve commit message Changes since v1: - Fix incorrect logic --- arch/x86/kernel/cpu/intel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 076eaa41b8c8..6f5bd5dbc249 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -262,7 +262,7 @@ static void early_init_intel(struct cpuinfo_x86 *c) if (c->x86_power & (1 << 8)) { set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC); set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC); - } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_WILLAMETTE) || + } else if ((c->x86_vfm >= INTEL_P4_PRESCOTT && c->x86_vfm <= INTEL_P4_CEDARMILL) || (c->x86_vfm >= INTEL_CORE_YONAH && c->x86_vfm <= INTEL_IVYBRIDGE)) { set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC); } -- 2.50.1

3 weeks, 3 days


[PATCH net v2] net/packet: fix a race in packet_set_ring() and packet_notifier()

by Willem de Bruijn

From: Quang Le <quanglex97(a)gmail.com> When packet_set_ring() releases po->bind_lock, another thread can run packet_notifier() and process an NETDEV_UP event. This race and the fix are both similar to that of commit 15fe076edea7 ("net/packet: fix a race in packet_bind() and packet_notifier()"). There too the packet_notifier NETDEV_UP event managed to run while a po->bind_lock critical section had to be temporarily released. And the fix was similarly to temporarily set po->num to zero to keep the socket unhooked until the lock is retaken. The po->bind_lock in packet_set_ring and packet_notifier precede the introduction of git history. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable(a)vger.kernel.org Signed-off-by: Quang Le <quanglex97(a)gmail.com> Signed-off-by: Willem de Bruijn <willemb(a)google.com> --- v1->v2: - fix author attribution (From: at the top) v1: https://lore.kernel.org/netdev/20250731175132.2592130-1-willemdebruijn.kern… --- net/packet/af_packet.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index bc438d0d96a7..a7017d7f0927 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -4573,10 +4573,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u, spin_lock(&po->bind_lock); was_running = packet_sock_flag(po, PACKET_SOCK_RUNNING); num = po->num; - if (was_running) { - WRITE_ONCE(po->num, 0); + WRITE_ONCE(po->num, 0); + if (was_running) __unregister_prot_hook(sk, false); - } + spin_unlock(&po->bind_lock); synchronize_net(); @@ -4608,10 +4608,10 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u, mutex_unlock(&po->pg_vec_lock); spin_lock(&po->bind_lock); - if (was_running) { - WRITE_ONCE(po->num, num); + WRITE_ONCE(po->num, num); + if (was_running) register_prot_hook(sk); - } + spin_unlock(&po->bind_lock); if (pg_vec && (po->tp_version > TPACKET_V2)) { /* Because we don't support block-based V3 on tx-ring */ -- 
2.50.1.565.gc32cd1483b-goog

3 weeks, 3 days


[PATCH v2 0/4] i2c: rtl9300: Fix multi-byte I2C operations

by Sven Eckelmann

During the integration of the RTL8239 POE chip + its frontend MCU, it was noticed that multi-byte operations were basically broken in the current driver. Tests using SMBus Block Writes showed that the data (after the Wr + Ack marker) was mixed up on the wire. At first glance, it looked like an endianness problem. But for transfers where the number of count + data bytes was not divisible by 4, the last bytes did not look like an endianness problem because they were in the wrong order but not, for example, 0 - which would be the case for an endianness problem with 32 bit registers. In the end, it turned out to be the way i2c_write tried to add the bytes to the send registers. Each 32 bit register was used similar to a shift register - shifting the various bytes up the register while the next one is added to the least significant byte. But the I2C controller expects the first byte of the transmission in the least significant byte of the first register. And the last byte (assuming it is a 16 byte transfer) in the most significant byte of the fourth register. While doing these tests, it was also observed that the count byte was missing from the SMBus Block Writes. The driver just removed it from the data->block (from the I2C subsystem). But the I2C controller DOES NOT automatically add this byte - for example by using the configured transmission length. The RTL8239 MCU is not actually an SMBus compliant device. Instead, it expects I2C Block Reads + I2C Block Writes. But according to the already identified bugs in the driver, it was clear that the I2C controller can simply be modified to not send the count byte for I2C_SMBUS_I2C_BLOCK_DATA. The receive part just needs to write the content of the receive buffer to the correct position in data->block. While the on-wire format was now correct, reads were still not possible against the MCU (for the RTL8239 POE chip).
It was always timing out because the 2ms were not enough for sending the read request and then receiving the 12 byte answer. These changes were originally submitted to OpenWrt. But there are plans to migrate OpenWrt to the upstream Linux driver. As a result, the pull request was stopped and the changes were redone against this driver. For reasons of transparency: The work on I2C_SMBUS_I2C_BLOCK_DATA support for the RTL8239-MCU was done on RTL931xx. All problems were therefore detected with the patches from Jonas Jelonek [1] and not the vanilla Linux driver. But looking through the code, it seems like these are NOT regressions introduced by the RTL931x patchset. [1] https://patchwork.ozlabs.org/project/linux-i2c/cover/20250727114800.3046-1-… Signed-off-by: Sven Eckelmann <sven(a)narfation.org> --- Changes in v2: - add the missing transfer width and read length increase for the SMBus Write/Read - Link to v1: https://lore.kernel.org/r/20250802-i2c-rtl9300-multi-byte-v1-0-5f687e0098e2… --- Harshal Gohel (2): i2c: rtl9300: Fix multi-byte I2C write i2c: rtl9300: Implement I2C block read and write Sven Eckelmann (2): i2c: rtl9300: Increase timeout for transfer polling i2c: rtl9300: Add missing count byte for SMBus Block Ops drivers/i2c/busses/i2c-rtl9300.c | 43 +++++++++++++++++++++++++++++++++------- 1 file changed, 36 insertions(+), 7 deletions(-) --- base-commit: b9ddaa95fd283bce7041550ddbbe7e764c477110 change-id: 20250802-i2c-rtl9300-multi-byte-edaa1fb0872c Best regards, -- Sven Eckelmann <sven(a)narfation.org>
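The difference between the shift-register style packing and the layout the controller expects, as described in the cover letter above, can be illustrated with a small stand-alone sketch. The function names and the four-register array are assumptions made for this example and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 4	/* four 32-bit data registers, i.e. up to 16 bytes */

/* Shift-register style: each new byte pushes the earlier ones upwards. */
static void pack_shift(const uint8_t *buf, size_t len, uint32_t regs[NUM_REGS])
{
	for (size_t i = 0; i < NUM_REGS; i++)
		regs[i] = 0;
	for (size_t i = 0; i < len && i < 4 * NUM_REGS; i++)
		regs[i / 4] = (regs[i / 4] << 8) | buf[i];
}

/* Expected layout: byte i lands in bits (i % 4) * 8 of register i / 4. */
static void pack_lsb_first(const uint8_t *buf, size_t len,
			   uint32_t regs[NUM_REGS])
{
	for (size_t i = 0; i < NUM_REGS; i++)
		regs[i] = 0;
	for (size_t i = 0; i < len && i < 4 * NUM_REGS; i++)
		regs[i / 4] |= (uint32_t)buf[i] << ((i % 4) * 8);
}
```

For the six bytes 01 02 03 04 05 06, `pack_shift` yields 0x01020304 and 0x00000506, while `pack_lsb_first` yields 0x04030201 and 0x00000605. The first register looks like a plain byte swap, but the partially filled second register does not (a byte swap of 0x00000605 would be 0x05060000), which matches the observation that the trailing bytes did not look like an endianness problem.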

3 weeks, 3 days


[PATCH v3 0/5] i2c: rtl9300: Fix multi-byte I2C operations

by Sven Eckelmann

During the integration of the RTL8239 POE chip + its frontend MCU, it was noticed that multi-byte operations were basically broken in the current driver. Tests using SMBus Block Writes showed that the data (after the Wr + Ack marker) was mixed up on the wire. At first glance, it looked like an endianness problem. But for transfers where the number of count + data bytes was not divisible by 4, the last bytes did not look like an endianness problem because they were in the wrong order but not, for example, 0 - which would be the case for an endianness problem with 32 bit registers. In the end, it turned out to be the way i2c_write tried to add the bytes to the send registers. Each 32 bit register was used similar to a shift register - shifting the various bytes up the register while the next one is added to the least significant byte. But the I2C controller expects the first byte of the transmission in the least significant byte of the first register. And the last byte (assuming it is a 16 byte transfer) in the most significant byte of the fourth register. While doing these tests, it was also observed that the count byte was missing from the SMBus Block Writes. The driver just removed it from the data->block (from the I2C subsystem). But the I2C controller DOES NOT automatically add this byte - for example by using the configured transmission length. The RTL8239 MCU is not actually an SMBus compliant device. Instead, it expects I2C Block Reads + I2C Block Writes. But according to the already identified bugs in the driver, it was clear that the I2C controller can simply be modified to not send the count byte for I2C_SMBUS_I2C_BLOCK_DATA. The receive part just needs to write the content of the receive buffer to the correct position in data->block. While the on-wire format was now correct, reads were still not possible against the MCU (for the RTL8239 POE chip).
It was always timing out because the 2ms were not enough for sending the read request and then receiving the 12 byte answer. These changes were originally submitted to OpenWrt. But there are plans to migrate OpenWrt to the upstream Linux driver. As a result, the pull request was stopped and the changes were redone against this driver. For reasons of transparency: The work on I2C_SMBUS_I2C_BLOCK_DATA support for the RTL8239-MCU was done on RTL931xx. All problems were therefore detected with the patches from Jonas Jelonek [1] and not the vanilla Linux driver. But looking through the code, it seems like these are NOT regressions introduced by the RTL931x patchset. I've picked up Alex Guo's patch [2] to reduce conflicts between pending fixes. [1] https://patchwork.ozlabs.org/project/linux-i2c/cover/20250727114800.3046-1-… [2] https://lore.kernel.org/r/20250615235248.529019-1-alexguo1023@gmail.com Signed-off-by: Sven Eckelmann <sven(a)narfation.org> --- Changes in v3: - integrated patch https://lore.kernel.org/r/20250615235248.529019-1-alexguo1023@gmail.com to avoid conflicts in the I2C_SMBUS_BLOCK_DATA code - added Fixes and stable(a)vger.kernel.org to Alex Guo's patch - added Chris Packham's Reviewed-by/Acked-by - Link to v2: https://lore.kernel.org/r/20250803-i2c-rtl9300-multi-byte-v2-0-9b7b759fe2b6… Changes in v2: - add the missing transfer width and read length increase for the SMBus Write/Read - Link to v1: https://lore.kernel.org/r/20250802-i2c-rtl9300-multi-byte-v1-0-5f687e0098e2… --- Alex Guo (1): i2c: rtl9300: Fix out-of-bounds bug in rtl9300_i2c_smbus_xfer Harshal Gohel (2): i2c: rtl9300: Fix multi-byte I2C write i2c: rtl9300: Implement I2C block read and write Sven Eckelmann (2): i2c: rtl9300: Increase timeout for transfer polling i2c: rtl9300: Add missing count byte for SMBus Block Ops drivers/i2c/busses/i2c-rtl9300.c | 51 ++++++++++++++++++++++++++++++++++------ 1 file changed, 44 insertions(+), 7 deletions(-) --- base-commit: 
0ae982df67760cd08affa935c0fe86c8a9311797 change-id: 20250802-i2c-rtl9300-multi-byte-edaa1fb0872c Best regards, -- Sven Eckelmann <sven(a)narfation.org>

3 weeks, 3 days


[PATCH] amdgpu/amdgpu_discovery: increase timeout limit for IFWI init

by Xaver Hugl

With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <xaver.hugl(a)kde.org> Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index 6d34eac0539d..ae6908b57d78 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c @@ -275,7 +275,7 @@ static int amdgpu_discovery_read_binary_from_mem(struct amdgpu_device *adev, int i, ret = 0; if (!amdgpu_sriov_vf(adev)) { - /* It can take up to a second for IFWI init to complete on some dGPUs, + /* It can take up to two seconds for IFWI init to complete on some dGPUs, * but generally it should be in the 60-100ms range. Normally this starts * as soon as the device gets power so by the time the OS loads this has long * completed. However, when a card is hotplugged via e.g., USB4, we need to @@ -283,7 +283,7 @@ static int amdgpu_discovery_read_binary_from_mem(struct amdgpu_device *adev, * continue. */ - for (i = 0; i < 1000; i++) { + for (i = 0; i < 2000; i++) { msg = RREG32(mmMP0_SMN_C2PMSG_33); if (msg & 0x80000000) break; -- 2.50.1
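The effect of raising the iteration limit can be seen in a small stand-alone model of a bounded polling loop. Everything here is hypothetical: `fake_dev` and `fake_read` replace the real register read, and the per-iteration delay (the commit message suggests roughly one millisecond per try) is left out.

```c
#include <stdint.h>

#define READY_BIT 0x80000000u	/* same bit the loop in the patch tests */

/* Hypothetical register accessor used only by this sketch. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/* Returns the number of failed polls before success, or -1 on timeout. */
static int poll_ready(read_reg_fn read_reg, void *ctx, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (read_reg(ctx) & READY_BIT)
			return i;
		/* A real driver would sleep here between polls. */
	}
	return -1;
}

/* Fake device that reports ready after a fixed number of reads. */
struct fake_dev {
	int reads_until_ready;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_dev *dev = ctx;

	return dev->reads_until_ready-- > 0 ? 0 : READY_BIT;
}
```

A device that needs 1500 polls times out with a limit of 1000 and succeeds with a limit of 2000, which is the situation the patch describes for the reporter's card.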

3 weeks, 3 days


[PATCH] eventpoll: Fix semi-unbounded recursion

by Jann Horn

Ensure that epoll instances can never form a graph deeper than EP_MAX_NESTS+1 links. Currently, ep_loop_check_proc() ensures that the graph is loop-free and does some recursion depth checks, but those recursion depth checks don't limit the depth of the resulting tree for two reasons: - They don't look upwards in the tree. - If there are multiple downwards paths of different lengths, only one of the paths is actually considered for the depth check since commit 28d82dc1c4ed ("epoll: limit paths"). Essentially, the current recursion depth check in ep_loop_check_proc() just serves to prevent it from recursing too deeply while checking for loops. A more thorough check is done in reverse_path_check() after the new graph edge has already been created; this checks, among other things, that no paths going upwards from any non-epoll file with a length of more than 5 edges exist. However, this check does not apply to non-epoll files. As a result, it is possible to recurse to a depth of at least roughly 500, tested on v6.15. (I am unsure if deeper recursion is possible; and this may have changed with commit 8c44dac8add7 ("eventpoll: Fix priority inversion problem").) To fix it: 1. In ep_loop_check_proc(), note the subtree depth of each visited node, and use subtree depths for the total depth calculation even when a subtree has already been visited. 2. Add ep_get_upwards_depth_proc() for similarly determining the maximum depth of an upwards walk. 3. In ep_loop_check(), use these values to limit the total path length between epoll nodes to EP_MAX_NESTS edges. 
Fixes: 22bacca48a17 ("epoll: prevent creating circular epoll structures") Cc: stable(a)vger.kernel.org Signed-off-by: Jann Horn <jannh(a)google.com> --- fs/eventpoll.c | 60 ++++++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 46 insertions(+), 14 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index d4dbffdedd08..44648cc09250 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -218,6 +218,7 @@ struct eventpoll { /* used to optimize loop detection check */ u64 gen; struct hlist_head refs; + u8 loop_check_depth; /* * usage count, used together with epitem->dying to @@ -2142,23 +2143,24 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, } /** - * ep_loop_check_proc - verify that adding an epoll file inside another - * epoll structure does not violate the constraints, in - * terms of closed loops, or too deep chains (which can - * result in excessive stack usage). + * ep_loop_check_proc - verify that adding an epoll file @ep inside another + * epoll file does not create closed loops, and + * determine the depth of the subtree starting at @ep * * @ep: the &struct eventpoll to be currently checked. * @depth: Current depth of the path being checked. * - * Return: %zero if adding the epoll @file inside current epoll - * structure @ep does not violate the constraints, or %-1 otherwise. + * Return: depth of the subtree, or INT_MAX if we found a loop or went too deep. 
*/ static int ep_loop_check_proc(struct eventpoll *ep, int depth) { - int error = 0; + int result = 0; struct rb_node *rbp; struct epitem *epi; + if (ep->gen == loop_check_gen) + return ep->loop_check_depth; + mutex_lock_nested(&ep->mtx, depth + 1); ep->gen = loop_check_gen; for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) { @@ -2166,13 +2168,11 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth) if (unlikely(is_file_epoll(epi->ffd.file))) { struct eventpoll *ep_tovisit; ep_tovisit = epi->ffd.file->private_data; - if (ep_tovisit->gen == loop_check_gen) - continue; if (ep_tovisit == inserting_into || depth > EP_MAX_NESTS) - error = -1; + result = INT_MAX; else - error = ep_loop_check_proc(ep_tovisit, depth + 1); - if (error != 0) + result = max(result, ep_loop_check_proc(ep_tovisit, depth + 1) + 1); + if (result > EP_MAX_NESTS) break; } else { /* @@ -2186,9 +2186,27 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth) list_file(epi->ffd.file); } } + ep->loop_check_depth = result; mutex_unlock(&ep->mtx); - return error; + return result; +} + +/** + * ep_get_upwards_depth_proc - determine depth of @ep when traversed upwards + */ +static int ep_get_upwards_depth_proc(struct eventpoll *ep, int depth) +{ + int result = 0; + struct epitem *epi; + + if (ep->gen == loop_check_gen) + return ep->loop_check_depth; + hlist_for_each_entry_rcu(epi, &ep->refs, fllink) + result = max(result, ep_get_upwards_depth_proc(epi->ep, depth + 1) + 1); + ep->gen = loop_check_gen; + ep->loop_check_depth = result; + return result; } /** @@ -2204,8 +2222,22 @@ static int ep_loop_check_proc(struct eventpoll *ep, int depth) */ static int ep_loop_check(struct eventpoll *ep, struct eventpoll *to) { + int depth, upwards_depth; + inserting_into = ep; - return ep_loop_check_proc(to, 0); + /* + * Check how deep down we can get from @to, and whether it is possible + * to loop up to @ep. 
+ */ + depth = ep_loop_check_proc(to, 0); + if (depth > EP_MAX_NESTS) + return -1; + /* Check how far up we can go from @ep. */ + rcu_read_lock(); + upwards_depth = ep_get_upwards_depth_proc(ep, 0); + rcu_read_unlock(); + + return (depth+1+upwards_depth > EP_MAX_NESTS) ? -1 : 0; } static void clear_tfile_check_list(void) --- base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca change-id: 20250711-epoll-recursion-fix-fb0e336b2aeb -- Jann Horn <jannh(a)google.com>
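The depth accounting from the three fix steps above can be tried out in user space with a toy graph. This is a simplified sketch, not the eventpoll code: it assumes the graph is already loop-free, uses separate `down`/`up` caches instead of the generation counter, has no locking, and `MAX_NESTS` merely stands in for EP_MAX_NESTS with an assumed value.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NESTS 4	/* stands in for EP_MAX_NESTS; value is an assumption */
#define MAX_LINKS 4	/* arbitrary fan-out limit; add_edge() does not check it */

struct node {
	struct node *children[MAX_LINKS];
	struct node *parents[MAX_LINKS];
	size_t nr_children, nr_parents;
	int down;	/* cached subtree depth, -1 if not yet computed */
	int up;		/* cached upwards depth, -1 if not yet computed */
};

static void node_init(struct node *n)
{
	memset(n, 0, sizeof(*n));
	n->down = -1;
	n->up = -1;
}

static void add_edge(struct node *parent, struct node *child)
{
	parent->children[parent->nr_children++] = child;
	child->parents[child->nr_parents++] = parent;
}

/* Longest downward path from n, reusing cached results for visited nodes. */
static int depth_down(struct node *n)
{
	int result = 0;

	if (n->down >= 0)
		return n->down;
	for (size_t i = 0; i < n->nr_children; i++) {
		int d = depth_down(n->children[i]) + 1;

		if (d > result)
			result = d;
	}
	n->down = result;
	return result;
}

/* Longest upward path from n, computed the same way over the parents. */
static int depth_up(struct node *n)
{
	int result = 0;

	if (n->up >= 0)
		return n->up;
	for (size_t i = 0; i < n->nr_parents; i++) {
		int d = depth_up(n->parents[i]) + 1;

		if (d > result)
			result = d;
	}
	n->up = result;
	return result;
}

/* A new parent -> child edge is fine if the longest path stays in bounds. */
static int link_allowed(struct node *parent, struct node *child)
{
	return depth_down(child) + 1 + depth_up(parent) <= MAX_NESTS;
}
```

Caching the depth per node is what keeps an already visited subtree contributing to the total, instead of being skipped as in the old check.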

3 weeks, 3 days


[PATCH v3] mfd: intel_soc_pmic_chtdc_ti: Set use_single_read regmap_config flag

by Hans de Goede

Testing has shown that reading multiple registers at once (for 10-bit ADC values) does not work. Set the use_single_read regmap_config flag to make regmap split these for us. This should fix temperature opregion accesses done by drivers/acpi/pmic/intel_pmic_chtdc_ti.c and is also necessary for the upcoming drivers for the ADC and battery MFD cells. Fixes: 6bac0606fdba ("mfd: Add support for Cherry Trail Dollar Cove TI PMIC") Cc: stable(a)vger.kernel.org Reviewed-by: Andy Shevchenko <andy(a)kernel.org> Signed-off-by: Hans de Goede <hansg(a)kernel.org> --- Changes in v3: - Fix a few typos in the commit message Changes in v2: - Update comment to: "The hardware does not support reading multiple registers at once" --- drivers/mfd/intel_soc_pmic_chtdc_ti.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/mfd/intel_soc_pmic_chtdc_ti.c b/drivers/mfd/intel_soc_pmic_chtdc_ti.c index 4c1a68c9f575..6daf33e07ea0 100644 --- a/drivers/mfd/intel_soc_pmic_chtdc_ti.c +++ b/drivers/mfd/intel_soc_pmic_chtdc_ti.c @@ -82,6 +82,8 @@ static const struct regmap_config chtdc_ti_regmap_config = { .reg_bits = 8, .val_bits = 8, .max_register = 0xff, + /* The hardware does not support reading multiple registers at once */ + .use_single_read = true, }; static const struct regmap_irq chtdc_ti_irqs[] = { -- 2.49.0

3 weeks, 3 days


[PATCH AUTOSEL 6.6 01/59] usb: xhci: print xhci->xhc_state when queue_command failed

by Sasha Levin

From: Su Hui <suhui(a)nfschina.com> [ Upstream commit 7919407eca2ef562fa6c98c41cfdf6f6cdd69d92 ] When encountering errors like these: xhci_hcd 0000:4a:00.2: xHCI dying or halted, can't queue_command xhci_hcd 0000:4a:00.2: FIXME: allocate a command ring segment usb usb5-port6: couldn't allocate usb_device it's hard to know whether xhc_state is dying or halted. So it's better to print xhc_state's value, which can help locate the reason for the bug. Signed-off-by: Su Hui <suhui(a)nfschina.com> Link: https://lore.kernel.org/r/20250725060117.1773770-1-suhui@nfschina.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- LLM Generated explanations, may be completely bogus: **Backport Status: YES** This commit is suitable for backporting to stable kernel trees for the following reasons: 1. **Enhanced Debugging for Real-World Issues**: The commit improves debugging of USB xHCI host controller failures by printing the actual `xhc_state` value when `queue_command` fails. The commit message shows real error messages users encounter ("xHCI dying or halted, can't queue_command"), demonstrating this is a real-world debugging problem. 2. **Minimal and Safe Change**: The change is extremely small and safe - it only modifies a debug print statement from: ```c xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n"); ``` to: ```c xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n", xhci->xhc_state); ``` 3. **No Functional Changes**: This is a pure diagnostic improvement. It doesn't change any logic, control flow, or data structures. It only adds the state value (0x%x format) to an existing debug message. 4. 
**Important for Troubleshooting**: The xHCI driver is critical for USB functionality, and when it fails with "dying or halted" states, knowing the exact state helps diagnose whether: - `XHCI_STATE_DYING` (0x1) - controller is dying - `XHCI_STATE_HALTED` (0x2) - controller is halted - Both states (0x3) - controller has both flags set This distinction is valuable for debugging hardware issues, driver bugs, or system problems. 5. **Zero Risk of Regression**: Adding a parameter to a debug print statement has no risk of introducing regressions. The worst case is the debug message prints the state value. 6. **Follows Stable Rules**: This meets stable kernel criteria as it: - Fixes a real debugging limitation - Is obviously correct - Has been tested (signed-off and accepted by Greg KH) - Is small (single line change) - Doesn't add new features, just improves existing diagnostics The commit helps system administrators and developers diagnose USB issues more effectively by providing the actual state value rather than just saying "dying or halted", making it a valuable debugging enhancement for stable kernels. drivers/usb/host/xhci-ring.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 44352df58c9e..c6d89b51c678 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -4454,7 +4454,8 @@ static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd, if ((xhci->xhc_state & XHCI_STATE_DYING) || (xhci->xhc_state & XHCI_STATE_HALTED)) { - xhci_dbg(xhci, "xHCI dying or halted, can't queue_command\n"); + xhci_dbg(xhci, "xHCI dying or halted, can't queue_command. state: 0x%x\n", + xhci->xhc_state); return -ESHUTDOWN; } -- 2.39.5
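To illustrate how the value printed as "state: 0x%x" could be read back, here is a minimal userspace sketch. It assumes only the two flag values quoted in the explanation above (0x1 for dying, 0x2 for halted). The helper name describe_xhc_state() is hypothetical and is not part of the xHCI driver; any other bits in the value are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Flag values as quoted in the explanation above (0x1 and 0x2). */
#define XHCI_STATE_DYING  (1u << 0)
#define XHCI_STATE_HALTED (1u << 1)

/*
 * Hypothetical helper, not a kernel function: turn the value printed by
 * the new debug message into a readable flag list. Bits other than the
 * two flags above are ignored in this sketch.
 */
static const char *describe_xhc_state(unsigned int state, char *buf, size_t len)
{
	bool dying = (state & XHCI_STATE_DYING) != 0;
	bool halted = (state & XHCI_STATE_HALTED) != 0;

	assert(buf != NULL && len > 0);

	snprintf(buf, len, "%s%s%s",
		 dying ? "DYING" : "",
		 (dying && halted) ? "|" : "",
		 halted ? "HALTED" : "");

	/* Neither flag set: say so explicitly instead of returning "". */
	if (strlen(buf) == 0)
		snprintf(buf, len, "none");

	return buf;
}
```

With a 32-byte buffer, a logged value of 0x3 would decode to "DYING|HALTED", which is the case the old message could not distinguish from either single flag.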

3 weeks, 3 days


[PATCH v2] powerpc/mm: Fix SLB multihit issue during SLB preload

by Donet Tom

On systems using the hash MMU, there is a software SLB preload cache that
mirrors the entries loaded into the hardware SLB buffer. This preload
cache is subject to periodic eviction, typically after every 256 context
switches, to remove old entries.

To optimize performance, the kernel skips switch_mmu_context() in
switch_mm_irqs_off() when the prev and next mm_struct are the same.
However, on hash MMU systems, this can lead to inconsistencies between
the hardware SLB and the software preload cache.

If an SLB entry for a process is evicted from the software cache on one
CPU, and the same process later runs on another CPU without executing
switch_mmu_context(), the hardware SLB may retain stale entries. If the
kernel then attempts to reload that entry, it can trigger an SLB
multi-hit error.

The following timeline shows how stale SLB entries are created and can
cause a multi-hit error when a process moves between CPUs without an
MMU context switch.

CPU 0                                   CPU 1
-----                                   -----
Process P
exec                                    swapper/1
 load_elf_binary
  begin_new_exec
    activate_mm
     switch_mm_irqs_off
      switch_mmu_context
       switch_slb
       /*
        * This invalidates all
        * the entries in the HW
        * and setup the new HW
        * SLB entries as per the
        * preload cache.
        */
context_switch
sched_migrate_task migrates process P to cpu-1

Process swapper/0                       context switch (to process P)
(uses mm_struct of Process P)           switch_mm_irqs_off()
                                         switch_slb
                                           load_slb++
                                            /*
                                            * load_slb becomes 0 here
                                            * and we evict an entry from
                                            * the preload cache with
                                            * preload_age(). We still
                                            * keep HW SLB and preload
                                            * cache in sync, that is
                                            * because all HW SLB entries
                                            * anyways gets evicted in
                                            * switch_slb during SLBIA.
                                            * We then only add those
                                            * entries back in HW SLB,
                                            * which are currently
                                            * present in preload_cache
                                            * (after eviction).
                                            */
                                        load_elf_binary continues...
                                         setup_new_exec()
                                          slb_setup_new_exec()

                                        sched_switch event
                                        sched_migrate_task migrates
                                        process P to cpu-0

context_switch from swapper/0 to Process P
 switch_mm_irqs_off()
  /*
   * Since both prev and next mm struct are same we don't call
   * switch_mmu_context(). This will cause the HW SLB and SW preload
   * cache to go out of sync in preload_new_slb_context. Because there
   * was an SLB entry which was evicted from both HW and preload cache
   * on cpu-1. Now later in preload_new_slb_context(), when we will try
   * to add the same preload entry again, we will add this to the SW
   * preload cache and then will add it to the HW SLB. Since on cpu-0
   * this entry was never invalidated, hence adding this entry to the HW
   * SLB will cause a SLB multi-hit error.
   */
load_elf_binary continues...
 START_THREAD
  start_thread
   preload_new_slb_context
   /*
    * This tries to add a new EA to preload cache which was earlier
    * evicted from both cpu-1 HW SLB and preload cache. This caused the
    * HW SLB of cpu-0 to go out of sync with the SW preload cache. The
    * reason for this was, that when we context switched back on CPU-0,
    * we should have ideally called switch_mmu_context() which will
    * bring the HW SLB entries on CPU-0 in sync with SW preload cache
    * entries by setting up the mmu context properly. But we didn't do
    * that since the prev mm_struct running on cpu-0 was same as the
    * next mm_struct (which is true for swapper / kernel threads). So
    * now when we try to add this new entry into the HW SLB of cpu-0,
    * we hit a SLB multi-hit error.
    */

WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62 assert_slb_presence+0x2c/0x50
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12 VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected) 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty)
MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
  0x7fffceb5ffff (unreliable)
  preload_new_slb_context+0x100/0x1a0
  start_thread+0x26c/0x420
  load_elf_binary+0x1b04/0x1c40
  bprm_execve+0x358/0x680
  do_execveat_common+0x1f8/0x240
  sys_execve+0x58/0x70
  system_call_exception+0x114/0x300
  system_call_common+0x160/0x2c4

To fix this issue, we add a code change to always switch the MMU context
on hash MMU if the SLB preload cache has aged. With this change, the SLB
multi-hit error no longer occurs.

cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
cc: Michael Ellerman <mpe(a)ellerman.id.au>
cc: Nicholas Piggin <npiggin(a)gmail.com>
Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
cc: stable(a)vger.kernel.org
Suggested-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
---

v1 -> v2 : Changed commit message and added a comment in
switch_mm_irqs_off()

v1 - https://lore.kernel.org/all/20250731161027.966196-1-donettom@linux.ibm.com/
---
 arch/powerpc/mm/book3s64/slb.c | 2 +-
 arch/powerpc/mm/mmu_context.c  | 7 +++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 6b783552403c..08daac3f978c 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -509,7 +509,7 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 	 * SLB preload cache.
 	 */
 	tsk->thread.load_slb++;
-	if (!tsk->thread.load_slb) {
+	if (tsk->thread.load_slb == U8_MAX) {
 		unsigned long pc = KSTK_EIP(tsk);
 
 		preload_age(ti);
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index 3e3af29b4523..95455d787288 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -83,8 +83,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	/* Some subarchs need to track the PGD elsewhere */
 	switch_mm_pgdir(tsk, next);
 
-	/* Nothing else to do if we aren't actually switching */
-	if (prev == next)
+	/*
+	 * Nothing else to do if we aren't actually switching and
+	 * the preload slb cache has not aged
+	 */
+	if ((prev == next) && (tsk->thread.load_slb != U8_MAX))
 		return;
 
 	/*
-- 
2.50.1
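The two conditions touched by the diff can be modelled in userspace to see how the counter behaves. This is a rough sketch only: it assumes load_slb is an 8-bit counter (as the U8_MAX comparison suggests), and struct fake_thread, ages_old(), ages_new(), can_skip_mmu_switch() and switches_until_aging() are hypothetical names used for illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Userspace stand-in for the thread.load_slb counter (assumed 8-bit). */
struct fake_thread {
	uint8_t load_slb;
};

/* Old check in switch_slb(): age when the counter wraps around to 0. */
static bool ages_old(struct fake_thread *t)
{
	t->load_slb++;
	return t->load_slb == 0;
}

/* New check in switch_slb(): age when the counter reaches U8_MAX. */
static bool ages_new(struct fake_thread *t)
{
	t->load_slb++;
	return t->load_slb == UINT8_MAX;
}

/*
 * Models the new early-return test in switch_mm_irqs_off(): the MMU
 * context switch may be skipped only if the mm is unchanged and the
 * counter does not mark the preload cache as aged.
 */
static bool can_skip_mmu_switch(const void *prev, const void *next,
				const struct fake_thread *t)
{
	return prev == next && t->load_slb != UINT8_MAX;
}

/* Count calls to step() from a zeroed counter until it reports aging. */
static unsigned int switches_until_aging(bool (*step)(struct fake_thread *))
{
	struct fake_thread t = { .load_slb = 0 };
	unsigned int n = 0;

	assert(step != NULL);
	do {
		n++;
	} while (!step(&t));

	return n;
}
```

In this model the aged state is marked by a counter value (U8_MAX) that switch_mm_irqs_off() can test on the next switch, whereas the old wrap-to-zero condition was only evaluated inside switch_slb().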

3 weeks, 3 days


[PATCH v2] mfd: intel_soc_pmic_chtdc_ti: Set use_single_read regmap_config flag

by Hans de Goede

Testing has shown that reading multiple registers at once (for 10-bit
ADC values) does not work. Set the use_single_read regmap_config flag
to make regmap split these for us.

This should fix temperature opregion accesses done by
drivers/acpi/pmic/intel_pmic_chtdc_ti.c and is also necessary for
the upcoming drivers for the ADC and battery MFD cells.

Fixes: 6bac0606fdba ("mfd: Add support for Cherry Trail Dollar Cove TI PMIC")
Cc: stable(a)vger.kernel.org
Reviewed-by: Andy Shevchenko <andy(a)kernel.org>
Signed-off-by: Hans de Goede <hansg(a)kernel.org>
---
Changes in v2:
- Update comment to: "The hardware does not support reading multiple
  registers at once"
---
 drivers/mfd/intel_soc_pmic_chtdc_ti.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/mfd/intel_soc_pmic_chtdc_ti.c b/drivers/mfd/intel_soc_pmic_chtdc_ti.c
index 4c1a68c9f575..6daf33e07ea0 100644
--- a/drivers/mfd/intel_soc_pmic_chtdc_ti.c
+++ b/drivers/mfd/intel_soc_pmic_chtdc_ti.c
@@ -82,6 +82,8 @@ static const struct regmap_config chtdc_ti_regmap_config = {
 	.reg_bits = 8,
 	.val_bits = 8,
 	.max_register = 0xff,
+	/* The hardware does not support reading multiple registers at once */
+	.use_single_read = true,
 };
 
 static const struct regmap_irq chtdc_ti_irqs[] = {
-- 
2.49.0
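As a rough illustration of what "split these for us" means, the sketch below models a device that can only return one register per transfer and a bulk read built from single reads. It is a userspace simulation, not regmap code: fake_regs, fake_read_one(), bulk_read_single() and read_adc10() are hypothetical names, and the 10-bit layout (two high bits in the first register, eight low bits in the next) is an assumption made for the example, not taken from the PMIC documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_NUM_REGS 256

/* Simulated 8-bit register file and a counter of bus transfers. */
static uint8_t fake_regs[FAKE_NUM_REGS];
static unsigned int fake_transfers;

/* Models hardware that supports only one register per transfer. */
static int fake_read_one(uint8_t reg, uint8_t *val)
{
	fake_transfers++;
	*val = fake_regs[reg];
	return 0;
}

/*
 * Serve a multi-register read as a series of single-register reads,
 * which is the behaviour the use_single_read flag asks regmap for.
 */
static int bulk_read_single(uint8_t reg, uint8_t *buf, size_t count)
{
	assert(buf != NULL);
	assert((size_t)reg + count <= FAKE_NUM_REGS);

	for (size_t i = 0; i < count; i++) {
		int ret = fake_read_one((uint8_t)(reg + i), &buf[i]);

		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Assemble a 10-bit value from two consecutive registers.
 * Assumed layout: bits 9:8 in the first register, bits 7:0 in the second.
 */
static int read_adc10(uint8_t reg, unsigned int *val)
{
	uint8_t raw[2];
	int ret = bulk_read_single(reg, raw, sizeof(raw));

	if (ret)
		return ret;

	*val = ((unsigned int)(raw[0] & 0x3u) << 8) | raw[1];
	return 0;
}
```

A caller still issues one logical two-register read, while the bus sees two separate transfers, which is why consumers such as the ADC driver would not need to change.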

3 weeks, 3 days

