The quilt patch titled
Subject: mm: zswap: disable migration while using per-CPU acomp_ctx
has been removed from the -mm tree. Its filename was
mm-zswap-disable-migration-while-using-per-cpu-acomp_ctx.patch
This patch was dropped because it was withdrawn
------------------------------------------------------
From: Yosry Ahmed <yosryahmed(a)google.com>
Subject: mm: zswap: disable migration while using per-CPU acomp_ctx
Date: Tue, 7 Jan 2025 22:22:35 +0000
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
current CPU at the beginning of the operation is retrieved and used
throughout. However, since neither preemption nor migration is disabled,
it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use,
we run into a UAF bug as the resources attached to the acomp_ctx are freed
during hotunplug in zswap_cpu_comp_dead().
The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
crypto_acomp API for hardware acceleration") when the switch to the
crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
retrieved using get_cpu_ptr() which disables preemption and makes sure the
CPU cannot go away from under us. Preemption cannot be disabled with the
crypto_acomp API as a sleepable context is needed.
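To make the regression concrete, here is a simplified sketch of the two
access patterns (abbreviated from the history described above, not a
verbatim quote of either version):

	/* Before 1ec3b5fe6eec: preemption disabled, so the CPU cannot be
	 * offlined while its per-CPU tfm is in use. */
	struct crypto_comp *tfm = *get_cpu_ptr(pool->tfm);
	/* ... compress in atomic context ... */
	put_cpu_ptr(pool->tfm);

	/* After the switch to crypto_acomp: raw_cpu_ptr() adds no
	 * protection. The task may sleep and migrate, after which the
	 * original CPU can be hotunplugged and zswap_cpu_comp_dead()
	 * frees the request and buffer still in use -- the UAF window. */
	struct crypto_acomp_ctx *acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
	mutex_lock(&acomp_ctx->mutex);
	/* ... sleepable acomp compress/decompress ... */
	mutex_unlock(&acomp_ctx->mutex);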
Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to
per-acomp_ctx") increased the UAF surface area by making the per-CPU
buffers dynamic, adding yet another resource that can be freed from under
zswap compression/decompression by CPU hotunplug.
This cannot be fixed by holding cpus_read_lock(), as it is possible for
code already holding the lock to fall into reclaim and enter zswap
(causing a deadlock). It also cannot be fixed by wrapping the usage of
acomp_ctx in an SRCU critical section and using synchronize_srcu() in
zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed in
CPU-hotplug notifiers (see
Documentation/RCU/Design/Requirements/Requirements.rst).
This can be fixed by refcounting the acomp_ctx, but it involves complexity
in handling the race between the refcount dropping to zero in
zswap_[de]compress() and the refcount being re-initialized when the CPU is
onlined.
Keep things simple for now and just disable migration while using the
per-CPU acomp_ctx to block CPU hotunplug until the usage is over.
Link: https://lkml.kernel.org/r/20250107222236.2715883-2-yosryahmed@google.com
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Signed-off-by: Yosry Ahmed <yosryahmed(a)google.com>
Reported-by: Johannes Weiner <hannes(a)cmpxchg.org>
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun <samsun1006219(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tP…
Cc: Barry Song <baohua(a)kernel.org>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: syzbot <syzkaller(a)googlegroups.com>
Cc: Vitaly Wool <vitalywool(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
--- a/mm/zswap.c~mm-zswap-disable-migration-while-using-per-cpu-acomp_ctx
+++ a/mm/zswap.c
@@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned
return 0;
}
+/* Remain on the CPU while using its acomp_ctx to stop it from going offline */
+static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *acomp_ctx)
+{
+ migrate_disable();
+ return raw_cpu_ptr(acomp_ctx);
+}
+
+static void acomp_ctx_put_cpu(void)
+{
+ migrate_enable();
+}
+
static bool zswap_compress(struct page *page, struct zswap_entry *entry,
struct zswap_pool *pool)
{
@@ -893,8 +905,7 @@ static bool zswap_compress(struct page *
gfp_t gfp;
u8 *dst;
- acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
-
+ acomp_ctx = acomp_ctx_get_cpu(pool->acomp_ctx);
mutex_lock(&acomp_ctx->mutex);
dst = acomp_ctx->buffer;
@@ -950,6 +961,7 @@ unlock:
zswap_reject_alloc_fail++;
mutex_unlock(&acomp_ctx->mutex);
+ acomp_ctx_put_cpu();
return comp_ret == 0 && alloc_ret == 0;
}
@@ -960,7 +972,7 @@ static void zswap_decompress(struct zswa
struct crypto_acomp_ctx *acomp_ctx;
u8 *src;
- acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+ acomp_ctx = acomp_ctx_get_cpu(entry->pool->acomp_ctx);
mutex_lock(&acomp_ctx->mutex);
src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
@@ -990,6 +1002,7 @@ static void zswap_decompress(struct zswa
if (src != acomp_ctx->buffer)
zpool_unmap_handle(zpool, entry->handle);
+ acomp_ctx_put_cpu();
}
/*********************************
_
Patches currently in -mm which might be from yosryahmed(a)google.com are
revert-mm-zswap-fix-race-between-compression-and-cpu-hotunplug.patch
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
current CPU at the beginning of the operation is retrieved and used
throughout. However, since neither preemption nor migration is disabled,
it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use,
we run into a UAF bug as the resources attached to the acomp_ctx are freed
during hotunplug in zswap_cpu_comp_dead().
The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
crypto_acomp API for hardware acceleration") when the switch to the
crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
retrieved using get_cpu_ptr() which disables preemption and makes sure the
CPU cannot go away from under us. Preemption cannot be disabled with the
crypto_acomp API as a sleepable context is needed.
Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to
per-acomp_ctx") increased the UAF surface area by making the per-CPU
buffers dynamic, adding yet another resource that can be freed from under
zswap compression/decompression by CPU hotunplug.
There are a few ways to fix this:
(a) Add a refcount for acomp_ctx.
(b) Disable migration while using the per-CPU acomp_ctx.
(c) Use SRCU to wait for other CPUs using the acomp_ctx of the CPU being
hotunplugged. Normal RCU cannot be used as a sleepable context is
required.
Implement (c), since it is simpler than (a), and since (b) involves using
migrate_disable(), which is apparently undesired (see the huge comment in
include/linux/preempt.h).
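For reference, the generic shape of the SRCU scheme implemented below (a
sketch with placeholder names, not the patch itself):

	DEFINE_STATIC_SRCU(my_srcu);

	/* Reader side: unlike plain RCU, the critical section may sleep,
	 * which is what zswap needs. */
	int idx = srcu_read_lock(&my_srcu);
	use_per_cpu_resource();			/* placeholder */
	srcu_read_unlock(&my_srcu, idx);

	/* Updater side: wait for all in-flight readers to finish before
	 * tearing the resource down. */
	synchronize_srcu(&my_srcu);
	free_per_cpu_resource();		/* placeholder */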
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yosry Ahmed <yosryahmed(a)google.com>
Reported-by: Johannes Weiner <hannes(a)cmpxchg.org>
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun <samsun1006219(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tP…
---
mm/zswap.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb236..add1406d693b8 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -864,12 +864,22 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
return ret;
}
+DEFINE_STATIC_SRCU(acomp_srcu);
+
static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
{
struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
if (!IS_ERR_OR_NULL(acomp_ctx)) {
+ /*
+ * Even though the acomp_ctx should not be currently in use on
+ * @cpu, it may still be used by compress/decompress operations
+ * that started on @cpu and migrated to a different CPU. Wait
+	 * for such usages to complete; any new usages would be a bug.
+ */
+ synchronize_srcu(&acomp_srcu);
+
if (!IS_ERR_OR_NULL(acomp_ctx->req))
acomp_request_free(acomp_ctx->req);
if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
@@ -880,6 +890,18 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
return 0;
}
+static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *acomp_ctx,
+ int *srcu_idx)
+{
+ *srcu_idx = srcu_read_lock(&acomp_srcu);
+ return raw_cpu_ptr(acomp_ctx);
+}
+
+static void acomp_ctx_put_cpu(int srcu_idx)
+{
+ srcu_read_unlock(&acomp_srcu, srcu_idx);
+}
+
static bool zswap_compress(struct page *page, struct zswap_entry *entry,
struct zswap_pool *pool)
{
@@ -889,12 +911,12 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
unsigned int dlen = PAGE_SIZE;
unsigned long handle;
struct zpool *zpool;
+ int srcu_idx;
char *buf;
gfp_t gfp;
u8 *dst;
- acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
-
+ acomp_ctx = acomp_ctx_get_cpu(pool->acomp_ctx, &srcu_idx);
mutex_lock(&acomp_ctx->mutex);
dst = acomp_ctx->buffer;
@@ -950,6 +972,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
zswap_reject_alloc_fail++;
mutex_unlock(&acomp_ctx->mutex);
+ acomp_ctx_put_cpu(srcu_idx);
return comp_ret == 0 && alloc_ret == 0;
}
@@ -958,9 +981,10 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
struct zpool *zpool = entry->pool->zpool;
struct scatterlist input, output;
struct crypto_acomp_ctx *acomp_ctx;
+ int srcu_idx;
u8 *src;
- acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+ acomp_ctx = acomp_ctx_get_cpu(entry->pool->acomp_ctx, &srcu_idx);
mutex_lock(&acomp_ctx->mutex);
src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
@@ -990,6 +1014,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
if (src != acomp_ctx->buffer)
zpool_unmap_handle(zpool, entry->handle);
+ acomp_ctx_put_cpu(srcu_idx);
}
/*********************************
--
2.47.1.613.gc27f4b7a9f-goog
The patch titled
Subject: zram: fix potential UAF of zram table
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
zram-fix-potential-uaf-of-zram-table.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kairui Song <kasong(a)tencent.com>
Subject: zram: fix potential UAF of zram table
Date: Tue, 7 Jan 2025 14:54:46 +0800
If zram_meta_alloc() fails early, it frees the allocated zram->table
without setting it to NULL, which can cause zram_meta_free() to access
the stale table pointer if the user resets a failed and uninitialized
device.
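The underlying pattern (a generic sketch with hypothetical names, not the
zram code itself):

	struct my_dev {
		void *table;
	};

	static bool my_dev_init(struct my_dev *dev, size_t bytes)
	{
		dev->table = vzalloc(bytes);
		if (!dev->table)
			return false;

		if (!init_other_resource(dev)) {	/* hypothetical */
			vfree(dev->table);
			/* Without this assignment, a later cleanup path
			 * that checks dev->table would free or dereference
			 * a stale pointer. */
			dev->table = NULL;
			return false;
		}
		return true;
	}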
Link: https://lkml.kernel.org/r/20250107065446.86928-1-ryncsn@gmail.com
Fixes: 74363ec674cb ("zram: fix uninitialized ZRAM not releasing backing device")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/block/zram/zram_drv.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/block/zram/zram_drv.c~zram-fix-potential-uaf-of-zram-table
+++ a/drivers/block/zram/zram_drv.c
@@ -1468,6 +1468,7 @@ static bool zram_meta_alloc(struct zram
zram->mem_pool = zs_create_pool(zram->disk->disk_name);
if (!zram->mem_pool) {
vfree(zram->table);
+ zram->table = NULL;
return false;
}
_
Patches currently in -mm which might be from kasong(a)tencent.com are
zram-fix-potential-uaf-of-zram-table.patch
mm-memcontrol-avoid-duplicated-memcg-enable-check.patch
mm-swap_cgroup-remove-swap_cgroup_cmpxchg.patch
mm-swap_cgroup-remove-global-swap-cgroup-lock.patch
mm-swap_cgroup-decouple-swap-cgroup-recording-and-clearing.patch
mm-swap-minor-clean-up-for-swap-entry-allocation.patch
mm-swap-fold-swap_info_get_cont-in-the-only-caller.patch
mm-swap-remove-old-allocation-path-for-hdd.patch
mm-swap-use-cluster-lock-for-hdd.patch
mm-swap-clean-up-device-availability-check.patch
mm-swap-clean-up-plist-removal-and-adding.patch
mm-swap-hold-a-reference-during-scan-and-cleanup-flag-usage.patch
mm-swap-use-an-enum-to-define-all-cluster-flags-and-wrap-flags-changes.patch
mm-swap-reduce-contention-on-device-lock.patch
mm-swap-simplify-percpu-cluster-updating.patch
mm-swap-introduce-a-helper-for-retrieving-cluster-from-offset.patch
mm-swap-use-a-global-swap-cluster-for-non-rotation-devices.patch
mm-swap_slots-remove-slot-cache-for-freeing-path.patch
The patch titled
Subject: selftests/mm: set allocated memory to non-zero content in cow test
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-set-allocated-memory-to-non-zero-content-in-cow-test.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: selftests/mm: set allocated memory to non-zero content in cow test
Date: Tue, 7 Jan 2025 14:25:53 +0000
After commit b1f202060afe ("mm: remap unused subpages to shared zeropage
when splitting isolated thp"), cow test cases involving swapping out THPs
via madvise(MADV_PAGEOUT) started to be skipped due to the subsequent
check via pagemap determining that the memory was not actually swapped
out. Logs similar to this were emitted:
...
# [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (16 kB)
ok 2 # SKIP MADV_PAGEOUT did not work, is swap enabled?
# [RUN] Basic COW after fork() ... with single PTE of swapped-out THP (16 kB)
ok 3 # SKIP MADV_PAGEOUT did not work, is swap enabled?
# [RUN] Basic COW after fork() ... with swapped-out, PTE-mapped THP (32 kB)
ok 4 # SKIP MADV_PAGEOUT did not work, is swap enabled?
...
The commit in question introduced the behaviour of scanning THPs and, if
their content is predominantly zero, splitting them and remapping the
wholly-zero pages to the shared zeropage. These cow test cases were
getting caught up in this.
So let's avoid that by filling the contents of all allocated memory with
a non-zero value. With this in place, the tests are passing again.
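A minimal sketch of the idea (a hypothetical standalone fragment, not the
selftest itself; mem is assumed to be an anonymous private mapping of
thpsize bytes):

	#include <string.h>
	#include <sys/mman.h>

	/* Non-zero content prevents the split-THP path from remapping the
	 * pages to the shared zeropage, so MADV_PAGEOUT actually swaps
	 * them out and the subsequent pagemap check succeeds. */
	memset(mem, 1, thpsize);
	madvise(mem, thpsize, MADV_PAGEOUT);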
Link: https://lkml.kernel.org/r/20250107142555.1870101-1-ryan.roberts@arm.com
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Usama Arif <usamaarif642(a)gmail.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/cow.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/tools/testing/selftests/mm/cow.c~selftests-mm-set-allocated-memory-to-non-zero-content-in-cow-test
+++ a/tools/testing/selftests/mm/cow.c
@@ -758,7 +758,7 @@ static void do_run_with_base_page(test_f
}
/* Populate a base page. */
- memset(mem, 0, pagesize);
+ memset(mem, 1, pagesize);
if (swapout) {
madvise(mem, pagesize, MADV_PAGEOUT);
@@ -824,12 +824,12 @@ static void do_run_with_thp(test_fn fn,
* Try to populate a THP. Touch the first sub-page and test if
* we get the last sub-page populated automatically.
*/
- mem[0] = 0;
+ mem[0] = 1;
if (!pagemap_is_populated(pagemap_fd, mem + thpsize - pagesize)) {
ksft_test_result_skip("Did not get a THP populated\n");
goto munmap;
}
- memset(mem, 0, thpsize);
+ memset(mem, 1, thpsize);
size = thpsize;
switch (thp_run) {
@@ -1012,7 +1012,7 @@ static void run_with_hugetlb(test_fn fn,
}
/* Populate an huge page. */
- memset(mem, 0, hugetlbsize);
+ memset(mem, 1, hugetlbsize);
/*
* We need a total of two hugetlb pages to handle COW/unsharing
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-clear-uffd-wp-pte-pmd-state-on-mremap.patch
selftests-mm-set-allocated-memory-to-non-zero-content-in-cow-test.patch
selftests-mm-add-fork-cow-guard-page-test-fix.patch
selftests-mm-introduce-uffd-wp-mremap-regression-test.patch
The patch titled
Subject: mm: zswap: disable migration while using per-CPU acomp_ctx
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-zswap-disable-migration-while-using-per-cpu-acomp_ctx.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yosry Ahmed <yosryahmed(a)google.com>
Subject: mm: zswap: disable migration while using per-CPU acomp_ctx
Date: Tue, 7 Jan 2025 22:22:35 +0000
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
current CPU at the beginning of the operation is retrieved and used
throughout. However, since neither preemption nor migration is disabled,
it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use,
we run into a UAF bug as the resources attached to the acomp_ctx are freed
during hotunplug in zswap_cpu_comp_dead().
The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
crypto_acomp API for hardware acceleration") when the switch to the
crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
retrieved using get_cpu_ptr() which disables preemption and makes sure the
CPU cannot go away from under us. Preemption cannot be disabled with the
crypto_acomp API as a sleepable context is needed.
Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to
per-acomp_ctx") increased the UAF surface area by making the per-CPU
buffers dynamic, adding yet another resource that can be freed from under
zswap compression/decompression by CPU hotunplug.
This cannot be fixed by holding cpus_read_lock(), as it is possible for
code already holding the lock to fall into reclaim and enter zswap
(causing a deadlock). It also cannot be fixed by wrapping the usage of
acomp_ctx in an SRCU critical section and using synchronize_srcu() in
zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed in
CPU-hotplug notifiers (see
Documentation/RCU/Design/Requirements/Requirements.rst).
This can be fixed by refcounting the acomp_ctx, but it involves complexity
in handling the race between the refcount dropping to zero in
zswap_[de]compress() and the refcount being re-initialized when the CPU is
onlined.
Keep things simple for now and just disable migration while using the
per-CPU acomp_ctx to block CPU hotunplug until the usage is over.
Link: https://lkml.kernel.org/r/20250107222236.2715883-2-yosryahmed@google.com
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Signed-off-by: Yosry Ahmed <yosryahmed(a)google.com>
Reported-by: Johannes Weiner <hannes(a)cmpxchg.org>
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun <samsun1006219(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tP…
Cc: Barry Song <baohua(a)kernel.org>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Kanchana P Sridhar <kanchana.p.sridhar(a)intel.com>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: syzbot <syzkaller(a)googlegroups.com>
Cc: Vitaly Wool <vitalywool(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
--- a/mm/zswap.c~mm-zswap-disable-migration-while-using-per-cpu-acomp_ctx
+++ a/mm/zswap.c
@@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned
return 0;
}
+/* Remain on the CPU while using its acomp_ctx to stop it from going offline */
+static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *acomp_ctx)
+{
+ migrate_disable();
+ return raw_cpu_ptr(acomp_ctx);
+}
+
+static void acomp_ctx_put_cpu(void)
+{
+ migrate_enable();
+}
+
static bool zswap_compress(struct page *page, struct zswap_entry *entry,
struct zswap_pool *pool)
{
@@ -893,8 +905,7 @@ static bool zswap_compress(struct page *
gfp_t gfp;
u8 *dst;
- acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
-
+ acomp_ctx = acomp_ctx_get_cpu(pool->acomp_ctx);
mutex_lock(&acomp_ctx->mutex);
dst = acomp_ctx->buffer;
@@ -950,6 +961,7 @@ unlock:
zswap_reject_alloc_fail++;
mutex_unlock(&acomp_ctx->mutex);
+ acomp_ctx_put_cpu();
return comp_ret == 0 && alloc_ret == 0;
}
@@ -960,7 +972,7 @@ static void zswap_decompress(struct zswa
struct crypto_acomp_ctx *acomp_ctx;
u8 *src;
- acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+ acomp_ctx = acomp_ctx_get_cpu(entry->pool->acomp_ctx);
mutex_lock(&acomp_ctx->mutex);
src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
@@ -990,6 +1002,7 @@ static void zswap_decompress(struct zswa
if (src != acomp_ctx->buffer)
zpool_unmap_handle(zpool, entry->handle);
+ acomp_ctx_put_cpu();
}
/*********************************
_
Patches currently in -mm which might be from yosryahmed(a)google.com are
revert-mm-zswap-fix-race-between-compression-and-cpu-hotunplug.patch
mm-zswap-disable-migration-while-using-per-cpu-acomp_ctx.patch
The patch titled
Subject: mm: clear uffd-wp PTE/PMD state on mremap()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-clear-uffd-wp-pte-pmd-state-on-mremap.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: clear uffd-wp PTE/PMD state on mremap()
Date: Tue, 7 Jan 2025 14:47:52 +0000
When mremap()ing a memory region previously registered with userfaultfd as
write-protected but without UFFD_FEATURE_EVENT_REMAP, an inconsistency in
flag clearing leads to a mismatch between the vma flags (which have
uffd-wp cleared) and the pte/pmd flags (which do not have uffd-wp
cleared). This mismatch causes a subsequent mprotect(PROT_WRITE) to
trigger a warning in page_table_check_pte_flags() due to setting the pte
to writable while uffd-wp is still set.
Fix this by always explicitly clearing the uffd-wp pte/pmd flags on any
such mremap() so that the values are consistent with the existing clearing
of VM_UFFD_WP. Be careful to clear the logical flag regardless of its
physical form: a PTE bit, a swap PTE bit, or a PTE marker. Cover PTE,
huge PMD and hugetlb paths.
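For context, a hedged sketch of the kind of sequence that hits the
mismatch (not the actual regression test; error handling is omitted, the
two-step mprotect() is an assumption to force the pte update, and
userfaultfd may require privileges):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 4096;
		char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
		struct uffdio_api api = { .api = UFFD_API };	/* no EVENT_REMAP */
		ioctl(uffd, UFFDIO_API, &api);

		struct uffdio_register reg = {
			.range = { .start = (unsigned long)mem, .len = len },
			.mode = UFFDIO_REGISTER_MODE_WP,
		};
		ioctl(uffd, UFFDIO_REGISTER, &reg);

		mem[0] = 1;		/* populate the pte */
		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)mem, .len = len },
			.mode = UFFDIO_WRITEPROTECT_MODE_WP,
		};
		ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);	/* pte gains uffd-wp */

		/* Move the mapping: VM_UFFD_WP is cleared from the vma,
		 * but before this fix the moved pte kept its uffd-wp bit. */
		char *dst = mmap(NULL, len, PROT_NONE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		mem = mremap(mem, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);

		/* Making the pte writable while uffd-wp is still set is
		 * what trips the page_table_check warning. */
		mprotect(mem, len, PROT_READ);
		mprotect(mem, len, PROT_READ | PROT_WRITE);
		return 0;
	}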
Link: https://lkml.kernel.org/r/20250107144755.1871363-2-ryan.roberts@arm.com
Co-developed-by: Mikołaj Lenczewski <miko.lenczewski(a)arm.com>
Signed-off-by: Mikołaj Lenczewski <miko.lenczewski(a)arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Closes: https://lore.kernel.org/linux-mm/810b44a8-d2ae-4107-b665-5a42eae2d948@arm.c…
Fixes: 63b2d4174c4a ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/userfaultfd_k.h | 12 ++++++++++++
mm/huge_memory.c | 12 ++++++++++++
mm/hugetlb.c | 14 +++++++++++++-
mm/mremap.c | 32 +++++++++++++++++++++++++++++++-
4 files changed, 68 insertions(+), 2 deletions(-)
--- a/include/linux/userfaultfd_k.h~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/include/linux/userfaultfd_k.h
@@ -247,6 +247,13 @@ static inline bool vma_can_userfault(str
vma_is_shmem(vma);
}
+static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
+{
+ struct userfaultfd_ctx *uffd_ctx = vma->vm_userfaultfd_ctx.ctx;
+
+ return uffd_ctx && (uffd_ctx->features & UFFD_FEATURE_EVENT_REMAP) == 0;
+}
+
extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
extern void dup_userfaultfd_complete(struct list_head *);
void dup_userfaultfd_fail(struct list_head *);
@@ -401,6 +408,11 @@ static inline bool userfaultfd_wp_async(
{
return false;
}
+
+static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
+{
+ return false;
+}
#endif /* CONFIG_USERFAULTFD */
--- a/mm/huge_memory.c~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/mm/huge_memory.c
@@ -2206,6 +2206,16 @@ static pmd_t move_soft_dirty_pmd(pmd_t p
return pmd;
}
+static pmd_t clear_uffd_wp_pmd(pmd_t pmd)
+{
+ if (pmd_present(pmd))
+ pmd = pmd_clear_uffd_wp(pmd);
+ else if (is_swap_pmd(pmd))
+ pmd = pmd_swp_clear_uffd_wp(pmd);
+
+ return pmd;
+}
+
bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
{
@@ -2244,6 +2254,8 @@ bool move_huge_pmd(struct vm_area_struct
pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
}
pmd = move_soft_dirty_pmd(pmd);
+ if (vma_has_uffd_without_event_remap(vma))
+ pmd = clear_uffd_wp_pmd(pmd);
set_pmd_at(mm, new_addr, new_pmd, pmd);
if (force_flush)
flush_pmd_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
--- a/mm/hugetlb.c~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/mm/hugetlb.c
@@ -5402,6 +5402,7 @@ static void move_huge_pte(struct vm_area
unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte,
unsigned long sz)
{
+ bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
struct hstate *h = hstate_vma(vma);
struct mm_struct *mm = vma->vm_mm;
spinlock_t *src_ptl, *dst_ptl;
@@ -5418,7 +5419,18 @@ static void move_huge_pte(struct vm_area
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
- set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
+
+ if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
+ huge_pte_clear(mm, new_addr, dst_pte, sz);
+ else {
+ if (need_clear_uffd_wp) {
+ if (pte_present(pte))
+ pte = huge_pte_clear_uffd_wp(pte);
+ else if (is_swap_pte(pte))
+ pte = pte_swp_clear_uffd_wp(pte);
+ }
+ set_huge_pte_at(mm, new_addr, dst_pte, pte, sz);
+ }
if (src_ptl != dst_ptl)
spin_unlock(src_ptl);
--- a/mm/mremap.c~mm-clear-uffd-wp-pte-pmd-state-on-mremap
+++ a/mm/mremap.c
@@ -138,6 +138,7 @@ static int move_ptes(struct vm_area_stru
struct vm_area_struct *new_vma, pmd_t *new_pmd,
unsigned long new_addr, bool need_rmap_locks)
{
+ bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
struct mm_struct *mm = vma->vm_mm;
pte_t *old_pte, *new_pte, pte;
pmd_t dummy_pmdval;
@@ -216,7 +217,18 @@ static int move_ptes(struct vm_area_stru
force_flush = true;
pte = move_pte(pte, old_addr, new_addr);
pte = move_soft_dirty_pte(pte);
- set_pte_at(mm, new_addr, new_pte, pte);
+
+ if (need_clear_uffd_wp && pte_marker_uffd_wp(pte))
+ pte_clear(mm, new_addr, new_pte);
+ else {
+ if (need_clear_uffd_wp) {
+ if (pte_present(pte))
+ pte = pte_clear_uffd_wp(pte);
+ else if (is_swap_pte(pte))
+ pte = pte_swp_clear_uffd_wp(pte);
+ }
+ set_pte_at(mm, new_addr, new_pte, pte);
+ }
}
arch_leave_lazy_mmu_mode();
@@ -278,6 +290,15 @@ static bool move_normal_pmd(struct vm_ar
if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
return false;
+ /* If this pmd belongs to a uffd vma with remap events disabled, we need
+ * to ensure that the uffd-wp state is cleared from all pgtables. This
+ * means recursing into lower page tables in move_page_tables(), and we
+ * can reuse the existing code if we simply treat the entry as "not
+ * moved".
+ */
+ if (vma_has_uffd_without_event_remap(vma))
+ return false;
+
/*
* We don't have to worry about the ordering of src and dst
* ptlocks because exclusive mmap_lock prevents deadlock.
@@ -333,6 +354,15 @@ static bool move_normal_pud(struct vm_ar
if (WARN_ON_ONCE(!pud_none(*new_pud)))
return false;
+ /* If this pud belongs to a uffd vma with remap events disabled, we need
+ * to ensure that the uffd-wp state is cleared from all pgtables. This
+ * means recursing into lower page tables in move_page_tables(), and we
+ * can reuse the existing code if we simply treat the entry as "not
+ * moved".
+ */
+ if (vma_has_uffd_without_event_remap(vma))
+ return false;
+
/*
* We don't have to worry about the ordering of src and dst
* ptlocks because exclusive mmap_lock prevents deadlock.
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-clear-uffd-wp-pte-pmd-state-on-mremap.patch
The patch titled
Subject: selftests/mm: virtual_address_range: avoid reading VVAR mappings
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-virtual_address_range-avoid-reading-vvar-mappings.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Subject: selftests/mm: virtual_address_range: avoid reading VVAR mappings
Date: Tue, 07 Jan 2025 16:14:46 +0100
The virtual_address_range selftest reads from the start of each mapping
listed in /proc/self/maps.
However, not all mappings can be arbitrarily accessed. For example, the
vvar data used for virtual clocks on x86 can only be accessed if 1) the
kernel configuration enables virtual clocks and 2) the hypervisor has
provided the data for it, which can only be determined by the VDSO code
itself.
Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into
dedicated mapping") the virtual clock data was split out into its own
mapping, triggering faulting accesses by virtual_address_range.
Skip the various vvar mappings in virtual_address_range to avoid errors.
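The fix below relies on sscanf()'s %n conversion to record where the
optional pathname starts; a standalone illustration (the sample line is
hypothetical):

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		const char *line =
			"7ffff7fba000-7ffff7fbe000 r--p 00000000 00:00 0 [vvar]\n";
		unsigned long start, end;
		char prot[5];
		int path_offset = 0;

		/* %n stores the current offset into 'line'; it is not
		 * counted in sscanf()'s return value, which stays 3. */
		if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
			   &start, &end, prot, &path_offset) != 3)
			return 1;

		if (path_offset && !strncmp(line + path_offset, "[vvar", 5))
			printf("skipping vvar mapping\n");
		return 0;
	}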
Link: https://lkml.kernel.org/r/20250107-virtual_address_range-tests-v1-2-3834a2f…
Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping")
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/virtual_address_range.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/mm/virtual_address_range.c~selftests-mm-virtual_address_range-avoid-reading-vvar-mappings
+++ a/tools/testing/selftests/mm/virtual_address_range.c
@@ -116,10 +116,11 @@ static int validate_complete_va_space(vo
prev_end_addr = 0;
while (fgets(line, sizeof(line), file)) {
+ int path_offset = 0;
unsigned long hop;
- if (sscanf(line, "%lx-%lx %s[rwxp-]",
- &start_addr, &end_addr, prot) != 3)
+ if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
+ &start_addr, &end_addr, prot, &path_offset) != 3)
ksft_exit_fail_msg("cannot parse /proc/self/maps\n");
/* end of userspace mappings; ignore vsyscall mapping */
@@ -135,6 +136,10 @@ static int validate_complete_va_space(vo
if (prot[0] != 'r')
continue;
+ /* Only the VDSO can know if a VVAR mapping is really readable */
+ if (path_offset && !strncmp(line + path_offset, "[vvar", 5))
+ continue;
+
/*
* Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
* If write succeeds, no need to check MAP_CHUNK_SIZE - 1
_
Patches currently in -mm which might be from thomas.weissschuh(a)linutronix.de are
selftests-mm-virtual_address_range-fix-error-when-commitlimit-1gib.patch
selftests-mm-virtual_address_range-avoid-reading-vvar-mappings.patch
The patch titled
Subject: selftests/mm: virtual_address_range: fix error when CommitLimit < 1GiB
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-virtual_address_range-fix-error-when-commitlimit-1gib.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Subject: selftests/mm: virtual_address_range: fix error when CommitLimit < 1GiB
Date: Tue, 07 Jan 2025 16:14:45 +0100
If not enough physical memory is available, the kernel may fail mmap();
see __vm_enough_memory() and vm_commit_limit(). In that case the logic in
validate_complete_va_space() does not make sense and will even fail
incorrectly. Instead, skip the test if no mmap() succeeded.
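A hedged sketch of the failure mode (assumes strict overcommit
accounting, e.g. vm.overcommit_memory=2; the 1 GiB size is taken from the
patch title):

	#include <errno.h>
	#include <stdio.h>
	#include <sys/mman.h>

	int main(void)
	{
		/* An accountable anonymous mapping is charged against the
		 * commit limit; __vm_enough_memory() rejects the charge
		 * with ENOMEM once vm_commit_limit() is exhausted. */
		size_t len = 1UL << 30;
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED && errno == ENOMEM)
			printf("commit limit reached: skip, don't fail\n");
		return 0;
	}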
Link: https://lkml.kernel.org/r/20250107-virtual_address_range-tests-v1-1-3834a2f…
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: kernel test robot <oliver.sang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/tools/testing/selftests/mm/virtual_address_range.c~selftests-mm-virtual_address_range-fix-error-when-commitlimit-1gib
+++ a/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
validate_addr(ptr[i], 0);
}
lchunks = i;
+
+ if (!lchunks) {
+ ksft_test_result_skip("Not enough memory for a single chunk\n");
+ ksft_finished();
+ }
+
hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
if (hptr == NULL) {
ksft_test_result_skip("Memory constraint not fulfilled\n");
_
Patches currently in -mm which might be from thomas.weissschuh(a)linutronix.de are
selftests-mm-virtual_address_range-fix-error-when-commitlimit-1gib.patch
selftests-mm-virtual_address_range-avoid-reading-vvar-mappings.patch