The quilt patch titled
Subject: nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
has been removed from the -mm tree. Its filename was
nilfs2-fix-cfi-failure-when-accessing-sys-fs-nilfs2-features.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Nathan Chancellor <nathan(a)kernel.org>
Subject: nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
Date: Sat, 6 Sep 2025 23:43:34 +0900
When accessing one of the files under /sys/fs/nilfs2/features when
CONFIG_CFI_CLANG is enabled, there is a CFI violation:
CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d)
...
Call Trace:
<TASK>
sysfs_kf_seq_show+0x2a6/0x390
? __cfi_kobj_attr_show+0x10/0x10
kernfs_seq_show+0x104/0x15b
seq_read_iter+0x580/0xe2b
...
When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype
is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops. When
nilfs_feature_attr_group is added to that kobject via
sysfs_create_group(), the kernfs_ops of each file is sysfs_file_kfops_rw,
which will call sysfs_kf_seq_show() when ->seq_show() is called.
sysfs_kf_seq_show() in turn calls kobj_attr_show() through
->sysfs_ops->show(). kobj_attr_show() casts the provided attribute out to
a 'struct kobj_attribute' via container_of() and calls ->show(), resulting
in the CFI violation since neither nilfs_feature_revision_show() nor
nilfs_feature_README_show() match the prototype of ->show() in 'struct
kobj_attribute'.
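For reference, a minimal sketch of the mismatch (the kobj_attribute
definition and the kobj_attr_show() logic follow the upstream kobject
code, shortened here for illustration):

struct kobj_attribute {
	struct attribute attr;
	ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
			char *buf);
	ssize_t (*store)(struct kobject *kobj, struct kobj_attribute *attr,
			 const char *buf, size_t count);
};

/* kobj_attr_show() roughly does: */
static ssize_t kobj_attr_show(struct kobject *kobj, struct attribute *attr,
			      char *buf)
{
	struct kobj_attribute *kattr;

	kattr = container_of(attr, struct kobj_attribute, attr);
	if (!kattr->show)
		return -EIO;
	/*
	 * Indirect call: with CONFIG_CFI_CLANG the target must match the
	 * kobj_attribute ->show() prototype.  The old nilfs handlers took
	 * 'struct attribute *' as their second argument, so the CFI type
	 * hashes differ and the check fires here.
	 */
	return kattr->show(kobj, kattr, buf);
}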
Resolve the CFI violation by adjusting the second parameter in
nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct
kobj_attribute' to match the expected prototype.
Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com
Fixes: aebe17f68444 ("nilfs2: add /sys/fs/nilfs2/features group")
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/sysfs.c | 4 ++--
fs/nilfs2/sysfs.h | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)
--- a/fs/nilfs2/sysfs.c~nilfs2-fix-cfi-failure-when-accessing-sys-fs-nilfs2-features
+++ a/fs/nilfs2/sysfs.c
@@ -1075,7 +1075,7 @@ void nilfs_sysfs_delete_device_group(str
************************************************************************/
static ssize_t nilfs_feature_revision_show(struct kobject *kobj,
- struct attribute *attr, char *buf)
+ struct kobj_attribute *attr, char *buf)
{
return sysfs_emit(buf, "%d.%d\n",
NILFS_CURRENT_REV, NILFS_MINOR_REV);
@@ -1087,7 +1087,7 @@ static const char features_readme_str[]
"(1) revision\n\tshow current revision of NILFS file system driver.\n";
static ssize_t nilfs_feature_README_show(struct kobject *kobj,
- struct attribute *attr,
+ struct kobj_attribute *attr,
char *buf)
{
return sysfs_emit(buf, features_readme_str);
--- a/fs/nilfs2/sysfs.h~nilfs2-fix-cfi-failure-when-accessing-sys-fs-nilfs2-features
+++ a/fs/nilfs2/sysfs.h
@@ -50,16 +50,16 @@ struct nilfs_sysfs_dev_subgroups {
struct completion sg_segments_kobj_unregister;
};
-#define NILFS_COMMON_ATTR_STRUCT(name) \
+#define NILFS_KOBJ_ATTR_STRUCT(name) \
struct nilfs_##name##_attr { \
struct attribute attr; \
- ssize_t (*show)(struct kobject *, struct attribute *, \
+ ssize_t (*show)(struct kobject *, struct kobj_attribute *, \
char *); \
- ssize_t (*store)(struct kobject *, struct attribute *, \
+ ssize_t (*store)(struct kobject *, struct kobj_attribute *, \
const char *, size_t); \
}
-NILFS_COMMON_ATTR_STRUCT(feature);
+NILFS_KOBJ_ATTR_STRUCT(feature);
#define NILFS_DEV_ATTR_STRUCT(name) \
struct nilfs_##name##_attr { \
_
Patches currently in -mm which might be from nathan(a)kernel.org are
mm-rmap-convert-enum-rmap_level-to-enum-pgtable_level-fix.patch
The quilt patch titled
Subject: samples/damon/mtier: avoid starting DAMON before initialization
has been removed from the -mm tree. Its filename was
samples-damon-mtier-avoid-starting-damon-before-initialization.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: samples/damon/mtier: avoid starting DAMON before initialization
Date: Mon, 8 Sep 2025 19:22:38 -0700
Commit 964314344eab ("samples/damon/mtier: support boot time enable
setup") somehow applied the original patch [1] incompletely: it is missing
the part that avoids starting DAMON before module initialization.
Probably a mistake happened during a merge. Fix it by applying the
missing part again.
Link: https://lkml.kernel.org/r/20250909022238.2989-4-sj@kernel.org
Link: https://lore.kernel.org/20250706193207.39810-4-sj@kernel.org [1]
Fixes: 964314344eab ("samples/damon/mtier: support boot time enable setup")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
samples/damon/mtier.c | 3 +++
1 file changed, 3 insertions(+)
--- a/samples/damon/mtier.c~samples-damon-mtier-avoid-starting-damon-before-initialization
+++ a/samples/damon/mtier.c
@@ -208,6 +208,9 @@ static int damon_sample_mtier_enable_sto
if (enabled == is_enabled)
return 0;
+ if (!init_called)
+ return 0;
+
if (enabled) {
err = damon_sample_mtier_start();
if (err)
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-zswap-store-page_size-compression-failed-page-as-is.patch
mm-zswap-store-page_size-compression-failed-page-as-is-fix.patch
mm-zswap-store-page_size-compression-failed-page-as-is-v5.patch
mm-zswap-store-page_size-compression-failed-page-as-is-fix-2.patch
mm-damon-core-add-damon_ctx-addr_unit.patch
mm-damon-paddr-support-addr_unit-for-access-monitoring.patch
mm-damon-paddr-support-addr_unit-for-damos_pageout.patch
mm-damon-paddr-support-addr_unit-for-damos_lru_prio.patch
mm-damon-paddr-support-addr_unit-for-migrate_hotcold.patch
mm-damon-paddr-support-addr_unit-for-damos_stat.patch
mm-damon-sysfs-implement-addr_unit-file-under-context-dir.patch
docs-mm-damon-design-document-address-unit-parameter.patch
docs-admin-guide-mm-damon-usage-document-addr_unit-file.patch
docs-abi-damon-document-addr_unit-file.patch
The quilt patch titled
Subject: samples/damon/prcl: avoid starting DAMON before initialization
has been removed from the -mm tree. Its filename was
samples-damon-prcl-avoid-starting-damon-before-initialization.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: samples/damon/prcl: avoid starting DAMON before initialization
Date: Mon, 8 Sep 2025 19:22:37 -0700
Commit 2780505ec2b4 ("samples/damon/prcl: fix boot time enable crash")
somehow applied the original patch [1] incompletely: it is missing the
part that avoids starting DAMON before module initialization. Probably a
mistake happened during a merge. Fix it by applying the missing part
again.
Link: https://lkml.kernel.org/r/20250909022238.2989-3-sj@kernel.org
Link: https://lore.kernel.org/20250706193207.39810-3-sj@kernel.org [1]
Fixes: 2780505ec2b4 ("samples/damon/prcl: fix boot time enable crash")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
samples/damon/prcl.c | 3 +++
1 file changed, 3 insertions(+)
--- a/samples/damon/prcl.c~samples-damon-prcl-avoid-starting-damon-before-initialization
+++ a/samples/damon/prcl.c
@@ -137,6 +137,9 @@ static int damon_sample_prcl_enable_stor
if (enabled == is_enabled)
return 0;
+ if (!init_called)
+ return 0;
+
if (enabled) {
err = damon_sample_prcl_start();
if (err)
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-zswap-store-page_size-compression-failed-page-as-is.patch
mm-zswap-store-page_size-compression-failed-page-as-is-fix.patch
mm-zswap-store-page_size-compression-failed-page-as-is-v5.patch
mm-zswap-store-page_size-compression-failed-page-as-is-fix-2.patch
mm-damon-core-add-damon_ctx-addr_unit.patch
mm-damon-paddr-support-addr_unit-for-access-monitoring.patch
mm-damon-paddr-support-addr_unit-for-damos_pageout.patch
mm-damon-paddr-support-addr_unit-for-damos_lru_prio.patch
mm-damon-paddr-support-addr_unit-for-migrate_hotcold.patch
mm-damon-paddr-support-addr_unit-for-damos_stat.patch
mm-damon-sysfs-implement-addr_unit-file-under-context-dir.patch
docs-mm-damon-design-document-address-unit-parameter.patch
docs-admin-guide-mm-damon-usage-document-addr_unit-file.patch
docs-abi-damon-document-addr_unit-file.patch
The quilt patch titled
Subject: samples/damon/wsse: avoid starting DAMON before initialization
has been removed from the -mm tree. Its filename was
samples-damon-wsse-avoid-starting-damon-before-initialization.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: samples/damon/wsse: avoid starting DAMON before initialization
Date: Mon, 8 Sep 2025 19:22:36 -0700
Patch series "samples/damon: fix boot time enable handling fixup merge
mistakes".
The first three patches of the patch series "mm/damon: fix misc bugs in
DAMON modules" [1] tried to fix boot-time enabling issues of the DAMON
sample modules. The issue is that the modules can crash if they are
enabled before DAMON is initialized, for example via boot-time parameter
options. The three patches fixed this by avoiding starting DAMON before
the module initialization phase.
However, probably by a mistake during a merge, only half of each change
was merged, and the part that avoids starting DAMON before the module is
initialized was missed. So the problem is not solved and the modules can
still crash if enabled before DAMON is initialized. Fix this by applying
the unmerged parts again.
Note that the broken commits were merged into 6.17-rc1 and also
backported to the relevant stable kernels, so this series also needs to
be merged into the stable kernels. Hence Cc-ing stable@.
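For reference, this is roughly the pattern the three fixes restore (a
hedged sketch modelled on the wsse diff below; the parameter-handling
details are approximated and the mtier and prcl variants are analogous):

static bool init_called;
static bool enabled __read_mostly;

static int damon_sample_wsse_enable_store(const char *val,
		const struct kernel_param *kp)
{
	bool is_enabled = enabled;
	int err;

	err = kstrtobool(val, &enabled);
	if (err)
		return err;

	if (enabled == is_enabled)
		return 0;

	if (enabled) {
		/*
		 * A boot-time "enabled" parameter is parsed before
		 * module_init() runs, so defer the start to the init
		 * function below instead of starting DAMON too early.
		 */
		if (!init_called)
			return 0;
		err = damon_sample_wsse_start();
		if (err)
			enabled = false;
		return err;
	}
	damon_sample_wsse_stop();
	return 0;
}

static int __init damon_sample_wsse_init(void)
{
	int err = 0;

	init_called = true;
	if (enabled) {
		err = damon_sample_wsse_start();
		if (err)
			enabled = false;
	}
	return err;
}
module_init(damon_sample_wsse_init);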
This patch (of 3):
Commit 0ed1165c3727 ("samples/damon/wsse: fix boot time enable handling")
somehow applied the original patch [2] incompletely: it is missing the
part that avoids starting DAMON before module initialization. Probably a
mistake happened during a merge. Fix it by applying the missing part
again.
Link: https://lkml.kernel.org/r/20250909022238.2989-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20250909022238.2989-2-sj@kernel.org
Link: https://lkml.kernel.org/r/20250706193207.39810-1-sj@kernel.org [1]
Link: https://lore.kernel.org/20250706193207.39810-2-sj@kernel.org [2]
Fixes: 0ed1165c3727 ("samples/damon/wsse: fix boot time enable handling")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
samples/damon/wsse.c | 3 +++
1 file changed, 3 insertions(+)
--- a/samples/damon/wsse.c~samples-damon-wsse-avoid-starting-damon-before-initialization
+++ a/samples/damon/wsse.c
@@ -118,6 +118,9 @@ static int damon_sample_wsse_enable_stor
return 0;
if (enabled) {
+ if (!init_called)
+ return 0;
+
err = damon_sample_wsse_start();
if (err)
enabled = false;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-zswap-store-page_size-compression-failed-page-as-is.patch
mm-zswap-store-page_size-compression-failed-page-as-is-fix.patch
mm-zswap-store-page_size-compression-failed-page-as-is-v5.patch
mm-zswap-store-page_size-compression-failed-page-as-is-fix-2.patch
mm-damon-core-add-damon_ctx-addr_unit.patch
mm-damon-paddr-support-addr_unit-for-access-monitoring.patch
mm-damon-paddr-support-addr_unit-for-damos_pageout.patch
mm-damon-paddr-support-addr_unit-for-damos_lru_prio.patch
mm-damon-paddr-support-addr_unit-for-migrate_hotcold.patch
mm-damon-paddr-support-addr_unit-for-damos_stat.patch
mm-damon-sysfs-implement-addr_unit-file-under-context-dir.patch
docs-mm-damon-design-document-address-unit-parameter.patch
docs-admin-guide-mm-damon-usage-document-addr_unit-file.patch
docs-abi-damon-document-addr_unit-file.patch
The quilt patch titled
Subject: mm: folio_may_be_lru_cached() unless folio_test_large()
has been removed from the -mm tree. Its filename was
mm-folio_may_be_lru_cached-unless-folio_test_large.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: folio_may_be_lru_cached() unless folio_test_large()
Date: Mon, 8 Sep 2025 15:23:15 -0700 (PDT)
mm/swap.c and mm/mlock.c agree to drain any per-CPU batch as soon as a
large folio is added: so collect_longterm_unpinnable_folios() just wastes
effort when calling lru_add_drain[_all]() on a large folio.
But although there is good reason not to batch up PMD-sized folios, we
might well benefit from batching a small number of low-order mTHPs (though
unclear how that "small number" limitation will be implemented).
So ask if folio_may_be_lru_cached() rather than !folio_test_large(), to
insulate those particular checks from future change. Name preferred to
"folio_is_batchable" because large folios can well be put on a batch: it's
just the per-CPU LRU caches, drained much later, which need care.
Marked for stable, to counter the increase in lru_add_drain_all()s from
"mm/gup: check ref_count instead of lru before migration".
Link: https://lkml.kernel.org/r/57d2eaf8-3607-f318-e0c5-be02dce61ad0@google.com
Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)kernel.org>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Keir Fraser <keirf(a)google.com>
Cc: Konstantin Khlebnikov <koct9i(a)gmail.com>
Cc: Li Zhe <lizhe.67(a)bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Shivank Garg <shivankg(a)amd.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: yangge <yangge1116(a)126.com>
Cc: Yuanchu Xie <yuanchu(a)google.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/swap.h | 10 ++++++++++
mm/gup.c | 4 ++--
mm/mlock.c | 6 +++---
mm/swap.c | 2 +-
4 files changed, 16 insertions(+), 6 deletions(-)
--- a/include/linux/swap.h~mm-folio_may_be_lru_cached-unless-folio_test_large
+++ a/include/linux/swap.h
@@ -385,6 +385,16 @@ void folio_add_lru_vma(struct folio *, s
void mark_page_accessed(struct page *);
void folio_mark_accessed(struct folio *);
+static inline bool folio_may_be_lru_cached(struct folio *folio)
+{
+ /*
+ * Holding PMD-sized folios in per-CPU LRU cache unbalances accounting.
+ * Holding small numbers of low-order mTHP folios in per-CPU LRU cache
+ * will be sensible, but nobody has implemented and tested that yet.
+ */
+ return !folio_test_large(folio);
+}
+
extern atomic_t lru_disable_count;
static inline bool lru_cache_disabled(void)
--- a/mm/gup.c~mm-folio_may_be_lru_cached-unless-folio_test_large
+++ a/mm/gup.c
@@ -2307,13 +2307,13 @@ static unsigned long collect_longterm_un
continue;
}
- if (drained == 0 &&
+ if (drained == 0 && folio_may_be_lru_cached(folio) &&
folio_ref_count(folio) !=
folio_expected_ref_count(folio) + 1) {
lru_add_drain();
drained = 1;
}
- if (drained == 1 &&
+ if (drained == 1 && folio_may_be_lru_cached(folio) &&
folio_ref_count(folio) !=
folio_expected_ref_count(folio) + 1) {
lru_add_drain_all();
--- a/mm/mlock.c~mm-folio_may_be_lru_cached-unless-folio_test_large
+++ a/mm/mlock.c
@@ -255,7 +255,7 @@ void mlock_folio(struct folio *folio)
folio_get(folio);
if (!folio_batch_add(fbatch, mlock_lru(folio)) ||
- folio_test_large(folio) || lru_cache_disabled())
+ !folio_may_be_lru_cached(folio) || lru_cache_disabled())
mlock_folio_batch(fbatch);
local_unlock(&mlock_fbatch.lock);
}
@@ -278,7 +278,7 @@ void mlock_new_folio(struct folio *folio
folio_get(folio);
if (!folio_batch_add(fbatch, mlock_new(folio)) ||
- folio_test_large(folio) || lru_cache_disabled())
+ !folio_may_be_lru_cached(folio) || lru_cache_disabled())
mlock_folio_batch(fbatch);
local_unlock(&mlock_fbatch.lock);
}
@@ -299,7 +299,7 @@ void munlock_folio(struct folio *folio)
*/
folio_get(folio);
if (!folio_batch_add(fbatch, folio) ||
- folio_test_large(folio) || lru_cache_disabled())
+ !folio_may_be_lru_cached(folio) || lru_cache_disabled())
mlock_folio_batch(fbatch);
local_unlock(&mlock_fbatch.lock);
}
--- a/mm/swap.c~mm-folio_may_be_lru_cached-unless-folio_test_large
+++ a/mm/swap.c
@@ -192,7 +192,7 @@ static void __folio_batch_add_and_move(s
local_lock(&cpu_fbatches.lock);
if (!folio_batch_add(this_cpu_ptr(fbatch), folio) ||
- folio_test_large(folio) || lru_cache_disabled())
+ !folio_may_be_lru_cached(folio) || lru_cache_disabled())
folio_batch_move_lru(this_cpu_ptr(fbatch), move_fn);
if (disable_irq)
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-lru_add_drain_all-do-local-lru_add_drain-first.patch
The quilt patch titled
Subject: mm: revert "mm: vmscan.c: fix OOM on swap stress test"
has been removed from the -mm tree. Its filename was
mm-revert-mm-vmscanc-fix-oom-on-swap-stress-test.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: revert "mm: vmscan.c: fix OOM on swap stress test"
Date: Mon, 8 Sep 2025 15:21:12 -0700 (PDT)
This reverts commit 0885ef470560: that was a fix to the reverted
33dfe9204f29b415bbc0abb1a50642d1ba94f5e9.
Link: https://lkml.kernel.org/r/aa0e9d67-fbcd-9d79-88a1-641dfbe1d9d1@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)kernel.org>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Keir Fraser <keirf(a)google.com>
Cc: Konstantin Khlebnikov <koct9i(a)gmail.com>
Cc: Li Zhe <lizhe.67(a)bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Shivank Garg <shivankg(a)amd.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: yangge <yangge1116(a)126.com>
Cc: Yuanchu Xie <yuanchu(a)google.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-revert-mm-vmscanc-fix-oom-on-swap-stress-test
+++ a/mm/vmscan.c
@@ -4507,7 +4507,7 @@ static bool sort_folio(struct lruvec *lr
}
/* ineligible */
- if (!folio_test_lru(folio) || zone > sc->reclaim_idx) {
+ if (zone > sc->reclaim_idx) {
gen = folio_inc_gen(lruvec, folio, false);
list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
return true;
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-lru_add_drain_all-do-local-lru_add_drain-first.patch
The quilt patch titled
Subject: mm/gup: local lru_add_drain() to avoid lru_add_drain_all()
has been removed from the -mm tree. Its filename was
mm-gup-local-lru_add_drain-to-avoid-lru_add_drain_all.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/gup: local lru_add_drain() to avoid lru_add_drain_all()
Date: Mon, 8 Sep 2025 15:16:53 -0700 (PDT)
In many cases, if collect_longterm_unpinnable_folios() does need to drain
the LRU cache to release a reference, the cache in question is on this
same CPU, and much more efficiently drained by a preliminary local
lru_add_drain(), than the later cross-CPU lru_add_drain_all().
Marked for stable, to counter the increase in lru_add_drain_all()s from
"mm/gup: check ref_count instead of lru before migration". Note for clean
backports: can take 6.16 commit a03db236aebf ("gup: optimize longterm
pin_user_pages() for large folio") first.
Link: https://lkml.kernel.org/r/66f2751f-283e-816d-9530-765db7edc465@google.com
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)kernel.org>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Keir Fraser <keirf(a)google.com>
Cc: Konstantin Khlebnikov <koct9i(a)gmail.com>
Cc: Li Zhe <lizhe.67(a)bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Shivank Garg <shivankg(a)amd.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: yangge <yangge1116(a)126.com>
Cc: Yuanchu Xie <yuanchu(a)google.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
--- a/mm/gup.c~mm-gup-local-lru_add_drain-to-avoid-lru_add_drain_all
+++ a/mm/gup.c
@@ -2287,8 +2287,8 @@ static unsigned long collect_longterm_un
struct pages_or_folios *pofs)
{
unsigned long collected = 0;
- bool drain_allow = true;
struct folio *folio;
+ int drained = 0;
long i = 0;
for (folio = pofs_get_folio(pofs, i); folio;
@@ -2307,10 +2307,17 @@ static unsigned long collect_longterm_un
continue;
}
- if (drain_allow && folio_ref_count(folio) !=
- folio_expected_ref_count(folio) + 1) {
+ if (drained == 0 &&
+ folio_ref_count(folio) !=
+ folio_expected_ref_count(folio) + 1) {
+ lru_add_drain();
+ drained = 1;
+ }
+ if (drained == 1 &&
+ folio_ref_count(folio) !=
+ folio_expected_ref_count(folio) + 1) {
lru_add_drain_all();
- drain_allow = false;
+ drained = 2;
}
if (!folio_isolate_lru(folio))
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-lru_add_drain_all-do-local-lru_add_drain-first.patch
The quilt patch titled
Subject: mm/gup: check ref_count instead of lru before migration
has been removed from the -mm tree. Its filename was
mm-gup-check-ref_count-instead-of-lru-before-migration.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/gup: check ref_count instead of lru before migration
Date: Mon, 8 Sep 2025 15:15:03 -0700 (PDT)
Patch series "mm: better GUP pin lru_add_drain_all()", v2.
Series of lru_add_drain_all()-related patches, arising from recent mm/gup
migration report from Will Deacon.
This patch (of 5):
Will Deacon reports:-
When taking a longterm GUP pin via pin_user_pages(),
__gup_longterm_locked() tries to migrate target folios that should not be
longterm pinned, for example because they reside in a CMA region or
movable zone. This is done by first pinning all of the target folios
anyway, collecting all of the longterm-unpinnable target folios into a
list, dropping the pins that were just taken and finally handing the list
off to migrate_pages() for the actual migration.
It is critically important that no unexpected references are held on the
folios being migrated, otherwise the migration will fail and
pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is
relatively easy to observe migration failures when running pKVM (which
uses pin_user_pages() on crosvm's virtual address space to resolve stage-2
page faults from the guest) on a 6.15-based Pixel 6 device and this
results in the VM terminating prematurely.
In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its
mapping of guest memory prior to the pinning. Subsequently, when
pin_user_pages() walks the page-table, the relevant 'pte' is not present
and so the faulting logic allocates a new folio, mlocks it with
mlock_folio() and maps it in the page-table.
Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch
by pagevec"), mlock/munlock operations on a folio (formerly page), are
deferred. For example, mlock_folio() takes an additional reference on the
target folio before placing it into a per-cpu 'folio_batch' for later
processing by mlock_folio_batch(), which drops the refcount once the
operation is complete. Processing of the batches is coupled with the LRU
batch logic and can be forcefully drained with lru_add_drain_all() but as
long as a folio remains unprocessed on the batch, its refcount will be
elevated.
This deferred batching therefore interacts poorly with the pKVM pinning
scenario as we can find ourselves in a situation where the migration code
fails to migrate a folio due to the elevated refcount from the pending
mlock operation.
Hugh Dickins adds:-
!folio_test_lru() has never been a very reliable way to tell if an
lru_add_drain_all() is worth calling, to remove LRU cache references to
make the folio migratable: the LRU flag may be set even while the folio is
held with an extra reference in a per-CPU LRU cache.
5.18 commit 2fbb0c10d1e8 may have made it more unreliable. Then 6.11
commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding
to LRU batch") tried to make it reliable, by moving LRU flag clearing; but
missed the mlock/munlock batches, so still unreliable as reported.
And it turns out to be difficult to extend 33dfe9204f29's LRU flag
clearing to the mlock/munlock batches: if they do benefit from batching,
mlock/munlock cannot be so effective when easily suppressed while !LRU.
Instead, switch to an expected ref_count check, which was more reliable
all along: some more false positives (unhelpful drains) than before, and
never a guarantee that the folio will prove migratable, but better.
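In code terms, the check the series switches to amounts to the following
(a hedged sketch; the helper name is hypothetical and the actual change
is open-coded in the diff below):

/*
 * The "+ 1" accounts for the reference the GUP caller itself holds on
 * the folio.  Any surplus suggests another reference, e.g. from a
 * per-CPU LRU or mlock/munlock batch, which a drain may release before
 * migration is attempted.
 */
static bool folio_may_have_extra_refs(struct folio *folio)
{
	return folio_ref_count(folio) != folio_expected_ref_count(folio) + 1;
}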
Note on PG_private_2: ceph and nfs are still using the deprecated
PG_private_2 flag, with the aid of netfs and filemap support functions.
Although it is consistently matched by an increment of folio ref_count,
folio_expected_ref_count() intentionally does not recognize it, and ceph
folio migration currently depends on that for PG_private_2 folios to be
rejected. New references to the deprecated flag are discouraged, so do
not add it into the collect_longterm_unpinnable_folios() calculation: but
longterm pinning of transiently PG_private_2 ceph and nfs folios (an
uncommon case) may invoke a redundant lru_add_drain_all(). And this makes
easy the backport to earlier releases: up to and including 6.12, btrfs
also used PG_private_2, but without a ref_count increment.
Note for stable backports: requires 6.16 commit 86ebd50224c0 ("mm:
add folio_expected_ref_count() for reference count calculation").
Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com
Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com
Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reported-by: Will Deacon <will(a)kernel.org>
Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/
Acked-by: Kiryl Shutsemau <kas(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)kernel.org>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Keir Fraser <keirf(a)google.com>
Cc: Konstantin Khlebnikov <koct9i(a)gmail.com>
Cc: Li Zhe <lizhe.67(a)bytedance.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Shivank Garg <shivankg(a)amd.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Xu <weixugc(a)google.com>
Cc: yangge <yangge1116(a)126.com>
Cc: Yuanchu Xie <yuanchu(a)google.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/gup.c~mm-gup-check-ref_count-instead-of-lru-before-migration
+++ a/mm/gup.c
@@ -2307,7 +2307,8 @@ static unsigned long collect_longterm_un
continue;
}
- if (!folio_test_lru(folio) && drain_allow) {
+ if (drain_allow && folio_ref_count(folio) !=
+ folio_expected_ref_count(folio) + 1) {
lru_add_drain_all();
drain_allow = false;
}
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-lru_add_drain_all-do-local-lru_add_drain-first.patch
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 21cc2b5c5062a256ae9064442d37ebbc23f5aef7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025091332-pretzel-gating-6744@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 21cc2b5c5062a256ae9064442d37ebbc23f5aef7 Mon Sep 17 00:00:00 2001
From: Jeongjun Park <aha310510(a)gmail.com>
Date: Sun, 24 Aug 2025 03:21:15 +0900
Subject: [PATCH] mm/hugetlb: add missing hugetlb_lock in
__unmap_hugepage_range()
When restoring a reservation for an anonymous page, we need to check
whether we are freeing a surplus page. However, __unmap_hugepage_range()
causes a data race because it reads h->surplus_huge_pages without the
protection of hugetlb_lock.
In addition, adjust_reservation is a boolean variable that indicates
whether the reservation for the anonymous page in each folio should be
restored, so it should be reset to false on each iteration of the loop.
However, it is only initialized once, where it is defined. This means
that once adjust_reservation is set to true within the loop, reservations
for anonymous pages will be restored unconditionally in all subsequent
iterations, regardless of the folio's state.
To fix this, add the missing hugetlb_lock, unlock the page_table_lock
earlier so that hugetlb_lock is not taken while holding the
page_table_lock, and initialize adjust_reservation to false on each
iteration of the loop.
Link: https://lkml.kernel.org/r/20250823182115.1193563-1-aha310510@gmail.com
Fixes: df7a6d1f6405 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
Reported-by: syzbot+417aeb05fd190f3a6da9(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=417aeb05fd190f3a6da9
Reviewed-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 753f99b4c718..eed59cfb5d21 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5851,7 +5851,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
spinlock_t *ptl;
struct hstate *h = hstate_vma(vma);
unsigned long sz = huge_page_size(h);
- bool adjust_reservation = false;
+ bool adjust_reservation;
unsigned long last_addr_mask;
bool force_flush = false;
@@ -5944,6 +5944,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
sz);
hugetlb_count_sub(pages_per_huge_page(h), mm);
hugetlb_remove_rmap(folio);
+ spin_unlock(ptl);
/*
* Restore the reservation for anonymous page, otherwise the
@@ -5951,14 +5952,16 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
* If there we are freeing a surplus, do not set the restore
* reservation bit.
*/
+ adjust_reservation = false;
+
+ spin_lock_irq(&hugetlb_lock);
if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
folio_test_anon(folio)) {
folio_set_hugetlb_restore_reserve(folio);
/* Reservation to be adjusted after the spin lock */
adjust_reservation = true;
}
-
- spin_unlock(ptl);
+ spin_unlock_irq(&hugetlb_lock);
/*
* Adjust the reservation for the region that will have the