- Linux-stable-mirror - lists.linaro.org

[PATCH] platform/x86: ideapad-laptop: use usleep_range() for EC polling

by Rong Zhang

It was reported that ideapad-laptop sometimes causes some recent (since 2024) Lenovo ThinkBook models shut down when: - suspending/resuming - closing/opening the lid - (dis)connecting a charger - reading/writing some sysfs properties, e.g., fan_mode, touchpad - pressing down some Fn keys, e.g., Brightness Up/Down (Fn+F5/F6) - (seldom) loading the kmod The issue has existed since the launch day of such models, and there have been some out-of-tree workarounds (see Link:) for the issue. One disables some functionalities, while another one simply shortens IDEAPAD_EC_TIMEOUT. The disabled functionalities have read_ec_data() in their call chains, which calls schedule() between each poll. It turns out that these models suffer from the indeterminacy of schedule() because of their low tolerance for being polled too frequently. Sometimes schedule() returns too soon due to the lack of ready tasks, causing the margin between two polls to be too short. In this case, the command is somehow aborted, and too many subsequent polls (they poll for "nothing!") may eventually break the state machine in the EC, resulting in a hard shutdown. This explains why shortening IDEAPAD_EC_TIMEOUT works around the issue - it reduces the total number of polls sent to the EC. Even when it doesn't lead to a shutdown, frequent polls may also disturb the ongoing operation and notably delay (+ 10-20ms) the availability of EC response. This phenomenon is unlikely to be exclusive to the models mentioned above, so dropping the schedule() manner should also slightly improve the responsiveness of various models. Fix these issues by migrating to usleep_range(150, 300). The interval is chosen to add some margin to the minimal 50us and considering EC responses are usually available after 150-2500us based on my test. It should be enough to fix these issues on all models subject to the EC bug without introducing latency on other models. Tested on ThinkBook 14 G7+ ASP and solved both issues. No regression was introduced in the test on a model without the EC bug (ThinkBook X IMH, thanks Eric). Link: https://github.com/ty2/ideapad-laptop-tb2024g6plus/commit/6c5db18c9e8109873… Link: https://github.com/ferstar/ideapad-laptop-tb/commit/42d1e68e5009529d31bd23f… Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218771 Fixes: 6a09f21dd1e2 ("ideapad: add ACPI helpers") Cc: stable(a)vger.kernel.org Tested-by: Eric Long <i(a)hack3r.moe> Signed-off-by: Rong Zhang <i(a)rong.moe> --- drivers/platform/x86/ideapad-laptop.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/platform/x86/ideapad-laptop.c b/drivers/platform/x86/ideapad-laptop.c index ede483573fe0..b5e4da6a6779 100644 --- a/drivers/platform/x86/ideapad-laptop.c +++ b/drivers/platform/x86/ideapad-laptop.c @@ -15,6 +15,7 @@ #include <linux/bug.h> #include <linux/cleanup.h> #include <linux/debugfs.h> +#include <linux/delay.h> #include <linux/device.h> #include <linux/dmi.h> #include <linux/i8042.h> @@ -267,6 +268,20 @@ static void ideapad_shared_exit(struct ideapad_private *priv) */ #define IDEAPAD_EC_TIMEOUT 200 /* in ms */ +/* + * Some models (e.g., ThinkBook since 2024) have a low tolerance for being + * polled too frequently. Doing so may break the state machine in the EC, + * resulting in a hard shutdown. + * + * It is also observed that frequent polls may disturb the ongoing operation + * and notably delay the availability of EC response. + * + * These values are used as the delay before the first poll and the interval + * between subsequent polls to solve the above issues. + */ +#define IDEAPAD_EC_POLL_MIN_US 150 +#define IDEAPAD_EC_POLL_MAX_US 300 + static int eval_int(acpi_handle handle, const char *name, unsigned long *res) { unsigned long long result; @@ -383,7 +398,7 @@ static int read_ec_data(acpi_handle handle, unsigned long cmd, unsigned long *da end_jiffies = jiffies + msecs_to_jiffies(IDEAPAD_EC_TIMEOUT) + 1; while (time_before(jiffies, end_jiffies)) { - schedule(); + usleep_range(IDEAPAD_EC_POLL_MIN_US, IDEAPAD_EC_POLL_MAX_US); err = eval_vpcr(handle, 1, &val); if (err) @@ -414,7 +429,7 @@ static int write_ec_cmd(acpi_handle handle, unsigned long cmd, unsigned long dat end_jiffies = jiffies + msecs_to_jiffies(IDEAPAD_EC_TIMEOUT) + 1; while (time_before(jiffies, end_jiffies)) { - schedule(); + usleep_range(IDEAPAD_EC_POLL_MIN_US, IDEAPAD_EC_POLL_MAX_US); err = eval_vpcr(handle, 1, &val); if (err) base-commit: a5806cd506af5a7c19bcd596e4708b5c464bfd21 -- 2.49.0

1 month

7
6
0 0

Unplayable framerates in game but specific kernel versions work, maybe amdgpu problem

by machion＠disroot.org

Hello kernel/driver developers, I hope, with my information it's possible to find a bug/problem in the kernel. Otherwise I am sorry, that I disturbed you. I only use LTS kernels, but I can narrow it down to a hand full of them, where it works. The PC: Manjaro Stable/Cinnamon/X11/AMD Ryzen 5 2600/Radeon HD 7790/8GB RAM I already asked the Manjaro community, but with no luck. The game: Hellpoint (GOG Linux latest version, Unity3D-Engine v2021), uses vulkan --- I came a long road of kernels. I had many versions of 5.4, 5.10, 5.15, 6.1 and 6.6 and and the game was always unplayable, because the frames where around 1fps (performance of PC is not the problem). I asked the mesa and cinnamon team for help in the past, but also with no luck. It never worked, till on 2025-03-29 when I installed 6.12.19 for the first time and it worked! But it only worked with 6.12.19, 6.12.20 and 6.12.21 When I updated to 6.12.25, it was back to unplayable. For testing I installed 6.14.4 with the same result. It doesn't work. I also compared file /proc/config.gz of both kernels (6.12.21 <> 6.14.4), but can't seem to see drastic changes to the graphical part. I presume it has something to do with amdgpu. If you need more information, I would be happy to help. Kind regards, Marion

1 month

2
4
0 0

[PATCH 0/2] media: qcom: camss: Fix two bugs in mainline

by Bryan O'Donoghue

Two bug fixes here. First up SDM630/SDM660 hasn't been probing because moving the CSIPHY gen2 init sequence into a common location also moved the default case of the switch statement which rejects non-gen2 devices. Second is a fix for a very longstanding bug which is a race-condition between fully enumerating /dev/videoX devices along with all of their dependent data-structures and gating user-space access to those devices. Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org> --- Bryan O'Donoghue (2): media: qcom: camss: csiphy-3ph: Fix inadvertent dropping of SDM660/SDM670 phy init media: qcom: camss: vfe: Fix registration sequencing bug drivers/media/platform/qcom/camss/camss-csiphy-3ph-1-0.c | 3 +-- drivers/media/platform/qcom/camss/camss-vfe.c | 8 ++++++++ drivers/media/platform/qcom/camss/camss-vfe.h | 1 + 3 files changed, 10 insertions(+), 2 deletions(-) --- base-commit: 8666245114d979b963dc23894a03c74ecab8a7a6 change-id: 20250610-linux-next-25-05-30-daily-reviews-47ef54eee7ea Best regards, -- Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>

1 month

3
8
0 0

[PATCH v2 1/4] wifi: ath12k: fix dest ring-buffer corruption

by Johan Hovold

Add the missing memory barrier to make sure that destination ring descriptors are read after the head pointers to avoid using stale data on weakly ordered architectures like aarch64. The barrier is added to the ath12k_hal_srng_access_begin() helper for symmetry with follow-on fixes for source ring buffer corruption which will add barriers to ath12k_hal_srng_access_end(). Note that this may fix the empty descriptor issue recently worked around by commit 51ad34a47e9f ("wifi: ath12k: Add drop descriptor handling for monitor ring"). Tested-on: WCN7850 hw2.0 WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices") Cc: stable(a)vger.kernel.org # 6.3 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> --- drivers/net/wireless/ath/ath12k/ce.c | 3 --- drivers/net/wireless/ath/ath12k/hal.c | 17 ++++++++++++++--- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/ath/ath12k/ce.c b/drivers/net/wireless/ath/ath12k/ce.c index 740586fe49d1..b66d23d6b2bd 100644 --- a/drivers/net/wireless/ath/ath12k/ce.c +++ b/drivers/net/wireless/ath/ath12k/ce.c @@ -343,9 +343,6 @@ static int ath12k_ce_completed_recv_next(struct ath12k_ce_pipe *pipe, goto err; } - /* Make sure descriptor is read after the head pointer. */ - dma_rmb(); - *nbytes = ath12k_hal_ce_dst_status_get_length(desc); *skb = pipe->dest_ring->skb[sw_index]; diff --git a/drivers/net/wireless/ath/ath12k/hal.c b/drivers/net/wireless/ath/ath12k/hal.c index 91d5126ca149..9eea13ed5565 100644 --- a/drivers/net/wireless/ath/ath12k/hal.c +++ b/drivers/net/wireless/ath/ath12k/hal.c @@ -2126,13 +2126,24 @@ void *ath12k_hal_srng_src_get_next_reaped(struct ath12k_base *ab, void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng) { + u32 hp; + lockdep_assert_held(&srng->lock); - if (srng->ring_dir == HAL_SRNG_DIR_SRC) + if (srng->ring_dir == HAL_SRNG_DIR_SRC) { srng->u.src_ring.cached_tp = *(volatile u32 *)srng->u.src_ring.tp_addr; - else - srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr); + } else { + hp = READ_ONCE(*srng->u.dst_ring.hp_addr); + + if (hp != srng->u.dst_ring.cached_hp) { + srng->u.dst_ring.cached_hp = hp; + /* Make sure descriptor is read after the head + * pointer. + */ + dma_rmb(); + } + } } /* Update cached ring head/tail pointers to HW. ath12k_hal_srng_access_begin() -- 2.49.0

1 month

4
6
0 0

[PATCH] erofs: remove unused trace event erofs_destroy_inode

by Gao Xiang

The trace event `erofs_destroy_inode` was added but remains unused. This unused event contributes approximately 5KB to the kernel module size. Reported-by: Steven Rostedt <rostedt(a)goodmis.org> Closes: https://lore.kernel.org/r/20250612224906.15000244@batman.local.home Fixes: 13f06f48f7bf ("staging: erofs: support tracepoint") Cc: stable(a)vger.kernel.org Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com> --- include/trace/events/erofs.h | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/include/trace/events/erofs.h b/include/trace/events/erofs.h index a5f4b9234f46..dad7360f42f9 100644 --- a/include/trace/events/erofs.h +++ b/include/trace/events/erofs.h @@ -211,24 +211,6 @@ TRACE_EVENT(erofs_map_blocks_exit, show_mflags(__entry->mflags), __entry->ret) ); -TRACE_EVENT(erofs_destroy_inode, - TP_PROTO(struct inode *inode), - - TP_ARGS(inode), - - TP_STRUCT__entry( - __field( dev_t, dev ) - __field( erofs_nid_t, nid ) - ), - - TP_fast_assign( - __entry->dev = inode->i_sb->s_dev; - __entry->nid = EROFS_I(inode)->nid; - ), - - TP_printk("dev = (%d,%d), nid = %llu", show_dev_nid(__entry)) -); - #endif /* _TRACE_EROFS_H */ /* This part must be outside protection */ -- 2.43.5

1 month

1
0
0 0

[PATCH] net/9p: Fix buffer overflow in USB transport layer

by Yuhao Jiang

A buffer overflow vulnerability exists in the USB 9pfs transport layer where inconsistent size validation between packet header parsing and actual data copying allows a malicious USB host to overflow heap buffers. The issue occurs because: - usb9pfs_rx_header() validates only the declared size in packet header - usb9pfs_rx_complete() uses req->actual (actual received bytes) for memcpy This allows an attacker to craft packets with small declared size (bypassing validation) but large actual payload (triggering overflow in memcpy). Add validation in usb9pfs_rx_complete() to ensure req->actual does not exceed the buffer capacity before copying data. Reported-by: Yuhao Jiang <danisjiang(a)gmail.com> Fixes: a3be076dc174 ("net/9p/usbg: Add new usb gadget function transport") Cc: stable(a)vger.kernel.org Signed-off-by: Yuhao Jiang <danisjiang(a)gmail.com> --- net/9p/trans_usbg.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/net/9p/trans_usbg.c b/net/9p/trans_usbg.c index 6b694f117aef..047a2862fc84 100644 --- a/net/9p/trans_usbg.c +++ b/net/9p/trans_usbg.c @@ -242,6 +242,15 @@ static void usb9pfs_rx_complete(struct usb_ep *ep, struct usb_request *req) if (!p9_rx_req) return; + /* Validate actual received size against buffer capacity */ + if (req->actual > p9_rx_req->rc.capacity) { + dev_err(&cdev->gadget->dev, + "received data size %u exceeds buffer capacity %zu\n", + req->actual, p9_rx_req->rc.capacity); + p9_req_put(usb9pfs->client, p9_rx_req); + return; + } + memcpy(p9_rx_req->rc.sdata, req->buf, req->actual); p9_rx_req->rc.size = req->actual; -- 2.43.0

1 month

3
4
0 0

[PATCH 1/2] Revert "usb: gadget: u_serial: Add null pointer check in gs_start_io"

by Kuen-Han Tsai

This reverts commit ffd603f214237e250271162a5b325c6199a65382. Commit ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in gs_start_io") adds null pointer checks at the beginning of the gs_start_io() function to prevent a null pointer dereference. However, these checks are redundant because the function's comment already requires callers to hold the port_lock and ensure port.tty and port_usb are not null. All existing callers already follow these rules. The true cause of the null pointer dereference is a race condition. When gs_start_io() calls either gs_start_rx() or gs_start_tx(), the port_lock is temporarily released for usb_ep_queue(). This allows port.tty and port_usb to be cleared. Cc: stable(a)vger.kernel.org Fixes: ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in gs_start_io") Signed-off-by: Kuen-Han Tsai <khtsai(a)google.com> --- drivers/usb/gadget/function/u_serial.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c index ab544f6824be..c043bdc30d8a 100644 --- a/drivers/usb/gadget/function/u_serial.c +++ b/drivers/usb/gadget/function/u_serial.c @@ -544,20 +544,16 @@ static int gs_alloc_requests(struct usb_ep *ep, struct list_head *head, static int gs_start_io(struct gs_port *port) { struct list_head *head = &port->read_pool; - struct usb_ep *ep; + struct usb_ep *ep = port->port_usb->out; int status; unsigned started; - if (!port->port_usb || !port->port.tty) - return -EIO; - /* Allocate RX and TX I/O buffers. We can't easily do this much * earlier (with GFP_KERNEL) because the requests are coupled to * endpoints, as are the packet sizes we'll be using. Different * configurations may use different endpoints with a given port; * and high speed vs full speed changes packet sizes too. */ - ep = port->port_usb->out; status = gs_alloc_requests(ep, head, gs_read_complete, &port->read_allocated); if (status) -- 2.50.0.rc2.692.g299adb8693-goog

1 month

2
5
0 0

[PATCH] drm/etnaviv: Protect the scheduler's pending list with its lock

by Maíra Canal

Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active") ensured that active jobs are returned to the pending list when extending the timeout. However, it didn't use the pending list's lock to manipulate the list, which causes a race condition as the scheduler's workqueues are running. Hold the lock while manipulating the scheduler's pending list to prevent a race. Cc: stable(a)vger.kernel.org Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active") Signed-off-by: Maíra Canal <mcanal(a)igalia.com> --- Hi, I'm proposing this workaround patch to address the race-condition caused by manipulating the pending list without using its lock. Although I understand this isn't a complete solution (see [1]), it's not reasonable to backport the new DRM stat series [2] to the stable branches. Therefore, I believe the best solution is backporting this fix to the stable branches, which will fix the race and will keep adding the job back to the pending list (which will avoid most memory leaks). [1] https://lore.kernel.org/dri-devel/bcc0ed477f8a6f3bb06665b1756bcb98fb7af871.… [2] https://lore.kernel.org/dri-devel/20250530-sched-skip-reset-v2-0-c40a8d2d8d… Best Regards, - Maíra --- drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 76a3a3e517d8..71e2e6b9d713 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -35,6 +35,7 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job *sched_job) { struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job); + struct drm_gpu_scheduler *sched = sched_job->sched; struct etnaviv_gpu *gpu = submit->gpu; u32 dma_addr, primid = 0; int change; @@ -89,7 +90,9 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job return DRM_GPU_SCHED_STAT_NOMINAL; out_no_timeout: - list_add(&sched_job->list, &sched_job->sched->pending_list); + spin_lock(&sched->job_list_lock); + list_add(&sched_job->list, &sched->pending_list); + spin_unlock(&sched->job_list_lock); return DRM_GPU_SCHED_STAT_NOMINAL; } -- 2.49.0

1 month

3
3
0 0

+ maple_tree-fix-ma_state_prealloc-flag-in-mas_preallocate.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: maple_tree: fix MA_STATE_PREALLOC flag in mas_preallocate() has been added to the -mm mm-hotfixes-unstable branch. Its filename is maple_tree-fix-ma_state_prealloc-flag-in-mas_preallocate.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com> Subject: maple_tree: fix MA_STATE_PREALLOC flag in mas_preallocate() Date: Mon, 16 Jun 2025 14:45:20 -0400 Temporarily clear the preallocation flag when explicitly requesting allocations. Pre-existing allocations are already counted against the request through mas_node_count_gfp(), but the allocations will not happen if the MA_STATE_PREALLOC flag is set. This flag is meant to avoid re-allocating in bulk allocation mode, and to detect issues with preallocation calculations. The MA_STATE_PREALLOC flag should also always be set on zero allocations so that detection of underflow allocations will print a WARN_ON() during consumption. User visible effect of this flaw is a WARN_ON() followed by a null pointer dereference when subsequent requests for larger number of nodes is ignored, such as the vma merge retry in mmap_region() caused by drivers altering the vma flags (which happens in v6.6, at least) Link: https://lkml.kernel.org/r/20250616184521.3382795-3-Liam.Howlett@oracle.com Fixes: 54a611b605901 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reported-by: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com> Reported-by: Hailong Liu <hailong.liu(a)oppo.com> Link: https://lore.kernel.org/all/1652f7eb-a51b-4fee-8058-c73af63bacd1@oppo.com/ Link: https://lore.kernel.org/all/20250428184058.1416274-1-Liam.Howlett@oracle.co… Link: https://lore.kernel.org/all/20250429014754.1479118-1-Liam.Howlett@oracle.co… Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: Hailong Liu <hailong.liu(a)oppo.com> Cc: zhangpeng.00(a)bytedance.com <zhangpeng.00(a)bytedance.com> Cc: Steve Kang <Steve.Kang(a)unisoc.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Sidhartha Kumar <sidhartha.kumar(a)oracle.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- lib/maple_tree.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) --- a/lib/maple_tree.c~maple_tree-fix-ma_state_prealloc-flag-in-mas_preallocate +++ a/lib/maple_tree.c @@ -5527,8 +5527,9 @@ int mas_preallocate(struct ma_state *mas mas->store_type = mas_wr_store_type(&wr_mas); request = mas_prealloc_calc(&wr_mas, entry); if (!request) - return ret; + goto set_flag; + mas->mas_flags &= ~MA_STATE_PREALLOC; mas_node_count_gfp(mas, request, gfp); if (mas_is_err(mas)) { mas_set_alloc_req(mas, 0); @@ -5538,6 +5539,7 @@ int mas_preallocate(struct ma_state *mas return ret; } +set_flag: mas->mas_flags |= MA_STATE_PREALLOC; return ret; } _ Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are maple_tree-fix-ma_state_prealloc-flag-in-mas_preallocate.patch testing-raix-tree-maple-increase-readers-and-reduce-delay-for-faster-machines.patch

1 month

1
0
0 0

[PATCH] drm/v3d: Avoid NULL pointer dereference in `v3d_job_update_stats()`

by Maíra Canal

The following kernel Oops was recently reported by Mesa CI: [ 800.139824] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000588 [ 800.148619] Mem abort info: [ 800.151402] ESR = 0x0000000096000005 [ 800.155141] EC = 0x25: DABT (current EL), IL = 32 bits [ 800.160444] SET = 0, FnV = 0 [ 800.163488] EA = 0, S1PTW = 0 [ 800.166619] FSC = 0x05: level 1 translation fault [ 800.171487] Data abort info: [ 800.174357] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 800.179832] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 800.184873] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 800.190176] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001014c2000 [ 800.196607] [0000000000000588] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 [ 800.205305] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 800.211564] Modules linked in: vc4 snd_soc_hdmi_codec drm_display_helper v3d cec gpu_sched drm_dma_helper drm_shmem_helper drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm i2c_brcmstb snd_timer snd backlight [ 800.234448] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.25+rpt-rpi-v8 #1 Debian 1:6.12.25-1+rpt1 [ 800.244182] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT) [ 800.250005] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 800.256959] pc : v3d_job_update_stats+0x60/0x130 [v3d] [ 800.262112] lr : v3d_job_update_stats+0x48/0x130 [v3d] [ 800.267251] sp : ffffffc080003e60 [ 800.270555] x29: ffffffc080003e60 x28: ffffffd842784980 x27: 0224012000000000 [ 800.277687] x26: ffffffd84277f630 x25: ffffff81012fd800 x24: 0000000000000020 [ 800.284818] x23: ffffff8040238b08 x22: 0000000000000570 x21: 0000000000000158 [ 800.291948] x20: 0000000000000000 x19: ffffff8040238000 x18: 0000000000000000 [ 800.299078] x17: ffffffa8c1bd2000 x16: ffffffc080000000 x15: 0000000000000000 [ 800.306208] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 800.313338] x11: 0000000000000040 x10: 0000000000001a40 x9 : ffffffd83b39757c [ 800.320468] x8 : ffffffd842786420 x7 : 7fffffffffffffff x6 : 0000000000ef32b0 [ 800.327598] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : ffffffd842784980 [ 800.334728] x2 : 0000000000000004 x1 : 0000000000010002 x0 : 000000ba4c0ca382 [ 800.341859] Call trace: [ 800.344294] v3d_job_update_stats+0x60/0x130 [v3d] [ 800.349086] v3d_irq+0x124/0x2e0 [v3d] [ 800.352835] __handle_irq_event_percpu+0x58/0x218 [ 800.357539] handle_irq_event+0x54/0xb8 [ 800.361369] handle_fasteoi_irq+0xac/0x240 [ 800.365458] handle_irq_desc+0x48/0x68 [ 800.369200] generic_handle_domain_irq+0x24/0x38 [ 800.373810] gic_handle_irq+0x48/0xd8 [ 800.377464] call_on_irq_stack+0x24/0x58 [ 800.381379] do_interrupt_handler+0x88/0x98 [ 800.385554] el1_interrupt+0x34/0x68 [ 800.389123] el1h_64_irq_handler+0x18/0x28 [ 800.393211] el1h_64_irq+0x64/0x68 [ 800.396603] default_idle_call+0x3c/0x168 [ 800.400606] do_idle+0x1fc/0x230 [ 800.403827] cpu_startup_entry+0x40/0x50 [ 800.407742] rest_init+0xe4/0xf0 [ 800.410962] start_kernel+0x5e8/0x790 [ 800.414616] __primary_switched+0x80/0x90 [ 800.418622] Code: 8b170277 8b160296 11000421 b9000861 (b9401ac1) [ 800.424707] ---[ end trace 0000000000000000 ]--- [ 800.457313] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- This issue happens when the file descriptor is closed before the jobs submitted by it are completed. When the job completes, we update the global GPU stats and the per-fd GPU stats, which are exposed through fdinfo. If the file descriptor was closed, then the struct `v3d_file_priv` and its stats were already freed and we can't update the per-fd stats. Therefore, if the file descriptor was already closed, don't update the per-fd GPU stats, only update the global ones. Cc: stable(a)vger.kernel.org # v6.12+ Signed-off-by: Maíra Canal <mcanal(a)igalia.com> --- drivers/gpu/drm/v3d/v3d_sched.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 466d28ceee28..5ed676304964 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -199,7 +199,6 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) struct v3d_dev *v3d = job->v3d; struct v3d_file_priv *file = job->file->driver_priv; struct v3d_stats *global_stats = &v3d->queue[queue].stats; - struct v3d_stats *local_stats = &file->stats[queue]; u64 now = local_clock(); unsigned long flags; @@ -209,7 +208,12 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue) else preempt_disable(); - v3d_stats_update(local_stats, now); + /* Don't update the local stats if the file context has already closed */ + if (file) + v3d_stats_update(&file->stats[queue], now); + else + drm_dbg(&v3d->drm, "The file descriptor was closed before job completion\n"); + v3d_stats_update(global_stats, now); if (IS_ENABLED(CONFIG_LOCKDEP)) -- 2.49.0

1 month

2
2
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror