pick_eevdf() may return NULL, which triggers a NULL pointer dereference and a crash when best and curr are both NULL.
There are two cases in which curr is NULL:
- curr is already NULL on entry to pick_eevdf()
- curr is set to NULL because it is not on_rq or not eligible
When we reach the "best = curr" path, se should never be NULL. So when best and curr are both NULL, it is better to set best = se to avoid returning NULL.
I hit the crash below on our server with very low probability and have not been able to reproduce it. I also found similar crash reports from other people on lore, so I believe the issue really exists.
<1>[ 8.607396] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
<1>[ 8.607399] Mem abort info:
<1>[ 8.607400] ESR = 0x0000000096000004
<1>[ 8.607401] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 8.607402] SET = 0, FnV = 0
<1>[ 8.607403] EA = 0, S1PTW = 0
<1>[ 8.607403] FSC = 0x04: level 0 translation fault
<1>[ 8.607404] Data abort info:
<1>[ 8.607404] ISV = 0, ISS = 0x00000004
<1>[ 8.607404] CM = 0, WnR = 0
<1>[ 8.607405] user pgtable: 4k pages, 48-bit VAs, pgdp=000000011efef000
<1>[ 8.607406] [00000000000000a0] pgd=0000000000000000, p4d=0000000000000000
<0>[ 8.607409] Internal error: Oops: 0000000096000004 [#1] PREEMPT_RT SMP
<4>[ 8.607412] Modules linked in: tegradisp(O) sch_ingress xt_tcpudp iptable_filter 8021q garp mrp um_heap(O) nvhost_isp5(O) spidev nvhost_vi5(O) bridge nvhost_nvcsi_t194(O) tegra_capture_isp(O) tegra210_adma stp llc nvhost_capture(O) tegra_aconnect watchdog_tegra_t18x(O) spi_tegra114 tegra_camera(O) v4l2_dv_timings v4l2_fwnode v4l2_async videobuf2_dma_contig tegra_drm(O) cpuidle_tegra_auto(O) nvhost_nvcsi(O) nvhost_nvdla(O) tegra_camera_platform(O) drm_dp_aux_bus camchar(O) capture_ivc(O) cec videobuf2_v4l2 camera_diagnostics(O) snd_soc_tegra_virt_t210ref_pcm(O) drm_display_helper videobuf2_memops rtcpu_debug(O) snd_soc_tegra210_virt_alt_adsp(O) videobuf2_common drm_kms_helper videodev nvadsp(O) cdi_mgr(O) nvhost_pva(O) cdi_pwm(O) isc_mgr(O) sha3_ce isc_pwm(O) sha3_generic cdi_dev(O) sha512_ce lm90 snd_soc_tegra210_virt_alt_admaif(O) tegra_bpmp_thermal sha512_arm64 tegra_hv_vcpu_yield(O) tegra_hv_pm_ctl(O) cam_fsync(O) cdi_gpio(O) ukl(O) isc_dev(O) mc tegra_camera_rtcpu(O)
<4>[ 8.607458] board_id_driver(O) tegra_fsicom(O) isc_gpio(O) ivc_bus(O) hsp_mailbox_client(O) nvhwpm(O) host1x_nvhost(O) nvgpu(O) mc_utils(O) nvmap(O) hvc_sysfs(O) tegra_nvvse_cryptodev(O) tegra_hv_vse_safety(O) host1x_fence(O) host1x(O) nvsciipc(O) userspace_ivc_mempool(O) ivc_cdev(O) logger(O) drm fuse ip_tables x_tables nvme nvme_core hashed_ecid(O) oak_pci(O) nvethernet(O) nvpps(O) tegra_bpmp(O) tegra_vblk(O) li_osdump(O) tegra_hv_vblk_oops(O)
<4>[ 8.607479] CPU: 9 PID: 1300 Comm: R000000007400 Tainted: G W O 6.1.119-rt45-prod-rt-tegra #1
<4>[ 8.607481] Hardware name: p3960-0010 (DT)
<4>[ 8.607482] pstate: 224000c5 (nzCv daIF +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
<4>[ 8.607483] pc : pick_next_task_fair+0x98/0x490
<4>[ 8.607490] lr : pick_next_task_fair+0x98/0x490
<4>[ 8.607490] sp : ffff800021bdb1b0
<4>[ 8.607491] x29: ffff800021bdb1b0 x28: ffff0000836205c0 x27: 0000000000001000
<4>[ 8.607492] x26: d8e0d16df7b8e848 x25: ffff000e881b6dc0 x24: ffff000e881b6dc0
<4>[ 8.607494] x23: ffff800021bdb268 x22: ffff000e881b6d40 x21: ffff000083620000
<4>[ 8.607495] x20: ffff000e881b6dc0 x19: ffff000e881b6d40 x18: 00000000000005c8
<4>[ 8.607496] x17: 0000000000000000 x16: ffffd16df7896b40 x15: 0000000000000000
<4>[ 8.607497] x14: 0000000000000014 x13: ffff800021bdba90 x12: ffff0000b052f300
<4>[ 8.607498] x11: 00000000c425686b x10: 00000002010a4ab3 x9 : ffffd16df6ca7778
<4>[ 8.607499] x8 : ffff800021bdb390 x7 : 0000000000000000 x6 : 0000000000000002
<4>[ 8.607500] x5 : 0000000000000003 x4 : 0000000000000003 x3 : 0ab3e2bc8934c987
<4>[ 8.607501] x2 : ffff000085241ec0 x1 : 0abe8499696457cc x0 : 0000000000000000
<4>[ 8.607503] Call trace:
<4>[ 8.607503] pick_next_task_fair+0x98/0x490
<4>[ 8.607505] __schedule+0x16c/0x870
<4>[ 8.607511] schedule_rtlock+0x28/0x60
<4>[ 8.607513] rtlock_slowlock_locked+0x3a0/0xcf0
<4>[ 8.607515] rt_spin_lock+0xb0/0xe0
<4>[ 8.607516] __wake_up_common_lock+0x68/0xe0
<4>[ 8.607519] __wake_up_sync_key+0x28/0x50
<4>[ 8.607520] sock_def_readable+0x48/0xa0
<4>[ 8.607523] __udp_enqueue_schedule_skb+0x158/0x2e0
<4>[ 8.607527] udp_queue_rcv_one_skb+0x1f8/0x6f0
<4>[ 8.607529] udp_queue_rcv_skb+0x64/0x290
<4>[ 8.607531] __udp4_lib_rcv+0x654/0x980
<4>[ 8.607532] udp_rcv+0x28/0x40
<4>[ 8.607533] ip_protocol_deliver_rcu+0x40/0x1d0
<4>[ 8.607538] ip_local_deliver_finish+0x84/0xe0
<4>[ 8.607540] ip_local_deliver+0x84/0x130
<4>[ 8.607542] ip_rcv+0x78/0x150
<4>[ 8.607544] __netif_receive_skb_one_core+0x60/0xb0
<4>[ 8.607548] __netif_receive_skb+0x20/0x80
<4>[ 8.607549] process_backlog+0xcc/0x1a0
<4>[ 8.607551] __napi_poll.constprop.0+0x40/0x230
<4>[ 8.607552] net_rx_action+0x13c/0x310
<4>[ 8.607553] handle_softirqs.isra.0+0x118/0x3a0
<4>[ 8.607556] __local_bh_enable_ip+0x8c/0x110
<4>[ 8.607556] netif_rx+0xf4/0x1d0
<4>[ 8.607558] dev_loopback_xmit+0x88/0x170
<4>[ 8.607559] ip_mc_finish_output+0x7c/0x180
<4>[ 8.607561] ip_mc_output+0x338/0x350
<4>[ 8.607562] ip_send_skb+0x58/0x130
<4>[ 8.607563] udp_send_skb+0x11c/0x3d0
<4>[ 8.607564] udp_sendmsg+0x794/0x9d0
<4>[ 8.607566] inet_sendmsg+0x4c/0xa0
<4>[ 8.607568] __sock_sendmsg+0x64/0x80
<4>[ 8.607572] __sys_sendto+0x114/0x170
<4>[ 8.607573] __arm64_sys_sendto+0x30/0x50
<4>[ 8.607575] invoke_syscall+0x50/0x140
<4>[ 8.607579] el0_svc_common.constprop.0+0x4c/0x110
<4>[ 8.607581] do_el0_svc+0x2c/0x90
<4>[ 8.607583] el0_svc+0x2c/0xa0
<4>[ 8.607585] el0t_64_sync_handler+0x124/0x130
<4>[ 8.607586] el0t_64_sync+0x190/0x194
<0>[ 8.607589] Code: 97fff1ee 37000200 aa1403e0 97ffddc7 (f9405014)
<4>[ 8.607596] ---[ end trace 0000000000000000 ]---
Signed-off-by: limingming3 <limingming3@lixiang.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fb9bf995a47..7fd867d6b62d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -978,7 +978,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	}
 found:
 	if (!best || (curr && entity_before(curr, best)))
-		best = curr;
+		best = curr ? curr : se;
 
 	return best;
 }
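The condition discussed in the patch description can be modeled outside the kernel. What follows is a minimal user-space sketch, not the kernel source: struct entity, its precomputed eligible flag, and pick_tail() are hypothetical stand-ins, and entity_before() is assumed to compare virtual deadlines. It only shows that the tail of the function yields NULL exactly when no best candidate was found and curr was NULL or got dropped.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct sched_entity (assumption, not kernel code). */
struct entity {
	uint64_t deadline;	/* virtual deadline */
	int on_rq;		/* non-zero if the entity is queued */
	int eligible;		/* precomputed stand-in for entity_eligible() */
};

/* Assumed ordering: the entity with the earlier virtual deadline comes first. */
static int entity_before(const struct entity *a, const struct entity *b)
{
	return (int64_t)(a->deadline - b->deadline) < 0;
}

/*
 * Models the two steps named in the description: curr is dropped when it
 * is not on_rq or not eligible, then the "found:" tail chooses between
 * best and curr.
 */
static struct entity *pick_tail(struct entity *best, struct entity *curr)
{
	if (curr && (!curr->on_rq || !curr->eligible))
		curr = NULL;

	if (!best || (curr && entity_before(curr, best)))
		best = curr;

	return best;	/* NULL only if best and curr both ended up NULL */
}
```

Calling pick_tail(NULL, curr) with a curr that is not on_rq reproduces the NULL result in this model; whether the real search can ever leave best unset on a consistent run queue is the question raised in the replies.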
On Mon, May 19, 2025 at 05:25:39PM +0800, limingming3 wrote:
> pick_eevdf() may return NULL, which triggers a NULL pointer dereference and a crash when best and curr are both NULL.
> There are two cases in which curr is NULL:
> - curr is already NULL on entry to pick_eevdf()
> - curr is set to NULL because it is not on_rq or not eligible
> When we reach the "best = curr" path, se should never be NULL. So when best and curr are both NULL, it is better to set best = se to avoid returning NULL.
> I hit the crash below on our server with very low probability and have not been able to reproduce it. I also found similar crash reports from other people on lore, so I believe the issue really exists.
If you've found those emails, you'll also have found me telling them this is the wrong fix.
This (returning NULL) can only happen when the internal state is broken. Ignoring the NULL will then hide the actual problem.
Can you reproduce on the latest kernels? 6.1 is so old I don't even remember what's in there.
On Mon, 19 May 2025 at 11:39, Peter Zijlstra peterz@infradead.org wrote:
> If you've found those emails, you'll also have found me telling them this is the wrong fix.
> This (returning NULL) can only happen when the internal state is broken. Ignoring the NULL will then hide the actual problem.
> Can you reproduce on the latest kernels? 6.1 is so old I don't even remember what's in there.
Wasn't EEVDF merged in v6.6?
On Mon, 19 May 2025 11:38:57 +0200, Peter wrote:
> If you've found those emails, you'll also have found me telling them this is the wrong fix.
> This (returning NULL) can only happen when the internal state is broken. Ignoring the NULL will then hide the actual problem.
Thank you for your patient reply. I had thought the current flow might not handle the case where curr and best are both NULL.
Now I understand your point: the current flow never returns NULL unless the internal state is broken.
> Can you reproduce on the latest kernels? 6.1 is so old I don't even remember what's in there.
We have not reproduced it on the latest kernels.
We backported EEVDF to our 6.1 kernel and have hit several crashes on our server, all of them during boot.
There may be a bug in our port, so I will add more debug output to pick_eevdf() to investigate.
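One shape such debug output could take is a consistency check on the eligibility bookkeeping. The sketch below is a user-space model under stated assumptions, not kernel code: struct dbg_entity, recompute_sums() and count_eligible() are hypothetical names, key stands for an entity's vruntime relative to the queue's minimum, and eligibility is assumed to be a test of the form avg >= key * load, where avg is the weighted sum of keys and load is the sum of weights. With positive weights and sums that match the queued entities, at least one entity always passes this test, so a count of zero would point at stale or corrupted sums.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified entity for the model; key is vruntime relative to the queue minimum. */
struct dbg_entity {
	int64_t key;
	int64_t weight;
};

/* Recomputes both sums from scratch, for comparison with cached values. */
static void recompute_sums(const struct dbg_entity *e, size_t n,
			   int64_t *avg, int64_t *load)
{
	size_t i;

	*avg = 0;
	*load = 0;
	for (i = 0; i < n; i++) {
		*avg += e[i].key * e[i].weight;
		*load += e[i].weight;
	}
}

/*
 * Counts entities that pass the assumed eligibility test avg >= key * load.
 * Pass the cached sums under suspicion, or freshly recomputed ones.
 */
static size_t count_eligible(const struct dbg_entity *e, size_t n,
			     int64_t avg, int64_t load)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (avg >= e[i].key * load)
			hits++;
	return hits;
}
```

Comparing the count obtained from cached sums with the count obtained from recompute_sums() would show whether the bookkeeping, rather than the pick logic, is what went wrong in the port.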
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH] sched/eevdf: avoid pick_eevdf() returns NULL
Link: https://lore.kernel.org/stable/20250519092540.3932826-1-limingming3%40lixian...
linux-stable-mirror@lists.linaro.org