In hrtimer_interrupt(), interrupts are disabled when acquiring a spinlock, which subsequently triggers an oops. During the oops call chain, blocking_notifier_call_chain() invokes _cond_resched, ultimately leading to a hard lockup.
Call Stack:
  hrtimer_interrupt//raw_spin_lock_irqsave
  __hrtimer_run_queues
  page_fault
  do_page_fault
  bad_area_nosemaphore
  no_context
  oops_end
  bust_spinlocks
  unblank_screen
  do_unblank_screen
  fbcon_blank
  fb_notifier_call_chain
  blocking_notifier_call_chain
  down_read
  _cond_resched
If the system is in an oops state, use down_read_trylock instead of a blocking lock acquisition. If the trylock fails, skip executing the notifier callbacks to avoid potential deadlocks or unsafe operations during the oops handling process.
Cc: stable@vger.kernel.org # 6.6
Fixes: fe9d4f576324 ("Add kernel/notifier.c")
Signed-off-by: Yi Yang <yiyang13@huawei.com>
---
 kernel/notifier.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/kernel/notifier.c b/kernel/notifier.c
index b3ce28f39eb6..ebff2315fac2 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -384,9 +384,18 @@ int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
 	 * is, we re-check the list after having taken the lock anyway:
 	 */
 	if (rcu_access_pointer(nh->head)) {
-		down_read(&nh->rwsem);
-		ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
-		up_read(&nh->rwsem);
+		if (!oops_in_progress) {
+			down_read(&nh->rwsem);
+			ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+			up_read(&nh->rwsem);
+		} else {
+			if (down_read_trylock(&nh->rwsem)) {
+				ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+				up_read(&nh->rwsem);
+			} else {
+				ret = NOTIFY_BAD;
+			}
+		}
 	}
 	return ret;
 }
On Fri, 17 Oct 2025 06:17:40 +0000 Yi Yang <yiyang13@huawei.com> wrote:

> In hrtimer_interrupt(), interrupts are disabled when acquiring a spinlock, which subsequently triggers an oops. During the oops call chain, blocking_notifier_call_chain() invokes _cond_resched, ultimately leading to a hard lockup.
>
> Call Stack:
>   hrtimer_interrupt//raw_spin_lock_irqsave
>   __hrtimer_run_queues
>   page_fault
>   do_page_fault
>   bad_area_nosemaphore
>   no_context
>   oops_end
>   bust_spinlocks
>   unblank_screen
>   do_unblank_screen
>   fbcon_blank
>   fb_notifier_call_chain
>   blocking_notifier_call_chain
>   down_read
>   _cond_resched

Seems this trace is upside-down relative to what we usually see.
Is the unaltered dmesg output available?

> If the system is in an oops state, use down_read_trylock instead of a blocking lock acquisition. If the trylock fails, skip executing the notifier callbacks to avoid potential deadlocks or unsafe operations during the oops handling process.
>
> ...
>
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -384,9 +384,18 @@ int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
>  	 * is, we re-check the list after having taken the lock anyway:
>  	 */
>  	if (rcu_access_pointer(nh->head)) {
> -		down_read(&nh->rwsem);
> -		ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
> -		up_read(&nh->rwsem);
> +		if (!oops_in_progress) {
> +			down_read(&nh->rwsem);
> +			ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
> +			up_read(&nh->rwsem);
> +		} else {
> +			if (down_read_trylock(&nh->rwsem)) {
> +				ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
> +				up_read(&nh->rwsem);
> +			} else {
> +				ret = NOTIFY_BAD;
> +			}
> +		}
>  	}
>  	return ret;
>  }

Am I correct in believing that fb_notifier_call_chain() is only ever called if defined(CONFIG_GUMSTIX_AM200EPD)?
I wonder what that call is for, and if we can simply remove it.
On 2025/10/18 6:25, Andrew Morton wrote:
> On Fri, 17 Oct 2025 06:17:40 +0000 Yi Yang <yiyang13@huawei.com> wrote:
>
>> In hrtimer_interrupt(), interrupts are disabled when acquiring a spinlock, which subsequently triggers an oops. During the oops call chain, blocking_notifier_call_chain() invokes _cond_resched, ultimately leading to a hard lockup.
>>
>> Call Stack:
>>   hrtimer_interrupt//raw_spin_lock_irqsave
>>   __hrtimer_run_queues
>>   page_fault
>>   do_page_fault
>>   bad_area_nosemaphore
>>   no_context
>>   oops_end
>>   bust_spinlocks
>>   unblank_screen
>>   do_unblank_screen
>>   fbcon_blank
>>   fb_notifier_call_chain
>>   blocking_notifier_call_chain
>>   down_read
>>   _cond_resched
>
> Seems this trace is upside-down relative to what we usually see.
>
> Is the unaltered dmesg output available?

Below is an excerpt from the original error message:

 #0 [ffff8a317f6c3ac0] __cond_resched at ffffffffa10d29a6
 #1 [ffff8a317f6c3ad8] _cond_resched at ffffffffa17292cf
 #2 [ffff8a317f6c3ae8] down_read at ffffffffa1728022
 #3 [ffff8a317f6c3b00] __blocking_notifier_call_chain at ffffffffa10c5c37
 #4 [ffff8a317f6c3b40] blocking_notifier_call_chain at ffffffffa10c5c86
 #5 [ffff8a317f6c3b50] fb_notifier_call_chain at ffffffffa13c83eb
 #6 [ffff8a317f6c3b60] fb_blank at ffffffffa13c88eb
 #7 [ffff8a317f6c3ba0] fbcon_blank at ffffffffa13d4a4b
 #8 [ffff8a317f6c3ca0] do_unblank_screen at ffffffffa144cb30
 #9 [ffff8a317f6c3cc0] unblank_screen at ffffffffa144cbf0
#10 [ffff8a317f6c3ce0] oops_end at ffffffffa172d6d5
#11 [ffff8a317f6c3d08] no_context at ffffffffa171cebc
#12 [ffff8a317f6c3d58] __bad_area_nosemaphore at ffffffffa171cf53
#13 [ffff8a317f6c3da8] bad_area_nosemaphore at ffffffffa171d0c4
#14 [ffff8a317f6c3db8] __do_page_fault at ffffffffa17306b0
#15 [ffff8a317f6c3e20] do_page_fault at ffffffffa1730895
#16 [ffff8a317f6c3e50] page_fault at ffffffffa172c768

>> If the system is in an oops state, use down_read_trylock instead of a blocking lock acquisition. If the trylock fails, skip executing the notifier callbacks to avoid potential deadlocks or unsafe operations during the oops handling process.
>>
>> ...
>>
>> --- a/kernel/notifier.c
>> +++ b/kernel/notifier.c
>> @@ -384,9 +384,18 @@ int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
>>  	 * is, we re-check the list after having taken the lock anyway:
>>  	 */
>>  	if (rcu_access_pointer(nh->head)) {
>> -		down_read(&nh->rwsem);
>> -		ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
>> -		up_read(&nh->rwsem);
>> +		if (!oops_in_progress) {
>> +			down_read(&nh->rwsem);
>> +			ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
>> +			up_read(&nh->rwsem);
>> +		} else {
>> +			if (down_read_trylock(&nh->rwsem)) {
>> +				ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
>> +				up_read(&nh->rwsem);
>> +			} else {
>> +				ret = NOTIFY_BAD;
>> +			}
>> +		}
>>  	}
>>  	return ret;
>>  }
>
> Am I correct in believing that fb_notifier_call_chain() is only ever called if defined(CONFIG_GUMSTIX_AM200EPD)?

fb_notifier_call_chain() is called in both the fb_blank() and fb_set_var() functions, and it is not only called when defined(CONFIG_GUMSTIX_AM200EPD).

> I wonder what that call is for, and if we can simply remove it.

The function called when an issue occurs is `fb_notifier_call_chain(FB_EVENT_BLANK, &event);`. The purpose of this function is to invoke the notification chain that has registered for the FB_EVENT_BLANK event.
The FB_EVENT_BLANK event appears to indicate a screen-related state.
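
To make the "notification chain registered for an event" idea concrete, below is a minimal userspace sketch of a notifier chain: callbacks are registered on a list, and a call walks the list and passes the event code and a data pointer to each one. All demo_* names, the event number and the return codes are hypothetical and chosen for illustration; the real kernel chain additionally handles priorities, locking and stop flags, which are omitted here.

```c
#include <stddef.h>

/* Hypothetical event and return codes; not the kernel's values. */
#define DEMO_EVENT_BLANK 1
#define DEMO_NOTIFY_DONE 0
#define DEMO_NOTIFY_OK   1

/* One registered listener on the chain. */
struct demo_notifier {
	int (*call)(struct demo_notifier *self, unsigned long event, void *data);
	struct demo_notifier *next;
};

/* Head of the singly linked chain of listeners. */
static struct demo_notifier *demo_chain_head;

/* Add a listener at the front of the chain. */
void demo_register(struct demo_notifier *nb)
{
	nb->next = demo_chain_head;
	demo_chain_head = nb;
}

/* Invoke every listener; return the result of the last one called. */
int demo_call_chain(unsigned long event, void *data)
{
	int ret = DEMO_NOTIFY_DONE;
	struct demo_notifier *nb;

	for (nb = demo_chain_head; nb != NULL; nb = nb->next)
		ret = nb->call(nb, event, data);
	return ret;
}

/* Last blank mode observed by the example listener below. */
static int demo_blank_seen = -1;

/* Example listener: reacts only to the blank event, ignores the rest. */
static int demo_backlight_cb(struct demo_notifier *self, unsigned long event,
			     void *data)
{
	(void)self;
	if (event != DEMO_EVENT_BLANK)
		return DEMO_NOTIFY_DONE;
	demo_blank_seen = *(int *)data;
	return DEMO_NOTIFY_OK;
}
```

A caller would register a struct demo_notifier whose call field points at demo_backlight_cb and then invoke demo_call_chain(DEMO_EVENT_BLANK, &mode); listeners that do not care about the event simply return DEMO_NOTIFY_DONE.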