[PATCH stable] notifiers: Add oops check in blocking_notifier_call_chain()


Yi Yang

17 Oct 2025

6:17 a.m.

hrtimer_interrupt() acquires a raw spinlock with interrupts disabled. A page fault in __hrtimer_run_queues() then triggers an oops in that context. During the oops call chain, blocking_notifier_call_chain() calls down_read(), which invokes _cond_resched(), ultimately leading to a hard lockup.

Call Stack:
  hrtimer_interrupt // raw_spin_lock_irqsave
  __hrtimer_run_queues
  page_fault
  do_page_fault
  bad_area_nosemaphore
  no_context
  oops_end
  bust_spinlocks
  unblank_screen
  do_unblank_screen
  fbcon_blank
  fb_notifier_call_chain
  blocking_notifier_call_chain
  down_read
  _cond_resched

If the system is in an oops state, use down_read_trylock instead of a blocking lock acquisition. If the trylock fails, skip executing the notifier callbacks to avoid potential deadlocks or unsafe operations during the oops handling process.

Cc: stable@vger.kernel.org # 6.6
Fixes: fe9d4f576324 ("Add kernel/notifier.c")
Signed-off-by: Yi Yang <yiyang13@huawei.com>
---
 kernel/notifier.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index b3ce28f39eb6..ebff2315fac2 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -384,9 +384,18 @@ int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
 	 * is, we re-check the list after having taken the lock anyway:
 	 */
 	if (rcu_access_pointer(nh->head)) {
-		down_read(&nh->rwsem);
-		ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
-		up_read(&nh->rwsem);
+		if (!oops_in_progress) {
+			down_read(&nh->rwsem);
+			ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+			up_read(&nh->rwsem);
+		} else {
+			if (down_read_trylock(&nh->rwsem)) {
+				ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+				up_read(&nh->rwsem);
+			} else {
+				ret = NOTIFY_BAD;
+			}
+		}
 	}
 	return ret;
 }

-- 
2.25.1


Andrew Morton

17 Oct

10:25 p.m.

On Fri, 17 Oct 2025 06:17:40 +0000 Yi Yang <yiyang13@huawei.com> wrote:

...

> In hrtimer_interrupt(), interrupts are disabled when acquiring a spinlock, which subsequently triggers an oops. During the oops call chain, blocking_notifier_call_chain() invokes _cond_resched, ultimately leading to a hard lockup.
>
> Call Stack:
>   hrtimer_interrupt // raw_spin_lock_irqsave
>   __hrtimer_run_queues
>   page_fault
>   do_page_fault
>   bad_area_nosemaphore
>   no_context
>   oops_end
>   bust_spinlocks
>   unblank_screen
>   do_unblank_screen
>   fbcon_blank
>   fb_notifier_call_chain
>   blocking_notifier_call_chain
>   down_read
>   _cond_resched

Seems this trace is upside-down relative to what we usually see.

Is the unaltered dmesg output available?

...

> If the system is in an oops state, use down_read_trylock instead of a blocking lock acquisition. If the trylock fails, skip executing the notifier callbacks to avoid potential deadlocks or unsafe operations during the oops handling process.

...

> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -384,9 +384,18 @@ int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
>  	 * is, we re-check the list after having taken the lock anyway:
>  	 */
>  	if (rcu_access_pointer(nh->head)) {
> -		down_read(&nh->rwsem);
> -		ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
> -		up_read(&nh->rwsem);
> +		if (!oops_in_progress) {
> +			down_read(&nh->rwsem);
> +			ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
> +			up_read(&nh->rwsem);
> +		} else {
> +			if (down_read_trylock(&nh->rwsem)) {
> +				ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
> +				up_read(&nh->rwsem);
> +			} else {
> +				ret = NOTIFY_BAD;
> +			}
> +		}
>  	}
>  	return ret;

Am I correct in believing that fb_notifier_call_chain() is only ever called if defined(CONFIG_GUMSTIX_AM200EPD)?

I wonder what that call is for, and if we can simply remove it.

yiyang (D)

22 Oct

3:36 a.m.

On 2025/10/18 6:25, Andrew Morton wrote:

...

> Seems this trace is upside-down relative to what we usually see.
>
> Is the unaltered dmesg output available?

Below is an excerpt from the original error message:

 #0 [ffff8a317f6c3ac0] __cond_resched at ffffffffa10d29a6
 #1 [ffff8a317f6c3ad8] _cond_resched at ffffffffa17292cf
 #2 [ffff8a317f6c3ae8] down_read at ffffffffa1728022
 #3 [ffff8a317f6c3b00] __blocking_notifier_call_chain at ffffffffa10c5c37
 #4 [ffff8a317f6c3b40] blocking_notifier_call_chain at ffffffffa10c5c86
 #5 [ffff8a317f6c3b50] fb_notifier_call_chain at ffffffffa13c83eb
 #6 [ffff8a317f6c3b60] fb_blank at ffffffffa13c88eb
 #7 [ffff8a317f6c3ba0] fbcon_blank at ffffffffa13d4a4b
 #8 [ffff8a317f6c3ca0] do_unblank_screen at ffffffffa144cb30
 #9 [ffff8a317f6c3cc0] unblank_screen at ffffffffa144cbf0
#10 [ffff8a317f6c3ce0] oops_end at ffffffffa172d6d5
#11 [ffff8a317f6c3d08] no_context at ffffffffa171cebc
#12 [ffff8a317f6c3d58] __bad_area_nosemaphore at ffffffffa171cf53
#13 [ffff8a317f6c3da8] bad_area_nosemaphore at ffffffffa171d0c4
#14 [ffff8a317f6c3db8] __do_page_fault at ffffffffa17306b0
#15 [ffff8a317f6c3e20] do_page_fault at ffffffffa1730895
#16 [ffff8a317f6c3e50] page_fault at ffffffffa172c768

...

> Am I correct in believing that fb_notifier_call_chain() is only ever called if defined(CONFIG_GUMSTIX_AM200EPD)?

No. fb_notifier_call_chain() is called from both fb_blank() and fb_set_var(), so it is not limited to configurations that define CONFIG_GUMSTIX_AM200EPD.

...

> I wonder what that call is for, and if we can simply remove it.

The call that runs when the problem occurs is `fb_notifier_call_chain(FB_EVENT_BLANK, &event);`. It invokes the callbacks registered on the framebuffer notifier chain and passes them the FB_EVENT_BLANK event.

FB_EVENT_BLANK appears to signal a change in the screen's blanking state.
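
To make the mechanism concrete, here is a minimal user-space sketch of how a notifier chain delivers an event to its registered callbacks. It is a simplified model, not the kernel implementation: all names ending in `_model`, the `blank_listener_call()` subscriber, and the event value `EVENT_BLANK_MODEL` are hypothetical, and no locking is shown. The return codes and the descending-priority ordering are modeled on include/linux/notifier.h.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000   /* callback did not care about the event */
#define NOTIFY_OK        0x0001   /* callback handled the event */
#define NOTIFY_STOP_MASK 0x8000   /* set to stop walking the chain */

/* Hypothetical event number; the real FB_EVENT_BLANK is defined in include/linux/fb.h. */
#define EVENT_BLANK_MODEL 1UL

struct notifier_model {
	int (*call)(struct notifier_model *nb, unsigned long event, void *data);
	struct notifier_model *next;
	int priority;
};

/* Insert in descending priority order; equal priorities keep registration order. */
void chain_register_model(struct notifier_model **head, struct notifier_model *nb)
{
	assert(nb != NULL && nb->call != NULL);
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Call every subscriber in order until one sets NOTIFY_STOP_MASK. */
int notifier_chain_call_model(struct notifier_model *head, unsigned long event, void *data)
{
	int ret = NOTIFY_DONE;

	for (struct notifier_model *nb = head; nb; nb = nb->next) {
		ret = nb->call(nb, event, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Example subscriber: remembers the last blank state it was told about. */
struct blank_listener_model {
	struct notifier_model nb;   /* must stay first so the cast below is valid */
	int last_blank;
};

int blank_listener_call(struct notifier_model *nb, unsigned long event, void *data)
{
	struct blank_listener_model *l = (struct blank_listener_model *)nb;

	if (event != EVENT_BLANK_MODEL)
		return NOTIFY_DONE;
	l->last_blank = *(int *)data;
	return NOTIFY_OK;
}
```

In the kernel, the blocking variant of this walk runs under the chain's rwsem, which is why the oops path in the backtrace above ends up in down_read().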


yiyang (D)

12 Nov

2:46 a.m.

On 2025/10/22 11:36, yiyang (D) wrote:

...


Do you think it is necessary to merge this patch into the 6.6 stable branch (or earlier versions)? Currently, when this oops occurs, the actual panic stack trace is never printed, because the oops path blocks in the notifier chain first.


linux-stable-mirror@lists.linaro.org
