Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the calling CPU in an infinite loop, but with interrupts and preemption enabled. From this state, userspace can continue to be scheduled, despite the system being "dead" as far as the kernel is concerned. This is easily reproducible on arm64 when booting with "nosmp" on the command line; a couple of shell scripts print out a periodic "Ping" message whilst another triggers a crash by writing to /proc/sysrq-trigger:
| sysrq: Trigger a crash | Kernel panic - not syncing: sysrq triggered crash | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x148 | show_stack+0x14/0x20 | dump_stack+0xa0/0xc4 | panic+0x140/0x32c | sysrq_handle_reboot+0x0/0x20 | __handle_sysrq+0x124/0x190 | write_sysrq_trigger+0x64/0x88 | proc_reg_write+0x60/0xa8 | __vfs_write+0x18/0x40 | vfs_write+0xa4/0x1b8 | ksys_write+0x64/0xf0 | __arm64_sys_write+0x14/0x20 | el0_svc_common.constprop.0+0xb0/0x168 | el0_svc_handler+0x28/0x78 | el0_svc+0x8/0xc | Kernel Offset: disabled | CPU features: 0x0002,24002004 | Memory Limit: none | ---[ end Kernel panic - not syncing: sysrq triggered crash ]--- | Ping 2! | Ping 1! | Ping 1! | Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
Cc: Russell King linux@armlinux.org.uk Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Ingo Molnar mingo@redhat.com Cc: Kees Cook keescook@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone Reported-by: Xogium contact@xogium.me Signed-off-by: Will Deacon will@kernel.org --- kernel/panic.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/panic.c b/kernel/panic.c index 47e8ebccc22b..f470a038b05b 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -180,6 +180,7 @@ void panic(const char *fmt, ...) * after setting panic_cpu) from invoking panic() again. */ local_irq_disable(); + preempt_disable_notrace();
/* * It's possible to come here directly from a panic-assertion and
On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the calling CPU in an infinite loop, but with interrupts and preemption enabled. From this state, userspace can continue to be scheduled, despite the system being "dead" as far as the kernel is concerned. This is easily reproducible on arm64 when booting with "nosmp" on the command line; a couple of shell scripts print out a periodic "Ping" message whilst another triggers a crash by writing to /proc/sysrq-trigger:
| sysrq: Trigger a crash | Kernel panic - not syncing: sysrq triggered crash | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x148 | show_stack+0x14/0x20 | dump_stack+0xa0/0xc4 | panic+0x140/0x32c | sysrq_handle_reboot+0x0/0x20 | __handle_sysrq+0x124/0x190 | write_sysrq_trigger+0x64/0x88 | proc_reg_write+0x60/0xa8 | __vfs_write+0x18/0x40 | vfs_write+0xa4/0x1b8 | ksys_write+0x64/0xf0 | __arm64_sys_write+0x14/0x20 | el0_svc_common.constprop.0+0xb0/0x168 | el0_svc_handler+0x28/0x78 | el0_svc+0x8/0xc | Kernel Offset: disabled | CPU features: 0x0002,24002004 | Memory Limit: none | ---[ end Kernel panic - not syncing: sysrq triggered crash ]--- | Ping 2! | Ping 1! | Ping 1! | Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic: avoid the extra noise dmesg") was trying to fix?
Either way:
Reviewed-by: Kees Cook keescook@chromium.org
-Kees
Cc: Russell King linux@armlinux.org.uk Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Ingo Molnar mingo@redhat.com Cc: Kees Cook keescook@chromium.org Cc: Andrew Morton akpm@linux-foundation.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone Reported-by: Xogium contact@xogium.me Signed-off-by: Will Deacon will@kernel.org
kernel/panic.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/panic.c b/kernel/panic.c index 47e8ebccc22b..f470a038b05b 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -180,6 +180,7 @@ void panic(const char *fmt, ...) * after setting panic_cpu) from invoking panic() again. */ local_irq_disable();
- preempt_disable_notrace();
/* * It's possible to come here directly from a panic-assertion and -- 2.23.0.444.g18eeb5a265-goog
Hi Kees,
On Wed, Oct 02, 2019 at 01:58:46PM -0700, Kees Cook wrote:
On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the calling CPU in an infinite loop, but with interrupts and preemption enabled. From this state, userspace can continue to be scheduled, despite the system being "dead" as far as the kernel is concerned. This is easily reproducible on arm64 when booting with "nosmp" on the command line; a couple of shell scripts print out a periodic "Ping" message whilst another triggers a crash by writing to /proc/sysrq-trigger:
| sysrq: Trigger a crash | Kernel panic - not syncing: sysrq triggered crash | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x148 | show_stack+0x14/0x20 | dump_stack+0xa0/0xc4 | panic+0x140/0x32c | sysrq_handle_reboot+0x0/0x20 | __handle_sysrq+0x124/0x190 | write_sysrq_trigger+0x64/0x88 | proc_reg_write+0x60/0xa8 | __vfs_write+0x18/0x40 | vfs_write+0xa4/0x1b8 | ksys_write+0x64/0xf0 | __arm64_sys_write+0x14/0x20 | el0_svc_common.constprop.0+0xb0/0x168 | el0_svc_handler+0x28/0x78 | el0_svc+0x8/0xc | Kernel Offset: disabled | CPU features: 0x0002,24002004 | Memory Limit: none | ---[ end Kernel panic - not syncing: sysrq triggered crash ]--- | Ping 2! | Ping 1! | Ping 1! | Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic: avoid the extra noise dmesg") was trying to fix?
Hmm, maybe, although that looks like it's focussed more on irq handling than preemption. I've deliberately left the irq part alone, since I think having magic sysrq work via the keyboard interrupt is desirable from the panic loop.
Reviewed-by: Kees Cook keescook@chromium.org
Thanks!
Will
On Thu 2019-10-03 21:56:34, Will Deacon wrote:
Hi Kees,
On Wed, Oct 02, 2019 at 01:58:46PM -0700, Kees Cook wrote:
On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the calling CPU in an infinite loop, but with interrupts and preemption enabled. From this state, userspace can continue to be scheduled, despite the system being "dead" as far as the kernel is concerned. This is easily reproducible on arm64 when booting with "nosmp" on the command line; a couple of shell scripts print out a periodic "Ping" message whilst another triggers a crash by writing to /proc/sysrq-trigger:
| sysrq: Trigger a crash | Kernel panic - not syncing: sysrq triggered crash | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x148 | show_stack+0x14/0x20 | dump_stack+0xa0/0xc4 | panic+0x140/0x32c | sysrq_handle_reboot+0x0/0x20 | __handle_sysrq+0x124/0x190 | write_sysrq_trigger+0x64/0x88 | proc_reg_write+0x60/0xa8 | __vfs_write+0x18/0x40 | vfs_write+0xa4/0x1b8 | ksys_write+0x64/0xf0 | __arm64_sys_write+0x14/0x20 | el0_svc_common.constprop.0+0xb0/0x168 | el0_svc_handler+0x28/0x78 | el0_svc+0x8/0xc | Kernel Offset: disabled | CPU features: 0x0002,24002004 | Memory Limit: none | ---[ end Kernel panic - not syncing: sysrq triggered crash ]--- | Ping 2! | Ping 1! | Ping 1! | Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic: avoid the extra noise dmesg") was trying to fix?
Hmm, maybe, although that looks like it's focussed more on irq handling than preemption.
Exactly, the backtrace mentioned in commit c39ea0b9dd24 ("panic: avoid the extra noise dmesg") is printed by wake_up() called from wake_up_klogd_work_func(). It is irq_work. Therefore disabling preemption would not prevent this.
I've deliberately left the irq part alone, since I think having magic sysrq work via the keyboard interrupt is desirable from the panic loop.
I agree that we should keep sysrq working.
One pity thing is that led_panic_blink() in leds/drivers/trigger/ledtrig-panic.c uses workqueues:
+ led_panic_blink() + led_trigger_event() + led_set_brightness() + schedule_work()
It means that it depends on the scheduler. I guess that it does not work in many panic situations. But this patch will always block it.
I agree that it is strange that userspace still works at this stage. But does it cause any real problems?
Best Regards, Petr
On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
On Thu 2019-10-03 21:56:34, Will Deacon wrote:
Hi Kees,
On Wed, Oct 02, 2019 at 01:58:46PM -0700, Kees Cook wrote:
On Wed, Oct 02, 2019 at 01:35:38PM +0100, Will Deacon wrote:
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the calling CPU in an infinite loop, but with interrupts and preemption enabled. From this state, userspace can continue to be scheduled, despite the system being "dead" as far as the kernel is concerned. This is easily reproducible on arm64 when booting with "nosmp" on the command line; a couple of shell scripts print out a periodic "Ping" message whilst another triggers a crash by writing to /proc/sysrq-trigger:
| sysrq: Trigger a crash | Kernel panic - not syncing: sysrq triggered crash | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x148 | show_stack+0x14/0x20 | dump_stack+0xa0/0xc4 | panic+0x140/0x32c | sysrq_handle_reboot+0x0/0x20 | __handle_sysrq+0x124/0x190 | write_sysrq_trigger+0x64/0x88 | proc_reg_write+0x60/0xa8 | __vfs_write+0x18/0x40 | vfs_write+0xa4/0x1b8 | ksys_write+0x64/0xf0 | __arm64_sys_write+0x14/0x20 | el0_svc_common.constprop.0+0xb0/0x168 | el0_svc_handler+0x28/0x78 | el0_svc+0x8/0xc | Kernel Offset: disabled | CPU features: 0x0002,24002004 | Memory Limit: none | ---[ end Kernel panic - not syncing: sysrq triggered crash ]--- | Ping 2! | Ping 1! | Ping 1! | Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
Is this perhaps the correct solution for what commit c39ea0b9dd24 ("panic: avoid the extra noise dmesg") was trying to fix?
Hmm, maybe, although that looks like it's focussed more on irq handling than preemption.
Exactly, the backtrace mentioned in commit c39ea0b9dd24 ("panic: avoid the extra noise dmesg") is printed by wake_up() called from wake_up_klogd_work_func(). It is irq_work. Therefore disabling preemption would not prevent this.
I've deliberately left the irq part alone, since I think having magic sysrq work via the keyboard interrupt is desirable from the panic loop.
I agree that we should keep sysrq working.
One pity thing is that led_panic_blink() in leds/drivers/trigger/ledtrig-panic.c uses workqueues:
- led_panic_blink()
- led_trigger_event()
- led_set_brightness()
- schedule_work()
It means that it depends on the scheduler. I guess that it does not work in many panic situations. But this patch will always block it.
I agree that it is strange that userspace still works at this stage. But does it cause any real problems?
Yes, there are watchdog drivers that continue to pat their watchdog after the kernel has panic'd. It makes watchdogs useless (which is exactly how this problem was discovered.)
On Fri, Oct 04, 2019 at 10:29:17AM +0100, Russell King - ARM Linux admin wrote:
On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
On Thu 2019-10-03 21:56:34, Will Deacon wrote:
I've deliberately left the irq part alone, since I think having magic sysrq work via the keyboard interrupt is desirable from the panic loop.
I agree that we should keep sysrq working.
One pity thing is that led_panic_blink() in leds/drivers/trigger/ledtrig-panic.c uses workqueues:
- led_panic_blink()
- led_trigger_event()
- led_set_brightness()
- schedule_work()
It means that it depends on the scheduler. I guess that it does not work in many panic situations. But this patch will always block it.
I agree that it is strange that userspace still works at this stage. But does it cause any real problems?
Yes, there are watchdog drivers that continue to pat their watchdog after the kernel has panic'd. It makes watchdogs useless (which is exactly how this problem was discovered.)
Indeed, and I think the LED blinking is already unreliable if the brightness operation needs to sleep. For example, if the kernel isn't preemptible or the work gets queued up on a different CPU which is sitting in panic_smp_self_stop().
Will
On Fri 2019-10-04 11:49:48, Will Deacon wrote:
On Fri, Oct 04, 2019 at 10:29:17AM +0100, Russell King - ARM Linux admin wrote:
On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
On Thu 2019-10-03 21:56:34, Will Deacon wrote:
I've deliberately left the irq part alone, since I think having magic sysrq work via the keyboard interrupt is desirable from the panic loop.
I agree that we should keep sysrq working.
One pity thing is that led_panic_blink() in leds/drivers/trigger/ledtrig-panic.c uses workqueues:
- led_panic_blink()
- led_trigger_event()
- led_set_brightness()
- schedule_work()
It means that it depends on the scheduler. I guess that it does not work in many panic situations. But this patch will always block it.
I agree that it is strange that userspace still works at this stage. But does it cause any real problems?
Yes, there are watchdog drivers that continue to pat their watchdog after the kernel has panic'd. It makes watchdogs useless (which is exactly how this problem was discovered.)
Indeed, and I think the LED blinking is already unreliable if the brightness operation needs to sleep. For example, if the kernel isn't preemptible or the work gets queued up on a different CPU which is sitting in panic_smp_self_stop().
To make it clear. I do not want to block this patch. I just wanted to point out the problem. I am not sure how the blinking is important these days. Well, I could imagine that it might be useful on some embedded devices.
Another question is how many people want to end up with dead system these days. The watchdogs are likely used in data centers. I guess that automatic reboot in panic() is a better choice there.
Anyway, it might make sense to remove the panic blinking code when it will not have a chance to work.
Best Regards, Petr
On Fri, Oct 04, 2019 at 01:15:21PM +0200, Petr Mladek wrote:
On Fri 2019-10-04 11:49:48, Will Deacon wrote:
On Fri, Oct 04, 2019 at 10:29:17AM +0100, Russell King - ARM Linux admin wrote:
On Fri, Oct 04, 2019 at 11:11:42AM +0200, Petr Mladek wrote:
On Thu 2019-10-03 21:56:34, Will Deacon wrote:
I've deliberately left the irq part alone, since I think having magic sysrq work via the keyboard interrupt is desirable from the panic loop.
I agree that we should keep sysrq working.
One pity thing is that led_panic_blink() in leds/drivers/trigger/ledtrig-panic.c uses workqueues:
- led_panic_blink()
- led_trigger_event()
- led_set_brightness()
- schedule_work()
It means that it depends on the scheduler. I guess that it does not work in many panic situations. But this patch will always block it.
I agree that it is strange that userspace still works at this stage. But does it cause any real problems?
Yes, there are watchdog drivers that continue to pat their watchdog after the kernel has panic'd. It makes watchdogs useless (which is exactly how this problem was discovered.)
Indeed, and I think the LED blinking is already unreliable if the brightness operation needs to sleep. For example, if the kernel isn't preemptible or the work gets queued up on a different CPU which is sitting in panic_smp_self_stop().
To make it clear. I do not want to block this patch. I just wanted to point out the problem. I am not sure how the blinking is important these days. Well, I could imagine that it might be useful on some embedded devices.
When reviewing the c39ea0b9dd24 ("panic: avoid the extra noise dmesg"), there was similar discussion about what's the expectation for kernel when panic happens, as the earlier version patch is simply keeping the the local irq disabled, which may break the sysrq and the panic blink code, so we turned to handling it together with printk.
Another question is how many people want to end up with dead system these days. The watchdogs are likely used in data centers. I guess that automatic reboot in panic() is a better choice there.
Anyway, it might make sense to remove the panic blinking code when it will not have a chance to work.
I was also wondering if the panic blinking code still really works on any platforms.
Thanks, Feng
Best Regards, Petr
On Fri, 4 Oct 2019, Will Deacon wrote:
Indeed, and I think the LED blinking is already unreliable if the brightness operation needs to sleep.
One thing is that led_set_brightness() can probably be forced to avoid the workqueue scheduling, by setting LED_BLINK_SW on the device (e.g. by issuing led_set_software_blink() during panic).
But I am afraid this still won't solve the issue completely, as USB keyboards need workqueues for blinking the LEDs in for URB management.
On Wed, 2 Oct 2019 13:35:38 +0100 Will Deacon will@kernel.org wrote:
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the calling CPU in an infinite loop, but with interrupts and preemption enabled. From this state, userspace can continue to be scheduled, despite the system being "dead" as far as the kernel is concerned. This is easily reproducible on arm64 when booting with "nosmp" on the command line; a couple of shell scripts print out a periodic "Ping" message whilst another triggers a crash by writing to /proc/sysrq-trigger:
| sysrq: Trigger a crash | Kernel panic - not syncing: sysrq triggered crash | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0x0/0x148 | show_stack+0x14/0x20 | dump_stack+0xa0/0xc4 | panic+0x140/0x32c | sysrq_handle_reboot+0x0/0x20 | __handle_sysrq+0x124/0x190 | write_sysrq_trigger+0x64/0x88 | proc_reg_write+0x60/0xa8 | __vfs_write+0x18/0x40 | vfs_write+0xa4/0x1b8 | ksys_write+0x64/0xf0 | __arm64_sys_write+0x14/0x20 | el0_svc_common.constprop.0+0xb0/0x168 | el0_svc_handler+0x28/0x78 | el0_svc+0x8/0xc | Kernel Offset: disabled | CPU features: 0x0002,24002004 | Memory Limit: none | ---[ end Kernel panic - not syncing: sysrq triggered crash ]--- | Ping 2! | Ping 1! | Ping 1! | Ping 2!
The issue can also be triggered on x86 kernels if CONFIG_SMP=n, otherwise local interrupts are disabled in 'smp_send_stop()'.
Disable preemption in 'panic()' before re-enabling interrupts.
...
--- a/kernel/panic.c +++ b/kernel/panic.c @@ -180,6 +180,7 @@ void panic(const char *fmt, ...) * after setting panic_cpu) from invoking panic() again. */ local_irq_disable();
- preempt_disable_notrace();
/* * It's possible to come here directly from a panic-assertion and
We still do a lot of stuff (kexec, kgdb, etc) after this preempt_disable() and I worry that something in there will now trigger a might_sleep() warning as a result?
Hi Andrew,
Thanks for having a look.
On Wed, Oct 02, 2019 at 02:45:58PM -0700, Andrew Morton wrote:
On Wed, 2 Oct 2019 13:35:38 +0100 Will Deacon will@kernel.org wrote:
Disable preemption in 'panic()' before re-enabling interrupts.
...
--- a/kernel/panic.c +++ b/kernel/panic.c @@ -180,6 +180,7 @@ void panic(const char *fmt, ...) * after setting panic_cpu) from invoking panic() again. */ local_irq_disable();
- preempt_disable_notrace();
/* * It's possible to come here directly from a panic-assertion and
We still do a lot of stuff (kexec, kgdb, etc) after this preempt_disable() and I worry that something in there will now trigger a might_sleep() warning as a result?
Given that interrupts are already disabled at this point, I don't think we'll get any additional warnings here by disabling preemption as well.
Will
linux-stable-mirror@lists.linaro.org