On Tue, Jun 28, 2022 at 09:41:09AM +0800, Duoming Zhou wrote:
There are sleep in atomic context bugs when uploading device dump data in mwifiex. The root cause is that dev_coredumpv could not be used in atomic contexts, because it calls dev_set_name which include operations that may sleep. The call tree shows execution paths that could lead to bugs:
(Interrupt context) fw_dump_timer_fn mwifiex_upload_device_dump dev_coredumpv(..., GFP_KERNEL) dev_coredumpm() kzalloc(sizeof(*devcd), gfp); //may sleep dev_set_name kobject_set_name_vargs kvasprintf_const(GFP_KERNEL, ...); //may sleep kstrdup(s, GFP_KERNEL); //may sleep
The corresponding fail log is shown below:
[ 135.275938] usb 1-1: == mwifiex dump information to /sys/class/devcoredump start [ 135.281029] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265 ... [ 135.293613] Call Trace: [ 135.293613] <IRQ> [ 135.293613] dump_stack_lvl+0x57/0x7d [ 135.293613] __might_resched.cold+0x138/0x173 [ 135.293613] ? dev_coredumpm+0xca/0x2e0 [ 135.293613] kmem_cache_alloc_trace+0x189/0x1f0 [ 135.293613] ? devcd_match_failing+0x30/0x30 [ 135.293613] dev_coredumpm+0xca/0x2e0 [ 135.293613] ? devcd_freev+0x10/0x10 [ 135.293613] dev_coredumpv+0x1c/0x20 [ 135.293613] ? devcd_match_failing+0x30/0x30 [ 135.293613] mwifiex_upload_device_dump+0x65/0xb0 [ 135.293613] ? mwifiex_dnld_fw+0x1b0/0x1b0 [ 135.293613] call_timer_fn+0x122/0x3d0 [ 135.293613] ? msleep_interruptible+0xb0/0xb0 [ 135.293613] ? lock_downgrade+0x3c0/0x3c0 [ 135.293613] ? __next_timer_interrupt+0x13c/0x160 [ 135.293613] ? lockdep_hardirqs_on_prepare+0xe/0x220 [ 135.293613] ? mwifiex_dnld_fw+0x1b0/0x1b0 [ 135.293613] __run_timers.part.0+0x3f8/0x540 [ 135.293613] ? call_timer_fn+0x3d0/0x3d0 [ 135.293613] ? arch_restore_msi_irqs+0x10/0x10 [ 135.293613] ? lapic_next_event+0x31/0x40 [ 135.293613] run_timer_softirq+0x4f/0xb0 [ 135.293613] __do_softirq+0x1c2/0x651 ... [ 135.293613] RIP: 0010:default_idle+0xb/0x10 [ 135.293613] RSP: 0018:ffff888006317e68 EFLAGS: 00000246 [ 135.293613] RAX: ffffffff82ad8d10 RBX: ffff888006301cc0 RCX: ffffffff82ac90e1 [ 135.293613] RDX: ffffed100d9ff1b4 RSI: ffffffff831ad140 RDI: ffffffff82ad8f20 [ 135.293613] RBP: 0000000000000003 R08: 0000000000000000 R09: ffff88806cff8d9b [ 135.293613] R10: ffffed100d9ff1b3 R11: 0000000000000001 R12: ffffffff84593410 [ 135.293613] R13: 0000000000000000 R14: 0000000000000000 R15: 1ffff11000c62fd2 ... [ 135.389205] usb 1-1: == mwifiex dump information to /sys/class/devcoredump end
This patch uses delayed work to replace timer and moves the operations that may sleep into a delayed work in order to mitigate bugs, it was tested on Marvell 88W8801 chip whose port is usb and the firmware is usb8801_uapsta.bin. The following is the result after using delayed work to replace timer.
[ 134.936453] usb 1-1: == mwifiex dump information to /sys/class/devcoredump start [ 135.043344] usb 1-1: == mwifiex dump information to /sys/class/devcoredump end
As we can see, there is no bug now.
Cc: stable@vger.kernel.org Fixes: f5ecd02a8b20 ("mwifiex: device dump support for usb interface") Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Brian Norris briannorris@chromium.org
Changes in v6:
- Use clang-format to adjust the format of code.
Acked-by: Greg Kroah-Hartman gregkh@linuxfoundation.org