On Tue, Sep 28, 2021 at 04:36:20PM +0800, Macpaul Lin wrote:
From: Leon Yu leoyu@nvidia.com
commit 31096c3e8b1163c6e966bf4d1f36d8b699008f84 upstream.
Commit 14b41a2959fb ("net: stmmac: Delete txtimer in suspend()") was the first attempt to fix a race between mod_timer() and setup_timer() during stmmac_resume(). However the issue still exists as the commit only addressed half of the issue.
Same race can still happen as stmmac_resume() re-attaches interface way too early - even before hardware is fully initialized. Worse, doing so allows network traffic to restart and stmmac_tx_timer_arm() being called in the middle of stmmac_resume(), which re-init tx timers in stmmac_init_coalesce(). timer_list will be corrupted and system crashes as a result of race between mod_timer() and setup_timer().
systemd--1995 2.... 552950018us : stmmac_suspend: 4994 ksoftirq-9 0..s2 553123133us : stmmac_tx_timer_arm: 2276 systemd--1995 0.... 553127896us : stmmac_resume: 5101 systemd--320 7...2 553132752us : stmmac_tx_timer_arm: 2276 (sd-exec-1999 5...2 553135204us : stmmac_tx_timer_arm: 2276
pc : run_timer_softirq+0x468/0x5e0 lr : run_timer_softirq+0x570/0x5e0 Call trace: run_timer_softirq+0x468/0x5e0 __do_softirq+0x124/0x398 irq_exit+0xd8/0xe0 __handle_domain_irq+0x6c/0xc0 gic_handle_irq+0x60/0xb0 el1_irq+0xb8/0x180 arch_cpu_idle+0x38/0x230 default_idle_call+0x24/0x3c do_idle+0x1e0/0x2b8 cpu_startup_entry+0x28/0x48 secondary_start_kernel+0x1b4/0x208
Fix this by deferring netif_device_attach() to the end of stmmac_resume().
Signed-off-by: Leon Yu leoyu@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net
Whenever you forward on a patch, you should add yourself to the signed-off-by chain.
I'll just add you to the cc: to let us know who asked for this patch.
thanks,
greg k-h