Hi Greg & Sasha,
tl;dr: Please add 2dce224f469f ("netns: protect netns ID lookups with RCU") to the stable releases from v5.4 and older. It fixes a spin_unlock_bh() in peernet2id() called with IRQs off. I think this neat side-effect of commit 2dce224f469f was quite unintentional, hence no Fixes: tag or CC: stable.
The details:
From bugzilla.redhat.com/show_bug.cgi?id=1384179 (an ancient 4.9.0-0.rc0 kernel):
  dump_stack+0x86/0xc3
  __warn+0xcb/0xf0
  warn_slowpath_null+0x1d/0x20
  __local_bh_enable_ip+0x9d/0xc0
  _raw_spin_unlock_bh+0x35/0x40
  peernet2id+0x54/0x80
  netlink_broadcast_filtered+0x220/0x3c0
  netlink_broadcast+0x1d/0x20
  audit_log+0x6a/0x90
  security_set_bools+0xee/0x200
  []
Note that security_set_bools() calls write_lock_irq(), so IRQs are off when peernet2id() further down the call chain calls spin_unlock_bh().
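To make the first trace concrete, here is a minimal user-space sketch of the condition that fires. It is only a model: every model_* name is hypothetical and none of this is kernel code. It assumes the check in __local_bh_enable_ip() boils down to "warn if bottom halves are re-enabled while IRQs are disabled", which is what the trace suggests.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these are kernel symbols. */
struct model_cpu {
	bool irqs_disabled;     /* set by a write_lock_irq()-like call */
	int  bh_disable_depth;  /* nesting count of bottom-half disables */
	int  warnings;          /* how many times the sanity check fired */
};

/* Models write_lock_irq(): the lock is taken with hard IRQs off. */
static void model_write_lock_irq(struct model_cpu *cpu)
{
	cpu->irqs_disabled = true;
}

/* Models write_unlock_irq(): hard IRQs are switched back on. */
static void model_write_unlock_irq(struct model_cpu *cpu)
{
	cpu->irqs_disabled = false;
}

/* Models spin_lock_bh(): bottom halves are disabled. */
static void model_spin_lock_bh(struct model_cpu *cpu)
{
	cpu->bh_disable_depth++;
}

/* Models spin_unlock_bh(): re-enabling BHs with IRQs off is flagged. */
static void model_spin_unlock_bh(struct model_cpu *cpu)
{
	if (cpu->irqs_disabled)
		cpu->warnings++;  /* assumed stand-in for the WARN in the trace */
	cpu->bh_disable_depth--;
}

/* Models the call chain security_set_bools() -> ... -> peernet2id(). */
static int model_first_trace(void)
{
	struct model_cpu cpu = { false, 0, 0 };

	model_write_lock_irq(&cpu);    /* security_set_bools() */
	model_spin_lock_bh(&cpu);      /* peernet2id() takes its lock */
	model_spin_unlock_bh(&cpu);    /* ...and releases it with IRQs off */
	model_write_unlock_irq(&cpu);
	return cpu.warnings;
}
```

Calling model_first_trace() returns the number of times the modelled check fired for this call chain.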
From an internal (UEK) stack trace based on the v4.14.35 kernel (LTS 4.14.231):
  queued_spin_lock_slowpath+0xb/0xf
  _raw_spin_lock_irqsave+0x46/0x48
  send_mad+0x3d2/0x590 [ib_core]
  ib_sa_path_rec_get+0x223/0x4d0 [ib_core]
  path_rec_start+0xa3/0x140 [ib_ipoib]
  ipoib_start_xmit+0x2b0/0x6a0 [ib_ipoib]
  dev_hard_start_xmit+0xb2/0x237
  sch_direct_xmit+0x114/0x1bf
  __dev_queue_xmit+0x592/0x818
  dev_queue_xmit+0x10/0x12
  arp_xmit+0x38/0xa6
  arp_send_dst.part.16+0x61/0x84
  arp_process+0x825/0x889
  arp_rcv+0x140/0x1c9
  __netif_receive_skb_core+0x401/0xb39
  __netif_receive_skb+0x18/0x59
  netif_receive_skb_internal+0x45/0x119
  napi_gro_receive+0xd8/0xf6
  ipoib_ib_handle_rx_wc+0x1ca/0x520 [ib_ipoib]
  ipoib_poll+0xcd/0x150 [ib_ipoib]
  net_rx_action+0x289/0x3f4
  __do_softirq+0xe1/0x2b5
  do_softirq_own_stack+0x2a/0x35
  </IRQ>
  do_softirq+0x4d/0x6a
  __local_bh_enable_ip+0x57/0x59
  _raw_spin_unlock_bh+0x23/0x25
  peernet2id+0x51/0x73
  netlink_broadcast_filtered+0x223/0x41b
  netlink_broadcast+0x1d/0x1f
  rdma_nl_multicast+0x22/0x30 [ib_core]
  send_mad+0x3e5/0x590 [ib_core]
  ib_sa_path_rec_get+0x223/0x4d0 [ib_core]
  rdma_resolve_route+0x287/0x810 [rdma_cm]
  rds_rdma_cm_event_handler_cmn+0x311/0x7d0 [rds_rdma]
  rds_rdma_cm_event_handler_worker+0x22/0x30 [rds_rdma]
  process_one_work+0x169/0x3a6
  worker_thread+0x4d/0x3e5
  kthread+0x105/0x138
  ret_from_fork+0x24/0x49
Here, pay attention to ib_nl_make_request(), which calls spin_lock_irqsave() on a global lock just before calling rdma_nl_multicast(). Thereafter, the spin_unlock_bh() in peernet2id() enables SoftIRQs, the ipoib receive path runs in softirq context, takes the same send path, and ends up trying to acquire the same global lock again.
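The sequence in the second trace can be sketched as a small user-space model. Again, the model_* names are hypothetical and this is not kernel code. It assumes that a spin_unlock_bh() may run pending softirqs on its way out, and that after 2dce224f469f the lookup in peernet2id() is an RCU read-side section that does not re-enable bottom halves (as the commit title suggests).

```c
#include <stdbool.h>

/* Hypothetical user-space model of the second trace; not kernel code. */
struct model_state {
	bool global_lock_held;  /* the global lock taken with irqsave */
	bool softirq_pending;   /* an RX softirq is waiting to run */
	bool self_deadlock;     /* set if the lock is requested while held */
};

/* Models the ipoib RX softirq ending in send_mad(): it needs the same lock. */
static void model_rx_softirq(struct model_state *s)
{
	if (s->global_lock_held)
		s->self_deadlock = true;  /* a real spinlock would spin forever */
}

/* Models spin_unlock_bh(): pending softirqs run on the way out. */
static void model_unlock_bh(struct model_state *s)
{
	if (s->softirq_pending) {
		s->softirq_pending = false;
		model_rx_softirq(s);
	}
}

/* Models an RCU read-side lookup: no BH re-enable, so no softirq runs here. */
static void model_rcu_lookup(struct model_state *s)
{
	(void)s;
}

/* Returns true if the modelled send path would deadlock on itself. */
static bool model_send_path(bool use_rcu_lookup)
{
	struct model_state s = { false, true, false };

	s.global_lock_held = true;     /* spin_lock_irqsave() on the global lock */
	if (use_rcu_lookup)
		model_rcu_lookup(&s);  /* peernet2id() with 2dce224f469f (assumed) */
	else
		model_unlock_bh(&s);   /* peernet2id() without the fix */
	s.global_lock_held = false;    /* spin_unlock_irqrestore() */
	return s.self_deadlock;
}
```

In this model, model_send_path(false) reports the self-deadlock and model_send_path(true) does not, which is the side-effect of the commit I am after.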
I have tried to reproduce this, with no luck. But stack traces seldom lie ;-)
Thxs, Håkon
On Thu, Sep 09, 2021 at 01:10:05PM +0000, Haakon Bugge wrote:
> Hi Greg & Sasha,
> tl;dr: Please add 2dce224f469f ("netns: protect netns ID lookups with RCU") to the stable releases from v5.4 and older. It fixes a spin_unlock_bh() in peernet2id() called with IRQs off. I think this neat side-effect of commit 2dce224f469f was quite unintentional, hence no Fixes: tag or CC: stable.
Please provide a working backport for all of the relevant kernel versions, as it does not apply cleanly on its own.
thanks,
greg k-h
On 9 Sep 2021, at 15:30, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Thu, Sep 09, 2021 at 01:10:05PM +0000, Haakon Bugge wrote:
>> Hi Greg & Sasha,
>> tl;dr: Please add 2dce224f469f ("netns: protect netns ID lookups with RCU") to the stable releases from v5.4 and older. It fixes a spin_unlock_bh() in peernet2id() called with IRQs off. I think this neat side-effect of commit 2dce224f469f was quite unintentional, hence no Fixes: tag or CC: stable.
> Please provide a working backport for all of the relevant kernel versions, as it does not apply cleanly on its own.
I've done the backports. 4.9 is actually not needed, because it uses spin_{lock,unlock}_irqsave() in peernet2id(). Hence, we have an "offending" commit which this one fixes:
fba143c66abb ("netns: avoid disabling irq for netns id")
Will get'm out during the weekend.
Thxs, Håkon
On Fri, Sep 10, 2021 at 02:22:32PM +0000, Haakon Bugge wrote:
> On 9 Sep 2021, at 15:30, Greg KH <gregkh@linuxfoundation.org> wrote:
>> On Thu, Sep 09, 2021 at 01:10:05PM +0000, Haakon Bugge wrote:
>>> Hi Greg & Sasha,
>>> tl;dr: Please add 2dce224f469f ("netns: protect netns ID lookups with RCU") to the stable releases from v5.4 and older. It fixes a spin_unlock_bh() in peernet2id() called with IRQs off. I think this neat side-effect of commit 2dce224f469f was quite unintentional, hence no Fixes: tag or CC: stable.
>> Please provide a working backport for all of the relevant kernel versions, as it does not apply cleanly on its own.
> I've done the backports. 4.9 is actually not needed, because it uses spin_{lock,unlock}_irqsave() in peernet2id(). Hence, we have an "offending" commit which this one fixes:
> fba143c66abb ("netns: avoid disabling irq for netns id")
> Will get'm out during the weekend.
All now queued up, thanks.
greg k-h
On 13 Sep 2021, at 10:42, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Sep 10, 2021 at 02:22:32PM +0000, Haakon Bugge wrote:
>> On 9 Sep 2021, at 15:30, Greg KH <gregkh@linuxfoundation.org> wrote:
>>> On Thu, Sep 09, 2021 at 01:10:05PM +0000, Haakon Bugge wrote:
>>>> Hi Greg & Sasha,
>>>> tl;dr: Please add 2dce224f469f ("netns: protect netns ID lookups with RCU") to the stable releases from v5.4 and older. It fixes a spin_unlock_bh() in peernet2id() called with IRQs off. I think this neat side-effect of commit 2dce224f469f was quite unintentional, hence no Fixes: tag or CC: stable.
>>> Please provide a working backport for all of the relevant kernel versions, as it does not apply cleanly on its own.
>> I've done the backports. 4.9 is actually not needed, because it uses spin_{lock,unlock}_irqsave() in peernet2id(). Hence, we have an "offending" commit which this one fixes:
>> fba143c66abb ("netns: avoid disabling irq for netns id")
>> Will get'm out during the weekend.
> All now queued up, thanks.
Thanks for helping me out here, Greg!
Appreciated, Håkon
> greg k-h
linux-stable-mirror@lists.linaro.org