Commit deee93d13d38 ("Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works") protects the queuing operation with an RCU read lock to avoid calling 'queue_delayed_work()' after 'cancel_delayed_work()', but this protection does not apply to 'hci_conn_drop()'.
The situation described raises concerns about suspicious RCU usage in a corrupted context.
   CPU 1                   CPU 2
 hci_dev_do_reset()
  synchronize_rcu()        hci_conn_drop()
  drain_workqueue()        <-- no RCU read protection during queuing. -->
                            queue_delayed_work()
It displays a warning message like the following:

Bluetooth: hci0: unexpected cc 0x0c38 length: 249 > 2
=============================
WARNING: suspicious RCU usage
6.10.0-rc6-01340-gf14c0bb78769 #5 Not tainted
-----------------------------
net/mac80211/util.c:4000 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syz-executor/798:
 #0: ffff800089a3de50 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x40 net/core/rtnetlink.c:79

stack backtrace:
CPU: 0 PID: 798 Comm: syz-executor Not tainted 6.10.0-rc6-01340-gf14c0bb78769 #5
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0x1b8/0x1d0 arch/arm64/kernel/stacktrace.c:317
 dump_backtrace arch/arm64/kernel/stacktrace.c:323 [inline]
 show_stack+0x34/0x50 arch/arm64/kernel/stacktrace.c:324
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xf0/0x170 lib/dump_stack.c:114
 dump_stack+0x20/0x30 lib/dump_stack.c:123
 lockdep_rcu_suspicious+0x204/0x2f8 kernel/locking/lockdep.c:6712
 ieee80211_check_combinations+0x71c/0x828 [mac80211]
 ieee80211_check_concurrent_iface+0x494/0x700 [mac80211]
 ieee80211_open+0x140/0x238 [mac80211]
 __dev_open+0x270/0x498 net/core/dev.c:1474
 __dev_change_flags+0x47c/0x610 net/core/dev.c:8837
 dev_change_flags+0x98/0x170 net/core/dev.c:8909
 devinet_ioctl+0xdf0/0x18d0 net/ipv4/devinet.c:1177
 inet_ioctl+0x34c/0x388 net/ipv4/af_inet.c:1003
 sock_do_ioctl+0xe4/0x240 net/socket.c:1222
 sock_ioctl+0x4cc/0x740 net/socket.c:1341
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl fs/ioctl.c:893 [inline]
 __arm64_sys_ioctl+0x184/0x218 fs/ioctl.c:893
 __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
 invoke_syscall+0x90/0x2e8 arch/arm64/kernel/syscall.c:48
 el0_svc_common.constprop.0+0x200/0x2a8 arch/arm64/kernel/syscall.c:131
 el0_svc+0x48/0xc0 arch/arm64/kernel/entry-common.c:712
 el0t_64_sync_handler+0x120/0x130 arch/arm64/kernel/entry-common.c:730
 el0t_64_sync+0x190/0x198 arch/arm64/kernel/entry.S:598
This patch attempts to fix that issue with the same convention.
Cc: stable@vger.kernel.org # v6.1+
Fixes: deee93d13d38 ("Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Tested-by: Yunseong Kim <yskelg@gmail.com>
Signed-off-by: Yunseong Kim <yskelg@gmail.com>
---
 include/net/bluetooth/hci_core.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 31020891fc68..111509dc1a23 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1572,8 +1572,13 @@ static inline void hci_conn_drop(struct hci_conn *conn)
 		}

 		cancel_delayed_work(&conn->disc_work);
-		queue_delayed_work(conn->hdev->workqueue,
-				   &conn->disc_work, timeo);
+
+		rcu_read_lock();
+		if (!hci_dev_test_flag(conn->hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+			queue_delayed_work(conn->hdev->workqueue,
+					   &conn->disc_work, timeo);
+		}
+		rcu_read_unlock();
 	}
 }
On 2024/07/25 22:47, Yunseong Kim wrote:
> =============================
> WARNING: suspicious RCU usage
> 6.10.0-rc6-01340-gf14c0bb78769 #5 Not tainted
> -----------------------------
> net/mac80211/util.c:4000 RCU-list traversed in non-reader section!!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 2, debug_locks = 1
> 2 locks held by syz-executor/798:
>  #0: ffff800089a3de50 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x40 net/core/rtnetlink.c:79
> This patch attempts to fix that issue with the same convention.
Excuse me, but I can't interpret why this patch solves the warning.
The warning says that list_for_each_entry_rcu() { } in ieee80211_check_combinations() is called outside of an rcu_read_lock() and rcu_read_unlock() pair, doesn't it? How is that connected to guarding hci_dev_test_flag() and queue_delayed_work() with an rcu_read_lock() and rcu_read_unlock() pair? Unless you guard list_for_each_entry_rcu() { } in ieee80211_check_combinations() with an rcu_read_lock() and rcu_read_unlock() pair (or annotate that appropriate locks are already held), I can't expect that the warning will be solved...
Also, what guarantees that drain_workqueue() won't be disturbed by queue_work(disc_work) which will be called after "timeo" delay, for you are not explicitly cancelling scheduled "disc_work" (unlike "cmd_timer" work and "ncmd_timer" work shown below) before calling drain_workqueue() ?
	/* Cancel these to avoid queueing non-chained pending work */
	hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
	/* Wait for
	 *
	 *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
	 *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
	 *
	 * inside RCU section to see the flag or complete scheduling.
	 */
	synchronize_rcu();
	/* Explicitly cancel works in case scheduled after setting the flag. */
	cancel_delayed_work(&hdev->cmd_timer);
	cancel_delayed_work(&hdev->ncmd_timer);

	/* Avoid potential lockdep warnings from the *_flush() calls by
	 * ensuring the workqueue is empty up front.
	 */
	drain_workqueue(hdev->workqueue);
Hi Tetsuo,
> Excuse me, but I can't interpret why this patch solves the warning.
>
> The warning says that list_for_each_entry_rcu() { } in ieee80211_check_combinations() is called outside of an rcu_read_lock() and rcu_read_unlock() pair, doesn't it? How is that connected to guarding hci_dev_test_flag() and queue_delayed_work() with an rcu_read_lock() and rcu_read_unlock() pair? Unless you guard list_for_each_entry_rcu() { } in ieee80211_check_combinations() with an rcu_read_lock() and rcu_read_unlock() pair (or annotate that appropriate locks are already held), I can't expect that the warning will be solved...
Thank you for the code review.
I apologize for attaching the wrong kernel dump.
> Also, what guarantees that drain_workqueue() won't be disturbed by queue_work(disc_work) which will be called after "timeo" delay, for you are not explicitly cancelling scheduled "disc_work" (unlike "cmd_timer" work and "ncmd_timer" work shown below) before calling drain_workqueue() ?
>
> 	/* Cancel these to avoid queueing non-chained pending work */
> 	hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> 	/* Wait for
> 	 *
> 	 *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> 	 *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> 	 *
> 	 * inside RCU section to see the flag or complete scheduling.
> 	 */
> 	synchronize_rcu();
> 	/* Explicitly cancel works in case scheduled after setting the flag. */
> 	cancel_delayed_work(&hdev->cmd_timer);
> 	cancel_delayed_work(&hdev->ncmd_timer);
>
> 	/* Avoid potential lockdep warnings from the *_flush() calls by
> 	 * ensuring the workqueue is empty up front.
> 	 */
> 	drain_workqueue(hdev->workqueue);
Please bear with me for a moment.
I'll attach the correct kernel dump and resend the patch email.
Warm regards,
Yunseong Kim