On 2025/2/21 15:22, Tian, Kevin wrote:
From: Baolu Lu <baolu.lu@linux.intel.com>
Sent: Thursday, February 20, 2025 7:38 PM
On 2025/2/20 15:21, Tian, Kevin wrote:
From: Lu Baolu <baolu.lu@linux.intel.com>
Sent: Tuesday, February 18, 2025 10:24 AM
Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts locally") moved the call to enable_drhd_fault_handling() to a code path that does not hold any lock while traversing the drhd list. Fix it by ensuring the dmar_global_lock is held when traversing the drhd list.
Without this fix, the following warning is triggered:
WARNING: suspicious RCU usage
6.14.0-rc3 #55 Not tainted
drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
2 locks held by cpuhp/1/23:
 #0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0
 #1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0

stack backtrace:
CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 #55
Call Trace:
 <TASK>
 dump_stack_lvl+0xb7/0xd0
 lockdep_rcu_suspicious+0x159/0x1f0
 ? __pfx_enable_drhd_fault_handling+0x10/0x10
 enable_drhd_fault_handling+0x151/0x180
 cpuhp_invoke_callback+0x1df/0x990
 cpuhp_thread_fun+0x1ea/0x2c0
 smpboot_thread_fn+0x1f5/0x2e0
 ? __pfx_smpboot_thread_fn+0x10/0x10
 kthread+0x12a/0x2d0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x4a/0x60
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
Simply holding the lock in enable_drhd_fault_handling() will trigger a lock order splat. Avoid holding the dmar_global_lock when calling iommu_device_register(), which starts the device probe process.
Can you elaborate on the splat issue? It's not intuitive to me from a quick read of the code, and iommu_device_register() does not appear in the call stack above.
The lockdep splat looks like below:
Thanks, it's clear now. You could probably expand "to avoid unnecessary lock order splat" a little to point out the deadlock between dmar_global_lock and cpu_hotplug_lock (acquired in the path of iommu_device_register()).
Yes, sure.
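
For readers following the thread, the lock-order problem discussed above can be modelled outside the kernel. The sketch below is a single-threaded userspace toy, not the actual patch: each "lock" is only a held flag, and the code records the order in which the two stand-in locks are taken, loosely in the spirit of lockdep. All names in it (global_lock_acquire(), hotplug_lock_acquire(), init_units(), cpu_online_callback(), struct unit) are hypothetical. The only facts taken from the thread are that the hotplug callback traverses the list with cpu_hotplug_lock already held, and that the registration path acquires cpu_hotplug_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model: each "lock" is only a held flag. */
static bool global_held;   /* stands in for dmar_global_lock */
static bool hotplug_held;  /* stands in for cpu_hotplug_lock */

/* Ordering edges observed so far (a very rough imitation of lockdep). */
static bool saw_global_then_hotplug;
static bool saw_hotplug_then_global;

static void global_lock_acquire(void)
{
    assert(!global_held);
    if (hotplug_held)
        saw_hotplug_then_global = true;
    global_held = true;
}

static void global_lock_release(void)
{
    assert(global_held);
    global_held = false;
}

static void hotplug_lock_acquire(void)
{
    assert(!hotplug_held);
    if (global_held)
        saw_global_then_hotplug = true;
    hotplug_held = true;
}

static void hotplug_lock_release(void)
{
    assert(hotplug_held);
    hotplug_held = false;
}

/* Hypothetical stand-in for one entry on the drhd list. */
struct unit {
    bool fault_irq_enabled;
    bool registered;
};

static struct unit units[2];
#define NUNITS (sizeof(units) / sizeof(units[0]))

/* Hotplug path: the caller holds the hotplug lock, the list walk takes the global lock. */
static void cpu_online_callback(void)
{
    hotplug_lock_acquire();
    global_lock_acquire();
    for (size_t i = 0; i < NUNITS; i++)
        units[i].fault_irq_enabled = true;
    global_lock_release();
    hotplug_lock_release();
}

/* Registration path: takes the hotplug lock internally. */
static void register_unit(struct unit *u)
{
    hotplug_lock_acquire();
    u->registered = true;
    hotplug_lock_release();
}

/* Init path: optionally keeps the global lock held across registration. */
static void init_units(bool hold_across_register)
{
    global_lock_acquire();
    /* setup that really needs the global lock would go here */
    if (!hold_across_register)
        global_lock_release();

    for (size_t i = 0; i < NUNITS; i++)
        register_unit(&units[i]);

    if (hold_across_register)
        global_lock_release();
}

/* Both orders seen means two threads could block each other (ABBA). */
static bool deadlock_possible(void)
{
    return saw_global_then_hotplug && saw_hotplug_then_global;
}

static void reset_edges(void)
{
    saw_global_then_hotplug = false;
    saw_hotplug_then_global = false;
}
```

In this model, calling cpu_online_callback() followed by init_units(true) makes deadlock_possible() return true, because both acquisition orders have been observed. After reset_edges(), the same sequence with init_units(false) leaves it false, which mirrors the idea of not holding dmar_global_lock when calling iommu_device_register().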