On 22.12.2022 9.29, Jimmy Hu wrote:
When the host controller is not responding, all URBs queued to all endpoints need to be killed. This can cause a kernel panic if we dereference an invalid endpoint.
Fix this by using xhci_get_virt_ep() helper to find the endpoint and checking if the endpoint is valid before dereferencing it.
[233311.853271] xhci-hcd xhci-hcd.1.auto: xHCI host controller not responding, assume dead [233311.853393] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e8
[233311.853964] pc : xhci_hc_died+0x10c/0x270 [233311.853971] lr : xhci_hc_died+0x1ac/0x270
[233311.854077] Call trace: [233311.854085] xhci_hc_died+0x10c/0x270 [233311.854093] xhci_stop_endpoint_command_watchdog+0x100/0x1a4 [233311.854105] call_timer_fn+0x50/0x2d4 [233311.854112] expire_timers+0xac/0x2e4 [233311.854118] run_timer_softirq+0x300/0xabc [233311.854127] __do_softirq+0x148/0x528 [233311.854135] irq_exit+0x194/0x1a8 [233311.854143] __handle_domain_irq+0x164/0x1d0 [233311.854149] gic_handle_irq.22273+0x10c/0x188 [233311.854156] el1_irq+0xfc/0x1a8 [233311.854175] lpm_cpuidle_enter+0x25c/0x418 [msm_pm] [233311.854185] cpuidle_enter_state+0x1f0/0x764 [233311.854194] do_idle+0x594/0x6ac [233311.854201] cpu_startup_entry+0x7c/0x80 [233311.854209] secondary_start_kernel+0x170/0x198
Fixes: 50e8725e7c42 ("xhci: Refactor command watchdog and fix split string.") Cc: stable@vger.kernel.org Signed-off-by: Jimmy Hu hhhuuu@google.com
drivers/usb/host/xhci-ring.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index ddc30037f9ce..f5b0e1ce22af 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1169,7 +1169,10 @@ static void xhci_kill_endpoint_urbs(struct xhci_hcd *xhci, struct xhci_virt_ep *ep; struct xhci_ring *ring;
- ep = &xhci->devs[slot_id]->eps[ep_index];
- ep = xhci_get_virt_ep(xhci, slot_id, ep_index);
- if (!ep)
return;
This is a good generic change that should be added anyway, but but might not fix the rootcause. It will reduce the risk of the race drastically.
We do check that xhci-devs[slot_id] exists once before calling xhci_kill_endpoint_urbs() for the endpoints of that virt_dev in xhci_hc_died(). And xhci_hc_died() should always be calling with lock held (need to doublecheck this is true).
Older kernels used release and re-aquire the lock while giving back urbs for an endpoint, so older kernels really need this change.
But newer kernels don't release the lock anymore when giving back URBs.
Looks like real issue is that we don't take the lock when we free the virt_dev.
-Mathias