Hi Ricardo,
On Mon, Jan 02, 2023 at 01:45:06PM +0100, Ricardo Ribalda wrote:
On Tue, 27 Dec 2022 at 15:37, Laurent Pinchart wrote:
On Tue, Dec 27, 2022 at 04:25:01PM +0200, Laurent Pinchart wrote:
On Wed, Dec 14, 2022 at 05:31:55PM +0100, Ricardo Ribalda wrote:
usb_kill_urb warranties that all the handlers are finished when it returns, but does not protect against threads that might be handling asynchronously the urb.
For UVC, the function uvc_ctrl_status_event_async() takes care of control changes asynchronously.
If the code is executed in the following order:
CPU 0 CPU 1 ===== ===== uvc_status_complete() uvc_status_stop() uvc_ctrl_status_event_work() uvc_status_start() -> FAIL
Then uvc_status_start will keep failing and this error will be shown:
<4>[ 5.540139] URB 0000000000000000 submitted while active drivers/usb/core/urb.c:378 usb_submit_urb+0x4c3/0x528
Let's improve the current situation, by not re-submiting the urb if we are stopping the status event. Also process the queued work (if any) during stop.
CPU 0 CPU 1 ===== ===== uvc_status_complete() uvc_status_stop() uvc_status_start() uvc_ctrl_status_event_work() -> FAIL
Hopefully, with the usb layer protection this should be enough to cover all the cases.
Cc: stable@vger.kernel.org Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives") Reviewed-by: Yunke Cao yunkec@chromium.org Signed-off-by: Ricardo Ribalda ribalda@chromium.org
uvc: Fix race condition on uvc
Make sure that all the async work is finished when we stop the status urb.
To: Yunke Cao yunkec@chromium.org To: Sergey Senozhatsky senozhatsky@chromium.org To: Max Staudt mstaudt@google.com To: Laurent Pinchart laurent.pinchart@ideasonboard.com To: Mauro Carvalho Chehab mchehab@kernel.org Cc: linux-media@vger.kernel.org Cc: linux-kernel@vger.kernel.org
Changes in v4:
- Replace bool with atomic_t to avoid compiler reordering
- First complete the async work and then kill the urb to avoid race (Thanks Laurent!)
- Link to v3: https://lore.kernel.org/r/20221212-uvc-race-v3-0-954efc752c9a@chromium.org
Changes in v3:
- Remove the patch for dev->status, makes more sense in another series, and makes the zero day less nervous.
- Update reviewed-by (thanks Yunke!).
- Link to v2: https://lore.kernel.org/r/20221212-uvc-race-v2-0-54496cc3b8ab@chromium.org
Changes in v2:
- Add a patch for not kalloc dev->status
- Redo the logic mechanism, so it also works with suspend (Thanks Yunke!)
- Link to v1: https://lore.kernel.org/r/20221212-uvc-race-v1-0-c52e1783c31d@chromium.org
drivers/media/usb/uvc/uvc_ctrl.c | 3 +++ drivers/media/usb/uvc/uvc_status.c | 6 ++++++ drivers/media/usb/uvc/uvcvideo.h | 1 + 3 files changed, 10 insertions(+)
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c index c95a2229f4fa..1be6897a7d6d 100644 --- a/drivers/media/usb/uvc/uvc_ctrl.c +++ b/drivers/media/usb/uvc/uvc_ctrl.c @@ -1442,6 +1442,9 @@ static void uvc_ctrl_status_event_work(struct work_struct *work)
uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
- if (atomic_read(&dev->flush_status))
return;
- /* Resubmit the URB. */ w->urb->interval = dev->int_ep->desc.bInterval; ret = usb_submit_urb(w->urb, GFP_KERNEL);
diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c index 7518ffce22ed..4a95850cdc1b 100644 --- a/drivers/media/usb/uvc/uvc_status.c +++ b/drivers/media/usb/uvc/uvc_status.c @@ -304,10 +304,16 @@ int uvc_status_start(struct uvc_device *dev, gfp_t flags) if (dev->int_urb == NULL) return 0;
- atomic_set(&dev->flush_status, 0); return usb_submit_urb(dev->int_urb, flags);
}
void uvc_status_stop(struct uvc_device *dev) {
- struct uvc_ctrl_work *w = &dev->async_ctrl;
- atomic_set(&dev->flush_status, 1);
Note that atomic_read() and atomic_set() do no imply any memory barrier on most architectures, as far as I can tell. They essentially become READ_ONCE() and WRITE_ONCE() calls, which guarantee that the compiler will not merge or split loads or stores, or reorder them with respect to load and stores to the *same* memory location, but nothing else. I think you need to add proper barriers, and you can then probably also drop usage of atomic_t.
You are absolutely right! Only a subset of atomic implies memory barriers.
Found this doc particularly good: https://www.kernel.org/doc/html/v5.10/core-api/atomic_ops.html
- if (cancel_work_sync(&w->work))
usb_kill_urb(dev->int_urb);uvc_ctrl_status_event(w->chain, w->ctrl, w->data);
This should get rid of the main race (possibly save the barrier issue), but it's not the most efficient option, and can still be problematic. Consider the following case:
CPU0 CPU1
void uvc_status_stop(struct uvc_device *dev) uvc_ctrl_status_event_async() { { ... atomic_set(&dev->flush_status, 1); ... if (cancel_work_sync()) ... schedule_work(); usb_kill_urb(); } }
The usb_kill_urb() call ensures that uvc_ctrl_status_event_async() completes before uvc_status_stop() returns, but there will still be work scheduled in that case. uvc_ctrl_status_event_work() will be run later, and as flush_status is set to 1, the function will not resubmit the URB. That fixes the main race, but leaves the asynchronous control status event handling for after uvc_status_stop() returns, which isn't great.
Now, if we consider that uvc_status_start() could be called shortly after uvc_status_stop(), we may get in a situation where uvc_status_start() will reset flush_status to 0 before uvc_ctrl_status_event_async() runs. Both uvc_ctrl_status_event_async() and uvc_status_start() will thus submit the same URB.
You can't fix this by first killing the URB and then cancelling the work, as there would then be a risk that uvc_ctrl_status_event_work() would be running in parallel, going past the flush_status check before flush_status gets set to 1 in uvc_status_stop(), and submitting the URB after usb_kill_urb() is called.
I think a good fix would be to check flush_status in uvc_ctrl_status_event_async() and avoid the schedule_work() call in that case.
If we do that, then we will be losing an event. I would rather call cancel_work_sync() again after usb_kill_urb().
I've thought about that, but I don't think losing the event is an issue. We're stopping event handling in the first place, there's no synchronization guarantee with the camera. For all we know the camera could have generate the event right after we stop instead of right before. There's of course no reason to drop the event for the sake of it, but if it leads to simpler code, there's no reason to process it either.
You could then also move the atomic_set(..., 0) call from uvc_status_start() to the end of uvc_status_stop() (again with proper barriers).
Will do that, it is more "elegant".
Also, as all of this is tricky, comments in appropriate places in the code would be very helpful to avoid breaking things later.
Could you please check the memory barriers and test the above proposal (after double-checking it of course, I may have missed something) ?
} diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h index df93db259312..1274691f157f 100644 --- a/drivers/media/usb/uvc/uvcvideo.h +++ b/drivers/media/usb/uvc/uvcvideo.h @@ -560,6 +560,7 @@ struct uvc_device { struct usb_host_endpoint *int_ep; struct urb *int_urb; u8 *status;
- atomic_t flush_status; struct input_dev *input; char input_phys[64];