On Mon, 2 Dec 2024 at 01:19, Laurent Pinchart laurent.pinchart@ideasonboard.com wrote:
On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
Before we all go on a well deserved weekend, let me recap what we know. If I did not get something correctly, let me know.
- Well behaved devices do not allow to set or get an incomplete async
control. They will stall instead (ref: Figure 2-21 in UVC 1.5 ) 2) Both Laurent and Ricardo consider that there is a big chance that some camera modules do not implement this properly. (ref: years of crying over broken module firmware :) )
- ctrl->handle is designed to point to the fh that originated the
control. So the logic can decide if the originator needs to be notified or not. (ref: uvc_ctrl_send_event() ) 4) Right now we replace the originator in ctrl->handle for unfinished async controls. (ref: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...)
My interpretation is that: A) We need to change 4). We shall not change the originator of unfinished ctrl->handle. B) Well behaved cameras do not need the patch "Do not set an async control owned by another fh" C) For badly behaved cameras, it is fine if we slightly break the v4l2-compliance in corner cases, if we do not break any internal data structure.
The fact that some devices may not implement the documented behaviour correctly may not be a problem. Well-behaved devices will stall, which means we shouldn't query the device while as async update is in progress. Badly-behaved devices, whatever they do when queried, should not cause any issue if we don't query them.
I thought we could detect the stall and return safely. Isn't that the case?
We could, but if we know the device will stall anyway, is there a reason not to avoid issuing the request in the first place ?
Because there are always going to be devices that do not send the event when the control has finished.
It is impossible to know the state of the device: Is the zoom still moving or the device is not compliant?
Why we have not seen issues with this?
I haven't tested a PTZ device for a very long time, and you would need to hit a small time window to see the issue.
Not that small, some devices take up to 10 seconds to go from the smallest zoom to the biggest zoom. I'd say that v4l2-ctl has a very big chance to hit any error.
BTW, homing can take an even longer time.
We should not send GET_CUR and SET_CUR requests to the device while an async update is in progress, and use cached values instead. When we receive the async update event, we should clear the cache. This will be the same for both well-behaved and badly-behaved devices, so we can expose the same behaviour towards userspace.
seting ctrl->loaded = 0 when we get an event sounds like a good idea and something we can implement right away. If I have to resend the set I will add it to the end.
We possibly also need some kind of timeout mechanism to cope with the async update event not being delivered by the device.
This is the part that worries me the most:
- timeouts make the code fragile
- What is a good value for timeout? 1 second, 30, 300? I do not think
that we can find a value.
I've been thinking about the implementation of uvc_fh cleanup over the weekend, and having a timeout would have the nice advantage that we could reference-count uvc_fh instead of implementing a cleanup that walks over all controls when closing a file handle. I think it would make the code simpler, and possibly safer too.
The problem with a timeout is: - We do not know what is the right value for the timeout. - We need to have one timeout per control, or implement a timeout dispatcher mechanism, and that is racy by nature - It will require introducing a new locking mechanism to avoid races between the timeout handler and the event completion - It introduces a new lifetime in the driver... I'd argue that it is best to have less, not more.
I do not see many benefits....
What we can introduce on top of my set is something like this (just pseudocode, do not scream at me :P) That code can prevent stalls and will work with misbehaving hardware.... (But I still do not know a good value for timeout) and solve some of the issues that I mention.
diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c index f0e8a436a306..a55bd4b3ac07 100644 --- a/drivers/media/usb/uvc/uvc_ctrl.c +++ b/drivers/media/usb/uvc/uvc_ctrl.c @@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain, if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0) return -EACCES;
+ if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT) + return -EBUSY; + ret = __uvc_ctrl_load_cur(chain, ctrl); if (ret < 0) return ret; @@ -1591,6 +1594,7 @@ static int uvc_ctrl_get_handle(struct uvc_fh *handle, struct uvc_control *ctrl) return -EBUSY;
ctrl->handle = handle; + ctrl->async_start_time = NOW; handle->pending_async_ctrls++; return 0; } @@ -1982,6 +1986,9 @@ int uvc_ctrl_set(struct uvc_fh *handle, if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR)) return -EACCES;
+ if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT) + return -EBUSY; + /* Clamp out of range values. */ switch (mapping->v4l2_type) { case V4L2_CTRL_TYPE_INTEGER: diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h index e0e4f099a210..5c82fae94dff 100644 --- a/drivers/media/usb/uvc/uvcvideo.h +++ b/drivers/media/usb/uvc/uvcvideo.h @@ -154,6 +154,8 @@ struct uvc_control { * File handle that initially changed the * async control. */ + + TIME async_start_time; };
In any case. Can we solve this in incremental steps? I think the current patch seems simple and correct. We can introduce the timeout later.
Thanks!
Regarding the userspace behaviour during an auto-update, we have multiple options:
For control get,
- We can return -EBUSY
- We can return the old value from the cache
- We can return the new value fromt he cache
Returning -EBUSY would be simpler to implement.
Not only easy, I think it is the most correct,
I don't think the behaviour should depend on whether the control is read on the file handle that initiated the async operation or on a different file handle.
For control set, I don't think we can do much else than returning -EBUSY, regardless of which file handle the control is set on.
ACK.
I will send a new version with my interpretation.
Thanks for a great discussion
Looking with some perspective... I believe that we should look into the "userspace behaviour for auto controls" in a different patchset. It is slightly unrelated to this discussion.
-- Regards,
Laurent Pinchart