On Wed, Mar 02, 2022 at 09:50:38AM -0500, Michael S. Tsirkin wrote:
On Wed, Mar 02, 2022 at 03:11:21PM +0100, Stefano Garzarella wrote:
On Wed, Mar 02, 2022 at 08:35:08AM -0500, Michael S. Tsirkin wrote:
On Wed, Mar 02, 2022 at 10:34:46AM +0100, Stefano Garzarella wrote:
On Wed, Mar 02, 2022 at 07:54:21AM +0000, Lee Jones wrote:
vhost_vsock_handle_tx_kick() already holds the mutex during its call to vhost_get_vq_desc(). All we have to do is take the same lock during virtqueue clean-up and we mitigate the reported issues.
Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
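The locking idea described above can be modelled in userspace. This is only a sketch under stated assumptions, not the kernel patch itself: `model_vq`, `model_dev`, `model_handle_kick()` and `model_dev_cleanup()` are hypothetical stand-ins, and a pthread mutex stands in for the per-virtqueue kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_vq {
	pthread_mutex_t mutex;	/* models the per-virtqueue mutex */
	void *private_data;	/* models the backend pointer */
};

struct model_dev {
	struct model_vq *vqs;
	int nvqs;
};

/*
 * Models the handler side: the virtqueue is only touched while its
 * mutex is held, and nothing is done if no backend is set.
 * Returns 1 if work would have been done, 0 otherwise.
 */
static int model_handle_kick(struct model_vq *vq)
{
	int handled = 0;

	pthread_mutex_lock(&vq->mutex);
	if (vq->private_data)
		handled = 1;	/* real code would process descriptors here */
	pthread_mutex_unlock(&vq->mutex);
	return handled;
}

/*
 * Models the clean-up side: each virtqueue is reset under the same
 * mutex the handler takes, so a reset cannot race with a running kick.
 */
static void model_dev_cleanup(struct model_dev *dev)
{
	int i;

	assert(dev != NULL);
	for (i = 0; i < dev->nvqs; ++i) {
		pthread_mutex_lock(&dev->vqs[i].mutex);
		dev->vqs[i].private_data = NULL;	/* models the reset */
		pthread_mutex_unlock(&dev->vqs[i].mutex);
	}
}
```

After `model_dev_cleanup()` returns, a later `model_handle_kick()` sees no backend and does nothing, which is the property the review below relies on.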
This issue is similar to [1], which should already be fixed upstream by [2].
However, I think this patch would have prevented some issues, because vhost_vq_reset() sets vq->private_data to NULL, preventing the worker from running.
Anyway, I think that when we enter vhost_dev_cleanup() the worker should already be stopped, so it shouldn't be necessary to take the mutex. But to prevent future issues it may be better to take it anyway, so:
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
[1] https://syzkaller.appspot.com/bug?id=993d8b5e64393ed9e6a70f9ae4de0119c605a82...
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Right. I want to queue this but I would like to get a warning so we can detect issues like [2] before they cause more issues.
I agree, what about moving the warning that we already have higher up, right at the beginning of the function?
I mean something like this:
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe2..1721ff3f18c0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -692,6 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 {
 	int i;
 
+	WARN_ON(!llist_empty(&dev->work_list));
+
 	for (i = 0; i < dev->nvqs; ++i) {
 		if (dev->vqs[i]->error_ctx)
 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
@@ -712,7 +714,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	dev->iotlb = NULL;
 	vhost_clear_msg(dev);
 	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
-	WARN_ON(!llist_empty(&dev->work_list));
 	if (dev->worker) {
 		kthread_stop(dev->worker);
 		dev->worker = NULL;
Hmm I'm not sure why it matters.
Because with this new patch, which takes the locks inside the for loop, the workers should be stopped by the time the loop finishes, since vhost_vq_reset() sets vq->private_data to NULL.
But the best thing IMHO is to check that no backend is set for any vq, which would confirm that the workers have been stopped correctly at this point.
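A per-vq backend check along those lines could look roughly like the userspace sketch below. Everything in it is an assumption made for illustration: `sketch_vq`, `sketch_dev` and `sketch_count_live_backends()` are hypothetical names, and a message on stderr stands in for a kernel warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; only the backend pointer is modelled. */
struct sketch_vq {
	void *private_data;	/* non-NULL while a backend is attached */
};

struct sketch_dev {
	struct sketch_vq *vqs;
	int nvqs;
};

/*
 * Walks every virtqueue and reports those that still have a backend.
 * Returns the number of such virtqueues; 0 means every backend was
 * cleared before clean-up started.
 */
static int sketch_count_live_backends(const struct sketch_dev *dev)
{
	int i, live = 0;

	assert(dev != NULL);
	for (i = 0; i < dev->nvqs; ++i) {
		if (dev->vqs[i].private_data) {
			/* stands in for a warning in kernel code */
			fprintf(stderr, "vq %d still has a backend\n", i);
			++live;
		}
	}
	return live;
}
```

A non-zero return value would indicate that some worker could still be active when clean-up begins.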
Thanks, Stefano