Re: [PATCH untested] vhost: order avail ring reads after index updates

27 Mar 2024

On Wed, Mar 27, 2024 at 07:52:02PM +0000, Will Deacon wrote:
...
On Wed, Mar 27, 2024 at 01:26:23PM -0400, Michael S. Tsirkin wrote:
...
vhost_get_vq_desc (correctly) uses smp_rmb to order
avail ring reads after index reads.
However, over time we added two more places that read the
index and do not bother with barriers.
Since vhost_get_vq_desc when it was written assumed it is the
only reader when it sees a new index value is cached
it does not bother with a barrier either, as a result,
on the nvidia-gracehopper platform (arm64) available ring
entry reads have been observed bypassing ring reads, causing
a ring corruption.
To fix, factor out the correct index access code from vhost_get_vq_desc.
As a side benefit, we also validate the index on all paths now, which
will hopefully help catch future errors earlier.
Note: current code is inconsistent in how it handles errors:
some places treat it as an empty ring, others - non empty.
This patch does not attempt to change the existing behaviour.
Cc: stable@vger.kernel.org
Reported-by: Gavin Shan gshan@redhat.com
Reported-by: Will Deacon will@kernel.org
Suggested-by: Will Deacon will@kernel.org
Fixes: 275bf960ac69 ("vhost: better detection of available buffers")
Cc: "Jason Wang" jasowang@redhat.com
Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
Cc: "Stefano Garzarella" sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin mst@redhat.com

I think it's better to bite the bullet and clean up the code.
Note: this is still only built, not tested.
Gavin could you help test please?
Especially on the arm platform you have?
Will thanks so much for finding this race!
No problem, and I was also hoping that the smp_rmb() could be
consolidated into a single helper like you've done here.
One minor comment below:
...
drivers/vhost/vhost.c | 80 +++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 38 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 045f666b4f12..26b70b1fd9ff 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1290,10 +1290,38 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
   	mutex_unlock(&d->vqs[i]->mutex);
 }
 
-static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,

		      __virtio16 *idx)



+static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
 {

return vhost_get_avail(vq, *idx, &vq->avail->idx);


__virtio16 idx;
u16 avail_idx;
int r = vhost_get_avail(vq, idx, &vq->avail->idx);

if (unlikely(r < 0)) {
vq_err(vq, "Failed to access avail idx at %p: %d\n",


       &vq->avail->idx, r);


return -EFAULT;


}

avail_idx = vhost16_to_cpu(vq, idx);

/* Check it isn't doing very strange things with descriptor numbers. */
if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
vq_err(vq, "Guest moved used index from %u to %u",


       vq->last_avail_idx, vq->avail_idx);


return -EFAULT;


}

/* Nothing new? We are done. */
if (avail_idx == vq->avail_idx)
return 0;



vq->avail_idx = avail_idx;

/* We updated vq->avail_idx so we need a memory barrier between
* the index read above and the caller reading avail ring entries.


*/


smp_rmb();

I think you could use smp_acquire__after_ctrl_dep() if you're feeling
brave, but to be honest I'd prefer we went in the opposite direction
and used READ/WRITE_ONCE + smp_load_acquire()/smp_store_release() across
the board. It's just a thankless, error-prone task to get there :(
Let's just say that's a separate patch, I tried hard to make this one
a bugfix only, no other functional changes at all.
...
So, for the patch as-is:
Acked-by: Will Deacon will@kernel.org
(I've not tested it either though, so definitely wait for Gavin on that!)
Cheers,
Will

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH untested] vhost: order avail ring reads after index updates