Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

8 Mar 2022

On Tue, Mar 08, 2022 at 10:57:42AM +0100, Greg KH wrote:
...
On Tue, Mar 08, 2022 at 09:15:27AM +0000, Lee Jones wrote:
...
On Tue, 08 Mar 2022, Greg KH wrote:
...
On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote:
...
On Mon, 07 Mar 2022, Greg KH wrote:
...
On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones wrote:
...
vhost_vsock_handle_tx_kick() already holds the mutex during its call
to vhost_get_vq_desc().  All we have to do here is take the same lock
during virtqueue clean-up and we mitigate the reported issues.
Also WARN() as a precautionary measure.  The purpose of this is to
capture possible future race conditions which may pop up over time.
Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
Cc: stable@vger.kernel.org
Reported-by: syzbot+adc3cb32385586bec859@syzkaller.appspotmail.com
Signed-off-by: Lee Jones lee.jones@linaro.org

drivers/vhost/vhost.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe28..ef7e371e3e649 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
   int i;
 
   for (i = 0; i < dev->nvqs; ++i) {

/* No workers should run here by design. However, races have


 * previously occurred where drivers have been unable to flush


 * all work properly prior to clean-up.  Without a successful


 * flush the guest will malfunction, but avoiding host memory


 * corruption in those cases does seem preferable.


 */


WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));



So you are trading one syzbot triggered issue for another one in the
future?  :)
If this ever can happen, handle it, but don't log it with a WARN_ON() as
that will trigger the panic-on-warn boxes, as well as syzbot.  Unless
you want that to happen?
No, Syzbot doesn't report warnings, only BUGs and memory corruption.
Has it changed?  Last I looked, it did trigger on WARN_* calls, which
has resulted in a huge number of kernel fixes because of that.
Everything is customisable in syzkaller, so maybe there are specific
builds which panic_on_warn enabled, but none that I'm involved with
do.
Many systems run with panic-on-warn (i.e. the cloud), as they want to
drop a box and restart it if anything goes wrong.
That's why syzbot reports on WARN_* calls.  They should never be
reachable by userspace actions.
...
Here follows a topical example.  The report above in the Link: tag
comes with a crashlog [0].  In there you can see the WARN() at the
bottom of vhost_dev_cleanup() trigger many times due to a populated
(non-flushed) worker list, before finally tripping the BUG() which
triggers the report:
[0] https://syzkaller.appspot.com/text?tag=CrashLog&x=16a61fce700000
Ok, so both happens here.  But don't add a warning for something that
can't happen.  Just handle it and move on.  It looks like you are
handling it in this code, so please drop the WARN_ON().
thanks,
greg k-h
Hmm. Well this will mean if we ever reintroduce the bug then
syzkaller will not catch it for us :( And the bug is there,
it just results in a hard to reproduce error for userspace.
Not sure what to do here. Export panic_on_warn flag to modules
and check it here?
-- 
MST


    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use