When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev") Reported-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com --- drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + 1 file changed, 1 insertion(+)
Should go through Alex's tree.
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index 41bba40feef8f4..9003145adb5a93 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1615,6 +1615,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev) if (WARN_ON_ONCE(vgpu->attached)) return;
+ vfio_unregister_group_dev(&vgpu->vfio_device); vfio_put_device(&vgpu->vfio_device); }
base-commit: c72e0034e6d4c36322d958b997d11d2627c6056c
From: Jason Gunthorpe Sent: Friday, September 30, 2022 1:49 AM
When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev") Reported-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Thu, 29 Sep 2022 14:48:35 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev") Reported-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + 1 file changed, 1 insertion(+)
Should go through Alex's tree.
Applied to vfio next branch for v6.1. Thanks for the quick fix!
Alex
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index 41bba40feef8f4..9003145adb5a93 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1615,6 +1615,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev) if (WARN_ON_ONCE(vgpu->attached)) return;
- vfio_unregister_group_dev(&vgpu->vfio_device); vfio_put_device(&vgpu->vfio_device);
}
base-commit: c72e0034e6d4c36322d958b997d11d2627c6056c
On Thu, 29 Sep 2022 14:48:35 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev") Reported-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + 1 file changed, 1 insertion(+)
Should go through Alex's tree.
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index 41bba40feef8f4..9003145adb5a93 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1615,6 +1615,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev) if (WARN_ON_ONCE(vgpu->attached)) return;
- vfio_unregister_group_dev(&vgpu->vfio_device); vfio_put_device(&vgpu->vfio_device);
}
base-commit: c72e0034e6d4c36322d958b997d11d2627c6056c
This is marked for stable, but I think the stable backport for existing kernels is actually:
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index e3cd58946477..de89946c4817 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1595,6 +1595,9 @@ static void intel_vgpu_remove(struct mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached)) return; + + vfio_unregister_group_dev(&vgpu->vfio_device); + vfio_uninit_group_dev(&vgpu->vfio_device); intel_gvt_destroy_vgpu(vgpu); }
On Wed, 5 Oct 2022 14:17:17 -0600 Alex Williamson alex.williamson@redhat.com wrote:
On Thu, 29 Sep 2022 14:48:35 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
When converting to directly create the vfio_device the mdev driver has to put a vfio_register_emulated_iommu_dev() in the probe() and a pairing vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable@vger.kernel.org Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev") Reported-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
drivers/gpu/drm/i915/gvt/kvmgt.c | 1 + 1 file changed, 1 insertion(+)
Should go through Alex's tree.
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index 41bba40feef8f4..9003145adb5a93 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1615,6 +1615,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev) if (WARN_ON_ONCE(vgpu->attached)) return;
Actually, what's the purpose of this ^^^^ ?
We can't have a .remove callback that does nothing, this breaks removing the device while it's in use. Once we have the vfio_unregister_group_dev() fix below, we'll block until the device is unused, at which point vgpu->attached becomes false. Unless I'm missing something, I think we should also follow-up with a patch to remove that bogus warn-on branch, right? Thanks,
Alex
- vfio_unregister_group_dev(&vgpu->vfio_device); vfio_put_device(&vgpu->vfio_device);
}
base-commit: c72e0034e6d4c36322d958b997d11d2627c6056c
This is marked for stable, but I think the stable backport for existing kernels is actually:
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index e3cd58946477..de89946c4817 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1595,6 +1595,9 @@ static void intel_vgpu_remove(struct mdev_device *mdev) if (WARN_ON_ONCE(vgpu->attached)) return;
- vfio_unregister_group_dev(&vgpu->vfio_device);
- vfio_uninit_group_dev(&vgpu->vfio_device); intel_gvt_destroy_vgpu(vgpu);
}
On Wed, Oct 05, 2022 at 04:03:56PM -0600, Alex Williamson wrote:
We can't have a .remove callback that does nothing, this breaks removing the device while it's in use. Once we have the vfio_unregister_group_dev() fix below, we'll block until the device is unused, at which point vgpu->attached becomes false. Unless I'm missing something, I think we should also follow-up with a patch to remove that bogus warn-on branch, right? Thanks,
Yes, looks right to me.
I question all the logical arround attached, where is the locking?
Jason
On Thu, 6 Oct 2022 08:37:09 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
On Wed, Oct 05, 2022 at 04:03:56PM -0600, Alex Williamson wrote:
We can't have a .remove callback that does nothing, this breaks removing the device while it's in use. Once we have the vfio_unregister_group_dev() fix below, we'll block until the device is unused, at which point vgpu->attached becomes false. Unless I'm missing something, I think we should also follow-up with a patch to remove that bogus warn-on branch, right? Thanks,
Yes, looks right to me.
I question all the logical arround attached, where is the locking?
Zhenyu, Zhi, Kevin,
Could someone please take a look at use of vgpu->attached in the GVT-g driver? It's use in intel_vgpu_remove() is bogus, the .release callback needs to use vfio_unregister_group_dev() to wait for the device to be unused. The WARN_ON/return here breaks all future use of the device. I assume @attached has something to do with the page table interface with KVM, but it all looks racy anyway.
Also, whatever purpose vgpu->released served looks unnecessary now. Thanks,
Alex
From: Alex Williamson alex.williamson@redhat.com Sent: Friday, October 7, 2022 2:31 AM
On Thu, 6 Oct 2022 08:37:09 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
On Wed, Oct 05, 2022 at 04:03:56PM -0600, Alex Williamson wrote:
We can't have a .remove callback that does nothing, this breaks removing the device while it's in use. Once we have the vfio_unregister_group_dev() fix below, we'll block until the device is unused, at which point vgpu->attached becomes false. Unless I'm missing something, I think we should also follow-up with a patch to remove that bogus warn-on branch, right? Thanks,
Yes, looks right to me.
I question all the logical arround attached, where is the locking?
Zhenyu, Zhi, Kevin,
Could someone please take a look at use of vgpu->attached in the GVT-g driver? It's use in intel_vgpu_remove() is bogus, the .release callback needs to use vfio_unregister_group_dev() to wait for the device to be unused. The WARN_ON/return here breaks all future use of the device. I assume @attached has something to do with the page table interface with KVM, but it all looks racy anyway.
Also, whatever purpose vgpu->released served looks unnecessary now. Thanks,
Zhi is looking at it.
On 10/6/22 18:31, Alex Williamson wrote:
On Thu, 6 Oct 2022 08:37:09 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
On Wed, Oct 05, 2022 at 04:03:56PM -0600, Alex Williamson wrote:
We can't have a .remove callback that does nothing, this breaks removing the device while it's in use. Once we have the vfio_unregister_group_dev() fix below, we'll block until the device is unused, at which point vgpu->attached becomes false. Unless I'm missing something, I think we should also follow-up with a patch to remove that bogus warn-on branch, right? Thanks,
Yes, looks right to me.
I question all the logical arround attached, where is the locking?
Zhenyu, Zhi, Kevin,
Could someone please take a look at use of vgpu->attached in the GVT-g driver? It's use in intel_vgpu_remove() is bogus, the .release callback needs to use vfio_unregister_group_dev() to wait for the device to be unused. The WARN_ON/return here breaks all future use of the device. I assume @attached has something to do with the page table interface with KVM, but it all looks racy anyway.
Thanks for pointing this out.
It was introduced in the GVT-g refactor patch series and Christoph might not want to touch the vgpu->released while he needed a new state.
I dig it a bit. vgpu->attached would be used for preventing multiple open on a single vGPU and indicate the kvm_get_kvm() has been done. vgpu->released was to prevent the release before close, which is now handled by the vfio_device_*.
What I would like to do are: 1) Remove the vgpu->released. 2) Use alock to protect vgpu->attached.
After those were solved, the WARN_ON/return in the intel_vgpu_remove() should be safely removed as the .release will be called after .close_device deceases the vfio_device->refcnt to zero.
Thanks, Zhi.
Also, whatever purpose vgpu->released served looks unnecessary now. Thanks,
Alex
From: Wang, Zhi A zhi.a.wang@intel.com Sent: Wednesday, October 19, 2022 5:41 PM
On 10/6/22 18:31, Alex Williamson wrote:
On Thu, 6 Oct 2022 08:37:09 -0300 Jason Gunthorpe jgg@nvidia.com wrote:
On Wed, Oct 05, 2022 at 04:03:56PM -0600, Alex Williamson wrote:
We can't have a .remove callback that does nothing, this breaks removing the device while it's in use. Once we have the vfio_unregister_group_dev() fix below, we'll block until the device is unused, at which point vgpu->attached becomes false. Unless I'm missing something, I think we should also follow-up with a patch to remove that bogus warn-on branch, right? Thanks,
Yes, looks right to me.
I question all the logical arround attached, where is the locking?
Zhenyu, Zhi, Kevin,
Could someone please take a look at use of vgpu->attached in the GVT-g driver? It's use in intel_vgpu_remove() is bogus, the .release callback needs to use vfio_unregister_group_dev() to wait for the device to be unused. The WARN_ON/return here breaks all future use of the device. I assume @attached has something to do with the page table interface with KVM, but it all looks racy anyway.
Thanks for pointing this out.
It was introduced in the GVT-g refactor patch series and Christoph might not want to touch the vgpu->released while he needed a new state.
I dig it a bit. vgpu->attached would be used for preventing multiple open on a single vGPU and indicate the kvm_get_kvm() has been done.
vfio core already ensures that .open_device() is called only once:
vfio_device_open() { ... mutex_lock(&device->dev_set->lock); device->open_count++; if (device->open_count == 1) { ... if (device->ops->open_device) { ret = device->ops->open_device(device); ... }
vgpu->released was to prevent the release before close, which is now handled by the vfio_device_*.
What I would like to do are:
- Remove the vgpu->released. 2) Use alock to protect vgpu->attached.
After those were solved, the WARN_ON/return in the intel_vgpu_remove() should be safely removed as the .release will be called after .close_device deceases the vfio_device->refcnt to zero.
Thanks, Zhi.
Also, whatever purpose vgpu->released served looks unnecessary now. Thanks,
Alex
On Wed, Oct 05, 2022 at 02:17:17PM -0600, Alex Williamson wrote:
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index 41bba40feef8f4..9003145adb5a93 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1615,6 +1615,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev) if (WARN_ON_ONCE(vgpu->attached)) return;
- vfio_unregister_group_dev(&vgpu->vfio_device); vfio_put_device(&vgpu->vfio_device);
}
base-commit: c72e0034e6d4c36322d958b997d11d2627c6056c
This is marked for stable, but I think the stable backport for existing kernels is actually:
Yes probably, this patch won't apply so if anyone wants to see it in the stable series they need to follow the process to send the reworked version.
Jason
linux-stable-mirror@lists.linaro.org