On 10/6/22 18:31, Alex Williamson wrote:
On Thu, 6 Oct 2022 08:37:09 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Wed, Oct 05, 2022 at 04:03:56PM -0600, Alex Williamson wrote:
We can't have a .remove callback that does nothing; this breaks removing the device while it's in use. Once we have the vfio_unregister_group_dev() fix below, we'll block until the device is unused, at which point vgpu->attached becomes false. Unless I'm missing something, I think we should also follow up with a patch to remove that bogus warn-on branch, right? Thanks,
Yes, looks right to me.
I question all the logic around attached. Where is the locking?
Zhenyu, Zhi, Kevin,
Could someone please take a look at the use of vgpu->attached in the GVT-g driver? Its use in intel_vgpu_remove() is bogus; the .release callback needs to use vfio_unregister_group_dev() to wait for the device to be unused. The WARN_ON/return here breaks all future use of the device. I assume @attached has something to do with the page table interface with KVM, but it all looks racy anyway.
Thanks for pointing this out.
It was introduced in the GVT-g refactor patch series. Christoph may not have wanted to touch vgpu->released when he needed a new state.
I dug into it a bit. vgpu->attached is used to prevent multiple opens of a single vGPU and to indicate that kvm_get_kvm() has been done. vgpu->released was meant to prevent release before close, which is now handled by the vfio_device_*.
What I would like to do is: 1) Remove vgpu->released. 2) Use a lock to protect vgpu->attached.
Once those are resolved, the WARN_ON/return in intel_vgpu_remove() can be safely removed, as .release will be called only after .close_device decreases vfio_device->refcnt to zero.
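To illustrate item 2, here is a rough userspace sketch, not driver code. It assumes a hypothetical struct vgpu_model standing in for the vGPU state, a pthread mutex standing in for whatever kernel lock would actually be chosen, and -EEXIST as an illustrative error for a second open. The function names are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vGPU state; only the field discussed here. */
struct vgpu_model {
	pthread_mutex_t lock;	/* assumed lock protecting 'attached' */
	bool attached;		/* true between open and close */
};

static void vgpu_model_init(struct vgpu_model *v)
{
	assert(v != NULL);
	pthread_mutex_init(&v->lock, NULL);
	v->attached = false;
}

/* Models an open path: refuse a second open while already attached. */
static int vgpu_model_open(struct vgpu_model *v)
{
	int ret = 0;

	pthread_mutex_lock(&v->lock);
	if (v->attached)
		ret = -EEXIST;		/* illustrative error code (assumption) */
	else
		v->attached = true;	/* real code would take its KVM reference here */
	pthread_mutex_unlock(&v->lock);

	return ret;
}

/* Models a close path: clear the flag under the same lock. */
static void vgpu_model_close(struct vgpu_model *v)
{
	pthread_mutex_lock(&v->lock);
	v->attached = false;		/* real code would drop its KVM reference here */
	pthread_mutex_unlock(&v->lock);
}
```

The check and the update of attached happen under one lock, so two concurrent opens cannot both observe it as false.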
Thanks, Zhi.
Also, whatever purpose vgpu->released served looks unnecessary now. Thanks,
Alex