On Wed, Feb 15, 2023 at 01:37:19AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Tuesday, February 14, 2023 6:54 PM
On Mon, Feb 13, 2023 at 10:49:34AM -0400, Jason Gunthorpe wrote:
On Sun, Feb 12, 2023 at 11:48:30PM -0800, Nicolin Chen wrote:
What about point 1? If dev2 and dev3 are already replaced when doing iommu_group_replace_domain() on dev1, their idev objects still have the old hwpt/iopt until user space does another two IOCTLs on them, right?
We have a complicated model for multi-device groups...
The first device in the group to change domains must move all the devices in the group
But userspace is still expected to run through and change all the other devices
So replace should be a NOP if the group is already linked to the right domain.
This patch needs to make sure that incosistency in the view betwen the iommu_group and the iommufd model doesn't cause a functional problem.
Yea, I was thinking that we'd need to block any access to the idev->hwpt of a pending device's, before the kernel finishes the "NOP" IOCTL from userspace, maybe with a helper: (iommu_get_domain_for_dev(idev->dev) != idev->hwpt->domain)
I didn't see what would be broken w/o such blocking measure.
Can you elaborate?
iommu_group { idev1, idev2 }
(1) Attach all devices to domain1 { group->domain = domain1; idev1->hwpt = hwpt1; // domain1 idev2->hwpt = hwpt1; // domain1 }
(2) Attach (replace) idev1 only to domain2 { group->domain = domain2 idev1->hwpt = hwpt2; // domain2 idev2->hwpt == domain1 // != iommu_get_domain_for_dev() }
Then if user space isn't aware of these and continues to do IOMMU_IOAS_MAP for idev2. Though IOVA mappings may be added onto the domain2 correctly, yet domain2's iopt itree won't reflect that, until idev2->hwpt is updated too, right?
And the tricky thing is that, though we advocate a device- centric uAPI, we'd still have to ask user space to align the devices in the same iommu_group, which is also not present in QEMU's RFC v3 series.
The traditional detach+attach flow doesn't seem to have this issue, since there's no re-entry so the work flow is always that detaching all devices first before attaching to another domain.
Thanks Nic