On Tue, Mar 07, 2023 at 08:42:06AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, February 25, 2023 8:28 AM
> >
> > [...]
> >
> > The implementation is complicated because we have to introduce some
> > per-iommu_group memory in iommufd and redo how we think about
> > multi-device groups to be more explicit. This solves all the locking
> > problems in the prior attempts.
>
> Now think about the pasid case.
>
> pasid attach is managed as a device operation today:
> iommu_attach_device_pasid()
>
> Following it naturally we'll have a pasid array per iommufd_device to
> track attached HWPT per pasid.
>
> But internally there is only one pasid table per iommu group. i.e.
> same story as RID attach that once dev1 replaces hwpt on pasid1 then
> it takes effect on all other devices in the same group.
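
For illustration, the per-iommufd_device tracking described above could
look roughly like the sketch below. The pasid_hwpts xarray and the
iommufd_device_pasid_attach() helper are invented names for this
sketch, not actual iommufd code:

#include <linux/iommu.h>
#include <linux/xarray.h>

/* Sketch only: per-device tracking of the HWPT attached to each PASID */
struct iommufd_device {
	/* ... existing fields ... */
	struct xarray pasid_hwpts;  /* ioasid_t -> struct iommufd_hw_pagetable * */
};

static int iommufd_device_pasid_attach(struct iommufd_device *idev,
				       ioasid_t pasid,
				       struct iommufd_hw_pagetable *hwpt)
{
	void *old;
	int rc;

	rc = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid);
	if (rc)
		return rc;

	/* Remember which HWPT this device has on this PASID */
	old = xa_store(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
	if (xa_is_err(old)) {
		iommu_detach_device_pasid(hwpt->domain, idev->dev, pasid);
		return xa_err(old);
	}
	return 0;
}

The tracking itself is naturally per-device; the problem described above
is only that the hardware PASID table behind it is shared by the group.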

IMHO I can't believe that any actual system that supports PASID also
has a RID aliasing problem.

I think we should fix the iommu core to make PASID per-device and
require systems that have a RID aliasing problem to block PASID.

This is the bigger picture: if drivers have to optionally share their
PASID tables with other drivers then we can't have per-driver PASID
allocators at all either.
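
Concretely, the direction would be something like the sketch below,
where the core simply refuses PASID attach for any device that shares
its group. This is only an illustration of the idea, as if it lived in
drivers/iommu/iommu.c; iommu_group_device_count() is the existing core
helper, while __iommu_set_dev_pasid() is a made-up stand-in for
whatever the driver-facing op ends up being:

/* Sketch, not the actual implementation */
int iommu_attach_device_pasid(struct iommu_domain *domain,
			      struct device *dev, ioasid_t pasid)
{
	struct iommu_group *group = iommu_group_get(dev);
	int rc;

	if (!group)
		return -ENODEV;

	mutex_lock(&group->mutex);
	if (iommu_group_device_count(group) != 1) {
		/* An aliased RID cannot have its own PASID table */
		rc = -EBUSY;
	} else {
		rc = __iommu_set_dev_pasid(domain, dev, pasid);
	}
	mutex_unlock(&group->mutex);

	iommu_group_put(group);
	return rc;
}

That keeps the PASID table strictly per-device, which is what leaves
room for per-driver PASID allocators.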

> Then confusion comes. If we must have explicit group object in
> iommufd to manage domain replacement per rid, then do we need the
> same explicit mechanism e.g. tracking pasid attached hwpt in
> iommufd_group instead of in iommufd_device? and have a
> iommu_attach_group_pasid() API.

If we make PASID per-group then yes.

But no actual HW models per-group PASID to the VM, so this is a
complete API disaster for vIOMMU. Better not to do it.

Jason