 
From: Jason Gunthorpe <jgg@nvidia.com>
Sent: Friday, May 19, 2023 7:50 PM
On Fri, May 19, 2023 at 09:56:04AM +0000, Tian, Kevin wrote:
From: Liu, Yi L <yi.l.liu@intel.com>
Sent: Thursday, May 11, 2023 10:39 PM
Lu Baolu (2):
      iommu: Add new iommu op to create domains owned by userspace
      iommu: Add nested domain support

Nicolin Chen (5):
      iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
      iommufd/selftest: Add domain_alloc_user() support in iommu mock
      iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
      iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
      iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (4):
      iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
      iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
      iommufd: IOMMU_HWPT_ALLOC allocation with user data
      iommufd: Add IOMMU_HWPT_INVALIDATE
I didn't see any change in iommufd_hw_pagetable_attach() to handle a stage-1 hwpt differently.
Conceptually, whatever reserved regions exist on a device should be directly reflected in the hwpt to which the device is attached.
So with nesting, presumably the device's reserved regions have been reported to userspace, and it is the user's responsibility to avoid allocating IOVAs from those reserved regions in the stage-1 hwpt.
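To make that user responsibility concrete, here is a minimal sketch of how a VMM-side stage-1 IOVA allocator might skip a device's reserved regions. It assumes the regions were already obtained from the kernel and are sorted and non-overlapping; `struct iova_range` (which only mirrors the inclusive [start, last] convention of iommufd's `struct iommu_iova_range`) and `find_free_iova()` are hypothetical names, and the 0xfee00000-0xfeefffff window is sample data only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive IOVA range: [start, last]. */
struct iova_range {
	uint64_t start;
	uint64_t last;
};

/* Round v up to a power-of-two alignment; returns false on overflow. */
static bool align_up(uint64_t v, uint64_t align, uint64_t *out)
{
	uint64_t r = (v + align - 1) & ~(align - 1);

	if (r < v)
		return false;
	*out = r;
	return true;
}

/*
 * Hypothetical helper: find the lowest aligned IOVA in [lo, hi] such that
 * [iova, iova + size - 1] overlaps none of resv[]. resv[] must be sorted
 * by start and non-overlapping. Returns true and sets *out on success.
 */
static bool find_free_iova(const struct iova_range *resv, size_t nresv,
			   uint64_t lo, uint64_t hi, uint64_t size,
			   uint64_t align, uint64_t *out)
{
	uint64_t cand;
	size_t i = 0;

	assert(align != 0 && (align & (align - 1)) == 0);
	if (size == 0 || !align_up(lo, align, &cand))
		return false;

	for (;;) {
		/* The candidate window must end at or below hi. */
		if (cand > hi || hi - cand < size - 1)
			return false;

		/* Skip reserved ranges that end before the candidate. */
		while (i < nresv && resv[i].last < cand)
			i++;

		/* No overlap with the next reserved range: done. */
		if (i == nresv || resv[i].start > cand + size - 1) {
			*out = cand;
			return true;
		}

		/* Overlap: retry at the first aligned IOVA past this range. */
		if (resv[i].last == UINT64_MAX ||
		    !align_up(resv[i].last + 1, align, &cand))
			return false;
	}
}
```

For example, with 0xfee00000-0xfeefffff reserved, a 4 MiB request at 1 MiB alignment starting from 0xfec00000 would overlap the window and is pushed to 0xfef00000.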
Presumably
It's not necessary to add reserved regions to the IOAS of the parent hwpt, since the device doesn't access that address space after it's attached to stage-1. The parent is used only for address translation on the IOMMU side.
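A toy model of the nested walk illustrates the argument: the device issues only stage-1 input addresses, and the parent (stage-2) table is consulted only on the stage-1 output. The single-level tables, the `TOY_*` constants and the function names are all hypothetical, and the model ignores that real stage-1 table walks are themselves translated through stage-2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NR_PAGES   16	/* toy tables cover a 64 KiB input space */

/* Hypothetical single-level table: input page i maps to pfn[i]. */
struct toy_pgtable {
	bool valid[TOY_NR_PAGES];
	uint64_t pfn[TOY_NR_PAGES];
};

/* Walk one toy table; returns false on a translation fault. */
static bool toy_walk(const struct toy_pgtable *pt, uint64_t in, uint64_t *out)
{
	uint64_t idx = in >> TOY_PAGE_SHIFT;

	if (idx >= TOY_NR_PAGES || !pt->valid[idx])
		return false;
	*out = (pt->pfn[idx] << TOY_PAGE_SHIFT) |
	       (in & ((1ULL << TOY_PAGE_SHIFT) - 1));
	return true;
}

/*
 * Nested DMA in the toy model: the device-issued address goes through
 * stage-1 first; stage-2 only ever sees the stage-1 output (GPA).
 */
static bool toy_nested_translate(const struct toy_pgtable *s1,
				 const struct toy_pgtable *s2,
				 uint64_t iova, uint64_t *hpa)
{
	uint64_t gpa;

	if (!toy_walk(s1, iova, &gpa))
		return false;
	return toy_walk(s2, gpa, hpa);
}
```

With stage-1 mapping IOVA page 1 to GPA page 3 and stage-2 mapping GPA page 3 to HPA page 0x42, IOVA 0x1234 resolves via GPA 0x3234 to HPA 0x42234, while a device access to 0x3234 faults in stage-1 even though that GPA is mapped in the parent.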
But if we don't put them in the parent's IOAS, there is no way for userspace to learn what they are so it can forward them to the VM?
Hmm, I wonder whether that is the right interface for reporting per-device reserved regions.
E.g., does it imply that every device will be reported to the guest with exactly the same set of reserved regions, merged in the parent IOAS?
It works, but it is conceptually unclear. By definition, the list of reserved regions on a device should be static/fixed, rather than changing depending on which IOAS the device is attached to and how many other devices share that IOAS...
IOAS_IOVA_RANGES kind of follows what vfio type1 provides today.
IMHO we should probably have had DEVICE_IOVA_RANGES in the first place, instead of doing it via IOAS_IOVA_RANGES, which is then described as depending dynamically on the list of currently attached devices.
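The difference between the two reporting models can be sketched as interval arithmetic: the allowed ranges are the complement of the union of reserved regions within an aperture. Computed over one device, that gives a static per-device answer (roughly what a DEVICE_IOVA_RANGES-style query might report); computed over every attached device, it gives the merged, attachment-dependent view described for IOAS_IOVA_RANGES. This is a simplified model with a hypothetical `allowed_ranges()` helper, not the iommufd implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical inclusive IOVA range: [start, last]. */
struct iova_range {
	uint64_t start;
	uint64_t last;
};

static int cmp_start(const void *a, const void *b)
{
	const struct iova_range *x = a, *y = b;

	return (x->start > y->start) - (x->start < y->start);
}

/*
 * Write the complement of resv[] (any order, may overlap) within the
 * aperture [ap_start, ap_last] into out[] and return the count.
 * resv[] is sorted in place; out[] must hold at least n + 1 entries.
 */
static size_t allowed_ranges(struct iova_range *resv, size_t n,
			     uint64_t ap_start, uint64_t ap_last,
			     struct iova_range *out)
{
	size_t nout = 0;
	uint64_t next = ap_start;	/* lowest IOVA not yet accounted for */
	int done = 0;

	qsort(resv, n, sizeof(*resv), cmp_start);
	for (size_t i = 0; i < n && !done; i++) {
		if (resv[i].last < next)
			continue;	/* already covered by earlier ranges */
		if (resv[i].start > ap_last)
			break;		/* beyond the aperture */
		if (resv[i].start > next)
			out[nout++] = (struct iova_range){ next, resv[i].start - 1 };
		if (resv[i].last >= ap_last)
			done = 1;	/* reserved through end of aperture */
		else
			next = resv[i].last + 1;
	}
	if (!done && next <= ap_last)
		out[nout++] = (struct iova_range){ next, ap_last };
	return nout;
}
```

With device A reserving 0x1000-0x1fff and device B reserving 0x8000-0x8fff in a 0x0-0xffff aperture, A alone yields two allowed ranges, while the merged view yields three (0x0-0xfff, 0x2000-0x7fff and 0x9000-0xffff), so what A's guest sees depends on B being attached.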
Since we expect the parent IOAS to be usable in an identity mode, I think they should be added; at least, I can't see a reason not to add them.
This is a good point.
For SMMU this sounds like a must-have, as identity mode is configured in the CD with nested translation always enabled. This is outside the host's awareness, hence reserved regions must be added to the parent IOAS.
For VT-d, identity must be configured explicitly, and the hardware doesn't support stage-1 identity in nested mode. That essentially means not using nested translation: the user just explicitly attaches the associated RID or {RID, PASID} to the parent IOAS, and the reserved regions are then already covered.
Given that, it makes more sense to make this a vendor-specific choice. Perhaps we could have a flag bit when creating a nested hwpt to mark that identity mode might be used in this nested configuration, and then iommufd would add the device's reserved regions to the parent IOAS?
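A minimal sketch of that proposal, assuming a hypothetical flag: `HYPO_HWPT_ALLOC_IDENTITY_POSSIBLE`, `struct toy_ioas`, `struct toy_device` and `toy_nested_hwpt_alloc()` are invented for illustration and do not exist in iommufd. Only when the flag is set are the device's reserved regions copied into the parent IOAS, so a vendor where identity is configured behind the host's back (the SMMU case above) could opt in, while VT-d would not need to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL flag, not part of the iommufd uAPI. */
#define HYPO_HWPT_ALLOC_IDENTITY_POSSIBLE (1u << 0)
#define TOY_MAX_RESV 8

/* Hypothetical inclusive IOVA range: [start, last]. */
struct iova_range {
	uint64_t start;
	uint64_t last;
};

/* Toy stand-ins for a device and for the parent IOAS's reserved list. */
struct toy_device {
	const struct iova_range *resv;
	size_t nresv;
};

struct toy_ioas {
	struct iova_range resv[TOY_MAX_RESV];
	size_t nresv;
};

/*
 * On nested hwpt allocation with the hypothetical flag, also reserve the
 * device's regions in the parent IOAS so they stay excluded if stage-1
 * is later put in identity mode. Returns 0, or -1 if the toy list is full.
 */
static int toy_nested_hwpt_alloc(struct toy_ioas *parent,
				 const struct toy_device *dev, uint32_t flags)
{
	assert(parent->nresv <= TOY_MAX_RESV);
	if (!(flags & HYPO_HWPT_ALLOC_IDENTITY_POSSIBLE))
		return 0;	/* default: parent IOAS left untouched */
	if (dev->nresv > TOY_MAX_RESV - parent->nresv)
		return -1;
	for (size_t i = 0; i < dev->nresv; i++)
		parent->resv[parent->nresv++] = dev->resv[i];
	return 0;
}
```

Keeping the default path a no-op matches the point that, without identity mode, the parent's address space is never accessed directly by the device.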
Which definitely complicates some parts of this..
Jason