On Fri, Oct 25, 2024 at 08:34:05AM +0000, Tian, Kevin wrote:
From: Nicolin Chen <nicolinc@nvidia.com>
Sent: Tuesday, October 22, 2024 8:19 AM
This series introduces a new vIOMMU infrastructure and related ioctls.
IOMMUFD has been using the HWPT infrastructure for all cases, including nested IO page table support. Yet, an HWPT-based structure has limitations when it comes to supporting some advanced HW-accelerated features, such as CMDQV on NVIDIA Grace and HW-accelerated vIOMMU on AMD. Even in a multi-IOMMU environment, it is not straightforward for nested HWPTs to share the same parent HWPT (stage-2 IO pagetable) with the HWPT infrastructure alone: a parent HWPT typically holds one stage-2 IO pagetable and tags it with only one ID in the cache entries. When sharing one large stage-2 IO pagetable across physical IOMMU instances, that one ID may not be available on all of the IOMMU instances. In other words, it is ideal for SW to have a different container for the stage-2 IO pagetable, so that each instance can hold another ID that is available to it.
Just holding multiple IDs doesn't require a different container. That is merely a side effect once a vIOMMU is required for the other reasons stated.
If we have to put more words here, I'd prefer adding a bit more about CMDQV, which is more compelling. Not a big deal though. 😊
Ack.
For this "different container", add vIOMMU, an additional layer to hold extra virtualization information:
 ____________________________________________________________________________
|                           iommufd (with vIOMMU)                            |
|                                                                            |
|                                                    [5]                     |
|                                               _____________                |
|                                              |             |               |
|      |---------------------------------------|    vIOMMU   |               |
|      |                                       |             |               |
|      |                                       |             |               |
|      |                                       |_____________|               |
|      |                                              |                      |
|      |    [1]                                       |  [4]         [2]     |
|      |    ______         _____________        _____________      ________  |
|      |   |      | [3]   |             |      |             |    |        | |
|      |   | IOAS |<------|(HWPT_PAGING)|<-----| HWPT_NESTED |<---| DEVICE | |
|      |   |______|       |_____________|      |_____________|    |________| |
|      |       |                 |                    |                |     |
|______|_______|_________________|____________________|________________|_____|
       |       |                 |                    |                |
 ______v_____  |           ______v_____         ______v_____        ___v__
|   struct   | | PFN      |  (paging)  |       |  (nested)  |      |struct|
|iommu_device| |--------->|iommu_domain|<------|iommu_domain|<-----|device|
|____________|   storage  |____________|       |____________|      |______|
nit - [1] ... [5] can be removed.
They are copied from the Documentation, where the numbers are needed. I will take all the numbers out of the cover letter.
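For concreteness, the object chain in the diagram maps to an ioctl sequence along these lines. This is only a sketch against the uAPI proposed in this series (struct iommu_ioas_alloc, struct iommu_hwpt_alloc, struct iommu_viommu_alloc); field names may shift across revisions, and device binding, attachment, and error handling are all elided:

#include <sys/ioctl.h>
#include <linux/iommufd.h>

static __u32 viommu_chain_alloc(int iommufd, __u32 dev_id)
{
	struct iommu_ioas_alloc ioas = { .size = sizeof(ioas) };
	struct iommu_hwpt_alloc s2 = { .size = sizeof(s2) };
	struct iommu_viommu_alloc viommu = { .size = sizeof(viommu) };
	struct iommu_hwpt_alloc s1 = { .size = sizeof(s1) };

	/* [1] IOAS holding the GPA->HPA mappings (PFN storage) */
	ioctl(iommufd, IOMMU_IOAS_ALLOC, &ioas);

	/* [3] HWPT_PAGING, i.e. the stage-2 IO pagetable, flagged as a
	 * nesting parent
	 */
	s2.flags = IOMMU_HWPT_ALLOC_NEST_PARENT;
	s2.dev_id = dev_id;
	s2.pt_id = ioas.out_ioas_id;
	ioctl(iommufd, IOMMU_HWPT_ALLOC, &s2);

	/* [5] vIOMMU wrapping the nesting parent, as a slice of the
	 * physical IOMMU behind dev_id
	 */
	viommu.type = IOMMU_VIOMMU_TYPE_DEFAULT;
	viommu.dev_id = dev_id;
	viommu.hwpt_id = s2.out_hwpt_id;
	ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &viommu);

	/* [4] HWPT_NESTED (stage-1) allocated against the vIOMMU; the
	 * driver-specific data_type/data_uptr describing the guest IO
	 * pagetable is omitted here
	 */
	s1.dev_id = dev_id;
	s1.pt_id = viommu.out_viommu_id;
	ioctl(iommufd, IOMMU_HWPT_ALLOC, &s1);

	/* [2] would be the device attaching to s1.out_hwpt_id */
	return viommu.out_viommu_id;
}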
The vIOMMU object should be seen as a slice of a physical IOMMU instance that is passed to or shared with a VM. Such a slice can consist of various HW/SW resources:
- Security namespace for guest-owned IDs, e.g. guest-controlled cache tags
- Access to a sharable nesting parent pagetable across physical IOMMUs
- Virtualization of various platform IDs, e.g. RIDs and others (see the sketch further below)
- Delivery of paravirtualized invalidation
- Direct assigned invalidation queues
- Direct assigned interrupts
- Non-affiliated event reporting
Sorry, no idea about 'non-affiliated event'. Can you elaborate?
I'll put an "e.g.".
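On the platform-ID bullet above: the vDEVICE object posted alongside this work is the piece that ties a physical device to its guest-visible ID. A hedged sketch, assuming the IOMMU_VDEVICE_ALLOC uAPI as posted (exact fields may differ):

/*
 * Sketch only: map a physical device to its guest-visible ID (e.g. a
 * vRID on Intel/AMD, a vSID on ARM SMMUv3) within a vIOMMU.
 */
static __u32 vdevice_alloc(int iommufd, __u32 viommu_id, __u32 dev_id,
			   __u64 virt_id)
{
	struct iommu_vdevice_alloc vdev = {
		.size = sizeof(vdev),
		.viommu_id = viommu_id,
		.dev_id = dev_id,
		.virt_id = virt_id,
	};

	ioctl(iommufd, IOMMU_VDEVICE_ALLOC, &vdev);
	return vdev.out_vdevice_id;
}

With such a mapping in place, the kernel can translate guest-owned IDs (e.g. a vRID carried in a paravirtualized invalidation or in an event report) to and from physical ones.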
On a multi-IOMMU system, the vIOMMU object must be instanced to the number of the physical IOMMUs that are passed to (via devices) a guest VM, while
"to the number of the physical IOMMUs that have a slice passed to ..."
Ack.
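To make that concrete: a VM with two passed-through devices sitting behind two different physical IOMMUs would allocate two vIOMMU objects that wrap the same nesting parent. A sketch with hypothetical device IDs, against the same assumed uAPI as above:

/*
 * Sketch: dev_on_iommu0/dev_on_iommu1 sit behind two different physical
 * IOMMUs, so each needs its own vIOMMU, while both vIOMMUs wrap the same
 * nesting-parent HWPT (one shared stage-2 IO pagetable).
 */
static void viommus_alloc_per_iommu(int iommufd, __u32 parent_hwpt_id,
				    __u32 dev_on_iommu0, __u32 dev_on_iommu1)
{
	struct iommu_viommu_alloc v0 = {
		.size = sizeof(v0),
		.type = IOMMU_VIOMMU_TYPE_DEFAULT,
		.dev_id = dev_on_iommu0,
		.hwpt_id = parent_hwpt_id,
	};
	struct iommu_viommu_alloc v1 = v0;

	v1.dev_id = dev_on_iommu1;

	/* Each vIOMMU can then hold its own physical ID (e.g. a
	 * per-instance VMID) for the shared stage-2, which a single
	 * parent HWPT alone could not.
	 */
	ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &v0);
	ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &v1);
}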
Thanks
Nicolin