From: Nicolin Chen <nicolinc@nvidia.com>
Sent: Tuesday, October 22, 2024 8:19 AM
This series introduces a new vIOMMU infrastructure and related ioctls.
IOMMUFD has been using the HWPT infrastructure for all cases, including nested IO page table support. Yet, there are limitations for an HWPT-based structure to support some advanced HW-accelerated features, such as CMDQV on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU environment, it is not straightforward for nested HWPTs to share the same parent HWPT (stage-2 IO pagetable) with the HWPT infrastructure alone: a parent HWPT typically holds one stage-2 IO pagetable and tags it with only one ID in the cache entries. When sharing one large stage-2 IO pagetable across physical IOMMU instances, that one ID may not always be available across all the IOMMU instances. In other words, it's ideal for SW to have a different container for the stage-2 IO pagetable so it can hold another ID that's available.
Just holding multiple IDs doesn't require a different container. This is just a side effect of vIOMMU being required for the other reasons mentioned.
If we have to put more words here, I'd prefer adding a bit more for CMDQV, which is more compelling. Not a big deal though. 😊
For this "different container", add vIOMMU, an additional layer to hold extra virtualization information:
 _______________________________________________________________________
|                      iommufd (with vIOMMU)                            |
|                                                                       |
|                             [5]                                       |
|                        _____________                                  |
|                       |             |                                 |
|      |----------------|    vIOMMU   |                                 |
|      |                |             |                                 |
|      |                |             |                                 |
|      |      [1]       |             |          [4]             [2]    |
|      |     ______     |             |     _____________     ________  |
|      |    |      |    |     [3]     |    |             |   |        | |
|      |    | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
|      |    |______|    |_____________|    |_____________|   |________| |
|      |        |              |                  |               |     |
|______|________|______________|__________________|_______________|_____|
       |        |              |                  |               |
 ______v_____   |        ______v_____       ______v_____       ___v__
|   struct   |  |  PFN  |  (paging)  |     |  (nested)  |     |struct|
|iommu_device|  |------>|iommu_domain|<----|iommu_domain|<----|device|
|____________|   storage|____________|     |____________|     |______|
nit - [1] ... [5] can be removed.
The vIOMMU object should be seen as a slice of a physical IOMMU instance that is passed to or shared with a VM. That can be some HW/SW resources:
- Security namespace for guest owned ID, e.g. guest-controlled cache tags
- Access to a sharable nesting parent pagetable across physical IOMMUs
- Virtualization of various platform IDs, e.g. RIDs and others
- Delivery of paravirtualized invalidation
- Direct assigned invalidation queues
- Direct assigned interrupts
- Non-affiliated event reporting
sorry no idea about 'non-affiliated event'. Can you elaborate?
On a multi-IOMMU system, the vIOMMU object must be instanced to the number of the physical IOMMUs that are passed to (via devices) a guest VM, while
'to the number of the physical IOMMUs that have a slice passed to ...'
being able to hold the shareable parent HWPT. Each vIOMMU then just needs to allocate its own individual ID to tag its own cache:

                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested0 |--->| viommu0 ------------------
 ----------------    |         |      IDx       |
                     ----------------------------
                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested1 |--->| viommu1 ------------------
 ----------------    |         |      IDy       |
                     ----------------------------
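
For readers who prefer code to boxes, the sharing scheme described above can be modeled with a small userspace sketch. Everything in it is hypothetical (the `toy_*` types, the per-instance ID counter, the reference count); it is not the kernel's data layout, only an illustration of one stage-2 table being shared while each vIOMMU takes an ID from its own physical IOMMU instance.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy types for illustration; these are NOT kernel structures. */

/* The shared stage-2 IO page table (paging_hwpt0 in the diagram). */
struct toy_parent_hwpt {
	int refs;			/* number of vIOMMUs sharing this table */
};

/* One physical IOMMU instance with its own pool of cache-tag IDs. */
struct toy_piommu {
	uint32_t next_free_id;		/* assumed: a trivial bump allocator */
};

/* A vIOMMU: a slice of one physical IOMMU, pointing at the shared parent. */
struct toy_viommu {
	struct toy_piommu *piommu;	/* physical instance backing this slice */
	struct toy_parent_hwpt *parent;	/* shared stage-2 IO page table */
	uint32_t cache_id;		/* ID that is only meaningful on piommu */
};

/* Bind a vIOMMU to a physical instance and take an ID from that instance. */
static void toy_viommu_init(struct toy_viommu *v, struct toy_piommu *p,
			    struct toy_parent_hwpt *parent)
{
	assert(v && p && parent);
	v->piommu = p;
	v->parent = parent;
	v->cache_id = p->next_free_id++;	/* IDx and IDy may differ */
	parent->refs++;
}
```

Initializing two `toy_viommu` objects against two different `toy_piommu` instances but the same `toy_parent_hwpt` gives the picture in the diagram: one parent, two independently chosen IDs.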
As an initial part-1, add IOMMUFD_CMD_VIOMMU_ALLOC ioctl for an allocation only. And implement it in arm-smmu-v3 driver as a real world use case.
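
As a rough illustration of what an allocation-only request might carry, here is a sketch that fills a local stand-in for the ioctl argument. The struct name `sketch_viommu_alloc`, its field names, and the helper are assumptions made for this example; they are not copied from include/uapi/linux/iommufd.h, and the sketch deliberately stops short of issuing an ioctl. The only inputs taken from the text are that a vIOMMU is tied to a physical IOMMU (reached via a device) and to a nesting parent HWPT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the ioctl argument; not the real uAPI struct. */
struct sketch_viommu_alloc {
	uint32_t size;		/* sizeof(struct), as iommufd ioctl structs lead with a size */
	uint32_t flags;		/* assumed to be zero for now */
	uint32_t type;		/* vIOMMU type, e.g. an ARM SMMUv3 type value */
	uint32_t dev_id;	/* device whose physical IOMMU backs the vIOMMU */
	uint32_t hwpt_id;	/* nesting parent HWPT (stage-2) to associate */
	uint32_t out_viommu_id;	/* filled in by the kernel on success */
};

/* Build a zeroed request with only the input fields populated. */
static struct sketch_viommu_alloc
sketch_viommu_alloc_args(uint32_t type, uint32_t dev_id, uint32_t hwpt_id)
{
	struct sketch_viommu_alloc cmd;

	memset(&cmd, 0, sizeof(cmd));
	cmd.size = sizeof(cmd);
	cmd.type = type;
	cmd.dev_id = dev_id;
	cmd.hwpt_id = hwpt_id;
	return cmd;
}
```

In a real program the filled request would be handed to the iommufd file descriptor, and the returned vIOMMU ID could then be used as the pt_id of a later IOMMU_HWPT_ALLOC, which is what the "Allow pt_id to carry viommu_id" patch in this series enables.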
More vIOMMU-based structs and ioctls will be introduced in the follow-up series to support vDEVICE, vIRQ (vEVENT) and vQUEUE objects. Although we repurposed the vIOMMU object from an earlier RFC, it is linked here just for reference: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This series is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v4 (pairing QEMU branch for testing will be provided with the part2 series)
Changelog v4
- Added "Reviewed-by" from Jason
- Dropped IOMMU_VIOMMU_TYPE_DEFAULT support
- Dropped iommufd_object_alloc_elm renamings
- Renamed iommufd's viommu_api.c to driver.c
- Reworked iommufd_viommu_alloc helper
- Added a separate iommufd_hwpt_nested_alloc_for_viommu function for hwpt_nested allocations on a vIOMMU, and added comparison between viommu->iommu_dev->ops and dev_iommu_ops(idev->dev)
- Replaced s2_parent with vsmmu in arm_smmu_nested_domain
- Replaced domain_alloc_user in iommu_ops with domain_alloc_nested in viommu_ops
- Replaced wait_queue_head_t with a completion, to delay the unplug of mock_iommu_dev
- Corrected documentation graph that was missing struct iommu_device
- Added an iommufd_verify_unfinalized_object helper to verify driver-allocated vIOMMU/vDEVICE objects
- Added missing test cases for TEST_LENGTH and fail_nth
v3 https://lore.kernel.org/all/cover.1728491453.git.nicolinc@nvidia.com/
- Rebased on top of Jason's nesting v3 series https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@nvidia.com/
- Split the series into smaller parts
- Added Jason's Reviewed-by
- Added back viommu->iommu_dev
- Added support for driver-allocated vIOMMU vs. core-allocated
- Dropped arm_smmu_cache_invalidate_user
- Added an iommufd_test_wait_for_users() in selftest
- Reworked test code to make viommu an individual FIXTURE
- Added missing TEST_LENGTH case for the new ioctl command
v2 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
- Limited vdev_id to one per idev
- Added a rw_sem to protect the vdev_id list
- Reworked driver-level APIs with proper lockings
- Added a new viommu_api file for IOMMUFD_DRIVER config
- Dropped useless iommu_dev pointer from the viommu structure
- Added missing index numbers to new types in the uAPI header
- Dropped IOMMU_VIOMMU_INVALIDATE uAPI; instead, reuse the HWPT one
- Reworked mock_viommu_cache_invalidate() using the new iommu helper
- Reordered details of set/unset_vdev_id handlers for proper lockings
v1 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks! Nicolin
Nicolin Chen (11):
  iommufd: Move struct iommufd_object to public iommufd header
  iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
  iommufd: Add iommufd_verify_unfinalized_object
  iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
  iommufd: Add domain_alloc_nested op to iommufd_viommu_ops
  iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
  iommufd/selftest: Add refcount to mock_iommu_device
  iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST
  iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
  Documentation: userspace-api: iommufd: Update vIOMMU
  iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
 drivers/iommu/iommufd/Makefile                |  5 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 26 +++---
 drivers/iommu/iommufd/iommufd_private.h       | 36 ++------
 drivers/iommu/iommufd/iommufd_test.h          |  2 +
 include/linux/iommu.h                         | 14 +++
 include/linux/iommufd.h                       | 89 +++++++++++++++++++
 include/uapi/linux/iommufd.h                  | 56 ++++++++++--
 tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 79 ++++++++++------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  9 +-
 drivers/iommu/iommufd/driver.c                | 38 ++++++++
 drivers/iommu/iommufd/hw_pagetable.c          | 69 +++++++++++++-
 drivers/iommu/iommufd/main.c                  | 58 ++++++------
 drivers/iommu/iommufd/selftest.c              | 73 +++++++++++++--
 drivers/iommu/iommufd/viommu.c                | 85 ++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 78 ++++++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c        | 11 +++
 Documentation/userspace-api/iommufd.rst       | 69 +++++++++++++-
 18 files changed, 701 insertions(+), 124 deletions(-)
 create mode 100644 drivers/iommu/iommufd/driver.c
 create mode 100644 drivers/iommu/iommufd/viommu.c
-- 2.43.0