The vIOMMU object is designed to represent a slice of IOMMU HW whose virtualization features are shared with or passed to user space (mostly a VM) by way of HW acceleration. This extends the HWPT-based design to support more advanced virtualization features.
A vCMDQ, introduced by this series as part of the vIOMMU infrastructure, represents a HW-supported queue/buffer for a VM to use exclusively, e.g.
 - NVIDIA's virtual command queue
 - AMD vIOMMU's command buffer
either of which is an IOMMU HW feature that directly loads and executes cache invalidation commands issued by a guest kernel, to shoot down TLB entries that the HW cached for guest-owned stage-1 page table entries. This is a big improvement, since there is no VM Exit during an invalidation, compared to the traditional invalidation pathway of trapping a guest-owned invalidation queue and forwarding those commands/requests to the host kernel, which eventually fills a HW-owned queue to execute them.
Thus, a vCMDQ object, as an initial use case, is all about a guest-owned HW command queue that the VMM can allocate/configure upon request from a guest kernel. Introduce a new IOMMUFD_OBJ_VCMDQ and its allocator IOMMUFD_CMD_VCMDQ_ALLOC, allowing the VMM to forward the IOMMU-specific queue info, such as the queue base address and size.
Meanwhile, a guest-owned command queue needs the guest kernel (i.e. its command queue driver) to control the queue by reading/writing its consumer and producer indexes, which means the command queue HW must allow the guest kernel direct R/W access to those registers. Introduce an mmap infrastructure in the iommufd core to support passing through a piece of MMIO region from the host physical address space to the guest physical address space. The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during IOMMUFD_CMD_VCMDQ_ALLOC and returned to user space as output driver data by the same ioctl. So, this requires driver-specific user data support in the vIOMMU object.
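To make the flow concrete, the VMM-side usage might look like the sketch below. The IOMMU_VCMDQ_ALLOC ioctl and the iommufd mmap semantics are from this series; struct my_vcmdq_data and its field names are hypothetical placeholders for the driver-specific structure that a real driver (e.g. tegra241-cmdqv) would define:

/* Hedged sketch of the VMM-side flow; "my_vcmdq_data" is made up */
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/iommufd.h>

struct my_vcmdq_data {                  /* hypothetical driver data */
        uint64_t vcmdq_base;            /* input: guest queue base address */
        uint64_t out_vcmdq_pgoff;       /* output: offset argument for mmap() */
        uint64_t out_vcmdq_size;        /* output: max length for mmap() */
};

static void *vcmdq_setup(int iommufd, uint32_t viommu_id, uint32_t type,
                         uint64_t guest_q_base)
{
        struct my_vcmdq_data data = { .vcmdq_base = guest_q_base };
        struct iommu_vcmdq_alloc cmd = {
                .size = sizeof(cmd),
                .viommu_id = viommu_id,
                .type = type,           /* a driver-specific type */
                .data_len = sizeof(data),
                .data_uptr = (uintptr_t)&data,
        };

        if (ioctl(iommufd, IOMMU_VCMDQ_ALLOC, &cmd))
                return MAP_FAILED;

        /* Map the queue's consumer/producer index registers for the guest */
        return mmap(NULL, data.out_vcmdq_size, PROT_READ | PROT_WRITE,
                    MAP_SHARED, iommufd, data.out_vcmdq_pgoff);
}

The caller would check the return value against MAP_FAILED and then hand the mapping to the guest as a GPA region.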
As a real-world use case, this series implements vCMDQ support in the tegra241-cmdqv driver for the vCMDQ on the NVIDIA Grace CPU. In other words, this is also the Tegra CMDQV series Part-2 (user-space support), reworked from the previous RFCv1: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_vcmdq-v1
Pairing QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vcmdq-v1
Thanks
Nicolin
Nicolin Chen (16):
  iommu: Pass in a driver-level user data structure to viommu_alloc op
  iommufd/viommu: Allow driver-specific user data for a vIOMMU object
  iommu: Add iommu_copy_struct_to_user helper
  iommufd: Add iommufd_struct_destroy to revert iommufd_viommu_alloc
  iommufd/selftest: Support user_data in mock_viommu_alloc
  iommufd/selftest: Add coverage for viommu data
  iommufd/viommu: Add driver-allocated vDEVICE support
  iommufd/viommu: Introduce IOMMUFD_OBJ_VCMDQ and its related struct
  iommufd/viommu: Add IOMMUFD_CMD_VCMDQ_ALLOC ioctl
  iommufd: Add mmap interface
  iommufd/selftest: Add coverage for the new mmap interface
  Documentation: userspace-api: iommufd: Update vCMDQ
  iommu/tegra241-cmdqv: Use request_threaded_irq
  iommu/arm-smmu-v3: Add vsmmu_alloc impl op
  iommu/tegra241-cmdqv: Add user-space use support
  iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h         |  24 +-
 drivers/iommu/iommufd/iommufd_private.h             |  20 +-
 drivers/iommu/iommufd/iommufd_test.h                |  17 +
 include/linux/iommu.h                               |  43 ++-
 include/linux/iommufd.h                             |  93 +++++
 include/uapi/linux/iommufd.h                        |  87 +++++
 tools/testing/selftests/iommu/iommufd_utils.h       |  21 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c           |  26 +-
 .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c          | 349 +++++++++++++++++-
 drivers/iommu/iommufd/driver.c                      |  54 +++
 drivers/iommu/iommufd/main.c                        |  54 ++-
 drivers/iommu/iommufd/selftest.c                    |  58 ++-
 drivers/iommu/iommufd/viommu.c                      |  78 +++-
 tools/testing/selftests/iommu/iommufd.c             |  34 +-
 .../selftests/iommu/iommufd_fail_nth.c              |   5 +-
 Documentation/userspace-api/iommufd.rst             |  11 +
 16 files changed, 912 insertions(+), 62 deletions(-)
The new type of vIOMMU for tegra241-cmdqv needs to pass in a driver-level data structure from user space via iommufd, so add a user_data argument to the viommu_alloc op.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h         | 3 ++-
 include/linux/iommu.h                               | 3 ++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 3 ++-
 drivers/iommu/iommufd/selftest.c                    | 8 ++++----
 drivers/iommu/iommufd/viommu.c                      | 2 +-
 5 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index dd1ad56ce863..6b8f0d20dac3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -1062,7 +1062,8 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type);
 struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
                                       struct iommu_domain *parent,
                                       struct iommufd_ctx *ictx,
-                                      unsigned int viommu_type);
+                                      unsigned int viommu_type,
+                                      const struct iommu_user_data *user_data);
 int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
                                    struct arm_smmu_nested_domain *nested_domain);
 void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ccce8a751e2a..ebde19aa3c28 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -662,7 +662,8 @@ struct iommu_ops {
 	struct iommufd_viommu *(*viommu_alloc)(
 		struct device *dev, struct iommu_domain *parent_domain,
-		struct iommufd_ctx *ictx, unsigned int viommu_type);
+		struct iommufd_ctx *ictx, unsigned int viommu_type,
+		const struct iommu_user_data *user_data);

 	const struct iommu_domain_ops *default_domain_ops;
 	unsigned long pgsize_bitmap;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index e4fd8d522af8..66855cae775e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -385,7 +385,8 @@ static const struct iommufd_viommu_ops arm_vsmmu_ops = {
 struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
                                       struct iommu_domain *parent,
                                       struct iommufd_ctx *ictx,
-                                      unsigned int viommu_type)
+                                      unsigned int viommu_type,
+                                      const struct iommu_user_data *user_data)
 {
 	struct arm_smmu_device *smmu =
 		iommu_get_iommu_dev(dev, struct arm_smmu_device, iommu);
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 18d9a216eb30..8b8ba4fb91cd 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -733,10 +733,10 @@ static struct iommufd_viommu_ops mock_viommu_ops = {
 	.cache_invalidate = mock_viommu_cache_invalidate,
 };

-static struct iommufd_viommu *mock_viommu_alloc(struct device *dev,
-						struct iommu_domain *domain,
-						struct iommufd_ctx *ictx,
-						unsigned int viommu_type)
+static struct iommufd_viommu *
+mock_viommu_alloc(struct device *dev, struct iommu_domain *domain,
+		  struct iommufd_ctx *ictx, unsigned int viommu_type,
+		  const struct iommu_user_data *user_data)
 {
 	struct mock_iommu_device *mock_iommu =
 		iommu_get_iommu_dev(dev, struct mock_iommu_device, iommu_dev);
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 01df2b985f02..b861ca49d723 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -48,7 +48,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	}

 	viommu = ops->viommu_alloc(idev->dev, hwpt_paging->common.domain,
-				   ucmd->ictx, cmd->type);
+				   ucmd->ictx, cmd->type, NULL);
 	if (IS_ERR(viommu)) {
 		rc = PTR_ERR(viommu);
 		goto out_put_hwpt;
On Thu, Apr 10, 2025 at 11:37:40PM -0700, Nicolin Chen wrote:
> The new type of vIOMMU for tegra241-cmdqv needs to pass in a driver-level
> data structure from user space via iommufd, so add a user_data argument
> to the viommu_alloc op.
>
> Signed-off-by: Nicolin Chen nicolinc@nvidia.com
>
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h         | 3 ++-
>  include/linux/iommu.h                               | 3 ++-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 3 ++-
>  drivers/iommu/iommufd/selftest.c                    | 8 ++++----
>  drivers/iommu/iommufd/viommu.c                      | 2 +-
>  5 files changed, 11 insertions(+), 8 deletions(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
The new type of vIOMMU for the tegra241-cmdqv driver needs driver-specific user data. So, add data_len/uptr to the iommu_viommu_alloc uAPI and pass it in via the viommu_alloc iommu op.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 include/uapi/linux/iommufd.h   | 6 ++++++
 drivers/iommu/iommufd/viommu.c | 8 +++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index f29b6c44655e..cc90299a08d9 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -965,6 +965,9 @@ enum iommu_viommu_type {
  * @dev_id: The device's physical IOMMU will be used to back the virtual IOMMU
  * @hwpt_id: ID of a nesting parent HWPT to associate to
  * @out_viommu_id: Output virtual IOMMU ID for the allocated object
+ * @data_len: Length of the type specific data
+ * @__reserved: Must be 0
+ * @data_uptr: User pointer to an array of driver-specific virtual IOMMU data
  *
  * Allocate a virtual IOMMU object, representing the underlying physical IOMMU's
  * virtualization support that is a security-isolated slice of the real IOMMU HW
@@ -985,6 +988,9 @@ struct iommu_viommu_alloc {
 	__u32 dev_id;
 	__u32 hwpt_id;
 	__u32 out_viommu_id;
+	__u32 data_len;
+	__u32 __reserved;
+	__aligned_u64 data_uptr;
 };
 #define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC)
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index b861ca49d723..3d4d325adeaa 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -17,6 +17,11 @@ void iommufd_viommu_destroy(struct iommufd_object *obj)
 int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_viommu_alloc *cmd = ucmd->cmd;
+	const struct iommu_user_data user_data = {
+		.type = cmd->type,
+		.uptr = u64_to_user_ptr(cmd->data_uptr),
+		.len = cmd->data_len,
+	};
 	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_viommu *viommu;
 	struct iommufd_device *idev;
@@ -48,7 +53,8 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	}

 	viommu = ops->viommu_alloc(idev->dev, hwpt_paging->common.domain,
-				   ucmd->ictx, cmd->type, NULL);
+				   ucmd->ictx, cmd->type,
+				   user_data.len ? &user_data : NULL);
 	if (IS_ERR(viommu)) {
 		rc = PTR_ERR(viommu);
 		goto out_put_hwpt;
On Thu, Apr 10, 2025 at 11:37:41PM -0700, Nicolin Chen wrote:
> The new type of vIOMMU for the tegra241-cmdqv driver needs driver-specific
> user data. So, add data_len/uptr to the iommu_viommu_alloc uAPI and pass
> it in via the viommu_alloc iommu op.
>
> Signed-off-by: Nicolin Chen nicolinc@nvidia.com
>
>  include/uapi/linux/iommufd.h   | 6 ++++++
>  drivers/iommu/iommufd/viommu.c | 8 +++++++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
Similar to the iommu_copy_struct_from_user helper receiving data from user space, add an iommu_copy_struct_to_user helper to report output data back to a user space data pointer.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 include/linux/iommu.h | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ebde19aa3c28..ffb00054da33 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -562,6 +562,46 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size,
 	return 0;
 }

+/**
+ * __iommu_copy_struct_to_user - Report iommu driver specific user space data
+ * @dst_data: Pointer to a struct iommu_user_data for user space data location
+ * @src_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @data_type: The data type of the @dst_data. Must match with @src_data.type
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _src)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int
+__iommu_copy_struct_to_user(const struct iommu_user_data *dst_data,
+			    void *src_data, unsigned int data_type,
+			    size_t data_len, size_t min_len)
+{
+	if (dst_data->type != data_type)
+		return -EINVAL;
+	if (WARN_ON(!dst_data || !src_data))
+		return -EINVAL;
+	if (dst_data->len < min_len || data_len < dst_data->len)
+		return -EINVAL;
+	return copy_struct_to_user(dst_data->uptr, dst_data->len, src_data,
+				   data_len, NULL);
+}
+
+/**
+ * iommu_copy_struct_to_user - Report iommu driver specific user space data
+ * @user_data: Pointer to a struct iommu_user_data for user space data location
+ * @ksrc: Pointer to an iommu driver specific user data that is defined in
+ *        include/uapi/linux/iommufd.h
+ * @data_type: The data type of the @ksrc. Must match with @user_data->type
+ * @min_last: The last memember of the data structure @ksrc points in the
+ *            initial version.
+ * Return 0 for success, otherwise -error.
+ */
+#define iommu_copy_struct_to_user(user_data, ksrc, data_type, min_last)       \
+	__iommu_copy_struct_to_user(user_data, ksrc, data_type, sizeof(*ksrc),\
+				    offsetofend(typeof(*ksrc), min_last))
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
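As a usage sketch, a driver reporting output data back to user space would wrap the helper as below. The call pattern mirrors this patch; struct iommu_viommu_foo and IOMMU_VIOMMU_TYPE_FOO are made-up names for illustration:

/* Illustrative only: the "foo" names are hypothetical */
struct iommu_viommu_foo {
        __u32 in_cfg;   /* input from user space */
        __u32 out_reg;  /* output reported back to user space */
};

static int foo_report_to_user(const struct iommu_user_data *user_data, u32 reg)
{
        struct iommu_viommu_foo data = { .out_reg = reg };

        /* Copies up to user_data->len bytes of @data back to user space */
        return iommu_copy_struct_to_user(user_data, &data,
                                         IOMMU_VIOMMU_TYPE_FOO, out_reg);
}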
On 11-04-2025 12:07, Nicolin Chen wrote:
> + * iommu_copy_struct_to_user - Report iommu driver specific user space data
> + * @user_data: Pointer to a struct iommu_user_data for user space data location
> + * @ksrc: Pointer to an iommu driver specific user data that is defined in
> + *        include/uapi/linux/iommufd.h
> + * @data_type: The data type of the @ksrc. Must match with @user_data->type
> + * @min_last: The last memember of the data structure @ksrc points in the

old typo memember -> member

> + *            initial version.
> + * Return 0 for success, otherwise -error.
Thanks, Alok
On Fri, Apr 11, 2025 at 06:05:30PM +0530, ALOK TIWARI wrote:
> On 11-04-2025 12:07, Nicolin Chen wrote:
> > + * iommu_copy_struct_to_user - Report iommu driver specific user space data
> > + * @user_data: Pointer to a struct iommu_user_data for user space data location
> > + * @ksrc: Pointer to an iommu driver specific user data that is defined in
> > + *        include/uapi/linux/iommufd.h
> > + * @data_type: The data type of the @ksrc. Must match with @user_data->type
> > + * @min_last: The last memember of the data structure @ksrc points in the
>
> old typo memember -> member
Fixed for this one.
And yea, we need a patch fixing iommu_copy_struct_from_user() too.
Thanks
Nicolin
On Apr 11, 2025, at 1:37 AM, Nicolin Chen nicolinc@nvidia.com wrote:

> +__iommu_copy_struct_to_user(const struct iommu_user_data *dst_data,
> +			    void *src_data, unsigned int data_type,
> +			    size_t data_len, size_t min_len)
> +{
> +	if (dst_data->type != data_type)
> +		return -EINVAL;
> +	if (WARN_ON(!dst_data || !src_data))
> +		return -EINVAL;
The NULL pointer check should be first.
-matt
On Mon, Apr 14, 2025 at 08:25:40AM -0700, Matt Ochs wrote:
> On Apr 11, 2025, at 1:37 AM, Nicolin Chen nicolinc@nvidia.com wrote:
> > +__iommu_copy_struct_to_user(const struct iommu_user_data *dst_data,
> > +			    void *src_data, unsigned int data_type,
> > +			    size_t data_len, size_t min_len)
> > +{
> > +	if (dst_data->type != data_type)
> > +		return -EINVAL;
> > +	if (WARN_ON(!dst_data || !src_data))
> > +		return -EINVAL;
>
> The NULL pointer check should be first.
Fixed.
We seem to have the same issue in __iommu_copy_struct_from_user(). Will send a patch to fix that too.
Thanks
Nicolin
On Thu, Apr 10, 2025 at 11:37:42PM -0700, Nicolin Chen wrote:
> Similar to the iommu_copy_struct_from_user helper receiving data from
> user space, add an iommu_copy_struct_to_user helper to report output
> data back to a user space data pointer.
>
> Signed-off-by: Nicolin Chen nicolinc@nvidia.com
>
>  include/linux/iommu.h | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
An IOMMU driver that allocated a vIOMMU may want to revert the allocation if it encounters an internal error afterwards. So, there needs to be a destroy helper for drivers to use.

Move iommufd_object_abort() to the driver.c file and the public header, to introduce a common iommufd_struct_destroy() helper that can abort all kinds of driver structures, not confined to iommufd_viommu but also covering the new ones added by the following patches.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/iommufd_private.h |  1 -
 include/linux/iommufd.h                 | 15 +++++++++++++++
 drivers/iommu/iommufd/driver.c          | 14 ++++++++++++++
 drivers/iommu/iommufd/main.c            | 13 -------------
 4 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 80e8c76d25f2..51f1b3938023 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -180,7 +180,6 @@ static inline void iommufd_put_object(struct iommufd_ctx *ictx,
 	wake_up_interruptible_all(&ictx->destroy_wait);
 }

-void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj);
 void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
				       struct iommufd_object *obj);
 void iommufd_object_finalize(struct iommufd_ctx *ictx,
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 34b6e6ca4bfa..999498ddab75 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -192,6 +192,7 @@ static inline int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx)
 struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
					      size_t size,
					      enum iommufd_object_type type);
+void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj);
 struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
					unsigned long vdev_id);
 int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
@@ -207,6 +208,11 @@ _iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
 	return ERR_PTR(-EOPNOTSUPP);
 }

+static inline void iommufd_object_abort(struct iommufd_ctx *ictx,
+					struct iommufd_object *obj)
+{
+}
+
 static inline struct device *
 iommufd_viommu_find_dev(struct iommufd_viommu *viommu, unsigned long vdev_id)
 {
@@ -245,4 +251,13 @@ static inline int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
 		ret->member.ops = viommu_ops;                                  \
 		ret;                                                           \
 	})
+
+/* Helper for IOMMU driver to destroy structures created by allocators above */
+#define iommufd_struct_destroy(ictx, drv_struct, member)                       \
+	({                                                                     \
+		static_assert(__same_type(struct iommufd_object,               \
+					  drv_struct->member.obj));            \
+		static_assert(offsetof(typeof(*drv_struct), member.obj) == 0); \
+		iommufd_object_abort(ictx, &drv_struct->member.obj);           \
+	})
 #endif
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 922cd1fe7ec2..7980a09761c2 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -36,6 +36,20 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 }
 EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc, "IOMMUFD");

+/* Undo _iommufd_object_alloc() if iommufd_object_finalize() was not called */
+void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
+{
+	XA_STATE(xas, &ictx->objects, obj->id);
+	void *old;
+
+	xa_lock(&ictx->objects);
+	old = xas_store(&xas, NULL);
+	xa_unlock(&ictx->objects);
+	WARN_ON(old != XA_ZERO_ENTRY);
+	kfree(obj);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_object_abort, "IOMMUFD");
+
 /* Caller should xa_lock(&viommu->vdevs) to protect the return value */
 struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
					unsigned long vdev_id)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3df468f64e7d..2b9ee9b4a424 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -51,19 +51,6 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
 	WARN_ON(old != XA_ZERO_ENTRY);
 }

-/* Undo _iommufd_object_alloc() if iommufd_object_finalize() was not called */
-void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
-{
-	XA_STATE(xas, &ictx->objects, obj->id);
-	void *old;
-
-	xa_lock(&ictx->objects);
-	old = xas_store(&xas, NULL);
-	xa_unlock(&ictx->objects);
-	WARN_ON(old != XA_ZERO_ENTRY);
-	kfree(obj);
-}
-
 /*
  * Abort an object that has been fully initialized and needs destroy, but has
  * not been finalized.
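For illustration, the intended driver-side error path would look like the sketch below; the iommufd_viommu_alloc()/iommufd_struct_destroy() helpers are from this series, while struct my_viommu, my_viommu_ops and my_hw_init() are hypothetical placeholders:

/* Hedged sketch of reverting a vIOMMU allocation; "my_*" names are made up */
struct my_viommu {
        struct iommufd_viommu core;     /* must be the first member */
        unsigned long hw_state;         /* hypothetical driver state */
};

static struct iommufd_viommu *my_viommu_alloc(struct iommufd_ctx *ictx)
{
        struct my_viommu *vi;
        int rc;

        vi = iommufd_viommu_alloc(ictx, struct my_viommu, core,
                                  &my_viommu_ops);
        if (IS_ERR(vi))
                return ERR_CAST(vi);

        rc = my_hw_init(vi);    /* hypothetical HW setup that may fail */
        if (rc) {
                /* Revert the allocation before the object gets finalized */
                iommufd_struct_destroy(ictx, vi, core);
                return ERR_PTR(rc);
        }
        return &vi->core;
}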
On Thu, Apr 10, 2025 at 11:37:43PM -0700, Nicolin Chen wrote:
> An IOMMU driver that allocated a vIOMMU may want to revert the allocation
> if it encounters an internal error afterwards. So, there needs to be a
> destroy helper for drivers to use.
>
> Move iommufd_object_abort() to the driver.c file and the public header,
> to introduce a common iommufd_struct_destroy() helper that can abort all
> kinds of driver structures, not confined to iommufd_viommu but also
> covering the new ones added by the following patches.
>
> Signed-off-by: Nicolin Chen nicolinc@nvidia.com
>
>  drivers/iommu/iommufd/iommufd_private.h |  1 -
>  include/linux/iommufd.h                 | 15 +++++++++++++++
>  drivers/iommu/iommufd/driver.c          | 14 ++++++++++++++
>  drivers/iommu/iommufd/main.c            | 13 -------------
>  4 files changed, 29 insertions(+), 14 deletions(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
Add a simple user_data for an input-to-output loopback test.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/iommufd_test.h | 13 +++++++++++++
 drivers/iommu/iommufd/selftest.c     | 19 +++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 1cd7e8394129..fbf9ecb35a13 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -227,6 +227,19 @@ struct iommu_hwpt_invalidate_selftest {

 #define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef

+/**
+ * struct iommu_viommu_selftest - vIOMMU data for Mock driver
+ *                                (IOMMU_VIOMMU_TYPE_SELFTEST)
+ * @in_data: Input random data from user space
+ * @out_data: Output data (matching @in_data) to user space
+ *
+ * Simply set @out_data=@in_data for a loopback test
+ */
+struct iommu_viommu_selftest {
+	__u32 in_data;
+	__u32 out_data;
+};
+
 /* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */
 #define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST 0xdeadbeef
 #define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 8b8ba4fb91cd..b04bd2fbc53d 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -740,16 +740,35 @@ mock_viommu_alloc(struct device *dev, struct iommu_domain *domain,
 {
 	struct mock_iommu_device *mock_iommu = iommu_get_iommu_dev(
 		dev, struct mock_iommu_device, iommu_dev);
+	struct iommu_viommu_selftest data;
 	struct mock_viommu *mock_viommu;
+	int rc;

 	if (viommu_type != IOMMU_VIOMMU_TYPE_SELFTEST)
 		return ERR_PTR(-EOPNOTSUPP);

+	if (user_data) {
+		rc = iommu_copy_struct_from_user(
+			&data, user_data, IOMMU_VIOMMU_TYPE_SELFTEST, out_data);
+		if (rc)
+			return ERR_PTR(rc);
+	}
+
 	mock_viommu = iommufd_viommu_alloc(ictx, struct mock_viommu, core,
					    &mock_viommu_ops);
 	if (IS_ERR(mock_viommu))
 		return ERR_CAST(mock_viommu);

+	if (user_data) {
+		data.out_data = data.in_data;
+		rc = iommu_copy_struct_to_user(
+			user_data, &data, IOMMU_VIOMMU_TYPE_SELFTEST, out_data);
+		if (rc) {
+			iommufd_struct_destroy(ictx, mock_viommu, core);
+			return ERR_PTR(rc);
+		}
+	}
+
 	refcount_inc(&mock_iommu->users);
 	return &mock_viommu->core;
 }
Extend the existing test_cmd/err_viommu_alloc helpers to accept optional user data. And add a TEST_F for a loopback test.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 tools/testing/selftests/iommu/iommufd_utils.h | 21 +++++++++-----
 tools/testing/selftests/iommu/iommufd.c       | 29 +++++++++++++++----
 .../selftests/iommu/iommufd_fail_nth.c        |  5 ++--
 3 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 72f6636e5d90..a5d4cbd089ba 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -897,7 +897,8 @@ static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 pasid,
			       pasid, fault_fd))

 static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id,
-				  __u32 type, __u32 flags, __u32 *viommu_id)
+				  __u32 flags, __u32 type, void *data,
+				  __u32 data_len, __u32 *viommu_id)
 {
 	struct iommu_viommu_alloc cmd = {
 		.size = sizeof(cmd),
@@ -905,6 +906,8 @@ static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id,
 		.type = type,
 		.dev_id = device_id,
 		.hwpt_id = hwpt_id,
+		.data_uptr = (uint64_t)data,
+		.data_len = data_len,
 	};
 	int ret;

@@ -916,13 +919,15 @@ static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id,
 	return 0;
 }

-#define test_cmd_viommu_alloc(device_id, hwpt_id, type, viommu_id)           \
-	ASSERT_EQ(0, _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id,    \
-					    type, 0, viommu_id))
-#define test_err_viommu_alloc(_errno, device_id, hwpt_id, type, viommu_id)   \
-	EXPECT_ERRNO(_errno,                                                  \
-		     _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id,    \
-					    type, 0, viommu_id))
+#define test_cmd_viommu_alloc(device_id, hwpt_id, type, data, data_len,      \
+			      viommu_id)                                      \
+	ASSERT_EQ(0, _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, 0,  \
+					    type, data, data_len, viommu_id))
+#define test_err_viommu_alloc(_errno, device_id, hwpt_id, type, data,        \
+			      data_len, viommu_id)                            \
+	EXPECT_ERRNO(_errno,                                                  \
+		     _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, 0,  \
+					    type, data, data_len, viommu_id))

 static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id,
				    __u64 virt_id, __u32 *vdev_id)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 1a8e85afe9aa..8ebbb7fda02d 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2688,7 +2688,7 @@ FIXTURE_SETUP(iommufd_viommu)

 	/* Allocate a vIOMMU taking refcount of the parent hwpt */
 	test_cmd_viommu_alloc(self->device_id, self->hwpt_id,
-			      IOMMU_VIOMMU_TYPE_SELFTEST,
+			      IOMMU_VIOMMU_TYPE_SELFTEST, NULL, 0,
			      &self->viommu_id);

 	/* Allocate a regular nested hwpt */
@@ -2727,24 +2727,27 @@ TEST_F(iommufd_viommu, viommu_negative_tests)
 	if (self->device_id) {
 		/* Negative test -- invalid hwpt (hwpt_id=0) */
 		test_err_viommu_alloc(ENOENT, device_id, 0,
-				      IOMMU_VIOMMU_TYPE_SELFTEST, NULL);
+				      IOMMU_VIOMMU_TYPE_SELFTEST, NULL, 0,
+				      NULL);

 		/* Negative test -- not a nesting parent hwpt */
 		test_cmd_hwpt_alloc(device_id, ioas_id, 0, &hwpt_id);
 		test_err_viommu_alloc(EINVAL, device_id, hwpt_id,
-				      IOMMU_VIOMMU_TYPE_SELFTEST, NULL);
+				      IOMMU_VIOMMU_TYPE_SELFTEST, NULL, 0,
+				      NULL);
 		test_ioctl_destroy(hwpt_id);

 		/* Negative test -- unsupported viommu type */
 		test_err_viommu_alloc(EOPNOTSUPP, device_id, self->hwpt_id,
-				      0xdead, NULL);
+				      0xdead, NULL, 0, NULL);
 		EXPECT_ERRNO(EBUSY,
			      _test_ioctl_destroy(self->fd, self->hwpt_id));
 		EXPECT_ERRNO(EBUSY,
			      _test_ioctl_destroy(self->fd, self->viommu_id));
 	} else {
 		test_err_viommu_alloc(ENOENT, self->device_id, self->hwpt_id,
-				      IOMMU_VIOMMU_TYPE_SELFTEST, NULL);
+				      IOMMU_VIOMMU_TYPE_SELFTEST, NULL, 0,
+				      NULL);
 	}
 }

@@ -2791,6 +2794,20 @@ TEST_F(iommufd_viommu, viommu_alloc_nested_iopf)
 	}
 }

+TEST_F(iommufd_viommu, viommu_alloc_with_data)
+{
+	struct iommu_viommu_selftest data = {
+		.in_data = 0xbeef,
+	};
+
+	if (self->device_id) {
+		test_cmd_viommu_alloc(self->device_id, self->hwpt_id,
+				      IOMMU_VIOMMU_TYPE_SELFTEST, &data,
+				      sizeof(data), &self->viommu_id);
+		assert(data.out_data == data.in_data);
+	}
+}
+
 TEST_F(iommufd_viommu, vdevice_alloc)
 {
 	uint32_t viommu_id = self->viommu_id;
@@ -3105,7 +3122,7 @@ TEST_F(iommufd_device_pasid, pasid_attach)

 	/* Allocate a regular nested hwpt based on viommu */
 	test_cmd_viommu_alloc(self->device_id, parent_hwpt_id,
-			      IOMMU_VIOMMU_TYPE_SELFTEST,
+			      IOMMU_VIOMMU_TYPE_SELFTEST, NULL, 0,
			      &viommu_id);
 	test_cmd_hwpt_alloc_nested(self->device_id, viommu_id,
				    IOMMU_HWPT_ALLOC_PASID,
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index e11ec4b121fc..f7ccf1822108 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -688,8 +688,9 @@ TEST_FAIL_NTH(basic_fail_nth, device)
			     IOMMU_HWPT_DATA_NONE, 0, 0))
 		return -1;

-	if (_test_cmd_viommu_alloc(self->fd, idev_id, hwpt_id,
-				   IOMMU_VIOMMU_TYPE_SELFTEST, 0, &viommu_id))
+	if (_test_cmd_viommu_alloc(self->fd, idev_id, hwpt_id, 0,
+				   IOMMU_VIOMMU_TYPE_SELFTEST, NULL, 0,
+				   &viommu_id))
 		return -1;

 	if (_test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, 0, &vdev_id))
To allow IOMMU drivers to allocate their own vDEVICE structures, move the struct iommufd_vdevice to the public header and provide a pair of viommu ops.

The iommufd_vdevice_alloc_ioctl will prioritize the callback function from the viommu ops, i.e. a driver-allocated vDEVICE.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/iommufd_private.h |  8 -------
 include/linux/iommufd.h                 | 32 +++++++++++++++++++++++++
 drivers/iommu/iommufd/viommu.c          |  9 ++++++-
 3 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 51f1b3938023..8d96aa514033 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -606,14 +606,6 @@ void iommufd_viommu_destroy(struct iommufd_object *obj);
 int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_vdevice_destroy(struct iommufd_object *obj);

-struct iommufd_vdevice {
-	struct iommufd_object obj;
-	struct iommufd_ctx *ictx;
-	struct iommufd_viommu *viommu;
-	struct device *dev;
-	u64 id; /* per-vIOMMU virtual ID */
-};
-
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 999498ddab75..df7a25117e3b 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -104,6 +104,14 @@ struct iommufd_viommu {
 	unsigned int type;
 };

+struct iommufd_vdevice {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+	struct iommufd_viommu *viommu;
+	struct device *dev;
+	u64 id; /* per-vIOMMU virtual ID */
+};
+
 /**
  * struct iommufd_viommu_ops - vIOMMU specific operations
  * @destroy: Clean up all driver-specific parts of an iommufd_viommu. The memory
@@ -120,6 +128,12 @@ struct iommufd_viommu {
  *                    array->entry_num to report the number of handled requests.
  *                    The data structure of the array entry must be defined in
  *                    include/uapi/linux/iommufd.h
+ * @vdevice_alloc: Allocate a vDEVICE object and init its driver-level structure
+ *                 or HW procedure. Note that the core-level structure is filled
+ *                 by the iommufd core after calling this op
+ * @vdevice_destroy: Clean up all driver-specific parts of an iommufd_vdevice. The
+ *                   memory of the vDEVICE will be free-ed by iommufd core after
+ *                   calling this op
  */
 struct iommufd_viommu_ops {
 	void (*destroy)(struct iommufd_viommu *viommu);
@@ -128,6 +142,9 @@ struct iommufd_viommu_ops {
			      const struct iommu_user_data *user_data);
 	int (*cache_invalidate)(struct iommufd_viommu *viommu,
				 struct iommu_user_data_array *array);
+	struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu,
+						  struct device *dev, u64 id);
+	void (*vdevice_destroy)(struct iommufd_vdevice *vdev);
 };

 #if IS_ENABLED(CONFIG_IOMMUFD)
@@ -252,6 +269,21 @@ static inline int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
 		ret;                                                           \
 	})

+#define iommufd_vdevice_alloc(viommu, drv_struct, member)                      \
+	({                                                                     \
+		drv_struct *ret;                                               \
+									       \
+		static_assert(__same_type(struct iommufd_viommu, *viommu));    \
+		static_assert(__same_type(struct iommufd_vdevice,              \
+					  ((drv_struct *)NULL)->member));      \
+		static_assert(offsetof(drv_struct, member.obj) == 0);          \
+		ret = (drv_struct *)_iommufd_object_alloc(                     \
+			viommu->ictx, sizeof(drv_struct), IOMMUFD_OBJ_VDEVICE);\
+		if (!IS_ERR(ret))                                              \
+			ret->member.viommu = viommu;                           \
+		ret;                                                           \
+	})
+
 /* Helper for IOMMU driver to destroy structures created by allocators above */
 #define iommufd_struct_destroy(ictx, drv_struct, member)                       \
 	({                                                                     \
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 3d4d325adeaa..a65153458a26 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -96,6 +96,9 @@ void iommufd_vdevice_destroy(struct iommufd_object *obj)
		container_of(obj, struct iommufd_vdevice, obj);
 	struct iommufd_viommu *viommu = vdev->viommu;

+	if (viommu->ops && viommu->ops->vdevice_destroy)
+		viommu->ops->vdevice_destroy(vdev);
+
 	/* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */
 	xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
 	refcount_dec(&viommu->obj.users);
@@ -130,7 +133,11 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
		goto out_put_idev;
 	}

-	vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
+	if (viommu->ops && viommu->ops->vdevice_alloc)
+		vdev = viommu->ops->vdevice_alloc(viommu, idev->dev, virt_id);
+	else
+		vdev = iommufd_object_alloc(ucmd->ictx, vdev,
+					    IOMMUFD_OBJ_VDEVICE);
 	if (IS_ERR(vdev)) {
 		rc = PTR_ERR(vdev);
 		goto out_put_idev;
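A driver implementation of the new op pair might look like the sketch below; the iommufd_vdevice_alloc() macro and the op signatures are from this patch, while the "my_*" names are hypothetical:

/* Hedged sketch of a driver-allocated vDEVICE; "my_*" names are made up */
struct my_vdevice {
        struct iommufd_vdevice core;    /* must be the first member */
        u32 hw_slot;                    /* hypothetical per-device HW state */
};

static struct iommufd_vdevice *
my_vdevice_alloc(struct iommufd_viommu *viommu, struct device *dev, u64 virt_id)
{
        struct my_vdevice *my_vdev;

        my_vdev = iommufd_vdevice_alloc(viommu, struct my_vdevice, core);
        if (IS_ERR(my_vdev))
                return ERR_CAST(my_vdev);

        /* Hypothetical HW binding of the device to the virtual ID */
        my_vdev->hw_slot = my_hw_attach(dev, virt_id);
        return &my_vdev->core;
}

static void my_vdevice_destroy(struct iommufd_vdevice *vdev)
{
        struct my_vdevice *my_vdev =
                container_of(vdev, struct my_vdevice, core);

        my_hw_detach(my_vdev->hw_slot); /* hypothetical; core frees the memory */
}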
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Friday, April 11, 2025 2:38 PM
>
> @@ -128,6 +142,9 @@ struct iommufd_viommu_ops {
>  			     const struct iommu_user_data *user_data);
>  	int (*cache_invalidate)(struct iommufd_viommu *viommu,
>  				struct iommu_user_data_array *array);
> +	struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu,
> +						  struct device *dev, u64 id);
s/id/virt_id/ would be clearer.
On Mon, Apr 21, 2025 at 08:00:37AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen nicolinc@nvidia.com
> > Sent: Friday, April 11, 2025 2:38 PM
> >
> > @@ -128,6 +142,9 @@ struct iommufd_viommu_ops {
> >  			     const struct iommu_user_data *user_data);
> >  	int (*cache_invalidate)(struct iommufd_viommu *viommu,
> >  				struct iommu_user_data_array *array);
> > +	struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu,
> > +						  struct device *dev, u64 id);
>
> s/id/virt_id/ would be clearer.
OK.
@@ -143,7 +143,8 @@ struct iommufd_viommu_ops {
 	int (*cache_invalidate)(struct iommufd_viommu *viommu,
				 struct iommu_user_data_array *array);
 	struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu,
-						  struct device *dev, u64 id);
+						  struct device *dev,
+						  u64 virt_id);
 	void (*vdevice_destroy)(struct iommufd_vdevice *vdev);
 };
Thanks
Nicolin
On Thu, Apr 10, 2025 at 11:37:46PM -0700, Nicolin Chen wrote:
> To allow IOMMU drivers to allocate their own vDEVICE structures, move the
> struct iommufd_vdevice to the public header and provide a pair of viommu
> ops.
>
> The iommufd_vdevice_alloc_ioctl will prioritize the callback function
> from the viommu ops, i.e. a driver-allocated vDEVICE.
>
> Signed-off-by: Nicolin Chen nicolinc@nvidia.com
>
>  drivers/iommu/iommufd/iommufd_private.h |  8 -------
>  include/linux/iommufd.h                 | 32 +++++++++++++++++++++++++
>  drivers/iommu/iommufd/viommu.c          |  9 ++++++-
>  3 files changed, 40 insertions(+), 9 deletions(-)
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
Add a new IOMMUFD_OBJ_VCMDQ with an iommufd_vcmdq structure, representing a command queue type of physical HW passed to or shared with a user space VM. This vCMDQ object is a subset of a physical IOMMU's vIOMMU virtualization resources, such as:
 - NVIDIA's virtual command queue
 - AMD vIOMMU's command buffer

Introduce a struct iommufd_vcmdq and its allocator iommufd_vcmdq_alloc(). Also, add a pair of viommu ops for iommufd to forward user space ioctls to IOMMU drivers.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 include/linux/iommufd.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index df7a25117e3b..4118eaece1a5 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -37,6 +37,7 @@ enum iommufd_object_type {
 	IOMMUFD_OBJ_VIOMMU,
 	IOMMUFD_OBJ_VDEVICE,
 	IOMMUFD_OBJ_VEVENTQ,
+	IOMMUFD_OBJ_VCMDQ,
 #ifdef CONFIG_IOMMUFD_TEST
 	IOMMUFD_OBJ_SELFTEST,
 #endif
@@ -112,6 +113,12 @@ struct iommufd_vdevice {
 	u64 id; /* per-vIOMMU virtual ID */
 };

+struct iommufd_vcmdq {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+	struct iommufd_viommu *viommu;
+};
+
 /**
  * struct iommufd_viommu_ops - vIOMMU specific operations
  * @destroy: Clean up all driver-specific parts of an iommufd_viommu. The memory
@@ -134,6 +141,11 @@ struct iommufd_vdevice {
  * @vdevice_destroy: Clean up all driver-specific parts of an iommufd_vdevice. The
  *                   memory of the vDEVICE will be free-ed by iommufd core after
  *                   calling this op
+ * @vcmdq_alloc: Allocate an iommufd_vcmdq as a user space command queue for a
+ *               @viommu instance. Queue specific @user_data must be defined in
+ *               the include/uapi/linux/iommufd.h header.
+ * @vcmdq_free: Free all driver-specific parts of an iommufd_vcmdq. The memory
+ *              of the iommufd_vcmdq will be free-ed by iommufd core
  */
 struct iommufd_viommu_ops {
 	void (*destroy)(struct iommufd_viommu *viommu);
@@ -145,6 +157,10 @@ struct iommufd_viommu_ops {
 	struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu,
						  struct device *dev, u64 id);
 	void (*vdevice_destroy)(struct iommufd_vdevice *vdev);
+	struct iommufd_vcmdq *(*vcmdq_alloc)(
+		struct iommufd_viommu *viommu,
+		const struct iommu_user_data *user_data);
+	void (*vcmdq_free)(struct iommufd_vcmdq *vcmdq);
 };

 #if IS_ENABLED(CONFIG_IOMMUFD)
@@ -284,6 +300,21 @@ static inline int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
 		ret;                                                           \
 	})

+#define iommufd_vcmdq_alloc(viommu, drv_struct, member)                        \
+	({                                                                     \
+		drv_struct *ret;                                               \
+									       \
+		static_assert(__same_type(struct iommufd_viommu, *viommu));    \
+		static_assert(__same_type(struct iommufd_vcmdq,                \
+					  ((drv_struct *)NULL)->member));      \
+		static_assert(offsetof(drv_struct, member.obj) == 0);          \
+		ret = (drv_struct *)_iommufd_object_alloc(                     \
+			viommu->ictx, sizeof(drv_struct), IOMMUFD_OBJ_VCMDQ);  \
+		if (!IS_ERR(ret))                                              \
+			ret->member.viommu = viommu;                           \
+		ret;                                                           \
+	})
+
 /* Helper for IOMMU driver to destroy structures created by allocators above */
 #define iommufd_struct_destroy(ictx, drv_struct, member)                       \
 	({                                                                     \
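For illustration, a driver's vcmdq_alloc op might follow the sketch below; the allocator macro and copy helpers are from this series, while struct my_vcmdq, struct my_vcmdq_data, MY_VCMDQ_TYPE and my_hw_init_queue() are hypothetical:

/* Hedged sketch of a vcmdq_alloc op; the "my_*"/"MY_*" names are made up */
struct my_vcmdq_data {                  /* hypothetical driver uAPI struct */
        __aligned_u64 q_base;
        __aligned_u64 q_size;
};

struct my_vcmdq {
        struct iommufd_vcmdq core;      /* must be the first member */
        u64 q_base;                     /* hypothetical HW queue state */
};

static struct iommufd_vcmdq *
my_vcmdq_alloc(struct iommufd_viommu *viommu,
               const struct iommu_user_data *user_data)
{
        struct my_vcmdq_data data;
        struct my_vcmdq *my_vcmdq;
        int rc;

        rc = iommu_copy_struct_from_user(&data, user_data, MY_VCMDQ_TYPE,
                                         q_size);
        if (rc)
                return ERR_PTR(rc);

        my_vcmdq = iommufd_vcmdq_alloc(viommu, struct my_vcmdq, core);
        if (IS_ERR(my_vcmdq))
                return ERR_CAST(my_vcmdq);

        rc = my_hw_init_queue(my_vcmdq, &data); /* hypothetical HW setup */
        if (rc) {
                iommufd_struct_destroy(viommu->ictx, my_vcmdq, core);
                return ERR_PTR(rc);
        }
        return &my_vcmdq->core;
}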
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Friday, April 11, 2025 2:38 PM
>
> Add a new IOMMUFD_OBJ_VCMDQ with an iommufd_vcmdq structure, representing
> a command queue type of physical HW passed to or shared with a user space
> VM.

what is the implication of "or shared with"? Do you intend to support a usage in which a CMDQ is coordinated by both guest and host (which sounds insane)?
On Mon, Apr 21, 2025 at 08:03:01AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen nicolinc@nvidia.com
> > Sent: Friday, April 11, 2025 2:38 PM
> >
> > Add a new IOMMUFD_OBJ_VCMDQ with an iommufd_vcmdq structure, representing
> > a command queue type of physical HW passed to or shared with a user space
> > VM.
>
> what is the implication of "or shared with"? Do you intend to support a
> usage in which a CMDQ is coordinated by both guest and host (which sounds
> insane)?
No. It should have been removed. It was left over from a previous "vQUEUE" naming for a potentially wider use case. Will fix this.
Thanks
Nicolin
Introduce a new IOMMUFD_CMD_VCMDQ_ALLOC ioctl for user space to allocate a vCMDQ for a vIOMMU object. Simply increase the refcount of the vIOMMU.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/iommufd_private.h |  2 +
 include/uapi/linux/iommufd.h            | 32 ++++++++++++++
 drivers/iommu/iommufd/main.c            |  6 +++
 drivers/iommu/iommufd/viommu.c          | 61 +++++++++++++++++++++++++
 4 files changed, 101 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 8d96aa514033..da35ffcc212b 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -605,6 +605,8 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_viommu_destroy(struct iommufd_object *obj);
 int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
 void iommufd_vdevice_destroy(struct iommufd_object *obj);
+int iommufd_vcmdq_alloc_ioctl(struct iommufd_ucmd *ucmd);
+void iommufd_vcmdq_destroy(struct iommufd_object *obj);

 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index cc90299a08d9..428f1e9e6239 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -56,6 +56,7 @@ enum {
 	IOMMUFD_CMD_VDEVICE_ALLOC = 0x91,
 	IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
 	IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
+	IOMMUFD_CMD_VCMDQ_ALLOC = 0x94,
 };

 /**
@@ -1147,4 +1148,35 @@ struct iommu_veventq_alloc {
 	__u32 __reserved;
 };
 #define IOMMU_VEVENTQ_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VEVENTQ_ALLOC)
+
+/**
+ * enum iommu_vcmdq_type - Virtual Queue Type
+ * @IOMMU_VCMDQ_TYPE_DEFAULT: Reserved for future use
+ */
+enum iommu_vcmdq_data_type {
+	IOMMU_VCMDQ_TYPE_DEFAULT = 0,
+};
+
+/**
+ * struct iommu_vcmdq_alloc - ioctl(IOMMU_VCMDQ_ALLOC)
+ * @size: sizeof(struct iommu_vcmdq_alloc)
+ * @flags: Must be 0
+ * @viommu_id: viommu ID to associate the virtual queue with
+ * @type: One of enum iommu_vcmdq_type
+ * @out_vcmdq_id: The ID of the new virtual queue
+ * @data_len: Length of the type specific data
+ * @data_uptr: User pointer to the type specific data
+ *
+ * Allocate a virtual queue object for vIOMMU-specific HW-accelerated queue
+ */
+struct iommu_vcmdq_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 viommu_id;
+	__u32 type;
+	__u32 out_vcmdq_id;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
+};
+#define IOMMU_VCMDQ_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VCMDQ_ALLOC)
 #endif
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 2b9ee9b4a424..27473aff150f 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -303,6 +303,7 @@ union ucmd_buffer {
 	struct iommu_ioas_map map;
 	struct iommu_ioas_unmap unmap;
 	struct iommu_option option;
+	struct iommu_vcmdq_alloc vcmdq;
 	struct iommu_vdevice_alloc vdev;
 	struct iommu_veventq_alloc veventq;
 	struct iommu_vfio_ioas vfio_ioas;
@@ -358,6 +359,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
		  length),
 	IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option, val64),
+	IOCTL_OP(IOMMU_VCMDQ_ALLOC, iommufd_vcmdq_alloc_ioctl,
+		 struct iommu_vcmdq_alloc, data_uptr),
 	IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl,
		  struct iommu_vdevice_alloc, virt_id),
 	IOCTL_OP(IOMMU_VEVENTQ_ALLOC, iommufd_veventq_alloc,
@@ -501,6 +504,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 	[IOMMUFD_OBJ_IOAS] = {
 		.destroy = iommufd_ioas_destroy,
 	},
+	[IOMMUFD_OBJ_VCMDQ] = {
+		.destroy = iommufd_vcmdq_destroy,
+	},
 	[IOMMUFD_OBJ_VDEVICE] = {
 		.destroy = iommufd_vdevice_destroy,
 	},
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index a65153458a26..c5a0db132e03 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -170,3 +170,64 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(ucmd->ictx, &viommu->obj);
 	return rc;
 }
+
+void iommufd_vcmdq_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_vcmdq *vcmdq =
+		container_of(obj, struct iommufd_vcmdq, obj);
+	struct iommufd_viommu *viommu = vcmdq->viommu;
+
+	if (viommu->ops->vcmdq_free)
+		viommu->ops->vcmdq_free(vcmdq);
+	refcount_dec(&viommu->obj.users);
+}
+
+int iommufd_vcmdq_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_vcmdq_alloc *cmd = ucmd->cmd;
+	const struct iommu_user_data user_data = {
+		.type = cmd->type,
+		.uptr = u64_to_user_ptr(cmd->data_uptr),
+		.len = cmd->data_len,
+	};
+	struct iommufd_vcmdq *vcmdq;
+	struct iommufd_viommu *viommu;
+	int rc;
+
+	if (cmd->flags || cmd->type == IOMMU_VCMDQ_TYPE_DEFAULT)
+		return -EOPNOTSUPP;
+	if (!cmd->data_len)
+		return -EINVAL;
+
+	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+	if (IS_ERR(viommu))
+		return PTR_ERR(viommu);
+
+	if (!viommu->ops || !viommu->ops->vcmdq_alloc) {
+		rc = -EOPNOTSUPP;
+		goto out_put_viommu;
+	}
+
+	vcmdq = viommu->ops->vcmdq_alloc(viommu,
+					 user_data.len ? &user_data : NULL);
+	if (IS_ERR(vcmdq)) {
+		rc = PTR_ERR(vcmdq);
+		goto out_put_viommu;
+	}
+
+	vcmdq->viommu = viommu;
+	refcount_inc(&viommu->obj.users);
+	vcmdq->ictx = ucmd->ictx;
+	cmd->out_vcmdq_id = vcmdq->obj.id;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_abort;
+	iommufd_object_finalize(ucmd->ictx, &vcmdq->obj);
+	goto out_put_viommu;
+
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &vcmdq->obj);
+out_put_viommu:
+	iommufd_put_object(ucmd->ictx, &viommu->obj);
+	return rc;
+}
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Friday, April 11, 2025 2:38 PM
>
> +/**
> + * enum iommu_vcmdq_type - Virtual Queue Type

"Virtual Command Queue Type"

> + * @IOMMU_VCMDQ_TYPE_DEFAULT: Reserved for future use
> + */
> +enum iommu_vcmdq_data_type {
> +	IOMMU_VCMDQ_TYPE_DEFAULT = 0,
> +};
>
> +/**
> + * struct iommu_vcmdq_alloc - ioctl(IOMMU_VCMDQ_ALLOC)
> + * @size: sizeof(struct iommu_vcmdq_alloc)
> + * @flags: Must be 0
> + * @viommu_id: viommu ID to associate the virtual queue with
> + * @type: One of enum iommu_vcmdq_type

s/iommu_vcmdq_type/iommu_vcmdq_data_type/

> +int iommufd_vcmdq_alloc_ioctl(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_vcmdq_alloc *cmd = ucmd->cmd;
> +	const struct iommu_user_data user_data = {
> +		.type = cmd->type,
> +		.uptr = u64_to_user_ptr(cmd->data_uptr),
> +		.len = cmd->data_len,
> +	};
> +	struct iommufd_vcmdq *vcmdq;
> +	struct iommufd_viommu *viommu;
> +	int rc;
> +
> +	if (cmd->flags || cmd->type == IOMMU_VCMDQ_TYPE_DEFAULT)
> +		return -EOPNOTSUPP;
> +	if (!cmd->data_len)
> +		return -EINVAL;
> +
> +	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
> +	if (IS_ERR(viommu))
> +		return PTR_ERR(viommu);
> +
> +	if (!viommu->ops || !viommu->ops->vcmdq_alloc) {
> +		rc = -EOPNOTSUPP;
> +		goto out_put_viommu;
> +	}
> +
> +	vcmdq = viommu->ops->vcmdq_alloc(viommu,
> +					 user_data.len ? &user_data : NULL);
the length cannot be zero at this point due to earlier check.
On Mon, Apr 21, 2025 at 08:05:40AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen nicolinc@nvidia.com
> > Sent: Friday, April 11, 2025 2:38 PM
> >
> > +/**
> > + * enum iommu_vcmdq_type - Virtual Queue Type
>
> "Virtual Command Queue Type"
>
> > + * @IOMMU_VCMDQ_TYPE_DEFAULT: Reserved for future use
> > + */
> > +enum iommu_vcmdq_data_type {
> > +	IOMMU_VCMDQ_TYPE_DEFAULT = 0,
> > +};
> >
> > +/**
> > + * struct iommu_vcmdq_alloc - ioctl(IOMMU_VCMDQ_ALLOC)
> > + * @size: sizeof(struct iommu_vcmdq_alloc)
> > + * @flags: Must be 0
> > + * @viommu_id: viommu ID to associate the virtual queue with
> > + * @type: One of enum iommu_vcmdq_type
>
> s/iommu_vcmdq_type/iommu_vcmdq_data_type/
>
> > +int iommufd_vcmdq_alloc_ioctl(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_vcmdq_alloc *cmd = ucmd->cmd;
> > +	const struct iommu_user_data user_data = {
> > +		.type = cmd->type,
> > +		.uptr = u64_to_user_ptr(cmd->data_uptr),
> > +		.len = cmd->data_len,
> > +	};
> > +	struct iommufd_vcmdq *vcmdq;
> > +	struct iommufd_viommu *viommu;
> > +	int rc;
> > +
> > +	if (cmd->flags || cmd->type == IOMMU_VCMDQ_TYPE_DEFAULT)
> > +		return -EOPNOTSUPP;
> > +	if (!cmd->data_len)
> > +		return -EINVAL;
> > +
> > +	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
> > +	if (IS_ERR(viommu))
> > +		return PTR_ERR(viommu);
> > +
> > +	if (!viommu->ops || !viommu->ops->vcmdq_alloc) {
> > +		rc = -EOPNOTSUPP;
> > +		goto out_put_viommu;
> > +	}
> > +
> > +	vcmdq = viommu->ops->vcmdq_alloc(viommu,
> > +					 user_data.len ? &user_data : NULL);
>
> the length cannot be zero at this point due to earlier check.
Yes. Will fix them all.
Thanks!
Nicolin
For a vIOMMU passing through HW resources to user space (VMs), add an mmap infrastructure to map a region of hardware MMIO pages. The addr and size should be given to user space via a prior IOMMU_VIOMMU_ALLOC ioctl, in some output fields of the structure.

Maintain an mt_mmap per ictx for validation. And give IOMMU drivers a pair of helpers to add and delete mmap regions in the mt_mmap.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/iommufd_private.h |  9 ++++++
 include/linux/iommufd.h                 | 15 ++++++++++
 drivers/iommu/iommufd/driver.c          | 40 +++++++++++++++++++++++++
 drivers/iommu/iommufd/main.c            | 35 ++++++++++++++++++++++
 4 files changed, 99 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index da35ffcc212b..5be0248966aa 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -7,6 +7,7 @@
 #include <linux/iommu.h>
 #include <linux/iommufd.h>
 #include <linux/iova_bitmap.h>
+#include <linux/maple_tree.h>
 #include <linux/rwsem.h>
 #include <linux/uaccess.h>
 #include <linux/xarray.h>
@@ -44,6 +45,7 @@ struct iommufd_ctx {
 	struct xarray groups;
 	wait_queue_head_t destroy_wait;
 	struct rw_semaphore ioas_creation_lock;
+	struct maple_tree mt_mmap;

 	struct mutex sw_msi_lock;
 	struct list_head sw_msi_list;
@@ -55,6 +57,13 @@ struct iommufd_ctx {
 	struct iommufd_ioas *vfio_ioas;
 };

+/* Entry for iommufd_ctx::mt_mmap */
+struct iommufd_mmap {
+	unsigned long pfn_start;
+	unsigned long pfn_end;
+	bool is_io;
+};
+
 /*
  * The IOVA to PFN map. The map automatically copies the PFNs into multiple
  * domains and permits sharing of PFNs between io_pagetable instances. This
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 4118eaece1a5..e9394e20c4dd 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -226,6 +226,9 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
					      size_t size,
					      enum iommufd_object_type type);
 void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj);
+int iommufd_ctx_alloc_mmap(struct iommufd_ctx *ictx, phys_addr_t base,
+			   size_t size, bool is_io, unsigned long *immap_id);
+void iommufd_ctx_free_mmap(struct iommufd_ctx *ictx, unsigned long immap_id);
 struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
					unsigned long vdev_id);
 int iommufd_viommu_get_vdev_id(struct iommufd_viommu *viommu,
@@ -246,6 +249,18 @@ static inline void iommufd_object_abort(struct iommufd_ctx *ictx,
 {
 }

+static inline int iommufd_ctx_alloc_mmap(struct iommufd_ctx *ictx,
+					 phys_addr_t base, size_t size,
+					 bool is_io, unsigned long *immap_id)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void iommufd_ctx_free_mmap(struct iommufd_ctx *ictx,
+					 unsigned long immap_id)
+{
+}
+
 static inline struct device *
 iommufd_viommu_find_dev(struct iommufd_viommu *viommu, unsigned long vdev_id)
 {
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 7980a09761c2..abe0bdc1e9a3 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -50,6 +50,46 @@ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_object_abort, "IOMMUFD");

+/* Driver should report the immap_id to user space for mmap() syscalls */
+int iommufd_ctx_alloc_mmap(struct iommufd_ctx *ictx, phys_addr_t base,
+			   size_t size, bool is_io, unsigned long *immap_id)
+{
+	struct iommufd_mmap *immap;
+	int rc;
+
+	if (WARN_ON_ONCE(!immap_id))
+		return -EINVAL;
+	if (base & ~PAGE_MASK)
+		return -EINVAL;
+	if (!size || size & ~PAGE_MASK)
+		return -EINVAL;
+
+	immap = kzalloc(sizeof(*immap), GFP_KERNEL);
+	if (!immap)
+		return -ENOMEM;
+	immap->pfn_start = base >> PAGE_SHIFT;
+	immap->pfn_end = immap->pfn_start + (size >> PAGE_SHIFT) - 1;
+	immap->is_io = is_io;
+
+	rc = mtree_alloc_range(&ictx->mt_mmap, immap_id, immap, sizeof(immap),
+			       0, LONG_MAX >> PAGE_SHIFT, GFP_KERNEL);
+	if (rc < 0) {
+		kfree(immap);
+		return rc;
+	}
+
+	/* mmap() syscall would right-shift the immap_id to vma->vm_pgoff */
+	*immap_id <<= PAGE_SHIFT;
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_ctx_alloc_mmap, "IOMMUFD");
+
+void iommufd_ctx_free_mmap(struct iommufd_ctx *ictx, unsigned long immap_id)
+{
+	kfree(mtree_erase(&ictx->mt_mmap, immap_id >> PAGE_SHIFT));
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_ctx_free_mmap, "IOMMUFD");
+
 /* Caller should xa_lock(&viommu->vdevs) to protect the return value */
 struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
					unsigned long vdev_id)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 27473aff150f..d3101c76fcb8 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -213,6 +213,7 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
 	xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
 	xa_init(&ictx->groups);
 	ictx->file = filp;
+	mt_init_flags(&ictx->mt_mmap, MT_FLAGS_ALLOC_RANGE);
 	init_waitqueue_head(&ictx->destroy_wait);
 	mutex_init(&ictx->sw_msi_lock);
 	INIT_LIST_HEAD(&ictx->sw_msi_list);
@@ -410,11 +411,45 @@ static long iommufd_fops_ioctl(struct file *filp, unsigned int cmd,
 	return ret;
 }

+/*
+ * The pfn and size carried in @vma from the user space mmap call should be
+ * previously given to user space via a prior ioctl output.
+ */
+static int iommufd_fops_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct iommufd_ctx *ictx = filp->private_data;
+	size_t size = vma->vm_end - vma->vm_start;
+	struct iommufd_mmap *immap;
+
+	if (size & ~PAGE_MASK)
+		return -EINVAL;
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+	if (vma->vm_flags & VM_EXEC)
+		return -EPERM;
+
+	/* vm_pgoff carries an index of an mtree entry/immap */
+	immap = mtree_load(&ictx->mt_mmap, vma->vm_pgoff);
+	if (!immap)
+		return -EINVAL;
+	if (size >> PAGE_SHIFT > immap->pfn_end - immap->pfn_start + 1)
+		return -EINVAL;
+
+	vma->vm_pgoff = 0;
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+	if (immap->is_io)
+		vm_flags_set(vma, VM_IO);
+	return remap_pfn_range(vma, vma->vm_start, immap->pfn_start, size,
+			       vma->vm_page_prot);
+}
+
 static const struct file_operations iommufd_fops = {
 	.owner = THIS_MODULE,
 	.open = iommufd_fops_open,
 	.release = iommufd_fops_release,
 	.unlocked_ioctl = iommufd_fops_ioctl,
+	.mmap = iommufd_fops_mmap,
 };

 /**
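From the driver side, registering the HW MMIO page and reporting the mmap info could look like the sketch below; iommufd_ctx_alloc_mmap() is from this patch, while the "my_*" names and the out_* fields are hypothetical:

/* Hedged sketch of exposing an MMIO page via the new mmap interface */
struct my_vcmdq_user_data {             /* hypothetical driver uAPI struct */
        __aligned_u64 out_vcmdq_pgoff;
        __aligned_u64 out_vcmdq_size;
};

static int my_vcmdq_expose_mmio(struct iommufd_ctx *ictx,
                                struct my_vcmdq_user_data *data,
                                phys_addr_t mmio_base, size_t mmio_size)
{
        unsigned long immap_id;
        int rc;

        /* Both mmio_base and mmio_size must be PAGE_SIZE aligned */
        rc = iommufd_ctx_alloc_mmap(ictx, mmio_base, mmio_size,
                                    true /* is_io */, &immap_id);
        if (rc)
                return rc;

        /* Report as output driver data of the allocation ioctl; the
         * returned immap_id is already shifted by PAGE_SHIFT, ready to be
         * used as the offset argument of a user space mmap() call.
         */
        data->out_vcmdq_pgoff = immap_id;
        data->out_vcmdq_size = mmio_size;
        return 0;
}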
> From: Nicolin Chen nicolinc@nvidia.com
> Sent: Friday, April 11, 2025 2:38 PM
>
> For a vIOMMU passing through HW resources to user space (VMs), add an
> mmap infrastructure to map a region of hardware MMIO pages. The addr
> and size should be given to user space via a prior IOMMU_VIOMMU_ALLOC
> ioctl, in some output fields of the structure.

According to the code, the addr must be the immap_id given by the previous alloc, but the size can be anything as long as it doesn't exceed the physical length.

> +/* Entry for iommufd_ctx::mt_mmap */
> +struct iommufd_mmap {
> +	unsigned long pfn_start;
> +	unsigned long pfn_end;
> +	bool is_io;
> +};

what is the point of 'is_io' here? Do you intend to allow userspace to mmap anonymous memory via iommufd?

anyway for now the only user in this series always sets it to true.

I'd suggest to remove it until there is a real need.

> +/*
> + * The pfn and size carried in @vma from the user space mmap call should be

there is no 'pfn' carried in the mmap call. It's vm_pgoff.

> + * previously given to user space via a prior ioctl output.
> + */
> +static int iommufd_fops_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> +	struct iommufd_ctx *ictx = filp->private_data;
> +	size_t size = vma->vm_end - vma->vm_start;
> +	struct iommufd_mmap *immap;
> +
> +	if (size & ~PAGE_MASK)
> +		return -EINVAL;
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return -EINVAL;
> +	if (vma->vm_flags & VM_EXEC)
> +		return -EPERM;
> +
> +	/* vm_pgoff carries an index of an mtree entry/immap */
> +	immap = mtree_load(&ictx->mt_mmap, vma->vm_pgoff);
> +	if (!immap)
> +		return -EINVAL;
> +	if (size >> PAGE_SHIFT > immap->pfn_end - immap->pfn_start + 1)
> +		return -EINVAL;
Do we want to document in uAPI that iommufd mmap allows mapping a sub-region (starting from offset zero) of the reported size from the earlier alloc ioctl, but not from a random offset (of course impossible by forcing vm_pgoff to be an mtree index)?
On Mon, Apr 21, 2025 at 08:16:54AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen nicolinc@nvidia.com
> > Sent: Friday, April 11, 2025 2:38 PM
> >
> > For a vIOMMU passing through HW resources to user space (VMs), add an
> > mmap infrastructure to map a region of hardware MMIO pages. The addr
> > and size should be given to user space via a prior IOMMU_VIOMMU_ALLOC
> > ioctl, in some output fields of the structure.
>
> According to the code, the addr must be the immap_id given by the
> previous alloc, but the size can be anything as long as it doesn't
> exceed the physical length.
Yea, though I'm starting to wonder if we should simply close the window by forcing the full range...
I changed the kdoc of the iommufd_fops_mmap():
+/*
+ * Kernel driver must first do iommufd_ctx_alloc_mmap() to register an mmappable
+ * MMIO region to the iommufd core to receive an "immap_id". Then, driver should
+ * report to user space this immap_id and the size of the registered MMIO region
+ * for @vm_pgoff and @size of an mmap() call, via an IOMMU_VIOMMU_ALLOC ioctl in
+ * the output fields of its driver-type data structure.
+ *
+ * Note the @size is allowed to be smaller than the registered size as a partial
+ * mmap starting from the registered base address.
+ */
+/* Entry for iommufd_ctx::mt_mmap */
+struct iommufd_mmap {
+	unsigned long pfn_start;
+	unsigned long pfn_end;
+	bool is_io;
+};
what is the point of 'is_io' here? Do you intend to allow userspace to mmap anonymous memory via iommufd?
anyway for now the only user in this series always sets it to true.
I'd suggest to remove it until there is a real need.
Done.
Thanks Nicolin
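[Editor's note] For reference, the kdoc flow above maps to a short user-space sequence. Below is a minimal sketch, not part of the series, assuming the selftest driver data (struct iommu_viommu_selftest) and the data_uptr/data_len vIOMMU driver-data fields added elsewhere in this series; error handling is trimmed:

#include <assert.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/iommufd.h>

static void *mmap_viommu_region(int iommufd, struct iommu_viommu_alloc *cmd,
				struct iommu_viommu_selftest *data)
{
	/* The alloc ioctl outputs the registered pgoff and the region size */
	assert(ioctl(iommufd, IOMMU_VIOMMU_ALLOC, cmd) == 0);

	/* The offset must be the reported pgoff; the length may be smaller */
	return mmap(NULL, data->out_mmap_pgsz, PROT_READ | PROT_WRITE,
		    MAP_SHARED, iommufd, data->out_mmap_pgoff);
}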
On Mon, Apr 21, 2025 at 08:16:54AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, April 11, 2025 2:38 PM
+ * previously given to user space via a prior ioctl output.
+ */
+static int iommufd_fops_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct iommufd_ctx *ictx = filp->private_data;
+	size_t size = vma->vm_end - vma->vm_start;
+	struct iommufd_mmap *immap;
+
+	if (size & ~PAGE_MASK)
+		return -EINVAL;
+	if (!(vma->vm_flags & VM_SHARED))
+		return -EINVAL;
+	if (vma->vm_flags & VM_EXEC)
+		return -EPERM;
+
+	/* vm_pgoff carries an index of an mtree entry/immap */
+	immap = mtree_load(&ictx->mt_mmap, vma->vm_pgoff);
+	if (!immap)
+		return -EINVAL;
+	if (size >> PAGE_SHIFT > immap->pfn_end - immap->pfn_start + 1)
+		return -EINVAL;
Do we want to document in the uAPI that iommufd mmap allows mapping a sub-region (starting from offset zero) of the reported size from the earlier alloc ioctl, but not from a random offset (of course impossible by forcing vm_pgoff to be an mtree index)?
I also did this:
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst
index ace0579432d57..f57a5bf2feea1 100644
--- a/Documentation/userspace-api/iommufd.rst
+++ b/Documentation/userspace-api/iommufd.rst
@@ -128,11 +128,13 @@ Following IOMMUFD objects are exposed to userspace:
   virtualization feature for a VM to directly execute guest-issued commands to
   invalidate HW cache entries holding the mappings or translations of a guest-
   owned stage-1 page table. Along with this queue object, iommufd provides the
-  user space a new mmap interface that the VMM can mmap a physical MMIO region
-  from the host physical address space to a guest physical address space. To use
-  this mmap interface, the VMM must define an IOMMU specific driver structure
-  to ask for a pair of VMA info (vm_pgoff/size) to do mmap after a vCMDQ gets
-  allocated.
+  user space an mmap interface for VMM to mmap a physical MMIO region from the
+  host physical address space to a guest physical address space. When allocating
+  a vCMDQ, the VMM must request a pair of VMA info (vm_pgoff/size) for a later
+  mmap call. The length argument of an mmap call can be smaller than the given
+  size for a partial mmap, but the given vm_pgoff (as the offset argument of the
+  mmap call) should never be changed, which implies that the mmap always starts
+  from the beginning of the MMIO region.
All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
Thanks Nicolin
Extend the loopback test to cover a new mmap page.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/iommufd/iommufd_test.h    |  4 +++
 drivers/iommu/iommufd/selftest.c        | 47 ++++++++++++++++++++-----
 tools/testing/selftests/iommu/iommufd.c |  5 +++
 3 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index fbf9ecb35a13..4e4be0b73391 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -232,12 +232,16 @@ struct iommu_hwpt_invalidate_selftest {
  *                             (IOMMU_VIOMMU_TYPE_SELFTEST)
  * @in_data: Input random data from user space
  * @out_data: Output data (matching @in_data) to user space
+ * @out_mmap_pgoff: The offset argument for mmap syscall
+ * @out_mmap_pgsz: Maximum page size for mmap syscall
  *
  * Simply set @out_data=@in_data for a loopback test
  */
 struct iommu_viommu_selftest {
 	__u32 in_data;
 	__u32 out_data;
+	__aligned_u64 out_mmap_pgoff;
+	__aligned_u64 out_mmap_pgsz;
 };

 /* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index b04bd2fbc53d..4e6374f4fad2 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -148,6 +148,9 @@ to_mock_nested(struct iommu_domain *domain)
 struct mock_viommu {
 	struct iommufd_viommu core;
 	struct mock_iommu_domain *s2_parent;
+
+	unsigned long mmap_pgoff;
+	u32 *page; /* Mmap page to test u32 type of in_data */
 };

 static inline struct mock_viommu *to_mock_viommu(struct iommufd_viommu *viommu)
@@ -632,9 +635,12 @@ static void mock_viommu_destroy(struct iommufd_viommu *viommu)
 {
 	struct mock_iommu_device *mock_iommu = container_of(
 		viommu->iommu_dev, struct mock_iommu_device, iommu_dev);
+	struct mock_viommu *mock_viommu = to_mock_viommu(viommu);

 	if (refcount_dec_and_test(&mock_iommu->users))
 		complete(&mock_iommu->complete);
+	iommufd_ctx_free_mmap(viommu->ictx, mock_viommu->mmap_pgoff);
+	free_page((unsigned long)mock_viommu->page);

 	/* iommufd core frees mock_viommu and viommu */
 }
@@ -748,8 +754,9 @@ mock_viommu_alloc(struct device *dev, struct iommu_domain *domain,
 		return ERR_PTR(-EOPNOTSUPP);

 	if (user_data) {
-		rc = iommu_copy_struct_from_user(
-			&data, user_data, IOMMU_VIOMMU_TYPE_SELFTEST, out_data);
+		rc = iommu_copy_struct_from_user(&data, user_data,
+						 IOMMU_VIOMMU_TYPE_SELFTEST,
+						 out_mmap_pgsz);
 		if (rc)
 			return ERR_PTR(rc);
 	}
@@ -760,17 +767,41 @@ mock_viommu_alloc(struct device *dev, struct iommu_domain *domain,
 		return ERR_CAST(mock_viommu);

 	if (user_data) {
-		data.out_data = data.in_data;
-		rc = iommu_copy_struct_to_user(
-			user_data, &data, IOMMU_VIOMMU_TYPE_SELFTEST, out_data);
-		if (rc) {
-			iommufd_struct_destroy(ictx, mock_viommu, core);
-			return ERR_PTR(rc);
+		mock_viommu->page =
+			(u32 *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		if (!mock_viommu->page) {
+			rc = -ENOMEM;
+			goto err_destroy_struct;
 		}
+
+		rc = iommufd_ctx_alloc_mmap(ictx, __pa(mock_viommu->page),
+					    PAGE_SIZE, false,
+					    &mock_viommu->mmap_pgoff);
+		if (rc)
+			goto err_free_page;
+
+		/* For loopback tests on both the page and out_data */
+		*mock_viommu->page = data.in_data;
+		data.out_data = data.in_data;
+		data.out_mmap_pgsz = PAGE_SIZE;
+		data.out_mmap_pgoff = mock_viommu->mmap_pgoff;
+		rc = iommu_copy_struct_to_user(user_data, &data,
+					       IOMMU_VIOMMU_TYPE_SELFTEST,
+					       out_mmap_pgsz);
+		if (rc)
+			goto err_free_mmap;
 	}

 	refcount_inc(&mock_iommu->users);
 	return &mock_viommu->core;
+
+err_free_mmap:
+	iommufd_ctx_free_mmap(ictx, mock_viommu->mmap_pgoff);
+err_free_page:
+	free_page((unsigned long)mock_viommu->page);
+err_destroy_struct:
+	iommufd_struct_destroy(ictx, mock_viommu, core);
+	return ERR_PTR(rc);
 }

 static const struct iommu_ops mock_ops = {
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 8ebbb7fda02d..4aa6411e9a76 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2799,12 +2799,17 @@ TEST_F(iommufd_viommu, viommu_alloc_with_data)
 	struct iommu_viommu_selftest data = {
 		.in_data = 0xbeef,
 	};
+	uint32_t *test;

 	if (self->device_id) {
 		test_cmd_viommu_alloc(self->device_id, self->hwpt_id,
 				      IOMMU_VIOMMU_TYPE_SELFTEST, &data,
 				      sizeof(data), &self->viommu_id);
 		assert(data.out_data == data.in_data);
+		test = mmap(NULL, data.out_mmap_pgsz, PROT_READ | PROT_WRITE,
+			    MAP_SHARED, self->fd, data.out_mmap_pgoff);
+		assert(test != MAP_FAILED && *test == data.in_data);
+		munmap(test, data.out_mmap_pgsz);
 	}
 }
With the introduction of the new object and its infrastructure, update the doc to reflect that.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 Documentation/userspace-api/iommufd.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst
index b0df15865dec..ace0579432d5 100644
--- a/Documentation/userspace-api/iommufd.rst
+++ b/Documentation/userspace-api/iommufd.rst
@@ -124,6 +124,16 @@ Following IOMMUFD objects are exposed to userspace:
   used to allocate a vEVENTQ. Each vIOMMU can support multiple types of vEVENTS,
   but is confined to one vEVENTQ per vEVENTQ type.

+- IOMMUFD_OBJ_VCMDQ, representing a hardware queue as a subset of a vIOMMU's
+  virtualization feature for a VM to directly execute guest-issued commands to
+  invalidate HW cache entries holding the mappings or translations of a guest-
+  owned stage-1 page table. Along with this queue object, iommufd provides the
+  user space a new mmap interface that the VMM can mmap a physical MMIO region
+  from the host physical address space to a guest physical address space. To use
+  this mmap interface, the VMM must define an IOMMU specific driver structure
+  to ask for a pair of VMA info (vm_pgoff/size) to do mmap after a vCMDQ gets
+  allocated.
+
 All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.

 The diagrams below show relationships between user-visible objects and kernel
@@ -270,6 +280,7 @@ User visible objects are backed by following datastructures:
 - iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
 - iommufd_vdevice for IOMMUFD_OBJ_VDEVICE.
 - iommufd_veventq for IOMMUFD_OBJ_VEVENTQ.
+- iommufd_vcmdq for IOMMUFD_OBJ_VCMDQ.
Several terminologies when looking at these datastructures:
A vIRQ can be reported only from a threaded IRQ context. Change to use request_threaded_irq() to support that.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index dd7d030d2e89..ba029f7d24ce 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -824,8 +824,9 @@ __tegra241_cmdqv_probe(struct arm_smmu_device *smmu, struct resource *res,
 	cmdqv->dev = smmu->impl_dev;

 	if (cmdqv->irq > 0) {
-		ret = request_irq(irq, tegra241_cmdqv_isr, 0, "tegra241-cmdqv",
-				  cmdqv);
+		ret = request_threaded_irq(irq, NULL, tegra241_cmdqv_isr,
+					   IRQF_ONESHOT, "tegra241-cmdqv",
+					   cmdqv);
 		if (ret) {
 			dev_err(cmdqv->dev, "failed to request irq (%d): %d\n",
 				cmdqv->irq, ret);
An impl driver might support its own vIOMMU object, as the following patch will add IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV.
Add a vsmmu_alloc op to give the impl a try; upon failure, fall back to the standard vsmmu allocation for IOMMU_VIOMMU_TYPE_ARM_SMMUV3.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     |  6 ++++++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 17 +++++++++++------
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 6b8f0d20dac3..a5835af72417 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -16,6 +16,7 @@
 #include <linux/sizes.h>

 struct arm_smmu_device;
+struct arm_smmu_domain;

 /* MMIO registers */
 #define ARM_SMMU_IDR0			0x0
@@ -720,6 +721,11 @@ struct arm_smmu_impl_ops {
 	int (*init_structures)(struct arm_smmu_device *smmu);
 	struct arm_smmu_cmdq *(*get_secondary_cmdq)(
 		struct arm_smmu_device *smmu, struct arm_smmu_cmdq_ent *ent);
+	struct arm_vsmmu *(*vsmmu_alloc)(
+		struct arm_smmu_device *smmu,
+		struct arm_smmu_domain *smmu_domain, struct iommufd_ctx *ictx,
+		unsigned int viommu_type,
+		const struct iommu_user_data *user_data);
 };

 /* An SMMUv3 instance */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 66855cae775e..aa8653af50f2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -392,10 +392,7 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
 		iommu_get_iommu_dev(dev, struct arm_smmu_device, iommu);
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	struct arm_smmu_domain *s2_parent = to_smmu_domain(parent);
-	struct arm_vsmmu *vsmmu;
-
-	if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
-		return ERR_PTR(-EOPNOTSUPP);
+	struct arm_vsmmu *vsmmu = NULL;

 	if (!(smmu->features & ARM_SMMU_FEAT_NESTING))
 		return ERR_PTR(-EOPNOTSUPP);
@@ -423,8 +420,16 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
 	    !(smmu->features & ARM_SMMU_FEAT_S2FWB))
 		return ERR_PTR(-EOPNOTSUPP);

-	vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
-				     &arm_vsmmu_ops);
+	if (master->smmu->impl_ops && master->smmu->impl_ops->vsmmu_alloc)
+		vsmmu = master->smmu->impl_ops->vsmmu_alloc(
+			master->smmu, s2_parent, ictx, viommu_type, user_data);
+	if (PTR_ERR(vsmmu) == -EOPNOTSUPP) {
+		if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
+			return ERR_PTR(-EOPNOTSUPP);
+		/* Fallback to standard SMMUv3 type if viommu_type matches */
+		vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
+					     &arm_vsmmu_ops);
+	}
 	if (IS_ERR(vsmmu))
 		return ERR_CAST(vsmmu);
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, April 11, 2025 2:38 PM
An impl driver might support its own vIOMMU object, as the following patch will add IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV.
Add a vsmmu_alloc op to give the impl a try; upon failure, fall back to the standard vsmmu allocation for IOMMU_VIOMMU_TYPE_ARM_SMMUV3.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     |  6 ++++++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 17 +++++++++++------
 2 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 6b8f0d20dac3..a5835af72417 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -16,6 +16,7 @@
 #include <linux/sizes.h>

 struct arm_smmu_device;
+struct arm_smmu_domain;

 /* MMIO registers */
 #define ARM_SMMU_IDR0			0x0
@@ -720,6 +721,11 @@ struct arm_smmu_impl_ops {
 	int (*init_structures)(struct arm_smmu_device *smmu);
 	struct arm_smmu_cmdq *(*get_secondary_cmdq)(
 		struct arm_smmu_device *smmu, struct arm_smmu_cmdq_ent *ent);
+	struct arm_vsmmu *(*vsmmu_alloc)(
+		struct arm_smmu_device *smmu,
+		struct arm_smmu_domain *smmu_domain, struct iommufd_ctx *ictx,
+		unsigned int viommu_type,
+		const struct iommu_user_data *user_data);
 };

 /* An SMMUv3 instance */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 66855cae775e..aa8653af50f2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -392,10 +392,7 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
 		iommu_get_iommu_dev(dev, struct arm_smmu_device, iommu);
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	struct arm_smmu_domain *s2_parent = to_smmu_domain(parent);
-	struct arm_vsmmu *vsmmu;
-
-	if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
-		return ERR_PTR(-EOPNOTSUPP);
+	struct arm_vsmmu *vsmmu = NULL;

 	if (!(smmu->features & ARM_SMMU_FEAT_NESTING))
 		return ERR_PTR(-EOPNOTSUPP);

@@ -423,8 +420,16 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
 	    !(smmu->features & ARM_SMMU_FEAT_S2FWB))
 		return ERR_PTR(-EOPNOTSUPP);

-	vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
-				     &arm_vsmmu_ops);
+	if (master->smmu->impl_ops && master->smmu->impl_ops->vsmmu_alloc)
+		vsmmu = master->smmu->impl_ops->vsmmu_alloc(
+			master->smmu, s2_parent, ictx, viommu_type, user_data);
+	if (PTR_ERR(vsmmu) == -EOPNOTSUPP) {
did it work on standard SMMUv3 when there is no @vsmmu_alloc() and the variable 'vsmmu' is initialized to NULL?
+		if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
+			return ERR_PTR(-EOPNOTSUPP);
+		/* Fallback to standard SMMUv3 type if viommu_type matches */
+		vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
+					     &arm_vsmmu_ops);
+	}
 	if (IS_ERR(vsmmu))
 		return ERR_CAST(vsmmu);
-- 2.43.0
On Mon, Apr 21, 2025 at 08:23:50AM +0000, Tian, Kevin wrote:
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 66855cae775e..aa8653af50f2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -392,10 +392,7 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
 		iommu_get_iommu_dev(dev, struct arm_smmu_device, iommu);
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	struct arm_smmu_domain *s2_parent = to_smmu_domain(parent);
-	struct arm_vsmmu *vsmmu;
-
-	if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
-		return ERR_PTR(-EOPNOTSUPP);
+	struct arm_vsmmu *vsmmu = NULL;

 	if (!(smmu->features & ARM_SMMU_FEAT_NESTING))
 		return ERR_PTR(-EOPNOTSUPP);

@@ -423,8 +420,16 @@ struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
 	    !(smmu->features & ARM_SMMU_FEAT_S2FWB))
 		return ERR_PTR(-EOPNOTSUPP);

-	vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
-				     &arm_vsmmu_ops);
+	if (master->smmu->impl_ops && master->smmu->impl_ops->vsmmu_alloc)
+		vsmmu = master->smmu->impl_ops->vsmmu_alloc(
+			master->smmu, s2_parent, ictx, viommu_type, user_data);
+	if (PTR_ERR(vsmmu) == -EOPNOTSUPP) {
did it work on standard SMMUv3 when there is no @vsmmu_alloc() and the variable 'vsmmu' is initialized to NULL?
You are right. We should do:
-	struct arm_vsmmu *vsmmu = NULL;
+	struct arm_vsmmu *vsmmu = ERR_PTR(-EOPNOTSUPP);
Thanks! Nicolin
Add the support via vIOMMU infrastructure for virtualization use case.
This basically allows VMM to allocate VINTFs (as a vIOMMU object) and assign VCMDQs (vCMDQ objects) to it. A VINTF's MMIO page0 can be mmap'd to user space for VM to access directly without VMEXIT and corresponding hypercall.
As an initial version, the number of VCMDQs per VINTF is fixed to two.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     |  15 +
 include/uapi/linux/iommufd.h                    |  34 ++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c       |   6 +-
 drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c  | 322 +++++++++++++++++-
 4 files changed, 370 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index a5835af72417..84f8dfdd9638 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -999,6 +999,14 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 				struct arm_smmu_cmdq *cmdq, u64 *cmds, int n,
 				bool sync);

+static inline phys_addr_t
+arm_smmu_domain_ipa_to_pa(struct arm_smmu_domain *smmu_domain, u64 ipa)
+{
+	if (WARN_ON_ONCE(smmu_domain->stage != ARM_SMMU_DOMAIN_S2))
+		return 0;
+	return iommu_iova_to_phys(&smmu_domain->domain, ipa);
+}
+
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
 bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
@@ -1075,9 +1083,16 @@ int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
 void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state);
 void arm_smmu_master_clear_vmaster(struct arm_smmu_master *master);
 int arm_vmaster_report_event(struct arm_smmu_vmaster *vmaster, u64 *evt);
+struct iommu_domain *
+arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+			      const struct iommu_user_data *user_data);
+int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
+			       struct iommu_user_data_array *array);
 #else
 #define arm_smmu_hw_info NULL
 #define arm_vsmmu_alloc NULL
+#define arm_vsmmu_alloc_domain_nested NULL
+#define arm_vsmmu_cache_invalidate NULL

 static inline int
 arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 428f1e9e6239..ce20f038b56b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -952,10 +952,28 @@ struct iommu_fault_alloc {
  * enum iommu_viommu_type - Virtual IOMMU Type
  * @IOMMU_VIOMMU_TYPE_DEFAULT: Reserved for future use
  * @IOMMU_VIOMMU_TYPE_ARM_SMMUV3: ARM SMMUv3 driver specific type
+ * @IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV Extension for SMMUv3
  */
 enum iommu_viommu_type {
 	IOMMU_VIOMMU_TYPE_DEFAULT = 0,
 	IOMMU_VIOMMU_TYPE_ARM_SMMUV3 = 1,
+	IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV = 2,
+};
+
+/**
+ * struct iommu_viommu_tegra241_cmdqv - NVIDIA Tegra241 CMDQV Virtual Interface
+ *                                      (IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV)
+ * @out_vintf_page0_pgoff: Offset of the VINTF page0 for mmap syscall
+ * @out_vintf_page0_pgsz: Size of the VINTF page0 for mmap syscall
+ *
+ * Both @out_vintf_page0_pgoff and @out_vintf_page0_pgsz are given by the kernel
+ * for user space to mmap the VINTF page0 from the host physical address space
+ * to the guest physical address space so that a guest kernel can directly R/W
+ * access to the VINTF page0 in order to control its virtual command queues.
+ */
+struct iommu_viommu_tegra241_cmdqv {
+	__aligned_u64 out_vintf_page0_pgoff;
+	__aligned_u64 out_vintf_page0_pgsz;
 };

 /**
@@ -1152,9 +1170,25 @@ struct iommu_veventq_alloc {
 /**
  * enum iommu_vcmdq_type - Virtual Queue Type
  * @IOMMU_VCMDQ_TYPE_DEFAULT: Reserved for future use
+ * @IOMMU_VCMDQ_TYPE_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV Extension for SMMUv3
  */
 enum iommu_vcmdq_data_type {
 	IOMMU_VCMDQ_TYPE_DEFAULT = 0,
+	IOMMU_VCMDQ_TYPE_TEGRA241_CMDQV = 1,
+};
+
+/**
+ * struct iommu_vcmdq_tegra241_cmdqv - NVIDIA Tegra241's Virtual Command Queue
+ *                                     for its CMDQV Extension for ARM SMMUv3
+ *                                     (IOMMU_VCMDQ_TYPE_TEGRA241_CMDQV)
+ * @vcmdq_id: logical ID of a virtual command queue in the vIOMMU instance
+ * @vcmdq_log2size: (1 << @vcmdq_log2size) will be the size of the vcmdq
+ * @vcmdq_base: guest physical address (IPA) to the vcmdq base address
+ */
+struct iommu_vcmdq_tegra241_cmdqv {
+	__u32 vcmdq_id;
+	__u32 vcmdq_log2size;
+	__aligned_u64 vcmdq_base;
 };

 /**
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index aa8653af50f2..162872d82cd4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -216,7 +216,7 @@ static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg,
 	return 0;
 }

-static struct iommu_domain *
+struct iommu_domain *
 arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
 			      const struct iommu_user_data *user_data)
 {
@@ -327,8 +327,8 @@ static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
 	return 0;
 }

-static int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
-				      struct iommu_user_data_array *array)
+int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
+			       struct iommu_user_data_array *array)
 {
 	struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
 	struct arm_smmu_device *smmu = vsmmu->smmu;
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index ba029f7d24ce..b778739f845a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -8,7 +8,9 @@
 #include <linux/dma-mapping.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
+#include <linux/iommufd.h>
 #include <linux/iopoll.h>
+#include <uapi/linux/iommufd.h>

 #include <acpi/acpixf.h>
@@ -26,6 +28,7 @@
 #define CMDQV_EN			BIT(0)

 #define TEGRA241_CMDQV_PARAM		0x0004
+#define CMDQV_NUM_SID_PER_VM_LOG2	GENMASK(15, 12)
 #define CMDQV_NUM_VINTF_LOG2		GENMASK(11, 8)
 #define CMDQV_NUM_VCMDQ_LOG2		GENMASK(7, 4)

@@ -53,6 +56,9 @@
 #define VINTF_STATUS			GENMASK(3, 1)
 #define VINTF_ENABLED			BIT(0)

+#define TEGRA241_VINTF_SID_MATCH(s)	(0x0040 + 0x4*(s))
+#define TEGRA241_VINTF_SID_REPLACE(s)	(0x0080 + 0x4*(s))
+
 #define TEGRA241_VINTF_LVCMDQ_ERR_MAP_64(m) \
 					(0x00C0 + 0x8*(m))
 #define LVCMDQ_ERR_MAP_NUM_64		2
@@ -114,6 +120,7 @@ MODULE_PARM_DESC(bypass_vcmdq,

 /**
  * struct tegra241_vcmdq - Virtual Command Queue
+ * @core: Embedded iommufd_vcmdq structure
  * @idx: Global index in the CMDQV
  * @lidx: Local index in the VINTF
  * @enabled: Enable status
@@ -124,6 +131,8 @@ MODULE_PARM_DESC(bypass_vcmdq,
  * @page1: MMIO Page1 base address
  */
 struct tegra241_vcmdq {
+	struct iommufd_vcmdq core;
+
 	u16 idx;
 	u16 lidx;

@@ -136,18 +145,26 @@ struct tegra241_vcmdq {
 	void __iomem *page0;
 	void __iomem *page1;
 };
+#define core_to_vcmdq(v) container_of(v, struct tegra241_vcmdq, core)

 /**
  * struct tegra241_vintf - Virtual Interface
+ * @vsmmu: Embedded arm_vsmmu structure
  * @idx: Global index in the CMDQV
+ * @vmid: VMID for configuration
  * @enabled: Enable status
  * @hyp_own: Owned by hypervisor (in-kernel)
  * @cmdqv: Parent CMDQV pointer
  * @lvcmdqs: List of logical VCMDQ pointers
  * @base: MMIO base address
+ * @s2_domain: Stage-2 SMMU domain
+ * @sid_slots: Stream ID Slot allocator
  */
 struct tegra241_vintf {
+	struct arm_vsmmu vsmmu;
+
 	u16 idx;
+	u16 vmid;

 	bool enabled;
 	bool hyp_own;
@@ -156,6 +173,25 @@ struct tegra241_vintf {
 	struct tegra241_vcmdq **lvcmdqs;

 	void __iomem *base;
+	struct arm_smmu_domain *s2_domain;
+	unsigned long mmap_pgoff;
+
+	struct ida sid_slots;
+};
+#define viommu_to_vintf(v) container_of(v, struct tegra241_vintf, vsmmu.core)
+
+/**
+ * struct tegra241_vintf_slot - Virtual Interface Stream ID Slot
+ * @core: Embedded iommufd_vdevice structure
+ * @vintf: Parent VINTF pointer
+ * @sid: Physical Stream ID
+ * @idx: Slot index in the VINTF
+ */
+struct tegra241_vintf_slot {
+	struct iommufd_vdevice core;
+	struct tegra241_vintf *vintf;
+	u32 sid;
+	u8 idx;
 };

 /**
@@ -163,10 +199,12 @@ struct tegra241_vintf {
  * @smmu: SMMUv3 device
  * @dev: CMDQV device
  * @base: MMIO base address
+ * @base_phys: MMIO physical base address, for mmap
  * @irq: IRQ number
  * @num_vintfs: Total number of VINTFs
  * @num_vcmdqs: Total number of VCMDQs
  * @num_lvcmdqs_per_vintf: Number of logical VCMDQs per VINTF
+ * @num_sids_per_vintf: Total number of SID replacements per VINTF
  * @vintf_ids: VINTF id allocator
  * @vintfs: List of VINTFs
  */
@@ -175,12 +213,14 @@ struct tegra241_cmdqv {
 	struct device *dev;

 	void __iomem *base;
+	phys_addr_t base_phys;
 	int irq;

 	/* CMDQV Hardware Params */
 	u16 num_vintfs;
 	u16 num_vcmdqs;
 	u16 num_lvcmdqs_per_vintf;
+	u16 num_sids_per_vintf;

 	struct ida vintf_ids;
@@ -379,6 +419,11 @@ static void tegra241_vcmdq_hw_deinit(struct tegra241_vcmdq *vcmdq)
 	dev_dbg(vcmdq->cmdqv->dev, "%sdeinited\n", h);
 }

+static void _tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
+{
+	writeq_relaxed(vcmdq->cmdq.q.q_base, REG_VCMDQ_PAGE1(vcmdq, BASE));
+}
+
 static int tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
 {
 	char header[64], *h = lvcmdq_error_header(vcmdq, header, 64);
@@ -388,7 +433,7 @@ static int tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
 	tegra241_vcmdq_hw_deinit(vcmdq);

 	/* Configure and enable VCMDQ */
-	writeq_relaxed(vcmdq->cmdq.q.q_base, REG_VCMDQ_PAGE1(vcmdq, BASE));
+	_tegra241_vcmdq_hw_init(vcmdq);

 	ret = vcmdq_write_config(vcmdq, VCMDQ_EN);
 	if (ret) {
@@ -407,11 +452,16 @@ static int tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
 static void tegra241_vintf_hw_deinit(struct tegra241_vintf *vintf)
 {
 	u16 lidx;
+	int slot;

 	for (lidx = 0; lidx < vintf->cmdqv->num_lvcmdqs_per_vintf; lidx++)
 		if (vintf->lvcmdqs && vintf->lvcmdqs[lidx])
 			tegra241_vcmdq_hw_deinit(vintf->lvcmdqs[lidx]);
 	vintf_write_config(vintf, 0);
+	for (slot = 0; slot < vintf->cmdqv->num_sids_per_vintf; slot++) {
+		writel_relaxed(0, REG_VINTF(vintf, SID_REPLACE(slot)));
+		writel_relaxed(0, REG_VINTF(vintf, SID_MATCH(slot)));
+	}
 }

 static int tegra241_vintf_hw_init(struct tegra241_vintf *vintf, bool hyp_own)
@@ -429,7 +479,8 @@ static int tegra241_vintf_hw_init(struct tegra241_vintf *vintf, bool hyp_own)
 	 * whether enabling it here or not, as !HYP_OWN cmdq HWs only support a
 	 * restricted set of supported commands.
 	 */
-	regval = FIELD_PREP(VINTF_HYP_OWN, hyp_own);
+	regval = FIELD_PREP(VINTF_HYP_OWN, hyp_own) |
+		 FIELD_PREP(VINTF_VMID, vintf->vmid);
 	writel(regval, REG_VINTF(vintf, CONFIG));

 	ret = vintf_write_config(vintf, regval | VINTF_EN);
@@ -555,7 +606,9 @@ static void tegra241_vintf_free_lvcmdq(struct tegra241_vintf *vintf, u16 lidx)

 	dev_dbg(vintf->cmdqv->dev, "%sdeallocated\n",
 		lvcmdq_error_header(vcmdq, header, 64));
-	kfree(vcmdq);
+	/* Guest-owned VCMDQ is freed with the vcmdq by iommufd core */
+	if (vcmdq->vintf->hyp_own)
+		kfree(vcmdq);
 }

 static struct tegra241_vcmdq *
@@ -594,6 +647,9 @@ tegra241_vintf_alloc_lvcmdq(struct tegra241_vintf *vintf, u16 lidx)

 static void tegra241_cmdqv_deinit_vintf(struct tegra241_cmdqv *cmdqv, u16 idx)
 {
+	if (cmdqv->vintfs[idx]->mmap_pgoff)
+		iommufd_ctx_free_mmap(cmdqv->vintfs[idx]->vsmmu.core.ictx,
+				      cmdqv->vintfs[idx]->mmap_pgoff);
 	kfree(cmdqv->vintfs[idx]->lvcmdqs);
 	ida_free(&cmdqv->vintf_ids, idx);
 	cmdqv->vintfs[idx] = NULL;
@@ -649,7 +705,11 @@ static void tegra241_cmdqv_remove_vintf(struct tegra241_cmdqv *cmdqv, u16 idx)

 	dev_dbg(cmdqv->dev, "VINTF%u: deallocated\n", vintf->idx);
 	tegra241_cmdqv_deinit_vintf(cmdqv, idx);
-	kfree(vintf);
+	if (!vintf->hyp_own)
+		ida_destroy(&vintf->sid_slots);
+	/* Guest-owned VINTF is freed with the viommu by iommufd core */
+	if (vintf->hyp_own)
+		kfree(vintf);
 }

 static void tegra241_cmdqv_remove(struct arm_smmu_device *smmu)
@@ -677,10 +737,17 @@ static void tegra241_cmdqv_remove(struct arm_smmu_device *smmu)
 	put_device(cmdqv->dev);	/* smmu->impl_dev */
 }

+static struct arm_vsmmu *
+tegra241_cmdqv_vsmmu_alloc(struct arm_smmu_device *smmu,
+			   struct arm_smmu_domain *smmu_domain,
+			   struct iommufd_ctx *ictx, unsigned int viommu_type,
+			   const struct iommu_user_data *user_data);
+
 static struct arm_smmu_impl_ops tegra241_cmdqv_impl_ops = {
 	.get_secondary_cmdq = tegra241_cmdqv_get_cmdq,
 	.device_reset = tegra241_cmdqv_hw_reset,
 	.device_remove = tegra241_cmdqv_remove,
+	.vsmmu_alloc = tegra241_cmdqv_vsmmu_alloc,
 };

 /* Probe Functions */
@@ -822,6 +889,7 @@ __tegra241_cmdqv_probe(struct arm_smmu_device *smmu, struct resource *res,
 	cmdqv->irq = irq;
 	cmdqv->base = base;
 	cmdqv->dev = smmu->impl_dev;
+	cmdqv->base_phys = res->start;

 	if (cmdqv->irq > 0) {
 		ret = request_threaded_irq(irq, NULL, tegra241_cmdqv_isr,
@@ -838,6 +906,8 @@ __tegra241_cmdqv_probe(struct arm_smmu_device *smmu, struct resource *res,
 	cmdqv->num_vintfs = 1 << FIELD_GET(CMDQV_NUM_VINTF_LOG2, regval);
 	cmdqv->num_vcmdqs = 1 << FIELD_GET(CMDQV_NUM_VCMDQ_LOG2, regval);
 	cmdqv->num_lvcmdqs_per_vintf = cmdqv->num_vcmdqs / cmdqv->num_vintfs;
+	cmdqv->num_sids_per_vintf =
+		1 << FIELD_GET(CMDQV_NUM_SID_PER_VM_LOG2, regval);
 	cmdqv->vintfs = kcalloc(cmdqv->num_vintfs, sizeof(*cmdqv->vintfs),
 				GFP_KERNEL);
@@ -891,3 +961,247 @@ struct arm_smmu_device *tegra241_cmdqv_probe(struct arm_smmu_device *smmu)
 	put_device(smmu->impl_dev);
 	return ERR_PTR(-ENODEV);
 }
+
+/* User-space vIOMMU and vCMDQ Functions */
+
+static int tegra241_vcmdq_hw_init_user(struct tegra241_vcmdq *vcmdq)
+{
+	char header[64];
+
+	/* Configure the vcmdq only; user space does the enabling */
+	_tegra241_vcmdq_hw_init(vcmdq);
+
+	dev_dbg(vcmdq->cmdqv->dev, "%sinited at host PA 0x%llx size 0x%lx\n",
+		lvcmdq_error_header(vcmdq, header, 64),
+		vcmdq->cmdq.q.q_base & VCMDQ_ADDR,
+		1UL << (vcmdq->cmdq.q.q_base & VCMDQ_LOG2SIZE));
+	return 0;
+}
+
+static struct iommufd_vcmdq *
+tegra241_cmdqv_vcmdq_alloc(struct iommufd_viommu *viommu,
+			   const struct iommu_user_data *user_data)
+{
+	struct tegra241_vintf *vintf = viommu_to_vintf(viommu);
+	struct tegra241_cmdqv *cmdqv = vintf->cmdqv;
+	struct iommu_vcmdq_tegra241_cmdqv arg;
+	struct tegra241_vcmdq *vcmdq;
+	phys_addr_t q_base;
+	char header[64];
+	int ret;
+
+	ret = iommu_copy_struct_from_user(
+		&arg, user_data, IOMMU_VCMDQ_TYPE_TEGRA241_CMDQV, vcmdq_base);
+	if (ret)
+		return ERR_PTR(ret);
+
+	if (!arg.vcmdq_base || arg.vcmdq_base & ~VCMDQ_ADDR)
+		return ERR_PTR(-EINVAL);
+	if (!arg.vcmdq_log2size || arg.vcmdq_log2size > VCMDQ_LOG2SIZE)
+		return ERR_PTR(-EINVAL);
+	if (arg.vcmdq_id >= cmdqv->num_lvcmdqs_per_vintf)
+		return ERR_PTR(-EINVAL);
+	q_base = arm_smmu_domain_ipa_to_pa(vintf->s2_domain, arg.vcmdq_base);
+	if (!q_base)
+		return ERR_PTR(-EINVAL);
+
+	if (vintf->lvcmdqs[arg.vcmdq_id]) {
+		vcmdq = vintf->lvcmdqs[arg.vcmdq_id];
+
+		/* deinit the previous setting as a reset, before re-init */
+		tegra241_vcmdq_hw_deinit(vcmdq);
+
+		vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
+		vcmdq->cmdq.q.q_base |= arg.vcmdq_log2size;
+		tegra241_vcmdq_hw_init_user(vcmdq);
+
+		return &vcmdq->core;
+	}
+
+	vcmdq = iommufd_vcmdq_alloc(viommu, struct tegra241_vcmdq, core);
+	if (!vcmdq)
+		return ERR_PTR(-ENOMEM);
+
+	ret = tegra241_vintf_init_lvcmdq(vintf, arg.vcmdq_id, vcmdq);
+	if (ret)
+		goto free_vcmdq;
+	dev_dbg(cmdqv->dev, "%sallocated\n",
+		lvcmdq_error_header(vcmdq, header, 64));
+
+	vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
+	vcmdq->cmdq.q.q_base |= arg.vcmdq_log2size;
+
+	ret = tegra241_vcmdq_hw_init_user(vcmdq);
+	if (ret)
+		goto free_vcmdq;
+	vintf->lvcmdqs[arg.vcmdq_id] = vcmdq;
+
+	return &vcmdq->core;
+free_vcmdq:
+	iommufd_struct_destroy(viommu->ictx, vcmdq, core);
+	return ERR_PTR(ret);
+}
+
+static void tegra241_cmdqv_vcmdq_free(struct iommufd_vcmdq *core)
+{
+	struct tegra241_vcmdq *vcmdq = core_to_vcmdq(core);
+
+	tegra241_vintf_remove_lvcmdq(vcmdq->vintf, vcmdq->lidx);
+
+	/* IOMMUFD core frees the memory of the vcmdq */
+}
+
+static void tegra241_cmdqv_viommu_destroy(struct iommufd_viommu *viommu)
+{
+	struct tegra241_vintf *vintf = viommu_to_vintf(viommu);
+
+	tegra241_cmdqv_remove_vintf(vintf->cmdqv, vintf->idx);
+
+	/* IOMMUFD core frees the memory of vintf and viommu */
+}
+
+static struct iommufd_vdevice *
+tegra241_cmdqv_vdevice_alloc(struct iommufd_viommu *viommu, struct device *dev,
+			     u64 dev_id)
+{
+	struct tegra241_vintf *vintf = viommu_to_vintf(viommu);
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_stream *stream = &master->streams[0];
+	struct tegra241_vintf_slot *slot;
+	int idx;
+
+	if (dev_id > UINT_MAX)
+		return ERR_PTR(-EINVAL);
+
+	slot = iommufd_vdevice_alloc(viommu, struct tegra241_vintf_slot, core);
+	if (!slot)
+		return ERR_PTR(-ENOMEM);
+
+	WARN_ON_ONCE(master->num_streams != 1);
+
+	/* Find an empty slot of SID_MATCH and SID_REPLACE */
+	idx = ida_alloc_max(&vintf->sid_slots,
+			    vintf->cmdqv->num_sids_per_vintf - 1, GFP_KERNEL);
+	if (idx < 0) {
+		iommufd_struct_destroy(viommu->ictx, slot, core);
+		return ERR_PTR(idx);
+	}
+
+	writel_relaxed(stream->id, REG_VINTF(vintf, SID_REPLACE(idx)));
+	writel_relaxed(dev_id << 1 | 0x1, REG_VINTF(vintf, SID_MATCH(idx)));
+	dev_dbg(vintf->cmdqv->dev,
+		"VINTF%u: allocated a slot (%d) for pSID=%x, vSID=%x\n",
+		vintf->idx, idx, stream->id, (u32)dev_id);
+
+	slot->idx = idx;
+	slot->vintf = vintf;
+	slot->sid = stream->id;
+
+	return &slot->core;
+}
+
+static void tegra241_cmdqv_vdevice_destroy(struct iommufd_vdevice *vdev)
+{
+	struct tegra241_vintf_slot *slot =
+		container_of(vdev, struct tegra241_vintf_slot, core);
+	struct tegra241_vintf *vintf = slot->vintf;
+
+	writel_relaxed(0, REG_VINTF(vintf, SID_REPLACE(slot->idx)));
+	writel_relaxed(0, REG_VINTF(vintf, SID_MATCH(slot->idx)));
+	ida_free(&vintf->sid_slots, slot->idx);
+	dev_dbg(vintf->cmdqv->dev,
+		"VINTF%u: deallocated a slot (%d) for pSID=%x\n", vintf->idx,
+		slot->idx, slot->sid);
+
+	/* IOMMUFD core frees the memory of slot and vdev */
+}
+
+static struct iommufd_viommu_ops tegra241_cmdqv_viommu_ops = {
+	.destroy = tegra241_cmdqv_viommu_destroy,
+	.alloc_domain_nested = arm_vsmmu_alloc_domain_nested,
+	.cache_invalidate = arm_vsmmu_cache_invalidate,
+	.vdevice_alloc = tegra241_cmdqv_vdevice_alloc,
+	.vdevice_destroy = tegra241_cmdqv_vdevice_destroy,
+	.vcmdq_alloc = tegra241_cmdqv_vcmdq_alloc,
+	.vcmdq_free = tegra241_cmdqv_vcmdq_free,
+};
+
+static struct arm_vsmmu *
+tegra241_cmdqv_vsmmu_alloc(struct arm_smmu_device *smmu,
+			   struct arm_smmu_domain *smmu_domain,
+			   struct iommufd_ctx *ictx, unsigned int viommu_type,
+			   const struct iommu_user_data *user_data)
+{
+	struct tegra241_cmdqv *cmdqv =
+		container_of(smmu, struct tegra241_cmdqv, smmu);
+	struct iommu_viommu_tegra241_cmdqv data;
+	struct tegra241_vintf *vintf;
+	phys_addr_t page0_base;
+	int ret;
+
+	if (viommu_type != IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV)
+		return ERR_PTR(-EOPNOTSUPP);
+	if (!smmu_domain || smmu_domain->stage != ARM_SMMU_DOMAIN_S2)
+		return ERR_PTR(-EINVAL);
+	if (!user_data)
+		return ERR_PTR(-EINVAL);
+
+	ret = iommu_copy_struct_from_user(&data, user_data,
+					  IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV,
+					  out_vintf_page0_pgsz);
+	if (ret)
+		return ERR_PTR(ret);
+
+	vintf = iommufd_viommu_alloc(ictx, struct tegra241_vintf, vsmmu.core,
+				     &tegra241_cmdqv_viommu_ops);
+	if (!vintf)
+		return ERR_PTR(-ENOMEM);
+
+	ret = tegra241_cmdqv_init_vintf(cmdqv, cmdqv->num_vintfs - 1, vintf);
+	if (ret < 0) {
+		dev_err(cmdqv->dev, "no more available vintf\n");
+		goto free_vintf;
+	}
+
+	vintf->s2_domain = smmu_domain;
+	vintf->vmid = smmu_domain->s2_cfg.vmid;
+
+	ret = tegra241_vintf_hw_init(vintf, false);
+	if (ret)
+		goto deinit_vintf;
+
+	vintf->lvcmdqs = kcalloc(cmdqv->num_lvcmdqs_per_vintf,
+				 sizeof(*vintf->lvcmdqs), GFP_KERNEL);
+	if (!vintf->lvcmdqs) {
+		ret = -ENOMEM;
+		goto hw_deinit_vintf;
+	}
+
+	page0_base = cmdqv->base_phys + TEGRA241_VINTFi_PAGE0(vintf->idx);
+	ret = iommufd_ctx_alloc_mmap(ictx, page0_base, SZ_64K, true,
+				     &vintf->mmap_pgoff);
+	if (ret)
+		goto hw_deinit_vintf;
+
+	data.out_vintf_page0_pgsz = SZ_64K;
+	data.out_vintf_page0_pgoff = vintf->mmap_pgoff;
+	ret = iommu_copy_struct_to_user(user_data, &data,
+					IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV,
+					out_vintf_page0_pgsz);
+	if (ret)
+		goto hw_deinit_vintf;
+
+	ida_init(&vintf->sid_slots);
+
+	dev_dbg(cmdqv->dev, "VINTF%u: allocated with vmid (%d)\n", vintf->idx,
+		vintf->vmid);
+
+	return &vintf->vsmmu;
+
+hw_deinit_vintf:
+	tegra241_vintf_hw_deinit(vintf);
+deinit_vintf:
+	tegra241_cmdqv_deinit_vintf(cmdqv, vintf->idx);
+free_vintf:
+	iommufd_struct_destroy(ictx, vintf, vsmmu.core);
+	return ERR_PTR(ret);
+}
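[Editor's note] A hedged VMM-side sketch of what this patch enables, using the uapi struct defined above: allocate a TEGRA241_CMDQV vIOMMU with driver data, then mmap the VINTF page0 that the kernel reports back. The data_uptr/data_len fields come from the driver-data patch earlier in this series; dev_id/hwpt_id setup and error paths are omitted, so this is a sketch, not the series' code:

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/iommufd.h>

static void *vintf_page0_map(int iommufd, uint32_t dev_id, uint32_t hwpt_id)
{
	struct iommu_viommu_tegra241_cmdqv data = {};
	struct iommu_viommu_alloc cmd = {
		.size = sizeof(cmd),
		.type = IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV,
		.dev_id = dev_id,
		.hwpt_id = hwpt_id,
		.data_uptr = (uint64_t)(uintptr_t)&data,
		.data_len = sizeof(data),
	};

	if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &cmd))
		return MAP_FAILED;

	/* Gives the guest direct R/W access to the VINTF control registers */
	return mmap(NULL, data.out_vintf_page0_pgsz, PROT_READ | PROT_WRITE,
		    MAP_SHARED, iommufd, data.out_vintf_page0_pgoff);
}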
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, April 11, 2025 2:38 PM
Add the support via vIOMMU infrastructure for virtualization use case.
This basically allows VMM to allocate VINTFs (as a vIOMMU object) and assign VCMDQs (vCMDQ objects) to it. A VINTF's MMIO page0 can be mmap'd to user space for VM to access directly without VMEXIT and corresponding hypercall.
it'd be helpful to add a bit more context, e.g. what page0 contains, sid slots, vcmdq_base (mapped in s2), etc. so it's easier for one to understand it from start instead of by reading the code.
As an initial version, the number of VCMDQs per VINTF is fixed to two.
so a user could map both VCMDQs of a VINTF even when only one VCMDQ is created, given that the entire 64K page0 is eligible for mmap once the VINTF is associated to a viommu?
there is no security issue given that the VINTF is not shared, but conceptually, if feasible (e.g. if the two CMDQs' MMIO ranges sit in different 4K pages of the VINTF page0), does it make sense to do per-VCMDQ mmap control and return the mmap info at VCMDQ alloc?
+static struct iommufd_vcmdq *
+tegra241_cmdqv_vcmdq_alloc(struct iommufd_viommu *viommu,
+			   const struct iommu_user_data *user_data)
+{
+	struct tegra241_vintf *vintf = viommu_to_vintf(viommu);
+	struct tegra241_cmdqv *cmdqv = vintf->cmdqv;
+	struct iommu_vcmdq_tegra241_cmdqv arg;
+	struct tegra241_vcmdq *vcmdq;
+	phys_addr_t q_base;
+	char header[64];
+	int ret;
+
+	ret = iommu_copy_struct_from_user(
+		&arg, user_data, IOMMU_VCMDQ_TYPE_TEGRA241_CMDQV, vcmdq_base);
+	if (ret)
+		return ERR_PTR(ret);
+
+	if (!arg.vcmdq_base || arg.vcmdq_base & ~VCMDQ_ADDR)
+		return ERR_PTR(-EINVAL);
+	if (!arg.vcmdq_log2size || arg.vcmdq_log2size > VCMDQ_LOG2SIZE)
+		return ERR_PTR(-EINVAL);
+	if (arg.vcmdq_id >= cmdqv->num_lvcmdqs_per_vintf)
+		return ERR_PTR(-EINVAL);
+	q_base = arm_smmu_domain_ipa_to_pa(vintf->s2_domain, arg.vcmdq_base);
+	if (!q_base)
+		return ERR_PTR(-EINVAL);
+
+	if (vintf->lvcmdqs[arg.vcmdq_id]) {
+		vcmdq = vintf->lvcmdqs[arg.vcmdq_id];
+
+		/* deinit the previous setting as a reset, before re-init */
+		tegra241_vcmdq_hw_deinit(vcmdq);
+
+		vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
+		vcmdq->cmdq.q.q_base |= arg.vcmdq_log2size;
+		tegra241_vcmdq_hw_init_user(vcmdq);
+
+		return &vcmdq->core;
+	}
why not return -EBUSY here?
+	vcmdq = iommufd_vcmdq_alloc(viommu, struct tegra241_vcmdq, core);
+	if (!vcmdq)
+		return ERR_PTR(-ENOMEM);
+
+	ret = tegra241_vintf_init_lvcmdq(vintf, arg.vcmdq_id, vcmdq);
+	if (ret)
+		goto free_vcmdq;
+	dev_dbg(cmdqv->dev, "%sallocated\n",
+		lvcmdq_error_header(vcmdq, header, 64));
+
+	vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
+	vcmdq->cmdq.q.q_base |= arg.vcmdq_log2size;
could the queue size be multiple pages? there is no guarantee that the HPA of guest queue would be contiguous :/
On Mon, Apr 21, 2025 at 08:37:40AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, April 11, 2025 2:38 PM
Add the support via vIOMMU infrastructure for virtualization use case.
This basically allows VMM to allocate VINTFs (as a vIOMMU object) and assign VCMDQs (vCMDQ objects) to it. A VINTF's MMIO page0 can be mmap'd to user space for VM to access directly without VMEXIT and corresponding hypercall.
it'd be helpful to add a bit more context, e.g. what page0 contains, sid slots, vcmdq_base (mapped in s2), etc. so it's easier for one to understand it from start instead of by reading the code.
Will do. It basically has all the control/status bits for direct vCMDQ controls.
As an initial version, the number of VCMDQs per VINTF is fixed to two.
so an user could map both VCMDQs of an VINTF even when only one VCMDQ is created, given the entire 64K page0 is legible for mmap once the VINTF is associated to a viommu?
Oh, that's a good point!
If a guest OS ignores the total number of VCMDQs emulated by the VMM and tries to enable the VCMDQ via the "reserved" MMIO region in the mmap'd VINTF page0, the host system would be spammed with vCMDQ TIMEOUTs that aren't supposed to happen nor be forwarded back to the guest.
It looks like we need some dynamic VCMDQ mapping to a VINTF v.s. static allocation, though we can still set the max number to 2.
no security issue given the VINTF is not shared, but conceptually if feasible (e.g. two CMDQ's MMIO ranges sits in different 4k pages of VINTF page0) does it make sense to do per-VCMDQ mmap control and return mmap info at VCMDQ alloc?
Page size can be 64K on ARM. And each additional logical VCMDQ (in a VINTF page0) has only an offset of 0x80. So, vCMDQ cannot be mmap'd individually.
+	if (vintf->lvcmdqs[arg.vcmdq_id]) {
+		vcmdq = vintf->lvcmdqs[arg.vcmdq_id];
+
+		/* deinit the previous setting as a reset, before re-init */
+		tegra241_vcmdq_hw_deinit(vcmdq);
+
+		vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
+		vcmdq->cmdq.q.q_base |= arg.vcmdq_log2size;
+		tegra241_vcmdq_hw_init_user(vcmdq);
+
+		return &vcmdq->core;
+	}
why not return -EBUSY here?
Hmm, this seems to be a WAR that I forgot to drop! Will check and remove this.
+	vcmdq = iommufd_vcmdq_alloc(viommu, struct tegra241_vcmdq, core);
+	if (!vcmdq)
+		return ERR_PTR(-ENOMEM);
+
+	ret = tegra241_vintf_init_lvcmdq(vintf, arg.vcmdq_id, vcmdq);
+	if (ret)
+		goto free_vcmdq;
+	dev_dbg(cmdqv->dev, "%sallocated\n",
+		lvcmdq_error_header(vcmdq, header, 64));
+
+	vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
+	vcmdq->cmdq.q.q_base |= arg.vcmdq_log2size;
could the queue size be multiple pages? there is no guarantee that the HPA of guest queue would be contiguous :/
It certainly can. The VMM must make sure the guest PAs are contiguous by using huge pages to back the guest RAM space. The kernel has no control over this and simply has to trust the VMM.
I'm adding a note here: /* User space ensures that the queue memory is physically contiguous */
And likely something similar in the uAPI header too.
Thanks! Nicolin
From: Nicolin Chen nicolinc@nvidia.com Sent: Tuesday, April 22, 2025 3:14 AM On Mon, Apr 21, 2025 at 08:37:40AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, April 11, 2025 2:38 PM
+	vcmdq = iommufd_vcmdq_alloc(viommu, struct tegra241_vcmdq, core);
+	if (!vcmdq)
+		return ERR_PTR(-ENOMEM);
+
+	ret = tegra241_vintf_init_lvcmdq(vintf, arg.vcmdq_id, vcmdq);
+	if (ret)
+		goto free_vcmdq;
+	dev_dbg(cmdqv->dev, "%sallocated\n",
+		lvcmdq_error_header(vcmdq, header, 64));
+
+	vcmdq->cmdq.q.q_base = q_base & VCMDQ_ADDR;
+	vcmdq->cmdq.q.q_base |= arg.vcmdq_log2size;
could the queue size be multiple pages? there is no guarantee that the HPA of guest queue would be contiguous :/
It certainly can. The VMM must make sure the guest PAs are contiguous by using huge pages to back the guest RAM space. The kernel has no control over this and simply has to trust the VMM.
I'm adding a note here: /* User space ensures that the queue memory is physically contiguous */
And likely something similar in the uAPI header too.
It's not a good idea having the kernel trust the VMM. Also I'm not sure the contiguity is guaranteed all the time with huge page (e.g. if just using THP).
@Jason?
btw does smmu only read the cmdq or also update some fields in the queue? If the latter, then it also brings a security hole as a malicious VMM could violate the contiguity requirement to instruct the smmu to touch pages which don't belong to it...
On Wed, Apr 23, 2025 at 08:05:49AM +0000, Tian, Kevin wrote:
It's not a good idea having the kernel trust the VMM.
It certainly shouldn't trust it, but it can validate the VMM's choice and generate a failure if it isn't good.
Also I'm not sure the contiguity is guaranteed all the time with huge page (e.g. if just using THP).
If things are aligned then the contiguity will work out. Ie a 64K aligned allocation on a 2M GPA is fine. I don't think there are edge cases where a GPA will be fragmented. It does rely on the VMM always getting some kind of huge page and then pinning it in iommufd.
IMHO this is bad HW design, but it is what it is..
btw does smmu only read the cmdq or also update some fields in the queue? If the latter, then it also brings a security hole as a malicious VMM could violate the contiguity requirement to instruct the smmu to touch pages which don't belong to it...
This really must be prevented. I haven't looked closely here, but the GPA -> PA mapping should go through the IOAS and that should generate a page list and that should be validated for contiguity.
It also needs to act like a mdev and lock down the part of the IOAS that provides that memory so the pin can't be released and UAF things.
Jason
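[Editor's note] As an aside, a minimal sketch of the validation step discussed above, assuming the driver has already obtained a pinned page list through an iommufd access over the guest queue's IOVA range. The helper name is hypothetical, not from this series:

#include <linux/mm.h>

/* Returns true if the pinned pages form one physically contiguous run */
static bool queue_pages_are_contiguous(struct page **pages, size_t npages)
{
	size_t i;

	for (i = 1; i < npages; i++)
		if (page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1)
			return false;
	return true;
}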
On Wed, Apr 23, 2025 at 08:55:51AM -0300, Jason Gunthorpe wrote:
On Wed, Apr 23, 2025 at 08:05:49AM +0000, Tian, Kevin wrote:
It's not a good idea having the kernel trust the VMM.
It certainly shouldn't trust it, but it can validate the VMM's choice and generate a failure if it isn't good.
Also I'm not sure the contiguity is guaranteed all the time with huge page (e.g. if just using THP).
If things are aligned then the contiguity will work out. Ie a 64K aligned allocation on a 2M GPA is fine. I don't think there are edge cases where a GPA will be fragmented. It does rely on the VMM always getting some kind of huge page and then pinning it in iommufd.
With QEMU, which does ensure the alignment when using system huge pages, I haven't seen any edge-case problem yet.
IMHO this is bad HW design, but it is what it is..
btw does smmu only read the cmdq or also update some fields in the queue? If the latter, then it also brings a security hole as a malicious VMM could violate the contiguity requirement to instruct the smmu to touch pages which don't belong to it...
This really must be prevented. I haven't looked closely here, but the GPA -> PA mapping should go through the IOAS and that should generate a page list and that should be validated for contiguity.
It also needs to act like a mdev and lock down the part of the IOAS that provides that memory so the pin can't be released and UAF things.
If I capture this correctly, the GPA->PA mapping is already done at the IOAS level for the S2 HWPT/domain, i.e. pages are already pinned. So we just need a pair of for-driver APIs to validate the contiguity and refcount pages by calling iopt_area_add_access().
Thanks Nicolin
On Wed, Apr 23, 2025 at 11:31:29AM -0700, Nicolin Chen wrote:
It also needs to act like a mdev and lock down the part of the IOAS that provides that memory so the pin can't be released and UAF things.
If I capture this correctly, the GPA->PA mapping is already done at the IOAS level for the S2 HWPT/domain, i.e. pages are already pinned. So we just need a pair of for-driver APIs to validate the contiguity and refcount pages by calling iopt_area_add_access().
Yes, adding an access is the key thing, the access will give you a page list which you can validate, but it also provides a way to synchronize if a hostile userspace does an unmap.
Jason
On Wed, Apr 23, 2025 at 08:13:33PM -0300, Jason Gunthorpe wrote:
On Wed, Apr 23, 2025 at 11:31:29AM -0700, Nicolin Chen wrote:
It also needs to act like a mdev and lock down the part of the IOAS that provides that memory so the pin can't be released and UAF things.
If I capture this correctly, the GPA->PA mapping is already done at the IOAS level for the S2 HWPT/domain, i.e. pages are already pinned. So we just need a pair of for-driver APIs to validate the contiguity and refcount pages by calling iopt_area_add_access().
Yes, adding an access is the key thing, the access will give you a page list which you can validate, but it also provides a way to synchronize if a hostile userspace does an unmap.
The new APIs are very like iommufd_access_pin/unpin_pages(). But to reduce the amount of code that we have to share with driver.o, I added a smaller iopt_area_get/put_access() that gets an access and increases/decreases the refcounts only.
Yet, this still inevitably doubled (-ish) the size of driver.o:

   text    data     bss     dec     hex filename
   4429     296       0    4725    1275 drivers/iommu/iommufd/driver.o
   text    data     bss     dec     hex filename
   8430     783       0    9213    23fd drivers/iommu/iommufd/driver.o
Meanwhile, I am thinking if we could use the known S2 domain to translate the GPAs to PAs for the contiguity test, which feels a little cleaner to do in an IOMMU driver v.s. with a page list?
Thanks Nicolin
From: Nicolin Chen nicolinc@nvidia.com Sent: Thursday, April 24, 2025 2:52 PM
On Wed, Apr 23, 2025 at 08:13:33PM -0300, Jason Gunthorpe wrote:
On Wed, Apr 23, 2025 at 11:31:29AM -0700, Nicolin Chen wrote:
It also needs to act like a mdev and lock down the part of the IOAS that provides that memory so the pin can't be released and UAF things.
If I capture this correctly, the GPA->PA mapping is already done at the IOAS level for the S2 HWPT/domain, i.e. pages are already pinned. So we just need a pair of for-driver APIs to validate the contiguity and refcount pages by calling iopt_area_add_access().
Yes, adding an access is the key thing, the access will give you a page list which you can validate, but it also provides a way to synchronize if a hostile userspace does an unmap.
The new APIs are very like iommufd_access_pin/unpin_pages(). But to reduce the amount of code that we have to share with driver.o, I added a smaller iopt_area_get/put_access() that gets an access and increases/decreases the refcounts only.
Yet, this still inevitably doubled (-ish) the size of driver.o:

   text    data     bss     dec     hex filename
   4429     296       0    4725    1275 drivers/iommu/iommufd/driver.o
   text    data     bss     dec     hex filename
   8430     783       0    9213    23fd drivers/iommu/iommufd/driver.o
Meanwhile, I am thinking if we could use the known S2 domain to translate the GPAs to PAs for the contiguity test, which feels a little cleaner to do in an IOMMU driver v.s. with a page list?
but w/o adding an access to increase the refcounts, it's unsafe to walk the S2 domain...
On Wed, Apr 23, 2025 at 11:51:53PM -0700, Nicolin Chen wrote:
On Wed, Apr 23, 2025 at 08:13:33PM -0300, Jason Gunthorpe wrote:
On Wed, Apr 23, 2025 at 11:31:29AM -0700, Nicolin Chen wrote:
It also needs to act like a mdev and lock down the part of the IOAS that provides that memory so the pin can't be released and UAF things.
If I capture this correctly, the GPA->PA mapping is already done at the IOAS level for the S2 HWPT/domain, i.e. pages are already pinned. So we just need a pair of for-driver APIs to validate the contiguity and refcount pages by calling iopt_area_add_access().
Yes, adding an access is the key thing, the access will give you a page list which you can validate, but it also provides a way to synchronize if a hostile userspace does an unmap.
The new APIs are very like iommufd_access_pin/unpin_pages(). But to reduce the amount of code that we have to share with driver.o, I added a smaller iopt_area_get/put_access() that gets an access and increases/decreases the refcounts only.
Maybe the access should be obtained by the core code to avoid the driver.o bloating? All the cmdq types need a memory buffer, right? Perhaps that should just be generalized
Meanwhile, I am thinking if we could use the known S2 domain to translate the GPAs to PAs for the contiguity test, which feels a little cleaner to do in an IOMMU driver v.s. with a page list?
You still need the access, and the access already generates a page list..
Jason
On Thu, Apr 24, 2025 at 10:40:49AM -0300, Jason Gunthorpe wrote:
On Wed, Apr 23, 2025 at 11:51:53PM -0700, Nicolin Chen wrote:
On Wed, Apr 23, 2025 at 08:13:33PM -0300, Jason Gunthorpe wrote:
On Wed, Apr 23, 2025 at 11:31:29AM -0700, Nicolin Chen wrote:
It also needs to act like a mdev and lock down the part of the IOAS that provides that memory so the pin can't be released and UAF things.
If I capture this correctly, the GPA->PA mapping is already done at the IOAS level for the S2 HWPT/domain, i.e. pages are already pinned. So we just need a pair of for-driver APIs to validate the contiguity and refcount pages by calling iopt_area_add_access().
Yes, adding an access is the key thing, the access will give you a page list which you can validate, but it also provides a way to synchronize if a hostile userspace does an unmap.
The new APIs are very like iommufd_access_pin/unpin_pages(). But to reduce the amount of code that we have to share with driver.o, I added a smaller iopt_area_get/put_access() that gets an access and increases/decreases the refcounts only.
Maybe the access should be obtained by the core code to avoid the driver.o bloating? All the cmdq types need a memory buffer, right? Perhaps that should just be generalized
Yes. AMD just confirmed that they needed a similar setup. I think we can do that!
Meanwhile, I am thinking if we could use the known S2 domain to translate the GPAs to PAs for the contiguity test, which feels a little cleaner to do in an IOMMU driver v.s. with a page list?
You still need the access, and the access already generates a page list..
Right. I was thinking it could save a few lines that fetch the page list, but then that would need another round of translation, which is unnecessary.
Thanks Nicolin
Add a new vEVENTQ type for VINTFs that are assigned to the user space. Simply report the two 64-bit LVCMDQ_ERR_MAPs register values.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/uapi/linux/iommufd.h                   | 15 +++++++++++++
 .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c     | 22 +++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index ce20f038b56b..dd3f6c021024 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -1100,10 +1100,12 @@ struct iommufd_vevent_header {
  * enum iommu_veventq_type - Virtual Event Queue Type
  * @IOMMU_VEVENTQ_TYPE_DEFAULT: Reserved for future use
  * @IOMMU_VEVENTQ_TYPE_ARM_SMMUV3: ARM SMMUv3 Virtual Event Queue
+ * @IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV Extension IRQ
  */
 enum iommu_veventq_type {
         IOMMU_VEVENTQ_TYPE_DEFAULT = 0,
         IOMMU_VEVENTQ_TYPE_ARM_SMMUV3 = 1,
+        IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV = 2,
 };
 
 /**
@@ -1127,6 +1129,19 @@ struct iommu_vevent_arm_smmuv3 {
         __aligned_le64 evt[4];
 };
 
+/**
+ * struct iommu_vevent_tegra241_cmdqv - Tegra241 CMDQV Virtual IRQ
+ *                                      (IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV)
+ * @lvcmdq_err_map: 128-bit logical vcmdq error map, little-endian.
+ *                  (Refer to register LVCMDQ_ERR_MAPs per VINTF)
+ *
+ * The 128-bit register values from HW exclusively reflect the error bits for a
+ * Virtual Interface represented by a vIOMMU object. Read and report directly.
+ */
+struct iommu_vevent_tegra241_cmdqv {
+        __aligned_le64 lvcmdq_err_map[2];
+};
+
 /**
  * struct iommu_veventq_alloc - ioctl(IOMMU_VEVENTQ_ALLOC)
  * @size: sizeof(struct iommu_veventq_alloc)
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index b778739f845a..dafaeff8d51d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -292,6 +292,20 @@ static inline int vcmdq_write_config(struct tegra241_vcmdq *vcmdq, u32 regval)
 
 /* ISR Functions */
 
+static void tegra241_vintf_user_handle_error(struct tegra241_vintf *vintf)
+{
+        struct iommufd_viommu *viommu = &vintf->vsmmu.core;
+        struct iommu_vevent_tegra241_cmdqv vevent_data;
+        int i;
+
+        for (i = 0; i < LVCMDQ_ERR_MAP_NUM_64; i++)
+                vevent_data.lvcmdq_err_map[i] =
+                        readq_relaxed(REG_VINTF(vintf, LVCMDQ_ERR_MAP_64(i)));
+
+        iommufd_viommu_report_event(viommu, IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV,
+                                    &vevent_data, sizeof(vevent_data));
+}
+
 static void tegra241_vintf0_handle_error(struct tegra241_vintf *vintf)
 {
         int i;
@@ -337,6 +351,14 @@ static irqreturn_t tegra241_cmdqv_isr(int irq, void *devid)
                 vintf_map &= ~BIT_ULL(0);
         }
 
+        /* Handle other user VINTFs and their LVCMDQs */
+        while (vintf_map) {
+                unsigned long idx = __ffs64(vintf_map);
+
+                tegra241_vintf_user_handle_error(cmdqv->vintfs[idx]);
+                vintf_map &= ~BIT_ULL(idx);
+        }
+
         return IRQ_HANDLED;
 }
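As a rough illustration of how a VMM might consume this new event type: the read loop below follows the shape of the existing vEVENTQ uAPI, the fd is assumed to come from IOMMU_VEVENTQ_ALLOC, and handle_guest_lvcmdq_errors() is a hypothetical handler:

/* Sketch: drain the vEVENTQ fd (needs <endian.h>, <stdint.h>, <unistd.h>
 * and the iommufd uAPI header). Each record is an iommufd_vevent_header
 * followed by the per-type payload, here iommu_vevent_tegra241_cmdqv.
 */
static void drain_cmdqv_veventq(int veventq_fd)
{
        struct {
                struct iommufd_vevent_header hdr;
                struct iommu_vevent_tegra241_cmdqv evt;
        } rec;

        while (read(veventq_fd, &rec, sizeof(rec)) == sizeof(rec)) {
                /* 128 error bits, one per LVCMDQ of this VINTF */
                uint64_t lo = le64toh(rec.evt.lvcmdq_err_map[0]);
                uint64_t hi = le64toh(rec.evt.lvcmdq_err_map[1]);

                handle_guest_lvcmdq_errors(lo, hi); /* hypothetical */
        }
}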
Hi Nicolin,
On 4/11/2025 12:07 PM, Nicolin Chen wrote:
The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extended the HWPT-based design for more advanced virtualization feature.
A vCMDQ introduced by this series as a part of the vIOMMU infrastructure represents a HW supported queue/buffer for VM to use exclusively, e.g.
- NVIDIA's virtual command queue
- AMD vIOMMU's command buffer
I assume we can pass multiple buffer details (like GPA, size) from the guest to the hypervisor. Is that a correct understanding?
either of which is an IOMMU HW feature to directly load and execute cache invalidation commands issued by a guest kernel, to shoot down TLB entries that HW cached for guest-owned stage-1 page table entries. This is a big improvement since there is no VM Exit during an invalidation, compared to the traditional invalidation pathway by trapping a guest-own invalidation queue and forwarding those commands/requests to the host kernel that will eventually fill a HW-owned queue to execute those commands.
Thus, a vCMDQ object, as an initial use case, is all about a guest-owned HW command queue that VMM can allocate/configure depending on the request from a guest kernel. Introduce a new IOMMUFD_OBJ_VCMDQ and its allocator IOMMUFD_CMD_VCMDQ_ALLOC allowing VMM to forward the IOMMU-specific queue info, such as queue base address, size, and etc.
Meanwhile, a guest-owned command queue needs the kernel (a command queue driver) to control the queue by reading/writing its consumer and producer indexes, which means the command queue HW allows the guest kernel to get a direct R/W access to those registers. Introduce an mmap infrastructure to the iommufd core so as to support pass through a piece of MMIO region from the host physical address space to the guest physical address space. The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during the IOMMUFD_CMD_VCMDQ_ALLOC and given those info to the user space as an output driver-data by the IOMMUFD_CMD_VCMDQ_ALLOC. So, this requires a driver-specific user data support by a vIOMMU object.
Nice! Thanks.
-Vasant
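For the mmap pass-through described in the quoted text, the VMM side would reduce to an mmap() on the iommufd itself, using whatever offset/length IOMMU_VCMDQ_ALLOC reported back in its driver data. A minimal sketch, where mmap_offset/mmap_length are illustrative field names, not the actual uAPI:

/* Sketch: map the host MMIO page of the queue registers into the VMM
 * (needs <sys/mman.h> and <err.h>), so it can then be exposed to the
 * guest at the corresponding GPA for direct R/W access.
 */
void *mmio = mmap(NULL, out.mmap_length, PROT_READ | PROT_WRITE,
                  MAP_SHARED, iommufd, out.mmap_offset);
if (mmio == MAP_FAILED)
        err(1, "mmap vCMDQ MMIO page");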
On Wed, Apr 23, 2025 at 12:58:19PM +0530, Vasant Hegde wrote:
On 4/11/2025 12:07 PM, Nicolin Chen wrote:
The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extended the HWPT-based design for more advanced virtualization feature.
A vCMDQ introduced by this series as a part of the vIOMMU infrastructure represents a HW supported queue/buffer for VM to use exclusively, e.g.
- NVIDIA's virtual command queue
- AMD vIOMMU's command buffer
I assume we can pass multiple buffer details (like GPA, size) from the guest to the hypervisor. Is that a correct understanding?
Yes. The NVIDIA model passes through a Virtual-Interface to a VM, and the VM can allocate and map multiple command queues (buffers) to the V-Interface, by providing each command queue's info in:
+struct iommu_vcmdq_tegra241_cmdqv {
+        __u32 vcmdq_id;
+        __u32 vcmdq_log2size;   // size
+        __aligned_u64 vcmdq_base;       // GPA
+};
Thanks Nicolin
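For illustration, a VMM-side allocation using the struct above might look roughly like this. Only iommu_vcmdq_tegra241_cmdqv is from the series; the top-level iommu_vcmdq_alloc layout and the IOMMU_VCMDQ_TYPE_TEGRA241_CMDQV name are assumptions modeled on iommufd's other allocation ioctls:

/* Sketch: register one guest LVCMDQ with the host. */
struct iommu_vcmdq_tegra241_cmdqv data = {
        .vcmdq_id = 0,                  /* LVCMDQ index within the VINTF */
        .vcmdq_log2size = 12,           /* log2 of the queue size */
        .vcmdq_base = guest_queue_gpa,  /* GPA the guest programmed */
};
struct iommu_vcmdq_alloc cmd = {        /* assumed layout */
        .size = sizeof(cmd),
        .viommu_id = viommu_id,         /* vIOMMU backing the VINTF */
        .type = IOMMU_VCMDQ_TYPE_TEGRA241_CMDQV,
        .data_len = sizeof(data),
        .data_uptr = (uintptr_t)&data,
};

if (ioctl(iommufd, IOMMU_VCMDQ_ALLOC, &cmd))
        err(1, "IOMMU_VCMDQ_ALLOC");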
On 4/23/2025 1:15 PM, Nicolin Chen wrote:
On Wed, Apr 23, 2025 at 12:58:19PM +0530, Vasant Hegde wrote:
On 4/11/2025 12:07 PM, Nicolin Chen wrote:
The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extended the HWPT-based design for more advanced virtualization feature.
A vCMDQ introduced by this series as a part of the vIOMMU infrastructure represents a HW supported queue/buffer for VM to use exclusively, e.g.
- NVIDIA's virtual command queue
- AMD vIOMMU's command buffer
I assume we can pass multiple buffer details (like GPA, size) from the guest to the hypervisor. Is that a correct understanding?
Yes. The NVIDIA model passes through a Virtual-Interface to a VM, and the VM can allocate and map multiple command queues (buffers) to the V-Interface, by providing each command queue's info in:
+struct iommu_vcmdq_tegra241_cmdqv {
+        __u32 vcmdq_id;
+        __u32 vcmdq_log2size;   // size
+        __aligned_u64 vcmdq_base;       // GPA
+};
Nice. Thanks for the details.
-Vasant
From: Nicolin Chen <nicolinc@nvidia.com>
Sent: Friday, April 11, 2025 2:38 PM
[...]
This is a big improvement since there is no VM Exit during an invalidation, compared to the traditional invalidation pathway by trapping a guest-own invalidation queue and forwarding those commands/requests to the host kernel that will eventually fill a HW-owned queue to execute those commands.
Any data to show how big the improvement could be in major IOMMU usages (kernel DMA, user DMA, and SVA)?
On Thu, Apr 24, 2025 at 08:21:08AM +0000, Tian, Kevin wrote:
From: Nicolin Chen <nicolinc@nvidia.com>
Sent: Friday, April 11, 2025 2:38 PM
[...]
This is a big improvement since there is no VM Exit during an invalidation, compared to the traditional invalidation pathway by trapping a guest-own invalidation queue and forwarding those commands/requests to the host kernel that will eventually fill a HW-owned queue to execute those commands.
Any data to show how big the improvement could be in major IOMMU usages (kernel DMA, user DMA, and SVA)?
I thought I had mentioned the percentage of the gain somewhere, but seemingly not in this series. Will add.
Thanks! Nicolin