iommufd gives userspace the capability to manipulate iommu subsytem. e.g. DMA map/unmap etc. In the near future, it will support iommu nested translation. Different platform vendors have different implementation for the nested translation. So before set up nested translation, userspace needs to know the hardware iommu information. For example, Intel VT-d supports using guest I/O page table as the stage-1 translation table. This requires guest I/O page table be compatible with hardware IOMMU.
This series reports the iommu hardware information for a given iommufd_device which has been bound to iommufd. It is preparation work for userspace to allocate hwpt for given device. Like the nested translation support[1].
This series introduces an iommu op to report the iommu hardware info, and an ioctl IOMMU_DEVICE_GET_HW_INFO is added to report such hardware info to user. enum iommu_hw_info_type is defined to differentiate the iommu hardware info reported to user hence user can decode them. This series only adds the framework for iommu hw info reporting, the complete reporting path needs vendor specific definition and driver support. The full picture is available in [1] as well.
base-commit: 4c7e97cb6e65eab02991f60a5cc7a4fed5498c7a
[1] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
Change log:
v2: - Drop patch 05 of v1 as it is already covered by other series - Rename the capability info to be iommu hardware info
v1: https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.com...
Regards, Yi Liu
Lu Baolu (1): iommu: Add new iommu op to get iommu hardware information
Nicolin Chen (1): iommufd/selftest: Add coverage for IOMMU_DEVICE_GET_HW_INFO ioctl
Yi Liu (2): iommu: Move dev_iommu_ops() to private header iommufd: Add IOMMU_DEVICE_GET_HW_INFO
drivers/iommu/iommu-priv.h | 11 +++ drivers/iommu/iommu.c | 2 + drivers/iommu/iommufd/device.c | 75 +++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 1 + drivers/iommu/iommufd/iommufd_test.h | 15 ++++ drivers/iommu/iommufd/main.c | 3 + drivers/iommu/iommufd/selftest.c | 16 ++++ include/linux/iommu.h | 24 +++--- include/uapi/linux/iommufd.h | 47 ++++++++++++ tools/testing/selftests/iommu/iommufd.c | 17 ++++- tools/testing/selftests/iommu/iommufd_utils.h | 26 +++++++ 11 files changed, 225 insertions(+), 12 deletions(-)
dev_iommu_ops() is essentially only used in iommu subsystem, so move to a private header to avoid being abused by other drivers.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommu-priv.h | 11 +++++++++++ drivers/iommu/iommu.c | 2 ++ include/linux/iommu.h | 11 ----------- 3 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h index 7c8011bfd153..a6e694f59f64 100644 --- a/drivers/iommu/iommu-priv.h +++ b/drivers/iommu/iommu-priv.h @@ -4,6 +4,17 @@
#include <linux/iommu.h>
+static inline const struct iommu_ops *dev_iommu_ops(struct device *dev) +{ + /* + * Assume that valid ops must be installed if iommu_probe_device() + * has succeeded. The device ops are essentially for internal use + * within the IOMMU subsystem itself, so we should be able to trust + * ourselves not to misuse the helper. + */ + return dev->iommu->iommu_dev->ops; +} + int iommu_group_replace_domain(struct iommu_group *group, struct iommu_domain *new_domain);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index fd8fe2cd7303..437476c36de5 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -37,6 +37,8 @@
#include "iommu-sva.h"
+#include "iommu-priv.h" + static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6595454d4f48..7202d8ece343 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -444,17 +444,6 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather) }; }
-static inline const struct iommu_ops *dev_iommu_ops(struct device *dev) -{ - /* - * Assume that valid ops must be installed if iommu_probe_device() - * has succeeded. The device ops are essentially for internal use - * within the IOMMU subsystem itself, so we should be able to trust - * ourselves not to misuse the helper. - */ - return dev->iommu->iommu_dev->ops; -} - extern int bus_iommu_probe(struct bus_type *bus); extern bool iommu_present(struct bus_type *bus); extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
On 2023/3/9 15:53, Yi Liu wrote:
dev_iommu_ops() is essentially only used in iommu subsystem, so move to a private header to avoid being abused by other drivers.
Suggested-by: Jason Gunthorpejgg@nvidia.com Reviewed-by: Kevin Tiankevin.tian@intel.com Signed-off-by: Yi Liuyi.l.liu@intel.com
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com
Best regards, baolu
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new iommu op to get the IOMMU hardware capabilities for iommufd. This information will be used by any vIOMMU driver which is owned by userspace.
This op chooses to make the special parameters opaque to the core. This suits the current usage model where accessing any of the IOMMU device special parameters does require a userspace driver that matches the kernel driver. If a need for common parameters, implemented similarly by several drivers, arises then there's room in the design to grow a generic parameter set as well. No wrapper API is added as it is supposed to be used by iommufd only.
Different IOMMU hardware would have different hardware information. So the information reported differs as well. To let the external user understand the difference. enum iommu_hw_info_type is defined. For the iommu drivers that are capable to report hardware information, it should have a unique iommu_hw_info_type. The iommu_hw_info_type is stored in struct iommu_ops. For the driver oesn't report hardware information, just use IOMMU_HW_INFO_TYPE_DEFAULT.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- include/linux/iommu.h | 13 +++++++++++++ include/uapi/linux/iommufd.h | 7 +++++++ 2 files changed, 20 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 7202d8ece343..3ef84ee359d2 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -15,6 +15,7 @@ #include <linux/of.h> #include <linux/ioasid.h> #include <uapi/linux/iommu.h> +#include <uapi/linux/iommufd.h>
#define IOMMU_READ (1 << 0) #define IOMMU_WRITE (1 << 1) @@ -222,6 +223,11 @@ struct iommu_iotlb_gather { /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability + * @hw_info: IOMMU hardware information. The type of the returned data is + * defined in include/uapi/linux/iommufd.h. The data buffer is + * allocated in the IOMMU driver and the caller should free it + * after use. Return the data buffer if success, or ERR_PTR on + * failure. * @domain_alloc: allocate iommu domain * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling @@ -246,11 +252,17 @@ struct iommu_iotlb_gather { * @remove_dev_pasid: Remove any translation configurations of a specific * pasid, so that any DMA transactions with this pasid * will be blocked by the hardware. + * @driver_type: One of enum iommu_hw_info_type. This is used in the hw_info + * reporting path. For the drivers that supports it, a unique + * type should be defined. For the driver that does not support + * it, this field is the IOMMU_HW_INFO_TYPE_DEFAULT that is 0. + * Hence, such drivers do not need to care this field. * @pgsize_bitmap: bitmap of all possible supported page sizes * @owner: Driver module providing these ops */ struct iommu_ops { bool (*capable)(struct device *dev, enum iommu_cap); + void *(*hw_info)(struct device *dev, u32 *length);
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); @@ -279,6 +291,7 @@ struct iommu_ops { void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid);
const struct iommu_domain_ops *default_domain_ops; + enum iommu_hw_info_type driver_type; unsigned long pgsize_bitmap; struct module *owner; }; diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index ccd36acad36a..955cbef640da 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -370,4 +370,11 @@ struct iommu_hwpt_alloc { __u32 __reserved; }; #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC) + +/** + * enum iommu_hw_info_type - IOMMU Hardware Info Types + */ +enum iommu_hw_info_type { + IOMMU_HW_INFO_TYPE_DEFAULT, +}; #endif
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, March 9, 2023 3:54 PM @@ -222,6 +223,11 @@ struct iommu_iotlb_gather { /**
- struct iommu_ops - iommu ops and capabilities
- @capable: check capability
- @hw_info: IOMMU hardware information. The type of the returned data
is
defined in include/uapi/linux/iommufd.h. The data buffer is
"The type of the returned data is marked by @driver_type".
"defined in include/uapi/linux/iommufd.h" should belong to the comment of @driver_type
allocated in the IOMMU driver and the caller should free it
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
@@ -246,11 +252,17 @@ struct iommu_iotlb_gather {
- @remove_dev_pasid: Remove any translation configurations of a specific
pasid, so that any DMA transactions with this pasid
will be blocked by the hardware.
- @driver_type: One of enum iommu_hw_info_type. This is used in the
hw_info
reporting path. For the drivers that supports it, a unique
type should be defined. For the driver that does not support
it, this field is the IOMMU_HW_INFO_TYPE_DEFAULT that is 0.
Hence, such drivers do not need to care this field.
The meaning of "driver_type" is much broader than reporting hw_info.
let's be accurate to call it as "hw_info_type". and while we have two separate fields for one feature where is the check enforced on whether both are provided?
Is it simpler to return the type directly in @hw_info?
btw IOMMU_HW_INFO_TYPE_DEFAULT also sounds misleading. 'default' implies hw_info still available but in a default format.
probably it's clearer to call it IOMMU_HW_INFO_TYPE_NONE.
On 2023/3/16 16:16, Tian, Kevin wrote:
allocated in the IOMMU driver and the caller should free it
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
@@ -246,11 +252,17 @@ struct iommu_iotlb_gather {
- @remove_dev_pasid: Remove any translation configurations of a specific
pasid, so that any DMA transactions with this pasid
will be blocked by the hardware.
- @driver_type: One of enum iommu_hw_info_type. This is used in the
hw_info
reporting path. For the drivers that supports it, a unique
type should be defined. For the driver that does not support
it, this field is the IOMMU_HW_INFO_TYPE_DEFAULT that is 0.
Hence, such drivers do not need to care this field.
The meaning of "driver_type" is much broader than reporting hw_info.
let's be accurate to call it as "hw_info_type". and while we have two separate fields for one feature where is the check enforced on whether both are provided?
Is it simpler to return the type directly in @hw_info?
If I remember correctly, the vendor iommu type and hardware info are reported to user space separately.
Best regards, baolu
From: Baolu Lu baolu.lu@linux.intel.com Sent: Thursday, March 16, 2023 4:30 PM
On 2023/3/16 16:16, Tian, Kevin wrote:
allocated in the IOMMU driver and the caller should free it
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
@@ -246,11 +252,17 @@ struct iommu_iotlb_gather {
- @remove_dev_pasid: Remove any translation configurations of a
specific
pasid, so that any DMA transactions with this pasid
will be blocked by the hardware.
- @driver_type: One of enum iommu_hw_info_type. This is used in the
hw_info
reporting path. For the drivers that supports it, a unique
type should be defined. For the driver that does not support
it, this field is the IOMMU_HW_INFO_TYPE_DEFAULT that is 0.
Hence, such drivers do not need to care this field.
The meaning of "driver_type" is much broader than reporting hw_info.
let's be accurate to call it as "hw_info_type". and while we have two separate fields for one feature where is the check enforced on whether both are provided?
Is it simpler to return the type directly in @hw_info?
If I remember correctly, the vendor iommu type and hardware info are reported to user space separately.
there is only one IOMMU_DEVICE_GET_HW_INFO cmd. It's written as:
data = ops->hw_info(idev->dev, &data_len); copy_to_user(u64_to_user_ptr(cmd->data_ptr), data, length); cmd->out_data_type = ops->driver_type;
From: Tian, Kevin kevin.tian@intel.com Sent: Thursday, March 16, 2023 4:17 PM
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, March 9, 2023 3:54 PM @@ -222,6 +223,11 @@ struct iommu_iotlb_gather { /**
- struct iommu_ops - iommu ops and capabilities
- @capable: check capability
- @hw_info: IOMMU hardware information. The type of the returned
data
is
defined in include/uapi/linux/iommufd.h. The data buffer is
"The type of the returned data is marked by @driver_type".
"defined in include/uapi/linux/iommufd.h" should belong to the comment of @driver_type
Sure.
allocated in the IOMMU driver and the caller should free it
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
@@ -246,11 +252,17 @@ struct iommu_iotlb_gather {
- @remove_dev_pasid: Remove any translation configurations of a
specific
pasid, so that any DMA transactions with this pasid
will be blocked by the hardware.
- @driver_type: One of enum iommu_hw_info_type. This is used in the
hw_info
reporting path. For the drivers that supports it, a unique
type should be defined. For the driver that does not support
it, this field is the IOMMU_HW_INFO_TYPE_DEFAULT that is 0.
Hence, such drivers do not need to care this field.
The meaning of "driver_type" is much broader than reporting hw_info.
let's be accurate to call it as "hw_info_type". and while we have two separate fields for one feature where is the check enforced on whether both are provided?
It is filled in the uapi structure by referring ops->driver_type in next patch.
Is it simpler to return the type directly in @hw_info?
Per the current description, if the iommu driver doesn't implement .hw_info callback, then it will not set driver_type field neither. Then this field is 0 (IOMMU_HW_INFO_TYPE_NONE). The GET_HW_INFO ioctl in next patch would fail as well. Under this implementation, returning the driver_type (a.k.a hw_info_type per your comment) in the hw_info callback may be simpler.
But I plan to update the implementation per the below remark from Jason. The GET_HW_INFO needs to succeed even if the underlying iommu driver does not implement hw_info callback. If so, it's still much more convenient to get the type by referring ops->driver_type.
https://lore.kernel.org/kvm/ZAcwJSK%2F9UVI9LXu@nvidia.com/
Also, per Nic's other remark, there would be a bitmap named hwpt_types field added to iommu_ops. Then it is also easier to referring it by ops->hwpt_types.
https://lore.kernel.org/linux-iommu/ZArgAXMUpNjDfFgZ@Asurada-Nvidia/#t
Surely, we also have another alternative. We can enforce all the iommu drivers to implement a minimum hw_info callback which just returns the driver_type if it does not have driver-specific data to report to the user yet.
btw IOMMU_HW_INFO_TYPE_DEFAULT also sounds misleading. 'default' implies hw_info still available but in a default format.
probably it's clearer to call it IOMMU_HW_INFO_TYPE_NONE.
Sure. Makes sense. So _NONE means no driver specific info is Reported back to user.
Regards, Yi Liu
Under nested IOMMU translation, userspace owns the stage-1 translation table (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and needs to be compatiable with the underlying IOMMU hardware. Hence, userspace should know the IOMMU hardware capability before creating and configuring the stage-1 translation table to kernel.
This adds IOMMU_DEVICE_GET_HW_INFO to query the IOMMU hardware information for a given device. The returned data is vendor specific, userspace needs to decode it with the structure mapped by the @out_data_type field.
As only physical devices have IOMMU hardware, so this will return error if the given device is not a physical device.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 74 +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 1 + drivers/iommu/iommufd/main.c | 3 + include/uapi/linux/iommufd.h | 40 +++++++++++++ 4 files changed, 118 insertions(+)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index c10e02f6a0be..6948539488a5 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -257,6 +257,80 @@ struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev) } EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
+static int iommufd_zero_fill_user(u64 ptr, int bytes) +{ + int index = 0; + + for (; index < bytes; index++) { + if (put_user(0, (uint8_t __user *)(ptr + index))) + return -EFAULT; + } + return 0; +} + +int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd) +{ + struct iommu_hw_info *cmd = ucmd->cmd; + struct iommufd_device *idev; + const struct iommu_ops *ops; + void *data; + unsigned int length, data_len; + int rc; + + if (cmd->flags || cmd->__reserved || !cmd->data_len) + return -EOPNOTSUPP; + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + + ops = dev_iommu_ops(idev->dev); + if (!ops || !ops->hw_info) { + rc = -EOPNOTSUPP; + goto out_put; + } + + /* driver has hw_info callback should have a unique driver_type */ + if (WARN_ON(ops->driver_type == IOMMU_HW_INFO_TYPE_DEFAULT)) { + rc = -EOPNOTSUPP; + goto out_put; + } + + data = ops->hw_info(idev->dev, &data_len); + if (IS_ERR(data)) { + rc = PTR_ERR(data); + goto out_put; + } + + length = min(cmd->data_len, data_len); + if (copy_to_user(u64_to_user_ptr(cmd->data_ptr), data, length)) { + rc = -EFAULT; + goto out_free_data; + } + + /* + * Zero the trailing bytes if the user buffer is bigger than the + * data size kernel actually has. + */ + if (length < cmd->data_len) { + rc = iommufd_zero_fill_user(cmd->data_ptr + length, + cmd->data_len - length); + if (rc) + goto out_free_data; + } + + cmd->out_data_type = ops->driver_type; + cmd->data_len = length; + + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + +out_free_data: + kfree(data); +out_put: + iommufd_put_object(&idev->obj); + return rc; +} + static int iommufd_group_setup_msi(struct iommufd_group *igroup, struct iommufd_hw_pagetable *hwpt) { diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index b18f843ad6a4..05b5ad66f716 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -307,6 +307,7 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id) }
void iommufd_device_destroy(struct iommufd_object *obj); +int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd);
struct iommufd_access { struct iommufd_object obj; diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 694da191e4b1..f079c0bda46b 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -262,6 +262,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd) union ucmd_buffer { struct iommu_destroy destroy; struct iommu_hwpt_alloc hwpt; + struct iommu_hw_info info; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy; @@ -295,6 +296,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, __reserved), + IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info, + struct iommu_hw_info, __reserved), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, struct iommu_ioas_alloc, out_ioas_id), IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 955cbef640da..4ac525897b82 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -46,6 +46,7 @@ enum { IOMMUFD_CMD_OPTION, IOMMUFD_CMD_VFIO_IOAS, IOMMUFD_CMD_HWPT_ALLOC, + IOMMUFD_CMD_DEVICE_GET_HW_INFO, };
/** @@ -377,4 +378,43 @@ struct iommu_hwpt_alloc { enum iommu_hw_info_type { IOMMU_HW_INFO_TYPE_DEFAULT, }; + +/** + * struct iommu_hw_info - ioctl(IOMMU_DEVICE_GET_HW_INFO) + * @size: sizeof(struct iommu_hw_info) + * @flags: Must be 0 + * @dev_id: The device being attached to the iommufd + * @data_len: Input the length of the user buffer in bytes. Output the + * length of data filled to the user buffer. + * @data_ptr: Pointer to the type specific structure + * @out_data_type: Output the iommu hardware info type, it is one of + * enum iommu_hw_info_type. + * @__reserved: Must be 0 + * + * Query the hardware iommu information for given device which has been + * bound to iommufd. @data_len is the size of the buffer which captures + * iommu type specific data and the data will be filled. Trailing bytes + * are zeroed if the user buffer is larger than the data kernel has. + * + * The type specific data would be used to sync capability between the + * vIOMMU and the hardware IOMMU. e.g. nested translation requires to + * check the hardware IOMMU capability, since a stage-1 translation table + * is owned by user but used by hardware IOMMU. + * + * The @out_data_type will be filled if the ioctl succeeds. It would + * be used to decode the data filled in the buffer pointed by @data_ptr. + * + * This is only available for the physical devices bound to iommufd as + * only physical devices can have hardware IOMMU. + */ +struct iommu_hw_info { + __u32 size; + __u32 flags; + __u32 dev_id; + __u32 data_len; + __aligned_u64 data_ptr; + __u32 out_data_type; + __u32 __reserved; +}; +#define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO) #endif
On 2023/3/9 15:53, Yi Liu wrote:
Under nested IOMMU translation, userspace owns the stage-1 translation table (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and needs to be compatiable with the underlying IOMMU hardware. Hence, userspace should know the IOMMU hardware capability before creating and configuring the stage-1 translation table to kernel.
This adds IOMMU_DEVICE_GET_HW_INFO to query the IOMMU hardware information for a given device. The returned data is vendor specific, userspace needs to decode it with the structure mapped by the @out_data_type field.
As only physical devices have IOMMU hardware, so this will return error if the given device is not a physical device.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/device.c | 74 +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 1 + drivers/iommu/iommufd/main.c | 3 + include/uapi/linux/iommufd.h | 40 +++++++++++++ 4 files changed, 118 insertions(+)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index c10e02f6a0be..6948539488a5 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -257,6 +257,80 @@ struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev) } EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD); +static int iommufd_zero_fill_user(u64 ptr, int bytes) +{
- int index = 0;
- for (; index < bytes; index++) {
if (put_user(0, (uint8_t __user *)(ptr + index)))
return -EFAULT;
- }
- return 0;
+}
+int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd) +{
- struct iommu_hw_info *cmd = ucmd->cmd;
- struct iommufd_device *idev;
- const struct iommu_ops *ops;
- void *data;
- unsigned int length, data_len;
- int rc;
Reverse christmas tree order for declarations.
- if (cmd->flags || cmd->__reserved || !cmd->data_len)
return -EOPNOTSUPP;
- idev = iommufd_get_device(ucmd, cmd->dev_id);
- if (IS_ERR(idev))
return PTR_ERR(idev);
- ops = dev_iommu_ops(idev->dev);
- if (!ops || !ops->hw_info) {
dev_iommu_ops() will never return a NULL.
Need below check
dev->iommu && dev->iommu->iommu_dev
before dev_iommu_ops(). Perhaps something like below?
if (!dev->iommu || !dev->iommu->iommu_dev) return -EINVAL;
ops = dev_iommu_ops(idev->dev); if (!ops->hw_info) return -ENODEV;
rc = -EOPNOTSUPP;
goto out_put;
- }
- /* driver has hw_info callback should have a unique driver_type */
- if (WARN_ON(ops->driver_type == IOMMU_HW_INFO_TYPE_DEFAULT)) {
If so, perhaps IOMMU_HW_INFO_TYPE_INVALID is more readable?
I'm not sure if a calltrace is really necessary here. If not, perhaps,
if (ops->driver_type == IOMMU_HW_INFO_TYPE_DEFAULT) { pr_warn_ratelimited("iommu driver set an invalid type\n"); rc = -ENODEV; goto out_put; }
rc = -EOPNOTSUPP;
goto out_put;
- }
- data = ops->hw_info(idev->dev, &data_len);
- if (IS_ERR(data)) {
rc = PTR_ERR(data);
goto out_put;
- }
- length = min(cmd->data_len, data_len);
- if (copy_to_user(u64_to_user_ptr(cmd->data_ptr), data, length)) {
rc = -EFAULT;
goto out_free_data;
- }
- /*
* Zero the trailing bytes if the user buffer is bigger than the
* data size kernel actually has.
*/
- if (length < cmd->data_len) {
rc = iommufd_zero_fill_user(cmd->data_ptr + length,
cmd->data_len - length);
if (rc)
goto out_free_data;
- }
- cmd->out_data_type = ops->driver_type;
- cmd->data_len = length;
- rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_free_data:
- kfree(data);
+out_put:
- iommufd_put_object(&idev->obj);
- return rc;
+}
- static int iommufd_group_setup_msi(struct iommufd_group *igroup, struct iommufd_hw_pagetable *hwpt) {
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index b18f843ad6a4..05b5ad66f716 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -307,6 +307,7 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id) } void iommufd_device_destroy(struct iommufd_object *obj); +int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd); struct iommufd_access { struct iommufd_object obj; diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 694da191e4b1..f079c0bda46b 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -262,6 +262,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd) union ucmd_buffer { struct iommu_destroy destroy; struct iommu_hwpt_alloc hwpt;
- struct iommu_hw_info info; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy;
@@ -295,6 +296,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, __reserved),
- IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info,
IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, struct iommu_ioas_alloc, out_ioas_id), IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,struct iommu_hw_info, __reserved),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 955cbef640da..4ac525897b82 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -46,6 +46,7 @@ enum { IOMMUFD_CMD_OPTION, IOMMUFD_CMD_VFIO_IOAS, IOMMUFD_CMD_HWPT_ALLOC,
- IOMMUFD_CMD_DEVICE_GET_HW_INFO, };
/** @@ -377,4 +378,43 @@ struct iommu_hwpt_alloc { enum iommu_hw_info_type { IOMMU_HW_INFO_TYPE_DEFAULT, };
+/**
- struct iommu_hw_info - ioctl(IOMMU_DEVICE_GET_HW_INFO)
- @size: sizeof(struct iommu_hw_info)
- @flags: Must be 0
- @dev_id: The device being attached to the iommufd
- @data_len: Input the length of the user buffer in bytes. Output the
length of data filled to the user buffer.
- @data_ptr: Pointer to the type specific structure
- @out_data_type: Output the iommu hardware info type, it is one of
enum iommu_hw_info_type.
- @__reserved: Must be 0
- Query the hardware iommu information for given device which has been
- bound to iommufd. @data_len is the size of the buffer which captures
- iommu type specific data and the data will be filled. Trailing bytes
- are zeroed if the user buffer is larger than the data kernel has.
- The type specific data would be used to sync capability between the
- vIOMMU and the hardware IOMMU. e.g. nested translation requires to
- check the hardware IOMMU capability, since a stage-1 translation table
- is owned by user but used by hardware IOMMU.
- The @out_data_type will be filled if the ioctl succeeds. It would
- be used to decode the data filled in the buffer pointed by @data_ptr.
- This is only available for the physical devices bound to iommufd as
- only physical devices can have hardware IOMMU.
- */
+struct iommu_hw_info {
- __u32 size;
- __u32 flags;
- __u32 dev_id;
- __u32 data_len;
- __aligned_u64 data_ptr;
- __u32 out_data_type;
- __u32 __reserved;
+}; +#define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO) #endif
Other look good to me.
Best regards, baolu
On Thu, Mar 09, 2023 at 09:50:18PM +0800, Baolu Lu wrote:
- if (cmd->flags || cmd->__reserved || !cmd->data_len)
return -EOPNOTSUPP;
- idev = iommufd_get_device(ucmd, cmd->dev_id);
- if (IS_ERR(idev))
return PTR_ERR(idev);
- ops = dev_iommu_ops(idev->dev);
- if (!ops || !ops->hw_info) {
dev_iommu_ops() will never return a NULL.
Need below check
dev->iommu && dev->iommu->iommu_dev
before dev_iommu_ops(). Perhaps something like below?
if (!dev->iommu || !dev->iommu->iommu_dev) return -EINVAL;
At this point the device has become owned through the ownership API, it absolutely has to have an iommu and an ops. No need to check anything.
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, March 10, 2023 1:21 AM
On Thu, Mar 09, 2023 at 09:50:18PM +0800, Baolu Lu wrote:
- if (cmd->flags || cmd->__reserved || !cmd->data_len)
return -EOPNOTSUPP;
- idev = iommufd_get_device(ucmd, cmd->dev_id);
- if (IS_ERR(idev))
return PTR_ERR(idev);
- ops = dev_iommu_ops(idev->dev);
- if (!ops || !ops->hw_info) {
dev_iommu_ops() will never return a NULL.
Need below check
dev->iommu && dev->iommu->iommu_dev
before dev_iommu_ops(). Perhaps something like below?
if (!dev->iommu || !dev->iommu->iommu_dev) return -EINVAL;
At this point the device has become owned through the ownership API, it absolutely has to have an iommu and an ops. No need to check anything.
ok. so just needs to check hw_info callback.
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, March 9, 2023 3:54 PM
+int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd) +{
- struct iommu_hw_info *cmd = ucmd->cmd;
- struct iommufd_device *idev;
- const struct iommu_ops *ops;
- void *data;
- unsigned int length, data_len;
- int rc;
- if (cmd->flags || cmd->__reserved || !cmd->data_len)
return -EOPNOTSUPP;
- idev = iommufd_get_device(ucmd, cmd->dev_id);
- if (IS_ERR(idev))
return PTR_ERR(idev);
- ops = dev_iommu_ops(idev->dev);
- if (!ops || !ops->hw_info) {
rc = -EOPNOTSUPP;
goto out_put;
- }
- /* driver has hw_info callback should have a unique driver_type */
- if (WARN_ON(ops->driver_type ==
IOMMU_HW_INFO_TYPE_DEFAULT)) {
rc = -EOPNOTSUPP;
goto out_put;
- }
ok, here is where the check is done.
- data = ops->hw_info(idev->dev, &data_len);
if we directly return type in @hw_info, this becomes:
data = ops->hw_info(idev->dev, &data_len, &driver_type);
+/**
- struct iommu_hw_info - ioctl(IOMMU_DEVICE_GET_HW_INFO)
- @size: sizeof(struct iommu_hw_info)
- @flags: Must be 0
- @dev_id: The device being attached to the iommufd
"The device bound to the iommufd"
- @data_len: Input the length of the user buffer in bytes. Output the
length of data filled to the user buffer.
s/to/in/
- @data_ptr: Pointer to the type specific structure
"Pointer to the user buffer"
- @out_data_type: Output the iommu hardware info type, it is one of
enum iommu_hw_info_type.
s/it is one of/as defined by/
- @__reserved: Must be 0
- Query the hardware iommu information for given device which has been
- bound to iommufd. @data_len is the size of the buffer which captures
- iommu type specific data and the data will be filled. Trailing bytes
- are zeroed if the user buffer is larger than the data kernel has.
- The type specific data would be used to sync capability between the
- vIOMMU and the hardware IOMMU. e.g. nested translation requires to
s/vIOMMU/virtual IOMMU/
- check the hardware IOMMU capability, since a stage-1 translation table
- is owned by user but used by hardware IOMMU.
"check ... capability so guest stage-1 page table uses a format compatible to the hardware IOMMU"
- The @out_data_type will be filled if the ioctl succeeds. It would
- be used to decode the data filled in the buffer pointed by @data_ptr.
- This is only available for the physical devices bound to iommufd as
- only physical devices can have hardware IOMMU.
not required. User doesn't know whether it's physical or emulated device.
From: Nicolin Chen nicolinc@nvidia.com
Add a mock_domain_hw_info function and an iommu_hw_info_selftest data structure. This allows to test the IOMMU_DEVICE_GET_HW_INFO ioctl by passing the test_reg value for the mock_dev.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 1 + drivers/iommu/iommufd/iommufd_test.h | 15 +++++++++++ drivers/iommu/iommufd/selftest.c | 16 ++++++++++++ tools/testing/selftests/iommu/iommufd.c | 17 +++++++++++- tools/testing/selftests/iommu/iommufd_utils.h | 26 +++++++++++++++++++ 5 files changed, 74 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 6948539488a5..5c352807d946 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -8,6 +8,7 @@
#include "io_pagetable.h" #include "iommufd_private.h" +#include "iommufd_test.h"
static bool allow_unsafe_interrupts; module_param(allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR); diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index da1898a9128f..578691602d94 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -100,4 +100,19 @@ struct iommu_test_cmd { }; #define IOMMU_TEST_CMD _IO(IOMMUFD_TYPE, IOMMUFD_CMD_BASE + 32)
+/* Mock structs for IOMMU_DEVICE_GET_HW_INFO ioctl */ +#define IOMMU_HW_INFO_TYPE_SELFTEST 0xfeedbeef +#define IOMMU_HW_INFO_SELFTEST_REGVAL 0xdeadbeef + +/** + * struct iommu_hw_info_selftest + * + * @flags: Must be set to 0 + * @test_reg: Pass IOMMU_HW_INFO_SELFTEST_REGVAL to user selftest program + */ +struct iommu_hw_info_selftest { + __u32 flags; + __u32 test_reg; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index d7832ffc3aa6..b50ec3528ec1 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -128,6 +128,20 @@ static struct iommu_domain mock_blocking_domain = { .ops = &mock_blocking_ops, };
+static void *mock_domain_hw_info(struct device *dev, u32 *length) +{ + struct iommu_hw_info_selftest *info; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return ERR_PTR(-ENOMEM); + + info->test_reg = IOMMU_HW_INFO_SELFTEST_REGVAL; + *length = sizeof(*info); + + return info; +} + static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) { struct mock_iommu_domain *mock; @@ -279,6 +293,8 @@ static void mock_domain_set_plaform_dma_ops(struct device *dev) static const struct iommu_ops mock_ops = { .owner = THIS_MODULE, .pgsize_bitmap = MOCK_IO_PAGE_SIZE, + .driver_type = IOMMU_HW_INFO_TYPE_SELFTEST, + .hw_info = mock_domain_hw_info, .domain_alloc = mock_domain_alloc, .capable = mock_domain_capable, .set_platform_dma_ops = mock_domain_set_plaform_dma_ops, diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 5c33b6b52c65..d2ce2ddbdc40 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -124,6 +124,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP); TEST_LENGTH(iommu_option, IOMMU_OPTION); TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS); + TEST_LENGTH(iommu_hw_info, IOMMU_DEVICE_GET_HW_INFO); #undef TEST_LENGTH }
@@ -188,6 +189,7 @@ FIXTURE(iommufd_ioas) uint32_t ioas_id; uint32_t stdev_id; uint32_t hwpt_id; + uint32_t device_id; uint64_t base_iova; };
@@ -214,7 +216,7 @@ FIXTURE_SETUP(iommufd_ioas)
for (i = 0; i != variant->mock_domains; i++) { test_cmd_mock_domain(self->ioas_id, &self->stdev_id, - &self->hwpt_id, NULL); + &self->hwpt_id, &self->device_id); self->base_iova = MOCK_APERTURE_START; } } @@ -293,6 +295,19 @@ TEST_F(iommufd_ioas, ioas_area_auto_destroy) } }
+TEST_F(iommufd_ioas, device_get_hw_info) +{ + struct iommu_hw_info_selftest info; + + if (self->device_id) { + test_cmd_device_get_hw_info(self->device_id, sizeof(info), &info); + assert(info.test_reg == IOMMU_HW_INFO_SELFTEST_REGVAL); + } else { + test_err_device_get_hw_info(ENOENT, self->device_id, + sizeof(info), &info); + } +} + TEST_F(iommufd_ioas, area) { int i; diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 399779acce23..b57e1e60f69d 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -345,3 +345,29 @@ static void teardown_iommufd(int fd, struct __test_metadata *_metadata) })
#endif + +static int _test_cmd_device_get_hw_info(int fd, __u32 device_id, + __u32 data_len, void *data) +{ + struct iommu_hw_info cmd = { + .size = sizeof(cmd), + .dev_id = device_id, + .data_len = data_len, + .data_ptr = (uint64_t)data, + }; + int ret; + + ret = ioctl(fd, IOMMU_DEVICE_GET_HW_INFO, &cmd); + if (ret) + return ret; + return 0; +} + +#define test_cmd_device_get_hw_info(device_id, data_len, data) \ + ASSERT_EQ(0, _test_cmd_device_get_hw_info(self->fd, device_id, \ + data_len, data)) + +#define test_err_device_get_hw_info(_errno, device_id, data_len, data) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_device_get_hw_info(self->fd, device_id, \ + data_len, data))
On Wed, Mar 08, 2023 at 11:53:58PM -0800, Yi Liu wrote:
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index da1898a9128f..578691602d94 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -100,4 +100,19 @@ struct iommu_test_cmd { }; #define IOMMU_TEST_CMD _IO(IOMMUFD_TYPE, IOMMUFD_CMD_BASE + 32) +/* Mock structs for IOMMU_DEVICE_GET_HW_INFO ioctl */ +#define IOMMU_HW_INFO_TYPE_SELFTEST 0xfeedbeef +#define IOMMU_HW_INFO_SELFTEST_REGVAL 0xdeadbeef
+/**
- struct iommu_hw_info_selftest
- @flags: Must be set to 0
- @test_reg: Pass IOMMU_HW_INFO_SELFTEST_REGVAL to user selftest program
- */
Probably don't need the comment, it is misleading
+struct iommu_hw_info_selftest {
struct iommu_test_hw_info
Jason
On Wed, Mar 08, 2023 at 11:53:54PM -0800, Yi Liu wrote:
iommufd gives userspace the capability to manipulate iommu subsytem. e.g. DMA map/unmap etc. In the near future, it will support iommu nested translation. Different platform vendors have different implementation for the nested translation. So before set up nested translation, userspace needs to know the hardware iommu information. For example, Intel VT-d supports using guest I/O page table as the stage-1 translation table. This requires guest I/O page table be compatible with hardware IOMMU.
This series reports the iommu hardware information for a given iommufd_device which has been bound to iommufd. It is preparation work for userspace to allocate hwpt for given device. Like the nested translation support[1].
This series introduces an iommu op to report the iommu hardware info, and an ioctl IOMMU_DEVICE_GET_HW_INFO is added to report such hardware info to user. enum iommu_hw_info_type is defined to differentiate the iommu hardware info reported to user hence user can decode them. This series only adds the framework for iommu hw info reporting, the complete reporting path needs vendor specific definition and driver support. The full picture is available in [1] as well.
Other than the small notes this looks pretty good to me
Jason
linux-kselftest-mirror@lists.linaro.org