iommufd gives userspace the capability to manipulate the iommu subsystem, e.g. DMA map/unmap. In the near future, it will support iommu nested translation. Different platform vendors implement nested translation differently. For example, Intel VT-d supports using the guest I/O page table as the stage-1 translation table. This requires the guest I/O page table to be compatible with the hardware IOMMU. So before setting up nested translation, userspace needs to know the hardware iommu information to understand the nested translation requirements.
This series reports the iommu hardware information for a given device which has been bound to iommufd. It is preparation work for userspace to allocate a hwpt for a given device, as in the nested translation support[1].
This series introduces an iommu op to report the iommu hardware info, and adds an ioctl, IOMMU_GET_HW_INFO, to report such hardware info to the user. enum iommu_hw_info_type is defined to differentiate the iommu hardware info reported to the user so that the user can decode it. This series only adds the framework for iommu hw info reporting; the complete reporting path needs vendor specific definitions and driver support. The full code is available in [1] as well.
[1] https://github.com/yiliu1765/iommufd/tree/wip/iommufd_nesting_08112023-yi (only the hw_info report path is the latest, other parts are wip)
Change log:
v7:
 - Use clear_user() (Jason)
 - Add fail_nth for hw_info (Jason)

v6: https://lore.kernel.org/linux-iommu/20230808153510.4170-1-yi.l.liu@intel.com...
 - Add Jingqi's comment on patch 02
 - Add Baolu's r-b to patch 03
 - Address Jason's comment on patch 03

v5: https://lore.kernel.org/linux-iommu/20230803143144.200945-1-yi.l.liu@intel.c...
 - Return hw_info_type in the .hw_info op, hence drop hw_info_type field in iommu_ops (Kevin)
 - Add Jason's r-b for patch 01
 - Address coding style comments from Jason and Kevin w.r.t. patch 02, 03 and 04

v4: https://lore.kernel.org/linux-iommu/20230724105936.107042-1-yi.l.liu@intel.c...
 - Rename ioctl to IOMMU_GET_HW_INFO and structure to iommu_hw_info
 - Move the iommufd_get_hw_info handler to main.c
 - Place iommu_hw_info prior to iommu_hwpt_alloc
 - Update the function namings accordingly
 - Update uapi kdocs

v3: https://lore.kernel.org/linux-iommu/20230511143024.19542-1-yi.l.liu@intel.co...
 - Add r-b from Baolu
 - Rename IOMMU_HW_INFO_TYPE_DEFAULT to be IOMMU_HW_INFO_TYPE_NONE to better suit what it means
 - Let IOMMU_DEVICE_GET_HW_INFO succeed even if the underlying iommu driver does not have driver-specific data to report, per the remark below.
   https://lore.kernel.org/kvm/ZAcwJSK%2F9UVI9LXu@nvidia.com/

v2: https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.c...
 - Drop patch 05 of v1 as it is already covered by other series
 - Rename the capability info to be iommu hardware info

v1: https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.com...
Regards, Yi Liu
Lu Baolu (1):
  iommu: Add new iommu op to get iommu hardware information

Nicolin Chen (1):
  iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl

Yi Liu (2):
  iommu: Move dev_iommu_ops() to private header
  iommufd: Add IOMMU_GET_HW_INFO
 drivers/iommu/iommu-priv.h                    | 11 +++
 drivers/iommu/iommufd/iommufd_test.h          |  9 ++
 drivers/iommu/iommufd/main.c                  | 85 +++++++++++++++++++
 drivers/iommu/iommufd/selftest.c              | 16 ++++
 include/linux/iommu.h                         | 20 ++---
 include/uapi/linux/iommufd.h                  | 45 ++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 28 +++++-
 .../selftests/iommu/iommufd_fail_nth.c        |  4 +
 tools/testing/selftests/iommu/iommufd_utils.h | 47 ++++++++++
 9 files changed, 253 insertions(+), 12 deletions(-)
dev_iommu_ops() is essentially only used in the iommu subsystem, so move it to a private header to keep other drivers from abusing it.
Suggested-by: Jason Gunthorpe jgg@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/iommu/iommu-priv.h | 11 +++++++++++
 include/linux/iommu.h      | 11 -----------
 2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h index 7c8011bfd153..a6e694f59f64 100644 --- a/drivers/iommu/iommu-priv.h +++ b/drivers/iommu/iommu-priv.h @@ -4,6 +4,17 @@
#include <linux/iommu.h>
+static inline const struct iommu_ops *dev_iommu_ops(struct device *dev) +{ + /* + * Assume that valid ops must be installed if iommu_probe_device() + * has succeeded. The device ops are essentially for internal use + * within the IOMMU subsystem itself, so we should be able to trust + * ourselves not to misuse the helper. + */ + return dev->iommu->iommu_dev->ops; +} + int iommu_group_replace_domain(struct iommu_group *group, struct iommu_domain *new_domain);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index d31642596675..e0245aa82b75 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -450,17 +450,6 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather) }; }
-static inline const struct iommu_ops *dev_iommu_ops(struct device *dev) -{ - /* - * Assume that valid ops must be installed if iommu_probe_device() - * has succeeded. The device ops are essentially for internal use - * within the IOMMU subsystem itself, so we should be able to trust - * ourselves not to misuse the helper. - */ - return dev->iommu->iommu_dev->ops; -} - extern int bus_iommu_probe(const struct bus_type *bus); extern bool iommu_present(const struct bus_type *bus); extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new iommu op to get the IOMMU hardware capabilities for iommufd. This information will be used by any vIOMMU driver which is owned by userspace.
This op chooses to make the special parameters opaque to the core. This suits the current usage model where accessing any of the IOMMU device special parameters does require a userspace driver that matches the kernel driver. If a need for common parameters, implemented similarly by several drivers, arises then there's room in the design to grow a generic parameter set as well. No wrapper API is added as it is supposed to be used by iommufd only.
Different IOMMU hardware has different hardware information, so the information reported differs as well. To let the external user understand the difference, enum iommu_hw_info_type is defined. An iommu driver that is capable of reporting hardware information should have a unique iommu_hw_info_type and return it to the caller. For a driver that doesn't report hardware information, the caller just uses IOMMU_HW_INFO_TYPE_NONE if a type is required.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com
Co-developed-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 include/linux/iommu.h        | 9 +++++++++
 include/uapi/linux/iommufd.h | 9 +++++++++
 2 files changed, 18 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index e0245aa82b75..f2d6a3989713 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -228,6 +228,14 @@ struct iommu_iotlb_gather { /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability + * @hw_info: IOMMU hardware information. The type of the returned data is + * marked by the output type of this op. Type is one of + * enum iommu_hw_info_type defined in include/uapi/linux/iommufd.h. + * The drivers that support this op should define a unique type + * in include/uapi/linux/iommufd.h. The data buffer returned by this + * op is allocated in the IOMMU driver and the caller should free it + * after use. Return the data buffer if success, or ERR_PTR on + * failure. * @domain_alloc: allocate iommu domain * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling @@ -257,6 +265,7 @@ struct iommu_iotlb_gather { */ struct iommu_ops { bool (*capable)(struct device *dev, enum iommu_cap); + void *(*hw_info)(struct device *dev, u32 *length, u32 *type);
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 8245c01adca6..ac11ace21edb 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -370,4 +370,13 @@ struct iommu_hwpt_alloc { __u32 __reserved; }; #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC) + +/** + * enum iommu_hw_info_type - IOMMU Hardware Info Types + * @IOMMU_HW_INFO_TYPE_NONE: Used by the drivers that do not report hardware + * info + */ +enum iommu_hw_info_type { + IOMMU_HW_INFO_TYPE_NONE, +}; #endif
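The @hw_info contract described in the kdoc above (the driver allocates the buffer and reports its length and type, the caller frees it, and a driver without the op is treated as IOMMU_HW_INFO_TYPE_NONE) can be modelled in plain userspace C roughly as follows. This is only a sketch under assumptions: `query_hw_info`, `demo_hw_info`, `struct demo_hw_info` and the `HW_INFO_TYPE_*` constants are hypothetical stand-ins, malloc/free replace the kernel allocator, and NULL replaces ERR_PTR().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for enum iommu_hw_info_type values. */
enum { HW_INFO_TYPE_NONE = 0, HW_INFO_TYPE_DEMO = 1 };

/* Hypothetical driver-specific payload. */
struct demo_hw_info {
	uint32_t flags;
	uint32_t cap_reg;
};

/* Optional driver callback: allocates the buffer, reports length and type. */
typedef void *(*hw_info_fn)(uint32_t *length, uint32_t *type);

static void *demo_hw_info(uint32_t *length, uint32_t *type)
{
	struct demo_hw_info *info = calloc(1, sizeof(*info));

	if (!info)
		return NULL;	/* the kernel op would return ERR_PTR(-ENOMEM) */
	info->cap_reg = 0x1234;
	*length = sizeof(*info);
	*type = HW_INFO_TYPE_DEMO;
	return info;
}

/*
 * Caller side: a missing callback is not an error, it means "type NONE,
 * zero bytes". On success the caller owns *data and must free it.
 */
static int query_hw_info(hw_info_fn op, void **data, uint32_t *length,
			 uint32_t *type)
{
	*data = NULL;
	*length = 0;
	*type = HW_INFO_TYPE_NONE;

	if (!op)
		return 0;

	*data = op(length, type);
	if (!*data)
		return -ENOMEM;
	if (*type == HW_INFO_TYPE_NONE) {
		/* A driver that has the callback must report a real type. */
		free(*data);
		*data = NULL;
		return -ENODEV;
	}
	return 0;
}
```

Keeping the payload opaque to the caller is what lets the core stay unaware of vendor formats; only the type value travels alongside the bytes.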
Under nested IOMMU translation, userspace owns the stage-1 translation table (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, etc.). Stage-1 translation tables are vendor specific and need to be compatible with the underlying IOMMU hardware. Hence, userspace should know the IOMMU hardware capability before creating the stage-1 translation table and configuring it in the kernel.
This adds the IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information (a.k.a. capability) for a given device. The returned data is vendor specific; userspace needs to decode it with the structure indicated by the @out_data_type field.
As only physical devices have IOMMU hardware, this will return an error if the given device is not a physical device.
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com
Co-developed-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/iommu/iommufd/main.c | 85 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/iommufd.h | 36 +++++++++++++++
 2 files changed, 121 insertions(+)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 94c498b8fdf6..d459811c5381 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -17,6 +17,7 @@ #include <linux/bug.h> #include <uapi/linux/iommufd.h> #include <linux/iommufd.h> +#include "../iommu-priv.h"
#include "io_pagetable.h" #include "iommufd_private.h" @@ -177,6 +178,87 @@ static int iommufd_destroy(struct iommufd_ucmd *ucmd) return 0; }
+static int iommufd_fill_hw_info(struct device *dev, void __user *user_ptr, + unsigned int *length, u32 *type) +{ + const struct iommu_ops *ops; + unsigned int data_len; + void *data; + int rc = 0; + + ops = dev_iommu_ops(dev); + if (!ops->hw_info) { + *length = 0; + *type = IOMMU_HW_INFO_TYPE_NONE; + return 0; + } + + data = ops->hw_info(dev, &data_len, type); + if (IS_ERR(data)) + return PTR_ERR(data); + + /* + * drivers that have hw_info callback should have a unique + * iommu_hw_info_type. + */ + if (WARN_ON_ONCE(*type == IOMMU_HW_INFO_TYPE_NONE)) { + rc = -ENODEV; + goto err_free; + } + + *length = min(*length, data_len); + if (copy_to_user(user_ptr, data, *length)) { + rc = -EFAULT; + goto err_free; + } + +err_free: + kfree(data); + return rc; +} + +static int iommufd_get_hw_info(struct iommufd_ucmd *ucmd) +{ + struct iommu_hw_info *cmd = ucmd->cmd; + unsigned int length = cmd->data_len; + struct iommufd_device *idev; + void __user *user_ptr; + u32 hw_info_type; + int rc = 0; + + if (cmd->flags || cmd->__reserved || !cmd->data_len) + return -EOPNOTSUPP; + + idev = iommufd_get_device(ucmd, cmd->dev_id); + if (IS_ERR(idev)) + return PTR_ERR(idev); + + user_ptr = u64_to_user_ptr(cmd->data_ptr); + + rc = iommufd_fill_hw_info(idev->dev, user_ptr, + &length, &hw_info_type); + if (rc) + goto err_put; + + /* + * Zero the trailing bytes if the user buffer is bigger than the + * data size kernel actually has. + */ + if (length < cmd->data_len) { + rc = clear_user(user_ptr + length, cmd->data_len - length); + if (rc) + goto err_put; + } + + cmd->data_len = length; + cmd->out_data_type = hw_info_type; + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + +err_put: + iommufd_put_object(&idev->obj); + return rc; +} + static int iommufd_fops_open(struct inode *inode, struct file *filp) { struct iommufd_ctx *ictx; @@ -265,6 +347,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd)
union ucmd_buffer { struct iommu_destroy destroy; + struct iommu_hw_info info; struct iommu_hwpt_alloc hwpt; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; @@ -297,6 +380,8 @@ struct iommufd_ioctl_op { } static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), + IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info, + __reserved), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, __reserved), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index ac11ace21edb..4a00f8fb2d54 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -46,6 +46,7 @@ enum { IOMMUFD_CMD_OPTION, IOMMUFD_CMD_VFIO_IOAS, IOMMUFD_CMD_HWPT_ALLOC, + IOMMUFD_CMD_GET_HW_INFO, };
/** @@ -379,4 +380,39 @@ struct iommu_hwpt_alloc { enum iommu_hw_info_type { IOMMU_HW_INFO_TYPE_NONE, }; + +/** + * struct iommu_hw_info - ioctl(IOMMU_GET_HW_INFO) + * @size: sizeof(struct iommu_hw_info) + * @flags: Must be 0 + * @dev_id: The device bound to the iommufd + * @data_len: Input the length of the user buffer in bytes. Output the length + * of data filled in the user buffer. + * @data_ptr: Pointer to the user buffer + * @out_data_type: Output the iommu hardware info type as defined in the enum + * iommu_hw_info_type. + * @__reserved: Must be 0 + * + * Query the hardware information from an iommu behind a given device that has + * been bound to iommufd. @data_len is the size of the buffer, which captures an + * iommu type specific input data and a filled output data. Trailing bytes will + * be zeroed if the user buffer is larger than the data kernel has. + * + * The type specific data would be used to sync capabilities between the virtual + * IOMMU and the hardware IOMMU, e.g. a nested translation setup needs to check + * the hardware information, so the guest stage-1 page table will be compatible. + * + * The @out_data_type will be filled if the ioctl succeeds. It would be used to + * decode the data filled in the buffer pointed by @data_ptr. + */ +struct iommu_hw_info { + __u32 size; + __u32 flags; + __u32 dev_id; + __u32 data_len; + __aligned_u64 data_ptr; + __u32 out_data_type; + __u32 __reserved; +}; +#define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO) #endif
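As a rough illustration of how userspace might act on @out_data_type once the ioctl returns, the sketch below dispatches on the type before touching the buffer. It is an assumption-laden example, not part of the series: `decode_hw_info` is a hypothetical helper, and the only types it knows are IOMMU_HW_INFO_TYPE_NONE and the selftest type from this series, mirrored here as local definitions so the snippet builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of values that appear in this series (selftest only). */
#define HW_INFO_TYPE_NONE	0u
#define HW_INFO_TYPE_SELFTEST	0xfeedbeefu

/* Mirrors struct iommu_test_hw_info from iommufd_test.h. */
struct test_hw_info {
	uint32_t flags;
	uint32_t test_reg;
};

/*
 * Decode the buffer according to the reported type.
 * Returns 0 and stores the register value on success, -1 if the type is
 * unknown or the buffer is too short for the structure the type implies.
 */
static int decode_hw_info(uint32_t out_data_type, const void *buf,
			  uint32_t data_len, uint32_t *test_reg)
{
	struct test_hw_info info;

	switch (out_data_type) {
	case HW_INFO_TYPE_NONE:
		*test_reg = 0;	/* the driver reported nothing to decode */
		return 0;
	case HW_INFO_TYPE_SELFTEST:
		if (data_len < sizeof(info))
			return -1;
		memcpy(&info, buf, sizeof(info));
		*test_reg = info.test_reg;
		return 0;
	default:
		return -1;	/* unknown vendor format: do not guess */
	}
}
```

Rejecting unknown types rather than guessing matches the intent that the userspace decoder is paired with the kernel driver that produced the data.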
On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:
+static int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hw_info *cmd = ucmd->cmd;
+	unsigned int length = cmd->data_len;
+	struct iommufd_device *idev;
+	void __user *user_ptr;
+	u32 hw_info_type;
+	int rc = 0;
+
+	if (cmd->flags || cmd->__reserved || !cmd->data_len)
+		return -EOPNOTSUPP;
Is there a reason to block 0 data_len? I think this should work. The code looks OK?
Jason
On Tue, Aug 15, 2023 at 01:32:01PM -0300, Jason Gunthorpe wrote:
On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:
+static int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hw_info *cmd = ucmd->cmd;
+	unsigned int length = cmd->data_len;
+	struct iommufd_device *idev;
+	void __user *user_ptr;
+	u32 hw_info_type;
+	int rc = 0;
+
+	if (cmd->flags || cmd->__reserved || !cmd->data_len)
+		return -EOPNOTSUPP;
Is there a reason to block 0 data_len? I think this should work. The code looks OK?
I did a quick test passing !data_len and !data_ptr. And it works by returning the type only.
Yet, in that case, should we mention this in the uAPI kdoc? It feels to me that the uAPI always expects user space to read out a length of data.
Thanks Nic
On Tue, Aug 15, 2023 at 10:31:09AM -0700, Nicolin Chen wrote:
On Tue, Aug 15, 2023 at 01:32:01PM -0300, Jason Gunthorpe wrote:
On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:
+static int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hw_info *cmd = ucmd->cmd;
+	unsigned int length = cmd->data_len;
+	struct iommufd_device *idev;
+	void __user *user_ptr;
+	u32 hw_info_type;
+	int rc = 0;
+
+	if (cmd->flags || cmd->__reserved || !cmd->data_len)
+		return -EOPNOTSUPP;
Is there a reason to block 0 data_len? I think this should work. The code looks OK?
I did a quick test passing !data_len and !data_ptr. And it works by returning the type only.
Yet, in that case, should we mention this in the uAPI kdoc? It feels to me that the uAPI always expects user space to read out a length of data.
Well the way it ought to work is that userspace can pass in 0 length and the kernel will return the correct length
So maybe this does need resending with this removed:
*length = min(*length, data_len);
Also I see clear_user is called wrong, it doesn't return errno.
Please check and repost it ASAP I will update the branch. Probably needs some doc adjusting too.
I came up with this:
int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
{
	struct iommu_hw_info *cmd = ucmd->cmd;
	void __user *user_ptr = u64_to_user_ptr(cmd->data_ptr);
	const struct iommu_ops *ops;
	struct iommufd_device *idev;
	unsigned int data_len;
	unsigned int copy_len;
	void *data = NULL;
	int rc;

	if (cmd->flags || cmd->__reserved)
		return -EOPNOTSUPP;

	idev = iommufd_get_device(ucmd, cmd->dev_id);
	if (IS_ERR(idev))
		return PTR_ERR(idev);

	ops = dev_iommu_ops(idev->dev);
	if (!ops->hw_info) {
		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
		if (IS_ERR(data)) {
			rc = PTR_ERR(data);
			goto err_put;
		}

		/*
		 * drivers that have hw_info callback should have a unique
		 * iommu_hw_info_type.
		 */
		if (WARN_ON_ONCE(cmd->out_data_type == IOMMU_HW_INFO_TYPE_NONE)) {
			rc = -ENODEV;
			goto out;
		}
	} else {
		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
		data_len = 0;
		data = NULL;
	}

	copy_len = min(cmd->data_len, data_len);
	if (copy_to_user(user_ptr, data, copy_len)) {
		rc = -EFAULT;
		goto out;
	}

	/*
	 * Zero the trailing bytes if the user buffer is bigger than the
	 * data size kernel actually has.
	 */
	if (copy_len < cmd->data_len) {
		if (clear_user(user_ptr + copy_len, cmd->data_len - copy_len)) {
			rc = -EFAULT;
			goto out;
		}
	}

	/*
	 * We return the length the kernel supports so userspace may know what
	 * the kernel capability is. It could be larger than the input buffer.
	 */
	cmd->data_len = data_len;

	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
out:
	kfree(data);
err_put:
	iommufd_put_object(&idev->obj);
	return rc;
}
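The length handling proposed in the function above (copy at most the user's buffer size, zero any trailing bytes, and report the full kernel-side length even when it exceeds the buffer) can be modelled in userspace C as below. This is a sketch, not kernel code: `fill_user_buffer` is a hypothetical name, and memcpy/memset stand in for copy_to_user()/clear_user(), which in the kernel return the number of bytes left unprocessed rather than an errno.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the copy step: "user" is the caller's buffer of
 * user_len bytes, "data" is what the driver reported (data_len bytes).
 * Returns data_len, which may be larger than user_len.
 */
static size_t fill_user_buffer(void *user, size_t user_len,
			       const void *data, size_t data_len)
{
	size_t copy_len = user_len < data_len ? user_len : data_len;

	if (copy_len)
		memcpy(user, data, copy_len);

	/* Zero whatever part of the user buffer the data did not cover. */
	if (copy_len < user_len)
		memset((char *)user + copy_len, 0, user_len - copy_len);

	return data_len;
}
```

With this shape, a caller can pass a zero-length buffer first to learn the required size and then call again with a buffer of that size.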
On Tue, Aug 15, 2023 at 03:29:21PM -0300, Jason Gunthorpe wrote:
Well the way it ought to work is that userspace can pass in 0 length and the kernel will return the correct length
So maybe this does need resending with this removed:
*length = min(*length, data_len);
That "length" is 0 (copying the value of cmd->data_len), so it should be 0 even having this line?
Also I see clear_user is called wrong, it doesn't return errno.
Oh, right.
Please check and repost it ASAP I will update the branch. Probably needs some doc adjusting too.
I think your version should be good. I can update the series for the doc part. Yi can confirm tonight and report in his time zone. And it should be available for you to take tomorrow.
ops = dev_iommu_ops(idev->dev); if (!ops->hw_info) { data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
It should be: if (ops->hw_info) {
Thanks Nic
On Tue, Aug 15, 2023 at 11:53:26AM -0700, Nicolin Chen wrote:
ops = dev_iommu_ops(idev->dev); if (!ops->hw_info) { data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
It should be: if (ops->hw_info) {
Hmm, the test suite probably needs some more stuff then too since it passed like that :)
Jason
On Tue, Aug 15, 2023 at 03:56:37PM -0300, Jason Gunthorpe wrote:
On Tue, Aug 15, 2023 at 11:53:26AM -0700, Nicolin Chen wrote:
ops = dev_iommu_ops(idev->dev); if (!ops->hw_info) { data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
It should be: if (ops->hw_info) {
Hmm, the test suite probably needs some more stuff then too since it passed like that :)
Ack. I will see what I can do.
On Tue, Aug 15, 2023 at 11:58:04AM -0700, Nicolin Chen wrote:
On Tue, Aug 15, 2023 at 03:56:37PM -0300, Jason Gunthorpe wrote:
On Tue, Aug 15, 2023 at 11:53:26AM -0700, Nicolin Chen wrote:
ops = dev_iommu_ops(idev->dev); if (!ops->hw_info) { data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
It should be: if (ops->hw_info) {
Hmm, the test suite probably needs some more stuff then too since it passed like that :)
Ack. I will see what I can do.
It actually reports errors when hw_info is defined (it would then get an IOMMU_HW_INFO_TYPE_NONE).

#ok 62 iommufd_ioas.two_mock_domain.ioas_area_auto_destroy
# # RUN iommufd_ioas.two_mock_domain.get_hw_info ...
# iommufd: iommufd_utils.h:368: _test_cmd_get_hw_info: Assertion `cmd.out_data_type == IOMMU_HW_INFO_TYPE_SELFTEST' failed.
# # get_hw_info: Test terminated by assertion

Removing mock_domain_hw_info() to test the other path simply results in a kernel crash.
So, I think that we are fine.
Thanks Nicolin
On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:
Under nested IOMMU translation, userspace owns the stage-1 translation table (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and need to be compatible with the underlying IOMMU hardware. Hence, userspace should know the IOMMU hardware capability before creating and configuring the stage-1 translation table to kernel.
This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information (a.k.a capability) for a given device. The returned data is vendor specific, userspace needs to decode it with the structure mapped by the @out_data_type field.
As only physical devices have IOMMU hardware, so this will return error if the given device is not a physical device.
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/main.c | 85 ++++++++++++++++++++++++++++++++++++ include/uapi/linux/iommufd.h | 36 +++++++++++++++ 2 files changed, 121 insertions(+)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 94c498b8fdf6..d459811c5381 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -17,6 +17,7 @@
I was looking at this more and this code should be in device.c:
+static int iommufd_fill_hw_info(struct device *dev, void __user *user_ptr,
+				unsigned int *length, u32 *type)
+{
Since it is working on devices
main.c is primarily for context related stuff
Jason
On Tue, Aug 15, 2023 at 01:42:35PM -0300, Jason Gunthorpe wrote:
On Fri, Aug 11, 2023 at 12:15:00AM -0700, Yi Liu wrote:
Under nested IOMMU translation, userspace owns the stage-1 translation table (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and need to be compatible with the underlying IOMMU hardware. Hence, userspace should know the IOMMU hardware capability before creating and configuring the stage-1 translation table to kernel.
This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information (a.k.a capability) for a given device. The returned data is vendor specific, userspace needs to decode it with the structure mapped by the @out_data_type field.
As only physical devices have IOMMU hardware, so this will return error if the given device is not a physical device.
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/main.c | 85 ++++++++++++++++++++++++++++++++++++ include/uapi/linux/iommufd.h | 36 +++++++++++++++ 2 files changed, 121 insertions(+)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 94c498b8fdf6..d459811c5381 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -17,6 +17,7 @@
I was looking at this more and this code should be in device.c:
+static int iommufd_fill_hw_info(struct device *dev, void __user *user_ptr,
+				unsigned int *length, u32 *type)
+{
Since it is working on devices
main.c is primarily for context related stuff
Ack for that. We'd make similar changes to the other handlers too.
Thanks Nic
From: Nicolin Chen nicolinc@nvidia.com
Add a mock_domain_hw_info function and an iommu_test_hw_info data structure. This allows testing the IOMMU_GET_HW_INFO ioctl by passing the test_reg value for the mock_dev.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/iommu/iommufd/iommufd_test.h          |  9 ++++
 drivers/iommu/iommufd/selftest.c              | 16 +++++++
 tools/testing/selftests/iommu/iommufd.c       | 28 ++++++++++-
 .../selftests/iommu/iommufd_fail_nth.c        |  4 ++
 tools/testing/selftests/iommu/iommufd_utils.h | 47 +++++++++++++++++++
 5 files changed, 103 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 258de2253b61..3f3644375bf1 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -100,4 +100,13 @@ struct iommu_test_cmd { }; #define IOMMU_TEST_CMD _IO(IOMMUFD_TYPE, IOMMUFD_CMD_BASE + 32)
+/* Mock structs for IOMMU_DEVICE_GET_HW_INFO ioctl */ +#define IOMMU_HW_INFO_TYPE_SELFTEST 0xfeedbeef +#define IOMMU_HW_INFO_SELFTEST_REGVAL 0xdeadbeef + +struct iommu_test_hw_info { + __u32 flags; + __u32 test_reg; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index bb2cd54ca7b6..ab4011e3a7c6 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -128,6 +128,21 @@ static struct iommu_domain mock_blocking_domain = { .ops = &mock_blocking_ops, };
+static void *mock_domain_hw_info(struct device *dev, u32 *length, u32 *type) +{ + struct iommu_test_hw_info *info; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return ERR_PTR(-ENOMEM); + + info->test_reg = IOMMU_HW_INFO_SELFTEST_REGVAL; + *length = sizeof(*info); + *type = IOMMU_HW_INFO_TYPE_SELFTEST; + + return info; +} + static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) { struct mock_iommu_domain *mock; @@ -279,6 +294,7 @@ static void mock_domain_set_plaform_dma_ops(struct device *dev) static const struct iommu_ops mock_ops = { .owner = THIS_MODULE, .pgsize_bitmap = MOCK_IO_PAGE_SIZE, + .hw_info = mock_domain_hw_info, .domain_alloc = mock_domain_alloc, .capable = mock_domain_capable, .set_platform_dma_ops = mock_domain_set_plaform_dma_ops, diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 8acd0af37aa5..7e0fdf372c12 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -113,6 +113,7 @@ TEST_F(iommufd, cmd_length) }
TEST_LENGTH(iommu_destroy, IOMMU_DESTROY); + TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO); TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC); TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES); TEST_LENGTH(iommu_ioas_allow_iovas, IOMMU_IOAS_ALLOW_IOVAS); @@ -185,6 +186,7 @@ FIXTURE(iommufd_ioas) uint32_t ioas_id; uint32_t stdev_id; uint32_t hwpt_id; + uint32_t device_id; uint64_t base_iova; };
@@ -211,7 +213,7 @@ FIXTURE_SETUP(iommufd_ioas)
for (i = 0; i != variant->mock_domains; i++) { test_cmd_mock_domain(self->ioas_id, &self->stdev_id, - &self->hwpt_id, NULL); + &self->hwpt_id, &self->device_id); self->base_iova = MOCK_APERTURE_START; } } @@ -290,6 +292,30 @@ TEST_F(iommufd_ioas, ioas_area_auto_destroy) } }
+TEST_F(iommufd_ioas, get_hw_info) +{ + struct iommu_test_hw_info buffer_exact; + struct iommu_test_hw_info_buffer { + struct iommu_test_hw_info info; + uint64_t trailing_bytes; + } buffer_larger; + + if (self->device_id) { + /* Provide a user_buffer with exact size */ + test_cmd_get_hw_info(self->device_id, &buffer_exact, sizeof(buffer_exact)); + /* + * Provide a user_buffer with size larger than the exact size to check if + * kernel zero the trailing bytes. + */ + test_cmd_get_hw_info(self->device_id, &buffer_larger, sizeof(buffer_larger)); + } else { + test_err_get_hw_info(ENOENT, self->device_id, + &buffer_exact, sizeof(buffer_exact)); + test_err_get_hw_info(ENOENT, self->device_id, + &buffer_larger, sizeof(buffer_larger)); + } +} + TEST_F(iommufd_ioas, area) { int i; diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index d4c552e56948..a220ca2a689d 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -576,6 +576,7 @@ TEST_FAIL_NTH(basic_fail_nth, access_pin_domain) /* device.c */ TEST_FAIL_NTH(basic_fail_nth, device) { + struct iommu_test_hw_info info; uint32_t ioas_id; uint32_t ioas_id2; uint32_t stdev_id; @@ -611,6 +612,9 @@ TEST_FAIL_NTH(basic_fail_nth, device) &idev_id)) return -1;
+ if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info))) + return -1; + if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, &hwpt_id)) return -1;
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 70353e68e599..ccd0ef7833a0 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -348,3 +348,50 @@ static void teardown_iommufd(int fd, struct __test_metadata *_metadata) })
#endif + +static int _test_cmd_get_hw_info(int fd, __u32 device_id, + void *data, size_t data_len) +{ + struct iommu_hw_info cmd = { + .size = sizeof(cmd), + .dev_id = device_id, + .data_len = data_len, + .data_ptr = (uint64_t)data, + }; + struct iommu_test_hw_info *info = (struct iommu_test_hw_info *)data; + int ret; + + ret = ioctl(fd, IOMMU_GET_HW_INFO, &cmd); + if (ret) + return ret; + + assert(cmd.out_data_type == IOMMU_HW_INFO_TYPE_SELFTEST); + + /* + * Trailing bytes should be 0 if user buffer is larger than + * the data that kernel reports. + */ + if (data_len > cmd.data_len) { + char *ptr = (char *)(data + cmd.data_len); + int idx = 0; + + while (idx < data_len - cmd.data_len) { + assert(!*(ptr + idx)); + idx++; + } + } + + assert(info->test_reg == IOMMU_HW_INFO_SELFTEST_REGVAL); + assert(!info->flags); + + return 0; +} + +#define test_cmd_get_hw_info(device_id, data, data_len) \ + ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, \ + data, data_len)) + +#define test_err_get_hw_info(_errno, device_id, data, data_len) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_get_hw_info(self->fd, device_id, \ + data, data_len))
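The trailing-bytes check in _test_cmd_get_hw_info() above could also be factored into a small standalone helper, sketched here. `trailing_bytes_are_zero` is a hypothetical name and not part of the selftest; it uses an `unsigned char` pointer so it does not rely on arithmetic on `void *`, which is a GNU extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if every byte in buf[filled..total) is zero. When
 * filled >= total there is nothing to check and the result is true.
 */
static bool trailing_bytes_are_zero(const void *buf, size_t filled,
				    size_t total)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = filled; i < total; i++)
		if (p[i])
			return false;
	return true;
}
```

A test would call it as `trailing_bytes_are_zero(data, cmd.data_len, data_len)` after the ioctl returns, where `data_len` is the size of the buffer the test supplied.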
On Fri, Aug 11, 2023 at 12:14:57AM -0700, Yi Liu wrote:
iommufd gives userspace the capability to manipulate iommu subsytem. e.g. DMA map/unmap etc. In the near future, it will support iommu nested translation. Different platform vendors have different implementation for the nested translation. For example, Intel VT-d supports using guest I/O page table as the stage-1 translation table. This requires guest I/O page table be compatible with hardware IOMMU. So before set up nested translation, userspace needs to know the hardware iommu information to understand the nested translation requirements.
This series reports the iommu hardware information for a given device which has been bound to iommufd. It is preparation work for userspace to allocate hwpt for given device. Like the nested translation support[1].
This series introduces an iommu op to report the iommu hardware info, and an ioctl IOMMU_GET_HW_INFO is added to report such hardware info to user. enum iommu_hw_info_type is defined to differentiate the iommu hardware info reported to user hence user can decode them. This series only adds the framework for iommu hw info reporting, the complete reporting path needs vendor specific definition and driver support. The full code is available in [1] as well.
[1] https://github.com/yiliu1765/iommufd/tree/wip/iommufd_nesting_08112023-yi (only the hw_info report path is the latest, other parts is wip)
I made the changes I noted and pulled these plus the single vt-d patch into iommufd for-next
Let me know if it is not OK and we can back it out
Thanks, Jason