Nested translation is a hardware feature that is supported by many modern IOMMU hardwares. It has two stages of address translations to get access to the physical address. A stage-1 translation table is owned by userspace (e.g. by a guest OS), while a stage-2 is owned by kernel. Any change to a stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example, the stage-1 translation table is guest I/O page table. As the below diagram shows, the guest I/O page table pointer in GPA (guest physical address) is passed to host and be used to perform a stage-1 translation. Along with it, a modification to present mappings in the guest I/O page table should be followed by an IOTLB invalidation.
.-------------. .---------------------------. | vIOMMU | | Guest I/O page table | | | '---------------------------' .----------------/ | PASID Entry |--- PASID cache flush --+ '-------------' | | | V | | I/O page table pointer in GPA '-------------' Guest ------| Shadow |---------------------------|-------- v v v Host .-------------. .------------------------. | pIOMMU | | FS for GIOVA->GPA | | | '------------------------' .----------------/ | | PASID Entry | V (Nested xlate) '----------------.----------------------------------. | | | SS for GPA->HPA, unmanaged domain| | | '----------------------------------' '-------------' Where: - FS = First stage page tables - SS = Second stage page tables <Intel VT-d Nested translation>
In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt) and each hwpt is backed by an iommu_domain allocated from an iommu driver. So in this series hw_pagetable and iommu_domain means the same thing if no special note. IOMMUFD has already supported allocating hw_pagetable linked with an IOAS. However, a nesting case requires IOMMUFD to allow allocating hw_pagetable with driver specific parameters and interface to sync stage-1 IOTLB as user owns the stage-1 translation table.
This series is based on the iommu hw info reporting series [1] and nested parent domain allocation [2]. It first extends domain_alloc_user to allocate hwpt with user data by allowing the IOMMUFD internal infrastructure to accept user_data and parent hwpt, relaying the user_data/parent to the iommu core to allocate IOMMU_DOMAIN_NESTED. And it then extends the IOMMU_HWPT_ALLOC ioctl to accept user data and a parent hwpt ID.
Note that this series is the part-1 set of a two-part nesting series. It does not include the cache invalidation interface, which will be added in the part 2.
Complete code can be found in [3], it is on top of Joao's dirty page tracking v6 series and fix patches. QEMU could can be found in [4].
At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks them for the help. ^_^. Look forward to your feedbacks.
[1] https://lore.kernel.org/linux-iommu/20230818101033.4100-1-yi.l.liu@intel.com... - merged [2] https://lore.kernel.org/linux-iommu/20230928071528.26258-1-yi.l.liu@intel.co... - merged [3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting [4] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v6: - Rebase on top of Joao's dirty tracking series: https://lore.kernel.org/linux-iommu/20231024135109.73787-1-joao.m.martins@or... - Rebase on top of the enforce_cache_coherency removal patch: https://lore.kernel.org/linux-iommu/ZTcAhwYjjzqM0A5M@Asurada-Nvidia/ - Add parent and user_data check in iommu driver before the driver actually supports the two input. This can make better bisect support, the change is in patch 02.
v5: https://lore.kernel.org/linux-iommu/20231020091946.12173-1-yi.l.liu@intel.co... - Split the iommufd nesting series into two parts of alloc_user and invalidation (Jason) - Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and do the same with the structures/alloc()/abort()/destroy(). Reworked the selftest accordingly too. (Jason) - Move hwpt/data_type into struct iommu_user_data from standalone op arguments. (Jason) - Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA, _TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin) - Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin) - Add macro to the iommu_copy_struct_from_user() to calculate min_size (Jason) - Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.c... - Separate HWPT alloc/destroy/abort functions between user-managed HWPTs and kernel-managed HWPTs - Rework invalidate uAPI to be a multi-request array-based design - Add a struct iommu_user_data_array and a helper for driver to sanitize and copy the entry data from user space invalidation array - Add a patch fixing TEST_LENGTH() in selftest program - Drop IOMMU_RESV_IOVA_RANGES patches - Update kdoc and inline comments - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation, this does not change the rule that resv regions should only be added to the kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.c... - Add new uAPI things in alphabetical order - Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for sanity, replacing the previous op->domain_alloc_user_data_len solution - Return ERR_PTR from domain_alloc_user instead of NULL - Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin) - Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O page table). (Kevin) - Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl - Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.co... - Add union iommu_domain_user_data to include all user data structures to avoid passing void * in kernel APIs. - Add iommu op to return user data length for user domain allocation - Rename struct iommu_hwpt_alloc::data_type to be hwpt_type - Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len - Convert cache_invalidate_user op to be int instead of void - Remove @data_type in struct iommu_hwpt_invalidate - Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.c...
Thanks, Yi Liu
Jason Gunthorpe (2): iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING iommufd/device: Wrap IOMMUFD_OBJ_HWPT_PAGING-only configurations
Lu Baolu (1): iommu: Add IOMMU_DOMAIN_NESTED
Nicolin Chen (6): iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED iommufd: Add a nested HW pagetable object iommu: Add iommu_copy_struct_from_user helper iommufd/selftest: Add nested domain allocation for mock domain iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
Yi Liu (1): iommu: Pass in parent domain with user_data to domain_alloc_user op
drivers/iommu/intel/iommu.c | 7 +- drivers/iommu/iommufd/device.c | 157 +++++++--- drivers/iommu/iommufd/hw_pagetable.c | 271 +++++++++++++----- drivers/iommu/iommufd/iommufd_private.h | 73 +++-- drivers/iommu/iommufd/iommufd_test.h | 18 ++ drivers/iommu/iommufd/main.c | 10 +- drivers/iommu/iommufd/selftest.c | 151 ++++++++-- drivers/iommu/iommufd/vfio_compat.c | 6 +- include/linux/iommu.h | 72 ++++- include/uapi/linux/iommufd.h | 31 +- tools/testing/selftests/iommu/iommufd.c | 120 ++++++++ .../selftests/iommu/iommufd_fail_nth.c | 3 +- tools/testing/selftests/iommu/iommufd_utils.h | 31 +- 13 files changed, 768 insertions(+), 182 deletions(-)
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new domain type for a user I/O page table, which is nested on top of another user space address represented by a PAGING domain. This new domain can be allocated by the domain_alloc_user op, and attached to a device through the existing iommu_attach_device/group() interfaces.
The mappings of a nested domain are managed by user space software, so it is not necessary to have map/unmap callbacks.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- include/linux/iommu.h | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 1d42bdb37cbc..becfc8b725de 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -67,6 +67,9 @@ struct iommu_domain_geometry {
#define __IOMMU_DOMAIN_SVA (1U << 4) /* Shared process address space */
+#define __IOMMU_DOMAIN_NESTED (1U << 5) /* User-managed address space nested + on a stage-2 translation */ + #define IOMMU_DOMAIN_ALLOC_FLAGS ~__IOMMU_DOMAIN_DMA_FQ /* * This are the possible domain-types @@ -93,6 +96,7 @@ struct iommu_domain_geometry { __IOMMU_DOMAIN_DMA_API | \ __IOMMU_DOMAIN_DMA_FQ) #define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SVA) +#define IOMMU_DOMAIN_NESTED (__IOMMU_DOMAIN_NESTED)
struct iommu_domain { unsigned type;
domain_alloc_user op already accepts user flags for domain allocation, add a parent domain pointer and a driver specific user data support as well.
Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an iommufd core uAPI structure. Make the user data opaque to the core, since a userspace driver must match the kernel driver. In the future, if drivers share some common parameter, there would be a generic parameter as well.
Define an enum iommu_hwpt_data_type (with IOMMU_HWPT_DATA_NONE type) for iommu drivers to add their own driver specific user data per hw_pagetable.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/intel/iommu.c | 7 ++++++- drivers/iommu/iommufd/hw_pagetable.c | 3 ++- drivers/iommu/iommufd/selftest.c | 7 ++++++- include/linux/iommu.h | 27 ++++++++++++++++++++++++--- include/uapi/linux/iommufd.h | 8 ++++++++ 5 files changed, 46 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 65f46f1347e6..cb64759b3d95 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4076,7 +4076,9 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) }
static struct iommu_domain * -intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +intel_iommu_domain_alloc_user(struct device *dev, u32 flags, + struct iommu_domain *parent, + const struct iommu_user_data *user_data) { struct iommu_domain *domain; struct intel_iommu *iommu; @@ -4086,6 +4088,9 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags) (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING))) return ERR_PTR(-EOPNOTSUPP);
+ if (parent || user_data) + return ERR_PTR(-EOPNOTSUPP); + iommu = device_to_iommu(dev, NULL, NULL); if (!iommu) return ERR_PTR(-ENODEV); diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index c2b742ffeb0c..97b84d681e09 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -96,7 +96,8 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, hwpt->ioas = ioas;
if (ops->domain_alloc_user) { - hwpt->domain = ops->domain_alloc_user(idev->dev, flags); + hwpt->domain = ops->domain_alloc_user(idev->dev, flags, + NULL, NULL); if (IS_ERR(hwpt->domain)) { rc = PTR_ERR(hwpt->domain); hwpt->domain = NULL; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index d8551c9d5b6c..a01a38332dc1 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -241,7 +241,9 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) }
static struct iommu_domain * -mock_domain_alloc_user(struct device *dev, u32 flags) +mock_domain_alloc_user(struct device *dev, u32 flags, + struct iommu_domain *parent, + const struct iommu_user_data *user_data) { struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); struct iommu_domain *domain; @@ -250,6 +252,9 @@ mock_domain_alloc_user(struct device *dev, u32 flags) (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING))) return ERR_PTR(-EOPNOTSUPP);
+ if (parent || user_data) + return ERR_PTR(-EOPNOTSUPP); + if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) && (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY)) return ERR_PTR(-EOPNOTSUPP); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index becfc8b725de..a86ce9cb3547 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -265,6 +265,21 @@ struct iommu_dirty_ops { struct iommu_dirty_bitmap *dirty); };
+/** + * struct iommu_user_data - iommu driver specific user space data info + * @type: The data type of the user buffer + * @uptr: Pointer to the user buffer for copy_from_user() + * @len: The length of the user buffer in bytes + * + * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h + * @type, @uptr and @len should be just copied from an iommufd core uAPI struct. + */ +struct iommu_user_data { + unsigned int type; + void __user *uptr; + size_t len; +}; + /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability @@ -279,8 +294,12 @@ struct iommu_dirty_ops { * parameters as defined in include/uapi/linux/iommufd.h. * Unlike @domain_alloc, it is called only by IOMMUFD and * must fully initialize the new domain before return. - * Upon success, a domain is returned. Upon failure, - * ERR_PTR must be returned. + * Upon success, if the @user_data is valid and the @parent + * points to a kernel-managed domain, the new domain must be + * IOMMU_DOMAIN_NESTED type; otherwise, the @parent must be + * NULL while the @user_data can be optionally provided, the + * new domain must support __IOMMU_DOMAIN_PAGING. + * Upon failure, ERR_PTR must be returned. * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling * @probe_finalize: Do final setup work after the device is added to an IOMMU @@ -313,7 +332,9 @@ struct iommu_ops {
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); - struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags); + struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags, + struct iommu_domain *parent, + const struct iommu_user_data *user_data);
struct iommu_device *(*probe_device)(struct device *dev); void (*release_device)(struct device *dev); diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index c44eecf5d318..fccc6315a520 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -361,6 +361,14 @@ enum iommufd_hwpt_alloc_flags { IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1, };
+/** + * enum iommu_hwpt_data_type - IOMMU HWPT Data Type + * @IOMMU_HWPT_DATA_NONE: no data + */ +enum iommu_hwpt_data_type { + IOMMU_HWPT_DATA_NONE, +}; + /** * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC) * @size: sizeof(struct iommu_hwpt_alloc)
On 24/10/2023 16:06, Yi Liu wrote:
domain_alloc_user op already accepts user flags for domain allocation, add a parent domain pointer and a driver specific user data support as well.
Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an iommufd core uAPI structure. Make the user data opaque to the core, since a userspace driver must match the kernel driver. In the future, if drivers share some common parameter, there would be a generic parameter as well.
Define an enum iommu_hwpt_data_type (with IOMMU_HWPT_DATA_NONE type) for iommu drivers to add their own driver specific user data per hw_pagetable.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/iommu.c | 7 ++++++-
You are sadly missing AMD IOMMU
This would fix the build and nack the op should parent or user_data be passed:
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index caad10f9cee3..bc747513afcb 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -2220,13 +2220,17 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type) }
static struct iommu_domain *amd_iommu_domain_alloc_user(struct device *dev, - u32 flags) + u32 flags, struct iommu_domain *parent, + const struct iommu_user_data *user_data) { unsigned int type = IOMMU_DOMAIN_UNMANAGED;
if (flags & ~IOMMU_HWPT_ALLOC_DIRTY_TRACKING) return ERR_PTR(-EOPNOTSUPP);
+ if (parent || user_data) + return ERR_PTR(-EOPNOTSUPP); + return do_iommu_domain_alloc(type, dev, flags); }
On 2023/10/24 23:56, Joao Martins wrote:
On 24/10/2023 16:06, Yi Liu wrote:
domain_alloc_user op already accepts user flags for domain allocation, add a parent domain pointer and a driver specific user data support as well.
Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an iommufd core uAPI structure. Make the user data opaque to the core, since a userspace driver must match the kernel driver. In the future, if drivers share some common parameter, there would be a generic parameter as well.
Define an enum iommu_hwpt_data_type (with IOMMU_HWPT_DATA_NONE type) for iommu drivers to add their own driver specific user data per hw_pagetable.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/iommu.c | 7 ++++++-
You are sadly missing AMD IOMMU
good catch.
This would fix the build and nack the op should parent or user_data be passed:
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index caad10f9cee3..bc747513afcb 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -2220,13 +2220,17 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type) }
static struct iommu_domain *amd_iommu_domain_alloc_user(struct device *dev,
u32 flags)
u32 flags, struct iommu_domain *parent,
const struct iommu_user_data *user_data)
{ unsigned int type = IOMMU_DOMAIN_UNMANAGED;
if (flags & ~IOMMU_HWPT_ALLOC_DIRTY_TRACKING) return ERR_PTR(-EOPNOTSUPP);
if (parent || user_data)
return ERR_PTR(-EOPNOTSUPP);
return do_iommu_domain_alloc(type, dev, flags);
}
yes, this should work. @Jason, one more version or just this one with the above diff from Joao?
On Tue, Oct 24, 2023 at 08:06:01AM -0700, Yi Liu wrote:
domain_alloc_user op already accepts user flags for domain allocation, add a parent domain pointer and a driver specific user data support as well.
Add a struct iommu_user_data as a bundle of data_ptr/data_len/type from an iommufd core uAPI structure. Make the user data opaque to the core, since a userspace driver must match the kernel driver. In the future, if drivers share some common parameter, there would be a generic parameter as well.
Define an enum iommu_hwpt_data_type (with IOMMU_HWPT_DATA_NONE type) for iommu drivers to add their own driver specific user data per hw_pagetable.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/iommu.c | 7 ++++++- drivers/iommu/iommufd/hw_pagetable.c | 3 ++- drivers/iommu/iommufd/selftest.c | 7 ++++++- include/linux/iommu.h | 27 ++++++++++++++++++++++++--- include/uapi/linux/iommufd.h | 8 ++++++++ 5 files changed, 46 insertions(+), 6 deletions(-)
This patch should be immediately before "iommufd: Add a nested HW pagetable object"
Since it is basically preperation for that patch
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index c44eecf5d318..fccc6315a520 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -361,6 +361,14 @@ enum iommufd_hwpt_alloc_flags { IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1, }; +/**
- enum iommu_hwpt_data_type - IOMMU HWPT Data Type
- @IOMMU_HWPT_DATA_NONE: no data
- */
+enum iommu_hwpt_data_type {
- IOMMU_HWPT_DATA_NONE,
+};
And this hunk should go in "iommufd: Add a nested HW pagetable object"
With the rest of the uapi
Jason
From: Jason Gunthorpe jgg@nvidia.com
To add a new IOMMUFD_OBJ_HWPT_NESTED, rename the HWPT object to confine it to PAGING hwpts/domains. The following patch will separate the hwpt structure as well.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 10 +++++----- drivers/iommu/iommufd/hw_pagetable.c | 2 +- drivers/iommu/iommufd/iommufd_private.h | 4 ++-- drivers/iommu/iommufd/main.c | 2 +- drivers/iommu/iommufd/selftest.c | 2 +- 5 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 0a8867487508..449b64e6ef53 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -563,7 +563,7 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, return PTR_ERR(pt_obj);
switch (pt_obj->type) { - case IOMMUFD_OBJ_HW_PAGETABLE: { + case IOMMUFD_OBJ_HWPT_PAGING: { struct iommufd_hw_pagetable *hwpt = container_of(pt_obj, struct iommufd_hw_pagetable, obj);
@@ -601,8 +601,8 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, /** * iommufd_device_attach - Connect a device to an iommu_domain * @idev: device to attach - * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE - * Output the IOMMUFD_OBJ_HW_PAGETABLE ID + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING + * Output the IOMMUFD_OBJ_HWPT_PAGING ID * * This connects the device to an iommu_domain, either automatically or manually * selected. Once this completes the device could do DMA. @@ -630,8 +630,8 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD); /** * iommufd_device_replace - Change the device's iommu_domain * @idev: device to change - * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE - * Output the IOMMUFD_OBJ_HW_PAGETABLE ID + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING + * Output the IOMMUFD_OBJ_HWPT_PAGING ID * * This is the same as:: * diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 97b84d681e09..1c13c3b6368c 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -86,7 +86,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, if (flags && !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP);
- hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE); + hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HWPT_PAGING); if (IS_ERR(hwpt)) return hwpt;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 43410fd53157..70bebad63a74 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -123,7 +123,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_NONE, IOMMUFD_OBJ_ANY = IOMMUFD_OBJ_NONE, IOMMUFD_OBJ_DEVICE, - IOMMUFD_OBJ_HW_PAGETABLE, + IOMMUFD_OBJ_HWPT_PAGING, IOMMUFD_OBJ_IOAS, IOMMUFD_OBJ_ACCESS, #ifdef CONFIG_IOMMUFD_TEST @@ -256,7 +256,7 @@ static inline struct iommufd_hw_pagetable * iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id) { return container_of(iommufd_get_object(ucmd->ictx, id, - IOMMUFD_OBJ_HW_PAGETABLE), + IOMMUFD_OBJ_HWPT_PAGING), struct iommufd_hw_pagetable, obj); } int iommufd_hwpt_set_dirty_tracking(struct iommufd_ucmd *ucmd); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index d50f42a730aa..46198e8948d6 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -488,7 +488,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { [IOMMUFD_OBJ_IOAS] = { .destroy = iommufd_ioas_destroy, }, - [IOMMUFD_OBJ_HW_PAGETABLE] = { + [IOMMUFD_OBJ_HWPT_PAGING] = { .destroy = iommufd_hw_pagetable_destroy, .abort = iommufd_hw_pagetable_abort, }, diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index a01a38332dc1..d71007234896 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -442,7 +442,7 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, struct iommufd_object *obj;
obj = iommufd_get_object(ucmd->ictx, mockpt_id, - IOMMUFD_OBJ_HW_PAGETABLE); + IOMMUFD_OBJ_HWPT_PAGING); if (IS_ERR(obj)) return ERR_CAST(obj); hwpt = container_of(obj, struct iommufd_hw_pagetable, obj);
From: Jason Gunthorpe jgg@nvidia.com
Some of the configurations during the attach/replace() should only apply to IOMMUFD_OBJ_HWPT_PAGING. Once IOMMUFD_OBJ_HWPT_NESTED gets introduced in a following patch, keeping them unconditionally in the common routine will not work.
Wrap all of those PAGING-only configurations together into helpers. Do a hwpt_is_paging check whenever calling them or their fallback routines.
Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 110 +++++++++++++++++------- drivers/iommu/iommufd/iommufd_private.h | 5 ++ 2 files changed, 85 insertions(+), 30 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 449b64e6ef53..5bcc15cd54a1 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -325,6 +325,28 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup, return 0; }
+static int iommufd_hwpt_paging_attach(struct iommufd_hw_pagetable *hwpt, + struct iommufd_device *idev) +{ + int rc; + + lockdep_assert_held(&idev->igroup->lock); + + rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev, + &idev->igroup->sw_msi_start); + if (rc) + return rc; + + if (list_empty(&idev->igroup->device_list)) { + rc = iommufd_group_setup_msi(idev->igroup, hwpt); + if (rc) { + iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); + return rc; + } + } + return 0; +} + int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev) { @@ -337,10 +359,11 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, goto err_unlock; }
- rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev, - &idev->igroup->sw_msi_start); - if (rc) - goto err_unlock; + if (hwpt_is_paging(hwpt)) { + rc = iommufd_hwpt_paging_attach(hwpt, idev); + if (rc) + goto err_unlock; + }
/* * Only attach to the group once for the first device that is in the @@ -350,10 +373,6 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, * attachment. */ if (list_empty(&idev->igroup->device_list)) { - rc = iommufd_group_setup_msi(idev->igroup, hwpt); - if (rc) - goto err_unresv; - rc = iommu_attach_group(hwpt->domain, idev->igroup->group); if (rc) goto err_unresv; @@ -364,7 +383,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, mutex_unlock(&idev->igroup->lock); return 0; err_unresv: - iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); + if (hwpt_is_paging(hwpt)) + iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); err_unlock: mutex_unlock(&idev->igroup->lock); return rc; @@ -381,7 +401,8 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev) iommu_detach_group(hwpt->domain, idev->igroup->group); idev->igroup->hwpt = NULL; } - iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); + if (hwpt_is_paging(hwpt)) + iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); mutex_unlock(&idev->igroup->lock);
/* Caller must destroy hwpt */ @@ -400,13 +421,51 @@ iommufd_device_do_attach(struct iommufd_device *idev, return NULL; }
+static void iommufd_group_remove_reserved_iova(struct iommufd_group *igroup, + struct iommufd_hw_pagetable *hwpt) +{ + struct iommufd_device *cur; + + lockdep_assert_held(&igroup->lock); + + list_for_each_entry(cur, &igroup->device_list, group_item) + iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev); +} + +static int iommufd_group_do_replace_paging(struct iommufd_group *igroup, + struct iommufd_hw_pagetable *hwpt) +{ + struct iommufd_hw_pagetable *old_hwpt = igroup->hwpt; + struct iommufd_device *cur; + int rc; + + lockdep_assert_held(&igroup->lock); + + if (hwpt_is_paging(old_hwpt) && hwpt->ioas != old_hwpt->ioas) { + list_for_each_entry(cur, &igroup->device_list, group_item) { + rc = iopt_table_enforce_dev_resv_regions( + &hwpt->ioas->iopt, cur->dev, NULL); + if (rc) + goto err_unresv; + } + } + + rc = iommufd_group_setup_msi(igroup, hwpt); + if (rc) + goto err_unresv; + return 0; + +err_unresv: + iommufd_group_remove_reserved_iova(igroup, hwpt); + return rc; +} + static struct iommufd_hw_pagetable * iommufd_device_do_replace(struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt) { struct iommufd_group *igroup = idev->igroup; struct iommufd_hw_pagetable *old_hwpt; - struct iommufd_device *cur; unsigned int num_devices; int rc;
@@ -422,29 +481,20 @@ iommufd_device_do_replace(struct iommufd_device *idev, return NULL; }
- old_hwpt = igroup->hwpt; - if (hwpt->ioas != old_hwpt->ioas) { - list_for_each_entry(cur, &igroup->device_list, group_item) { - rc = iopt_table_enforce_dev_resv_regions( - &hwpt->ioas->iopt, cur->dev, NULL); - if (rc) - goto err_unresv; - } + if (hwpt_is_paging(hwpt)) { + rc = iommufd_group_do_replace_paging(igroup, hwpt); + if (rc) + goto err_unlock; }
- rc = iommufd_group_setup_msi(idev->igroup, hwpt); - if (rc) - goto err_unresv; - rc = iommu_group_replace_domain(igroup->group, hwpt->domain); if (rc) goto err_unresv;
- if (hwpt->ioas != old_hwpt->ioas) { - list_for_each_entry(cur, &igroup->device_list, group_item) - iopt_remove_reserved_iova(&old_hwpt->ioas->iopt, - cur->dev); - } + old_hwpt = igroup->hwpt; + if (hwpt_is_paging(old_hwpt) && + (!hwpt_is_paging(hwpt) || hwpt->ioas != old_hwpt->ioas)) + iommufd_group_remove_reserved_iova(igroup, old_hwpt);
igroup->hwpt = hwpt;
@@ -462,8 +512,8 @@ iommufd_device_do_replace(struct iommufd_device *idev, /* Caller must destroy old_hwpt */ return old_hwpt; err_unresv: - list_for_each_entry(cur, &igroup->device_list, group_item) - iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev); + if (hwpt_is_paging(hwpt)) + iommufd_group_remove_reserved_iova(igroup, hwpt); err_unlock: mutex_unlock(&idev->igroup->lock); return ERR_PTR(rc); diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 70bebad63a74..776dd41c077f 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -252,6 +252,11 @@ struct iommufd_hw_pagetable { struct list_head hwpt_item; };
+static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt) +{ + return hwpt->obj.type == IOMMUFD_OBJ_HWPT_PAGING; +} + static inline struct iommufd_hw_pagetable * iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id) {
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, October 24, 2023 11:06 PM
+static int iommufd_group_do_replace_paging(struct iommufd_group *igroup,
struct iommufd_hw_pagetable
*hwpt) +{
- struct iommufd_hw_pagetable *old_hwpt = igroup->hwpt;
- struct iommufd_device *cur;
- int rc;
- lockdep_assert_held(&igroup->lock);
- if (hwpt_is_paging(old_hwpt) && hwpt->ioas != old_hwpt->ioas) {
list_for_each_entry(cur, &igroup->device_list, group_item) {
rc = iopt_table_enforce_dev_resv_regions(
&hwpt->ioas->iopt, cur->dev, NULL);
if (rc)
goto err_unresv;
}
should be:
if (!hwpt_is_paging(old_hwpt) || hwpt->ioas != old_hwpt->ioas) { ...
On 2023/10/25 14:46, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, October 24, 2023 11:06 PM
+static int iommufd_group_do_replace_paging(struct iommufd_group *igroup,
struct iommufd_hw_pagetable
*hwpt) +{
- struct iommufd_hw_pagetable *old_hwpt = igroup->hwpt;
- struct iommufd_device *cur;
- int rc;
- lockdep_assert_held(&igroup->lock);
- if (hwpt_is_paging(old_hwpt) && hwpt->ioas != old_hwpt->ioas) {
list_for_each_entry(cur, &igroup->device_list, group_item) {
rc = iopt_table_enforce_dev_resv_regions(
&hwpt->ioas->iopt, cur->dev, NULL);
if (rc)
goto err_unresv;
}
should be:
if (!hwpt_is_paging(old_hwpt) || hwpt->ioas != old_hwpt->ioas) { ...
oh, yes. The original logic is to add resv region when the ioas are different between new and old hwpts. But now, if the old hwpt is not paging, then it's already needed to add resv regions in the ioas.
Regards, Yi Liu
From: Nicolin Chen nicolinc@nvidia.com
To prepare for IOMMUFD_OBJ_HWPT_NESTED, derive struct iommufd_hwpt_paging from struct iommufd_hw_pagetable, by leaving the common members in struct iommufd_hw_pagetable. Add a __iommufd_object_alloc and to_hwpt_paging() helpers for the new structure.
Then, update "hwpt" to "hwpt_paging" throughout the files, accordingly.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 74 ++++++++------ drivers/iommu/iommufd/hw_pagetable.c | 129 +++++++++++++----------- drivers/iommu/iommufd/iommufd_private.h | 41 +++++--- drivers/iommu/iommufd/main.c | 4 +- drivers/iommu/iommufd/vfio_compat.c | 6 +- 5 files changed, 148 insertions(+), 106 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 5bcc15cd54a1..902288ca910d 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -293,7 +293,7 @@ u32 iommufd_device_to_id(struct iommufd_device *idev) EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
static int iommufd_group_setup_msi(struct iommufd_group *igroup, - struct iommufd_hw_pagetable *hwpt) + struct iommufd_hwpt_paging *hwpt_paging) { phys_addr_t sw_msi_start = igroup->sw_msi_start; int rc; @@ -311,8 +311,9 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup, * matches what the IRQ layer actually expects in a newly created * domain. */ - if (sw_msi_start != PHYS_ADDR_MAX && !hwpt->msi_cookie) { - rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start); + if (sw_msi_start != PHYS_ADDR_MAX && !hwpt_paging->msi_cookie) { + rc = iommu_get_msi_cookie(hwpt_paging->common.domain, + sw_msi_start); if (rc) return rc;
@@ -320,27 +321,29 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup, * iommu_get_msi_cookie() can only be called once per domain, * it returns -EBUSY on later calls. */ - hwpt->msi_cookie = true; + hwpt_paging->msi_cookie = true; } return 0; }
-static int iommufd_hwpt_paging_attach(struct iommufd_hw_pagetable *hwpt, +static int iommufd_hwpt_paging_attach(struct iommufd_hwpt_paging *hwpt_paging, struct iommufd_device *idev) { int rc;
lockdep_assert_held(&idev->igroup->lock);
- rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev, + rc = iopt_table_enforce_dev_resv_regions(&hwpt_paging->ioas->iopt, + idev->dev, &idev->igroup->sw_msi_start); if (rc) return rc;
if (list_empty(&idev->igroup->device_list)) { - rc = iommufd_group_setup_msi(idev->igroup, hwpt); + rc = iommufd_group_setup_msi(idev->igroup, hwpt_paging); if (rc) { - iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); + iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, + idev->dev); return rc; } } @@ -360,7 +363,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, }
if (hwpt_is_paging(hwpt)) { - rc = iommufd_hwpt_paging_attach(hwpt, idev); + rc = iommufd_hwpt_paging_attach(to_hwpt_paging(hwpt), idev); if (rc) goto err_unlock; } @@ -384,7 +387,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, return 0; err_unresv: if (hwpt_is_paging(hwpt)) - iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); + iopt_remove_reserved_iova(&to_hwpt_paging(hwpt)->ioas->iopt, + idev->dev); err_unlock: mutex_unlock(&idev->igroup->lock); return rc; @@ -402,7 +406,8 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev) idev->igroup->hwpt = NULL; } if (hwpt_is_paging(hwpt)) - iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); + iopt_remove_reserved_iova(&to_hwpt_paging(hwpt)->ioas->iopt, + idev->dev); mutex_unlock(&idev->igroup->lock);
/* Caller must destroy hwpt */ @@ -421,19 +426,20 @@ iommufd_device_do_attach(struct iommufd_device *idev, return NULL; }
-static void iommufd_group_remove_reserved_iova(struct iommufd_group *igroup, - struct iommufd_hw_pagetable *hwpt) +static void +iommufd_group_remove_reserved_iova(struct iommufd_group *igroup, + struct iommufd_hwpt_paging *hwpt_paging) { struct iommufd_device *cur;
lockdep_assert_held(&igroup->lock);
list_for_each_entry(cur, &igroup->device_list, group_item) - iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev); + iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, cur->dev); }
static int iommufd_group_do_replace_paging(struct iommufd_group *igroup, - struct iommufd_hw_pagetable *hwpt) + struct iommufd_hwpt_paging *hwpt_paging) { struct iommufd_hw_pagetable *old_hwpt = igroup->hwpt; struct iommufd_device *cur; @@ -441,22 +447,23 @@ static int iommufd_group_do_replace_paging(struct iommufd_group *igroup,
lockdep_assert_held(&igroup->lock);
- if (hwpt_is_paging(old_hwpt) && hwpt->ioas != old_hwpt->ioas) { + if (hwpt_is_paging(old_hwpt) && + hwpt_paging->ioas != to_hwpt_paging(old_hwpt)->ioas) { list_for_each_entry(cur, &igroup->device_list, group_item) { rc = iopt_table_enforce_dev_resv_regions( - &hwpt->ioas->iopt, cur->dev, NULL); + &hwpt_paging->ioas->iopt, cur->dev, NULL); if (rc) goto err_unresv; } }
- rc = iommufd_group_setup_msi(igroup, hwpt); + rc = iommufd_group_setup_msi(igroup, hwpt_paging); if (rc) goto err_unresv; return 0;
err_unresv: - iommufd_group_remove_reserved_iova(igroup, hwpt); + iommufd_group_remove_reserved_iova(igroup, hwpt_paging); return rc; }
@@ -482,7 +489,8 @@ iommufd_device_do_replace(struct iommufd_device *idev, }
if (hwpt_is_paging(hwpt)) { - rc = iommufd_group_do_replace_paging(igroup, hwpt); + rc = iommufd_group_do_replace_paging(igroup, + to_hwpt_paging(hwpt)); if (rc) goto err_unlock; } @@ -493,8 +501,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
old_hwpt = igroup->hwpt; if (hwpt_is_paging(old_hwpt) && - (!hwpt_is_paging(hwpt) || hwpt->ioas != old_hwpt->ioas)) - iommufd_group_remove_reserved_iova(igroup, old_hwpt); + (!hwpt_is_paging(hwpt) || + to_hwpt_paging(hwpt)->ioas != to_hwpt_paging(old_hwpt)->ioas)) + iommufd_group_remove_reserved_iova(igroup, + to_hwpt_paging(old_hwpt));
igroup->hwpt = hwpt;
@@ -513,7 +523,8 @@ iommufd_device_do_replace(struct iommufd_device *idev, return old_hwpt; err_unresv: if (hwpt_is_paging(hwpt)) - iommufd_group_remove_reserved_iova(igroup, hwpt); + iommufd_group_remove_reserved_iova(igroup, + to_hwpt_paging(old_hwpt)); err_unlock: mutex_unlock(&idev->igroup->lock); return ERR_PTR(rc); @@ -541,6 +552,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, */ bool immediate_attach = do_attach == iommufd_device_do_attach; struct iommufd_hw_pagetable *destroy_hwpt; + struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_hw_pagetable *hwpt;
/* @@ -549,10 +561,11 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, * other. */ mutex_lock(&ioas->mutex); - list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) { - if (!hwpt->auto_domain) + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) { + if (!hwpt_paging->auto_domain) continue;
+ hwpt = &hwpt_paging->common; if (!iommufd_lock_obj(&hwpt->obj)) continue; destroy_hwpt = (*do_attach)(idev, hwpt); @@ -573,12 +586,13 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, goto out_unlock; }
- hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev, - 0, immediate_attach); - if (IS_ERR(hwpt)) { - destroy_hwpt = ERR_CAST(hwpt); + hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, + 0, immediate_attach); + if (IS_ERR(hwpt_paging)) { + destroy_hwpt = ERR_CAST(hwpt_paging); goto out_unlock; } + hwpt = &hwpt_paging->common;
if (!immediate_attach) { destroy_hwpt = (*do_attach)(idev, hwpt); @@ -588,7 +602,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, destroy_hwpt = NULL; }
- hwpt->auto_domain = true; + hwpt_paging->auto_domain = true; *pt_id = hwpt->obj.id;
iommufd_object_finalize(idev->ictx, &hwpt->obj); diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 1c13c3b6368c..ddce4189753f 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -8,56 +8,61 @@ #include "../iommu-priv.h" #include "iommufd_private.h"
-void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) +void iommufd_hwpt_paging_destroy(struct iommufd_object *obj) { - struct iommufd_hw_pagetable *hwpt = - container_of(obj, struct iommufd_hw_pagetable, obj); + struct iommufd_hwpt_paging *hwpt_paging = + container_of(obj, struct iommufd_hwpt_paging, common.obj);
- if (!list_empty(&hwpt->hwpt_item)) { - mutex_lock(&hwpt->ioas->mutex); - list_del(&hwpt->hwpt_item); - mutex_unlock(&hwpt->ioas->mutex); + if (!list_empty(&hwpt_paging->hwpt_item)) { + mutex_lock(&hwpt_paging->ioas->mutex); + list_del(&hwpt_paging->hwpt_item); + mutex_unlock(&hwpt_paging->ioas->mutex);
- iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain); + iopt_table_remove_domain(&hwpt_paging->ioas->iopt, + hwpt_paging->common.domain); }
- if (hwpt->domain) - iommu_domain_free(hwpt->domain); + if (hwpt_paging->common.domain) + iommu_domain_free(hwpt_paging->common.domain);
- refcount_dec(&hwpt->ioas->obj.users); + refcount_dec(&hwpt_paging->ioas->obj.users); }
-void iommufd_hw_pagetable_abort(struct iommufd_object *obj) +void iommufd_hwpt_paging_abort(struct iommufd_object *obj) { - struct iommufd_hw_pagetable *hwpt = - container_of(obj, struct iommufd_hw_pagetable, obj); + struct iommufd_hwpt_paging *hwpt_paging = + container_of(obj, struct iommufd_hwpt_paging, common.obj);
/* The ioas->mutex must be held until finalize is called. */ - lockdep_assert_held(&hwpt->ioas->mutex); + lockdep_assert_held(&hwpt_paging->ioas->mutex);
- if (!list_empty(&hwpt->hwpt_item)) { - list_del_init(&hwpt->hwpt_item); - iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain); + if (!list_empty(&hwpt_paging->hwpt_item)) { + list_del_init(&hwpt_paging->hwpt_item); + iopt_table_remove_domain(&hwpt_paging->ioas->iopt, + hwpt_paging->common.domain); } - iommufd_hw_pagetable_destroy(obj); + iommufd_hwpt_paging_destroy(obj); }
-static int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) +static int +iommufd_hwpt_paging_enforce_cc(struct iommufd_hwpt_paging *hwpt_paging) { - if (hwpt->enforce_cache_coherency) + struct iommu_domain *paging_domain = hwpt_paging->common.domain; + + if (hwpt_paging->enforce_cache_coherency) return 0;
- if (hwpt->domain->ops->enforce_cache_coherency) - hwpt->enforce_cache_coherency = - hwpt->domain->ops->enforce_cache_coherency( - hwpt->domain); - if (!hwpt->enforce_cache_coherency) + if (paging_domain->ops->enforce_cache_coherency) + hwpt_paging->enforce_cache_coherency = + paging_domain->ops->enforce_cache_coherency( + paging_domain); + if (!hwpt_paging->enforce_cache_coherency) return -EINVAL; return 0; }
/** - * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device + * iommufd_hwpt_paging_alloc() - Get a PAGING iommu_domain for a device * @ictx: iommufd context * @ioas: IOAS to associate the domain with * @idev: Device to get an iommu_domain for @@ -72,12 +77,13 @@ static int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on * the returned hwpt. */ -struct iommufd_hw_pagetable * -iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, - struct iommufd_device *idev, u32 flags, - bool immediate_attach) +struct iommufd_hwpt_paging * +iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, + struct iommufd_device *idev, u32 flags, + bool immediate_attach) { const struct iommu_ops *ops = dev_iommu_ops(idev->dev); + struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_hw_pagetable *hwpt; int rc;
@@ -86,14 +92,17 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, if (flags && !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP);
- hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HWPT_PAGING); - if (IS_ERR(hwpt)) - return hwpt; + hwpt_paging = __iommufd_object_alloc(ictx, hwpt_paging, + IOMMUFD_OBJ_HWPT_PAGING, + common.obj); + if (IS_ERR(hwpt_paging)) + return ERR_CAST(hwpt_paging); + hwpt = &hwpt_paging->common;
- INIT_LIST_HEAD(&hwpt->hwpt_item); + INIT_LIST_HEAD(&hwpt_paging->hwpt_item); /* Pairs with iommufd_hw_pagetable_destroy() */ refcount_inc(&ioas->obj.users); - hwpt->ioas = ioas; + hwpt_paging->ioas = ioas;
if (ops->domain_alloc_user) { hwpt->domain = ops->domain_alloc_user(idev->dev, flags, @@ -126,7 +135,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, * allocate a separate HWPT (non-CC). */ if (idev->enforce_cache_coherency) { - rc = iommufd_hw_pagetable_enforce_cc(hwpt); + rc = iommufd_hwpt_paging_enforce_cc(hwpt_paging); if (WARN_ON(rc)) goto out_abort; } @@ -143,11 +152,11 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, goto out_abort; }
- rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); + rc = iopt_table_add_domain(&ioas->iopt, hwpt->domain); if (rc) goto out_detach; - list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list); - return hwpt; + list_add_tail(&hwpt_paging->hwpt_item, &ioas->hwpt_list); + return hwpt_paging;
out_detach: if (immediate_attach) @@ -160,6 +169,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd; + struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_hw_pagetable *hwpt; struct iommufd_device *idev; struct iommufd_ioas *ioas; @@ -181,12 +191,13 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) }
mutex_lock(&ioas->mutex); - hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, - idev, cmd->flags, false); - if (IS_ERR(hwpt)) { - rc = PTR_ERR(hwpt); + hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev, + cmd->flags, false); + if (IS_ERR(hwpt_paging)) { + rc = PTR_ERR(hwpt_paging); goto out_unlock; } + hwpt = &hwpt_paging->common;
cmd->out_hwpt_id = hwpt->obj.id; rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); @@ -208,7 +219,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) int iommufd_hwpt_set_dirty_tracking(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_set_dirty_tracking *cmd = ucmd->cmd; - struct iommufd_hw_pagetable *hwpt; + struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_ioas *ioas; int rc = -EOPNOTSUPP; bool enable; @@ -216,23 +227,24 @@ int iommufd_hwpt_set_dirty_tracking(struct iommufd_ucmd *ucmd) if (cmd->flags & ~IOMMU_HWPT_DIRTY_TRACKING_ENABLE) return rc;
- hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id); - if (IS_ERR(hwpt)) - return PTR_ERR(hwpt); + hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id); + if (IS_ERR(hwpt_paging)) + return PTR_ERR(hwpt_paging);
- ioas = hwpt->ioas; + ioas = hwpt_paging->ioas; enable = cmd->flags & IOMMU_HWPT_DIRTY_TRACKING_ENABLE;
- rc = iopt_set_dirty_tracking(&ioas->iopt, hwpt->domain, enable); + rc = iopt_set_dirty_tracking(&ioas->iopt, + hwpt_paging->common.domain, enable);
- iommufd_put_object(&hwpt->obj); + iommufd_put_object(&hwpt_paging->common.obj); return rc; }
int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_get_dirty_bitmap *cmd = ucmd->cmd; - struct iommufd_hw_pagetable *hwpt; + struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_ioas *ioas; int rc = -EOPNOTSUPP;
@@ -240,14 +252,15 @@ int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd) cmd->__reserved) return -EOPNOTSUPP;
- hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id); - if (IS_ERR(hwpt)) - return PTR_ERR(hwpt); + hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id); + if (IS_ERR(hwpt_paging)) + return PTR_ERR(hwpt_paging);
- ioas = hwpt->ioas; - rc = iopt_read_and_clear_dirty_data(&ioas->iopt, hwpt->domain, + ioas = hwpt_paging->ioas; + rc = iopt_read_and_clear_dirty_data(&ioas->iopt, + hwpt_paging->common.domain, cmd->flags, cmd);
- iommufd_put_object(&hwpt->obj); + iommufd_put_object(&hwpt_paging->common.obj); return rc; } diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 776dd41c077f..6493fcdad1b7 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -181,7 +181,7 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size, enum iommufd_object_type type);
-#define iommufd_object_alloc(ictx, ptr, type) \ +#define __iommufd_object_alloc(ictx, ptr, type, obj) \ container_of(_iommufd_object_alloc( \ ictx, \ sizeof(*(ptr)) + BUILD_BUG_ON_ZERO( \ @@ -190,6 +190,9 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, type), \ typeof(*(ptr)), obj)
+#define iommufd_object_alloc(ictx, ptr, type) \ + __iommufd_object_alloc(ictx, ptr, type, obj) + /* * The IO Address Space (IOAS) pagetable is a virtual page table backed by the * io_pagetable object. It is a user controlled mapping of IOVA -> PFNs. The @@ -243,8 +246,12 @@ int iommufd_check_iova_range(struct io_pagetable *iopt, */ struct iommufd_hw_pagetable { struct iommufd_object obj; - struct iommufd_ioas *ioas; struct iommu_domain *domain; +}; + +struct iommufd_hwpt_paging { + struct iommufd_hw_pagetable common; + struct iommufd_ioas *ioas; bool auto_domain : 1; bool enforce_cache_coherency : 1; bool msi_cookie : 1; @@ -257,33 +264,41 @@ static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt) return hwpt->obj.type == IOMMUFD_OBJ_HWPT_PAGING; }
-static inline struct iommufd_hw_pagetable * -iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id) +static inline struct iommufd_hwpt_paging * +to_hwpt_paging(struct iommufd_hw_pagetable *hwpt) +{ + return container_of(hwpt, struct iommufd_hwpt_paging, common); +} + +static inline struct iommufd_hwpt_paging * +iommufd_get_hwpt_paging(struct iommufd_ucmd *ucmd, u32 id) { return container_of(iommufd_get_object(ucmd->ictx, id, IOMMUFD_OBJ_HWPT_PAGING), - struct iommufd_hw_pagetable, obj); + struct iommufd_hwpt_paging, common.obj); } int iommufd_hwpt_set_dirty_tracking(struct iommufd_ucmd *ucmd); int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd);
-struct iommufd_hw_pagetable * -iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, - struct iommufd_device *idev, u32 flags, - bool immediate_attach); +struct iommufd_hwpt_paging * +iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, + struct iommufd_device *idev, u32 flags, + bool immediate_attach); int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev); struct iommufd_hw_pagetable * iommufd_hw_pagetable_detach(struct iommufd_device *idev); -void iommufd_hw_pagetable_destroy(struct iommufd_object *obj); -void iommufd_hw_pagetable_abort(struct iommufd_object *obj); +void iommufd_hwpt_paging_destroy(struct iommufd_object *obj); +void iommufd_hwpt_paging_abort(struct iommufd_object *obj); int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt) { - lockdep_assert_not_held(&hwpt->ioas->mutex); - if (hwpt->auto_domain) + struct iommufd_hwpt_paging *hwpt_paging = to_hwpt_paging(hwpt); + + lockdep_assert_not_held(&hwpt_paging->ioas->mutex); + if (hwpt_paging->auto_domain) iommufd_object_deref_user(ictx, &hwpt->obj); else refcount_dec(&hwpt->obj.users); diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 46198e8948d6..ab6675a7f6b4 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -489,8 +489,8 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { .destroy = iommufd_ioas_destroy, }, [IOMMUFD_OBJ_HWPT_PAGING] = { - .destroy = iommufd_hw_pagetable_destroy, - .abort = iommufd_hw_pagetable_abort, + .destroy = iommufd_hwpt_paging_destroy, + .abort = iommufd_hwpt_paging_abort, }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = { diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c index 6c810bf80f99..538fbf76354d 100644 --- a/drivers/iommu/iommufd/vfio_compat.c +++ b/drivers/iommu/iommufd/vfio_compat.c @@ -255,7 +255,7 @@ static int iommufd_vfio_unmap_dma(struct iommufd_ctx *ictx, unsigned int cmd,
static int iommufd_vfio_cc_iommu(struct iommufd_ctx *ictx) { - struct iommufd_hw_pagetable *hwpt; + struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_ioas *ioas; int rc = 1;
@@ -264,8 +264,8 @@ static int iommufd_vfio_cc_iommu(struct iommufd_ctx *ictx) return PTR_ERR(ioas);
mutex_lock(&ioas->mutex); - list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) { - if (!hwpt->enforce_cache_coherency) { + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) { + if (!hwpt_paging->enforce_cache_coherency) { rc = 0; break; }
On Tue, Oct 24, 2023 at 08:06:04AM -0700, Yi Liu wrote:
static int iommufd_group_do_replace_paging(struct iommufd_group *igroup,
struct iommufd_hw_pagetable *hwpt)
struct iommufd_hwpt_paging *hwpt_paging)
{ struct iommufd_hw_pagetable *old_hwpt = igroup->hwpt; struct iommufd_device *cur; @@ -441,22 +447,23 @@ static int iommufd_group_do_replace_paging(struct iommufd_group *igroup, lockdep_assert_held(&igroup->lock);
- if (hwpt_is_paging(old_hwpt) && hwpt->ioas != old_hwpt->ioas) {
- if (hwpt_is_paging(old_hwpt) &&
list_for_each_entry(cur, &igroup->device_list, group_item) { rc = iopt_table_enforce_dev_resv_regions(hwpt_paging->ioas != to_hwpt_paging(old_hwpt)->ioas) {
&hwpt->ioas->iopt, cur->dev, NULL);
} }&hwpt_paging->ioas->iopt, cur->dev, NULL); if (rc) goto err_unresv;
- rc = iommufd_group_setup_msi(igroup, hwpt);
- rc = iommufd_group_setup_msi(igroup, hwpt_paging); if (rc) goto err_unresv; return 0;
err_unresv:
- iommufd_group_remove_reserved_iova(igroup, hwpt);
- iommufd_group_remove_reserved_iova(igroup, hwpt_paging); return rc;
} @@ -482,7 +489,8 @@ iommufd_device_do_replace(struct iommufd_device *idev, } if (hwpt_is_paging(hwpt)) {
rc = iommufd_group_do_replace_paging(igroup, hwpt);
rc = iommufd_group_do_replace_paging(igroup,
if (rc) goto err_unlock; }to_hwpt_paging(hwpt));
@@ -493,8 +501,10 @@ iommufd_device_do_replace(struct iommufd_device *idev, old_hwpt = igroup->hwpt; if (hwpt_is_paging(old_hwpt) &&
(!hwpt_is_paging(hwpt) || hwpt->ioas != old_hwpt->ioas))
iommufd_group_remove_reserved_iova(igroup, old_hwpt);
(!hwpt_is_paging(hwpt) ||
to_hwpt_paging(hwpt)->ioas != to_hwpt_paging(old_hwpt)->ioas))
iommufd_group_remove_reserved_iova(igroup,
to_hwpt_paging(old_hwpt));
igroup->hwpt = hwpt; @@ -513,7 +523,8 @@ iommufd_device_do_replace(struct iommufd_device *idev, return old_hwpt; err_unresv: if (hwpt_is_paging(hwpt))
iommufd_group_remove_reserved_iova(igroup, hwpt);
iommufd_group_remove_reserved_iova(igroup,
to_hwpt_paging(old_hwpt));
This gets a compiler warning:
../drivers/iommu/iommufd/device.c:527:25: warning: variable 'old_hwpt' is uninitialized when used here [-Wuninitialized] to_hwpt_paging(old_hwpt)); ^~~~~~~~ ../drivers/iommu/iommufd/device.c:475:39: note: initialize the variable 'old_hwpt' to silence this warning struct iommufd_hw_pagetable *old_hwpt; ^ = NULL
I fixed it with:
--- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -488,6 +488,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, return NULL; }
+ old_hwpt = igroup->hwpt; if (hwpt_is_paging(hwpt)) { rc = iommufd_group_do_replace_paging(igroup, to_hwpt_paging(hwpt)); @@ -499,7 +500,6 @@ iommufd_device_do_replace(struct iommufd_device *idev, if (rc) goto err_unresv;
- old_hwpt = igroup->hwpt; if (hwpt_is_paging(old_hwpt) && (!hwpt_is_paging(hwpt) || to_hwpt_paging(hwpt)->ioas != to_hwpt_paging(old_hwpt)->ioas))
Jason
From: Nicolin Chen nicolinc@nvidia.com
Allow iommufd_hwpt_alloc() to have a common routine but jump to different allocators corresponding to different user input pt_obj types, either an IOMMUFD_OBJ_IOAS for a PAGING hwpt or an IOMMUFD_OBJ_HWPT_PAGING as the parent for a NESTED hwpt.
Also, move the "flags" validation to the hwpt allocator (paging), so that later the hwpt_nested allocator can do its own separate flags validation.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 46 ++++++++++++++++++---------- 1 file changed, 29 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index ddce4189753f..fbe09e2473bc 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -82,6 +82,8 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, struct iommufd_device *idev, u32 flags, bool immediate_attach) { + const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT | + IOMMU_HWPT_ALLOC_DIRTY_TRACKING; const struct iommu_ops *ops = dev_iommu_ops(idev->dev); struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_hw_pagetable *hwpt; @@ -91,6 +93,8 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
if (flags && !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); + if (flags & ~valid_flags) + return ERR_PTR(-EOPNOTSUPP);
hwpt_paging = __iommufd_object_alloc(ictx, hwpt_paging, IOMMUFD_OBJ_HWPT_PAGING, @@ -169,35 +173,41 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd; - struct iommufd_hwpt_paging *hwpt_paging; struct iommufd_hw_pagetable *hwpt; + struct iommufd_ioas *ioas = NULL; + struct iommufd_object *pt_obj; struct iommufd_device *idev; - struct iommufd_ioas *ioas; int rc;
- if ((cmd->flags & ~(IOMMU_HWPT_ALLOC_NEST_PARENT | - IOMMU_HWPT_ALLOC_DIRTY_TRACKING)) || - cmd->__reserved) + if (cmd->__reserved) return -EOPNOTSUPP;
idev = iommufd_get_device(ucmd, cmd->dev_id); if (IS_ERR(idev)) return PTR_ERR(idev);
- ioas = iommufd_get_ioas(ucmd->ictx, cmd->pt_id); - if (IS_ERR(ioas)) { - rc = PTR_ERR(ioas); + pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY); + if (IS_ERR(pt_obj)) { + rc = -EINVAL; goto out_put_idev; }
- mutex_lock(&ioas->mutex); - hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev, - cmd->flags, false); - if (IS_ERR(hwpt_paging)) { - rc = PTR_ERR(hwpt_paging); - goto out_unlock; + if (pt_obj->type == IOMMUFD_OBJ_IOAS) { + struct iommufd_hwpt_paging *hwpt_paging; + + ioas = container_of(pt_obj, struct iommufd_ioas, obj); + mutex_lock(&ioas->mutex); + hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev, + cmd->flags, false); + if (IS_ERR(hwpt_paging)) { + rc = PTR_ERR(hwpt_paging); + goto out_unlock; + } + hwpt = &hwpt_paging->common; + } else { + rc = -EINVAL; + goto out_put_pt; } - hwpt = &hwpt_paging->common;
cmd->out_hwpt_id = hwpt->obj.id; rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); @@ -209,8 +219,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) out_hwpt: iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj); out_unlock: - mutex_unlock(&ioas->mutex); - iommufd_put_object(&ioas->obj); + if (ioas) + mutex_unlock(&ioas->mutex); +out_put_pt: + iommufd_put_object(pt_obj); out_put_idev: iommufd_put_object(&idev->obj); return rc;
From: Nicolin Chen nicolinc@nvidia.com
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce. But it can only allocate a hw_pagetable that associates to a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMUFD_OBJ_HWPT_PAGING type.
IOMMU drivers can now support user-managed hw_pagetables, for two-stage translation use cases that require user data input from the user space.
Add a new IOMMUFD_OBJ_HWPT_NESTED type with its abort/destroy(). Pair it with a new iommufd_hwpt_nested structure and its to_hwpt_nested() helper. Update the to_hwpt_paging() helper, so a NESTED-type hw_pagetable can be handled in the callers, for example iommufd_hw_pagetable_enforce_rr().
Screen the inputs including the parent PAGING-type hw_pagetable that has a need of a new nest_parent flag in the iommufd_hwpt_paging structure.
Extend the IOMMU_HWPT_ALLOC ioctl to accept an IOMMU driver specific data input. Also, update the @pt_id to accept hwpt_id too besides an ioas_id. Then, use them to allocate a hw_pagetable of IOMMUFD_OBJ_HWPT_NESTED type using the iommufd_hw_pagetable_alloc_nested() allocator.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 1 + drivers/iommu/iommufd/hw_pagetable.c | 105 ++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 29 +++++-- drivers/iommu/iommufd/main.c | 4 + include/uapi/linux/iommufd.h | 23 +++++- 5 files changed, 154 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 902288ca910d..6dbd58a718e4 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -627,6 +627,7 @@ static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id, return PTR_ERR(pt_obj);
switch (pt_obj->type) { + case IOMMUFD_OBJ_HWPT_NESTED: case IOMMUFD_OBJ_HWPT_PAGING: { struct iommufd_hw_pagetable *hwpt = container_of(pt_obj, struct iommufd_hw_pagetable, obj); diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index fbe09e2473bc..d41428a0fdc2 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -44,6 +44,22 @@ void iommufd_hwpt_paging_abort(struct iommufd_object *obj) iommufd_hwpt_paging_destroy(obj); }
+void iommufd_hwpt_nested_destroy(struct iommufd_object *obj) +{ + struct iommufd_hwpt_nested *hwpt_nested = + container_of(obj, struct iommufd_hwpt_nested, common.obj); + + if (hwpt_nested->common.domain) + iommu_domain_free(hwpt_nested->common.domain); + + refcount_dec(&hwpt_nested->parent->common.obj.users); +} + +void iommufd_hwpt_nested_abort(struct iommufd_object *obj) +{ + iommufd_hwpt_nested_destroy(obj); +} + static int iommufd_hwpt_paging_enforce_cc(struct iommufd_hwpt_paging *hwpt_paging) { @@ -107,6 +123,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, /* Pairs with iommufd_hw_pagetable_destroy() */ refcount_inc(&ioas->obj.users); hwpt_paging->ioas = ioas; + hwpt_paging->nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
if (ops->domain_alloc_user) { hwpt->domain = ops->domain_alloc_user(idev->dev, flags, @@ -170,6 +187,73 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, return ERR_PTR(rc); }
+/** + * iommufd_hwpt_nested_alloc() - Get a NESTED iommu_domain for a device + * @ictx: iommufd context + * @parent: Parent PAGING-type hwpt to associate the domain with + * @idev: Device to get an iommu_domain for + * @flags: Flags from userspace + * @user_data: user_data pointer. Must be valid + * + * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and return it as + * a NESTED hw_pagetable. The given parent PAGING-type hwpt must be capable of + * being a parent. + */ +static struct iommufd_hwpt_nested * +iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, + struct iommufd_hwpt_paging *parent, + struct iommufd_device *idev, u32 flags, + const struct iommu_user_data *user_data) +{ + const struct iommu_ops *ops = dev_iommu_ops(idev->dev); + struct iommufd_hwpt_nested *hwpt_nested; + struct iommufd_hw_pagetable *hwpt; + int rc; + + if (flags != 0) + return ERR_PTR(-EOPNOTSUPP); + if (!user_data) + return ERR_PTR(-EINVAL); + if (user_data->type == IOMMU_HWPT_DATA_NONE) + return ERR_PTR(-EINVAL); + if (parent->auto_domain) + return ERR_PTR(-EINVAL); + if (!parent->nest_parent) + return ERR_PTR(-EINVAL); + + if (!ops->domain_alloc_user) + return ERR_PTR(-EOPNOTSUPP); + + hwpt_nested = __iommufd_object_alloc(ictx, hwpt_nested, + IOMMUFD_OBJ_HWPT_NESTED, + common.obj); + if (IS_ERR(hwpt_nested)) + return ERR_CAST(hwpt_nested); + hwpt = &hwpt_nested->common; + + refcount_inc(&parent->common.obj.users); + hwpt_nested->parent = parent; + + hwpt->domain = ops->domain_alloc_user(idev->dev, 0, + parent->common.domain, + user_data); + if (IS_ERR(hwpt->domain)) { + rc = PTR_ERR(hwpt->domain); + hwpt->domain = NULL; + goto out_abort; + } + + if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) { + rc = -EINVAL; + goto out_abort; + } + return hwpt_nested; + +out_abort: + iommufd_object_abort_and_destroy(ictx, &hwpt->obj); + return ERR_PTR(rc); +} + int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd; @@ -195,6 +279,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (pt_obj->type == IOMMUFD_OBJ_IOAS) { struct iommufd_hwpt_paging *hwpt_paging;
+ if (cmd->data_type != IOMMU_HWPT_DATA_NONE) { + rc = -EINVAL; + goto out_put_pt; + } ioas = container_of(pt_obj, struct iommufd_ioas, obj); mutex_lock(&ioas->mutex); hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev, @@ -204,6 +292,23 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) goto out_unlock; } hwpt = &hwpt_paging->common; + } else if (pt_obj->type == IOMMUFD_OBJ_HWPT_PAGING) { + const struct iommu_user_data user_data = { + .type = cmd->data_type, + .uptr = u64_to_user_ptr(cmd->data_uptr), + .len = cmd->data_len, + }; + struct iommufd_hwpt_nested *hwpt_nested; + struct iommufd_hwpt_paging *parent; + + parent = container_of(pt_obj, typeof(*parent), common.obj); + hwpt_nested = iommufd_hwpt_nested_alloc(ucmd->ictx, parent, idev, + cmd->flags, &user_data); + if (IS_ERR(hwpt_nested)) { + rc = PTR_ERR(hwpt_nested); + goto out_unlock; + } + hwpt = &hwpt_nested->common; } else { rc = -EINVAL; goto out_put_pt; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 6493fcdad1b7..6fdad56893af 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -124,6 +124,7 @@ enum iommufd_object_type { IOMMUFD_OBJ_ANY = IOMMUFD_OBJ_NONE, IOMMUFD_OBJ_DEVICE, IOMMUFD_OBJ_HWPT_PAGING, + IOMMUFD_OBJ_HWPT_NESTED, IOMMUFD_OBJ_IOAS, IOMMUFD_OBJ_ACCESS, #ifdef CONFIG_IOMMUFD_TEST @@ -255,10 +256,16 @@ struct iommufd_hwpt_paging { bool auto_domain : 1; bool enforce_cache_coherency : 1; bool msi_cookie : 1; + bool nest_parent : 1; /* Head at iommufd_ioas::hwpt_list */ struct list_head hwpt_item; };
+struct iommufd_hwpt_nested { + struct iommufd_hw_pagetable common; + struct iommufd_hwpt_paging *parent; +}; + static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt) { return hwpt->obj.type == IOMMUFD_OBJ_HWPT_PAGING; @@ -290,18 +297,28 @@ struct iommufd_hw_pagetable * iommufd_hw_pagetable_detach(struct iommufd_device *idev); void iommufd_hwpt_paging_destroy(struct iommufd_object *obj); void iommufd_hwpt_paging_abort(struct iommufd_object *obj); +void iommufd_hwpt_nested_destroy(struct iommufd_object *obj); +void iommufd_hwpt_nested_abort(struct iommufd_object *obj); int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt) { - struct iommufd_hwpt_paging *hwpt_paging = to_hwpt_paging(hwpt); + if (WARN_ON(hwpt->obj.type != IOMMUFD_OBJ_HWPT_PAGING && + hwpt->obj.type != IOMMUFD_OBJ_HWPT_NESTED)) + return;
- lockdep_assert_not_held(&hwpt_paging->ioas->mutex); - if (hwpt_paging->auto_domain) - iommufd_object_deref_user(ictx, &hwpt->obj); - else - refcount_dec(&hwpt->obj.users); + if (hwpt->obj.type == IOMMUFD_OBJ_HWPT_PAGING) { + struct iommufd_hwpt_paging *hwpt_paging = to_hwpt_paging(hwpt); + + lockdep_assert_not_held(&hwpt_paging->ioas->mutex); + + if (hwpt_paging->auto_domain) { + iommufd_object_deref_user(ictx, &hwpt->obj); + return; + } + } + refcount_dec(&hwpt->obj.users); }
struct iommufd_group { diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index ab6675a7f6b4..45b9d40773b1 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -492,6 +492,10 @@ static const struct iommufd_object_ops iommufd_object_ops[] = { .destroy = iommufd_hwpt_paging_destroy, .abort = iommufd_hwpt_paging_abort, }, + [IOMMUFD_OBJ_HWPT_NESTED] = { + .destroy = iommufd_hwpt_nested_destroy, + .abort = iommufd_hwpt_nested_abort, + }, #ifdef CONFIG_IOMMUFD_TEST [IOMMUFD_OBJ_SELFTEST] = { .destroy = iommufd_selftest_destroy, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index fccc6315a520..d816deac906f 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -374,15 +374,31 @@ enum iommu_hwpt_data_type { * @size: sizeof(struct iommu_hwpt_alloc) * @flags: Combination of enum iommufd_hwpt_alloc_flags * @dev_id: The device to allocate this HWPT for - * @pt_id: The IOAS to connect this HWPT to + * @pt_id: The IOAS or HWPT to connect this HWPT to * @out_hwpt_id: The ID of the new HWPT * @__reserved: Must be 0 + * @data_type: One of enum iommu_hwpt_data_type + * @data_len: Length of the type specific data + * @data_uptr: User pointer to the type specific data * * Explicitly allocate a hardware page table object. This is the same object * type that is returned by iommufd_device_attach() and represents the * underlying iommu driver's iommu_domain kernel object. * - * A HWPT will be created with the IOVA mappings from the given IOAS. + * A kernel-managed HWPT will be created with the mappings from the given + * IOAS via the @pt_id. The @data_type for this allocation must be set to + * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a + * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags. + * + * A user-managed nested HWPT will be created from a given parent HWPT via + * @pt_id, in which the parent HWPT must be allocated previously via the + * same ioctl from a given IOAS (@pt_id). In this case, the @data_type + * must be set to a pre-defined type corresponding to an I/O page table + * type supported by the underlying IOMMU hardware. + * + * If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and + * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr + * must be given. */ struct iommu_hwpt_alloc { __u32 size; @@ -391,6 +407,9 @@ struct iommu_hwpt_alloc { __u32 pt_id; __u32 out_hwpt_id; __u32 __reserved; + __u32 data_type; + __u32 data_len; + __aligned_u64 data_uptr; }; #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
On Tue, Oct 24, 2023 at 08:06:06AM -0700, Yi Liu wrote:
+static struct iommufd_hwpt_nested * +iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
struct iommufd_hwpt_paging *parent,
struct iommufd_device *idev, u32 flags,
const struct iommu_user_data *user_data)
+{
- const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
- struct iommufd_hwpt_nested *hwpt_nested;
- struct iommufd_hw_pagetable *hwpt;
- int rc;
- if (flags != 0)
return ERR_PTR(-EOPNOTSUPP);
- if (!user_data)
return ERR_PTR(-EINVAL);
user_data is never NULL now, I removed this
- if (user_data->type == IOMMU_HWPT_DATA_NONE)
return ERR_PTR(-EINVAL);
- if (parent->auto_domain)
return ERR_PTR(-EINVAL);
- if (!parent->nest_parent)
return ERR_PTR(-EINVAL);
And put these all together
if (flags || user_data->type == IOMMU_HWPT_DATA_NONE || !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); if (parent->auto_domain || !parent->nest_parent) return ERR_PTR(-EINVAL);
- if (!ops->domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
- hwpt_nested = __iommufd_object_alloc(ictx, hwpt_nested,
IOMMUFD_OBJ_HWPT_NESTED,
common.obj);
- if (IS_ERR(hwpt_nested))
return ERR_CAST(hwpt_nested);
- hwpt = &hwpt_nested->common;
- refcount_inc(&parent->common.obj.users);
- hwpt_nested->parent = parent;
- hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
And we may as well pass flags here even though we know it is 0 today.
Jason
On Tue, Oct 24, 2023 at 08:06:06AM -0700, Yi Liu wrote:
@@ -195,6 +279,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (pt_obj->type == IOMMUFD_OBJ_IOAS) { struct iommufd_hwpt_paging *hwpt_paging;
if (cmd->data_type != IOMMU_HWPT_DATA_NONE) {
rc = -EINVAL;
goto out_put_pt;
ioas = container_of(pt_obj, struct iommufd_ioas, obj); mutex_lock(&ioas->mutex); hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev,}
?? What is this?
Ah something went wrong earlier in "iommu: Pass in parent domain with user_data to domain_alloc_user op"
Once we added the user_data we should flow it through to the op always.
Jason
On Tue, Oct 24, 2023 at 02:18:10PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 24, 2023 at 08:06:06AM -0700, Yi Liu wrote:
@@ -195,6 +279,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (pt_obj->type == IOMMUFD_OBJ_IOAS) { struct iommufd_hwpt_paging *hwpt_paging;
if (cmd->data_type != IOMMU_HWPT_DATA_NONE) {
rc = -EINVAL;
goto out_put_pt;
ioas = container_of(pt_obj, struct iommufd_ioas, obj); mutex_lock(&ioas->mutex); hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev,}
?? What is this?
Ah something went wrong earlier in "iommu: Pass in parent domain with user_data to domain_alloc_user op"
Once we added the user_data we should flow it through to the op always.
Hmm, iommufd_hwpt_paging_alloc doesn't take (or need) user_data, but we could pass in a dummy one if that looks better?
Thanks Nicolin
On Tue, Oct 24, 2023 at 10:28:45AM -0700, Nicolin Chen wrote:
On Tue, Oct 24, 2023 at 02:18:10PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 24, 2023 at 08:06:06AM -0700, Yi Liu wrote:
@@ -195,6 +279,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (pt_obj->type == IOMMUFD_OBJ_IOAS) { struct iommufd_hwpt_paging *hwpt_paging;
if (cmd->data_type != IOMMU_HWPT_DATA_NONE) {
rc = -EINVAL;
goto out_put_pt;
ioas = container_of(pt_obj, struct iommufd_ioas, obj); mutex_lock(&ioas->mutex); hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev,}
?? What is this?
Ah something went wrong earlier in "iommu: Pass in parent domain with user_data to domain_alloc_user op"
Once we added the user_data we should flow it through to the op always.
Hmm, iommufd_hwpt_paging_alloc doesn't take (or need) user_data, but we could pass in a dummy one if that looks better?
The point is for the user_data to always be available, the driver needs to check it if it is passed.
This should all be plumbed to allow drivers to also customize their paging domains too.
Jason
On Tue, Oct 24, 2023 at 02:31:39PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 24, 2023 at 10:28:45AM -0700, Nicolin Chen wrote:
On Tue, Oct 24, 2023 at 02:18:10PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 24, 2023 at 08:06:06AM -0700, Yi Liu wrote:
@@ -195,6 +279,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (pt_obj->type == IOMMUFD_OBJ_IOAS) { struct iommufd_hwpt_paging *hwpt_paging;
if (cmd->data_type != IOMMU_HWPT_DATA_NONE) {
rc = -EINVAL;
goto out_put_pt;
ioas = container_of(pt_obj, struct iommufd_ioas, obj); mutex_lock(&ioas->mutex); hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev,}
?? What is this?
Ah something went wrong earlier in "iommu: Pass in parent domain with user_data to domain_alloc_user op"
Once we added the user_data we should flow it through to the op always.
Hmm, iommufd_hwpt_paging_alloc doesn't take (or need) user_data, but we could pass in a dummy one if that looks better?
The point is for the user_data to always be available, the driver needs to check it if it is passed.
This should all be plumbed to allow drivers to also customize their paging domains too.
We don't have a use case of customizing the paging domains. And our selftest isn't covering this path. Nor the case is supported by the uAPI:
458- * A kernel-managed HWPT will be created with the mappings from the given 459- * IOAS via the @pt_id. The @data_type for this allocation must be set to 460: * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a 461- * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags. 462- *
Also, if we do passing in the data, we'd need to...
280-static struct iommu_domain * 281-mock_domain_alloc_user(struct device *dev, u32 flags, 282- struct iommu_domain *parent, 283: const struct iommu_user_data *user_data) 284-{ 285- struct mock_iommu_domain *mock_parent; 286- struct iommu_hwpt_selftest user_cfg; 287- int rc; 288- 289: if (!user_data) { /* must be mock_domain */
...change this to if (!parent)...
290- struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); 291- bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; 292- bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY; 293- 294- if (parent) 295- return ERR_PTR(-EINVAL);
...and drop this.
296- if (has_dirty_flag && no_dirty_ops) 297- return ERR_PTR(-EOPNOTSUPP); 298- return __mock_domain_alloc_paging(IOMMU_DOMAIN_UNMANAGED, 299- has_dirty_flag); 300- } 301- 302- /* must be mock_domain_nested */ 303: if (user_data->type != IOMMU_HWPT_DATA_SELFTEST) 304- return ERR_PTR(-EOPNOTSUPP);
Thanks Nicolin
On Tue, Oct 24, 2023 at 10:50:58AM -0700, Nicolin Chen wrote:
The point is for the user_data to always be available, the driver needs to check it if it is passed.
This should all be plumbed to allow drivers to also customize their paging domains too.
We don't have a use case of customizing the paging domains. And our selftest isn't covering this path. Nor the case is supported by the uAPI:
But this is the design, it is why everything is setup like this - we didn't create a new op to allocate nesting domains, we made a flexible user allocator.
458- * A kernel-managed HWPT will be created with the mappings from the given 459- * IOAS via the @pt_id. The @data_type for this allocation must be set to 460: * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a 461- * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags. 462- *
Yes, that is the reality today. If someone comes to use the more complete interface they need to fix that comment..
Also, if we do passing in the data, we'd need to...
280-static struct iommu_domain * 281-mock_domain_alloc_user(struct device *dev, u32 flags, 282- struct iommu_domain *parent, 283: const struct iommu_user_data *user_data) 284-{ 285- struct mock_iommu_domain *mock_parent; 286- struct iommu_hwpt_selftest user_cfg; 287- int rc; 288- 289: if (!user_data) { /* must be mock_domain */
...change this to if (!parent)...
Yes, this logic is not ideal. The parent is the request for nesting, not user_data. user_data is the generic creation parameters, which are not supported outside nesting
Like this:
--- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -286,14 +286,12 @@ mock_domain_alloc_user(struct device *dev, u32 flags, int rc;
/* must be mock_domain */ - if (!user_data) { + if (!parent) { struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY;
- if (parent) - return ERR_PTR(-EINVAL); - if (has_dirty_flag && no_dirty_ops) + if (user_data || (has_dirty_flag && no_dirty_ops)) return ERR_PTR(-EOPNOTSUPP); return __mock_domain_alloc_paging(IOMMU_DOMAIN_UNMANAGED, has_dirty_flag);
Thanks, Jason
On Tue, Oct 24, 2023 at 03:00:49PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 24, 2023 at 10:50:58AM -0700, Nicolin Chen wrote:
The point is for the user_data to always be available, the driver needs to check it if it is passed.
This should all be plumbed to allow drivers to also customize their paging domains too.
We don't have a use case of customizing the paging domains. And our selftest isn't covering this path. Nor the case is supported by the uAPI:
But this is the design, it is why everything is setup like this - we didn't create a new op to allocate nesting domains, we made a flexible user allocator.
458- * A kernel-managed HWPT will be created with the mappings from the given 459- * IOAS via the @pt_id. The @data_type for this allocation must be set to 460: * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a 461- * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags. 462- *
Yes, that is the reality today. If someone comes to use the more complete interface they need to fix that comment..
Ack.
Also, if we do passing in the data, we'd need to...
280-static struct iommu_domain * 281-mock_domain_alloc_user(struct device *dev, u32 flags, 282- struct iommu_domain *parent, 283: const struct iommu_user_data *user_data) 284-{ 285- struct mock_iommu_domain *mock_parent; 286- struct iommu_hwpt_selftest user_cfg; 287- int rc; 288- 289: if (!user_data) { /* must be mock_domain */
...change this to if (!parent)...
Yes, this logic is not ideal. The parent is the request for nesting, not user_data. user_data is the generic creation parameters, which are not supported outside nesting Like this:
--- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -286,14 +286,12 @@ mock_domain_alloc_user(struct device *dev, u32 flags, int rc; /* must be mock_domain */
if (!user_data) {
if (!parent) { struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY;
if (parent)
return ERR_PTR(-EINVAL);
if (has_dirty_flag && no_dirty_ops)
if (user_data || (has_dirty_flag && no_dirty_ops)) return ERR_PTR(-EOPNOTSUPP); return __mock_domain_alloc_paging(IOMMU_DOMAIN_UNMANAGED, has_dirty_flag);
Yea.. Then the vt-d driver needs a similar change too (@Yi) as I found it almost doing the same: https://lore.kernel.org/linux-iommu/20231024151412.50046-8-yi.l.liu@intel.co...
Thanks Nic
On 2023/10/25 02:19, Nicolin Chen wrote:
On Tue, Oct 24, 2023 at 03:00:49PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 24, 2023 at 10:50:58AM -0700, Nicolin Chen wrote:
The point is for the user_data to always be available, the driver needs to check it if it is passed.
This should all be plumbed to allow drivers to also customize their paging domains too.
We don't have a use case of customizing the paging domains. And our selftest isn't covering this path. Nor the case is supported by the uAPI:
But this is the design, it is why everything is setup like this - we didn't create a new op to allocate nesting domains, we made a flexible user allocator.
458- * A kernel-managed HWPT will be created with the mappings from the given 459- * IOAS via the @pt_id. The @data_type for this allocation must be set to 460: * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a 461- * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags. 462- *
Yes, that is the reality today. If someone comes to use the more complete interface they need to fix that comment..
Ack.
Also, if we do passing in the data, we'd need to...
280-static struct iommu_domain * 281-mock_domain_alloc_user(struct device *dev, u32 flags, 282- struct iommu_domain *parent, 283: const struct iommu_user_data *user_data) 284-{ 285- struct mock_iommu_domain *mock_parent; 286- struct iommu_hwpt_selftest user_cfg; 287- int rc; 288- 289: if (!user_data) { /* must be mock_domain */
...change this to if (!parent)...
Yes, this logic is not ideal. The parent is the request for nesting, not user_data. user_data is the generic creation parameters, which are not supported outside nesting Like this:
--- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -286,14 +286,12 @@ mock_domain_alloc_user(struct device *dev, u32 flags, int rc; /* must be mock_domain */
if (!user_data) {
if (!parent) { struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY;
if (parent)
return ERR_PTR(-EINVAL);
if (has_dirty_flag && no_dirty_ops)
if (user_data || (has_dirty_flag && no_dirty_ops)) return ERR_PTR(-EOPNOTSUPP); return __mock_domain_alloc_paging(IOMMU_DOMAIN_UNMANAGED, has_dirty_flag);
Yea.. Then the vt-d driver needs a similar change too (@Yi) as I found it almost doing the same: https://lore.kernel.org/linux-iommu/20231024151412.50046-8-yi.l.liu@intel.co...
yes. mock driver is kind of sample code, so the intel iommu driver is doing almost the same thing. will follow up with the branch Jason shared.
On Tue, Oct 24, 2023 at 02:18:10PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 24, 2023 at 08:06:06AM -0700, Yi Liu wrote:
@@ -195,6 +279,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (pt_obj->type == IOMMUFD_OBJ_IOAS) { struct iommufd_hwpt_paging *hwpt_paging;
if (cmd->data_type != IOMMU_HWPT_DATA_NONE) {
rc = -EINVAL;
goto out_put_pt;
ioas = container_of(pt_obj, struct iommufd_ioas, obj); mutex_lock(&ioas->mutex); hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev,}
?? What is this?
Ah something went wrong earlier in "iommu: Pass in parent domain with user_data to domain_alloc_user op"
Bah, I got confused because that had half the uapi, so in this pathc
Once we added the user_data we should flow it through to the op always.
Like this:
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 92859b907bb93c..a3382811af8a81 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -586,8 +586,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, goto out_unlock; }
- hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, - 0, immediate_attach); + hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, 0, + immediate_attach, NULL); if (IS_ERR(hwpt_paging)) { destroy_hwpt = ERR_CAST(hwpt_paging); goto out_unlock; diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index cfd85df693d7b2..324a6d73f032ee 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -96,7 +96,8 @@ iommufd_hwpt_paging_enforce_cc(struct iommufd_hwpt_paging *hwpt_paging) struct iommufd_hwpt_paging * iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, struct iommufd_device *idev, u32 flags, - bool immediate_attach) + bool immediate_attach, + const struct iommu_user_data *user_data) { const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING; @@ -107,7 +108,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
lockdep_assert_held(&ioas->mutex);
- if (flags && !ops->domain_alloc_user) + if ((flags || user_data) && !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); if (flags & ~valid_flags) return ERR_PTR(-EOPNOTSUPP); @@ -127,7 +128,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
if (ops->domain_alloc_user) { hwpt->domain = ops->domain_alloc_user(idev->dev, flags, - NULL, NULL); + NULL, user_data); if (IS_ERR(hwpt->domain)) { rc = PTR_ERR(hwpt->domain); hwpt->domain = NULL; @@ -210,8 +211,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt; int rc;
- if (flags || user_data->type == IOMMU_HWPT_DATA_NONE || - !ops->domain_alloc_user) + if (flags || !user_data->len || !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); if (parent->auto_domain || !parent->nest_parent) return ERR_PTR(-EINVAL); @@ -249,6 +249,11 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd; + const struct iommu_user_data user_data = { + .type = cmd->data_type, + .uptr = u64_to_user_ptr(cmd->data_uptr), + .len = cmd->data_len, + }; struct iommufd_hw_pagetable *hwpt; struct iommufd_ioas *ioas = NULL; struct iommufd_object *pt_obj; @@ -273,25 +278,17 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (pt_obj->type == IOMMUFD_OBJ_IOAS) { struct iommufd_hwpt_paging *hwpt_paging;
- if (cmd->data_type != IOMMU_HWPT_DATA_NONE) { - rc = -EINVAL; - goto out_put_pt; - } ioas = container_of(pt_obj, struct iommufd_ioas, obj); mutex_lock(&ioas->mutex); - hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev, - cmd->flags, false); + hwpt_paging = iommufd_hwpt_paging_alloc( + ucmd->ictx, ioas, idev, cmd->flags, false, + user_data.len ? &user_data : NULL); if (IS_ERR(hwpt_paging)) { rc = PTR_ERR(hwpt_paging); goto out_unlock; } hwpt = &hwpt_paging->common; } else if (pt_obj->type == IOMMUFD_OBJ_HWPT_PAGING) { - const struct iommu_user_data user_data = { - .type = cmd->data_type, - .uptr = u64_to_user_ptr(cmd->data_uptr), - .len = cmd->data_len, - }; struct iommufd_hwpt_nested *hwpt_nested; struct iommufd_hwpt_paging *parent;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 6fdad56893af4d..24e5a36fc875e0 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -290,7 +290,8 @@ int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd); struct iommufd_hwpt_paging * iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, struct iommufd_device *idev, u32 flags, - bool immediate_attach); + bool immediate_attach, + const struct iommu_user_data *user_data); int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev); struct iommufd_hw_pagetable *
On 2023/10/25 01:30, Jason Gunthorpe wrote:
On Tue, Oct 24, 2023 at 02:18:10PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 24, 2023 at 08:06:06AM -0700, Yi Liu wrote:
@@ -195,6 +279,10 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (pt_obj->type == IOMMUFD_OBJ_IOAS) { struct iommufd_hwpt_paging *hwpt_paging;
if (cmd->data_type != IOMMU_HWPT_DATA_NONE) {
rc = -EINVAL;
goto out_put_pt;
ioas = container_of(pt_obj, struct iommufd_ioas, obj); mutex_lock(&ioas->mutex); hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev,}
?? What is this?
Ah something went wrong earlier in "iommu: Pass in parent domain with user_data to domain_alloc_user op"
Bah, I got confused because that had half the uapi, so in this pathc
Once we added the user_data we should flow it through to the op always.
Like this:
ack.
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 92859b907bb93c..a3382811af8a81 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -586,8 +586,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, goto out_unlock; }
- hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev,
0, immediate_attach);
- hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, 0,
if (IS_ERR(hwpt_paging)) { destroy_hwpt = ERR_CAST(hwpt_paging); goto out_unlock;immediate_attach, NULL);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index cfd85df693d7b2..324a6d73f032ee 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -96,7 +96,8 @@ iommufd_hwpt_paging_enforce_cc(struct iommufd_hwpt_paging *hwpt_paging) struct iommufd_hwpt_paging * iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, struct iommufd_device *idev, u32 flags,
bool immediate_attach)
bool immediate_attach,
{ const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING;const struct iommu_user_data *user_data)
@@ -107,7 +108,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, lockdep_assert_held(&ioas->mutex);
- if (flags && !ops->domain_alloc_user)
- if ((flags || user_data) && !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); if (flags & ~valid_flags) return ERR_PTR(-EOPNOTSUPP);
@@ -127,7 +128,7 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, if (ops->domain_alloc_user) { hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
NULL, NULL);
if (IS_ERR(hwpt->domain)) { rc = PTR_ERR(hwpt->domain); hwpt->domain = NULL;NULL, user_data);
@@ -210,8 +211,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt; int rc;
- if (flags || user_data->type == IOMMU_HWPT_DATA_NONE ||
!ops->domain_alloc_user)
- if (flags || !user_data->len || !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP); if (parent->auto_domain || !parent->nest_parent) return ERR_PTR(-EINVAL);
@@ -249,6 +249,11 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd;
- const struct iommu_user_data user_data = {
.type = cmd->data_type,
.uptr = u64_to_user_ptr(cmd->data_uptr),
.len = cmd->data_len,
- }; struct iommufd_hw_pagetable *hwpt; struct iommufd_ioas *ioas = NULL; struct iommufd_object *pt_obj;
@@ -273,25 +278,17 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (pt_obj->type == IOMMUFD_OBJ_IOAS) { struct iommufd_hwpt_paging *hwpt_paging;
if (cmd->data_type != IOMMU_HWPT_DATA_NONE) {
rc = -EINVAL;
goto out_put_pt;
}
it can be covered by iommu driver when checking user data pointer. hence remove above if.
ioas = container_of(pt_obj, struct iommufd_ioas, obj); mutex_lock(&ioas->mutex);
hwpt_paging = iommufd_hwpt_paging_alloc(ucmd->ictx, ioas, idev,
cmd->flags, false);
hwpt_paging = iommufd_hwpt_paging_alloc(
ucmd->ictx, ioas, idev, cmd->flags, false,
if (IS_ERR(hwpt_paging)) { rc = PTR_ERR(hwpt_paging); goto out_unlock; } hwpt = &hwpt_paging->common; } else if (pt_obj->type == IOMMUFD_OBJ_HWPT_PAGING) {user_data.len ? &user_data : NULL);
const struct iommu_user_data user_data = {
.type = cmd->data_type,
.uptr = u64_to_user_ptr(cmd->data_uptr),
.len = cmd->data_len,
struct iommufd_hwpt_nested *hwpt_nested; struct iommufd_hwpt_paging *parent;};
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 6fdad56893af4d..24e5a36fc875e0 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -290,7 +290,8 @@ int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd); struct iommufd_hwpt_paging * iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, struct iommufd_device *idev, u32 flags,
bool immediate_attach);
bool immediate_attach,
int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev); struct iommufd_hw_pagetable *const struct iommu_user_data *user_data);
On Tue, Oct 24, 2023 at 08:06:06AM -0700, Yi Liu wrote:
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt) {
- struct iommufd_hwpt_paging *hwpt_paging = to_hwpt_paging(hwpt);
- if (WARN_ON(hwpt->obj.type != IOMMUFD_OBJ_HWPT_PAGING &&
hwpt->obj.type != IOMMUFD_OBJ_HWPT_NESTED))
return;
This is redundant, we have a C type, no need to check the type field like this
Jason
From: Nicolin Chen nicolinc@nvidia.com
Wrap up the data type/pointer/len sanity and a copy_struct_from_user call for iommu drivers to copy driver specific data via struct iommu_user_data. And expect it to be used in the domain_alloc_user op for example.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- include/linux/iommu.h | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a86ce9cb3547..9417b9e0616b 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -280,6 +280,47 @@ struct iommu_user_data { size_t len; };
+/** + * __iommu_copy_struct_from_user - Copy iommu driver specific user space data + * @dst_data: Pointer to an iommu driver specific user data that is defined in + * include/uapi/linux/iommufd.h + * @src_data: Pointer to a struct iommu_user_data for user space data info + * @data_type: The data type of the @dst_data. Must match with @src_data.type + * @data_len: Length of current user data structure, i.e. sizeof(struct _dst) + * @min_len: Initial length of user data structure for backward compatibility. + * This should be offsetofend using the last member in the user data + * struct that was initially added to include/uapi/linux/iommufd.h + */ +static inline int +__iommu_copy_struct_from_user(void *dst_data, + const struct iommu_user_data *src_data, + unsigned int data_type, size_t data_len, + size_t min_len) +{ + if (src_data->type != data_type) + return -EINVAL; + if (WARN_ON(!dst_data || !src_data)) + return -EINVAL; + if (src_data->len < min_len || data_len < src_data->len) + return -EINVAL; + return copy_struct_from_user(dst_data, data_len, + src_data->uptr, src_data->len); +} + +/** + * iommu_copy_struct_from_user - Copy iommu driver specific user space data + * @kdst: Pointer to an iommu driver specific user data that is defined in + * include/uapi/linux/iommufd.h + * @user_data: Pointer to a struct iommu_user_data for user space data info + * @data_type: The data type of the @kdst. Must match with @user_data->type + * @min_last: The last memember of the data structure @kdst points in the + * initial version. + * Return 0 for success, otherwise -error. + */ +#define iommu_copy_struct_from_user(kdst, user_data, data_type, min_last) \ + __iommu_copy_struct_from_user(kdst, user_data, data_type, sizeof(*kdst), \ + offsetofend(typeof(*kdst), min_last)) + /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability
From: Nicolin Chen nicolinc@nvidia.com
Add nested domain support in the ->domain_alloc_user op with some proper sanity checks. Then, add a domain_nested_ops for all nested domains and split the get_md_pagetable helper into paging and nested helpers.
Also, add an iotlb as a testing property of a nested domain.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/iommufd_test.h | 18 ++++ drivers/iommu/iommufd/selftest.c | 150 +++++++++++++++++++++------ 2 files changed, 138 insertions(+), 30 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 1f2e93d3d4e8..cf7d999a55bb 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -46,6 +46,11 @@ enum { MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0, };
+enum { + MOCK_NESTED_DOMAIN_IOTLB_ID_MAX = 3, + MOCK_NESTED_DOMAIN_IOTLB_NUM = 4, +}; + struct iommu_test_cmd { __u32 size; __u32 op; @@ -130,4 +135,17 @@ struct iommu_test_hw_info { __u32 test_reg; };
+/* Should not be equal to any defined value in enum iommu_hwpt_data_type */ +#define IOMMU_HWPT_DATA_SELFTEST 0xdead + +/** + * struct iommu_hwpt_selftest + * + * @iotlb: default mock iotlb value, IOMMU_TEST_IOTLB_DEFAULT + */ +struct iommu_hwpt_selftest { +#define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef + __u32 iotlb; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index d71007234896..bf9a88193367 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -93,6 +93,12 @@ struct mock_iommu_domain { struct xarray pfns; };
+struct mock_iommu_domain_nested { + struct iommu_domain domain; + struct mock_iommu_domain *parent; + u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM]; +}; + enum selftest_obj_type { TYPE_IDEV, }; @@ -217,54 +223,97 @@ const struct iommu_dirty_ops dirty_ops = { };
static const struct iommu_ops mock_ops; +static struct iommu_domain_ops domain_nested_ops;
-static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) +static struct iommu_domain * +__mock_domain_alloc_paging(unsigned int iommu_domain_type, bool needs_dirty_ops) { struct mock_iommu_domain *mock;
- if (iommu_domain_type == IOMMU_DOMAIN_BLOCKED) - return &mock_blocking_domain; - - if (iommu_domain_type != IOMMU_DOMAIN_UNMANAGED) - return NULL; - mock = kzalloc(sizeof(*mock), GFP_KERNEL); if (!mock) - return NULL; + return ERR_PTR(-ENOMEM); mock->domain.geometry.aperture_start = MOCK_APERTURE_START; mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST; mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE; mock->domain.ops = mock_ops.default_domain_ops; + if (needs_dirty_ops) + mock->domain.dirty_ops = &dirty_ops; mock->domain.type = iommu_domain_type; xa_init(&mock->pfns); return &mock->domain; }
+static struct iommu_domain * +__mock_domain_alloc_nested(struct mock_iommu_domain *mock_parent, + const struct iommu_hwpt_selftest *user_cfg) +{ + struct mock_iommu_domain_nested *mock_nested; + int i; + + mock_nested = kzalloc(sizeof(*mock_nested), GFP_KERNEL); + if (!mock_nested) + return ERR_PTR(-ENOMEM); + mock_nested->parent = mock_parent; + mock_nested->domain.ops = &domain_nested_ops; + mock_nested->domain.type = IOMMU_DOMAIN_NESTED; + for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++) + mock_nested->iotlb[i] = user_cfg->iotlb; + return &mock_nested->domain; +} + +static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) +{ + struct iommu_domain *domain; + + if (iommu_domain_type == IOMMU_DOMAIN_BLOCKED) + return &mock_blocking_domain; + if (iommu_domain_type != IOMMU_DOMAIN_UNMANAGED) + return NULL; + domain = __mock_domain_alloc_paging(iommu_domain_type, false); + if (IS_ERR(domain)) + domain = NULL; + return domain; +} + static struct iommu_domain * mock_domain_alloc_user(struct device *dev, u32 flags, struct iommu_domain *parent, const struct iommu_user_data *user_data) { - struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); - struct iommu_domain *domain; + struct mock_iommu_domain *mock_parent; + struct iommu_hwpt_selftest user_cfg; + int rc;
- if (flags & - (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING))) - return ERR_PTR(-EOPNOTSUPP); + if (!user_data) { /* must be mock_domain */ + struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); + bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; + bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY; + + if (parent) + return ERR_PTR(-EINVAL); + if (has_dirty_flag && no_dirty_ops) + return ERR_PTR(-EOPNOTSUPP); + return __mock_domain_alloc_paging(IOMMU_DOMAIN_UNMANAGED, + has_dirty_flag); + }
- if (parent || user_data) + /* must be mock_domain_nested */ + if (user_data->type != IOMMU_HWPT_DATA_SELFTEST) return ERR_PTR(-EOPNOTSUPP); + if (!parent || parent->ops != mock_ops.default_domain_ops) + return ERR_PTR(-EINVAL);
- if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) && - (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY)) - return ERR_PTR(-EOPNOTSUPP); + mock_parent = container_of(parent, struct mock_iommu_domain, domain); + if (!mock_parent) + return ERR_PTR(-EINVAL);
- domain = mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED); - if (domain && !(mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY)) - domain->dirty_ops = &dirty_ops; - if (!domain) - domain = ERR_PTR(-ENOMEM); - return domain; + rc = iommu_copy_struct_from_user(&user_cfg, user_data, + IOMMU_HWPT_DATA_SELFTEST, iotlb); + if (rc) + return ERR_PTR(rc); + + return __mock_domain_alloc_nested(mock_parent, &user_cfg); }
static void mock_domain_free(struct iommu_domain *domain) @@ -434,19 +483,41 @@ static const struct iommu_ops mock_ops = { }, };
+static void mock_domain_free_nested(struct iommu_domain *domain) +{ + struct mock_iommu_domain_nested *mock_nested = + container_of(domain, struct mock_iommu_domain_nested, domain); + + kfree(mock_nested); +} + +static struct iommu_domain_ops domain_nested_ops = { + .free = mock_domain_free_nested, + .attach_dev = mock_domain_nop_attach, +}; + static inline struct iommufd_hw_pagetable * -get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, - struct mock_iommu_domain **mock) +__get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, u32 hwpt_type) { - struct iommufd_hw_pagetable *hwpt; struct iommufd_object *obj;
- obj = iommufd_get_object(ucmd->ictx, mockpt_id, - IOMMUFD_OBJ_HWPT_PAGING); + obj = iommufd_get_object(ucmd->ictx, mockpt_id, hwpt_type); if (IS_ERR(obj)) return ERR_CAST(obj); - hwpt = container_of(obj, struct iommufd_hw_pagetable, obj); - if (hwpt->domain->ops != mock_ops.default_domain_ops) { + return container_of(obj, struct iommufd_hw_pagetable, obj); +} + +static inline struct iommufd_hw_pagetable * +get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, + struct mock_iommu_domain **mock) +{ + struct iommufd_hw_pagetable *hwpt; + + hwpt = __get_md_pagetable(ucmd, mockpt_id, IOMMUFD_OBJ_HWPT_PAGING); + if (IS_ERR(hwpt)) + return hwpt; + if (hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED || + hwpt->domain->ops != mock_ops.default_domain_ops) { iommufd_put_object(&hwpt->obj); return ERR_PTR(-EINVAL); } @@ -454,6 +525,25 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, return hwpt; }
+static inline struct iommufd_hw_pagetable * +get_md_pagetable_nested(struct iommufd_ucmd *ucmd, u32 mockpt_id, + struct mock_iommu_domain_nested **mock_nested) +{ + struct iommufd_hw_pagetable *hwpt; + + hwpt = __get_md_pagetable(ucmd, mockpt_id, IOMMUFD_OBJ_HWPT_NESTED); + if (IS_ERR(hwpt)) + return hwpt; + if (hwpt->domain->type != IOMMU_DOMAIN_NESTED || + hwpt->domain->ops != &domain_nested_ops) { + iommufd_put_object(&hwpt->obj); + return ERR_PTR(-EINVAL); + } + *mock_nested = container_of(hwpt->domain, + struct mock_iommu_domain_nested, domain); + return hwpt; +} + struct mock_bus_type { struct bus_type bus; struct notifier_block nb;
From: Nicolin Chen nicolinc@nvidia.com
The IOMMU_HWPT_ALLOC ioctl now supports passing user_data to allocate a user-managed domain for nested HWPTs. Add its coverage for that. Also, update _test_cmd_hwpt_alloc() and add test_cmd/err_hwpt_alloc_nested().
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- tools/testing/selftests/iommu/iommufd.c | 120 ++++++++++++++++++ .../selftests/iommu/iommufd_fail_nth.c | 3 +- tools/testing/selftests/iommu/iommufd_utils.h | 31 ++++- 3 files changed, 146 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 76a4351e3434..7ad42656f4b1 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -264,6 +264,126 @@ TEST_F(iommufd_ioas, ioas_destroy) } }
+TEST_F(iommufd_ioas, alloc_hwpt_nested) +{ + const uint32_t min_data_len = + offsetofend(struct iommu_hwpt_selftest, iotlb); + struct iommu_hwpt_selftest data = { + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, + }; + uint32_t nested_hwpt_id[2] = {}; + uint32_t parent_hwpt_id = 0; + uint32_t parent_hwpt_id_not_work = 0; + uint32_t test_hwpt_id = 0; + + if (self->device_id) { + /* Negative tests */ + test_err_hwpt_alloc(ENOENT, self->ioas_id, self->device_id, + 0, &test_hwpt_id); + test_err_hwpt_alloc(EINVAL, self->device_id, + self->device_id, 0, &test_hwpt_id); + + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, + &parent_hwpt_id); + + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, + 0, &parent_hwpt_id_not_work); + + /* Negative nested tests */ + test_err_hwpt_alloc_nested(EINVAL, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_DATA_NONE, + &data, sizeof(data)); + test_err_hwpt_alloc_nested(EOPNOTSUPP, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_DATA_SELFTEST + 1, + &data, sizeof(data)); + test_err_hwpt_alloc_nested(EINVAL, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_DATA_SELFTEST, + &data, min_data_len - 1); + test_err_hwpt_alloc_nested(EFAULT, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_DATA_SELFTEST, + NULL, sizeof(data)); + test_err_hwpt_alloc_nested(EOPNOTSUPP, + self->device_id, parent_hwpt_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, + &nested_hwpt_id[0], + IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + test_err_hwpt_alloc_nested(EINVAL, self->device_id, + parent_hwpt_id_not_work, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + + /* Allocate two nested hwpts sharing one common parent hwpt */ + test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[1], + IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + + /* Negative test: a nested hwpt on top of a nested hwpt */ + test_err_hwpt_alloc_nested(EINVAL, + self->device_id, nested_hwpt_id[0], + 0, &test_hwpt_id, + IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + /* Negative test: parent hwpt now cannot be freed */ + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, parent_hwpt_id)); + + /* Attach device to nested_hwpt_id[0] that then will be busy */ + test_cmd_mock_domain_replace(self->stdev_id, + nested_hwpt_id[0]); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, nested_hwpt_id[0])); + + /* Switch from nested_hwpt_id[0] to nested_hwpt_id[1] */ + test_cmd_mock_domain_replace(self->stdev_id, + nested_hwpt_id[1]); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, nested_hwpt_id[1])); + test_ioctl_destroy(nested_hwpt_id[0]); + + /* Detach from nested_hwpt_id[1] and destroy it */ + test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id); + test_ioctl_destroy(nested_hwpt_id[1]); + + /* Detach from the parent hw_pagetable and destroy it */ + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); + test_ioctl_destroy(parent_hwpt_id); + test_ioctl_destroy(parent_hwpt_id_not_work); + } else { + test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, + 0, &parent_hwpt_id); + test_err_hwpt_alloc_nested(ENOENT, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + test_err_hwpt_alloc_nested(ENOENT, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[1], + IOMMU_HWPT_DATA_SELFTEST, + &data, sizeof(data)); + test_err_mock_domain_replace(ENOENT, + self->stdev_id, nested_hwpt_id[0]); + test_err_mock_domain_replace(ENOENT, + self->stdev_id, nested_hwpt_id[1]); + } +} + TEST_F(iommufd_ioas, hwpt_attach) { /* Create a device attached directly to a hwpt */ diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index ff735bdd833e..f590417cd67a 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -615,7 +615,8 @@ TEST_FAIL_NTH(basic_fail_nth, device) if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL)) return -1;
- if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id)) + if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id, + IOMMU_HWPT_DATA_NONE, 0, 0)) return -1;
if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL)) diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index e263bf80a977..e5f1b4d66b80 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -154,13 +154,17 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id, pt_id, NULL))
static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, - __u32 flags, __u32 *hwpt_id) + __u32 flags, __u32 *hwpt_id, __u32 data_type, + void *data, size_t data_len) { struct iommu_hwpt_alloc cmd = { .size = sizeof(cmd), .flags = flags, .dev_id = device_id, .pt_id = pt_id, + .data_type = data_type, + .data_len = data_len, + .data_uptr = (uint64_t)data, }; int ret;
@@ -172,12 +176,25 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, return 0; }
-#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \ - ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, \ - pt_id, flags, hwpt_id)) -#define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \ - EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \ - pt_id, flags, hwpt_id)) +#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + hwpt_id, IOMMU_HWPT_DATA_NONE, \ + NULL, 0)) +#define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \ + EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, \ + flags, hwpt_id, \ + IOMMU_HWPT_DATA_NONE, \ + NULL, 0)) + +#define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id, \ + data_type, data, data_len) \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + hwpt_id, data_type, data, data_len)) +#define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \ + data_type, data, data_len) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + hwpt_id, data_type, data, data_len))
static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id)
On Tue, Oct 24, 2023 at 08:05:59AM -0700, Yi Liu wrote:
Jason Gunthorpe (2): iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING iommufd/device: Wrap IOMMUFD_OBJ_HWPT_PAGING-only configurations
Lu Baolu (1): iommu: Add IOMMU_DOMAIN_NESTED
Nicolin Chen (6): iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED iommufd: Add a nested HW pagetable object iommu: Add iommu_copy_struct_from_user helper iommufd/selftest: Add nested domain allocation for mock domain iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
Yi Liu (1): iommu: Pass in parent domain with user_data to domain_alloc_user op
I made the changes noted in the thread and pushed this to linux-next
If we need more things then lets have a v6 no later than thursday, otherwise please check/retest what I pushed carefully.
Thanks, Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Wednesday, October 25, 2023 1:56 AM
On Tue, Oct 24, 2023 at 08:05:59AM -0700, Yi Liu wrote:
Jason Gunthorpe (2): iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to
IOMMUFD_OBJ_HWPT_PAGING
iommufd/device: Wrap IOMMUFD_OBJ_HWPT_PAGING-only
configurations
Lu Baolu (1): iommu: Add IOMMU_DOMAIN_NESTED
Nicolin Chen (6): iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable iommufd: Share iommufd_hwpt_alloc with
IOMMUFD_OBJ_HWPT_NESTED
iommufd: Add a nested HW pagetable object iommu: Add iommu_copy_struct_from_user helper iommufd/selftest: Add nested domain allocation for mock domain iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested
HWPTs
Yi Liu (1): iommu: Pass in parent domain with user_data to domain_alloc_user op
I made the changes noted in the thread and pushed this to linux-next
this looks good overall except a small error. For the whole series with your changes:
Reviewed-by: Kevin Tian kevin.tian@intel.com
linux-kselftest-mirror@lists.linaro.org