This series adds PASID attach/detach uAPIs that let userspace attach a PASID of a device to, or detach it from, a given IOAS/HWPT. Only the vfio-pci driver is enabled in this series. After this series, PASID-capable devices bound to vfio-pci can report their PASID capability to userspace and to VMs, enabling PASID usages such as Shared Virtual Addressing (SVA).
This series first adds the helpers for PASID attach in the vfio core, then adds the device cdev ioctls for PASID attach/detach, and finally exposes the device PASID capability to userspace. It depends on the iommufd pasid attach/detach series [1].
The complete code can be found at [2].
[1] https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.co...
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
Regards, Yi Liu
Kevin Tian (1):
  vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices

Yi Liu (2):
  vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT
  vfio/pci: Expose PCIe PASID capability to userspace

 drivers/vfio/device_cdev.c         | 45 ++++++++++++++++++++++++
 drivers/vfio/iommufd.c             | 48 ++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci.c        |  2 ++
 drivers/vfio/pci/vfio_pci_config.c |  2 +-
 drivers/vfio/vfio.h                |  4 +++
 drivers/vfio/vfio_main.c           |  8 +++++
 include/linux/vfio.h               | 11 ++++++
 include/uapi/linux/vfio.h          | 55 ++++++++++++++++++++++++++++++
 8 files changed, 174 insertions(+), 1 deletion(-)
From: Kevin Tian kevin.tian@intel.com
This adds pasid_[at|de]tach_ioas ops for attaching a hwpt to a pasid of a device, along with the helpers for them. For now, only vfio-pci supports pasid attach/detach.
Signed-off-by: Kevin Tian kevin.tian@intel.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/vfio/iommufd.c      | 48 +++++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci.c |  2 ++
 include/linux/vfio.h        | 11 +++++++++
 3 files changed, 61 insertions(+)
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c index 82eba6966fa5..43a702b9b4d3 100644 --- a/drivers/vfio/iommufd.c +++ b/drivers/vfio/iommufd.c @@ -119,6 +119,7 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev, if (IS_ERR(idev)) return PTR_ERR(idev); vdev->iommufd_device = idev; + xa_init(&vdev->pasid_pts); return 0; } EXPORT_SYMBOL_GPL(vfio_iommufd_physical_bind); @@ -127,6 +128,17 @@ void vfio_iommufd_physical_unbind(struct vfio_device *vdev) { lockdep_assert_held(&vdev->dev_set->lock);
+ if (!xa_empty(&vdev->pasid_pts)) { + void *entry; + unsigned long index; + + xa_for_each(&vdev->pasid_pts, index, entry) { + xa_erase(&vdev->pasid_pts, index); + iommufd_device_pasid_detach(vdev->iommufd_device, index); + } + xa_destroy(&vdev->pasid_pts); + } + if (vdev->iommufd_attached) { iommufd_device_detach(vdev->iommufd_device); vdev->iommufd_attached = false; @@ -168,6 +180,42 @@ void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev) } EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
+int vfio_iommufd_physical_pasid_attach_ioas(struct vfio_device *vdev, + u32 pasid, u32 *pt_id) +{ + void *entry; + int rc; + + lockdep_assert_held(&vdev->dev_set->lock); + + if (WARN_ON(!vdev->iommufd_device)) + return -EINVAL; + + entry = xa_load(&vdev->pasid_pts, pasid); + if (xa_is_value(entry)) + rc = iommufd_device_pasid_replace(vdev->iommufd_device, pasid, pt_id); + else + rc = iommufd_device_pasid_attach(vdev->iommufd_device, pasid, pt_id); + if (rc) + return rc; + xa_store(&vdev->pasid_pts, pasid, xa_mk_value(*pt_id), GFP_KERNEL); + return 0; +} +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_pasid_attach_ioas); + +void vfio_iommufd_physical_pasid_detach_ioas(struct vfio_device *vdev, u32 pasid) +{ + lockdep_assert_held(&vdev->dev_set->lock); + + if (WARN_ON(!vdev->iommufd_device) || + !xa_is_value(xa_load(&vdev->pasid_pts, pasid))) + return; + + iommufd_device_pasid_detach(vdev->iommufd_device, pasid); + xa_erase(&vdev->pasid_pts, pasid); +} +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_pasid_detach_ioas); + /* * The emulated standard ops mean that vfio_device is going to use the * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index cb5b7f865d58..e0198851ffd2 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -142,6 +142,8 @@ static const struct vfio_device_ops vfio_pci_ops = { .unbind_iommufd = vfio_iommufd_physical_unbind, .attach_ioas = vfio_iommufd_physical_attach_ioas, .detach_ioas = vfio_iommufd_physical_detach_ioas, + .pasid_attach_ioas = vfio_iommufd_physical_pasid_attach_ioas, + .pasid_detach_ioas = vfio_iommufd_physical_pasid_detach_ioas, };
static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 454e9295970c..7b06d1bc7cb3 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -66,6 +66,7 @@ struct vfio_device { void (*put_kvm)(struct kvm *kvm); #if IS_ENABLED(CONFIG_IOMMUFD) struct iommufd_device *iommufd_device; + struct xarray pasid_pts; u8 iommufd_attached:1; #endif u8 cdev_opened:1; @@ -83,6 +84,8 @@ struct vfio_device { * bound iommufd. Undo in unbind_iommufd if @detach_ioas is not * called. * @detach_ioas: Opposite of attach_ioas + * @pasid_attach_ioas: The pasid variation of attach_ioas + * @pasid_detach_ioas: Opposite of pasid_attach_ioas * @open_device: Called when the first file descriptor is opened for this device * @close_device: Opposite of open_device * @read: Perform read(2) on device file descriptor @@ -107,6 +110,8 @@ struct vfio_device_ops { void (*unbind_iommufd)(struct vfio_device *vdev); int (*attach_ioas)(struct vfio_device *vdev, u32 *pt_id); void (*detach_ioas)(struct vfio_device *vdev); + int (*pasid_attach_ioas)(struct vfio_device *vdev, u32 pasid, u32 *pt_id); + void (*pasid_detach_ioas)(struct vfio_device *vdev, u32 pasid); int (*open_device)(struct vfio_device *vdev); void (*close_device)(struct vfio_device *vdev); ssize_t (*read)(struct vfio_device *vdev, char __user *buf, @@ -131,6 +136,8 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev, void vfio_iommufd_physical_unbind(struct vfio_device *vdev); int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id); void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev); +int vfio_iommufd_physical_pasid_attach_ioas(struct vfio_device *vdev, u32 pasid, u32 *pt_id); +void vfio_iommufd_physical_pasid_detach_ioas(struct vfio_device *vdev, u32 pasid); int vfio_iommufd_emulated_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx, u32 *out_device_id); void vfio_iommufd_emulated_unbind(struct vfio_device 
*vdev); @@ -158,6 +165,10 @@ vfio_iommufd_get_dev_id(struct vfio_device *vdev, struct iommufd_ctx *ictx) ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL) #define vfio_iommufd_physical_detach_ioas \ ((void (*)(struct vfio_device *vdev)) NULL) +#define vfio_iommufd_physical_pasid_attach_ioas \ + ((int (*)(struct vfio_device *vdev, u32 pasid, u32 *pt_id)) NULL) +#define vfio_iommufd_physical_pasid_detach_ioas \ + ((void (*)(struct vfio_device *vdev, u32 pasid)) NULL) #define vfio_iommufd_emulated_bind \ ((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \ u32 *out_device_id)) NULL)
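The bookkeeping in the helpers above (attach when the pasid has no entry, replace when it already has one, detach only if an entry exists, and drop everything on unbind) can be modelled in plain userspace C. This is only a sketch under stated assumptions: the fixed-size table, MAX_PASIDS, and every pasid_table_* name are hypothetical stand-ins for the xarray in struct vfio_device, and no iommufd call is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PASIDS 16 /* hypothetical bound; the patch uses an xarray instead */

/* Which iommufd path the real helper would have taken. */
enum pasid_op { PASID_OP_ATTACH, PASID_OP_REPLACE };

/* Hypothetical stand-in for vdev->pasid_pts: pasid -> attached pt_id. */
struct pasid_table {
	bool present[MAX_PASIDS];
	uint32_t pt_id[MAX_PASIDS];
};

/*
 * Record an attachment. If the pasid already has an entry the operation is
 * a replacement, otherwise a first attach. Returns 0 on success, -1 if the
 * pasid is outside the model's range.
 */
static int pasid_table_attach(struct pasid_table *t, uint32_t pasid,
			      uint32_t pt_id, enum pasid_op *op)
{
	assert(t != NULL && op != NULL);
	if (pasid >= MAX_PASIDS)
		return -1;
	*op = t->present[pasid] ? PASID_OP_REPLACE : PASID_OP_ATTACH;
	t->present[pasid] = true;
	t->pt_id[pasid] = pt_id;
	return 0;
}

/* Drop one attachment. Returns false if the pasid was not attached. */
static bool pasid_table_detach(struct pasid_table *t, uint32_t pasid)
{
	assert(t != NULL);
	if (pasid >= MAX_PASIDS || !t->present[pasid])
		return false;
	t->present[pasid] = false;
	return true;
}

/* Mirror of the unbind path: detach every remaining pasid, return the count. */
static unsigned int pasid_table_detach_all(struct pasid_table *t)
{
	unsigned int n = 0;

	assert(t != NULL);
	for (uint32_t pasid = 0; pasid < MAX_PASIDS; pasid++)
		if (pasid_table_detach(t, pasid))
			n++;
	return n;
}
```

The model records the new pt_id unconditionally; in the patch the entry is stored only after the iommufd attach or replace call has succeeded.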
This adds ioctls for userspace to attach/detach a given pasid of a vfio device to/from an IOAS/HWPT.
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/vfio/device_cdev.c | 45 +++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        |  4 +++
 drivers/vfio/vfio_main.c   |  8 ++++++
 include/uapi/linux/vfio.h  | 55 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 112 insertions(+)
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index e75da0a70d1f..c2ac7ed44537 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -210,6 +210,51 @@ int vfio_df_ioctl_detach_pt(struct vfio_device_file *df, return 0; }
+int vfio_df_ioctl_pasid_attach_pt(struct vfio_device_file *df, + struct vfio_device_pasid_attach_iommufd_pt __user *arg) +{ + struct vfio_device *device = df->device; + struct vfio_device_pasid_attach_iommufd_pt attach; + unsigned long minsz; + int ret; + + minsz = offsetofend(struct vfio_device_pasid_attach_iommufd_pt, pt_id); + + if (copy_from_user(&attach, arg, minsz)) + return -EFAULT; + + if (attach.argsz < minsz || attach.flags) + return -EINVAL; + + mutex_lock(&device->dev_set->lock); + ret = device->ops->pasid_attach_ioas(device, attach.pasid, &attach.pt_id); + mutex_unlock(&device->dev_set->lock); + + return ret; +} + +int vfio_df_ioctl_pasid_detach_pt(struct vfio_device_file *df, + struct vfio_device_pasid_detach_iommufd_pt __user *arg) +{ + struct vfio_device *device = df->device; + struct vfio_device_pasid_detach_iommufd_pt detach; + unsigned long minsz; + + minsz = offsetofend(struct vfio_device_pasid_detach_iommufd_pt, flags); + + if (copy_from_user(&detach, arg, minsz)) + return -EFAULT; + + if (detach.argsz < minsz || detach.flags) + return -EINVAL; + + mutex_lock(&device->dev_set->lock); + device->ops->pasid_detach_ioas(device, detach.pasid); + mutex_unlock(&device->dev_set->lock); + + return 0; +} + static char *vfio_device_devnode(const struct device *dev, umode_t *mode) { return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 307e3f29b527..d228cdb6b345 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -353,6 +353,10 @@ int vfio_df_ioctl_attach_pt(struct vfio_device_file *df, struct vfio_device_attach_iommufd_pt __user *arg); int vfio_df_ioctl_detach_pt(struct vfio_device_file *df, struct vfio_device_detach_iommufd_pt __user *arg); +int vfio_df_ioctl_pasid_attach_pt(struct vfio_device_file *df, + struct vfio_device_pasid_attach_iommufd_pt __user *arg); +int vfio_df_ioctl_pasid_detach_pt(struct vfio_device_file *df, + struct vfio_device_pasid_detach_iommufd_pt 
__user *arg);
#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) void vfio_init_device_cdev(struct vfio_device *device); diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 40732e8ed4c6..850bbaebdd29 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1230,6 +1230,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep, case VFIO_DEVICE_DETACH_IOMMUFD_PT: ret = vfio_df_ioctl_detach_pt(df, uptr); goto out; + + case VFIO_DEVICE_PASID_ATTACH_IOMMUFD_PT: + ret = vfio_df_ioctl_pasid_attach_pt(df, uptr); + goto out; + + case VFIO_DEVICE_PASID_DETACH_IOMMUFD_PT: + ret = vfio_df_ioctl_pasid_detach_pt(df, uptr); + goto out; } }
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 97ab68a175e0..474bb314d135 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -975,6 +975,61 @@ struct vfio_device_detach_iommufd_pt {
#define VFIO_DEVICE_DETACH_IOMMUFD_PT _IO(VFIO_TYPE, VFIO_BASE + 20)
+/* + * VFIO_DEVICE_PASID_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21, + * struct vfio_device_pasid_attach_iommufd_pt) + * @argsz: User filled size of this data. + * @flags: Must be 0. + * @pasid: The pasid to be attached. + * @pt_id: Input the target id which can represent an ioas or a hwpt + * allocated via iommufd subsystem. + * Output the input ioas id or the attached hwpt id which could + * be the specified hwpt itself or a hwpt automatically created + * for the specified ioas by kernel during the attachment. + * + * Associate a pasid (of a cdev device) with an address space within the + * bound iommufd. Undo by VFIO_DEVICE_PASID_DETACH_IOMMUFD_PT or device fd + * close. This is only allowed on cdev fds. + * + * If a pasid is currently attached to a valid hw_pagetable (hwpt), without + * doing a VFIO_DEVICE_PASID_DETACH_IOMMUFD_PT, a second + * VFIO_DEVICE_PASID_ATTACH_IOMMUFD_PT ioctl passing in another hwpt id is + * allowed. This action, also known as a hwpt replacement, will replace the + * pasid's currently attached hwpt with a new hwpt corresponding to the given + * @pt_id. + * + * Return: 0 on success, -errno on failure. + */ +struct vfio_device_pasid_attach_iommufd_pt { + __u32 argsz; + __u32 flags; + __u32 pasid; + __u32 pt_id; +}; + +#define VFIO_DEVICE_PASID_ATTACH_IOMMUFD_PT _IO(VFIO_TYPE, VFIO_BASE + 21) + +/* + * VFIO_DEVICE_PASID_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 22, + * struct vfio_device_pasid_detach_iommufd_pt) + * @argsz: User filled size of this data. + * @flags: Must be 0. + * @pasid: The pasid to be detached. + * + * Remove the association of a pasid (of a cdev device) and its current + * associated address space. After it, the pasid of the device should be in + * a blocking DMA state. This is only allowed on cdev fds. + * + * Return: 0 on success, -errno on failure. 
+ */ +struct vfio_device_pasid_detach_iommufd_pt { + __u32 argsz; + __u32 flags; + __u32 pasid; +}; + +#define VFIO_DEVICE_PASID_DETACH_IOMMUFD_PT _IO(VFIO_TYPE, VFIO_BASE + 22) + /* * Provide support for setting a PCI VF Token, which is used as a shared * secret between PF and VF drivers. This feature may only be set on a
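From userspace, the two ioctls proposed above might be driven roughly as follows. This is a minimal sketch, not a tested consumer of the series: the sketch_* structs and request numbers are local copies of the definitions in this patch (an assumption, since a released linux/vfio.h may not carry them), and devfd is assumed to be a vfio device cdev fd that is already bound to an iommufd.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copies mirroring this patch; names prefixed sketch_ are hypothetical. */
#define SKETCH_VFIO_TYPE (';')
#define SKETCH_VFIO_BASE 100

struct sketch_pasid_attach {
	uint32_t argsz;
	uint32_t flags;
	uint32_t pasid;
	uint32_t pt_id;
};

struct sketch_pasid_detach {
	uint32_t argsz;
	uint32_t flags;
	uint32_t pasid;
};

#define SKETCH_PASID_ATTACH _IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 21)
#define SKETCH_PASID_DETACH _IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 22)

/* Fill the attach argument: argsz is the struct size, flags must be 0. */
static void sketch_fill_attach(struct sketch_pasid_attach *a,
			       uint32_t pasid, uint32_t pt_id)
{
	assert(a != NULL);
	memset(a, 0, sizeof(*a));
	a->argsz = sizeof(*a);
	a->pasid = pasid;
	a->pt_id = pt_id;
}

/* Fill the detach argument the same way. */
static void sketch_fill_detach(struct sketch_pasid_detach *d, uint32_t pasid)
{
	assert(d != NULL);
	memset(d, 0, sizeof(*d));
	d->argsz = sizeof(*d);
	d->pasid = pasid;
}

/* Attach (or replace) the page table for one pasid; returns ioctl()'s result. */
static int sketch_pasid_attach(int devfd, uint32_t pasid, uint32_t pt_id)
{
	struct sketch_pasid_attach a;

	sketch_fill_attach(&a, pasid, pt_id);
	return ioctl(devfd, SKETCH_PASID_ATTACH, &a);
}

/* Detach one pasid; returns ioctl()'s result. */
static int sketch_pasid_detach(int devfd, uint32_t pasid)
{
	struct sketch_pasid_detach d;

	sketch_fill_detach(&d, pasid);
	return ioctl(devfd, SKETCH_PASID_DETACH, &d);
}
```

Calling sketch_pasid_attach() a second time for the same pasid with a different pt_id corresponds to the hwpt replacement described in the uAPI comment.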
This exposes the PCIe PASID capability to userspace, which also tells userspace where to emulate the capability if it wants to further expose it to a VM.
This only exposes the PASID capability for devices that have the PCIe PASID extended capability structure in their configuration space. For VFs, userspace still cannot see this capability, because the SR-IOV spec forbids a VF from implementing the PASID extended capability structure. That case is left as a future TODO. Related discussion can be found at the links below:
https://lore.kernel.org/kvm/20200407095801.648b1371@w520.home/
https://lore.kernel.org/kvm/BL1PR11MB5271A60035EF591A5BE8AC878C01A@BL1PR11MB...
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/vfio/pci/vfio_pci_config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 7e2e62ab0869..dfae5ad5ebc0 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -95,7 +95,7 @@ static const u16 pci_ext_cap_length[PCI_EXT_CAP_ID_MAX + 1] = {
 	[PCI_EXT_CAP_ID_LTR] = PCI_EXT_CAP_LTR_SIZEOF,
 	[PCI_EXT_CAP_ID_SECPCI] = 0, /* not yet */
 	[PCI_EXT_CAP_ID_PMUX] = 0, /* not yet */
-	[PCI_EXT_CAP_ID_PASID] = 0, /* not yet */
+	[PCI_EXT_CAP_ID_PASID] = PCI_EXT_CAP_PASID_SIZEOF,
 	[PCI_EXT_CAP_ID_DVSEC] = 0xFF,
 };
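Once the capability is visible in the vfio-pci config space, a VMM could locate it by walking the PCIe extended capability list. The sketch below works on an in-memory copy of config space and assumes the standard layout (list starts at offset 0x100; each 32-bit header carries the capability ID in bits 15:0 and the next offset in bits 31:20; the PASID capability ID is 0x1B). The helper names are hypothetical, and how the buffer gets filled (for example by reading the vfio config region) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EXT_CAP_START 0x100  /* first extended capability header */
#define SKETCH_EXT_CAP_ID_PASID 0x1B /* PASID extended capability ID */

/* Read a little-endian 32-bit value from a config space copy. */
static uint32_t sketch_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a little-endian 32-bit value (used to build test images). */
static void sketch_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/*
 * Walk the extended capability list in cfg[0..len) and return the offset of
 * the capability with the given ID, or 0 if it is not present. The iteration
 * cap guards against malformed, looping lists.
 */
static uint16_t sketch_find_ext_cap(const uint8_t *cfg, size_t len,
				    uint16_t cap_id)
{
	uint16_t pos = SKETCH_EXT_CAP_START;
	int ttl = (4096 - 256) / 8;

	assert(cfg != NULL);
	while (ttl-- > 0 && pos >= SKETCH_EXT_CAP_START &&
	       (size_t)pos + 4 <= len) {
		uint32_t hdr = sketch_get_le32(cfg + pos);

		if (hdr == 0 || hdr == 0xffffffffu)
			break;		/* empty list or failed read */
		if ((hdr & 0xffff) == cap_id)
			return pos;
		pos = (hdr >> 20) & 0xffc; /* next offset, dword aligned */
	}
	return 0;
}
```

A return value of 0 is what such a walk would yield for a VF, which is the case the commit message leaves as a TODO.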
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, September 26, 2023 5:31 PM
This exposes the PCIe PASID capability to userspace, which also tells userspace where to emulate the capability if it wants to further expose it to a VM.
This only exposes the PASID capability for devices that have the PCIe PASID extended capability structure in their configuration space. For VFs, userspace still cannot see this capability, because the SR-IOV spec forbids a VF from implementing the PASID extended capability structure. That case is left as a future TODO. Related discussion can be found at the links below:
https://lore.kernel.org/kvm/20200407095801.648b1371@w520.home/
https://lore.kernel.org/kvm/BL1PR11MB5271A60035EF591A5BE8AC878C01A@BL1PR11MB5271.namprd11.prod.outlook.com/
Yes, we need a decision for VF case.
If the consensus is to continue exposing the PASID capability in vfio-pci config space by developing a kernel quirk mechanism to find offset for VF, then this patch for PF is orthogonal to that VF work and can go as it is.
But if the decision is to have a device feature for the user to enumerate the vPASID capability and let the VMM take care of finding the vPASID cap offset, then we had better start doing that for the PF too, since it's not good to have two different enumeration interfaces for PF and VF.
My preference is a device feature, given that QEMU already includes lots of quirks for vfio-pci devices. Another reason is that when supporting vPASID with SIOV there are some arch constraints which the driver needs to report to the user to follow (e.g. don't assign ENQCMD-capable sibling vdevs to the same guest, etc.). A device feature interface can better encapsulate everything related to vPASID in one place.
Thanks Kevin
On Wed, 27 Sep 2023 08:07:54 +0000 "Tian, Kevin" kevin.tian@intel.com wrote:
Yes, we need a decision for VF case.
If the consensus is to continue exposing the PASID capability in vfio-pci config space by developing a kernel quirk mechanism to find offset for VF, then this patch for PF is orthogonal to that VF work and can go as it is.
But if the decision is to have a device feature for the user to enumerate the vPASID capability and let the VMM take care of finding the vPASID cap offset, then better we start doing that for PF too since it's not good to have two enumeration interfaces for PF/VF respectively.
Note also that QEMU implements a lazy algorithm for exposing capabilities; the default is to expose them, so we need to consider existing VMs seeing a new read-only PASID capability on an assigned PF.
That might support an alternate means to expose the capability.
My preference is via device feature given Qemu already includes lots of quirks for vfio-pci devices. Another reason is that when supporting vPASID with SIOV there are some arch constraints which the driver needs to report to the user to follow (e.g. don't assign ENQCMD-capable sibling vdev's to a same guest, etc.).
?!
A device feature interface can better encapsulate everything related to vPASID in one place.
Sorry if I don't remember, have you posted a proposal for the device feature interface? Thanks,
Alex
From: Alex Williamson alex.williamson@redhat.com Sent: Thursday, September 28, 2023 2:53 AM
On Wed, 27 Sep 2023 08:07:54 +0000 "Tian, Kevin" kevin.tian@intel.com wrote:
Yes, we need a decision for VF case.
If the consensus is to continue exposing the PASID capability in vfio-pci config space by developing a kernel quirk mechanism to find offset for VF, then this patch for PF is orthogonal to that VF work and can go as it is.
But if the decision is to have a device feature for the user to enumerate the vPASID capability and let the VMM take care of finding the vPASID cap offset, then better we start doing that for PF too since it's not good to have two enumeration interfaces for PF/VF respectively.
Note also that QEMU implements a lazy algorithm for exposing capabilities, the default is to expose them, so we need to consider existing VMs seeing a new read-only PASID capability on an assigned PF.
That might support an alternate means to expose the capability.
Yep, that's also a valid point.
My preference is via device feature given Qemu already includes lots of quirks for vfio-pci devices. Another reason is that when supporting vPASID with SIOV there are some arch constraints which the driver needs to report to the user to follow (e.g. don't assign ENQCMD-capable sibling vdev's to a same guest, etc.).
?!
Sorry, I didn't plan to elaborate on that tricky constraint before we show the overall SIOV/vPASID implementation. Explaining it requires a lot of context; here I just want to mention the potential requirement in case we need more evidence for going in this direction. 😊
A device feature interface can better encapsulate everything related to vPASID in one place.
Sorry if I don't remember, have you posted a proposal for the device feature interface? Thanks,
Not yet. Will do in next version.