On Thu, Aug 15, 2024 at 08:41:19PM -0300, Jason Gunthorpe wrote:
On Thu, Aug 15, 2024 at 12:46:24PM -0700, Nicolin Chen wrote:
On Thu, Aug 15, 2024 at 04:08:48PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 07, 2024 at 01:10:46PM -0700, Nicolin Chen wrote:
+int iommufd_viommu_set_vdev_id(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_set_vdev_id *cmd = ucmd->cmd;
+	struct iommufd_hwpt_nested *hwpt_nested;
+	struct iommufd_vdev_id *vdev_id, *curr;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	int rc = 0;
+
+	if (cmd->vdev_id > ULONG_MAX)
+		return -EINVAL;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	hwpt = idev->igroup->hwpt;
+	if (hwpt == NULL || hwpt->obj.type != IOMMUFD_OBJ_HWPT_NESTED) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
+	hwpt_nested = container_of(hwpt, struct iommufd_hwpt_nested, common);
This doesn't seem like a necessary check; the attached hwpt can change after this is established, so this can't be an invariant we enforce.
If you want to do 1:1 then somehow directly check if the idev is already linked to a viommu.
But idev can't link to a viommu without a proxy hwpt_nested?
Why not? The idev becomes linked to the viommu when the dev id is set.
Unless we are also going to enforce that the idev is always attached to a nested hwpt, I don't think we need to check it here.
Things will definitely not entirely work as expected if the vdev is directly attached to the S2 or a blocking domain, but it won't harm anything.
My view is that the moment there is a VIOMMU object, it must be a nested IOMMU case, so there must be a nested hwpt. A blocking domain would be a hwpt_nested too (vSTE=Abort), as we previously concluded.
Then, in a nested case, it feels odd that an idev is attached to an S2 hwpt...
That being said, I think we can still do that with validations: if idev->hwpt is nested, compare the input viommu vs. idev->hwpt->viommu; if idev->hwpt is paging, compare the input viommu->hwpt vs. idev->hwpt.
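Something like this, perhaps (a rough sketch only; the viommu back-pointer on hwpt_nested and the S2 hwpt pointer on the viommu are assumptions based on this series' draft structures):

	hwpt = idev->igroup->hwpt;
	if (hwpt->obj.type == IOMMUFD_OBJ_HWPT_NESTED) {
		struct iommufd_hwpt_nested *nested =
			container_of(hwpt, struct iommufd_hwpt_nested, common);

		/* Nested attachment: must be nested on this very viommu */
		if (nested->viommu != viommu)
			rc = -EINVAL;
	} else {
		/* Paging attachment: must be the viommu's own S2 hwpt */
		if (&viommu->hwpt->common != hwpt)
			rc = -EINVAL;
	}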
The stage-2-only configuration should have an identity hwpt_nested, right?
Yes, that is the right way to use the API.
It has to work by having the iommu driver directly access the xarray; entirely under the spinlock, the iommu driver can translate the vSID to the pSID, then let go of the lock and push the invalidation to HW. No races.
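i.e. roughly this shape (hypothetical driver-side code; the xarray and iommufd_vdev_id come from this series, while the pSID field name and the HW helper are made up):

	struct iommufd_vdev_id *vdev_id;
	unsigned long psid;

	xa_lock(&viommu->vdev_ids);
	vdev_id = xa_load(&viommu->vdev_ids, vsid);
	if (!vdev_id) {
		xa_unlock(&viommu->vdev_ids);
		return -ENOENT;
	}
	psid = vdev_id->id;	/* vSID -> pSID while still protected */
	xa_unlock(&viommu->vdev_ids);

	/* the lock is gone, but psid is a stable copy to hand to HW */
	push_invalidation_to_hw(psid);	/* hypothetical helper */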
Maybe the iommufd_viommu_invalidate ioctl handler should hold that xa_lock around the viommu->ops->cache_invalidate call, and then add a lock assert in iommufd_viommu_find_device?
That doesn't seem like a great idea; you can't do copy_from_user() under a spinlock.
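The handler would have to do all the user copying first and only then take the lock, something like (sketch only; the cmd field names are made up):

	void *reqs;

	/* all user access happens up front, in sleepable context */
	reqs = kcalloc(cmd->entry_num, cmd->entry_len, GFP_KERNEL);
	if (!reqs)
		return -ENOMEM;
	if (copy_from_user(reqs, u64_to_user_ptr(cmd->data_uptr),
			   cmd->entry_num * cmd->entry_len)) {
		rc = -EFAULT;
		goto out_free;
	}

	/* only now is it safe to enter the xa_lock'd section */
	xa_lock(&viommu->vdev_ids);
	/* ... vSID lookups and invalidations against reqs[] ... */
	xa_unlock(&viommu->vdev_ids);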
	xa_lock(&viommu->vdev_ids);
	vdev_id = xa_load(&viommu->vdev_ids, cmd->vdev_id);
	if (!vdev_id || vdev_id->vdev_id != cmd->vdev_id (????) ||
	    vdev_id->dev != idev->dev)
		err
	__xa_erase(&viommu->vdev_ids, cmd->vdev_id);
	xa_unlock(&viommu->vdev_ids);
I've changed it to xa_cmpxchg() in my local tree. Would that be simpler?
No, that is still not right; you can't take the vdev_id outside the lock at all, even for cmpxchg, because the vdev_id could have been freed and reallocated by another thread.
You must combine the validation of the vdev_id with the erase under a single critical region.
Yea, we need a wider lock to keep the vdev_id list and its data pointers unchanged. I'll try the rw semaphore that you suggested in the other mail.
This complicates things overall, especially with the VIRQ path, which involves polling vdev_id from interrupt context, where a semaphore/mutex won't fit very well. Perhaps it would need a driver-level bottom-half routine to call those helpers with the locks held, roughly as sketched below. I am glad that you noticed the problem early.
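e.g. very roughly (every name here is made up for illustration):

	static irqreturn_t my_evtq_irq(int irq, void *cookie)
	{
		struct my_smmu *smmu = cookie;

		/* hard IRQ context: no sleeping locks, just defer */
		schedule_work(&smmu->virq_work);
		return IRQ_HANDLED;
	}

	static void my_virq_work(struct work_struct *work)
	{
		struct my_smmu *smmu =
			container_of(work, struct my_smmu, virq_work);

		/* process context: safe to take the rw semaphore and
		 * call iommufd_viommu_find_device() etc.
		 */
	}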
Thanks!
Nicolin