On Mon, Feb 13, 2023 at 02:34:18AM +0000, Tian, Kevin wrote:
External email: Use caution opening links or attachments
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, February 11, 2023 8:10 AM
On Fri, Feb 10, 2023 at 02:11:23AM +0000, Tian, Kevin wrote:
if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
if (list_empty(&hwpt->devices)) {
iopt_table_remove_domain(&hwpt->ioas->iopt,
hwpt->domain);
list_del(&hwpt->hwpt_item);
}
I'm not sure how this can be fully shared between detach and replace. Here some work e.g. above needs to be done before calling iommu_group_replace_domain() while others can be done afterwards.
This iopt_table_remove_domain/list_del is supposed to be done in the hwpt's destroy() actually. We couldn't move it because it'd need the new domain_alloc_user op and its implementation in ARM driver. Overall, I think it should be safe to put it behind the iommu_group_replace_domain().
My confusion is that we have different flows between detach/attach and replace.
today with separate detach+attach we have following flow:
Remove device from current hwpt; if (last_device in hwpt) { Remove hwpt domain from current iopt; if (last_device in group) detach group from hwpt domain; } if (first device in group) { attach group to new hwpt domain; if (first_device in hwpt) Add hwpt domain to new iopt; Add device to new hwpt;
but replace flow is different on the detach part:
if (first device in group) { replace group's domain from current hwpt to new hwpt; if (first_device in hwpt) Add hwpt domain to new iopt; } Remove device from old hwpt; if (last_device in old hwpt) Remove hwpt domain from old iopt; Add device to new hwpt;
Oh... thinking it carefully, I see the flow does look a bit off. Perhaps it's better to have a similar flow for replace.
However, I think something would be still different due to its tricky nature, especially for a multi-device iommu_group.
An iommu_group_detach happens only when a device is the last one in its group to go through the routine via a DETACH ioctl, while an iommu_group_replace_domain() happens only when the device is the first one in its group to go through the routine via another ATTACH ioctl. However, when the first device does a replace, the cleanup routine of the old hwpt is a NOP, since there are still other devices (same group) in the old hwpt. And two implications here:
- Any other device in the same group has to forcibly switch to the new domain, when the first device does a replace.
- The actual hwpt cleanup can only happen at the last device's replace call.
This also means that kernel has to rely on the integrity of the user space that it must replace all active devices in the group:
Jason suggested to move hwpt cleanup out of the detach path in his reply to patch7. Presumably with that fix the major tricky point about hwpt in following scenarios would disappear. Let's see how it will work out then. 😊
What about point 1? If dev2 and dev3 are already replaced when doing iommu_group_replace_domain() on dev1, their idev objects still have the old hwpt/iopt until user space does another two IOCTLs on them, right?
Should we only call iommu_group_replace_domain() when the last device in the group gets a replace IOCTL?
Thanks Nic