RE: [PATCH v2 02/10] iommu: Introduce a new iommu_group_replace_domain() API

13 Feb 2023


      ...
From: Jason Gunthorpe jgg@nvidia.com
Sent: Saturday, February 11, 2023 8:45 AM
On Fri, Feb 10, 2023 at 04:51:10PM -0700, Alex Williamson wrote:
...
On Tue, 7 Feb 2023 13:17:54 -0800
Nicolin Chen nicolinc@nvidia.com wrote:
...
qemu has a need to replace the translations associated with a domain
when the guest does large-scale operations like switching between an
IDENTITY domain and, say, dma-iommu.c.
Currently, it does this by replacing all the mappings in a single
domain, but this is very inefficient and means that domains have to be
per-device rather than per-translation.
Provide a high-level API to allow replacements of one domain with
another. This is similar to a detach/attach cycle except it doesn't
force the group to go to the blocking domain in-between.
By removing this forced blocking domain the iommu driver has the
opportunity to implement an atomic replacement of the domains to the
greatest extent its hardware allows.
It could be possible to adderss this by simply removing the protection
from the iommu_attach_group(), but it is not so clear if that is safe
for the few users. Thus, add a new API to serve this new purpose.
Atomic replacement allows the qemu emulation of the viommu to be
more
...
...
complete, as real hardware has this ability.
I was under the impression that we could not atomically switch a
device's domain relative to in-flight DMA.
it's possible as long as the mappings for in-flight DMA don't change
in the transition.
...
Certainly all the HW can be proper atomic but not necessarily easily -
the usual issue is a SW complication to manage the software controlled
cache tags in a way that doesn't corrupt the cache.
This is because the cache tag and the io page table top are in
different 64 bit words so atomic writes don't cover both, and thus the
IOMMU HW could tear the two stores and mismatch the cache tag to the
table top. This would corrupt the cache.
VT-d spec recommends using 128bit cmpxchg instruction to update
page table pointer and DID together.
...
The easiest way to avoid this is for SW to use the same DID for the
new and old tables. This is possible if this translation entry is the
only user of the DID. A more complex way would use a temporary DID
that can be safely corrupted. But realistically I'd expect VT-d
drivers to simply make the PASID invalid for the duration of the
update.
However something like AMD has a single cache tag for every entry in
the PASID table so you could do an atomic replace trivially. Just
update the PASID and invalidate the caches.
ARM has a flexible PASID table and atomic replace would be part of
resizing it. eg you can atomically update the top of the PASID table
with a single 64 bit store.
So replace lets the drivers implement those special behaviors if it
makes sense for them.
...
Or maybe atomic is the wrong word here since we expect no in-flight DMA
during the sort of mode transitions referred to here, and we're really
just trying to convey that we can do this via a single operation with
reduced latency?  Thanks,
atomic means DMA will either translate with the old domain or the new
domain but never a blocking domain. Keep in mind that with nesting
"domain" can mean a full PASID table in guest memory.
I should reiterate that replace is not an API that is required to be
atomic.
yes. Just as explained in the commit message:
"
  By removing this forced blocking domain the iommu driver has the
  opportunity to implement an atomic replacement of the domains to the
  greatest extent its hardware allows.
"
API level replace only implies removing transition to/from blocking
domain. and then it's driver's call whether it wants to take this
chance to implement a true 'atomic' behavior.

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

RE: [PATCH v2 02/10] iommu: Introduce a new iommu_group_replace_domain() API