On Sat, Oct 25, 2025 at 11:24:25AM -0400, Pasha Tatashin wrote:
> On Thu, Oct 23, 2025 at 2:21 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > The existing IOMMU page table implementations duplicate all of the working algorithms for each format. By using the generic page table API a single C version of the IOMMU algorithms can be created and re-used for all of the different formats used in the drivers. The implementation will provide a single C version of the iommu domain operations: iova_to_phys, map, unmap, and read_and_clear_dirty.
> > Further, adding new algorithms and techniques becomes easy to do across the entire fleet of drivers and formats.
> It is an enabler for cross-arch page_table_check for IOMMU. There is also a long-standing issue where PT pages are not freed on unmap, leading to substantial overhead on some configurations, especially where IOVA is cycled through for security purposes (as it was done in our environment). Having a single, solid fix for this issue that affects all arches is very much desirable.
Yes, I have a simple, low-cost plan to fix the PMD/etc unfreeing problem, at least for iommufd.
In iommufd there is an interval tree of the IOVA used in the iommu_domain. When a range of IOVA is removed from the interval tree, it can be unmapped normally. iommufd can then compute the empty span, which runs from the end of the prior populated range to the start of the next populated range, and do a cleaning operation on the iommu domain over that span.
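As a rough illustration of the empty-span arithmetic only (this is not iommufd code): the sketch below assumes the two neighbouring populated ranges have already been looked up in the interval tree. The names struct iova_span and empty_span(), and the has_prev/has_next flags, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive IOVA range; not an iommufd type. */
struct iova_span {
	uint64_t start;
	uint64_t last;
};

/*
 * Compute the empty span around a range that was just removed.
 *
 * prev_last:  last IOVA of the nearest populated range below the removed
 *             one (ignored when has_prev is false).
 * next_start: first IOVA of the nearest populated range above the removed
 *             one (ignored when has_next is false).
 *
 * With no neighbour on a side, the span extends to the edge of the
 * 64-bit IOVA space on that side (an assumption of this sketch).
 */
static struct iova_span empty_span(bool has_prev, uint64_t prev_last,
				   bool has_next, uint64_t next_start)
{
	struct iova_span span;

	span.start = has_prev ? prev_last + 1 : 0;
	span.last = has_next ? next_start - 1 : UINT64_MAX;
	assert(span.start <= span.last);
	return span;
}
```

For example, with populated ranges ending at 0x1fffff and starting at 0x800000, the empty span is 0x200000 through 0x7fffff inclusive, which can be wider than the range that was actually unmapped.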
Cleaning will free any table levels that are fully included in the empty span. Cleaning will run under the same 'range-locked' rules as map/unmap/iova_to_phys.
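To make "fully included" concrete, here is a minimal sketch of the alignment check for one table level, assuming each table at that level covers a naturally aligned power-of-two IOVA region. The function tables_in_span() and the table_shift parameter are hypothetical; the real generic page table code walks the tables rather than doing this arithmetic in isolation, and the range-locking rules are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Find the tables that lie entirely inside the inclusive span [start, last].
 *
 * table_shift: log2 of the IOVA bytes covered by one table at this level
 *              (hypothetical parameter, e.g. 21 for a table of 512 entries
 *              mapping 4 KiB each).
 *
 * Returns true and sets *first_base / *last_base to the base IOVA of the
 * first and last fully covered table, or returns false if no table at
 * this level fits inside the span.
 */
static bool tables_in_span(uint64_t start, uint64_t last,
			   unsigned int table_shift,
			   uint64_t *first_base, uint64_t *last_base)
{
	uint64_t size, mask, first, lb;

	assert(start <= last);
	assert(table_shift > 0 && table_shift < 64);
	size = 1ULL << table_shift;
	mask = size - 1;

	/* Round start up to a table boundary, guarding against overflow. */
	if (start > UINT64_MAX - mask)
		return false;
	first = (start + mask) & ~mask;

	if ((last & mask) == mask) {
		/* last is the final byte of a table, so that table counts. */
		lb = last & ~mask;
	} else {
		/* The table containing last is only partly covered. */
		uint64_t boundary = last & ~mask;

		if (boundary == 0)
			return false;
		lb = boundary - size;
	}

	if (first > lb)
		return false;
	*first_base = first;
	*last_base = lb;
	return true;
}
```

A span of 0x201000 through 0x7ff000 with table_shift 21 contains exactly one whole table, based at 0x400000; the partially covered tables on either side would have to stay.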
This cleaning algorithm is already used as part of map; it just needs to be exposed as an independent op.
Thanks, Jason