From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, October 24, 2025 2:21 AM
map is slightly complicated because it has to handle a number of special edge cases:
- Overmapping a previously shared, but now empty, table level with an OA. Requries validating and freeing the possibly empty tables
- Doing the above across an entire to-be-created contiguous entry
- Installing a new shared table level concurrently with another thread
- Expanding the table by adding more top levels
Table expansion is a unique feature of AMDv1, this version is quite similar except we handle racing concurrent lockless map. The table top pointer and starting level are encoded in a single uintptr_t which ensures we can READ_ONCE() without tearing. Any op will do the READ_ONCE() and use that fixed point as its starting point. Concurrent expansion is handled with a table global spinlock.
When inserting a new table entry map checks that the entire portion of the table is empty. This includes freeing any empty lower tables that will be overwritten by an OA. A separate free list is used while checking and collecting all the empty lower tables so that writing the new entry is uninterrupted, either the new entry fully writes or nothing changes.
A special fast path for PAGE_SIZE is implemented that does a direct walk to the leaf level and installs a single entry. This gives ~15% improvement for iommu_map() when mapping lists of single pages.
This version sits under the iommu_domain_ops as map_pages() but does not require the external page size calculation. The implementation is actually map_range() and can do arbitrary ranges, internally handling all the validation and supporting any arrangment of page sizes. A future series can optimize iommu_map() to take advantage of this.
Tested-by: Alejandro Jimenez alejandro.j.jimenez@oracle.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
Reviewed-by: Kevin Tian kevin.tian@intel.com