On 10/23/2025 11:50 PM, Jason Gunthorpe wrote:
From: Alejandro Jimenez alejandro.j.jimenez@oracle.com
Replace the io_pgtable versions with pt_iommu versions. The v2 page table uses the x86 implementation that will be eventually shared with VT-d.
This supports the same special features as the original code:
- increase_top for the v1 format to allow scaling from 3 to 6 levels
- non-present flushing
- Dirty tracking for v1 only
- __sme_set() to adjust the PTEs for CC
- Optimization for flushing with virtualization to minimize the range
- amd_iommu_pgsize_bitmap override of the native page sizes
- page tables allocate from the device's NUMA node
Rework the domain ops so that v1/v2 get their own ops. Make dedicated allocation functions for v1 and v2. Hook up invalidation for a top change to struct pt_iommu_flush_ops. Delete some of the iopgtable related code that becomes unused in this patch. The next patch will delete the rest of it.
This fixes a race bug in AMD's increase_address_space() implementation. It stores the top level and top pointer in different memory, which prevents other threads from reading a coherent version:
increase_address_space() alloc_pte() level = pgtable->mode - 1; pgtable->root = pte; pgtable->mode += 1; pte = &pgtable->root[PM_LEVEL_INDEX(level, address)];
The iommupt version is careful to put mode and root under a single READ_ONCE and then is careful to only READ_ONCE a single time per walk.
Signed-off-by: Alejandro Jimenez alejandro.j.jimenez@oracle.com Tested-by: Alejandro Jimenez alejandro.j.jimenez@oracle.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com
I have reviewed and done some testing. Looks good to me.
Reviewed-by: Vasant Hegde vasant.hegde@amd.com
-Vasant