Jason,
On 9/18/2025 7:47 PM, Jason Gunthorpe wrote:
On Thu, Sep 11, 2025 at 12:14:15PM +0000, Vasant Hegde wrote:
The IOMMU IOVA allocator initially starts with 32-bit addresses and once they are exhausted it switches to 64-bit addresses (the max address is determined by the IOMMU and device DMA capability). To support larger IOVAs, the AMD IOMMU driver increases the page table level.
Is this the case? I thought I saw something that the allocator is starting from high addresses?
Right. By default we start with 32-bit addresses (from 0xffff_ffff down to 0x0) and once that range is full it goes to 64-bit (assuming the device supports 64-bit). At that point increase_address_space() gets called.
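To make the level growth concrete, here is a rough sketch of how many page table levels a given IOVA needs. It assumes 4 KiB pages and 9 index bits per level; level_max_addr() and levels_needed() are hypothetical helpers for illustration, not driver functions, and the real driver starts from its own default level rather than from 1.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages, 9 index bits per level. */
#define PAGE_SHIFT_BITS 12
#define BITS_PER_LEVEL   9
#define MAX_LEVELS       6

/* Highest address a table with `levels` levels can map (hypothetical helper). */
static uint64_t level_max_addr(unsigned int levels)
{
	unsigned int bits;

	assert(levels >= 1 && levels <= MAX_LEVELS);
	bits = PAGE_SHIFT_BITS + levels * BITS_PER_LEVEL;

	/* The top level would cover more than 64 bits, so clamp it. */
	return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* Smallest level count whose range covers `iova` (hypothetical helper). */
static unsigned int levels_needed(uint64_t iova)
{
	unsigned int levels = 1;

	while (levels < MAX_LEVELS && iova > level_max_addr(levels))
		levels++;
	return levels;
}
```

Under these assumptions a 32-bit IOVA fits in 3 levels (39 bits), and the first address above that range needs a 4th level, which is the point where the table has to grow.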
But in the unmap path (iommu_v1_unmap_pages()), fetch_pte() reads pgtable->[root/mode] without a lock. So it's possible that, in an extreme corner case, while increase_address_space() is updating pgtable->[root/mode], fetch_pte() reads the wrong page table level (pgtable->mode). It does compare the value with the level encoded in the page table and returns NULL. This will cause the iommu_unmap op to fail, and the upper layer may retry/log a WARN_ON.
Yep, definitely a bug. I spotted it already and fixed it in iommupt; you can read about it here:
https://lore.kernel.org/linux-iommu/13-v5-116c4948af3d+68091-iommu_pt_jgg@nv...
Nice. Will take a look this week.
  CPU 0                                      CPU 1
  -----                                      -----
  map pages                                  unmap pages
  alloc_pte() -> increase_address_space()    iommu_v1_unmap_pages() -> fetch_pte()
    pgtable->root = pte (new root value)
                                             READ pgtable->[mode/root]
                                               Reads new root, old mode
    Updates mode (pgtable->mode += 1)
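One way to picture a coherent read of the pair is a sequence-counter pattern, sketched below in userspace C11. This is only an illustration under stated assumptions: struct pgtable_state, grow_table() and snapshot() are hypothetical names, not the driver's, writers are assumed to be serialized by some outer lock, and default seq_cst atomics are used for simplicity. It only covers reading root and mode together; it does nothing about when the hardware sees the new root.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the page table state (not the driver's struct). */
struct pgtable_state {
	atomic_uint seq;           /* odd while an update is in flight */
	_Atomic(uint64_t *) root;  /* top-level table */
	atomic_uint mode;          /* number of levels */
};

/* Writer: publish a new root and bump the level as one logical update.
 * Assumes writers are already serialized by an outer lock. */
static void grow_table(struct pgtable_state *pt, uint64_t *new_root)
{
	unsigned int s = atomic_load(&pt->seq);

	assert((s & 1) == 0);              /* no other update in flight */
	atomic_store(&pt->seq, s + 1);     /* mark the update as started */
	atomic_store(&pt->root, new_root);
	atomic_fetch_add(&pt->mode, 1);
	atomic_store(&pt->seq, s + 2);     /* mark the update as finished */
}

/* Reader: retry until root and mode come from the same generation. */
static void snapshot(struct pgtable_state *pt, uint64_t **root,
		     unsigned int *mode)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load(&pt->seq);
		*root = atomic_load(&pt->root);
		*mode = atomic_load(&pt->mode);
		s2 = atomic_load(&pt->seq);
	} while ((s1 & 1) || s1 != s2);
}
```

With this shape the reader never pairs a new root with an old mode: if it overlaps an update, the counter check fails and it simply retries.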
This doesn't solve the whole problem. Yes, reading the two values coherently is important, but we must also serialize parallel map such that map only returns once the IOMMU is actually programmed with the new roots.
I don't see that in this fix.
IMHO unless someone is actually hitting this I'd leave it and focus on merging iommupt, which fully fixes this without adding any locks to the fast path.
Unfortunately yes. We had a customer report this issue.
-Vasant