On 7/9/25 19:14, Baolu Lu wrote:
If the problem is truly limited to freeing page tables, it needs to be commented appropriately.
Yeah, good comments. It should not be limited to freeing page tables; freeing page tables is just a real case that we can see in the vmalloc/ vfree paths. Theoretically, whenever a kernel page table update is done and the CPU TLB needs to be flushed, the secondary TLB (i.e., the caches on the IOMMU) should be flushed accordingly. It's assumed that this happens in flush_tlb_kernel_range().
Could you elaborate on this a bit further?
I thought that the IOMMU was only ever doing "user" permission walks and never "supervisor". That's why we had the whole discussion about whether the IOMMU would stop if it saw an upper-level entry with _PAGE_USER clear.
The reason this matters is that leaf kernel page table entries always have _PAGE_USER clear. That means an IOMMU doing "user" permission walks should never be able to do anything with them. Even if an IOTLB entry was created* for a _PAGE_USER=0 PTE, the "user" permission walk will stop when it finds that entry.
It doesn't matter if the entry is stale or not. The "user" permission IOMMU walk can't do anything with the supervisor mapping.
Why does this matter? We flush the CPU TLB in a bunch of different ways, _especially_ when it's being done for kernel mappings. For example, __flush_tlb_all() is a non-ranged kernel flush which has a completely parallel implementation with flush_tlb_kernel_range(). Call sites that use _it_ are unaffected by the patch here.
Basically, if we're only worried about vmalloc/vfree freeing page tables, then this patch is OK. If the problem is bigger than that, then we need a more comprehensive patch.
* I'm not sure if the IOMMU will even create an IOTLB entry for a supervisor-permission mapping while doing a user-permission walk.