From: Baolu Lu <baolu.lu@linux.intel.com>
Sent: Thursday, July 10, 2025 10:15 AM
On 7/9/25 23:29, Dave Hansen wrote:
On 7/8/25 23:28, Lu Baolu wrote:
Modern IOMMUs often cache page table entries to optimize walk performance, even for intermediate page table levels. If kernel page table mappings are changed (e.g., by vfree()) but the IOMMU's internal caches retain stale entries, a use-after-free (UAF) condition arises. If the freed page table pages are then reallocated for a different purpose, potentially by an attacker, the IOMMU could misinterpret the new data as valid page table entries. This allows the IOMMU to walk into attacker-controlled memory, leading to arbitrary physical memory DMA access or privilege escalation.
The approach here is certainly conservative and simple. It's also not going to cause big problems on systems without fancy IOMMUs.
But I am a _bit_ worried that it's _too_ conservative. The changelog talks about page table page freeing, but the actual code:
@@ -1540,6 +1541,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	kernel_tlb_flush_range(info);
 	put_flush_tlb_info();
+	iommu_sva_invalidate_kva_range(start, end);
 }
is in a very generic TLB flushing spot that's used for a lot more than just freeing page tables.
If the problem is truly limited to freeing page tables, it needs to be commented appropriately.
Yeah, good comments. It should not be limited to freeing page tables; freeing page tables is just a real case that we can observe in the vmalloc/vfree paths. Theoretically, whenever a kernel page table update is done and the CPU TLB needs to be flushed, the secondary TLB (i.e., the caches on the IOMMU) should be flushed accordingly. It's assumed that this happens in flush_tlb_kernel_range().
This conservative approach sounds safer: even if we have overlooked a threat beyond page table freeing, doing the invalidation in this function is sufficient to mitigate it.
But as Dave suggested, let's add a comment above it to clarify the motivation for doing so.
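One possible wording for such a comment, assuming the hook stays in flush_tlb_kernel_range() (a sketch only; the final wording is up to the patch author):

	put_flush_tlb_info();
	/*
	 * IOMMUs with kernel-mode SVA bindings may cache kernel
	 * mappings, including intermediate paging-structure entries.
	 * Any kernel page table update that requires a CPU TLB flush
	 * must also invalidate those IOMMU caches; otherwise a freed
	 * and reallocated page table page could be misinterpreted as
	 * valid entries (use-after-free via stale IOMMU caches).
	 */
	iommu_sva_invalidate_kva_range(start, end);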