Hi Jacob,
On 7/10/25 02:15, Jacob Pan wrote:
> Hi Jason,
>
> On Wed, 9 Jul 2025 13:27:24 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
>
>> On Wed, Jul 09, 2025 at 08:51:58AM -0700, Jacob Pan wrote:
>>>> In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware shares and walks the CPU's page tables. Architectures like x86 share static kernel address mappings across all user page tables, allowing the IOMMU to access the kernel portion of these tables.
>>> Is there a use case where a SVA user can access kernel memory in the first place?
>> No. It should be fully blocked.
> Then I don't understand what is the "vulnerability condition" being addressed here. We are talking about KVA range here.
Let me take a real example:
A device might be mistakenly configured to access memory at IOVA 0xffffa866001d5000 (a vmalloc'd memory region) with user-mode access permission. The corresponding page table entries for this IOVA translation, assuming a five-level page table, would appear as follows:
PGD: Entry present with U/S bit set (1)
P4D: Entry present with U/S bit set (1)
PUD: Entry present with U/S bit set (1)
PMD: Entry present with U/S bit set (1)
PTE: Entry present with U/S bit clear (0)
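To make the five lookups concrete, here is a small C sketch that splits the example IOVA into its per-level table indices. It assumes 4 KiB pages and x86 5-level paging, where each level consumes 9 address bits starting at bit 12; the helper name pt_index() is illustrative only, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the 9-bit table index used at a given level
 * of an x86 5-level walk with 4 KiB pages.
 * level 0 = PTE, 1 = PMD, 2 = PUD, 3 = P4D, 4 = PGD (bits 56:48).
 */
static unsigned int pt_index(uint64_t va, unsigned int level)
{
    assert(level <= 4);
    return (unsigned int)((va >> (12 + 9 * level)) & 0x1ff);
}
```

For 0xffffa866001d5000 this yields PGD 511, P4D 336, PUD 408, PMD 0 and PTE 469 (page offset 0), i.e. four non-leaf lookups before the leaf PTE is reached and checked.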
When the IOMMU walks this page table, it may cache all present entries, regardless of their U/S bits. Upon reaching the leaf PTE, the IOMMU performs a permission check: it compares the device's DMA access mode (in this case, user mode) against the cumulative U/S permission, i.e. the AND of the U/S bits of all traversed page table entries (which here results in U/S == 0).
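The cumulative check can be sketched as follows. This is a simplified model, not the actual VT-d hardware logic: it assumes the x86 entry layout (bit 0 = Present, bit 2 = U/S) and treats a user-mode access as allowed only when every traversed level has U/S set. The name walk_allows_user() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)  /* x86 Present bit */
#define PTE_USER    (1ULL << 2)  /* x86 U/S bit: 1 = user accessible */

enum { PT_LEVELS = 5 };          /* PGD, P4D, PUD, PMD, PTE */

/*
 * Hypothetical model of the permission check: a user-mode access succeeds
 * only if all entries are present and the AND of their U/S bits is 1.
 */
static bool walk_allows_user(const uint64_t entries[PT_LEVELS])
{
    bool user_ok = true;

    for (size_t i = 0; i < PT_LEVELS; i++) {
        if (!(entries[i] & PTE_PRESENT))
            return false;                       /* translation fault */
        user_ok = user_ok && (entries[i] & PTE_USER);
    }
    return user_ok;
}
```

With the entries from the example (U/S = 1 on the four upper levels, U/S = 0 on the PTE) the function returns false, matching the blocked access; setting U/S on the PTE as well makes it return true.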
The IOMMU correctly blocks this DMA access because the device's requested access (user mode) exceeds the permissions granted by the page table (supervisor-only at the PTE level). However, the PGD, P4D, PUD, and PMD entries that were traversed might remain cached within the IOMMU's paging structure cache.
Now, consider a scenario where the page-table page holding the leaf PTE is freed and subsequently repurposed, and the bits at the old PTE location now happen to have U/S set to 1. Because the PMD entry that pointed to that page is still in the IOMMU's paging structure cache, the page table for the aforementioned IOVA would now appear to the IOMMU as follows:
PGD: Entry present with U/S bit set (1) [retrieved from paging cache]
P4D: Entry present with U/S bit set (1) [retrieved from paging cache]
PUD: Entry present with U/S bit set (1) [retrieved from paging cache]
PMD: Entry present with U/S bit set (1) [retrieved from paging cache]
PTE: Entry present with U/S bit set (1) {read from physical memory}
As a result, the device could then access the memory at IOVA 0xffffa866001d5000 with user-mode permission, even though that access was explicitly disallowed.
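The sequence above can be reproduced with a toy model. This is a hypothetical sketch, not IOMMU driver code or a faithful VT-d model: it tracks a single IOVA, caches the four non-leaf entries as one set whenever they are all present (even if the access then faults), never caches the leaf PTE, and uses a made-up toy_flush_pwc() to stand in for a paging-structure-cache invalidation. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT (1ULL << 0)  /* x86 Present bit */
#define PTE_USER    (1ULL << 2)  /* x86 U/S bit */

enum { UPPER_LEVELS = 4, LEAF_INDEX = 469 };

struct toy_mem {
    uint64_t upper[UPPER_LEVELS];  /* PGD, P4D, PUD, PMD as stored in RAM */
    uint64_t leaf_page[512];       /* page-table page the PMD points to */
};

struct toy_iommu {
    bool     pwc_valid;            /* cached non-leaf entries available */
    uint64_t pwc[UPPER_LEVELS];    /* cached copies of PGD..PMD */
};

/* Toy setup: U/S=1 on all non-leaf levels, supervisor-only leaf PTE. */
static void toy_setup(struct toy_mem *mem)
{
    memset(mem, 0, sizeof(*mem));
    for (int i = 0; i < UPPER_LEVELS; i++)
        mem->upper[i] = PTE_PRESENT | PTE_USER;
    mem->leaf_page[LEAF_INDEX] = PTE_PRESENT;
}

/* Returns true if a user-mode DMA to the toy IOVA is permitted. */
static bool toy_dma_user(struct toy_iommu *iommu, const struct toy_mem *mem)
{
    if (!iommu->pwc_valid) {
        for (int i = 0; i < UPPER_LEVELS; i++) {
            if (!(mem->upper[i] & PTE_PRESENT))
                return false;                  /* fault, nothing cached */
            iommu->pwc[i] = mem->upper[i];
        }
        iommu->pwc_valid = true;               /* cached before the leaf check */
    }

    bool user_ok = true;
    for (int i = 0; i < UPPER_LEVELS; i++)     /* non-leaf entries from cache */
        user_ok = user_ok && (iommu->pwc[i] & PTE_USER);

    uint64_t pte = mem->leaf_page[LEAF_INDEX]; /* leaf always read from RAM */
    if (!(pte & PTE_PRESENT))
        return false;
    return user_ok && (pte & PTE_USER);
}

/* Hypothetical invalidation: drop all cached non-leaf entries. */
static void toy_flush_pwc(struct toy_iommu *iommu)
{
    iommu->pwc_valid = false;
}
```

After toy_setup(), the first toy_dma_user() call returns false but leaves pwc_valid set. Clearing mem.upper[3] (the PMD) and filling leaf_page with 0xff bytes to mimic the freed and reused page makes the next call return true through the stale cached PMD; only after toy_flush_pwc() does the walk re-read the now non-present PMD and block the access again.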
Thanks,
baolu