On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
Jason, Nicolin, Kevin,
On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
+/**
+ * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
+ * @size: sizeof(struct iommu_hw_queue_alloc)
+ * @flags: Must be 0
+ * @viommu_id: Virtual IOMMU ID to associate the HW queue with
+ * @type: One of enum iommu_hw_queue_type
+ * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
+ *         model
+ * @out_hw_queue_id: The ID of the new HW queue
+ * @base_addr: Base address of the queue memory in guest physical address space
+ * @length: Length of the queue memory in the guest physical address space
+ *
+ * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
+ * allows HW to access a guest queue memory described by @base_addr and @length.
+ * Upon success, the underlying physical pages of the guest queue memory will be
+ * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
+ * destroyed.
Do we have a way to make the pinning optional?
As I understand AMD's system, the IOMMU HW itself translates the base_addr through the S2 page table automatically, so it doesn't need pinned memory and physical addresses, just the IOVA.
Correct. The HW will translate GPA -> SPA automatically using the information below.
The AMD IOMMU needs a special device ID to set up the GPA -> SPA mapping per VM, and it is programmed in the VF Control BAR (VFCntlMMIO Offset {16'b[GuestID], 6'b01_0000}, Guest Miscellaneous Control Register). The IOMMU HW will use this for GPA -> SPA translation of buffers such as the command buffer.
So the HW will use the base address (GPA) and the head/tail pointers to get the offset from the base. Then it will use the GPA -> SPA translation.
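The register offset quoted above, {16'b[GuestID], 6'b01_0000}, reads as a Verilog-style concatenation: a 16-bit GuestID in the upper bits followed by the 6-bit constant 0x10. A minimal sketch of that arithmetic, assuming this reading is right; the macro and function names are hypothetical and not taken from the AMD driver:

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define GUEST_MISC_CTRL_LOW_BITS	0x10u	/* 6'b01_0000 */
#define GUEST_ID_SHIFT			6	/* width of the low field */

/* Build the 22-bit offset {GuestID[15:0], 6'b01_0000}. */
static inline uint32_t guest_misc_ctrl_offset(uint16_t guest_id)
{
	return ((uint32_t)guest_id << GUEST_ID_SHIFT) | GUEST_MISC_CTRL_LOW_BITS;
}
```

Under that assumption, GuestID 0 maps to offset 0x10 and GuestID 1 to offset 0x50, so each guest gets its own 64-byte-strided slot in the VF Control BAR.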
Perhaps for this reason the pinning should be done with a function call from the driver?
We still need to make sure the pages allocated for the queue are present in memory so that the IOMMU HW can access them.
Is pinning at guest boot time enough here, or do we need to take an additional reference in the queue_alloc() path?
For NVIDIA's vCMDQ that reads host PA directly, pages should be pinned once when stage 2 mappings are created for the guest RAM, and iommu_hw_queue_alloc() should pin the pages again to prevent the gPA from being unmapped in the stage 2 page table. Otherwise it will be a security hole, as HW continues to read the unmapped memory through physical address space.
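The double-pin argument can be pictured with a toy reference-count model. This is only a sketch of the reasoning, not iommufd code: the struct and function names are hypothetical, and the rule "unmap is refused while an extra pin is held" is an assumption made for illustration.

```c
#include <stdbool.h>

/* Toy model of one guest page; not a kernel structure. */
struct toy_page {
	bool mapped_s2;		/* present in the stage 2 page table */
	unsigned int pins;	/* outstanding pin references */
};

/* Guest RAM is mapped into stage 2: first pin. */
static void toy_map(struct toy_page *p)
{
	p->mapped_s2 = true;
	p->pins++;
}

/* HW queue allocation pins the page a second time. */
static void toy_queue_alloc(struct toy_page *p)
{
	p->pins++;
}

/* HW queue destruction drops the second pin. */
static void toy_queue_destroy(struct toy_page *p)
{
	p->pins--;
}

/* Unmap is refused while anyone besides the mapping itself holds a pin. */
static bool toy_unmap(struct toy_page *p)
{
	if (p->pins > 1)
		return false;
	p->mapped_s2 = false;
	p->pins = 0;
	return true;
}
```

In this model the unmap only succeeds after the queue is destroyed, which is the property the second pin is meant to provide for HW that reads host PA directly.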
I understand that the AMD Command Buffer also needs the S2 mappings to be present in order to work correctly. But what happens if the queue memory isn't pinned (or even gets unmapped)? Will it raise a translation fault, as opposed to the HW reading the unmapped memory?
If so, I think this is Jason's point: a security hole would be unlikely, i.e. for AMD, having iommu_hw_queue_alloc() pin the physical pages is likely optional.
Thanks
Nicolin