On Thu, Jan 09, 2025 at 12:57:58AM +0800, Xu Yilun wrote:
> On Wed, Jan 08, 2025 at 09:30:26AM -0400, Jason Gunthorpe wrote:
> > On Tue, Jan 07, 2025 at 10:27:15PM +0800, Xu Yilun wrote:
> > > Add a flag for ioctl(VFIO_DEVICE_BIND_IOMMUFD) to mark a device as for private assignment. For these privately assigned devices, disallow the host from accessing their MMIO resources.
> > Why? Shouldn't the VMM simply not call mmap? Why does the kernel have to enforce this?
> MM.. maybe I should not say 'host', but 'userspace' instead.
> I think the kernel part of the VMM (KVM) has the responsibility to enforce the correct behavior of the userspace part of the VMM (QEMU). QEMU should have no way to touch private memory/MMIO, intentionally or accidentally. IIUC that's one of the motivations for introducing guest_memfd for private memory. Private MMIO follows.
Okay, but then why is it a flag like that? I'm expecting a much broader system here to make the VFIO device into a confidential device (like setting up the TDI), where we'd have to enforce the private things, communicate with some secure world to assign it, and so on.
I want to see a fuller solution to the CC problem in VFIO before we can be sure what the correct UAPI is. In other words, making the VFIO device into a CC device should also prevent mmapping it and so on.
So, I would take this out and defer VFIO enforcement to a series which does fuller CC enablement of VFIO.
The precursor work should just be avoiding the requirement for a VMA when installing VFIO MMIO into the KVM and IOMMU stage 2 mappings, i.e. by using an FD to get the CPU pfns into iommufd and KVM as you are showing.
This works just fine for non-CC devices anyhow and is the necessary building block for making a TDI interface in VFIO.
Jason
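
For readers following the thread, here is a minimal userspace sketch of the idea in the last two paragraphs: stage 2 mappings are filled by asking an exported MMIO region for its pfns directly, with no VMA (and hence no mmap) in the path. This is a toy model, not kernel code. Every name in it (struct mmio_export, export_get_pfn, struct stage2, stage2_map_export) is hypothetical and does not correspond to any existing VFIO, KVM, or iommufd interface.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an MMIO region (e.g. a BAR) that a VFIO
 * device exports through an fd. Not a real kernel structure.
 */
struct mmio_export {
    uint64_t base_pfn;  /* first physical frame of the region */
    size_t   nr_pages;  /* region size in pages */
};

/*
 * Hypothetical lookup: page offset within the export -> CPU pfn.
 * The importer never touches a VMA; it only asks the exporter.
 * Returns 0 on success, -1 if pgoff is out of range.
 */
static int export_get_pfn(const struct mmio_export *exp, size_t pgoff,
                          uint64_t *pfn)
{
    if (pgoff >= exp->nr_pages)
        return -1;
    *pfn = exp->base_pfn + pgoff;
    return 0;
}

/* Toy stage 2 table: index is the guest frame number, value is the pfn. */
#define S2_ENTRIES 16
struct stage2 {
    uint64_t pfn[S2_ENTRIES];
    int      valid[S2_ENTRIES];
};

/*
 * Install nr pages of the export, starting at pgoff, at guest frame gfn.
 * The same helper serves both importers (the "KVM" table and the
 * "IOMMU" table), mirroring the point that one FD-based path can feed both.
 * Ranges are validated up front so a failed call leaves the table untouched.
 */
static int stage2_map_export(struct stage2 *s2, size_t gfn,
                             const struct mmio_export *exp,
                             size_t pgoff, size_t nr)
{
    if (gfn > S2_ENTRIES || nr > S2_ENTRIES - gfn)
        return -1;
    if (pgoff > exp->nr_pages || nr > exp->nr_pages - pgoff)
        return -1;

    for (size_t i = 0; i < nr; i++) {
        uint64_t pfn;

        if (export_get_pfn(exp, pgoff + i, &pfn))
            return -1;  /* not reachable after the checks above */
        s2->pfn[gfn + i] = pfn;
        s2->valid[gfn + i] = 1;
    }
    return 0;
}
```

In this model nothing distinguishes a CC device from a non-CC one; the lookup works the same either way, which is why it can serve as a building block before any private-assignment enforcement is designed.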