On Wed, Mar 25, 2026 at 12:23 PM Jiri Pirko jiri@resnulli.us wrote:
From: Jiri Pirko jiri@nvidia.com
Add a new "system_cc_shared" dma-buf heap to allow userspace to allocate shared (decrypted) memory for confidential computing (CoCo) VMs.
On CoCo VMs, guest memory is private by default. The hardware uses an encryption bit in page table entries (C-bit on AMD SEV, "shared" bit on Intel TDX) to control whether a given memory access is private or shared. The kernel's direct map is set up as private, so pages returned by alloc_pages() are private in the direct map by default. To make this memory usable for devices that do not support DMA to private memory (no TDISP support), it has to be explicitly shared. A couple of things are needed to properly handle shared memory for the dma-buf use case:
set_memory_decrypted() on the direct map after allocation: Besides clearing the encryption bit in the direct map PTEs, this also notifies the hypervisor about the page state change. On free, the inverse set_memory_encrypted() must be called before returning pages to the allocator. If re-encryption fails, pages are intentionally leaked to prevent shared memory from being reused as private.
pgprot_decrypted() for userspace and kernel virtual mappings: Any new mapping of the shared pages, be it to userspace via mmap or to kernel vmalloc space via vmap, creates PTEs independent of the direct map. These must also have the encryption bit cleared, otherwise accesses through them would see encrypted (garbage) data.
DMA_ATTR_CC_SHARED for DMA mapping: Since the pages are already shared, the DMA API needs to be informed via DMA_ATTR_CC_SHARED so it can map them correctly as unencrypted for device access.
On non-CoCo VMs, the system_cc_shared heap is not registered to prevent misuse by userspace that does not understand the security implications of explicitly shared memory.
Signed-off-by: Jiri Pirko jiri@nvidia.com
Reviewed-by: T.J. Mercier tjmercier@google.com