On Mon, Mar 09, 2026 at 09:39:44AM -0600, Peter Gonda wrote:
Great feature to have, thanks Jiri! A couple of naive questions.
On Thu, Mar 5, 2026 at 5:38 AM Jiri Pirko <jiri@resnulli.us> wrote:
From: Jiri Pirko <jiri@nvidia.com>
Add a new "system_cc_decrypted" dma-buf heap to allow userspace to allocate decrypted (shared) memory for confidential computing (CoCo) VMs.
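For illustration, a minimal userspace sketch of allocating from such a heap through the existing dma-heap uAPI (DMA_HEAP_IOCTL_ALLOC from <linux/dma-heap.h>). The helper name cc_heap_alloc() is hypothetical, and the device path /dev/dma_heap/system_cc_decrypted is an assumption derived from the heap name and the usual /dev/dma_heap/<name> convention; it is not taken from the patch itself.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

/*
 * Hypothetical helper: allocate 'len' bytes from the dma-buf heap at
 * 'heap_path' and return the dma-buf fd, or -1 on any failure.
 */
int cc_heap_alloc(const char *heap_path, size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,	/* flags for the returned dma-buf fd */
	};
	int heap_fd, ret;

	heap_fd = open(heap_path, O_RDONLY | O_CLOEXEC);
	if (heap_fd < 0)
		return -1;	/* heap not present, or no permission */

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap_fd);
	if (ret < 0)
		return -1;

	return (int)data.fd;	/* caller owns the fd and must close() it */
}
```

A caller would use something like cc_heap_alloc("/dev/dma_heap/system_cc_decrypted", 4096) (path assumed, see above), then mmap() the returned fd or pass it on to a driver.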
On CoCo VMs, guest memory is encrypted by default. The hardware uses an encryption bit in page table entries (C-bit on AMD SEV, "shared" bit on Intel TDX) to control whether a given memory access is encrypted or decrypted. The kernel's direct map is set up with encryption enabled, so pages returned by alloc_pages() are encrypted in the direct map by default. To make this memory usable for devices that do not support DMA to encrypted memory (no TDISP support), it has to be explicitly decrypted.

A couple of things are needed to properly handle decrypted memory for the dma-buf use case:

- set_memory_decrypted() on the direct map after allocation: Besides clearing the encryption bit in the direct map PTEs, this also notifies the hypervisor about the page state change. On free, the inverse set_memory_encrypted() must be called before returning pages to the allocator. If re-encryption fails, pages are intentionally leaked to prevent decrypted memory from being reused as private.

- pgprot_decrypted() for userspace and kernel virtual mappings: Any new mapping of the decrypted pages, be it to userspace via mmap or to kernel vmalloc space via vmap, creates PTEs independent of the direct map. These must also have the encryption bit cleared, otherwise accesses through them would see encrypted (garbage) data.
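The two requirements above can be sketched as a small userspace model. This is not kernel code and not the patch: every model_* name is hypothetical, the encryption state is just a flag standing in for what set_memory_decrypted() and set_memory_encrypted() would do, and the protection bit follows the "clear the encryption bit" description (SEV-style). It only illustrates the ordering: decrypt at allocation, re-encrypt before free, leak on re-encryption failure, and clear the bit on every new mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_PROT_ENC 0x1u	/* hypothetical stand-in for the encryption bit */

struct model_page {
	bool decrypted;	/* direct-map state of this page */
	bool leaked;	/* set when the page is deliberately not freed */
};

/* Test knob: make the modeled re-encryption fail. */
bool model_fail_reencrypt;

/* Allocate a page and decrypt it right away (models set_memory_decrypted()). */
struct model_page *model_heap_alloc(void)
{
	struct model_page *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->decrypted = true;
	return p;
}

/*
 * Re-encrypt before handing the page back (models set_memory_encrypted()).
 * Returns true if the page was freed, false if it was leaked on purpose.
 */
bool model_heap_free(struct model_page *p)
{
	assert(p != NULL);
	if (model_fail_reencrypt) {
		/* Never return still-decrypted memory to the allocator. */
		p->leaked = true;
		return false;
	}
	p->decrypted = false;
	free(p);
	return true;
}

/* Protection bits for a new mmap/vmap-style mapping (models pgprot_decrypted()). */
unsigned int model_map_prot(const struct model_page *p, unsigned int base_prot)
{
	return p->decrypted ? (base_prot & ~MODEL_PROT_ENC) : base_prot;
}
```

In this model the decrypted state is fixed at allocation time, so every later call to model_map_prot() for that page yields protection bits with the encryption bit cleared.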
So this only works on new mappings? What if there are existing mappings to the memory that will be converted to shared?
set_memory_decrypted() is called during system_heap_allocate(); it is not possible to switch dynamically between encrypted and decrypted.
Once the heap is created, every PTE is created with the correct pgprot.
Jason