On Mon, Feb 23, 2026 at 1:51 AM Jiri Pirko <jiri@resnulli.us> wrote:
From: Jiri Pirko <jiri@nvidia.com>
Add a new "system_cc_decrypted" dma-buf heap to allow userspace to allocate decrypted (shared) memory for confidential computing (CoCo) VMs.
On CoCo VMs, guest memory is encrypted by default. The hardware uses an encryption bit in page table entries (C-bit on AMD SEV, "shared" bit on Intel TDX) to control whether a given memory access is encrypted or decrypted. The kernel's direct map is set up with encryption enabled, so pages returned by alloc_pages() are encrypted in the direct map by default. To make this memory usable for devices that do not support DMA to encrypted memory (no TDISP support), it has to be explicitly decrypted. A few things are needed to properly handle decrypted memory for the dma-buf use case:
set_memory_decrypted() on the direct map after allocation: Besides clearing the encryption bit in the direct map PTEs, this also notifies the hypervisor about the page state change. On free, the inverse set_memory_encrypted() must be called before returning pages to the allocator. If re-encryption fails, pages are intentionally leaked to prevent decrypted memory from being reused as private.
pgprot_decrypted() for userspace and kernel virtual mappings: Any new mapping of the decrypted pages, be it to userspace via mmap or to kernel vmalloc space via vmap, creates PTEs independent of the direct map. These must also have the encryption bit cleared, otherwise accesses through them would see encrypted (garbage) data.
DMA_ATTR_CC_DECRYPTED for DMA mapping: Since the pages are already decrypted, the DMA API needs to be informed via DMA_ATTR_CC_DECRYPTED so it can map them correctly as unencrypted for device access.
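The lifecycle described above (decrypt after allocation, re-encrypt before free, intentionally leak on re-encryption failure) can be sketched end-to-end. This is only a hedged, userspace-runnable simulation: every sim_* helper here is an invented stand-in for the real kernel calls (set_memory_decrypted()/set_memory_encrypted() and the heap alloc/free paths), which cannot run outside a CoCo guest kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sim_page {
	bool encrypted;	/* models the C-bit / shared-bit state */
	bool leaked;	/* set when re-encryption fails on free */
};

/* Stand-in for set_memory_decrypted(): clear the encryption bit and,
 * in a real guest, notify the hypervisor of the page state change. */
static int sim_set_memory_decrypted(struct sim_page *p)
{
	p->encrypted = false;
	return 0;
}

/* Stand-in for set_memory_encrypted(); 'fail' lets the caller model a
 * hypervisor-side failure to exercise the intentional-leak path. */
static int sim_set_memory_encrypted(struct sim_page *p, bool fail)
{
	if (fail)
		return -1;
	p->encrypted = true;
	return 0;
}

/* Allocation: pages start encrypted (the direct-map default), then are
 * explicitly decrypted for this heap. */
static struct sim_page *sim_heap_alloc(void)
{
	struct sim_page *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->encrypted = true;	/* direct-map default */
	if (sim_set_memory_decrypted(p)) {
		free(p);
		return NULL;
	}
	return p;
}

/* Free: re-encrypt before returning the page to the allocator; if that
 * fails, leak it rather than let decrypted memory be reused as private. */
static void sim_heap_free(struct sim_page *p, bool encrypt_fails)
{
	if (sim_set_memory_encrypted(p, encrypt_fails)) {
		p->leaked = true;	/* intentional leak */
		return;
	}
	free(p);
}
```

The userspace/vmalloc mapping step (pgprot_decrypted()) and the DMA_ATTR_CC_DECRYPTED attribute are not modelled here; they only affect how *additional* PTEs and DMA mappings for these already-decrypted pages are created.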
On non-CoCo VMs, the system_cc_decrypted heap is not registered to prevent misuse by userspace that does not understand the security implications of explicitly decrypted memory.
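The registration gate in that last paragraph amounts to a one-line check at heap init time. A hedged sketch: sim_memory_is_encrypted is a settable flag standing in for the real platform query (the kernel would use something like cc_platform_has(CC_ATTR_MEM_ENCRYPT)), and sim_dma_heap_add() merely counts registrations in place of the real heap-add call.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CoCo platform check; settable so the gate can be
 * exercised both ways. */
static bool sim_memory_is_encrypted;

static int sim_heaps_registered;

/* Stand-in for registering a dma-buf heap: just count it. */
static void sim_dma_heap_add(void)
{
	sim_heaps_registered++;
}

/* Module-init sketch: only register the decrypted heap on a CoCo
 * guest, so non-CoCo userspace cannot allocate from it at all. */
static int sim_cc_decrypted_heap_init(void)
{
	if (!sim_memory_is_encrypted)
		return 0;	/* nothing to register */
	sim_dma_heap_add();
	return 0;
}
```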
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Thanks for reworking this! I've not reviewed it super closely, but I believe it resolves my objection on your first version.
Few nits/questions below.
@@ -296,6 +345,14 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
 	for_each_sgtable_sg(table, sg, i) {
 		struct page *page = sg_page(sg);

+		/*
+		 * Intentionally leak pages that cannot be re-encrypted
+		 * to prevent decrypted memory from being reused.
+		 */
+		if (buffer->decrypted &&
+		    system_heap_set_page_encrypted(page))
+			continue;
What are the conditions where this would fail? How much of an edge case is this? I fret this opens a DoS vector if one is able to allocate from this heap and then stress the system when freeing.
Should there be some global list of leaked decrypted pages such that the mm subsystem could try again later to recover these?
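If it helps the discussion, the deferred-retry idea could look roughly like the below. Everything here is hypothetical illustration (the names, the singly-linked list handling, and the simulated transient failure that clears after one retry), not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct leaked_page {
	struct leaked_page *next;
	int retries;	/* how often re-encryption was attempted */
	bool encrypted;	/* simulated page state */
};

static struct leaked_page *leaked_list;

/* Called from the free path instead of silently dropping the page. */
static void leaked_page_add(struct leaked_page *p)
{
	p->next = leaked_list;
	leaked_list = p;
}

/* Simulated re-encryption: fails on the first attempt and succeeds on
 * the second, standing in for a transient hypervisor failure. */
static bool try_reencrypt(struct leaked_page *p)
{
	if (++p->retries < 2)
		return false;
	p->encrypted = true;
	return true;
}

/* Periodic reclaim pass: walk the list, retry, and release pages whose
 * re-encryption finally succeeded.  Returns the count still leaked. */
static int leaked_pages_scan(void)
{
	struct leaked_page **pp = &leaked_list;
	int remaining = 0;

	while (*pp) {
		struct leaked_page *p = *pp;

		if (try_reencrypt(p)) {
			*pp = p->next;
			free(p);	/* safe to reuse again */
		} else {
			remaining++;
			pp = &p->next;
		}
	}
	return remaining;
}
```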
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 648328a64b27..d97b668413c1 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -10,6 +10,7 @@
 #define _DMA_HEAPS_H

 #include <linux/types.h>
+#include <uapi/linux/dma-heap.h>

 struct dma_heap;
diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
index a4cf716a49fa..ab95bb355ed5 100644
--- a/include/uapi/linux/dma-heap.h
+++ b/include/uapi/linux/dma-heap.h
@@ -18,8 +18,7 @@
 /* Valid FD_FLAGS are O_CLOEXEC, O_RDONLY, O_WRONLY, O_RDWR */
 #define DMA_HEAP_VALID_FD_FLAGS (O_CLOEXEC | O_ACCMODE)

-/* Currently no heap flags */
-#define DMA_HEAP_VALID_HEAP_FLAGS (0ULL)
+#define DMA_HEAP_VALID_HEAP_FLAGS (0)

 /**
  * struct dma_heap_allocation_data - metadata passed from userspace for
Are these header changes still necessary?
thanks -john