On Thu, Mar 05, 2026 at 01:36:40PM +0100, Jiri Pirko wrote:
> From: Jiri Pirko <jiri(a)nvidia.com>
>
> Current CC designs don't place a vIOMMU in front of untrusted devices.
> Instead, the DMA API forces all untrusted device DMA through swiotlb
> bounce buffers (is_swiotlb_force_bounce()) which copies data into
> decrypted memory on behalf of the device.
>
> When a caller has already arranged for the memory to be decrypted
> via set_memory_decrypted(), the DMA API needs to know so it can map
> directly using the unencrypted physical address rather than bounce
> buffering. Following the pattern of DMA_ATTR_MMIO, add
> DMA_ATTR_CC_DECRYPTED for this purpose. Like the MMIO case, only the
> caller knows what kind of memory it has and must inform the DMA API
> for it to work correctly.
>
> Signed-off-by: Jiri Pirko <jiri(a)nvidia.com>
> ---
> v1->v2:
> - rebased on top of recent dma-mapping-fixes
> ---
> include/linux/dma-mapping.h | 6 ++++++
> include/trace/events/dma.h | 3 ++-
> kernel/dma/direct.h | 14 +++++++++++---
> 3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 29973baa0581..ae3d85e494ec 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -85,6 +85,12 @@
> * a cacheline must have this attribute for this to be considered safe.
> */
> #define DMA_ATTR_CPU_CACHE_CLEAN (1UL << 11)
> +/*
> + * DMA_ATTR_CC_DECRYPTED: Indicates memory that has been explicitly decrypted
> + * (shared) for confidential computing guests. The caller must have
> + * called set_memory_decrypted(). A struct page is required.
> + */
> +#define DMA_ATTR_CC_DECRYPTED (1UL << 12)
While adding the new attribute is fine, I would expect additional checks in
dma_map_phys() to ensure the attribute cannot be misused. For example,
WARN_ON(attrs & (DMA_ATTR_CC_DECRYPTED | DMA_ATTR_MMIO)), along with a check
that we are taking the direct path only.
Thanks
On 3/12/26 19:45, Matt Evans wrote:
> Hi all,
>
>
> There were various suggestions in the September 2025 thread "[TECH
> TOPIC] vfio, iommufd: Enabling user space drivers to vend more
> granular access to client processes" [0], and LPC discussions, around
> improving the situation for multi-process userspace driver designs.
> This RFC series implements some of these ideas.
>
> (Thanks for feedback on v1! Revised series, with changes noted
> inline.)
>
> Background: Multi-process USDs
> ==============================
>
> The userspace driver scenario discussed in that thread involves a
> primary process driving a PCIe function through VFIO/iommufd, which
> manages the function-wide ownership/lifecycle. The function is
> designed to provide multiple distinct programming interfaces (for
> example, several independent MMIO register frames in one function),
> and the primary process delegates control of these interfaces to
> multiple independent client processes (which do the actual work).
> This scenario clearly relies on a HW design that provides appropriate
> isolation between the programming interfaces.
>
> The two key needs are:
>
> 1. Mechanisms to safely delegate a subset of the device MMIO
> resources to a client process without over-sharing wider access
> (or influence over whole-device activities, such as reset).
>
> 2. Mechanisms to allow a client process to do its own iommufd
> management w.r.t. its address space, in a way that's isolated
> from DMA relating to other clients.
>
>
> mmap() of VFIO DMABUFs
> ======================
>
> This RFC addresses #1 in "vfio/pci: Support mmap() of a VFIO DMABUF",
> implementing the proposals in [0] to add mmap() support to the
> existing VFIO DMABUF exporter.
>
> This enables a userspace driver to define DMABUF ranges corresponding
> to sub-ranges of a BAR, and grant a given client (via a shared fd)
> the capability to access (only) those sub-ranges. The VFIO device fds
> would be kept private to the primary process. All the client can do
> with that fd is map (or iomap via iommufd) that specific subset of
> resources, and the impact of bugs/malice is contained.
>
> (We'll follow up on #2 separately, as a related-but-distinct problem.
> PASIDs are one way to achieve per-client isolation of DMA; another
> could be sharing of a single IOVA space via 'constrained' iommufds.)
>
>
> New in v2: To achieve this, the existing VFIO BAR mmap() path is
> converted to use DMABUFs behind the scenes, in "vfio/pci: Convert BAR
> mmap() to use a DMABUF" plus new helper functions, as Jason/Christian
> suggested in the v1 discussion [3].
>
> This means:
>
> - Both regular and new DMABUF BAR mappings share the same vm_ops,
> i.e. mmap()ing DMABUFs is a smaller change on top of the existing
> mmap().
>
> - The zapping of mappings occurs via vfio_pci_dma_buf_move(), and the
> vfio_pci_zap_bars() originally paired with the _move()s can go
> away. Each DMABUF has a unique address_space.
>
> - It's a step towards future iommufd VFIO Type1 emulation
> implementing P2P, since iommufd can now get a DMABUF from a VA that
> it's mapping for IO; the VMAs' vm_file is that of the backing
> DMABUF.
>
>
> Revocation/reclaim
> ==================
>
> Mapping a BAR subset is useful, but the lifetime of access granted to
> a client needs to be managed well. For example, a protocol between
> the primary process and the client can indicate when the client is
> done, and when it's safe to reuse the resources elsewhere, but cleanup
> can't practically be cooperative.
>
> For robustness, we enable the driver to make the resources
> guaranteed-inaccessible when it chooses, so that it can re-assign them
> to other uses in future.
>
> "vfio/pci: Permanently revoke a DMABUF on request" adds a new VFIO
> device fd ioctl, VFIO_DEVICE_PCI_DMABUF_REVOKE. This takes a DMABUF
> fd parameter previously exported (from that device!) and permanently
> revokes the DMABUF. This notifies/detaches importers, zaps PTEs for
> any mappings, and guarantees no future attachment/import/map/access is
> possible by any means.
>
> A primary driver process would use this operation when the client's
> tenure ends to reclaim "loaned-out" MMIO interfaces, at which point
> the interfaces could be safely re-used.
>
> New in v2: ioctl() on VFIO driver fd, rather than DMABUF fd. A DMABUF
> is revoked using code common to vfio_pci_dma_buf_move(), selectively
> zapping mappings (after waiting for completion on the
> dma_buf_invalidate_mappings() request).
>
>
> BAR mapping access attributes
> =============================
>
> Inspired by Alex [Mastro] and Jason's comments in [0] and Mahmoud's
> work in [1] with the goal of controlling CPU access attributes for
> VFIO BAR mappings (e.g. WC), we can decorate DMABUFs with access
> attributes that are then used by a mapping's PTEs.
>
> I've proposed reserving a field in struct
> vfio_device_feature_dma_buf's flags to specify an attribute for its
> ranges. Although that keeps the (UAPI) struct unchanged, it means all
> ranges in a DMABUF share the same attribute. I feel a single
> attribute-to-mmap() relation is logical/reasonable. An application
> can also create multiple DMABUFs to describe any BAR layout and mix of
> attributes.
>
>
> Tests
> =====
>
> (Still sharing the [RFC ONLY] userspace test/demo program for context,
> not for merge.)
>
> It illustrates & tests various map/revoke cases, but doesn't use the
> existing VFIO selftests and relies on a (tweaked) QEMU EDU function.
> I'm (still) working on integrating the scenarios into the existing
> VFIO selftests.
>
> This code has been tested with mapping DMABUFs of single/multiple
> ranges, aliasing mmap()s, aliasing ranges across DMABUFs, vm_pgoff >
> 0, revocation, and shutdown/cleanup scenarios; hugepage mappings also
> seem to work correctly. I've lightly tested WC mappings too (by
> observing that the resulting PTEs have the correct attributes...).
>
>
> Fin
> ===
>
> v2 is based on next-20260310 (to build on Leon's recent series
> "vfio: Wait for dma-buf invalidation to complete" [2]).
>
>
> Please share your thoughts! I'd like to de-RFC if we feel this
> approach is now fair.
I only skimmed over it, but at least off hand I couldn't find anything fundamentally wrong.
The locking order seems to change in patch #6. In general I strongly recommend enabling lockdep while testing anyway, but especially when I see such changes.
In addition to that, it might also be a good idea to have a lockdep initcall function which defines the locking order all the VFIO code should follow.
See the function dma_resv_lockdep() for an example of how to do that. Especially with mmap support and all the locks involved in that, it has proven to be good practice to have something like this.
Regards,
Christian.
>
>
> Many thanks,
>
>
> Matt
>
>
>
> References:
>
> [0]: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@fb.com/
> [1]: https://lore.kernel.org/all/20250804104012.87915-1-mngyadam@amazon.de/
> [2]: https://lore.kernel.org/linux-iommu/20260205-nocturnal-poetic-chamois-f566a…
> [3]: https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
>
> --------------------------------------------------------------------------------
> Changelog:
>
> v2: Respin based on the feedback/suggestions:
>
> - Transform the existing VFIO BAR mmap path to also use DMABUFs behind
> the scenes, and then simply share that code for explicitly-mapped
> DMABUFs.
>
> - Refactor the export itself out of vfio_pci_core_feature_dma_buf,
> sharing it via a new vfio_pci_core_mmap_prep_dmabuf helper used by
> the regular VFIO mmap to create a DMABUF.
>
> - Revoke buffers using a VFIO device fd ioctl
>
> v1: https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
>
>
> Matt Evans (10):
> vfio/pci: Set up VFIO barmap before creating a DMABUF
> vfio/pci: Clean up DMABUFs before disabling function
> vfio/pci: Add helper to look up PFNs for DMABUFs
> vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
> vfio/pci: Convert BAR mmap() to use a DMABUF
> vfio/pci: Remove vfio_pci_zap_bars()
> vfio/pci: Support mmap() of a VFIO DMABUF
> vfio/pci: Permanently revoke a DMABUF on request
> vfio/pci: Add mmap() attributes to DMABUF feature
> [RFC ONLY] selftests: vfio: Add standalone vfio_dmabuf_mmap_test
>
> drivers/vfio/pci/Kconfig | 3 +-
> drivers/vfio/pci/Makefile | 3 +-
> drivers/vfio/pci/vfio_pci_config.c | 18 +-
> drivers/vfio/pci/vfio_pci_core.c | 123 +--
> drivers/vfio/pci/vfio_pci_dmabuf.c | 425 +++++++--
> drivers/vfio/pci/vfio_pci_priv.h | 46 +-
> include/uapi/linux/vfio.h | 42 +-
> tools/testing/selftests/vfio/Makefile | 1 +
> .../vfio/standalone/vfio_dmabuf_mmap_test.c | 837 ++++++++++++++++++
> 9 files changed, 1339 insertions(+), 159 deletions(-)
> create mode 100644 tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.c
>
On Thu, Mar 12, 2026 at 11:49:23AM -0400, Mathieu Desnoyers wrote:
> On 2026-03-12 11:40, Steven Rostedt wrote:
> > On Thu, 12 Mar 2026 11:28:07 -0400
> > Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> wrote:
> >
> > > > Note, Vineeth came up with the naming. I would have done "do" but when I
> > > > saw "invoke" I thought it sounded better.
> > >
> > > It works as long as you don't have a tracing subsystem called
> > > "invoke", then you get into identifier clash territory.
> >
> > True. Perhaps we should do the double underscore trick.
> >
> > Instead of: trace_invoke_foo()
> >
> > use: trace_invoke__foo()
> >
> >
> > Which will make it more visible to what the trace event is.
> >
> > Hmm, we probably should have used: trace__foo() for all tracepoints, as
> > there's still functions that are called trace_foo() that are not
> > tracepoints :-p
>
> One certain way to eliminate identifier clash would be to go for a
> prefix to "trace_", e.g.
Oh, I know! Call them __do_trace_##foo().
/me runs like hell
On Thu, Mar 12, 2026 at 11:39:06AM -0400, Vineeth Remanan Pillai wrote:
> On Thu, Mar 12, 2026 at 11:13 AM Steven Rostedt <rostedt(a)goodmis.org> wrote:
> >
> > On Thu, 12 Mar 2026 11:04:56 -0400
> > "Vineeth Pillai (Google)" <vineeth(a)bitbyteword.org> wrote:
> >
> > > Add trace_invoke_##name() as a companion to trace_##name(). When a
> > > caller already guards a tracepoint with an explicit enabled check:
> > >
> > > if (trace_foo_enabled() && cond)
> > > trace_foo(args);
> > >
> > > trace_foo() internally repeats the static_branch_unlikely() test, which
> > > the compiler cannot fold since static branches are patched binary
> > > instructions. This results in two static-branch evaluations for every
> > > guarded call site.
> > >
> > > trace_invoke_##name() calls __do_trace_##name() directly, skipping the
> > > redundant static-branch re-check. This avoids leaking the internal
> > > __do_trace_##name() symbol into call sites while still eliminating the
> > > double evaluation:
> > >
> > > if (trace_foo_enabled() && cond)
> > > trace_invoke_foo(args); /* calls __do_trace_foo() directly */
> > >
> > > Three locations are updated:
> > > - __DECLARE_TRACE: invoke form omits static_branch_unlikely, retains
> > > the LOCKDEP RCU-watching assertion.
> > > - __DECLARE_TRACE_SYSCALL: same, plus retains might_fault().
> > > - !TRACEPOINTS_ENABLED stub: empty no-op so callers compile cleanly
> > > when tracepoints are compiled out.
> > >
> > > Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
> > > Suggested-by: Peter Zijlstra <peterz(a)infradead.org>
> > > Signed-off-by: Vineeth Pillai (Google) <vineeth(a)bitbyteword.org>
> > > Assisted-by: Claude:claude-sonnet-4-6
> >
> > I'm guessing Claude helped with the other patches. Did it really help with this one?
> >
>
> Claude wrote and build tested the whole series based on my guidance
> and prompt :-). I verified the series before sending it out, but
> claude did the initial work.
That seems like an unreasonable waste of energy. You could've had claude
write a Coccinelle script for you and saved a ton of tokens.
On Thu, Mar 12, 2026 at 10:03:37AM +0100, Jiri Pirko wrote:
> >A lot of device MMIO is decrypted by nature and can't be encrypted, so
> >you'd have to use both flags. e.g. in VFIO we'd want to do this.
>
> Why both flags? Why MMIO flag is not enough? You still want to hit
> "if (attrs & DMA_ATTR_MMIO) {" path, don't you?
Because we will eventually have both decrypted and encrypted MMIO.
> I mean, CC_DECRYPTED says the memory to be mapped was explicitly
> decrypted before the call. MMIO was not explicitly decrypted, it is
> decrypted by definition. For me that does not fit the CC_DECRYPTED
> semantics.
I would say CC_DECRYPTED means that pgprot_decrypted() must be used to
form a PTE, and !CC_DECRYPTED means that pgprot_encrypted() must be used.
This flag should someday flow down into the vIOMMU driver and set the
corresponding C bit in the IOPTE (for AMD), exactly as the pgprot does.
It is less about set_memory_encrypted(), as that is only for DRAM.
Jason
On Mon, Mar 09, 2026 at 06:51:21PM +0100, Jiri Pirko wrote:
> Mon, Mar 09, 2026 at 04:18:57PM +0100, jgg(a)ziepe.ca wrote:
> >On Mon, Mar 09, 2026 at 04:02:33PM +0200, Leon Romanovsky wrote:
> >> On Mon, Mar 09, 2026 at 10:15:30AM -0300, Jason Gunthorpe wrote:
> >> > On Sun, Mar 08, 2026 at 12:19:48PM +0200, Leon Romanovsky wrote:
> >> >
> >> > > > +/*
> >> > > > + * DMA_ATTR_CC_DECRYPTED: Indicates memory that has been explicitly decrypted
> >> > > > + * (shared) for confidential computing guests. The caller must have
> >> > > > + * called set_memory_decrypted(). A struct page is required.
> >> > > > + */
> >> > > > +#define DMA_ATTR_CC_DECRYPTED (1UL << 12)
> >> > >
> >> > > While adding the new attribute is fine, I would expect additional checks in
> >> > > dma_map_phys() to ensure the attribute cannot be misused. For example,
> >> > > WARN_ON(attrs & (DMA_ATTR_CC_DECRYPTED | DMA_ATTR_MMIO)), along with a check
> >> > > that we are taking the direct path only.
> >> >
> >> > DECRYPYED and MMIO is something that needs to work, VFIO (inside a
> >> > TVM) should be using that combination.
> >>
> >> So this sentence "A struct page is required" from the comment above is
> >> not accurate.
> >
> >It would be clearer to say "Unless DMA_ATTR_MMIO is provided a struct
> >page is required"
> >
> >We need to audit if that works properly, IIRC it does, but I don't
> >remember.. Jiri?
>
> How can you do set_memory_decrypted() if you don't have a page/folio?
A lot of device MMIO is decrypted by nature and can't be encrypted, so
you'd have to use both flags. e.g. in VFIO we'd want to do this.
Jason
This is the next version of the shmem backed GEM objects series
originally from Asahi, previously posted by Daniel Almeida.
One of the major changes in this patch series is a much better interface
around vmaps, which we achieve by introducing a new set of rust bindings
for iosys_map.
The previous version of the patch series can be found here:
https://patchwork.freedesktop.org/series/156093/
This patch series may be applied on top of the
driver-core/driver-core-testing branch:
https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git…
Changelogs are per-patch
Asahi Lina (2):
rust: helpers: Add bindings/wrappers for dma_resv_lock
rust: drm: gem: shmem: Add DRM shmem helper abstraction
Lyude Paul (5):
rust: drm: Add gem::impl_aref_for_gem_obj!
rust: drm: gem: Add raw_dma_resv() function
rust: gem: Introduce DriverObject::Args
rust: drm: gem: Introduce shmem::SGTable
rust: drm/gem: Add vmap functions to shmem bindings
drivers/gpu/drm/nova/gem.rs | 5 +-
drivers/gpu/drm/tyr/gem.rs | 3 +-
rust/bindings/bindings_helper.h | 3 +
rust/helpers/dma-resv.c | 13 +
rust/helpers/drm.c | 56 +++-
rust/helpers/helpers.c | 1 +
rust/kernel/drm/gem/mod.rs | 79 +++--
rust/kernel/drm/gem/shmem.rs | 529 ++++++++++++++++++++++++++++++++
8 files changed, 667 insertions(+), 22 deletions(-)
create mode 100644 rust/helpers/dma-resv.c
create mode 100644 rust/kernel/drm/gem/shmem.rs
--
2.53.0
This patch series adds a new dma-buf heap driver that exposes coherent,
non-reusable reserved-memory regions as named heaps, so userspace can
explicitly allocate buffers from those device-specific pools.
Motivation: we want cgroup accounting for all userspace-visible buffer
allocations (DRM, v4l2, dma-buf heaps, etc.). That's hard to do when
drivers call dma_alloc_attrs() directly, because the accounting controller
(memcg vs dmem) is ambiguous. The long-term plan is to steer those paths
toward dma-buf heaps, where each heap can unambiguously charge a single
controller. To reach that goal, we need a heap backend for each
dma_alloc_attrs() memory type. CMA and system heaps already exist;
coherent reserved-memory was the missing piece, since many SoCs define
dedicated, device-local coherent pools in DT under /reserved-memory using
"shared-dma-pool" with non-reusable regions (i.e., not CMA) that are
carved out exclusively for coherent DMA and are currently only usable by
in-kernel drivers.
Because these regions are device-dependent, each heap instance binds a
heap device to its reserved-mem region via a newly introduced helper
function, namely of_reserved_mem_device_init_with_mem(), so coherent
allocations use the correct dev->dma_mem.
Charging to cgroups for these buffers is intentionally left out to keep
review focused on the new heap; I plan to follow up based on Eric's [1]
and Maxime's [2] work on dmem charging from userspace.
This series also makes the new heap driver modular, in line with the CMA
heap change in [3].
[1] https://lore.kernel.org/all/20260218-dmabuf-heap-cma-dmem-v2-0-b249886fb7b2…
[2] https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.…
[3] https://lore.kernel.org/all/20260303-dma-buf-heaps-as-modules-v3-0-24344812…
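For context, the kind of region being targeted looks like the following hypothetical device-tree fragment (node names, addresses, and sizes are invented; "shared-dma-pool" without the "reusable" property yields a non-reusable coherent carve-out rather than a CMA region):

```dts
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	video_pool: video-pool@90000000 {
		compatible = "shared-dma-pool";
		reg = <0x0 0x90000000 0x0 0x400000>;
		/* no "reusable" property -> coherent carve-out, not CMA */
	};
};
```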
Signed-off-by: Albert Esteve <aesteve(a)redhat.com>
---
Changes in v3:
- Reorganized changesets among patches to ensure bisectability
- Removed unused dma_heap_coherent_register() leftover
- Removed fallback when setting mask in coherent heap dev, since
dma_set_mask() already truncates to supported masks
- Moved struct rmem_assigned_device (rd) logic to
of_reserved_mem_device_init_with_mem() to allow listing the device
- Link to v2: https://lore.kernel.org/r/20260303-b4-dmabuf-heap-coherent-rmem-v2-0-65a465…
Changes in v2:
- Removed dmem charging parts
- Moved coherent heap registering logic to coherent.c
- Made heap device a member of struct dma_heap
- Split dma_heap_add logic into create/register, to be able to
access the stored heap device before registered.
- Avoid platform device in favour of heap device
- Added a wrapper to rmem device_init() op
- Switched from late_initcall() to module_init()
- Made the coherent heap driver modular
- Link to v1: https://lore.kernel.org/r/20260224-b4-dmabuf-heap-coherent-rmem-v1-1-dffef4…
---
Albert Esteve (5):
dma-buf: dma-heap: split dma_heap_add
of_reserved_mem: add a helper for rmem device_init op
dma: coherent: store reserved memory coherent regions
dma-buf: heaps: Add Coherent heap to dmabuf heaps
dma-buf: heaps: coherent: Turn heap into a module
John Stultz (1):
dma-buf: dma-heap: Keep track of the heap device struct
drivers/dma-buf/dma-heap.c | 138 +++++++++--
drivers/dma-buf/heaps/Kconfig | 9 +
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/coherent_heap.c | 417 ++++++++++++++++++++++++++++++++++
drivers/of/of_reserved_mem.c | 68 ++++--
include/linux/dma-heap.h | 5 +
include/linux/dma-map-ops.h | 7 +
include/linux/of_reserved_mem.h | 8 +
kernel/dma/coherent.c | 34 +++
9 files changed, 640 insertions(+), 47 deletions(-)
---
base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
change-id: 20260223-b4-dmabuf-heap-coherent-rmem-91fd3926afe9
Best regards,
--
Albert Esteve <aesteve(a)redhat.com>