The VFS now warns if an inode flagged with S_ANON_INODE is located on a
filesystem that does not have SB_I_NOEXEC set. dmabuf inodes are
created using alloc_anon_inode(), which sets S_ANON_INODE.
This triggers a warning in path_noexec() when a dmabuf is mmapped, for
example by GStreamer's v4l2src element.
[ 60.061328] WARNING: CPU: 2 PID: 2803 at fs/exec.c:125 path_noexec+0xa0/0xd0
...
[ 60.061637] do_mmap+0x2b5/0x680
The warning was introduced by commit 1e7ab6f67824 ("anon_inode: rework
assertions") which added enforcement that anonymous inodes must be on
filesystems with SB_I_NOEXEC set.
Fix this by setting SB_I_NOEXEC and SB_I_NODEV on the dmabuf filesystem
context, following the same pattern as commit ce7419b6cf23d ("anon_inode:
raise SB_I_NODEV and SB_I_NOEXEC") and commit 98f99394a104c ("secretmem:
use SB_I_NOEXEC").
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao(a)canonical.com>
---
drivers/dma-buf/dma-buf.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index a4d8f2ff94e46..dea79aaab10ce 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -221,6 +221,8 @@ static int dma_buf_fs_init_context(struct fs_context *fc)
if (!ctx)
return -ENOMEM;
ctx->dops = &dma_buf_dentry_ops;
+ fc->s_iflags |= SB_I_NOEXEC;
+ fc->s_iflags |= SB_I_NODEV;
return 0;
}
--
2.51.0
Hi,
The recent introduction of heaps in the optee driver [1] made it
possible to create heaps as modules.
It's generally a good idea to do so where possible, including for the
already existing system and CMA heaps.
The system one is pretty trivial; the CMA one is a bit more involved,
especially since we have a call from kernel/dma/contiguous.c into the
CMA heap code. This was solved by turning the logic around and making
the CMA heap code call into the contiguous DMA code instead.
Let me know what you think,
Maxime
1: https://lore.kernel.org/dri-devel/20250911135007.1275833-4-jens.wiklander@l…
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v4:
- Fix compilation failure
- Rework to take into account OF_RESERVED_MEM
- Fix regression making the default CMA area disappear if not created
through the DT
- Added some documentation and comments
- Link to v3: https://lore.kernel.org/r/20260303-dma-buf-heaps-as-modules-v3-0-24344812c7…
Changes in v3:
- Squashed cma_get_name and cma_alloc/release patches
- Fixed typo in Export dev_get_cma_area commit title
- Fixed compilation failure with DMA_CMA but not OF_RESERVED_MEM
- Link to v2: https://lore.kernel.org/r/20260227-dma-buf-heaps-as-modules-v2-0-454aee7e06…
Changes in v2:
- Collect tags
- Don't export dma_contiguous_default_area anymore, but export
dev_get_cma_area instead
- Mentioned that heap modules can't be removed
- Link to v1: https://lore.kernel.org/r/20260225-dma-buf-heaps-as-modules-v1-0-2109225a09…
---
Maxime Ripard (8):
dma: contiguous: Turn heap registration logic around
dma: contiguous: Make dev_get_cma_area() a proper function
dma: contiguous: Make dma_contiguous_default_area static
dma: contiguous: Export dev_get_cma_area()
mm: cma: Export cma_alloc(), cma_release() and cma_get_name()
dma-buf: heaps: Export mem_accounting parameter
dma-buf: heaps: cma: Turn the heap into a module
dma-buf: heaps: system: Turn the heap into a module
drivers/dma-buf/dma-heap.c | 1 +
drivers/dma-buf/heaps/Kconfig | 4 +--
drivers/dma-buf/heaps/cma_heap.c | 22 +++----------
drivers/dma-buf/heaps/system_heap.c | 5 +++
include/linux/dma-buf/heaps/cma.h | 16 ---------
include/linux/dma-map-ops.h | 14 ++++----
kernel/dma/contiguous.c | 66 +++++++++++++++++++++++++++++++++----
mm/cma.c | 3 ++
8 files changed, 82 insertions(+), 49 deletions(-)
---
base-commit: c081b71f11732ad2c443f170ab19c3ebe8a1a422
change-id: 20260225-dma-buf-heaps-as-modules-1034b3ec9f2a
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
This patch series introduces the Qualcomm DSP Accelerator (QDA) driver,
a modern DRM-based accelerator implementation for Qualcomm Hexagon DSPs.
The driver provides a standardized interface for offloading computational
tasks to DSPs found on Qualcomm SoCs, supporting all DSP domains (ADSP,
CDSP, SDSP, GDSP).
The QDA driver is designed as an alternative to the FastRPC driver
in drivers/misc/, offering improved resource management, better integration
with standard kernel subsystems, and alignment with the Linux kernel's
Compute Accelerators framework.
User-space staging branch
=========================
https://github.com/qualcomm/fastrpc/tree/accel/staging
Key Features
============
* Standard DRM accelerator interface via /dev/accel/accelN
* GEM-based buffer management with DMA-BUF import/export support
* IOMMU-based memory isolation using per-process context banks
* FastRPC protocol implementation for DSP communication
* RPMsg transport layer for reliable message passing
* Support for all DSP domains (ADSP, CDSP, SDSP, GDSP)
* Comprehensive IOCTL interface for DSP operations
High-Level Architecture Differences with Existing FastRPC Driver
=================================================================
The QDA driver represents a significant architectural departure from the
existing FastRPC driver (drivers/misc/fastrpc.c), addressing several key
limitations while maintaining protocol compatibility:
1. DRM Accelerator Framework Integration
- FastRPC: Custom character device (/dev/fastrpc-*)
- QDA: Standard DRM accel device (/dev/accel/accelN)
- Benefit: Leverages established DRM infrastructure for device
management.
2. Memory Management
- FastRPC: Custom memory allocator with ION/DMA-BUF integration
- QDA: Native GEM objects with full PRIME support
- Benefit: Seamless buffer sharing using standard DRM mechanisms
3. IOMMU Context Bank Management
- FastRPC: Direct IOMMU domain manipulation, limited isolation
- QDA: Custom compute bus (qda_cb_bus_type) with proper device model
- Benefit: Each CB device is a proper struct device with IOMMU group
support, enabling better isolation and resource tracking.
- https://lore.kernel.org/all/245d602f-3037-4ae3-9af9-d98f37258aae@oss.qualco…
4. Memory Manager Architecture
- FastRPC: Monolithic allocator
- QDA: Pluggable memory manager with backend abstraction
- Benefit: Currently uses DMA-coherent backend, easily extensible for
future memory types (e.g., carveout, CMA)
5. Transport Layer
- FastRPC: Direct RPMsg integration in core driver
- QDA: Abstracted transport layer (qda_rpmsg.c)
- Benefit: Clean separation of concerns, easier to add alternative
transports if needed
6. Code Organization
- FastRPC: ~3000 lines in a single file
- QDA: Modular design across multiple files (~4600 lines total)
* qda_drv.c: Core driver and DRM integration
* qda_gem.c: GEM object management
* qda_memory_manager.c: Memory and IOMMU management
* qda_fastrpc.c: FastRPC protocol implementation
* qda_rpmsg.c: Transport layer
* qda_cb.c: Context bank device management
- Benefit: Better maintainability, clearer separation of concerns
7. UAPI Design
- FastRPC: Custom IOCTL interface
- QDA: DRM-style IOCTLs with proper versioning support
- Benefit: Follows DRM conventions, easier userspace integration
8. Documentation
- FastRPC: Minimal in-tree documentation
- QDA: Comprehensive documentation in Documentation/accel/qda/
- Benefit: Better developer experience, clearer API contracts
9. Buffer Reference Mechanism
- FastRPC: Uses buffer file descriptors (FDs) for all book-keeping
in both kernel and DSP
- QDA: Uses GEM handles for kernel-side management, providing better
integration with DRM subsystem
- Benefit: Leverages DRM GEM infrastructure for reference counting,
lifetime management, and integration with other DRM components
Key Technical Improvements
===========================
* Proper device model: CB devices are real struct device instances on a
custom bus, enabling proper IOMMU group management and power management
integration
* Reference-counted IOMMU devices: Multiple file descriptors from the same
process share a single IOMMU device, reducing overhead
* GEM-based buffer lifecycle: Automatic cleanup via DRM GEM reference
counting, eliminating many resource leak scenarios
* Modular memory backends: The memory manager supports pluggable backends,
currently implementing DMA-coherent allocations with SID-prefixed
addresses for DSP firmware
* Context-based invocation tracking: XArray-based context management with
proper synchronization and cleanup
Patch Series Organization
==========================
Patches 1-2: Driver skeleton and documentation
Patches 3-6: RPMsg transport and IOMMU/CB infrastructure
Patches 7-9: DRM device registration and basic IOCTL
Patches 10-12: GEM buffer management and PRIME support
Patches 13-17: FastRPC protocol implementation (attach, invoke, create,
map/unmap)
Patch 18: MAINTAINERS entry
Open Items
===========
The following open items remain:
1. Privilege Level Management
- Currently, daemon processes and user processes have the same access
level as both use the same accel device node. This needs to be
addressed as daemons attach to privileged DSP PDs and require
higher privilege levels for system-level operations
- Seeking guidance on the best approach: separate device nodes,
capability-based checks, or DRM master/authentication mechanisms
2. UAPI Compatibility Layer
- Add UAPI compat layer to facilitate migration of client applications
from existing FastRPC UAPI to the new QDA accel driver UAPI,
ensuring smooth transition for existing userspace code
- Seeking guidance on implementation approach: in-kernel translation
layer, userspace wrapper library, or hybrid solution
3. Documentation Improvements
- Add detailed IOCTL usage examples
- Document DSP firmware interface requirements
- Create migration guide from existing FastRPC
4. Per-Domain Memory Allocation
- Develop new userspace API to support memory allocation on a
per-domain basis, enabling domain-specific memory management and
optimization
5. Audio and Sensors PD Support
- The current patch series does not handle Audio PD and Sensors PD
functionalities. These specialized protection domains require
additional support for real-time constraints and power management
Interface Compatibility
========================
The QDA driver maintains compatibility with existing FastRPC infrastructure:
* Device Tree Bindings: The driver uses the same device tree bindings as
the existing FastRPC driver, ensuring no changes are required to device
tree sources. The "qcom,fastrpc" compatible string and child node
structure remain unchanged.
* Userspace Interface: While the driver provides a new DRM-based UAPI,
the underlying FastRPC protocol and DSP firmware interface remain
compatible. This ensures that DSP firmware and libraries continue to
work without modification.
* Migration Path: The modular design allows for gradual migration, where
both drivers can coexist during the transition period. Applications can
be migrated incrementally to the new UAPI with the help of the planned
compatibility layer.
References
==========
Previous discussions on this migration:
- https://lkml.org/lkml/2024/6/24/479
- https://lkml.org/lkml/2024/6/21/1252
Testing
=======
The driver has been tested on Qualcomm platforms with:
- Basic FastRPC attach/release operations
- DSP process creation and initialization
- Memory mapping/unmapping operations
- Dynamic invocation with various buffer types
- GEM buffer allocation and mmap
- PRIME buffer import from other subsystems
Signed-off-by: Ekansh Gupta <ekansh.gupta(a)oss.qualcomm.com>
---
Ekansh Gupta (18):
accel/qda: Add Qualcomm QDA DSP accelerator driver docs
accel/qda: Add Qualcomm DSP accelerator driver skeleton
accel/qda: Add RPMsg transport for Qualcomm DSP accelerator
accel/qda: Add built-in compute CB bus for QDA and integrate with IOMMU
accel/qda: Create compute CB devices on QDA compute bus
accel/qda: Add memory manager for CB devices
accel/qda: Add DRM accel device registration for QDA driver
accel/qda: Add per-file DRM context and open/close handling
accel/qda: Add QUERY IOCTL and basic QDA UAPI header
accel/qda: Add DMA-backed GEM objects and memory manager integration
accel/qda: Add GEM_CREATE and GEM_MMAP_OFFSET IOCTLs
accel/qda: Add PRIME dma-buf import support
accel/qda: Add initial FastRPC attach and release support
accel/qda: Add FastRPC dynamic invocation support
accel/qda: Add FastRPC DSP process creation support
accel/qda: Add FastRPC-based DSP memory mapping support
accel/qda: Add FastRPC-based DSP memory unmapping support
MAINTAINERS: Add MAINTAINERS entry for QDA driver
Documentation/accel/index.rst | 1 +
Documentation/accel/qda/index.rst | 14 +
Documentation/accel/qda/qda.rst | 129 ++++
MAINTAINERS | 9 +
arch/arm64/configs/defconfig | 2 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 2 +
drivers/accel/qda/Kconfig | 35 ++
drivers/accel/qda/Makefile | 19 +
drivers/accel/qda/qda_cb.c | 182 ++++++
drivers/accel/qda/qda_cb.h | 26 +
drivers/accel/qda/qda_compute_bus.c | 23 +
drivers/accel/qda/qda_drv.c | 375 ++++++++++++
drivers/accel/qda/qda_drv.h | 171 ++++++
drivers/accel/qda/qda_fastrpc.c | 1002 ++++++++++++++++++++++++++++++++
drivers/accel/qda/qda_fastrpc.h | 433 ++++++++++++++
drivers/accel/qda/qda_gem.c | 211 +++++++
drivers/accel/qda/qda_gem.h | 103 ++++
drivers/accel/qda/qda_ioctl.c | 271 +++++++++
drivers/accel/qda/qda_ioctl.h | 118 ++++
drivers/accel/qda/qda_memory_dma.c | 91 +++
drivers/accel/qda/qda_memory_dma.h | 46 ++
drivers/accel/qda/qda_memory_manager.c | 382 ++++++++++++
drivers/accel/qda/qda_memory_manager.h | 148 +++++
drivers/accel/qda/qda_prime.c | 194 +++++++
drivers/accel/qda/qda_prime.h | 43 ++
drivers/accel/qda/qda_rpmsg.c | 327 +++++++++++
drivers/accel/qda/qda_rpmsg.h | 57 ++
drivers/iommu/iommu.c | 4 +
include/linux/qda_compute_bus.h | 22 +
include/uapi/drm/qda_accel.h | 224 +++++++
31 files changed, 4665 insertions(+)
---
base-commit: d4906ae14a5f136ceb671bb14cedbf13fa560da6
change-id: 20260223-qda-firstpost-4ab05249e2cc
Best regards,
--
Ekansh Gupta <ekansh.gupta(a)oss.qualcomm.com>
This patch series adds a new dma-buf heap driver that exposes coherent,
non‑reusable reserved-memory regions as named heaps, so userspace can
explicitly allocate buffers from those device‑specific pools.
Motivation: we want cgroup accounting for all userspace‑visible buffer
allocations (DRM, v4l2, dma‑buf heaps, etc.). That’s hard to do when
drivers call dma_alloc_attrs() directly because the accounting controller
(memcg vs dmem) is ambiguous. The long‑term plan is to steer those paths
toward dma‑buf heaps, where each heap can unambiguously charge a single
controller. To reach that goal, we need a heap backend for each
dma_alloc_attrs() memory type. CMA and system heaps already exist;
coherent reserved‑memory was the missing piece, since many SoCs define
dedicated, device‑local coherent pools in DT under /reserved-memory using
"shared-dma-pool" with non‑reusable regions (i.e., not CMA) that are
carved out exclusively for coherent DMA and are currently only usable by
in‑kernel drivers.
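For context, a reserved-memory region of the kind this series targets
would, with the standard bindings, look something like the following
sketch (the node name, label, and addresses are made up for
illustration; the key point is "shared-dma-pool" without the "reusable"
property, i.e. not CMA):

```dts
/ {
	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* Device-local coherent pool: carved out exclusively
		 * for coherent DMA, currently only usable by in-kernel
		 * drivers via dev->dma_mem. */
		video_mem: video-mem@9f000000 {
			compatible = "shared-dma-pool";
			reg = <0x0 0x9f000000 0x0 0x01000000>;
			no-map;
		};
	};
};
```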
Because these regions are device‑dependent, each heap instance binds a
heap device to its reserved‑mem region via a newly introduced helper
function, namely of_reserved_mem_device_init_with_mem(), so coherent
allocations use the correct dev->dma_mem.
Charging to cgroups for these buffers is intentionally left out to keep
review focused on the new heap; I plan to follow up based on Eric’s [1]
and Maxime’s [2] work on dmem charging from userspace.
This series also makes the new heap driver modular, in line with the CMA
heap change in [3].
[1] https://lore.kernel.org/all/20260218-dmabuf-heap-cma-dmem-v2-0-b249886fb7b2…
[2] https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.…
[3] https://lore.kernel.org/all/20260303-dma-buf-heaps-as-modules-v3-0-24344812…
Signed-off-by: Albert Esteve <aesteve(a)redhat.com>
---
Changes in v3:
- Reorganized changesets among patches to ensure bisectability
- Removed unused dma_heap_coherent_register() leftover
- Removed fallback when setting mask in coherent heap dev, since
dma_set_mask() already truncates to supported masks
- Moved struct rmem_assigned_device (rd) logic to
of_reserved_mem_device_init_with_mem() to allow listing the device
- Link to v2: https://lore.kernel.org/r/20260303-b4-dmabuf-heap-coherent-rmem-v2-0-65a465…
Changes in v2:
- Removed dmem charging parts
- Moved coherent heap registering logic to coherent.c
- Made heap device a member of struct dma_heap
- Split dma_heap_add logic into create/register, to be able to
access the stored heap device before it is registered.
- Avoid platform device in favour of heap device
- Added a wrapper to rmem device_init() op
- Switched from late_initcall() to module_init()
- Made the coherent heap driver modular
- Link to v1: https://lore.kernel.org/r/20260224-b4-dmabuf-heap-coherent-rmem-v1-1-dffef4…
---
Albert Esteve (5):
dma-buf: dma-heap: split dma_heap_add
of_reserved_mem: add a helper for rmem device_init op
dma: coherent: store reserved memory coherent regions
dma-buf: heaps: Add Coherent heap to dmabuf heaps
dma-buf: heaps: coherent: Turn heap into a module
John Stultz (1):
dma-buf: dma-heap: Keep track of the heap device struct
drivers/dma-buf/dma-heap.c | 138 +++++++++--
drivers/dma-buf/heaps/Kconfig | 9 +
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/coherent_heap.c | 417 ++++++++++++++++++++++++++++++++++
drivers/of/of_reserved_mem.c | 68 ++++--
include/linux/dma-heap.h | 5 +
include/linux/dma-map-ops.h | 7 +
include/linux/of_reserved_mem.h | 8 +
kernel/dma/coherent.c | 34 +++
9 files changed, 640 insertions(+), 47 deletions(-)
---
base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
change-id: 20260223-b4-dmabuf-heap-coherent-rmem-91fd3926afe9
Best regards,
--
Albert Esteve <aesteve(a)redhat.com>
begin_cpu_udmabuf() maps the sg_table with the caller-provided direction
(e.g., DMA_TO_DEVICE for a write-only sync), and caches it in ubuf->sg
for reuse. However, release_udmabuf() always unmaps this sg_table with
a hardcoded DMA_BIDIRECTIONAL, regardless of the direction that was
originally used for the mapping.
With CONFIG_DMA_API_DEBUG=y this produces:
DMA-API: misc udmabuf: device driver frees DMA memory with different
direction [device address=0x000000044a123000] [size=4096 bytes]
[mapped with DMA_TO_DEVICE] [unmapped with DMA_BIDIRECTIONAL]
The issue was found during video playback when GStreamer performed a
write-only DMA_BUF_IOCTL_SYNC on a udmabuf. It can be reproduced
with CONFIG_DMA_API_DEBUG=y by creating a udmabuf from a memfd,
performing a write-only sync (DMA_BUF_SYNC_WRITE without
DMA_BUF_SYNC_READ), and closing the file descriptor.
Fix this by storing the DMA direction used when the sg_table is first
created in begin_cpu_udmabuf(), and passing that same direction to
put_sg_table() in release_udmabuf().
Fixes: 284562e1f348 ("udmabuf: implement begin_cpu_access/end_cpu_access hooks")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
---
drivers/dma-buf/udmabuf.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 94b8ecb892bb..d0836febefdd 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -40,6 +40,7 @@ struct udmabuf {
struct folio **pinned_folios;
struct sg_table *sg;
+ enum dma_data_direction sg_dir;
struct miscdevice *device;
pgoff_t *offsets;
};
@@ -235,7 +236,7 @@ static void release_udmabuf(struct dma_buf *buf)
struct device *dev = ubuf->device->this_device;
if (ubuf->sg)
- put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
+ put_sg_table(dev, ubuf->sg, ubuf->sg_dir);
deinit_udmabuf(ubuf);
kfree(ubuf);
@@ -253,6 +254,8 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
if (IS_ERR(ubuf->sg)) {
ret = PTR_ERR(ubuf->sg);
ubuf->sg = NULL;
+ } else {
+ ubuf->sg_dir = direction;
}
} else {
dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
--
2.53.0
Hi all,
There were various suggestions in the September 2025 thread "[TECH
TOPIC] vfio, iommufd: Enabling user space drivers to vend more
granular access to client processes" [0], and LPC discussions, around
improving the situation for multi-process userspace driver designs.
This RFC series implements some of these ideas.
(Thanks for feedback on v1! Revised series, with changes noted
inline.)
Background: Multi-process USDs
==============================
The userspace driver scenario discussed in that thread involves a
primary process driving a PCIe function through VFIO/iommufd, which
manages the function-wide ownership/lifecycle. The function is
designed to provide multiple distinct programming interfaces (for
example, several independent MMIO register frames in one function),
and the primary process delegates control of these interfaces to
multiple independent client processes (which do the actual work).
This scenario clearly relies on a HW design that provides appropriate
isolation between the programming interfaces.
The two key needs are:
1. Mechanisms to safely delegate a subset of the device MMIO
resources to a client process without over-sharing wider access
(or influence over whole-device activities, such as reset).
2. Mechanisms to allow a client process to do its own iommufd
management w.r.t. its address space, in a way that's isolated
from DMA relating to other clients.
mmap() of VFIO DMABUFs
======================
This RFC addresses #1 in "vfio/pci: Support mmap() of a VFIO DMABUF",
implementing the proposals in [0] to add mmap() support to the
existing VFIO DMABUF exporter.
This enables a userspace driver to define DMABUF ranges corresponding
to sub-ranges of a BAR, and grant a given client (via a shared fd)
the capability to access (only) those sub-ranges. The VFIO device fds
would be kept private to the primary process. All the client can do
with that fd is map (or iomap via iommufd) that specific subset of
resources, and the impact of bugs/malice is contained.
(We'll follow up on #2 separately, as a related-but-distinct problem.
PASIDs are one way to achieve per-client isolation of DMA; another
could be sharing of a single IOVA space via 'constrained' iommufds.)
New in v2: To achieve this, the existing VFIO BAR mmap() path is
converted to use DMABUFs behind the scenes, in "vfio/pci: Convert BAR
mmap() to use a DMABUF" plus new helper functions, as Jason/Christian
suggested in the v1 discussion [3].
This means:
- Both regular and new DMABUF BAR mappings share the same vm_ops,
i.e. mmap()ing DMABUFs is a smaller change on top of the existing
mmap().
- The zapping of mappings occurs via vfio_pci_dma_buf_move(), and the
vfio_pci_zap_bars() originally paired with the _move()s can go
away. Each DMABUF has a unique address_space.
- It's a step towards future iommufd VFIO Type1 emulation
implementing P2P, since iommufd can now get a DMABUF from a VA that
it's mapping for IO; the VMAs' vm_file is that of the backing
DMABUF.
Revocation/reclaim
==================
Mapping a BAR subset is useful, but the lifetime of access granted to
a client needs to be managed well. For example, a protocol between
the primary process and the client can indicate when the client is
done, and when it's safe to reuse the resources elsewhere, but cleanup
can't practically be cooperative.
For robustness, we enable the driver to make the resources
guaranteed-inaccessible when it chooses, so that it can re-assign them
to other uses in future.
"vfio/pci: Permanently revoke a DMABUF on request" adds a new VFIO
device fd ioctl, VFIO_DEVICE_PCI_DMABUF_REVOKE. This takes a DMABUF
fd parameter previously exported (from that device!) and permanently
revokes the DMABUF. This notifies/detaches importers, zaps PTEs for
any mappings, and guarantees no future attachment/import/map/access is
possible by any means.
A primary driver process would use this operation when the client's
tenure ends to reclaim "loaned-out" MMIO interfaces, at which point
the interfaces could be safely re-used.
New in v2: ioctl() on VFIO driver fd, rather than DMABUF fd. A DMABUF
is revoked using code common to vfio_pci_dma_buf_move(), selectively
zapping mappings (after waiting for completion on the
dma_buf_invalidate_mappings() request).
BAR mapping access attributes
=============================
Inspired by Alex [Mastro] and Jason's comments in [0] and Mahmoud's
work in [1] with the goal of controlling CPU access attributes for
VFIO BAR mappings (e.g. WC), we can decorate DMABUFs with access
attributes that are then used by a mapping's PTEs.
I've proposed reserving a field in struct
vfio_device_feature_dma_buf's flags to specify an attribute for its
ranges. Although that keeps the (UAPI) struct unchanged, it means all
ranges in a DMABUF share the same attribute. I feel a single
attribute-to-mmap() relation is logical/reasonable. An application
can also create multiple DMABUFs to describe any BAR layout and mix of
attributes.
Tests
=====
(Still sharing the [RFC ONLY] userspace test/demo program for context,
not for merge.)
It illustrates & tests various map/revoke cases, but doesn't use the
existing VFIO selftests and relies on a (tweaked) QEMU EDU function.
I'm (still) working on integrating the scenarios into the existing
VFIO selftests.
This code has been tested with mapping DMABUFs of single/multiple
ranges, aliasing mmap()s, aliasing ranges across DMABUFs, vm_pgoff >
0, revocation, and shutdown/cleanup scenarios, and hugepage mappings
seem to work correctly. I've also lightly tested WC mappings (by
observing that the resulting PTEs have the correct attributes).
Fin
===
v2 is based on next-20260310 (to build on Leon's recent series
"vfio: Wait for dma-buf invalidation to complete" [2]).
Please share your thoughts! I'd like to de-RFC if we feel this
approach is now fair.
Many thanks,
Matt
References:
[0]: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@fb.com/
[1]: https://lore.kernel.org/all/20250804104012.87915-1-mngyadam@amazon.de/
[2]: https://lore.kernel.org/linux-iommu/20260205-nocturnal-poetic-chamois-f566a…
[3]: https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
--------------------------------------------------------------------------------
Changelog:
v2: Respin based on the feedback/suggestions:
- Transform the existing VFIO BAR mmap path to also use DMABUFs behind
the scenes, and then simply share that code for explicitly-mapped
DMABUFs.
- Refactor the export itself out of vfio_pci_core_feature_dma_buf,
sharing it via a new vfio_pci_core_mmap_prep_dmabuf helper used by
the regular VFIO mmap to create a DMABUF
- Revoke buffers using a VFIO device fd ioctl
v1: https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
Matt Evans (10):
vfio/pci: Set up VFIO barmap before creating a DMABUF
vfio/pci: Clean up DMABUFs before disabling function
vfio/pci: Add helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
vfio/pci: Convert BAR mmap() to use a DMABUF
vfio/pci: Remove vfio_pci_zap_bars()
vfio/pci: Support mmap() of a VFIO DMABUF
vfio/pci: Permanently revoke a DMABUF on request
vfio/pci: Add mmap() attributes to DMABUF feature
[RFC ONLY] selftests: vfio: Add standalone vfio_dmabuf_mmap_test
drivers/vfio/pci/Kconfig | 3 +-
drivers/vfio/pci/Makefile | 3 +-
drivers/vfio/pci/vfio_pci_config.c | 18 +-
drivers/vfio/pci/vfio_pci_core.c | 123 +--
drivers/vfio/pci/vfio_pci_dmabuf.c | 425 +++++++--
drivers/vfio/pci/vfio_pci_priv.h | 46 +-
include/uapi/linux/vfio.h | 42 +-
tools/testing/selftests/vfio/Makefile | 1 +
.../vfio/standalone/vfio_dmabuf_mmap_test.c | 837 ++++++++++++++++++
9 files changed, 1339 insertions(+), 159 deletions(-)
create mode 100644 tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.c
--
2.47.3
Commit 3a236f6a5cf2 ("dma: contiguous: Turn heap registration logic
around") removed dma_heap_cma_register_heap() but missed one last call
to it, thus breaking the build.
That last call is in dma_contiguous_reserve(), to handle the
registration of the default CMA region heap instance.
The default CMA region instance is already somewhat handled by
retrieving it through the dev_get_cma_area() call in the CMA heap
driver. However, since commit 854acbe75ff4 ("dma-buf: heaps: Give
default CMA heap a fixed name"), we will create two heap instances for
the CMA default region.
The first one is always called "default_cma_region", and is the one
handled by the call to dev_get_cma_area() mentioned earlier. The second
one keeps the name it had prior to that last commit, for backward
compatibility.
In the case where the default CMA region is defined in the DT, then that
region is registered through rmem_cma_setup() and that region is added
to the list of CMA regions to create a CMA heap instance for.
The case where the default CMA region is not defined in the DT,
though, used to be covered by the now-removed call to
dma_heap_cma_register_heap() in dma_contiguous_reserve(). If we only
remove that call, the legacy name of the CMA heap will not be
registered anymore. We thus need to replace it with a call to
rmem_cma_insert_area() to make sure that instance, if created, is
queued for heap creation.
Once that call to dma_heap_cma_register_heap() is replaced, we can
also remove the now unused function definition, its now-empty header,
and all includes of that header.
Fixes: 3a236f6a5cf2 ("dma: contiguous: Turn heap registration logic around")
Reported-by: Mark Brown <broonie(a)kernel.org>
Closes: https://lore.kernel.org/linux-next/acbjaDJ1a-YQC64d@sirena.co.uk/
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v2:
- Fix creation of the CMA heap instance with the legacy name when not
declared in the DT.
- Link to v1: https://lore.kernel.org/r/20260330-dma-build-fix-v1-1-748b64f0d8af@kernel.o…
---
drivers/dma-buf/heaps/cma_heap.c | 1 -
include/linux/dma-buf/heaps/cma.h | 16 ----------------
kernel/dma/contiguous.c | 14 +++++++++++---
3 files changed, 11 insertions(+), 20 deletions(-)
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 7216a14262b04bb6130ddf26b7d009f7d15b03fd..9a8b36bc929f6daa483a0139a2919d95127e0d23 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -12,11 +12,10 @@
#define pr_fmt(fmt) "cma_heap: " fmt
#include <linux/cma.h>
#include <linux/dma-buf.h>
-#include <linux/dma-buf/heaps/cma.h>
#include <linux/dma-heap.h>
#include <linux/dma-map-ops.h>
#include <linux/err.h>
#include <linux/highmem.h>
#include <linux/io.h>
diff --git a/include/linux/dma-buf/heaps/cma.h b/include/linux/dma-buf/heaps/cma.h
deleted file mode 100644
index e751479e21e703e24a5f799b4a7fc8bd0df3c1c4..0000000000000000000000000000000000000000
--- a/include/linux/dma-buf/heaps/cma.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef DMA_BUF_HEAP_CMA_H_
-#define DMA_BUF_HEAP_CMA_H_
-
-struct cma;
-
-#ifdef CONFIG_DMABUF_HEAPS_CMA
-int dma_heap_cma_register_heap(struct cma *cma);
-#else
-static inline int dma_heap_cma_register_heap(struct cma *cma)
-{
- return 0;
-}
-#endif // CONFIG_DMABUF_HEAPS_CMA
-
-#endif // DMA_BUF_HEAP_CMA_H_
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index ad50512d71d3088a73e4b1ac02d6e6122374888e..d5d15983060c5c54744d6a63f2b591e1a3455b86 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -40,11 +40,10 @@
#include <asm/page.h>
#include <linux/memblock.h>
#include <linux/err.h>
#include <linux/sizes.h>
-#include <linux/dma-buf/heaps/cma.h>
#include <linux/dma-map-ops.h>
#include <linux/cma.h>
#include <linux/nospec.h>
#ifdef CONFIG_CMA_SIZE_MBYTES
@@ -217,10 +216,19 @@ static void __init dma_numa_cma_reserve(void)
static inline void __init dma_numa_cma_reserve(void)
{
}
#endif
+#ifdef CONFIG_OF_RESERVED_MEM
+static int rmem_cma_insert_area(struct cma *cma);
+#else
+static inline int rmem_cma_insert_area(struct cma *cma)
+{
+ return 0;
+}
+#endif
+
/**
* dma_contiguous_reserve() - reserve area(s) for contiguous memory handling
* @limit: End address of the reserved memory (optional, 0 for any).
*
* This function reserves memory from early allocator. It should be
@@ -271,13 +279,13 @@ void __init dma_contiguous_reserve(phys_addr_t limit)
&dma_contiguous_default_area,
fixed);
if (ret)
return;
- ret = dma_heap_cma_register_heap(dma_contiguous_default_area);
+ ret = rmem_cma_insert_area(dma_contiguous_default_area);
if (ret)
- pr_warn("Couldn't register default CMA heap.");
+ pr_warn("Couldn't queue default CMA region for heap creation.");
}
}
void __weak
dma_contiguous_early_fixup(phys_addr_t base, unsigned long size)
---
base-commit: 6c683d5b1903a14e362c9f1628ce9fe61eac35e7
change-id: 20260330-dma-build-fix-706a4feb0e0f
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>