This patch series introduces the Qualcomm DSP Accelerator (QDA) driver,
a modern DRM-based accelerator implementation for Qualcomm Hexagon DSPs.
The driver provides a standardized interface for offloading computational
tasks to DSPs found on Qualcomm SoCs, supporting all DSP domains (ADSP,
CDSP, SDSP, GDSP).
The QDA driver is designed as an alternative for the FastRPC driver
in drivers/misc/, offering improved resource management, better integration
with standard kernel subsystems, and alignment with the Linux kernel's
Compute Accelerators framework.
User-space staging branch
============
https://github.com/qualcomm/fastrpc/tree/accel/staging
Key Features
============
* Standard DRM accelerator interface via /dev/accel/accelN
* GEM-based buffer management with DMA-BUF import/export support
* IOMMU-based memory isolation using per-process context banks
* FastRPC protocol implementation for DSP communication
* RPMsg transport layer for reliable message passing
* Support for all DSP domains (ADSP, CDSP, SDSP, GDSP)
* Comprehensive IOCTL interface for DSP operations
High-Level Architecture Differences with Existing FastRPC Driver
=================================================================
The QDA driver represents a significant architectural departure from the
existing FastRPC driver (drivers/misc/fastrpc.c), addressing several key
limitations while maintaining protocol compatibility:
1. DRM Accelerator Framework Integration
- FastRPC: Custom character device (/dev/fastrpc-*)
- QDA: Standard DRM accel device (/dev/accel/accelN)
- Benefit: Leverages established DRM infrastructure for device
management.
2. Memory Management
- FastRPC: Custom memory allocator with ION/DMA-BUF integration
- QDA: Native GEM objects with full PRIME support
- Benefit: Seamless buffer sharing using standard DRM mechanisms
3. IOMMU Context Bank Management
- FastRPC: Direct IOMMU domain manipulation, limited isolation
- QDA: Custom compute bus (qda_cb_bus_type) with proper device model
- Benefit: Each CB device is a proper struct device with IOMMU group
support, enabling better isolation and resource tracking.
- https://lore.kernel.org/all/245d602f-3037-4ae3-9af9-d98f37258aae@oss.qualco…
4. Memory Manager Architecture
- FastRPC: Monolithic allocator
- QDA: Pluggable memory manager with backend abstraction
- Benefit: Currently uses DMA-coherent backend, easily extensible for
future memory types (e.g., carveout, CMA)
5. Transport Layer
- FastRPC: Direct RPMsg integration in core driver
- QDA: Abstracted transport layer (qda_rpmsg.c)
- Benefit: Clean separation of concerns, easier to add alternative
transports if needed
8. Code Organization
- FastRPC: ~3000 lines in single file
- QDA: Modular design across multiple files (~4600 lines total)
* qda_drv.c: Core driver and DRM integration
* qda_gem.c: GEM object management
* qda_memory_manager.c: Memory and IOMMU management
* qda_fastrpc.c: FastRPC protocol implementation
* qda_rpmsg.c: Transport layer
* qda_cb.c: Context bank device management
- Benefit: Better maintainability, clearer separation of concerns
9. UAPI Design
- FastRPC: Custom IOCTL interface
- QDA: DRM-style IOCTLs with proper versioning support
- Benefit: Follows DRM conventions, easier userspace integration
10. Documentation
- FastRPC: Minimal in-tree documentation
- QDA: Comprehensive documentation in Documentation/accel/qda/
- Benefit: Better developer experience, clearer API contracts
11. Buffer Reference Mechanism
- FastRPC: Uses buffer file descriptors (FDs) for all book-keeping
in both kernel and DSP
- QDA: Uses GEM handles for kernel-side management, providing better
integration with DRM subsystem
- Benefit: Leverages DRM GEM infrastructure for reference counting,
lifetime management, and integration with other DRM components
Key Technical Improvements
===========================
* Proper device model: CB devices are real struct device instances on a
custom bus, enabling proper IOMMU group management and power management
integration
* Reference-counted IOMMU devices: Multiple file descriptors from the same
process share a single IOMMU device, reducing overhead
* GEM-based buffer lifecycle: Automatic cleanup via DRM GEM reference
counting, eliminating many resource leak scenarios
* Modular memory backends: The memory manager supports pluggable backends,
currently implementing DMA-coherent allocations with SID-prefixed
addresses for DSP firmware
* Context-based invocation tracking: XArray-based context management with
proper synchronization and cleanup
Patch Series Organization
==========================
Patches 1-2: Driver skeleton and documentation
Patches 3-6: RPMsg transport and IOMMU/CB infrastructure
Patches 7-9: DRM device registration and basic IOCTL
Patches 10-12: GEM buffer management and PRIME support
Patches 13-17: FastRPC protocol implementation (attach, invoke, create,
map/unmap)
Patch 18: MAINTAINERS entry
Open Items
===========
The following items are identified as open items:
1. Privilege Level Management
- Currently, daemon processes and user processes have the same access
level as both use the same accel device node. This needs to be
addressed as daemons attach to privileged DSP PDs and require
higher privilege levels for system-level operations
- Seeking guidance on the best approach: separate device nodes,
capability-based checks, or DRM master/authentication mechanisms
2. UAPI Compatibility Layer
- Add UAPI compat layer to facilitate migration of client applications
from existing FastRPC UAPI to the new QDA accel driver UAPI,
ensuring smooth transition for existing userspace code
- Seeking guidance on implementation approach: in-kernel translation
layer, userspace wrapper library, or hybrid solution
3. Documentation Improvements
- Add detailed IOCTL usage examples
- Document DSP firmware interface requirements
- Create migration guide from existing FastRPC
4. Per-Domain Memory Allocation
- Develop new userspace API to support memory allocation on a per
domain basis, enabling domain-specific memory management and
optimization
5. Audio and Sensors PD Support
- The current patch series does not handle Audio PD and Sensors PD
functionalities. These specialized protection domains require
additional support for real-time constraints and power management
Interface Compatibility
========================
The QDA driver maintains compatibility with existing FastRPC infrastructure:
* Device Tree Bindings: The driver uses the same device tree bindings as
the existing FastRPC driver, ensuring no changes are required to device
tree sources. The "qcom,fastrpc" compatible string and child node
structure remain unchanged.
* Userspace Interface: While the driver provides a new DRM-based UAPI,
the underlying FastRPC protocol and DSP firmware interface remain
compatible. This ensures that DSP firmware and libraries continue to
work without modification.
* Migration Path: The modular design allows for gradual migration, where
both drivers can coexist during the transition period. Applications can
be migrated incrementally to the new UAPI with the help of the planned
compatibility layer.
References
==========
Previous discussions on this migration:
- https://lkml.org/lkml/2024/6/24/479
- https://lkml.org/lkml/2024/6/21/1252
Testing
=======
The driver has been tested on Qualcomm platforms with:
- Basic FastRPC attach/release operations
- DSP process creation and initialization
- Memory mapping/unmapping operations
- Dynamic invocation with various buffer types
- GEM buffer allocation and mmap
- PRIME buffer import from other subsystems
Signed-off-by: Ekansh Gupta <ekansh.gupta(a)oss.qualcomm.com>
---
Ekansh Gupta (18):
accel/qda: Add Qualcomm QDA DSP accelerator driver docs
accel/qda: Add Qualcomm DSP accelerator driver skeleton
accel/qda: Add RPMsg transport for Qualcomm DSP accelerator
accel/qda: Add built-in compute CB bus for QDA and integrate with IOMMU
accel/qda: Create compute CB devices on QDA compute bus
accel/qda: Add memory manager for CB devices
accel/qda: Add DRM accel device registration for QDA driver
accel/qda: Add per-file DRM context and open/close handling
accel/qda: Add QUERY IOCTL and basic QDA UAPI header
accel/qda: Add DMA-backed GEM objects and memory manager integration
accel/qda: Add GEM_CREATE and GEM_MMAP_OFFSET IOCTLs
accel/qda: Add PRIME dma-buf import support
accel/qda: Add initial FastRPC attach and release support
accel/qda: Add FastRPC dynamic invocation support
accel/qda: Add FastRPC DSP process creation support
accel/qda: Add FastRPC-based DSP memory mapping support
accel/qda: Add FastRPC-based DSP memory unmapping support
MAINTAINERS: Add MAINTAINERS entry for QDA driver
Documentation/accel/index.rst | 1 +
Documentation/accel/qda/index.rst | 14 +
Documentation/accel/qda/qda.rst | 129 ++++
MAINTAINERS | 9 +
arch/arm64/configs/defconfig | 2 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 2 +
drivers/accel/qda/Kconfig | 35 ++
drivers/accel/qda/Makefile | 19 +
drivers/accel/qda/qda_cb.c | 182 ++++++
drivers/accel/qda/qda_cb.h | 26 +
drivers/accel/qda/qda_compute_bus.c | 23 +
drivers/accel/qda/qda_drv.c | 375 ++++++++++++
drivers/accel/qda/qda_drv.h | 171 ++++++
drivers/accel/qda/qda_fastrpc.c | 1002 ++++++++++++++++++++++++++++++++
drivers/accel/qda/qda_fastrpc.h | 433 ++++++++++++++
drivers/accel/qda/qda_gem.c | 211 +++++++
drivers/accel/qda/qda_gem.h | 103 ++++
drivers/accel/qda/qda_ioctl.c | 271 +++++++++
drivers/accel/qda/qda_ioctl.h | 118 ++++
drivers/accel/qda/qda_memory_dma.c | 91 +++
drivers/accel/qda/qda_memory_dma.h | 46 ++
drivers/accel/qda/qda_memory_manager.c | 382 ++++++++++++
drivers/accel/qda/qda_memory_manager.h | 148 +++++
drivers/accel/qda/qda_prime.c | 194 +++++++
drivers/accel/qda/qda_prime.h | 43 ++
drivers/accel/qda/qda_rpmsg.c | 327 +++++++++++
drivers/accel/qda/qda_rpmsg.h | 57 ++
drivers/iommu/iommu.c | 4 +
include/linux/qda_compute_bus.h | 22 +
include/uapi/drm/qda_accel.h | 224 +++++++
31 files changed, 4665 insertions(+)
---
base-commit: d4906ae14a5f136ceb671bb14cedbf13fa560da6
change-id: 20260223-qda-firstpost-4ab05249e2cc
Best regards,
--
Ekansh Gupta <ekansh.gupta(a)oss.qualcomm.com>
This patch series adds a new dma-buf heap driver that exposes coherent,
non‑reusable reserved-memory regions as named heaps, so userspace can
explicitly allocate buffers from those device‑specific pools.
Motivation: we want cgroup accounting for all userspace‑visible buffer
allocations (DRM, v4l2, dma‑buf heaps, etc.). That’s hard to do when
drivers call dma_alloc_attrs() directly because the accounting controller
(memcg vs dmem) is ambiguous. The long‑term plan is to steer those paths
toward dma‑buf heaps, where each heap can unambiguously charge a single
controller. To reach that goal, we need a heap backend for each
dma_alloc_attrs() memory type. CMA and system heaps already exist;
coherent reserved‑memory was the missing piece, since many SoCs define
dedicated, device‑local coherent pools in DT under /reserved-memory using
"shared-dma-pool" with non‑reusable regions (i.e., not CMA) that are
carved out exclusively for coherent DMA and are currently only usable by
in‑kernel drivers.
Because these regions are device‑dependent, each heap instance binds a
heap device to its reserved‑mem region via a newly introduced helper
function -namely, of_reserved_mem_device_init_with_mem()- so coherent
allocations use the correct dev->dma_mem.
Charging to cgroups for these buffers is intentionally left out to keep
review focused on the new heap; I plan to follow up based on Eric’s [1]
and Maxime’s [2] work on dmem charging from userspace.
This series also makes the new heap driver modular, in line with the CMA
heap change in [3].
[1] https://lore.kernel.org/all/20260218-dmabuf-heap-cma-dmem-v2-0-b249886fb7b2…
[2] https://lore.kernel.org/all/20250310-dmem-cgroups-v1-0-2984c1bc9312@kernel.…
[3] https://lore.kernel.org/all/20260303-dma-buf-heaps-as-modules-v3-0-24344812…
Signed-off-by: Albert Esteve <aesteve(a)redhat.com>
---
Changes in v3:
- Reorganized changesets among patches to ensure bisectability
- Removed unused dma_heap_coherent_register() leftover
- Removed fallback when setting mask in coherent heap dev, since
dma_set_mask() already truncates to supported masks
- Moved struct rmem_assigned_device (rd) logic to
of_reserved_mem_device_init_with_mem() to allow listing the device
- Link to v2: https://lore.kernel.org/r/20260303-b4-dmabuf-heap-coherent-rmem-v2-0-65a465…
Changes in v2:
- Removed dmem charging parts
- Moved coherent heap registering logic to coherent.c
- Made heap device a member of struct dma_heap
- Split dma_heap_add logic into create/register, to be able to
access the stored heap device before registered.
- Avoid platform device in favour of heap device
- Added a wrapper to rmem device_init() op
- Switched from late_initcall() to module_init()
- Made the coherent heap driver modular
- Link to v1: https://lore.kernel.org/r/20260224-b4-dmabuf-heap-coherent-rmem-v1-1-dffef4…
---
Albert Esteve (5):
dma-buf: dma-heap: split dma_heap_add
of_reserved_mem: add a helper for rmem device_init op
dma: coherent: store reserved memory coherent regions
dma-buf: heaps: Add Coherent heap to dmabuf heaps
dma-buf: heaps: coherent: Turn heap into a module
John Stultz (1):
dma-buf: dma-heap: Keep track of the heap device struct
drivers/dma-buf/dma-heap.c | 138 +++++++++--
drivers/dma-buf/heaps/Kconfig | 9 +
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/coherent_heap.c | 417 ++++++++++++++++++++++++++++++++++
drivers/of/of_reserved_mem.c | 68 ++++--
include/linux/dma-heap.h | 5 +
include/linux/dma-map-ops.h | 7 +
include/linux/of_reserved_mem.h | 8 +
kernel/dma/coherent.c | 34 +++
9 files changed, 640 insertions(+), 47 deletions(-)
---
base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
change-id: 20260223-b4-dmabuf-heap-coherent-rmem-91fd3926afe9
Best regards,
--
Albert Esteve <aesteve(a)redhat.com>
I often get this question in my mind when I think about visiting the mountains — is Leh Ladakh safe for visitors like me? The place looks absolutely beautiful in photos, with huge mountains, quiet monasteries, and those long scenic roads. But at the same time, I always wonder about the altitude, weather, and road conditions. Since it’s a high-altitude region, I feel it’s important to understand how the body reacts there and how much time I should give myself to adjust.
From what I’ve learned so far, many travelers visit every year and most of them have a wonderful experience. Still, I always prefer to read carefully and plan things before going anywhere new. I like checking routes, the best months to visit, and what kind of preparations I should make. That’s why I started looking through a Leh Ladakh travel guide, because it answers many of the small questions I have in my mind.
I also feel safer knowing that tourism is common there and local people are welcoming to visitors. So maybe the real question for me is not just safety, but how well I plan my trip. With the right preparation, the journey can be amazing. 🏔️✨
More details:
https://www.indiahighlight.com/destination/leh-ladakh
📧 emmastone(a)yzcalo.com
📞 19707840507
begin_cpu_udmabuf() maps the sg_table with the caller-provided direction
(e.g., DMA_TO_DEVICE for a write-only sync), and caches it in ubuf->sg
for reuse. However, release_udmabuf() always unmaps this sg_table with
a hardcoded DMA_BIDIRECTIONAL, regardless of the direction that was
originally used for the mapping.
With CONFIG_DMA_API_DEBUG=y this produces:
DMA-API: misc udmabuf: device driver frees DMA memory with different
direction [device address=0x000000044a123000] [size=4096 bytes]
[mapped with DMA_TO_DEVICE] [unmapped with DMA_BIDIRECTIONAL]
The issue was found during video playback when GStreamer performed a
write-only DMA_BUF_IOCTL_SYNC on a udmabuf. It can be reproduced
with CONFIG_DMA_API_DEBUG=y by creating a udmabuf from a memfd,
performing a write-only sync (DMA_BUF_SYNC_WRITE without
DMA_BUF_SYNC_READ), and closing the file descriptor.
Fix this by storing the DMA direction used when the sg_table is first
created in begin_cpu_udmabuf(), and passing that same direction to
put_sg_table() in release_udmabuf().
Fixes: 284562e1f348 ("udmabuf: implement begin_cpu_access/end_cpu_access hooks")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
---
drivers/dma-buf/udmabuf.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 94b8ecb892bb..d0836febefdd 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -40,6 +40,7 @@ struct udmabuf {
struct folio **pinned_folios;
struct sg_table *sg;
+ enum dma_data_direction sg_dir;
struct miscdevice *device;
pgoff_t *offsets;
};
@@ -235,7 +236,7 @@ static void release_udmabuf(struct dma_buf *buf)
struct device *dev = ubuf->device->this_device;
if (ubuf->sg)
- put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
+ put_sg_table(dev, ubuf->sg, ubuf->sg_dir);
deinit_udmabuf(ubuf);
kfree(ubuf);
@@ -253,6 +254,8 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
if (IS_ERR(ubuf->sg)) {
ret = PTR_ERR(ubuf->sg);
ubuf->sg = NULL;
+ } else {
+ ubuf->sg_dir = direction;
}
} else {
dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
--
2.53.0