This patch series introduces the Qualcomm DSP Accelerator (QDA) driver, a DRM-based accelerator driver for Qualcomm DSPs. The driver provides a standardized interface for offloading computational tasks to DSPs found on Qualcomm SoCs, supporting all DSP domains.
The QDA driver implements the FastRPC protocol over the DRM accel subsystem. It uses the same device-tree node structure as the existing fastrpc driver in drivers/misc/. The approach for binding the QDA driver to device-tree nodes while coexisting with the fastrpc driver is an open item described below.
RFC thread: https://lore.kernel.org/dri-devel/20260224-qda-firstpost-v1-0-fe46a9c1a046@o...
User-space staging branch
=========================
https://github.com/qualcomm/fastrpc/tree/accel/staging

Key Features
============

* Standard DRM accelerator interface via /dev/accel/accelN
* GEM-based buffer management with DMA-BUF import/export (PRIME)
* IOMMU-based memory isolation using per-process context banks
* FastRPC protocol implementation for DSP communication
* RPMsg transport layer for reliable message passing
* Support for all DSP domains (ADSP, CDSP, SDSP, GDSP)
* DRM IOCTL interface for DSP session management, buffer allocation, and remote procedure invocation
Architecture
============

1. DRM Accelerator Framework Integration
The driver registers as a DRM accel device, exposing a standard /dev/accel/accelN character device node. This provides established DRM infrastructure for device management, file operations, and IOCTL dispatch.

2. Memory Management
Buffers are managed as GEM objects with full PRIME support for DMA-BUF import/export. This enables seamless buffer sharing with other DRM drivers (GPU, camera, video) using standard kernel mechanisms.
3. IOMMU Context Bank Management
IOMMU context banks (CBs) are represented as proper struct device instances on a custom virtual bus (qda-compute-cb). Each CB device is registered with the IOMMU subsystem and receives its own IOMMU domain, enabling per-session address space isolation. The custom bus was introduced for two reasons: context banks are synthetic constructs rather than real platform devices, and the bus keeps each CB device's lifetime strictly subordinate to the parent QDA device.
See also: https://lore.kernel.org/all/245d602f-3037-4ae3-9af9-d98f37258aae@oss.qualcom...
4. Memory Manager Architecture
A pluggable memory manager coordinates IOMMU device assignment and buffer allocation. The current implementation uses a DMA-coherent backend with SID-prefixed DMA addresses for DSP firmware compatibility.
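As an illustration of what a "SID-prefixed" address can look like: the existing fastrpc driver in drivers/misc/ has historically placed the context bank's stream ID in the upper 32 bits of the address handed to the DSP. The sketch below assumes QDA follows the same layout. The 32-bit shift, the macro and the helper names are assumptions for illustration and are not taken from this series.

```c
#include <stdint.h>

/* Assumed layout: stream ID in bits 63..32, IOVA in bits 31..0. */
#define QDA_SID_SHIFT 32u           /* hypothetical name */
#define QDA_IOVA_MASK 0xffffffffULL /* hypothetical name */

/* Combine a stream ID and an IOVA into the address passed to the DSP. */
static inline uint64_t qda_sid_prefix(uint32_t sid, uint64_t iova)
{
	return ((uint64_t)sid << QDA_SID_SHIFT) | (iova & QDA_IOVA_MASK);
}

/* Recover the stream ID from a prefixed address. */
static inline uint32_t qda_sid_of(uint64_t addr)
{
	return (uint32_t)(addr >> QDA_SID_SHIFT);
}

/* Recover the plain IOVA from a prefixed address. */
static inline uint64_t qda_iova_of(uint64_t addr)
{
	return addr & QDA_IOVA_MASK;
}
```

Under these assumptions, SID 0x3 and IOVA 0x10000000 combine to 0x0000000310000000, and the two accessors recover the original values.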
5. Transport Layer
RPMsg communication is handled in a dedicated transport layer (qda_rpmsg.c), separate from the core DRM driver logic.
6. Code Organization
The driver is organized across multiple files (about 3,600 lines under drivers/accel/qda/, going by the diffstat at the end of this letter):

* qda_drv.c: Core driver and DRM integration
* qda_rpmsg.c: RPMsg transport layer
* qda_cb.c: Context bank device management
* qda_compute_bus.c: Custom virtual bus for CB devices
* qda_gem.c: GEM object management
* qda_prime.c: DMA-BUF import (PRIME)
* qda_memory_manager.c: IOMMU device registry and allocation
* qda_memory_dma.c: DMA-coherent allocation backend
* qda_fastrpc.c: FastRPC protocol implementation
* qda_ioctl.c: IOCTL dispatch
7. UAPI Design
The driver exposes DRM-style IOCTLs defined in include/uapi/drm/qda_accel.h, following DRM UAPI conventions (__u32/__u64 types, C++ guard, GPL-2.0-only WITH Linux-syscall-note).
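The conventions listed for the UAPI header can be sketched as follows. This is a hypothetical example and not the layout from qda_accel.h: the struct, its fields and the EXAMPLE_IOCTL_QUERY name are invented for illustration. Only the ioctl type 'd' and the command offset 0x40 reflect the generic DRM values (DRM_IOCTL_BASE and DRM_COMMAND_BASE).

```c
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
#include <linux/types.h>
#include <linux/ioctl.h>

#if defined(__cplusplus)
extern "C" {
#endif

/*
 * Hypothetical argument struct. Fixed-width types give 32-bit and 64-bit
 * userspace the same layout; user pointers travel as __u64, and the
 * fields are ordered so that no implicit padding is needed.
 */
struct example_accel_query {
	__u32 domain;   /* hypothetical: DSP domain selector */
	__u32 flags;    /* hypothetical: must be zero */
	__u64 name_ptr; /* hypothetical: user pointer to a name buffer */
};

/* 'd' is the DRM ioctl type; 0x40 is where driver-private commands start. */
#define EXAMPLE_IOCTL_QUERY _IOWR('d', 0x40, struct example_accel_query)

#if defined(__cplusplus)
}
#endif

/* Illustration only: confirm the struct carries no hidden padding. */
_Static_assert(sizeof(struct example_accel_query) == 16,
	       "example_accel_query must be 16 bytes");
```

The struct size is encoded in the ioctl number itself, which is one reason a stable, padding-free layout matters for a UAPI.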
Patch Series Organization
==========================

Patch 01:      MAINTAINERS entry
Patch 02:      Driver documentation (Documentation/accel/qda/)
Patches 03-04: Core driver skeleton and compute bus
Patch 05:      iommu: Register qda-compute-cb bus with IOMMU subsystem
Patches 06-07: CB device enumeration and memory manager
Patch 08:      QUERY IOCTL and UAPI header
Patches 09-11: GEM buffer management and PRIME import
Patches 12-15: FastRPC protocol (invoke, session create/release, map/unmap)
Open Items
===========

1. Device-Tree Compatible String
The QDA driver uses the same device-tree node structure and properties as the existing fastrpc driver in drivers/misc/. A mechanism is needed to allow the QDA driver to bind to its device node independently of the fastrpc driver.
The intended coexistence model is: platforms that require the complete fastrpc feature set continue to use "qcom,fastrpc"; new platforms where a feature available only in QDA takes priority, or where QDA's current feature set is sufficient, use a QDA-specific compatible string. New feature development is directed toward QDA rather than the existing fastrpc driver. As QDA matures toward feature parity with fastrpc, platforms can adopt the QDA-specific compatible string exclusively.
The options under consideration are:
a) Add a new "qcom,qda" compatible string to the existing qcom,fastrpc.yaml binding, since the DT node structure and properties are identical. This avoids a separate binding file but adds a QDA-specific string to a fastrpc binding.
b) Introduce a separate qcom,qda.yaml binding that references or inherits the fastrpc binding properties.
Seeking guidance from DT binding maintainers on the preferred approach.
2. Privilege Level Management
Currently, daemon processes and user processes have the same access level as both use the same accel device node. This needs to be addressed as daemons attach to privileged DSP protection domains and require higher privilege levels for system-level operations. Seeking guidance on the best approach: separate device nodes, capability-based checks, or DRM master/authentication mechanisms.
3. UAPI Compatibility Layer
A compatibility layer is needed to facilitate migration of client applications from the existing FastRPC UAPI to the new QDA UAPI, ensuring a smooth transition for existing userspace code. Seeking guidance on the preferred implementation approach: in-kernel translation layer, userspace wrapper library, or hybrid solution.
An initial evaluation of an in-kernel translation shim was performed, where legacy FastRPC device nodes (/dev/fastrpc-*) are exposed and requests are internally routed to the QDA accel driver. The goal was to keep the compatibility layer minimal, reuse existing QDA helper paths (attach, buffer allocation, mapping, etc.), and avoid duplication of GEM and buffer management logic.
However, the following challenges were identified:
a) Dependency on drm_file for QDA helpers
QDA relies on GEM-backed allocations and per-client handle namespaces, which require a valid struct drm_file. Since GEM handles are scoped per drm_file, the compatibility layer cannot directly reuse QDA helper paths without establishing a proper drm_file context for each client.

b) Lack of public API for drm_file creation
Creating a drm_file directly (similar to mock_drm_getfile()-style approaches) is not feasible, as the required helpers (drm_file_alloc(), drm_file_free(), etc.) are internal to the DRM core and not exported. This prevents external drivers from safely constructing and managing drm_file instances.

c) VFS-based open is not a viable solution
Opening the underlying accel device (/dev/accel/accelN) from the compatibility driver via filp_open() does provide a valid drm_file, but it relies on userspace-visible device paths, is not stable in containerized or chroot environments, and offers no clean mapping between legacy device nodes and accel devices.

d) Userspace proxy limitations (CUSE)
A CUSE-based userspace proxy was evaluated. However, DMA-buf file descriptors passed by legacy applications cannot be directly reused in the CUSE daemon (file descriptors are process-specific), which breaks buffer sharing semantics.

e) drm_client-based approaches do not match requirements
drm_client APIs (used for fbdev emulation) rely on a shared drm_file and do not provide the per-client isolation required by FastRPC semantics.
Due to the above constraints, it is currently unclear how to implement an in-kernel compatibility layer that correctly handles per-client drm_file contexts without relying on VFS paths or non-exported DRM internals.
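To make challenge (a) concrete, the following is a toy user-space model of per-file handle namespaces. It is not DRM code: the model_* names are invented, and the fixed-size array stands in for the per-drm_file handle table that the DRM core maintains. It only shows why a handle value is meaningless without the file context that issued it.

```c
#include <stddef.h>

#define MODEL_MAX_HANDLES 8

/* Stand-in for a GEM object. */
struct model_obj {
	int id;
};

/* Stand-in for struct drm_file: each open file owns a private table. */
struct model_file {
	struct model_obj *objs[MODEL_MAX_HANDLES];
};

/* Allocate the lowest free handle (>= 1) in this file's table; 0 if full. */
static unsigned int model_handle_create(struct model_file *f,
					struct model_obj *obj)
{
	for (unsigned int h = 1; h < MODEL_MAX_HANDLES; h++) {
		if (!f->objs[h]) {
			f->objs[h] = obj;
			return h;
		}
	}
	return 0;
}

/* Resolve a handle within one file's namespace; NULL if not present. */
static struct model_obj *model_handle_lookup(const struct model_file *f,
					     unsigned int h)
{
	if (h == 0 || h >= MODEL_MAX_HANDLES)
		return NULL;
	return f->objs[h];
}
```

In this model two clients that each create one buffer both receive handle 1, yet the handle resolves to a different object in each table. A compatibility shim that funnels all legacy clients through a single shared file context would collapse these namespaces, which is the isolation problem described in (a) and (e).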
4. Documentation Improvements
Add detailed IOCTL usage examples, document DSP firmware interface requirements, and create a migration guide from the existing FastRPC driver.

5. Per-Session Memory Allocation
Develop a userspace API to support memory allocation on a per-session basis, enabling session-specific memory management.

6. Audio and Sensors PD Support
The current series does not handle Audio PD and Sensors PD functionalities. These specialized protection domains require additional support for real-time constraints and power management.
Interface Compatibility
========================
The QDA driver uses the same device-tree node structure and child node layout (including "qcom,fastrpc-compute-cb" child nodes) as the existing fastrpc driver. The underlying FastRPC protocol and DSP firmware interface are compatible with the existing fastrpc driver, ensuring that DSP firmware and libraries continue to work without modification.
References
==========

Previous discussions on this migration:
- https://lkml.org/lkml/2024/6/24/479
- https://lkml.org/lkml/2024/6/21/1252
Testing
=======

The driver has been tested on Qualcomm platforms with:
- Basic FastRPC attach/release operations
- DSP process creation and initialization
- Memory mapping/unmapping operations
- Dynamic invocation with various buffer types
- GEM buffer allocation and mmap
- PRIME buffer import from other subsystems
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
---
Ekansh Gupta (15):
      MAINTAINERS: Add entry for Qualcomm DSP Accelerator (QDA) driver
      accel/qda: Add QDA driver documentation
      accel/qda: Add initial QDA DRM accelerator driver
      accel/qda: Add compute bus for QDA context banks
      iommu: Add QDA compute context bank bus to iommu_buses
      accel/qda: Create compute context bank devices on QDA compute bus
      accel/qda: Add memory manager for CB devices
      accel/qda: Add QUERY IOCTL and QDA UAPI header
      accel/qda: Add DMA-backed GEM objects and memory manager integration
      accel/qda: Add GEM_CREATE and GEM_MMAP_OFFSET IOCTLs
      accel/qda: Add PRIME DMA-BUF import support
      accel/qda: Add FastRPC invocation support
      accel/qda: Add DSP process creation and release
      accel/qda: Add remote memory mapping to DSP address space
      accel/qda: Add remote memory unmap from DSP address space
 Documentation/accel/index.rst          |    1 +
 Documentation/accel/qda/index.rst      |   13 +
 Documentation/accel/qda/qda.rst        |  146 +++++
 MAINTAINERS                            |    9 +
 drivers/accel/Kconfig                  |    1 +
 drivers/accel/Makefile                 |    2 +
 drivers/accel/qda/Kconfig              |   34 +
 drivers/accel/qda/Makefile             |   19 +
 drivers/accel/qda/qda_cb.c             |  146 +++++
 drivers/accel/qda/qda_cb.h             |   32 +
 drivers/accel/qda/qda_compute_bus.c    |   68 ++
 drivers/accel/qda/qda_drv.c            |  192 ++++++
 drivers/accel/qda/qda_drv.h            |   91 +++
 drivers/accel/qda/qda_fastrpc.c        | 1058 ++++++++++++++++++++++++++++++++
 drivers/accel/qda/qda_fastrpc.h        |  390 ++++++++++++
 drivers/accel/qda/qda_gem.c            |  177 ++++++
 drivers/accel/qda/qda_gem.h            |   62 ++
 drivers/accel/qda/qda_ioctl.c          |  296 +++++++++
 drivers/accel/qda/qda_ioctl.h          |   19 +
 drivers/accel/qda/qda_memory_dma.c     |  110 ++++
 drivers/accel/qda/qda_memory_dma.h     |   17 +
 drivers/accel/qda/qda_memory_manager.c |  380 ++++++++++++
 drivers/accel/qda/qda_memory_manager.h |   75 +++
 drivers/accel/qda/qda_prime.c          |  184 ++++++
 drivers/accel/qda/qda_prime.h          |   18 +
 drivers/accel/qda/qda_rpmsg.c          |  248 ++++++++
 drivers/accel/qda/qda_rpmsg.h          |   30 +
 drivers/iommu/iommu.c                  |    4 +
 include/linux/qda_compute_bus.h        |   32 +
 include/uapi/drm/qda_accel.h           |  229 +++++++
 30 files changed, 4083 insertions(+)
---
base-commit: 80dd246accce631c328ea43294e53b2b2dd2aa32
change-id: 20260519-qda-series-78c2bf0ed78b
Best regards,