On Tue, Feb 24, 2026 at 12:38:55AM +0530, Ekansh Gupta wrote:
Add initial documentation for the Qualcomm DSP Accelerator (QDA) driver integrated in the DRM accel subsystem.
The new docs introduce QDA as a DRM/accel-based implementation of Hexagon DSP offload that is intended as a modern alternative to the legacy FastRPC driver in drivers/misc. The text describes the driver motivation, high-level architecture and interaction with IOMMU context banks, GEM-based buffer management and the RPMsg transport.
The user-space facing section documents the main QDA IOCTLs used to establish DSP sessions, manage GEM buffer objects and invoke remote procedures using the FastRPC protocol, along with a typical lifecycle example for applications.
Finally, the driver is wired into the Compute Accelerators documentation index under Documentation/accel, and a brief debugging section shows how to enable dynamic debug for the QDA implementation.
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
 Documentation/accel/index.rst     |   1 +
 Documentation/accel/qda/index.rst |  14 +++++
 Documentation/accel/qda/qda.rst   | 129 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 144 insertions(+)
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index cbc7d4c3876a..5901ea7f784c 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -10,4 +10,5 @@ Compute Accelerators
    introduction
    amdxdna/index
    qaic/index
+   qda/index
    rocket/index
diff --git a/Documentation/accel/qda/index.rst b/Documentation/accel/qda/index.rst
new file mode 100644
index 000000000000..bce188f21117
--- /dev/null
+++ b/Documentation/accel/qda/index.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+==============================
+ accel/qda Qualcomm DSP Driver
+==============================
+
+The **accel/qda** driver provides support for Qualcomm Hexagon DSPs (Digital
+Signal Processors) within the DRM accelerator framework. It serves as a modern
+replacement for the legacy FastRPC driver, offering improved resource management
+and standard subsystem integration.
+
+.. toctree::
+
+   qda
diff --git a/Documentation/accel/qda/qda.rst b/Documentation/accel/qda/qda.rst
new file mode 100644
index 000000000000..742159841b95
--- /dev/null
+++ b/Documentation/accel/qda/qda.rst
@@ -0,0 +1,129 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+==================================
+Qualcomm Hexagon DSP (QDA) Driver
+==================================
+
+Introduction
+============
+
+The **QDA** (Qualcomm DSP Accelerator) driver is a new DRM-based
+accelerator driver for Qualcomm's Hexagon DSPs. It provides a standardized
+interface for user-space applications to offload computational tasks ranging
+from audio processing and sensor offload to computer vision and AI
+inference to the Hexagon DSPs found on Qualcomm SoCs.
+
+This driver is designed to align with the Linux kernel's modern **Compute
+Accelerators** subsystem (`drivers/accel/`), providing a robust and modular
+alternative to the legacy FastRPC driver in `drivers/misc/`, offering
+improved resource management and better integration with standard kernel
+subsystems.
+
+Motivation
+==========
+
+The existing FastRPC implementation in the kernel utilizes a custom character
+device and lacks integration with modern kernel memory management frameworks.
+The QDA driver addresses these limitations by:
+
+1. **Adopting the DRM accel Framework**: Leveraging standard uAPIs for device
+   management, job submission, and synchronization.
+2. **Utilizing GEM for Memory**: Providing proper buffer object management,
+   including DMA-BUF import/export capabilities.
+3. **Improving Isolation**: Using IOMMU context banks to enforce memory
+   isolation between different DSP user sessions.
+
+Key Features
+============
+
+* **Standard Accelerator Interface**: Exposes a standard character device
+  node (e.g., `/dev/accel/accel0`) via the DRM subsystem.
+* **Unified Offload Support**: Supports all DSP domains (ADSP, CDSP, SDSP,
+  GDSP) via a single driver architecture.
+* **FastRPC Protocol**: Implements the reliable Remote Procedure Call
+  (FastRPC) protocol for communication between the application processor
+  and DSP.
+* **DMA-BUF Interop**: Seamless sharing of memory buffers between the DSP
+  and other multimedia subsystems (GPU, Camera, Video) via standard DMA-BUFs.
+* **Modular Design**: Clean separation between the core DRM logic, the memory
+  manager, and the RPMsg-based transport layer.
+
+Architecture
+============
+
+The QDA driver is composed of several modular components:
+
+1. **Core Driver (`qda_drv`)**: Manages device registration, file operations,
+   and bridges the driver with the DRM accelerator subsystem.
+2. **Memory Manager (`qda_memory_manager`)**: A flexible memory management
+   layer that handles IOMMU context banks. It supports pluggable backends
+   (such as DMA-coherent) to adapt to different SoC memory architectures.
+3. **GEM Subsystem**: Implements the DRM GEM interface for buffer management:
+
+   - **`qda_gem`**: Core GEM object management, including allocation, mmap
+     operations, and buffer lifecycle management.
+   - **`qda_prime`**: PRIME import functionality for DMA-BUF interoperability,
+     enabling seamless buffer sharing with other kernel subsystems.
+
+4. **Transport Layer (`qda_rpmsg`)**: Abstraction over the RPMsg framework
+   to handle low-level message passing with the DSP firmware.
+5. **Compute Bus (`qda_compute_bus`)**: A custom virtual bus used to
+   enumerate and manage the specific compute context banks defined in the
+   device tree.
I'm really not sure if it's a bonus or not. I'm waiting for iommu-map improvements to land to send patches reworking FastRPC CB from using probe into being created by the main driver: it would remove some of the possible race conditions between main driver finishing probe and the CB devices probing in the background.
What's the actual benefit of the CB bus?
+6. **FastRPC Core (`qda_fastrpc`)**: Implements the protocol logic for
+   marshalling arguments and handling remote invocations.
+
+User-Space API
+==============
+
+The driver exposes a set of DRM-compliant IOCTLs. Note that these are designed
+to be familiar to existing FastRPC users while adhering to DRM standards.
+
+* `DRM_IOCTL_QDA_QUERY`: Query DSP type (e.g., "cdsp", "adsp")
+  and capabilities.
+* `DRM_IOCTL_QDA_INIT_ATTACH`: Attach a user session to the DSP's protection
+  domain.
+* `DRM_IOCTL_QDA_INIT_CREATE`: Initialize a new process context on the DSP.
You need to explain the difference between these two.
+* `DRM_IOCTL_QDA_INVOKE`: Submit a remote method invocation (the primary
+  execution unit).
+* `DRM_IOCTL_QDA_GEM_CREATE`: Allocate a GEM buffer object for DSP usage.
+* `DRM_IOCTL_QDA_GEM_MMAP_OFFSET`: Retrieve mmap offsets for memory mapping.
+* `DRM_IOCTL_QDA_MAP` / `DRM_IOCTL_QDA_MUNMAP`: Map or unmap buffers into the
+  DSP's virtual address space.
Do we need to make this separate? Can we map/unmap buffers on their usage? Or when they are created? I'm thinking about virtualization here. An alternative approach would be to merge GEM_MMAP_OFFSET with _MAP: once you map the buffer to the DSP memory, you get the offset.
+
+Usage Example
+=============
+
+A typical lifecycle for a user-space application:
+
+1. **Discovery**: Open `/dev/accel/accel*` and check
+   `DRM_IOCTL_QDA_QUERY` to find the desired DSP (e.g., CDSP for
+   compute workloads).
+2. **Initialization**: Call `DRM_IOCTL_QDA_INIT_ATTACH` and
+   `DRM_IOCTL_QDA_INIT_CREATE` to establish a session.
+3. **Memory**: Allocate buffers via `DRM_IOCTL_QDA_GEM_CREATE` or import
+   DMA-BUFs (PRIME fd) from other drivers using `DRM_IOCTL_PRIME_FD_TO_HANDLE`.
+4. **Execution**: Use `DRM_IOCTL_QDA_INVOKE` to pass arguments and execute
+   functions on the DSP.
+5. **Cleanup**: Close file descriptors to automatically release resources and
+   detach the session.
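For the discovery step, a minimal shell sketch of enumerating the accel nodes and the kernel driver bound to each might be a useful starting point. It assumes the usual sysfs layout (`/sys/class/accel/accelN/device/driver`). The `DEV_DIR` and `SYS_DIR` variables and the `list_accel_nodes` helper are hypothetical names used only for illustration, and the driver name reported for a QDA device is not stated in this patch.

```shell
#!/bin/sh
# Hypothetical helper: print each accel node with the driver bound to it.
# DEV_DIR and SYS_DIR are overridable so the logic can be tried without hardware.
DEV_DIR="${DEV_DIR:-/dev/accel}"
SYS_DIR="${SYS_DIR:-/sys/class/accel}"

list_accel_nodes() {
    for node in "$DEV_DIR"/accel*; do
        # An unmatched glob stays literal, so skip entries that do not exist.
        [ -e "$node" ] || continue
        name=$(basename "$node")
        link="$SYS_DIR/$name/device/driver"
        if [ -L "$link" ]; then
            # The driver symlink points at the bound driver's sysfs directory.
            driver=$(basename "$(readlink "$link")")
        else
            driver="unknown"
        fi
        printf '%s %s\n' "$node" "$driver"
    done
}

list_accel_nodes
```

This only narrows down which node to open; identifying the DSP domain behind a node would still go through `DRM_IOCTL_QDA_QUERY`.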
+
+Internal Implementation
+=======================
+
+Memory Management
+-----------------
+The driver's memory manager creates virtual "IOMMU devices" that map to
+hardware context banks. This allows the driver to manage multiple isolated
+address spaces. The implementation currently uses a **DMA-coherent backend**
+to ensure data consistency between the CPU and DSP without manual cache
+maintenance in most cases.
+
+Debugging
+=========
+The driver includes extensive dynamic debug support. Enable it via the
+kernel's dynamic debug control:
+
+.. code-block:: bash
+
+   echo "file drivers/accel/qda/* +p" > /sys/kernel/debug/dynamic_debug/control
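A quick way to confirm that the command took effect is to count the QDA callsites that now carry the `p` flag. The sketch below assumes the dynamic debug control file's usual line layout (`filename:lineno [module]function flags format`). The `CONTROL` variable and the `count_enabled_qda` helper are hypothetical names for illustration.

```shell
#!/bin/sh
# CONTROL is overridable so the check can be run against a saved copy.
CONTROL="${CONTROL:-/sys/kernel/debug/dynamic_debug/control}"

# Hypothetical helper: count callsites under drivers/accel/qda/ whose
# flags field (third column, e.g. "=p" or "=_") includes 'p'.
count_enabled_qda() {
    awk '$1 ~ /^drivers\/accel\/qda\// && $3 ~ /^=[a-z]*p/ { n++ }
         END { print n + 0 }' "$CONTROL"
}

# Only query the live control file when it is readable (debugfs mounted, root).
if [ -r "$CONTROL" ]; then
    count_enabled_qda
fi
```

A result of 0 after the `echo` usually means debugfs is not mounted, the kernel lacks `CONFIG_DYNAMIC_DEBUG`, or the driver is not loaded yet.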
Please add documentation on how to build the test apps and how to load them to the DSP.
-- 2.34.1