> From: Matt Evans <matt(a)ozlabs.org>
> Sent: Wednesday, June 10, 2026 11:43 PM
>
> Since converting BAR mmap()s to using DMABUFs, we lose the original
> device path in /proc/<pid>/maps, lsof, etc. Generate a debug-oriented
> synthetic 'filename' based on the cdev, plus BDF, plus resource index.
>
> This applies only to BAR mappings via the VFIO device fd, as
> explicitly-exported DMABUFs are named by userspace via the
> DMA_BUF_SET_NAME ioctl.
>
> Signed-off-by: Matt Evans <matt(a)ozlabs.org>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
> From: Matt Evans <matt(a)ozlabs.org>
> Sent: Wednesday, June 10, 2026 11:43 PM
>
> Convert the VFIO device fd fops->mmap to create a DMABUF representing
> the BAR mapping, and make the VMA fault handler look up PFNs from the
> corresponding DMABUF. This supports future code mmap()ing BAR
> DMABUFs, and iommufd work to support Type1 P2P.
>
> First, vfio_pci_core_mmap() uses the new
> vfio_pci_core_mmap_prep_dmabuf() helper to export a DMABUF
> representing a single BAR range. Then, the vfio_pci_mmap_huge_fault()
> callback is updated to understand revoked buffers, and uses the new
> vfio_pci_dma_buf_find_pfn() helper to determine the PFN for a given
> fault address.
>
> Now that the VFIO DMABUFs can be mmap()ed, vfio_pci_dma_buf_move()
> zaps PTEs (used on the revocation and cleanup paths).
>
> CONFIG_VFIO_PCI_CORE now unconditionally depends on
> CONFIG_DMA_SHARED_BUFFER and CONFIG_PCI_P2PDMA_CORE. The
> CONFIG_VFIO_PCI_DMABUF feature conditionally includes support for
> VFIO_DEVICE_FEATURE_DMA_BUF, depending on the availability of
> CONFIG_PCI_P2PDMA.
>
> Signed-off-by: Matt Evans <matt(a)ozlabs.org>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
with a nit:
> - vma->vm_private_data = vdev;
> + /*
> + * Create a DMABUF with a single range corresponding to this
> + * mapping, and wire it into vma->vm_private_data. The VMA's
> + * vm_file becomes that of the DMABUF, and the DMABUF takes
> + * ownership of the VFIO device file (put upon DMABUF
> + * release). This maintains the behaviour of a live VMA
> + * mapping holding the VFIO device file open.
> + */
> + ret = vfio_pci_core_mmap_prep_dmabuf(vdev, vma,
> + pci_resource_start(pdev, index),
> + req_len, index);
the comment is redundant as it's about internal logic of the callee
and is well covered by the comment there.
> From: Matt Evans <matt(a)ozlabs.org>
> Sent: Wednesday, June 10, 2026 11:43 PM
>
> This helper, vfio_pci_core_mmap_prep_dmabuf(), creates a single-range
> DMABUF for the purpose of mapping a PCI BAR. This is used in a future
> commit by VFIO's ordinary mmap() path.
>
> This function transfers ownership of the VFIO device fd to the
> DMABUF, which fput()s when it's released.
>
> Refactor the existing vfio_pci_core_feature_dma_buf() to split out
> export code common to the two paths, VFIO_DEVICE_FEATURE_DMA_BUF
> and
> this new VFIO_BAR mmap().
>
> Signed-off-by: Matt Evans <matt(a)ozlabs.org>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
> From: Matt Evans <matt(a)ozlabs.org>
> Sent: Wednesday, June 10, 2026 11:43 PM
>
> +int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *priv,
> + struct vm_area_struct *vma,
> + unsigned long address,
> + unsigned int order,
> + unsigned long *out_pfn)
> +{
> + /*
> + * Given a VMA (start, end, pgoffs) and a fault address,
> + * search the corresponding DMABUF's phys_vec[] to find the
> + * range representing the address's offset into the VMA, and
> + * its PFN.
> + *
> + * The phys_vec[] ranges represent contiguous spans of VAs
> + * upwards from the buffer offset 0; the actual PFNs might be
> + * in any order, overlap/alias, etc. Calculate an offset of
> + * the desired page given VMA start/pgoff and address, then
> + * search upwards from 0 to find which span contains it.
> + *
> + * On success, a valid PFN for a page sized by 'order' is
> + * returned into out_pfn.
> + *
> + * Failure occurs if:
> + * - The page would cross the edge of the VMA
> + * - The page isn't entirely contained within a range
> + * - We find a range, but the final PFN isn't aligned to the
> + * requested order.
> + *
> + * (Upon failure, the caller is expected to try again with a
> + * smaller order; the tests above will always succeed for
> + * order=0 as the limit case.)
> + *
> + * It's suboptimal if DMABUFs are created with neigbouring
s/neigbouring/neighboring/
> + * ranges that are physically contiguous, since hugepages
> + * can't straddle range boundaries. (The construction of the
> + * ranges vector should merge such ranges.)
though the field is called 'phys_vec', removing 'vector' in description
is clearer here.
> + *
> + * Finally, vma_pgoff_adjust is used for a DMABUF representing
> + * a VFIO BAR mmap, which is created from the start of the
> + * offset region.
Please elaborate a little here: the vm_pgoff is already counted in the
paddr of phys_vec, so it should be skipped when finding the PFN.
> + */
> +
> + const unsigned long pagesize = PAGE_SIZE << order;
> + unsigned long vma_off = ((vma->vm_pgoff - priv->vma_pgoff_adjust) <<
> + PAGE_SHIFT) & VFIO_PCI_OFFSET_MASK;
> + unsigned long rounded_page_addr = ALIGN_DOWN(address, pagesize);
> + unsigned long rounded_page_end = rounded_page_addr + pagesize;
> + unsigned long page_buf_offset;
> + unsigned long page_buf_offset_end;
what about "fault_offset[_end]"? page_buf is a bit confusing.
> + unsigned long range_buf_offset = 0;
could this be called 'range_start' then the 'range_start' in latter loop
is renamed to 'phys_start'?
Not a strong opinion; such naming just makes the logic easier for me to follow.
> + unsigned int i;
> +
> + if (rounded_page_addr < vma->vm_start || rounded_page_end > vma->vm_end) {
> + if (order > 0)
> + return -EAGAIN;
> +
> + /* A fault address outside of the VMA is absurd. */
> + WARN(1, "Fault addr 0x%lx outside VMA 0x%lx-0x%lx\n",
> + address, vma->vm_start, vma->vm_end);
> + return -EFAULT;
> + }
> +
> + /*
> + * page_buff_offset[_end] is the span of DMABUF offsets
> + * corresponding to the faulting page:
> + */
if the naming is kept then s/page_buff_offset/page_buf_offset/
otherwise,
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
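For readers following the lookup described in the comment above, here is a minimal userspace sketch of the same search. It is not the patch's code: sketch_range, sketch_find_pfn() and the 4 KiB page size are assumptions made for illustration, and vma_off stands for the already-adjusted, already-masked byte offset of the VMA into the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for one phys_vec[] entry. */
struct sketch_range {
	uint64_t paddr;	/* physical start, assumed page aligned */
	uint64_t len;	/* length in bytes, assumed a page multiple */
};

/*
 * Ranges cover buffer offsets contiguously upwards from 0. Returns 0 and
 * fills *out_pfn on success, -EAGAIN if a smaller order should be tried,
 * and -EFAULT if even a base page cannot be resolved.
 */
int sketch_find_pfn(const struct sketch_range *ranges, size_t nr,
		    uint64_t vma_start, uint64_t vma_end, uint64_t vma_off,
		    uint64_t address, unsigned int order, uint64_t *out_pfn)
{
	const uint64_t pagesize = SKETCH_PAGE_SIZE << order;
	const uint64_t page_addr = address & ~(pagesize - 1);
	const uint64_t page_end = page_addr + pagesize;
	uint64_t fault_off, fault_end, range_start = 0;
	size_t i;

	assert(out_pfn != NULL);

	/* The (possibly huge) page must sit entirely inside the VMA. */
	if (page_addr < vma_start || page_end > vma_end)
		return order > 0 ? -EAGAIN : -EFAULT;

	/* Span of buffer offsets covered by the faulting page. */
	fault_off = vma_off + (page_addr - vma_start);
	fault_end = fault_off + pagesize;

	for (i = 0; i < nr; i++) {
		const uint64_t range_end = range_start + ranges[i].len;

		if (fault_off >= range_start && fault_end <= range_end) {
			uint64_t phys = ranges[i].paddr +
					(fault_off - range_start);

			/* A huge mapping needs matching physical alignment. */
			if (phys & (pagesize - 1))
				return -EAGAIN;
			*out_pfn = phys >> SKETCH_PAGE_SHIFT;
			return 0;
		}
		range_start = range_end;
	}

	/* The page straddles a range boundary or lies beyond the buffer. */
	return order > 0 ? -EAGAIN : -EFAULT;
}
```

With two ranges of 16 KiB and 2 MiB, a 2 MiB fault at buffer offset 0 straddles the boundary and gets -EAGAIN, while the same fault at order 0 resolves within the first range. This is the "retry with a smaller order" behaviour the comment describes.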
Hi all,
This series is based on previous RFCs/discussions:
Tech topic: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@fb.com/
RFCv1: https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
RFCv2: https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
The background/rationale is covered in more detail in the RFC cover
letters. The TL;DR is:
The goal is to enable userspace driver designs that use VFIO to export
DMABUFs representing subsets of PCI device BARs, and "vend" those
buffers from a primary process to other subordinate processes by fd.
These processes then mmap() the buffers and their access to the device
is isolated to the exported ranges. This is an improvement on sharing
the VFIO device fd to subordinate processes, which would allow
unfettered access.
This is achieved by enabling mmap() of vfio-pci DMABUFs, passed by fd
to subordinate processes. Second, a new ioctl()-based revocation
mechanism is added to allow the primary process to forcibly revoke
access to previously-shared BAR spans, even if the subordinate
processes haven't cleanly exited.
(The related topic of safe delegation of iommufd control to the
subordinate processes is not addressed here, and is follow-up work.)
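The "vend by fd" step above relies on nothing VFIO-specific: it is ordinary SCM_RIGHTS passing over an AF_UNIX socket. A minimal sketch follows, assuming the primary already holds a DMABUF fd (any fd works for the mechanism); send_fd() and recv_fd() are illustrative names and not part of this series.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket. Returns 0 on success. */
int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new fd in this process, or -1 on error. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}
```

In the intended flow the subordinate would then mmap() the received fd. Because it never holds the VFIO device fd, its reach is limited to the exported ranges.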
As well as isolation and revocation, another advantage to accessing a
BAR through a VMA backed by a DMABUF is that it's straightforward to
mmap() the buffer with access attributes, such as write-combining.
Feedback on the RFCs requested that, instead of creating
DMABUF-specific vm_ops and .fault paths, we go the whole way and
migrate the existing VFIO PCI BAR mmap() to be backed by a DMABUF too,
resulting in a common vm_ops and fault handler for mmap()s of both the
VFIO device and explicitly-exported DMABUFs. This will help future
iommufd emulation of VFIO Type1 peer-to-peer, making it easier to get
a DMABUF for a VFIO BAR as a DMA target.
mmap() conversion to use DMABUF underneath has been done for vfio-pci,
but not sub-drivers:
nvgrace-gpu's mmap() override path is unchanged; I kept this out of
scope for now not least because I don't have a thorough test setup
for this system. I would prefer to help the nvgrace-gpu maintainers
enable BAR mmap() DMABUFs themselves.
Notes on patches
================
PCI/P2PDMA: Add CONFIG_PCI_P2PDMA_CORE
Later in the series, vfio-pci's mmap() is going to depend on
pcim_p2pdma_provider() which depended on CONFIG_PCI_P2PDMA, which
in turn depended on ZONE_DEVICE (which isn't available on 32-bit
and some archs, because they lack MEMORY_HOTPLUG and friends).
VFIO does _not_ require actual P2P to be present for basic mmap()
functionality, only for the optional CONFIG_VFIO_PCI_DMABUF export
feature.
This splits P2PDMA into a CONFIG_PCI_P2PDMA_CORE (which currently
contains pcim_p2pdma_provider()) and an optional CONFIG_PCI_P2PDMA
(which depends on ZONE_DEVICE etc., and provides P2P
functionality).
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
The first adds a helper for a DMABUF VMA fault handler to determine
PFNs for arbitrarily-sized pages from the ranges in a DMABUF. The
second refactors DMABUF export for use by the existing export
feature, and adds a new helper that creates a DMABUF corresponding
to a VFIO BAR mmap() request.
vfio/pci: Convert BAR mmap() to use a DMABUF
The vfio-pci core mmap() creates a DMABUF with the helper, and the
vm_ops fault handler uses the other helper to resolve the fault.
Because this depends on DMABUF structs/code, CONFIG_VFIO_PCI_CORE
needs to depend on CONFIG_DMA_SHARED_BUFFER. The
CONFIG_VFIO_PCI_DMABUF still conditionally enables the export
support code.
NOTE: The user mmap()s a device fd, but the resulting VMA's vm_file
becomes that of the DMABUF which takes ownership of the device and
puts it on release. This maintains the existing behaviour of a VMA
keeping the VFIO device open.
BAR zapping then happens via the existing vfio_pci_dma_buf_move()
path, which now needs to unmap PTEs in the DMABUF's address_space.
vfio/pci: Provide a user-facing name for BAR mappings
There was a request for decent debug naming in /proc/<pid>/maps
etc. comparable to the existing VFIO names: since the VMAs are
DMABUFs, they have a "dmabuf:" prefix and can't be 100% identical
to before. This is a user-visible change, but this patch at least
now gives us extra info on the BDF & BAR being mapped.
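To make the naming idea concrete, here is a small sketch of composing such a debug name from the cdev name, the BDF, and the resource index. The exact format string used by the patch is not shown in this thread, so the layout below, sketch_bar_name(), and the 32-byte limit (assumed to match DMA_BUF_NAME_LEN) are all illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match DMA_BUF_NAME_LEN; callers size their buffer with it. */
#define SKETCH_NAME_LEN 32

/*
 * Compose "<cdev>:<domain>:<bus>:<dev>.<fn>:BAR<n>" into buf.
 * Returns 0 on success, -1 if the name would be truncated.
 */
int sketch_bar_name(char *buf, size_t len, const char *cdev_name,
		    unsigned int domain, unsigned int bus,
		    unsigned int dev, unsigned int fn, unsigned int bar)
{
	int n;

	assert(buf != NULL && cdev_name != NULL);

	n = snprintf(buf, len, "%s:%04x:%02x:%02x.%u:BAR%u",
		     cdev_name, domain, bus, dev, fn, bar);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a device at 0000:3b:00.0 behind a cdev named "vfio0", BAR 2 would come out as "vfio0:0000:3b:00.0:BAR2" under this assumed layout.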
vfio/pci: Clean up BAR zap and revocation
In general (see NOTE!) the vfio_pci_zap_bars() is now obsolete,
since it unmaps PTEs in the VFIO device address_space which is now
unused. This consolidates all calls (e.g. around reset) with the
neighbouring vfio_pci_dma_buf_move()s into new functions, to
revoke-zap/unrevoke.
!!! NOTE: the nvgrace-gpu driver continues to use its own private
vm_ops, fault handler, etc. for its special memregions, and these
DO still add PTEs to the VFIO device address_space. So, a
temporary flag, vdev->bar_needs_zap, maintains the old behaviour
for this use. At least this patch's consolidation makes it easy to
remove the remaining zap when this need goes away; a FIXME reminds
that this can be removed when nvgrace-gpu is converted.
vfio/pci: Support mmap() of a VFIO DMABUF
Adds mmap() for a DMABUF fd exported from vfio-pci.
It was a goal to keep the VFIO device fd lifetime behaviour
unchanged with respect to the DMABUFs. An application can close
all device fds, and this will revoke/clean up all DMABUFs; no
mappings or other access can be performed now. When enabling
mmap() of the DMABUFs, this means access through the VMA is also
revoked. This complicates the fault handler because whilst the
DMABUF exists, it has no guarantee that the corresponding VFIO
device is still alive. Adds synchronisation ensuring the vdev is
available before vdev->memory_lock is touched; this holds the
device registration so that even if the buffer has been cleaned up,
vdev hasn't been freed and so the lock can be safely taken.
(I decided against the alternative of preventing cleanup by holding
the VFIO device open if any DMABUFs exist, because it's both a
change of behaviour and less clean overall.)
I've added a chonky comment in place, happy to clarify more if you
have ideas.
This commit makes VFIO_PCI_CORE depend on PCI_P2PDMA_CORE (commit
1) to bring in (only) the P2PDMA provider code.
vfio/pci: Permanently revoke a DMABUF on request
By weight, this is mostly a rename of 'revoked' to an enum, 'status'.
There are now three states for a buffer: usable, temporarily revoked,
and permanently revoked. A new VFIO device ioctl is added,
VFIO_DEVICE_PCI_DMABUF_REVOKE, which passes a DMABUF (exported from
that device) and permanently revokes it. Thus a userspace driver
can guarantee any downstream consumers of a shared fd are prevented
from accessing a BAR range, and that range can be reused.
The code doing revocation in vfio_pci_dma_buf_move() is moved,
unchanged, to a common function for use by _move() and the new
ioctl path.
Q: I can't think of a good reason to temporarily revoke/unrevoke
buffers from userspace, so didn't add a 'flags' field to the ioctl
struct. Easy to add if people think it's worthwhile for future
use.
vfio/pci: Add mmap() attributes to DMABUF feature
Adds a new VFIO feature, VFIO_DEVICE_FEATURE_DMA_BUF_MEMATTR.
After a DMABUF is exported, this feature ioctl() is used to set a
memory attribute that will be used by future mmap()s of the DMABUF
fd (i.e. it does nothing for any existing maps).
The default is UC, and via the feature one can specify CPU access
as WC. The attribute is an enum/scalar rather than
bitmap/cumulative. The attributes follow a "try-fail" model where
a client can request an attribute and either succeed or fail with
ENOTSUPP if it's unknown; if future attributes are
platform-specific then their support can be probed.
(Since it's just UC/WC for now, there is no reservation or numeric
structure to the namespace yet, but we could support
system/arch-specific values in future by carving out base +
arch-specific + IMPDEF ranges.)
Testing
=======
(The [RFC ONLY] userspace test program, for QEMU edu-plus, has been
dropped from the series, but can be found in the GitHub branch below.
It at least illustrates how the export, map, revoke, attribute, and
close semantics interoperate.)
This code has been tested in mapping DMABUFs of single/multiple
ranges, aliasing mmap()s, aliasing ranges across DMABUFs, vm_pgoff >
0, revocation, shutdown/cleanup scenarios, and hugepage mappings seem
to work correctly. I've lightly tested WC mappings also (by observing
resulting PTEs as having the correct attributes...). No regressions
observed on the VFIO selftests, or on our internal vfio-pci
applications.
End
===
This is based on VFIO next (e.g. at b9285405c5f6).
These commits are on GitHub for easier browsing, along with
"[RFC ONLY] selftests: vfio: Add standalone vfio_dmabuf_mmap_test":
https://github.com/metamev/linux/compare/b9285405c5f6...metamev:linux:dev/m…
Thanks for reading,
Matt
================================================================================
Change log:
v2:
- Rebase on VFIO next, picking up Alex's
vfio_pci_dma_buf_move()/vfio_pci_dma_buf_cleanup() fixes, and
dropping "vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put"
- Added "PCI/P2PDMA: Add CONFIG_PCI_P2PDMA_CORE" so that the
newly-added vfio-pci hard dependency on the P2PDMA provider instead
pulls in the _CORE variant and not the full-fat CONFIG_PCI_P2PDMA.
This means that the core of vfio-pci does not need ZONE_DEVICE, but
if it's available then enabling P2PDMA in turn enables DMABUF
export. Fixes basic VFIO operation on 32b or other platforms without
ZONE_DEVICE.
- Fixed comment inaccuracy in vfio_pci_dma_buf_revoke() and cleaned
up vdev validity test.
- vfio_pci_dma_buf_find_pfn(): use PAGE_ALIGN(), better span variable
naming, OVF check
- Made vm_pgoff usage consistent (keeping the resource index at the
top and masking where offset is used). For BAR mmap, use new
vma_pgoff_adjust to create the DMABUF with the exact mmap()ed span
instead of from the start of the BAR with an invisible portion
before the mapping.
- Added VFIO_DEVICE_FEATURE_DMA_BUF_MEMATTR to set memory attributes,
instead of using the export `flags` field.
- vfio_pci_ioctl_reset: Moved vfio_pci_zap_revoke_bars()
(effectively, vfio_pci_dma_buf_move()) back after D0 transition.
Note, if a BAR zap is needed, it's done in this function so now
happens after this D0 transition with the _move; it was done before
it at the time of the memory_lock taking.
- Minimised vfio_pci_dma_buf_mmap() (removed redundant span check),
added READ_ONCE for memattr
- Misc fixes: comment in DMABUF name generation, removed superfluous
READ_ONCE from faulthandler
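As an illustration of the vm_pgoff handling mentioned in the v2 notes, the sketch below shows the resource index living in the top bits of the mmap() offset and the adjusted, masked offset used for lookups. The 40-bit shift is assumed to mirror VFIO_PCI_OFFSET_SHIFT, the page size is assumed to be 4 KiB, and treating vma_pgoff_adjust as the vm_pgoff at which the BAR-mmap DMABUF was created is my interpretation of the description.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12
#define SKETCH_OFFSET_SHIFT 40	/* assumed to mirror VFIO_PCI_OFFSET_SHIFT */
#define SKETCH_OFFSET_MASK  ((1ULL << SKETCH_OFFSET_SHIFT) - 1)

/* Resource (BAR) index carried in the top bits of the mmap() offset. */
unsigned int sketch_pgoff_to_index(uint64_t vm_pgoff)
{
	return (unsigned int)(vm_pgoff >>
			      (SKETCH_OFFSET_SHIFT - SKETCH_PAGE_SHIFT));
}

/*
 * Byte offset of the VMA start within the DMABUF. For a BAR mmap the
 * buffer starts at the mapped span, so the pgoff it was created at is
 * subtracted; for an explicitly exported buffer the adjustment is 0.
 */
uint64_t sketch_vma_buf_offset(uint64_t vm_pgoff, uint64_t vma_pgoff_adjust)
{
	assert(vm_pgoff >= vma_pgoff_adjust);

	return ((vm_pgoff - vma_pgoff_adjust) << SKETCH_PAGE_SHIFT) &
	       SKETCH_OFFSET_MASK;
}
```

For an mmap() of BAR 2 at byte offset 0x3000, the VMA starts at buffer offset 0. If that VMA is later split one page in, the upper half's vm_pgoff is one higher and it resolves to buffer offset 0x1000.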
v1:
https://lore.kernel.org/kvm/20260416131815.2729131-1-mattev@meta.com/
- Cleanup of the common DMABUF-aware VMA vm_ops fault handler and
export code.
- Fixed a lot of races, particularly faults racing with DMABUF
cleanup (if the VFIO device fds close, for example).
- Added nicer human-readable names for VFIO mmap() VMAs
RFCv2: Respin based on the feedback/suggestions:
https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
- Transform the existing VFIO BAR mmap path to also use DMABUFs
behind the scenes, and then simply share that code for
explicitly-mapped DMABUFs. Jason wanted to go that direction to
enable iommufd VFIO type 1 emulation to pick up a DMABUF for an IO
mapping.
- Revoke buffers using a VFIO device fd ioctl
RFCv1:
https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
Matt Evans (9):
PCI/P2PDMA: Add CONFIG_PCI_P2PDMA_CORE
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
vfio/pci: Convert BAR mmap() to use a DMABUF
vfio/pci: Provide a user-facing name for BAR mappings
vfio/pci: Clean up BAR zap and revocation
vfio/pci: Support mmap() of a VFIO DMABUF
vfio/pci: Permanently revoke a DMABUF on request
vfio/pci: Add mmap() attributes to DMABUF feature
drivers/pci/Kconfig | 10 +-
drivers/pci/Makefile | 2 +-
drivers/pci/p2pdma.c | 16 +
drivers/vfio/pci/Kconfig | 4 +-
drivers/vfio/pci/Makefile | 3 +-
drivers/vfio/pci/nvgrace-gpu/main.c | 5 +
drivers/vfio/pci/vfio_pci_config.c | 30 +-
drivers/vfio/pci/vfio_pci_core.c | 225 +++++++++---
drivers/vfio/pci/vfio_pci_dmabuf.c | 548 ++++++++++++++++++++++++----
drivers/vfio/pci/vfio_pci_priv.h | 57 ++-
include/linux/pci-p2pdma.h | 24 +-
include/linux/pci.h | 2 +-
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 57 +++
14 files changed, 815 insertions(+), 169 deletions(-)
--
2.47.3
> From: Matt Evans <matt(a)ozlabs.org>
> Sent: Wednesday, June 10, 2026 11:43 PM
>
[...]
>
> vfio/pci: Support mmap() of a VFIO DMABUF
>
> Adds mmap() for a DMABUF fd exported from vfio-pci.
>
> It was a goal to keep the VFIO device fd lifetime behaviour
> unchanged with respect to the DMABUFs. An application can close
> all device fds, and this will revoke/clean up all DMABUFs; no
> mappings or other access can be performed now. When enabling
> mmap() of the DMABUFs, this means access through the VMA is also
> revoked. This complicates the fault handler because whilst the
> DMABUF exists, it has no guarantee that the corresponding VFIO
> device is still alive. Adds synchronisation ensuring the vdev is
> available before vdev->memory_lock is touched; this holds the
> device registration so that even if the buffer has been cleaned up,
> vdev hasn't been freed and so the lock can be safely taken.
>
> This commit makes VFIO_PCI_CORE depend on PCI_P2PDMA_CORE (commit
> 1) to bring in (only) the P2PDMA provider code.
the last sentence is stale as the dependency is now added in patch4.
>
> End
> ===
>
> This is based on VFIO next (e.g. at b9285405c5f6).
>
Sashiko failed to apply this series. Is there dependent work in vfio-next?
Otherwise, getting a Sashiko review would be helpful here.
This patch series introduces the Qualcomm DSP Accelerator (QDA) driver,
a DRM-based accelerator driver for Qualcomm DSPs. The driver provides a
standardized interface for offloading computational tasks to DSPs found
on Qualcomm SoCs, supporting all DSP domains.
The QDA driver implements the FastRPC protocol over the DRM accel
subsystem. It uses the same device-tree node structure as the existing
fastrpc driver in drivers/misc/. The approach for binding the QDA driver
to device-tree nodes while coexisting with the fastrpc driver is an open
item described below.
RFC thread: https://lore.kernel.org/dri-devel/20260224-qda-firstpost-v1-0-fe46a9c1a046@…
User-space staging branch
=========================
https://github.com/qualcomm/fastrpc/tree/accel/staging
Key Features
============
* Standard DRM accelerator interface via /dev/accel/accelN
* GEM-based buffer management with DMA-BUF import/export (PRIME)
* IOMMU-based memory isolation using per-process context banks
* FastRPC protocol implementation for DSP communication
* RPMsg transport layer for reliable message passing
* Support for all DSP domains (ADSP, CDSP, SDSP, GDSP)
* DRM IOCTL interface for DSP session management, buffer allocation,
and remote procedure invocation
Architecture
============
1. DRM Accelerator Framework Integration
The driver registers as a DRM accel device, exposing a standard
/dev/accel/accelN character device node. This provides established
DRM infrastructure for device management, file operations, and
IOCTL dispatch.
2. Memory Management
Buffers are managed as GEM objects with full PRIME support for
DMA-BUF import/export. This enables seamless buffer sharing with
other DRM drivers (GPU, camera, video) using standard kernel
mechanisms.
3. IOMMU Context Bank Management
IOMMU context banks (CBs) are represented as proper struct device
instances on a custom virtual bus (qda-compute-cb). Each CB device
is registered with the IOMMU subsystem and receives its own IOMMU
domain, enabling per-session address space isolation. The custom
bus was introduced because IOMMU context banks are synthetic
constructs — not real platform devices — and to ensure CB device
lifetime is strictly subordinate to the parent QDA device.
See also: https://lore.kernel.org/all/245d602f-3037-4ae3-9af9-d98f37258aae@oss.qualco…
4. Memory Manager Architecture
A pluggable memory manager coordinates IOMMU device assignment and
buffer allocation. The current implementation uses a DMA-coherent
backend with SID-prefixed DMA addresses for DSP firmware
compatibility.
5. Transport Layer
RPMsg communication is handled in a dedicated transport layer
(qda_rpmsg.c), separate from the core DRM driver logic.
6. Code Organization
The driver is organized across multiple files (~4600 lines total):
* qda_drv.c: Core driver and DRM integration
* qda_rpmsg.c: RPMsg transport layer
* qda_cb.c: Context bank device management
* qda_compute_bus.c: Custom virtual bus for CB devices
* qda_gem.c: GEM object management
* qda_prime.c: DMA-BUF import (PRIME)
* qda_memory_manager.c: IOMMU device registry and allocation
* qda_memory_dma.c: DMA-coherent allocation backend
* qda_fastrpc.c: FastRPC protocol implementation
* qda_ioctl.c: IOCTL dispatch
7. UAPI Design
The driver exposes DRM-style IOCTLs defined in
include/uapi/drm/qda_accel.h, following DRM UAPI conventions
(__u32/__u64 types, C++ guard, GPL-2.0-only WITH Linux-syscall-note).
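To illustrate the "SID-prefixed DMA addresses" mentioned under item 4, here is a sketch that places a stream ID above the low 32 bits of a DMA address. The 32-bit shift and the helper names are assumptions for illustration; the series text does not state the exact layout.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SID_SHIFT 32	/* assumed position of the SID field */

/* Combine a context bank's stream ID with a 32-bit DMA address. */
uint64_t sketch_sid_prefix(uint64_t dma_addr, uint32_t sid)
{
	/* The address must fit below the SID field. */
	assert((dma_addr >> SKETCH_SID_SHIFT) == 0);

	return ((uint64_t)sid << SKETCH_SID_SHIFT) | dma_addr;
}

/* Recover the stream ID from a prefixed address. */
uint32_t sketch_sid_of(uint64_t prefixed)
{
	return (uint32_t)(prefixed >> SKETCH_SID_SHIFT);
}

/* Recover the plain DMA address from a prefixed address. */
uint64_t sketch_addr_of(uint64_t prefixed)
{
	return prefixed & ((1ULL << SKETCH_SID_SHIFT) - 1);
}
```

Under this layout the DSP side can tell which context bank an address belongs to from the value alone, which is presumably why the firmware expects the prefix.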
Patch Series Organization
==========================
Patch 01: MAINTAINERS entry
Patch 02: Driver documentation (Documentation/accel/qda/)
Patches 03-04: Core driver skeleton and compute bus
Patch 05: iommu: Register qda-compute-cb bus with IOMMU subsystem
Patches 06-07: CB device enumeration and memory manager
Patch 08: QUERY IOCTL and UAPI header
Patches 09-11: GEM buffer management and PRIME import
Patches 12-15: FastRPC protocol (invoke, session create/release,
map/unmap)
Open Items
===========
1. Device-Tree Compatible String
The QDA driver uses the same device-tree node structure and
properties as the existing fastrpc driver in drivers/misc/. A
mechanism is needed to allow the QDA driver to bind to its device
node independently of the fastrpc driver.
The intended coexistence model is: platforms that require the
complete fastrpc feature set continue to use "qcom,fastrpc"; new
platforms where a feature available only in QDA takes priority, or
where QDA's current feature set is sufficient, use a QDA-specific
compatible string. New feature development is directed toward QDA
rather than the existing fastrpc driver. As QDA matures toward
feature parity with fastrpc, platforms can adopt the QDA-specific
compatible string exclusively.
The options under consideration are:
a) Add a new "qcom,qda" compatible string to the existing
qcom,fastrpc.yaml binding, since the DT node structure and
properties are identical. This avoids a separate binding file
but adds a QDA-specific string to a fastrpc binding.
b) Introduce a separate qcom,qda.yaml binding that references or
inherits the fastrpc binding properties.
Seeking guidance from DT binding maintainers on the preferred
approach.
2. Privilege Level Management
Currently, daemon processes and user processes have the same access
level as both use the same accel device node. This needs to be
addressed as daemons attach to privileged DSP protection domains
and require higher privilege levels for system-level operations.
Seeking guidance on the best approach: separate device nodes,
capability-based checks, or DRM master/authentication mechanisms.
3. UAPI Compatibility Layer
A compatibility layer is needed to facilitate migration of client
applications from the existing FastRPC UAPI to the new QDA UAPI,
ensuring a smooth transition for existing userspace code. Seeking
guidance on the preferred implementation approach: in-kernel
translation layer, userspace wrapper library, or hybrid solution.
An initial evaluation of an in-kernel translation shim was
performed, where legacy FastRPC device nodes (/dev/fastrpc-*) are
exposed and requests are internally routed to the QDA accel driver.
The goal was to keep the compatibility layer minimal, reuse existing
QDA helper paths (attach, buffer allocation, mapping, etc.), and
avoid duplication of GEM and buffer management logic.
However, the following challenges were identified:
a) Dependency on drm_file for QDA helpers
QDA relies on GEM-backed allocations and per-client handle
namespaces, which require a valid struct drm_file. Since GEM
handles are scoped per drm_file, the compatibility layer cannot
directly reuse QDA helper paths without establishing a proper
drm_file context for each client.
b) Lack of public API for drm_file creation
Creating a drm_file directly (similar to mock_drm_getfile()-style
approaches) is not feasible, as the required helpers
(drm_file_alloc(), drm_file_free(), etc.) are internal to the DRM
core and not exported. This prevents external drivers from safely
constructing and managing drm_file instances.
c) VFS-based open is not a viable solution
Opening the underlying accel device (/dev/accel/accelN) from the
compatibility driver via filp_open() does provide a valid
drm_file, but introduces reliance on userspace-visible device
paths, lack of stability in containerized or chroot environments,
and no clean mapping between legacy device nodes and accel
devices.
d) Userspace proxy limitations (CUSE)
A CUSE-based userspace proxy was evaluated. However, DMA-buf file
descriptors passed by legacy applications cannot be directly
reused in the CUSE daemon (file descriptors are process-specific),
which breaks buffer sharing semantics.
e) drm_client-based approaches do not match requirements
drm_client APIs (used for fbdev emulation) rely on a shared
drm_file and do not provide the per-client isolation required by
FastRPC semantics.
Due to the above constraints, it is currently unclear how to
implement an in-kernel compatibility layer that correctly handles
per-client drm_file contexts without relying on VFS paths or
non-exported DRM internals.
4. Documentation Improvements
Add detailed IOCTL usage examples, document DSP firmware interface
requirements, and create a migration guide from the existing FastRPC
driver.
5. Per-Session Memory Allocation
Develop a userspace API to support memory allocation on a per-session
basis, enabling session-specific memory management.
6. Audio and Sensors PD Support
The current series does not handle Audio PD and Sensors PD
functionalities. These specialized protection domains require
additional support for real-time constraints and power management.
Interface Compatibility
========================
The QDA driver uses the same device-tree node structure and child node
layout (including "qcom,fastrpc-compute-cb" child nodes) as the
existing fastrpc driver. The underlying FastRPC protocol and DSP
firmware interface are compatible with the existing fastrpc driver,
ensuring that DSP firmware and libraries continue to work without
modification.
References
==========
Previous discussions on this migration:
- https://lkml.org/lkml/2024/6/24/479
- https://lkml.org/lkml/2024/6/21/1252
Testing
=======
The driver has been tested on Qualcomm platforms with:
- Basic FastRPC attach/release operations
- DSP process creation and initialization
- Memory mapping/unmapping operations
- Dynamic invocation with various buffer types
- GEM buffer allocation and mmap
- PRIME buffer import from other subsystems
Signed-off-by: Ekansh Gupta <ekansh.gupta(a)oss.qualcomm.com>
---
Ekansh Gupta (15):
MAINTAINERS: Add entry for Qualcomm DSP Accelerator (QDA) driver
accel/qda: Add QDA driver documentation
accel/qda: Add initial QDA DRM accelerator driver
accel/qda: Add compute bus for QDA context banks
iommu: Add QDA compute context bank bus to iommu_buses
accel/qda: Create compute context bank devices on QDA compute bus
accel/qda: Add memory manager for CB devices
accel/qda: Add QUERY IOCTL and QDA UAPI header
accel/qda: Add DMA-backed GEM objects and memory manager integration
accel/qda: Add GEM_CREATE and GEM_MMAP_OFFSET IOCTLs
accel/qda: Add PRIME DMA-BUF import support
accel/qda: Add FastRPC invocation support
accel/qda: Add DSP process creation and release
accel/qda: Add remote memory mapping to DSP address space
accel/qda: Add remote memory unmap from DSP address space
Documentation/accel/index.rst | 1 +
Documentation/accel/qda/index.rst | 13 +
Documentation/accel/qda/qda.rst | 146 +++++
MAINTAINERS | 9 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 2 +
drivers/accel/qda/Kconfig | 34 +
drivers/accel/qda/Makefile | 19 +
drivers/accel/qda/qda_cb.c | 146 +++++
drivers/accel/qda/qda_cb.h | 32 +
drivers/accel/qda/qda_compute_bus.c | 68 ++
drivers/accel/qda/qda_drv.c | 192 ++++++
drivers/accel/qda/qda_drv.h | 91 +++
drivers/accel/qda/qda_fastrpc.c | 1058 ++++++++++++++++++++++++++++++++
drivers/accel/qda/qda_fastrpc.h | 390 ++++++++++++
drivers/accel/qda/qda_gem.c | 177 ++++++
drivers/accel/qda/qda_gem.h | 62 ++
drivers/accel/qda/qda_ioctl.c | 296 +++++++++
drivers/accel/qda/qda_ioctl.h | 19 +
drivers/accel/qda/qda_memory_dma.c | 110 ++++
drivers/accel/qda/qda_memory_dma.h | 17 +
drivers/accel/qda/qda_memory_manager.c | 380 ++++++++++++
drivers/accel/qda/qda_memory_manager.h | 75 +++
drivers/accel/qda/qda_prime.c | 184 ++++++
drivers/accel/qda/qda_prime.h | 18 +
drivers/accel/qda/qda_rpmsg.c | 248 ++++++++
drivers/accel/qda/qda_rpmsg.h | 30 +
drivers/iommu/iommu.c | 4 +
include/linux/qda_compute_bus.h | 32 +
include/uapi/drm/qda_accel.h | 229 +++++++
30 files changed, 4083 insertions(+)
---
base-commit: 80dd246accce631c328ea43294e53b2b2dd2aa32
change-id: 20260519-qda-series-78c2bf0ed78b
Best regards,
--
Ekansh Gupta <ekansh.gupta(a)oss.qualcomm.com>
On Wed, Jun 10, 2026 at 04:43:15PM +0100, Matt Evans wrote:
> The P2PDMA code currently provides two features under the same
> CONFIG_PCI_P2PDMA option:
>
> 1. Locate providers via pcim_p2pdma_provider()
> 2. Manage actual P2P DMA
>
> Some drivers (such as vfio-pci) depend on 1, without having a hard
> dependency on 2.
>
> A future commit expands the use of DMABUF in vfio-pci for non-P2P
> scenarios, relying on pcim_p2pdma_provider() always being present. If
> that depended on CONFIG_PCI_P2PDMA, it would make vfio-pci only
> available if CONFIG_ZONE_DEVICE is present (e.g. 64-bit systems), even
> when P2P is not needed.
>
> To resolve this, introduce CONFIG_PCI_P2PDMA_CORE and refactor the
> basic provider functionality into a new p2pdma_core.c file. This is
> available even if the CONFIG_PCI_P2PDMA feature is disabled (or
> unavailable due to !CONFIG_ZONE_DEVICE). Then, drivers can enable any
> additional P2P features with the original CONFIG_PCI_P2PDMA (available
> when CONFIG_ZONE_DEVICE is set).
>
> Signed-off-by: Matt Evans <matt(a)ozlabs.org>
> ---
> MAINTAINERS | 2 +-
> drivers/pci/Kconfig | 10 ++--
> drivers/pci/Makefile | 1 +
> drivers/pci/p2pdma.c | 109 ++--------------------------------
> drivers/pci/p2pdma.h | 29 +++++++++
> drivers/pci/p2pdma_core.c | 118 +++++++++++++++++++++++++++++++++++++
> include/linux/pci-p2pdma.h | 24 ++++----
> include/linux/pci.h | 2 +-
> 8 files changed, 174 insertions(+), 121 deletions(-)
> create mode 100644 drivers/pci/p2pdma.h
> create mode 100644 drivers/pci/p2pdma_core.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c2c6d79275c6..b21523b3bd8b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -20617,7 +20617,7 @@ B: https://bugzilla.kernel.org
> C: irc://irc.oftc.net/linux-pci
> T: git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
> F: Documentation/driver-api/pci/p2pdma.rst
> -F: drivers/pci/p2pdma.c
> +F: drivers/pci/p2pdma*
> F: include/linux/pci-p2pdma.h
>
> PCI POWER CONTROL
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 33c88432b728..59d70bc84cc9 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -206,11 +206,7 @@ config PCIE_TPH
> config PCI_P2PDMA
> bool "PCI peer-to-peer transfer support"
> depends on ZONE_DEVICE
> - #
> - # The need for the scatterlist DMA bus address flag means PCI P2PDMA
> - # requires 64bit
> - #
> - depends on 64BIT
> + select PCI_P2PDMA_CORE
> select GENERIC_ALLOCATOR
> select NEED_SG_DMA_FLAGS
> help
Nit: did we drop "depends on 64BIT" intentionally here? I guess the full
PCI_P2PDMA stack still selects NEED_SG_DMA_FLAGS, but IIRC
NEED_SG_DMA_FLAGS doesn't select 64BIT?
With the nit (and Bjorn's comments) addressed:
Reviewed-by: Pranjal Shrivastava <praan(a)google.com>
Thanks,
Praan