In a typical dma-buf use case, a dmabuf exporter makes its buffer
available to an importer by mapping it using DMA APIs such as
dma_map_sgtable() or dma_map_resource(). However, this is not
desirable in cases where the exporter and importer are directly
connected via a physical or virtual link (or interconnect) and the
importer can access the buffer without having it DMA mapped.
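For reference, here is a minimal sketch of that conventional path,
assuming a hypothetical exporter "foo" that already holds an sg_table
for its buffer; its map_dma_buf callback simply DMA-maps the table for
the importer's device with dma_map_sgtable():

  #include <linux/dma-buf.h>
  #include <linux/dma-mapping.h>
  #include <linux/err.h>

  struct foo_buffer {                     /* hypothetical exporter state */
          struct sg_table *sgt;
  };

  static struct sg_table *foo_map_dma_buf(struct dma_buf_attachment *attach,
                                          enum dma_data_direction dir)
  {
          struct foo_buffer *buf = attach->dmabuf->priv;
          int ret;

          /* DMA-map the exporter's pages for the importer's device */
          ret = dma_map_sgtable(attach->dev, buf->sgt, dir, 0);
          if (ret)
                  return ERR_PTR(ret);

          return buf->sgt;
  }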
To address the direct-connection case, this patch series adds APIs to
map/unmap dmabufs via interconnects and also provides a helper to
identify the first common interconnect between the exporter and
importer. Furthermore, this patch series adds support for the IOV
interconnect in the vfio-pci driver and the Intel Xe driver. The IOV
interconnect is a virtual interconnect between an SRIOV physical
function (PF) and its virtual functions (VFs). For the IOV
interconnect, the addresses associated with a buffer are shared using
an xarray (instead of an sg_table) that is populated with entries of
type struct range, as sketched below.
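As a rough illustration of that layout (a hypothetical helper, not
code from this series), an exporter could describe its buffer by
storing one struct range per contiguous region into an already
initialized xarray:

  #include <linux/xarray.h>
  #include <linux/range.h>
  #include <linux/slab.h>

  static int iov_fill_ranges(struct xarray *xa, const u64 *addrs,
                             const u64 *lens, unsigned int nr)
  {
          unsigned int i;

          for (i = 0; i < nr; i++) {
                  /* struct range is inclusive: [start, end] */
                  struct range *r = kmalloc(sizeof(*r), GFP_KERNEL);

                  if (!r)
                          return -ENOMEM;
                  r->start = addrs[i];
                  r->end = addrs[i] + lens[i] - 1;
                  /* index entries sequentially; the importer walks the xarray */
                  if (xa_err(xa_store(xa, i, r, GFP_KERNEL))) {
                          kfree(r);
                          return -ENOMEM;
                  }
          }
          return 0;
  }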
The dma-buf patches in this series are based on ideas/suggestions
provided by Jason Gunthorpe, Christian Koenig and Thomas Hellström.
Changelog:
RFC -> RFCv2:
- Add documentation for the new dma-buf APIs and types (Thomas)
- Change the interconnect type from enum to unique pointer (Thomas)
- Moved the new dma-buf APIs to a separate file
- Store a copy of the interconnect matching data in the attachment
- Simplified the macros to create and match interconnects
- Use struct device instead of struct pci_dev in match data
- Replace DRM_INTERCONNECT_DRIVER with XE_INTERCONNECT_VRAM during
address encoding (Matt, Thomas)
- Drop is_devmem_external and instead rely on bo->dma_data.dma_addr
to check for imported VRAM BOs (Matt)
- Pass XE_PAGE_SIZE as the last parameter to xe_bo_addr (Matt)
- Add a check to prevent a malicious VF from accessing other VFs'
  addresses (Thomas)
- Fall back to the legacy (map_dma_buf) mapping method if mapping via
  interconnect fails
Patchset overview:
Patches 1-3: Add dma-buf APIs to map/unmap and match interconnects
Patch 4: Add support for IOV interconnect in vfio-pci driver
Patch 5: Add support for IOV interconnect in Xe driver
Patches 6-8: Create and use a new dma_addr array for LMEM-based
dmabuf BOs to store translated addresses (DPAs)
This series is rebased on top of the following repo:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Associated Qemu patch series:
https://lore.kernel.org/qemu-devel/20251003234138.85820-1-vivek.kasireddy@i…
Associated vfio-pci patch series:
https://lore.kernel.org/dri-devel/cover.1760368250.git.leon@kernel.org/
This series is tested using the following method:
- Run Qemu with the following relevant options:
qemu-system-x86_64 -m 4096m ....
-device ioh3420,id=root_port1,bus=pcie.0
-device x3130-upstream,id=upstream1,bus=root_port1
-device xio3130-downstream,id=downstream1,bus=upstream1,chassis=9
-device xio3130-downstream,id=downstream2,bus=upstream1,chassis=10
-device vfio-pci,host=0000:03:00.1,bus=downstream1
-device virtio-gpu,max_outputs=1,blob=true,xres=1920,yres=1080,bus=downstream2
-display gtk,gl=on
-object memory-backend-memfd,id=mem1,size=4096M
-machine q35,accel=kvm,memory-backend=mem1 ...
- Run Gnome Wayland with the following options in the Guest VM:
# cat /usr/lib/udev/rules.d/61-mutter-primary-gpu.rules
ENV{DEVNAME}=="/dev/dri/card1", TAG+="mutter-device-preferred-primary", TAG+="mutter-device-disable-kms-modifiers"
# XDG_SESSION_TYPE=wayland dbus-run-session -- /usr/bin/gnome-shell --wayland --no-x11 &
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Leon Romanovsky <leonro(a)nvidia.com>
Cc: Christian Koenig <christian.koenig(a)amd.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Dongwon Kim <dongwon.kim(a)intel.com>
Vivek Kasireddy (8):
dma-buf: Add support for map/unmap APIs for interconnects
dma-buf: Add a helper to match interconnects between exporter/importer
dma-buf: Create and expose IOV interconnect to all exporters/importers
vfio/pci/dmabuf: Add support for IOV interconnect
drm/xe/dma_buf: Add support for IOV interconnect
drm/xe/pf: Add a helper function to get a VF's backing object in LMEM
drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with
VFs
drm/xe/pt: Add an additional check for dmabuf BOs while doing bind
drivers/dma-buf/Makefile | 2 +-
drivers/dma-buf/dma-buf-interconnect.c | 164 +++++++++++++++++++++
drivers/dma-buf/dma-buf.c | 12 +-
drivers/gpu/drm/xe/xe_bo.c | 162 ++++++++++++++++++--
drivers/gpu/drm/xe/xe_bo_types.h | 6 +
drivers/gpu/drm/xe/xe_dma_buf.c | 17 ++-
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 24 +++
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h | 1 +
drivers/gpu/drm/xe/xe_pt.c | 8 +
drivers/gpu/drm/xe/xe_sriov_pf_types.h | 19 +++
drivers/vfio/pci/vfio_pci_dmabuf.c | 135 ++++++++++++++++-
include/linux/dma-buf-interconnect.h | 122 +++++++++++++++
include/linux/dma-buf.h | 41 ++++++
13 files changed, 691 insertions(+), 22 deletions(-)
create mode 100644 drivers/dma-buf/dma-buf-interconnect.c
create mode 100644 include/linux/dma-buf-interconnect.h
--
2.50.1
This makes a few changes to the way immediate mode works, and then it
implements a Rust immediate mode GPUVM abstraction on top of that.
Please see the following branch for example usage in Tyr:
https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/53
For context, please see this previous patch:
https://lore.kernel.org/rust-for-linux/20250621-gpuvm-v3-1-10203da06867@col…
and the commit message of the last patch.
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
---
Alice Ryhl (4):
drm/gpuvm: take GEM lock inside drm_gpuvm_bo_obtain_prealloc()
drm/gpuvm: drm_gpuvm_bo_obtain() requires lock and staged mode
drm/gpuvm: use const for drm_gpuva_op_* ptrs
rust: drm: add GPUVM immediate mode abstraction
MAINTAINERS | 1 +
drivers/gpu/drm/drm_gpuvm.c | 80 ++++--
drivers/gpu/drm/imagination/pvr_vm.c | 2 +-
drivers/gpu/drm/msm/msm_gem.h | 2 +-
drivers/gpu/drm/msm/msm_gem_vma.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_uvmm.c | 2 +-
drivers/gpu/drm/panthor/panthor_mmu.c | 10 -
drivers/gpu/drm/xe/xe_vm.c | 4 +-
include/drm/drm_gpuvm.h | 12 +-
rust/bindings/bindings_helper.h | 2 +
rust/helpers/drm_gpuvm.c | 43 +++
rust/helpers/helpers.c | 1 +
rust/kernel/drm/gpuvm/mod.rs | 394 +++++++++++++++++++++++++++
rust/kernel/drm/gpuvm/sm_ops.rs | 469 +++++++++++++++++++++++++++++++++
rust/kernel/drm/gpuvm/va.rs | 148 +++++++++++
rust/kernel/drm/gpuvm/vm_bo.rs | 213 +++++++++++++++
rust/kernel/drm/mod.rs | 1 +
17 files changed, 1337 insertions(+), 49 deletions(-)
---
base-commit: 77b686f688126a5f758b51441a03186e9eb1b0f1
change-id: 20251128-gpuvm-rust-b719cac27ad6
Best regards,
--
Alice Ryhl <aliceryhl(a)google.com>
Changelog:
v6:
* Fixed an incorrect error check for pcim_p2pdma_init().
* Documented pcim_p2pdma_provider() function.
* Improved commit messages.
* Added VFIO DMA-BUF selftest.
* Added __counted_by(nr_ranges) annotation to struct vfio_device_feature_dma_buf.
* Fixed error unwind when dma_buf_fd() fails.
* Document latest changes to p2pmem.
* Removed EXPORT_SYMBOL_GPL from pci_p2pdma_map_type.
* Moved DMA mapping logic to DMA-BUF.
* Removed types patch to avoid dependencies between subsystems.
* Moved vfio_pci_dma_buf_move() in err_undo block.
* Added nvgrace patch.
v5: https://lore.kernel.org/all/cover.1760368250.git.leon@kernel.org
* Rebased on top of v6.18-rc1.
* Added more validation logic to make sure that DMA-BUF length doesn't
overflow in various scenarios.
* Hide kernel config from the users.
* Fixed type conversion issue. DMA ranges are exposed with u64 length,
but DMA-BUF uses "unsigned int" as a length for SG entries.
* Added a check so that VFIO drivers which report a BAR size
  different from the PCI-reported size do not use the DMA-BUF
  functionality.
v4: https://lore.kernel.org/all/cover.1759070796.git.leon@kernel.org
* Split pcim_p2pdma_provider() into two functions, one that
  initializes the array of providers and another that returns the
  right provider pointer.
v3: https://lore.kernel.org/all/cover.1758804980.git.leon@kernel.org
* Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
* Cache provider in vfio_pci_dma_buf struct instead of BAR index.
* Removed misleading comment from pcim_p2pdma_provider().
* Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
* Added an extra patch which adds a new CONFIG, so the following
  patches can reuse it.
* Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
into the other patch.
* Fixed revoke calls to be aligned with true->false semantics.
* Extended p2pdma_providers to be per-BAR and not global to the
  whole device.
* Fixed possible race between dmabuf states and revoke.
* Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
* Changed commit messages.
* Reused DMA_ATTR_MMIO attribute.
* Restored support for multiple DMA ranges per DMA-BUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.
The series supports a use case for SPDK where an NVMe device is
owned by SPDK through VFIO but interacts with an RDMA device. The
RDMA device may directly access the NVMe CMB or directly manipulate
the NVMe device's doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios
with VFIO. This dmabuf approach can also be used by iommufd for
generic and safe P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
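On the importer side, this builds on the existing dynamic dma-buf
interfaces: a minimal sketch, assuming a hypothetical importer "foo",
registers a move_notify callback via dma_buf_dynamic_attach() and
drops its mappings when the exporter revokes access:

  #include <linux/dma-buf.h>
  #include <linux/err.h>

  struct foo_importer {                   /* hypothetical importer state */
          struct dma_buf_attachment *attach;
          bool mapping_valid;
  };

  /* Called by the exporter (e.g. on VFIO revoke) with dma_resv held */
  static void foo_move_notify(struct dma_buf_attachment *attach)
  {
          struct foo_importer *foo = attach->importer_priv;

          foo->mapping_valid = false;     /* force a re-map on next use */
  }

  static const struct dma_buf_attach_ops foo_attach_ops = {
          .allow_peer2peer = true,
          .move_notify = foo_move_notify,
  };

  static int foo_import(struct foo_importer *foo, struct dma_buf *dmabuf,
                        struct device *dev)
  {
          foo->attach = dma_buf_dynamic_attach(dmabuf, dev,
                                               &foo_attach_ops, foo);
          return PTR_ERR_OR_ZERO(foo->attach);
  }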
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
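For context, the struct-page-backed allocator side that is being
separated out is the existing upstream pci_p2pdma API, roughly as
sketched below (the BAR index and sizes are arbitrary examples);
VFIO's dma-buf export only needs the core provider/mapping
functionality, not this allocator:

  #include <linux/pci.h>
  #include <linux/pci-p2pdma.h>

  static void *example_p2pmem_alloc(struct pci_dev *pdev, size_t size)
  {
          /* publish all of BAR 4 as p2p memory backed by struct pages */
          if (pci_p2pdma_add_resource(pdev, 4, pci_resource_len(pdev, 4), 0))
                  return NULL;

          /* consumers then allocate chunks of that memory */
          return pci_alloc_p2pmem(pdev, size);
  }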
-----------------------------------------------------------------------
The series was originally based on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but has been heavily rewritten to build on the physical address-based
DMA API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Thanks
---
Jason Gunthorpe (2):
PCI/P2PDMA: Document DMABUF model
vfio/nvgrace: Support get_dmabuf_phys
Leon Romanovsky (7):
PCI/P2PDMA: Separate the mmap() support from the core logic
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
dma-buf: provide phys_vec to scatter-gather mapping routine
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature functions
Documentation/driver-api/pci/p2pdma.rst | 95 +++++++---
block/blk-mq-dma.c | 2 +-
drivers/dma-buf/dma-buf.c | 235 ++++++++++++++++++++++++
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 182 +++++++++++++-----
drivers/vfio/pci/Kconfig | 3 +
drivers/vfio/pci/Makefile | 1 +
drivers/vfio/pci/nvgrace-gpu/main.c | 56 ++++++
drivers/vfio/pci/vfio_pci.c | 5 +
drivers/vfio/pci/vfio_pci_config.c | 22 ++-
drivers/vfio/pci/vfio_pci_core.c | 56 ++++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 315 ++++++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 +++
drivers/vfio/vfio_main.c | 2 +
include/linux/dma-buf.h | 18 ++
include/linux/pci-p2pdma.h | 120 +++++++-----
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 42 +++++
include/uapi/linux/vfio.h | 27 +++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
21 files changed, 1077 insertions(+), 139 deletions(-)
---
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
change-id: 20251016-dmabuf-vfio-6cef732adf5a
Best regards,
--
Leon Romanovsky <leonro(a)nvidia.com>
This series adds AF_XDP zero copy support to the am65-cpsw driver.
Tests were performed on AM62x-sk with the xdpsock application [1].
A clear improvement is seen with 64 byte packets on transmit (txonly)
and receive (rxdrop).
The 1500 byte test seems to be limited by line rate (1G link), so no
improvement is seen there in packet rate. A test on a higher speed
link (or a PHY-less setup) might be worthwhile.
There is an issue during l2fwd with 64 byte packets where the benchmark
results show 0. This needs to be debugged further.
A 512 byte l2fwd test result has been added for comparison instead.
AF_XDP performance using 64 byte packets in Kpps.
Benchmark: XDP-SKB XDP-Native XDP-Native(ZeroCopy)
rxdrop 322 491 845
txonly 390 394 723
l2fwd 205 257 0
AF_XDP performance using 512 byte packets in Kpps.
l2fwd 140 167 231
AF_XDP performance using 1500 byte packets in Kpps.
Benchmark: XDP-SKB XDP-Native XDP-Native(ZeroCopy)
rxdrop 82 82 82
txonly 82 82 82
l2fwd 82 82 82
[1]: https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example
Signed-off-by: Roger Quadros <rogerq(a)kernel.org>
---
Changes in v2:
- Prevent crash on systems with 1 of 2 ports disabled in the device tree.
  Check for a valid ndev before registering/unregistering the XDP RXQ.
Reported-by: Meghana Malladi <m-malladi(a)ti.com>
- Retain the page pool on XDP program exchange so we don't have to
  re-allocate memory.
- Fix clearing of irq_disabled flag in am65_cpsw_nuss_rx_poll().
- Link to v1: https://lore.kernel.org/r/20250520-am65-cpsw-xdp-zc-v1-0-45558024f566@kerne…
---
Roger Quadros (7):
net: ethernet: ti: am65-cpsw: fix BPF Program change on multi-port CPSW
net: ethernet: ti: am65-cpsw: Retain page_pool on XDP program exchange
net: ethernet: ti: am65-cpsw: add XSK pool helpers
net: ethernet: ti: am65-cpsw: Add AF_XDP zero copy for RX
net: ethernet: ti: am65-cpsw: Add AF_XDP zero copy for TX
net: ethernet: ti: am65-cpsw: enable zero copy in XDP features
net: ethernet: ti: am65-cpsw: Fix clearing of irq_disabled flag in rx_poll
drivers/net/ethernet/ti/Makefile | 2 +-
drivers/net/ethernet/ti/am65-cpsw-nuss.c | 583 ++++++++++++++++++++++++++-----
drivers/net/ethernet/ti/am65-cpsw-nuss.h | 37 +-
drivers/net/ethernet/ti/am65-cpsw-xdp.c | 155 ++++++++
4 files changed, 692 insertions(+), 85 deletions(-)
---
base-commit: a0c3aefb08cd81864b17c23c25b388dba90b9dad
change-id: 20250225-am65-cpsw-xdp-zc-2af9e4be1356
Best regards,
--
Roger Quadros <rogerq(a)kernel.org>