In a typical dma-buf use case, a dmabuf exporter makes its buffer
available to an importer by mapping it using DMA APIs
such as dma_map_sgtable() or dma_map_resource(). However, this
is not desirable in some cases where the exporter and importer
are directly connected via a physical or virtual link (or
interconnect) and the importer can access the buffer without
having it DMA mapped.
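For reference, here is a minimal sketch of the conventional importer-side
flow that depends on the exporter DMA-mapping the buffer; this is the
path the series provides an alternative to (the helper name is
illustrative, error handling trimmed):

#include <linux/dma-buf.h>
#include <linux/err.h>

/* Conventional import: attach a device, get back a DMA-mapped sg_table. */
static struct sg_table *legacy_import(struct dma_buf *dmabuf,
                                      struct device *dev)
{
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        attach = dma_buf_attach(dmabuf, dev);
        if (IS_ERR(attach))
                return ERR_CAST(attach);

        /* The exporter DMA-maps the buffer for @dev here. */
        sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt))
                dma_buf_detach(dmabuf, attach);
        return sgt;
}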
So, to address this scenario, this patch series adds APIs to map/
unmap dmabufs via interconnects and also provides a helper to
identify the first common interconnect between the exporter and
importer. Furthermore, this patch series adds support for the
IOV interconnect in the vfio-pci and Intel Xe drivers.
The IOV interconnect is a virtual interconnect between an SRIOV
physical function (PF) and its virtual functions (VFs). For
the IOV interconnect, the addresses associated with a buffer are
shared using an xarray (instead of an sg_table) that is populated
with entries of type struct range.
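To illustrate that layout, here is a minimal, hypothetical sketch (the
helper name and locking are ours, not taken from the series) of how an
exporter could publish one address range into such an xarray:

#include <linux/range.h>
#include <linux/slab.h>
#include <linux/xarray.h>

/* Publish one per-buffer address range as a struct range entry. */
static int iov_publish_range(struct xarray *xa, unsigned long index,
                             u64 start, u64 end)
{
        struct range *r = kmalloc(sizeof(*r), GFP_KERNEL);

        if (!r)
                return -ENOMEM;

        r->start = start;
        r->end = end;

        /* xa_insert() returns -EBUSY if the index is already in use. */
        return xa_insert(xa, index, r, GFP_KERNEL);
}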
The dma-buf patches in this series are based on ideas/suggestions
provided by Jason Gunthorpe, Christian Koenig and Thomas Hellström.
Changelog:
RFC -> RFCv2:
- Add documentation for the new dma-buf APIs and types (Thomas)
- Change the interconnect type from enum to unique pointer (Thomas)
- Moved the new dma-buf APIs to a separate file
- Store a copy of the interconnect matching data in the attachment
- Simplified the macros to create and match interconnects
- Use struct device instead of struct pci_dev in match data
- Replace DRM_INTERCONNECT_DRIVER with XE_INTERCONNECT_VRAM during
address encoding (Matt, Thomas)
- Drop is_devmem_external and instead rely on bo->dma_data.dma_addr
to check for imported VRAM BOs (Matt)
- Pass XE_PAGE_SIZE as the last parameter to xe_bo_addr (Matt)
- Add a check to prevent malicious VF from accessing other VF's
addresses (Thomas)
- Fall back to the legacy (map_dma_buf) mapping method if mapping via
  interconnect fails
Patchset overview:
Patches 1-3: Add dma-buf APIs to map/unmap and match interconnects
Patch 4: Add support for IOV interconnect in vfio-pci driver
Patch 5: Add support for IOV interconnect in Xe driver
Patches 6-8: Create and use a new dma_addr array for LMEM based
dmabuf BOs to store translated addresses (DPAs)
This series is rebased on top of the following repo:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Associated Qemu patch series:
https://lore.kernel.org/qemu-devel/20251003234138.85820-1-vivek.kasireddy@i…
Associated vfio-pci patch series:
https://lore.kernel.org/dri-devel/cover.1760368250.git.leon@kernel.org/
This series is tested using the following method:
- Run Qemu with the following relevant options:
qemu-system-x86_64 -m 4096m ....
-device ioh3420,id=root_port1,bus=pcie.0
-device x3130-upstream,id=upstream1,bus=root_port1
-device xio3130-downstream,id=downstream1,bus=upstream1,chassis=9
-device xio3130-downstream,id=downstream2,bus=upstream1,chassis=10
-device vfio-pci,host=0000:03:00.1,bus=downstream1
-device virtio-gpu,max_outputs=1,blob=true,xres=1920,yres=1080,bus=downstream2
-display gtk,gl=on
-object memory-backend-memfd,id=mem1,size=4096M
-machine q35,accel=kvm,memory-backend=mem1 ...
- Run Gnome Wayland with the following options in the Guest VM:
# cat /usr/lib/udev/rules.d/61-mutter-primary-gpu.rules
ENV{DEVNAME}=="/dev/dri/card1", TAG+="mutter-device-preferred-primary", TAG+="mutter-device-disable-kms-modifiers"
# XDG_SESSION_TYPE=wayland dbus-run-session -- /usr/bin/gnome-shell --wayland --no-x11 &
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Leon Romanovsky <leonro(a)nvidia.com>
Cc: Christian Koenig <christian.koenig(a)amd.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Dongwon Kim <dongwon.kim(a)intel.com>
Vivek Kasireddy (8):
dma-buf: Add support for map/unmap APIs for interconnects
dma-buf: Add a helper to match interconnects between exporter/importer
dma-buf: Create and expose IOV interconnect to all exporters/importers
vfio/pci/dmabuf: Add support for IOV interconnect
drm/xe/dma_buf: Add support for IOV interconnect
drm/xe/pf: Add a helper function to get a VF's backing object in LMEM
drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with
VFs
drm/xe/pt: Add an additional check for dmabuf BOs while doing bind
drivers/dma-buf/Makefile | 2 +-
drivers/dma-buf/dma-buf-interconnect.c | 164 +++++++++++++++++++++
drivers/dma-buf/dma-buf.c | 12 +-
drivers/gpu/drm/xe/xe_bo.c | 162 ++++++++++++++++++--
drivers/gpu/drm/xe/xe_bo_types.h | 6 +
drivers/gpu/drm/xe/xe_dma_buf.c | 17 ++-
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 24 +++
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h | 1 +
drivers/gpu/drm/xe/xe_pt.c | 8 +
drivers/gpu/drm/xe/xe_sriov_pf_types.h | 19 +++
drivers/vfio/pci/vfio_pci_dmabuf.c | 135 ++++++++++++++++-
include/linux/dma-buf-interconnect.h | 122 +++++++++++++++
include/linux/dma-buf.h | 41 ++++++
13 files changed, 691 insertions(+), 22 deletions(-)
create mode 100644 drivers/dma-buf/dma-buf-interconnect.c
create mode 100644 include/linux/dma-buf-interconnect.h
--
2.50.1
This series is the start of adding full DMABUF support to
iommufd. Currently it is limited to only work with VFIO's DMABUF exporter.
It sits on top of Leon's series to add a DMABUF exporter to VFIO:
https://lore.kernel.org/all/cover.1760368250.git.leon@kernel.org/
The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fds, but
otherwise works the same as it does today for a memfd. The user can select
a slice of the FD to map into the ioas and, if the underlying alignment
requirements are met, it will be placed in the iommu_domain.
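As an illustration, a hedged userspace sketch of that flow, using the
IOMMU_IOAS_MAP_FILE layout as it exists for memfd today (the helper name
and field values are illustrative):

#include <linux/iommufd.h>
#include <sys/ioctl.h>

static int map_dmabuf_slice(int iommufd, __u32 ioas_id, int dmabuf_fd,
                            __u64 offset, __u64 length, __u64 iova)
{
        struct iommu_ioas_map_file cmd = {
                .size = sizeof(cmd),
                .flags = IOMMU_IOAS_MAP_FIXED_IOVA |
                         IOMMU_IOAS_MAP_READABLE |
                         IOMMU_IOAS_MAP_WRITEABLE,
                .ioas_id = ioas_id,
                .fd = dmabuf_fd,
                .start = offset,        /* slice offset within the dmabuf */
                .length = length,
                .iova = iova,           /* fixed IOVA within the ioas */
        };

        return ioctl(iommufd, IOMMU_IOAS_MAP_FILE, &cmd);
}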
Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR
memory from VFIO to an iommu_domain controlled by iommufd. This is used
for PCI Peer to Peer support in VMs, and is the last feature that the VFIO
type 1 container has that iommufd couldn't do.
The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime
control and is a use-after-free security problem.
Instead iommufd relies on revocable DMABUFs. Whenever VFIO thinks there
should be no access to the MMIO it can shoot down the mapping in iommufd
which will unmap it from the iommu_domain. There is no automatic remap,
this is a safety protocol so the kernel doesn't get stuck. Userspace is
expected to know it is doing something that will revoke the dmabuf and
map/unmap it around the activity. E.g. when QEMU goes to issue an FLR it should
do the map/unmap to iommufd.
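A sketch of that protocol around an FLR, reusing map_dmabuf_slice() from
the sketch above (ioctl layouts follow today's uAPI; the flow itself is
illustrative, error handling trimmed):

#include <linux/iommufd.h>
#include <linux/vfio.h>
#include <sys/ioctl.h>

static int flr_with_unmap(int iommufd, __u32 ioas_id, int vfio_fd,
                          int dmabuf_fd, __u64 iova, __u64 length)
{
        struct iommu_ioas_unmap unmap = {
                .size = sizeof(unmap),
                .ioas_id = ioas_id,
                .iova = iova,
                .length = length,
        };

        if (ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap))
                return -1;

        if (ioctl(vfio_fd, VFIO_DEVICE_RESET))  /* revokes the dmabuf */
                return -1;

        /* No automatic remap: userspace re-establishes the mapping. */
        return map_dmabuf_slice(iommufd, ioas_id, dmabuf_fd, 0, length, iova);
}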
Since DMABUF is missing some key general features for this use case it
relies on a "private interconnect" between VFIO and iommufd via the
vfio_pci_dma_buf_iommufd_map() call.
The call confirms the DMABUF has revoke semantics and delivers a phys_addr
for the memory suitable for use with iommu_map().
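A purely hypothetical consumer-side sketch (the real signature is defined
by the series; the single-range return shape here is an assumption):

#include <linux/dma-buf.h>
#include <linux/iommu.h>

/* Assumed shape of the private interconnect call, for illustration. */
int vfio_pci_dma_buf_iommufd_map(struct dma_buf_attachment *attach,
                                 phys_addr_t *phys, size_t *len);

static int iommufd_map_vfio_dmabuf(struct iommu_domain *domain,
                                   struct dma_buf_attachment *attach,
                                   unsigned long iova)
{
        phys_addr_t phys;
        size_t len;
        int ret;

        /* Confirms revoke semantics and yields a mappable phys_addr. */
        ret = vfio_pci_dma_buf_iommufd_map(attach, &phys, &len);
        if (ret)
                return ret;

        return iommu_map(domain, iova, phys, len,
                         IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);
}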
Medium term there is a desire to expand the supported DMABUFs to include
GPU drivers to support DPDK/SPDK type use cases so future series will work
to add a general concept of revoke and a general negotiation of
interconnect to remove vfio_pci_dma_buf_iommufd_map().
I also plan another series to modify iommufd's vfio_compat to
transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI
of type1.
The latest series for interconnect negotiation to exchange a phys_addr is:
https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com
And the discussion for design of revoke is here:
https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_dmabuf
The branch has various modifications to Leon's series I've suggested.
Jason Gunthorpe (8):
iommufd: Add DMABUF to iopt_pages
iommufd: Do not map/unmap revoked DMABUFs
iommufd: Allow a DMABUF to be revoked
iommufd: Allow MMIO pages in a batch
iommufd: Have pfn_reader process DMABUF iopt_pages
iommufd: Have iopt_map_file_pages convert the fd to a file
iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
iommufd/selftest: Add some tests for the dmabuf flow
drivers/iommu/iommufd/io_pagetable.c | 74 +++-
drivers/iommu/iommufd/io_pagetable.h | 53 ++-
drivers/iommu/iommufd/ioas.c | 8 +-
drivers/iommu/iommufd/iommufd_private.h | 13 +-
drivers/iommu/iommufd/iommufd_test.h | 10 +
drivers/iommu/iommufd/main.c | 10 +
drivers/iommu/iommufd/pages.c | 407 ++++++++++++++++--
drivers/iommu/iommufd/selftest.c | 142 ++++++
tools/testing/selftests/iommu/iommufd.c | 43 ++
tools/testing/selftests/iommu/iommufd_utils.h | 44 ++
10 files changed, 741 insertions(+), 63 deletions(-)
base-commit: fc882154e421f82677925d33577226e776bb07a4
--
2.43.0
This series adds AF_XDP zero copy support to the icssg driver.
Tests were performed on AM64x-EVM with the xdpsock application [1].
A clear improvement is seen in transmit (txonly) and receive (rxdrop)
for 64 byte packets. The 1500 byte test seems to be limited by line
rate (1G link), so no improvement in packet rate is seen there.
There is an issue with l2fwd: the benchmarking numbers show 0 for
64 byte packets after forwarding the first batch of packets, and I am
currently looking into it.
AF_XDP performance using 64 byte packets in Kpps.
Benchmark: XDP-SKB XDP-Native XDP-Native(ZeroCopy)
rxdrop 259 462 645
txonly 350 354 760
l2fwd 178 240 0
AF_XDP performance using 1500 byte packets in Kpps.
Benchmark: XDP-SKB XDP-Native XDP-Native(ZeroCopy)
rxdrop 82 82 82
txonly 81 82 82
l2fwd 81 82 82
[1]: https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-example
v3: https://lore.kernel.org/all/20251014105613.2808674-1-m-malladi@ti.com/
v3 -> v4:
- Rebased to the latest tip
Meghana Malladi (6):
net: ti: icssg-prueth: Add functions to create and destroy Rx/Tx
queues
net: ti: icssg-prueth: Add XSK pool helpers
net: ti: icssg-prueth: Add AF_XDP zero copy for TX
net: ti: icssg-prueth: Make emac_run_xdp function independent of page
net: ti: icssg-prueth: Add AF_XDP zero copy for RX
net: ti: icssg-prueth: Enable zero copy in XDP features
drivers/net/ethernet/ti/icssg/icssg_common.c | 471 ++++++++++++++++---
drivers/net/ethernet/ti/icssg/icssg_prueth.c | 394 +++++++++++++---
drivers/net/ethernet/ti/icssg/icssg_prueth.h | 25 +-
3 files changed, 741 insertions(+), 149 deletions(-)
base-commit: d550d63d0082268a31e93a10c64cbc2476b98b24
--
2.43.0
The Mesa issue referenced below pointed out a possible deadlock:
[ 1231.611031] Possible interrupt unsafe locking scenario:
[ 1231.611033] CPU0 CPU1
[ 1231.611034] ---- ----
[ 1231.611035] lock(&xa->xa_lock#17);
[ 1231.611038] local_irq_disable();
[ 1231.611039] lock(&fence->lock);
[ 1231.611041] lock(&xa->xa_lock#17);
[ 1231.611044] <Interrupt>
[ 1231.611045] lock(&fence->lock);
[ 1231.611047]
*** DEADLOCK ***
In this example, CPU0 would be any function accessing job->dependencies
through the xa_* functions that doesn't disable interrupts (e.g.
drm_sched_job_add_dependency, drm_sched_entity_kill_jobs_cb).
CPU1 is executing drm_sched_entity_kill_jobs_cb as a fence signalling
callback, and therefore in interrupt context. It will deadlock when
trying to grab the xa_lock, which is already held by CPU0.
Replacing all xa_* usage with their xa_*_irq counterparts would fix
this issue, but Christian pointed out another one: dma_fence_signal()
takes fence.lock, and so does dma_fence_add_callback().
dma_fence_signal() // locks f1.lock
-> drm_sched_entity_kill_jobs_cb()
-> foreach dependencies
-> dma_fence_add_callback() // locks f2.lock
This will deadlock if f1 and f2 share the same spinlock.
To fix both issues, the code iterating on dependencies and re-arming them
is moved out to drm_sched_entity_kill_jobs_work.
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Suggested-by: Christian König <christian.koenig(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
---
drivers/gpu/drm/scheduler/sched_entity.c | 34 +++++++++++++-----------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index c8e949f4a568..fe174a4857be 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -173,26 +173,15 @@ int drm_sched_entity_error(struct drm_sched_entity *entity)
}
EXPORT_SYMBOL(drm_sched_entity_error);
+static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
+ struct dma_fence_cb *cb);
+
static void drm_sched_entity_kill_jobs_work(struct work_struct *wrk)
{
struct drm_sched_job *job = container_of(wrk, typeof(*job), work);
-
- drm_sched_fence_scheduled(job->s_fence, NULL);
- drm_sched_fence_finished(job->s_fence, -ESRCH);
- WARN_ON(job->s_fence->parent);
- job->sched->ops->free_job(job);
-}
-
-/* Signal the scheduler finished fence when the entity in question is killed. */
-static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
- struct dma_fence_cb *cb)
-{
- struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
- finish_cb);
+ struct dma_fence *f;
unsigned long index;
- dma_fence_put(f);
-
/* Wait for all dependencies to avoid data corruptions */
xa_for_each(&job->dependencies, index, f) {
struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
@@ -220,6 +209,21 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
dma_fence_put(f);
}
+ drm_sched_fence_scheduled(job->s_fence, NULL);
+ drm_sched_fence_finished(job->s_fence, -ESRCH);
+ WARN_ON(job->s_fence->parent);
+ job->sched->ops->free_job(job);
+}
+
+/* Signal the scheduler finished fence when the entity in question is killed. */
+static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
+ struct dma_fence_cb *cb)
+{
+ struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
+ finish_cb);
+
+ dma_fence_put(f);
+
INIT_WORK(&job->work, drm_sched_entity_kill_jobs_work);
schedule_work(&job->work);
}
--
2.43.0
Changelog:
v5:
* Rebased on top of v6.18-rc1.
* Added more validation logic to make sure that DMA-BUF length doesn't
overflow in various scenarios.
* Hide the kernel config from users.
* Fixed type conversion issue. DMA ranges are exposed with u64 length,
but DMA-BUF uses "unsigned int" as a length for SG entries.
* Added a check so that VFIO drivers which report a BAR size
  different from PCI cannot use the DMA-BUF functionality.
v4: https://lore.kernel.org/all/cover.1759070796.git.leon@kernel.org
* Split pcim_p2pdma_provider() to two functions, one that initializes
array of providers and another to return right provider pointer.
v3: https://lore.kernel.org/all/cover.1758804980.git.leon@kernel.org
* Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
* Cache provider in vfio_pci_dma_buf struct instead of BAR index.
* Removed misleading comment from pcim_p2pdma_provider().
* Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
* Added an extra patch which adds a new CONFIG, so subsequent patches
  can reuse it.
* Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
into the other patch.
* Fixed revoke calls to be aligned with true->false semantics.
* Extended p2pdma_providers to be per-BAR and not global to the whole
  device.
* Fixed possible race between dmabuf states and revoke.
* Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
* Changed commit messages.
* Reused DMA_ATTR_MMIO attribute.
* Returned support for multiple DMA ranges per-DMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com
---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------
This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.
The series supports a use case for SPDK where an NVMe device will be
owned by SPDK through VFIO but interacting with a RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.
However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach is also usable by iommufd for generic
and safe P2P mappings.
In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.
The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.
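As a sketch, the revoke described above maps onto the stock dma-buf move
machinery roughly as follows (exporter bookkeeping omitted; illustrative,
not the exact code from the series):

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

/* Called when the device is closed or a PCI reset zaps the BARs. */
static void vfio_dmabuf_revoke(struct dma_buf *dmabuf)
{
        dma_resv_lock(dmabuf->resv, NULL);
        /* Importers drop their mappings in their move_notify callbacks. */
        dma_buf_move_notify(dmabuf);
        dma_resv_unlock(dmabuf->resv);
}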
The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.
-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Thanks
Leon Romanovsky (7):
PCI/P2PDMA: Separate the mmap() support from the core logic
PCI/P2PDMA: Simplify bus address mapping API
PCI/P2PDMA: Refactor to separate core P2P functionality from memory
allocation
PCI/P2PDMA: Export pci_p2pdma_map_type() function
types: move phys_vec definition to common header
vfio/pci: Enable peer-to-peer DMA transactions by default
vfio/pci: Add dma-buf export support for MMIO regions
Vivek Kasireddy (2):
vfio: Export vfio device get and put registration helpers
vfio/pci: Share the core device pointer while invoking feature
functions
block/blk-mq-dma.c | 7 +-
drivers/iommu/dma-iommu.c | 4 +-
drivers/pci/p2pdma.c | 175 ++++++++---
drivers/vfio/pci/Kconfig | 3 +
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/vfio_pci_config.c | 22 +-
drivers/vfio/pci/vfio_pci_core.c | 63 ++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 446 +++++++++++++++++++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 23 ++
drivers/vfio/vfio_main.c | 2 +
include/linux/pci-p2pdma.h | 120 +++++---
include/linux/types.h | 5 +
include/linux/vfio.h | 2 +
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 25 ++
kernel/dma/direct.c | 4 +-
mm/hmm.c | 2 +-
17 files changed, 785 insertions(+), 121 deletions(-)
create mode 100644 drivers/vfio/pci/vfio_pci_dmabuf.c
--
2.51.0
On Wed, Oct 29, 2025, at 18:50, Alex Mastro wrote:
> On Mon, Oct 13, 2025 at 06:26:11PM +0300, Leon Romanovsky wrote:
>> + /*
>> + * dma_buf_fd() consumes the reference, when the file closes the dmabuf
>> + * will be released.
>> + */
>> + return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
>
> I think this still needs to unwind state on fd allocation error. Reference
> ownership is only transferred on success.
Yes, you are correct, I need to call dma_buf_put() in case of error. I will fix it.
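Something along these lines, reusing the names from the quoted hunk
(untested sketch):

        int fd = dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);

        if (fd < 0)
                dma_buf_put(priv->dmabuf); /* ownership not transferred */
        return fd;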
Thanks
>
>> +
>> +err_dev_put:
>> + vfio_device_put_registration(&vdev->vdev);
>> +err_free_phys:
>> + kfree(priv->phys_vec);
>> +err_free_priv:
>> + kfree(priv);
>> +err_free_ranges:
>> + kfree(dma_ranges);
>> + return ret;
>> +}