- Linaro-mm-sig - lists.linaro.org

Re: [PATCH v9 10/11] vfio/pci: Add dma-buf export support for MMIO regions

by Jason Gunthorpe

On Thu, Nov 20, 2025 at 05:04:13PM -0700, Alex Williamson wrote:
> @@ -2501,7 +2501,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>  err_undo:
>         list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
>                                          vdev.dev_set_list) {
> -               if (__vfio_pci_memory_enabled(vdev))
> +               if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev))
>                         vfio_pci_dma_buf_move(vdev, false);
>                 up_write(&vdev->memory_lock);
>         }
>
> Any other suggestions? This should be the only reset path with this
> nuance of affecting non-opened devices. Thanks,

Seems reasonable, but should it be in __vfio_pci_memory_enabled() just
to be robust?

Jason
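
The two placements of the open_count check discussed above can be compared in a small userspace model. This is only a sketch: `struct model_dev`, `model_memory_enabled()` and `model_memory_enabled_robust()` are hypothetical names, and the assumption that the helper reads state which is valid only while the device is open is made for illustration, not taken from the vfio code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CMD_MEMORY 0x2 /* same value as the PCI command register memory bit */

/* Hypothetical model of a device; not the vfio_pci_core_device layout. */
struct model_dev {
	unsigned int open_count; /* 0 means nobody has opened the device */
	const uint16_t *cmd;     /* assumed valid only while open_count > 0 */
};

/* Variant 1: every caller must remember to check open_count first. */
static bool model_memory_enabled(const struct model_dev *dev)
{
	assert(dev->cmd != NULL);
	return (*dev->cmd & MODEL_CMD_MEMORY) != 0;
}

/* Variant 2: the helper itself is safe to call on a never-opened device. */
static bool model_memory_enabled_robust(const struct model_dev *dev)
{
	if (!dev->open_count)
		return false;
	return model_memory_enabled(dev);
}
```

With the second variant a call site such as the err_undo loop would not need its own open_count test, at the cost of changing the helper's meaning for every other caller.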


Re: [PATCH] drm/xe: Fix memory leak when handling pagefault vma

by Thomas Hellström

On Thu, 2025-11-20 at 18:14 +0200, Mika Kuoppala wrote:
> When the pagefault handling code was moved to a new file, an extra
> drm_exec_init() was added to the VMA path. This call is unnecessary
> because xe_validation_ctx_init() already performs a drm_exec_init(),
> resulting in a memory leak reported by kmemleak.
>
> Remove the redundant drm_exec_init() from the VMA pagefault handling
> code.
>
> Fixes: fb544b844508 ("drm/xe: Implement xe_pagefault_queue_work")
> Cc: Matthew Brost <matthew.brost(a)intel.com>
> Cc: Stuart Summers <stuart.summers(a)intel.com>
> Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
> Cc: "Thomas Hellström" <thomas.hellstrom(a)linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
> Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> Cc: "Christian König" <christian.koenig(a)amd.com>
> Cc: intel-xe(a)lists.freedesktop.org
> Cc: linux-media(a)vger.kernel.org
> Cc: dri-devel(a)lists.freedesktop.org
> Cc: linaro-mm-sig(a)lists.linaro.org
> Signed-off-by: Mika Kuoppala <mika.kuoppala(a)linux.intel.com>

Reviewed-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>

> ---
>  drivers/gpu/drm/xe/xe_pagefault.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
> index fe3e40145012..afb06598b6e1 100644
> --- a/drivers/gpu/drm/xe/xe_pagefault.c
> +++ b/drivers/gpu/drm/xe/xe_pagefault.c
> @@ -102,7 +102,6 @@ static int xe_pagefault_handle_vma(struct xe_gt *gt, struct xe_vma *vma,
>
>         /* Lock VM and BOs dma-resv */
>         xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, (struct xe_val_flags) {});
> -       drm_exec_init(&exec, 0, 0);
>         drm_exec_until_all_locked(&exec) {
>                 err = xe_pagefault_begin(&exec, vma, tile->mem.vram,
>                                          needs_vram == 1);


Re: [PATCH 5/9] iommufd: Allow MMIO pages in a batch

by Jason Gunthorpe

On Thu, Nov 20, 2025 at 07:59:19AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg(a)nvidia.com>
> > Sent: Saturday, November 8, 2025 12:50 AM
> >
> > +enum batch_kind {
> > +       BATCH_CPU_MEMORY = 0,
> > +       BATCH_MMIO,
> > +};
>
> with 'CPU_MEMORY' (instead of plain 'MEMORY') implies future
> support of 'DEV_MEMORY'?

Maybe, but I don't have an immediate thought on this. CXL "MMIO" that
is cachable is a thing but we can also label it as CPU_MEMORY.

We might have something for CC shared/protected memory down the road.

Thanks,
Jason


Re: [PATCH net-next v6 0/6] Add AF_XDP zero copy support

by patchwork-bot+netdevbpf＠kernel.org

Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni(a)redhat.com>:

On Tue, 18 Nov 2025 19:25:36 +0530 you wrote:
> This series adds AF_XDP zero copy support to icssg driver.
>
> Tests were performed on AM64x-EVM with xdpsock application [1].
>
> A clear improvement is seen in transmit (txonly) and receive (rxdrop)
> for 64 byte packets. 1500 byte test seems to be limited by line
> rate (1G link) so no improvement seen there in packet rate
>
> [...]

Here is the summary with links:
  - [net-next,v6,1/6] net: ti: icssg-prueth: Add functions to create and destroy Rx/Tx queues
    https://git.kernel.org/netdev/net-next/c/41dde7f1d013
  - [net-next,v6,2/6] net: ti: icssg-prueth: Add XSK pool helpers
    https://git.kernel.org/netdev/net-next/c/7dfd7597911f
  - [net-next,v6,3/6] net: ti: icssg-prueth: Add AF_XDP zero copy for TX
    https://git.kernel.org/netdev/net-next/c/8756ef2eb078
  - [net-next,v6,4/6] net: ti: icssg-prueth: Make emac_run_xdp function independent of page
    https://git.kernel.org/netdev/net-next/c/121133163c9f
  - [net-next,v6,5/6] net: ti: icssg-prueth: Add AF_XDP zero copy for RX
    https://git.kernel.org/netdev/net-next/c/7a64bb388df3
  - [net-next,v6,6/6] net: ti: icssg-prueth: Enable zero copy in XDP features
    https://git.kernel.org/netdev/net-next/c/c6a1ec1870e6

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


Re: [PATCH 02/18] dma-buf: protected fence ops by RCU v3

by Christian König

On 11/18/25 17:03, Tvrtko Ursulin wrote:
>>>> @@ -448,13 +465,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>>>  static inline bool
>>>>  dma_fence_is_signaled(struct dma_fence *fence)
>>>>  {
>>>> +       const struct dma_fence_ops *ops;
>>>> +
>>>>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>>                 return true;
>>>> -       if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>>> +       rcu_read_lock();
>>>> +       ops = rcu_dereference(fence->ops);
>>>> +       if (ops->signaled && ops->signaled(fence)) {
>>>> +               rcu_read_unlock();
>>>
>>> With the unlocked version two threads could race and one could make
>>> the fence->lock go away just around here, before the dma_fence_signal
>>> below will take it. It seems it is only safe to rcu_read_unlock before
>>> signaling if using the embedded fence (later in the series). Can you
>>> think of a downside to holding the rcu read lock to after signaling?
>>> that would make it safe I think.
>>
>> Well it's good to talk about it but I think that it is not necessary
>> to protect the lock in this particular case.
>>
>> See the RCU protection is only for the fence->ops pointer, but the
>> lock can be taken way after the fence is already signaled.
>>
>> That's why I came up with the patch to move the lock into the fence
>> in the first place.
>
> Right. And you think there is nothing to gain with the option of
> keeping the rcu_read_unlock() to after signalling? Ie. why not plug a
> potential race if we can for no negative effect.

I thought quite a bit over that, but at least offhand I can't come up
with a reason why we should do this. The signaling path doesn't need
the RCU read side lock as far as I can see.

Regards,
Christian.

>
> Regards,
>
> Tvrtko


[RFC PATCH 0/2] locking/ww_mutex, dma-buf/dma-resv: Improve detection of unheld locks

by Thomas Hellström

WW mutexes and dma-resv objects, which embed them, typically have
a number of locks belonging to the same lock class. However,
code using them typically wants to verify the locking at
object granularity, not lock-class granularity.

This series adds ww_mutex functions to facilitate that (patch 1)
and uses these functions in the dma-resv lock checks.

Thomas Hellström (2):
  kernel/locking/ww_mutex: Add per-lock lock-check helpers
  dma-buf/dma-resv: Improve the dma-resv lockdep checks

 include/linux/dma-resv.h |  7 +++++--
 include/linux/ww_mutex.h | 18 ++++++++++++++++++
 kernel/locking/mutex.c   | 10 ++++++++++
 3 files changed, 33 insertions(+), 2 deletions(-)

--
2.51.1
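
The difference between lock-class and per-object granularity described in this cover letter can be shown with a toy userspace model. It is a sketch under stated assumptions: `struct model_lock`, `model_class_held()` and `model_lock_held()` are hypothetical names, and the model says nothing about how lockdep or the proposed ww_mutex helpers are actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an acquire context (the lock holder). */
struct model_ctx {
	int id;
};

/* Hypothetical lock: many locks share one class, each has its own owner. */
struct model_lock {
	int lock_class;
	const struct model_ctx *owner; /* NULL when unlocked */
};

/* Class granularity: true if ctx holds ANY lock of the given class. */
static bool model_class_held(const struct model_lock *locks, size_t n,
			     int lock_class, const struct model_ctx *ctx)
{
	size_t i;

	assert(ctx != NULL);
	for (i = 0; i < n; i++)
		if (locks[i].lock_class == lock_class && locks[i].owner == ctx)
			return true;
	return false;
}

/* Object granularity: true only if ctx holds this particular lock. */
static bool model_lock_held(const struct model_lock *lock,
			    const struct model_ctx *ctx)
{
	assert(ctx != NULL);
	return lock->owner == ctx;
}
```

With two locks of the same class and only the first one taken, the class-level check succeeds for both objects, while the per-object check correctly reports the second one as unheld.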


[PATCH v8 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf

by Leon Romanovsky

Changelog:
v8:
 * Fixed spelling errors in p2pdma documentation file.
 * Added vdev->pci_ops check for NULL in vfio_pci_core_feature_dma_buf().
 * Simplified the nvgrace_get_dmabuf_phys() function.
 * Added extra check in pcim_p2pdma_provider() to catch missing call
   to pcim_p2pdma_init().
v7: https://patch.msgid.link/20251106-dmabuf-vfio-v7-0-2503bf390699@nvidia.com
 * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
   to reverse loop.
 * Fixed spelling errors in documentation patch.
 * Rebased on top of v6.18-rc3.
 * Added include of stddef.h to vfio.h, to keep uapi header file independent.
v6: https://patch.msgid.link/20251102-dmabuf-vfio-v6-0-d773cff0db9f@nvidia.com
 * Fixed wrong error check from pcim_p2pdma_init().
 * Documented pcim_p2pdma_provider() function.
 * Improved commit messages.
 * Added VFIO DMA-BUF selftest, not sent yet.
 * Added __counted_by(nr_ranges) annotation to struct vfio_device_feature_dma_buf.
 * Fixed error unwind when dma_buf_fd() fails.
 * Document latest changes to p2pmem.
 * Removed EXPORT_SYMBOL_GPL from pci_p2pdma_map_type.
 * Moved DMA mapping logic to DMA-BUF.
 * Removed types patch to avoid dependencies between subsystems.
 * Moved vfio_pci_dma_buf_move() in err_undo block.
 * Added nvgrace patch.
v5: https://lore.kernel.org/all/cover.1760368250.git.leon@kernel.org
 * Rebased on top of v6.18-rc1.
 * Added more validation logic to make sure that DMA-BUF length doesn't
   overflow in various scenarios.
 * Hide kernel config from the users.
 * Fixed type conversion issue. DMA ranges are exposed with u64 length,
   but DMA-BUF uses "unsigned int" as a length for SG entries.
 * Added a check so that VFIO drivers which report a BAR size different
   from the PCI one do not use the DMA-BUF functionality.
v4: https://lore.kernel.org/all/cover.1759070796.git.leon@kernel.org
 * Split pcim_p2pdma_provider() to two functions, one that initializes
   array of providers and another to return right provider pointer.
v3: https://lore.kernel.org/all/cover.1758804980.git.leon@kernel.org
 * Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
 * Cache provider in vfio_pci_dma_buf struct instead of BAR index.
 * Removed misleading comment from pcim_p2pdma_provider().
 * Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
 * Added extra patch which adds new CONFIG, so next patches can reuse
   it.
 * Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
   into the other patch.
 * Fixed revoke calls to be aligned with true->false semantics.
 * Extended p2pdma_providers to be per-BAR and not global to whole
   device.
 * Fixed possible race between dmabuf states and revoke.
 * Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
 * Changed commit messages.
 * Reused DMA_ATTR_MMIO attribute.
 * Returned support for multiple DMA ranges per-DMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com

---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------

This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.

The series supports a use case for SPDK where a NVMe device will be
owned by SPDK through VFIO but interacting with a RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.

However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be usable by iommufd as well for generic
and safe P2P mappings.

In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.

The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.

The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.

-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…

Thanks

---
Jason Gunthorpe (2):
      PCI/P2PDMA: Document DMABUF model
      vfio/nvgrace: Support get_dmabuf_phys

Leon Romanovsky (7):
      PCI/P2PDMA: Separate the mmap() support from the core logic
      PCI/P2PDMA: Simplify bus address mapping API
      PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
      PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
      dma-buf: provide phys_vec to scatter-gather mapping routine
      vfio/pci: Enable peer-to-peer DMA transactions by default
      vfio/pci: Add dma-buf export support for MMIO regions

Vivek Kasireddy (2):
      vfio: Export vfio device get and put registration helpers
      vfio/pci: Share the core device pointer while invoking feature functions

 Documentation/driver-api/pci/p2pdma.rst |  95 +++++++---
 block/blk-mq-dma.c                      |   2 +-
 drivers/dma-buf/dma-buf.c               | 235 ++++++++++++++++++++++++
 drivers/iommu/dma-iommu.c               |   4 +-
 drivers/pci/p2pdma.c                    | 186 ++++++++++++++-----
 drivers/vfio/pci/Kconfig                |   3 +
 drivers/vfio/pci/Makefile               |   1 +
 drivers/vfio/pci/nvgrace-gpu/main.c     |  56 ++++++
 drivers/vfio/pci/vfio_pci.c             |   5 +
 drivers/vfio/pci/vfio_pci_config.c      |  22 ++-
 drivers/vfio/pci/vfio_pci_core.c        |  53 ++++--
 drivers/vfio/pci/vfio_pci_dmabuf.c      | 315 ++++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h        |  23 +++
 drivers/vfio/vfio_main.c                |   2 +
 include/linux/dma-buf.h                 |  18 ++
 include/linux/pci-p2pdma.h              | 120 +++++++-----
 include/linux/vfio.h                    |   2 +
 include/linux/vfio_pci_core.h           |  42 +++++
 include/uapi/linux/vfio.h               |  28 +++
 kernel/dma/direct.c                     |   4 +-
 mm/hmm.c                                |   2 +-
 21 files changed, 1078 insertions(+), 140 deletions(-)
---
base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa
change-id: 20251016-dmabuf-vfio-6cef732adf5a

Best regards,
--
Leon Romanovsky <leonro(a)nvidia.com>


[PATCH v9 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf

by Leon Romanovsky

Changelog:
v9:
 * Added Reviewed-by tags.
 * Fixes to p2pdma documentation.
 * Renamed dma_buf_map and unmap.
 * Moved them to separate file.
 * Used nvgrace_gpu_memregion() function instead of open-coded variant.
 * Paired get_file_active() with fput().
v8: https://patch.msgid.link/20251111-dmabuf-vfio-v8-0-fd9aa5df478f@nvidia.com
 * Fixed spelling errors in p2pdma documentation file.
 * Added vdev->pci_ops check for NULL in vfio_pci_core_feature_dma_buf().
 * Simplified the nvgrace_get_dmabuf_phys() function.
 * Added extra check in pcim_p2pdma_provider() to catch missing call
   to pcim_p2pdma_init().
v7: https://patch.msgid.link/20251106-dmabuf-vfio-v7-0-2503bf390699@nvidia.com
 * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
   to reverse loop.
 * Fixed spelling errors in documentation patch.
 * Rebased on top of v6.18-rc3.
 * Added include of stddef.h to vfio.h, to keep uapi header file independent.
v6: https://patch.msgid.link/20251102-dmabuf-vfio-v6-0-d773cff0db9f@nvidia.com
 * Fixed wrong error check from pcim_p2pdma_init().
 * Documented pcim_p2pdma_provider() function.
 * Improved commit messages.
 * Added VFIO DMA-BUF selftest, not sent yet.
 * Added __counted_by(nr_ranges) annotation to struct vfio_device_feature_dma_buf.
 * Fixed error unwind when dma_buf_fd() fails.
 * Document latest changes to p2pmem.
 * Removed EXPORT_SYMBOL_GPL from pci_p2pdma_map_type.
 * Moved DMA mapping logic to DMA-BUF.
 * Removed types patch to avoid dependencies between subsystems.
 * Moved vfio_pci_dma_buf_move() in err_undo block.
 * Added nvgrace patch.
v5: https://lore.kernel.org/all/cover.1760368250.git.leon@kernel.org
 * Rebased on top of v6.18-rc1.
 * Added more validation logic to make sure that DMA-BUF length doesn't
   overflow in various scenarios.
 * Hide kernel config from the users.
 * Fixed type conversion issue. DMA ranges are exposed with u64 length,
   but DMA-BUF uses "unsigned int" as a length for SG entries.
 * Added a check so that VFIO drivers which report a BAR size different
   from the PCI one do not use the DMA-BUF functionality.
v4: https://lore.kernel.org/all/cover.1759070796.git.leon@kernel.org
 * Split pcim_p2pdma_provider() to two functions, one that initializes
   array of providers and another to return right provider pointer.
v3: https://lore.kernel.org/all/cover.1758804980.git.leon@kernel.org
 * Changed pcim_p2pdma_enable() to be pcim_p2pdma_provider().
 * Cache provider in vfio_pci_dma_buf struct instead of BAR index.
 * Removed misleading comment from pcim_p2pdma_provider().
 * Moved MMIO check to be in pcim_p2pdma_provider().
v2: https://lore.kernel.org/all/cover.1757589589.git.leon@kernel.org/
 * Added extra patch which adds new CONFIG, so next patches can reuse
   it.
 * Squashed "PCI/P2PDMA: Remove redundant bus_offset from map state"
   into the other patch.
 * Fixed revoke calls to be aligned with true->false semantics.
 * Extended p2pdma_providers to be per-BAR and not global to whole
   device.
 * Fixed possible race between dmabuf states and revoke.
 * Moved revoke to PCI BAR zap block.
v1: https://lore.kernel.org/all/cover.1754311439.git.leon@kernel.org
 * Changed commit messages.
 * Reused DMA_ATTR_MMIO attribute.
 * Returned support for multiple DMA ranges per-DMABUF.
v0: https://lore.kernel.org/all/cover.1753274085.git.leonro@nvidia.com

---------------------------------------------------------------------------
Based on "[PATCH v6 00/16] dma-mapping: migrate to physical address-based API"
https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/ series.
---------------------------------------------------------------------------

This series extends the VFIO PCI subsystem to support exporting MMIO
regions from PCI device BARs as dma-buf objects, enabling safe sharing of
non-struct page memory with controlled lifetime management. This allows RDMA
and other subsystems to import dma-buf FDs and build them into memory regions
for PCI P2P operations.

The series supports a use case for SPDK where a NVMe device will be
owned by SPDK through VFIO but interacting with a RDMA device. The RDMA
device may directly access the NVMe CMB or directly manipulate the NVMe
device's doorbell using PCI P2P.

However, as a general mechanism, it can support many other scenarios with
VFIO. This dmabuf approach can be usable by iommufd as well for generic
and safe P2P mappings.

In addition to the SPDK use-case mentioned above, the capability added
in this patch series can also be useful when a buffer (located in device
memory such as VRAM) needs to be shared between any two dGPU devices or
instances (assuming one of them is bound to VFIO PCI) as long as they
are P2P DMA compatible.

The implementation provides a revocable attachment mechanism using dma-buf
move operations. MMIO regions are normally pinned as BARs don't change
physical addresses, but access is revoked when the VFIO device is closed
or a PCI reset is issued. This ensures kernel self-defense against
potentially hostile userspace.

The series includes significant refactoring of the PCI P2PDMA subsystem
to separate core P2P functionality from memory allocation features,
making it more modular and suitable for VFIO use cases that don't need
struct page support.

-----------------------------------------------------------------------
The series is based originally on
https://lore.kernel.org/all/20250307052248.405803-1-vivek.kasireddy@intel.c…
but heavily rewritten to be based on DMA physical API.
-----------------------------------------------------------------------
The WIP branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…

Thanks

---
Jason Gunthorpe (2):
      PCI/P2PDMA: Document DMABUF model
      vfio/nvgrace: Support get_dmabuf_phys

Leon Romanovsky (7):
      PCI/P2PDMA: Separate the mmap() support from the core logic
      PCI/P2PDMA: Simplify bus address mapping API
      PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation
      PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function
      dma-buf: provide phys_vec to scatter-gather mapping routine
      vfio/pci: Enable peer-to-peer DMA transactions by default
      vfio/pci: Add dma-buf export support for MMIO regions

Vivek Kasireddy (2):
      vfio: Export vfio device get and put registration helpers
      vfio/pci: Share the core device pointer while invoking feature functions

 Documentation/driver-api/pci/p2pdma.rst |  97 +++++++---
 block/blk-mq-dma.c                      |   2 +-
 drivers/dma-buf/Makefile                |   2 +-
 drivers/dma-buf/dma-buf-mapping.c       | 248 +++++++++++++++++++++++++
 drivers/iommu/dma-iommu.c               |   4 +-
 drivers/pci/p2pdma.c                    | 186 ++++++++++++++-----
 drivers/vfio/pci/Kconfig                |   3 +
 drivers/vfio/pci/Makefile               |   1 +
 drivers/vfio/pci/nvgrace-gpu/main.c     |  52 ++++++
 drivers/vfio/pci/vfio_pci.c             |   5 +
 drivers/vfio/pci/vfio_pci_config.c      |  22 ++-
 drivers/vfio/pci/vfio_pci_core.c        |  53 ++++--
 drivers/vfio/pci/vfio_pci_dmabuf.c      | 316 ++++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h        |  23 +++
 drivers/vfio/vfio_main.c                |   2 +
 include/linux/dma-buf-mapping.h         |  17 ++
 include/linux/dma-buf.h                 |  11 ++
 include/linux/pci-p2pdma.h              | 120 +++++++-----
 include/linux/vfio.h                    |   2 +
 include/linux/vfio_pci_core.h           |  42 +++++
 include/uapi/linux/vfio.h               |  28 +++
 kernel/dma/direct.c                     |   4 +-
 mm/hmm.c                                |   2 +-
 23 files changed, 1101 insertions(+), 141 deletions(-)
---
base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa
change-id: 20251016-dmabuf-vfio-6cef732adf5a

Best regards,
--
Leon Romanovsky <leonro(a)nvidia.com>
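
The v5 changelog entry about u64 DMA range lengths versus the "unsigned int" length of an SG entry implies that a large range has to be split into several entries. A plain userspace sketch of that arithmetic follows. `struct phys_range`, `count_sg_entries()` and `chunk_len()` are hypothetical names chosen for illustration and are not the kernel helpers from the series; the sketch also assumes every range is split independently.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one MMIO physical range (not a kernel type). */
struct phys_range {
	uint64_t paddr;
	uint64_t len;
};

/* Round-up division that cannot overflow for n close to UINT64_MAX. */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	return n / d + (n % d != 0);
}

/*
 * Count the scatterlist entries needed when an entry's length field is an
 * unsigned int, so a single entry can cover at most UINT_MAX bytes.
 */
static uint64_t count_sg_entries(const struct phys_range *ranges, size_t nr)
{
	uint64_t nents = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		nents += div_round_up_u64(ranges[i].len, UINT_MAX);
	return nents;
}

/* Length of chunk number idx (0-based) of a range that is len bytes long. */
static unsigned int chunk_len(uint64_t len, uint64_t idx)
{
	uint64_t offset = idx * (uint64_t)UINT_MAX;

	assert(offset < len);
	if (len - offset > UINT_MAX)
		return UINT_MAX;
	return (unsigned int)(len - offset);
}
```

A range that is one byte longer than UINT_MAX therefore needs two entries: one full-sized chunk followed by a one-byte tail.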


Re: [PATCH v8 10/11] vfio/pci: Add dma-buf export support for MMIO regions

by Jason Gunthorpe

On Tue, Nov 18, 2025 at 11:56:14PM +0000, Tian, Kevin wrote:
> > > > +       down_write(&vdev->memory_lock);
> > > > +       list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> > > > +               if (!get_file_active(&priv->dmabuf->file))
> > > > +                       continue;
> > > > +
> > > > +               dma_resv_lock(priv->dmabuf->resv, NULL);
> > > > +               list_del_init(&priv->dmabufs_elm);
> > > > +               priv->vdev = NULL;
> > > > +               priv->revoked = true;
> > > > +               dma_buf_move_notify(priv->dmabuf);
> > > > +               dma_resv_unlock(priv->dmabuf->resv);
> > > > +               vfio_device_put_registration(&vdev->vdev);
> > > > +               fput(priv->dmabuf->file);
> > >
> > > dma_buf_put(priv->dmabuf), consistent with other places.
> >
> > Someone else said this, I don't agree, the above got the get via
> > get_file_active() instead of a dma_buf version..
> >
> > So we should pair with get_file_active() vs fput().
> >
> > Christian rejected the idea of adding a dmabuf wrapper for
> > get_file_active(), oh well.
>
> Okay then vfio_pci_dma_buf_move() should be changed. It uses
> get_file_active() to pair dma_buf_put().

Makes sense, Leon can you fix it?

Thanks,
Jason
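
The "take a reference only if the object is still alive, then drop it with the matching put" pattern argued for above can be modeled in a few lines of single-threaded userspace C. This is a sketch only: `struct model_file`, `model_get_active()`, `model_put()` and `model_revoke_all()` are hypothetical names, the counter is not atomic, and no claim is made about how `get_file_active()` or `fput()` are implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a struct file. */
struct model_file {
	long refcount; /* 0 means the object is already being torn down */
};

/* Take a reference only if the object is still alive. */
static struct model_file *model_get_active(struct model_file *f)
{
	if (f->refcount == 0)
		return NULL;
	f->refcount++;
	return f;
}

/* The put that pairs with model_get_active(). */
static void model_put(struct model_file *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/*
 * Walk all objects, skipping dying ones, and keep every get paired with a
 * put of the same family. Returns how many objects were processed.
 */
static size_t model_revoke_all(struct model_file **files, size_t n)
{
	size_t visited = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_file *f = model_get_active(files[i]);

		if (!f)
			continue; /* already on its way out, nothing to revoke */
		visited++;        /* the revoke work would happen here */
		model_put(f);
	}
	return visited;
}
```

Keeping the get and the put from the same family makes it easy to audit that every reference taken in the loop is dropped again before the next iteration.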


Re: [PATCH v2 02/20] drm/ttm: rework pipelined eviction fence handling

by Thomas Hellström

Hi, Pierre-Eric

On Thu, 2025-11-13 at 17:05 +0100, Pierre-Eric Pelloux-Prayer wrote:
> Until now ttm stored a single pipelined eviction fence which means
> drivers had to use a single entity for these evictions.
>
> To lift this requirement, this commit allows up to 8 entities to
> be used.
>
> Ideally a dma_resv object would have been used as a container of
> the eviction fences, but the locking rules make it complex.
> dma_resv all have the same ww_class, which means "Attempting to
> lock more mutexes after ww_acquire_done." is an error.
>
> One alternative considered was to introduce a 2nd ww_class for
> specific resv to hold a single "transient" lock (= the resv lock
> would only be held for a short period, without taking any other
> locks).

Wouldn't it be possible to use lockdep_set_class_and_name() to modify
the resv lock class for these particular resv objects after they are
allocated? Reusing the resv code certainly sounds attractive.

Thanks,
Thomas


Re: [PATCH v8 09/11] vfio/pci: Enable peer-to-peer DMA transactions by default

by Leon Romanovsky

On Wed, Nov 19, 2025 at 12:02:02AM +0000, Tian, Kevin wrote:
> > From: Keith Busch <kbusch(a)kernel.org>
> > Sent: Wednesday, November 19, 2025 4:19 AM
> >
> > On Tue, Nov 18, 2025 at 07:18:36AM +0000, Tian, Kevin wrote:
> > > > From: Leon Romanovsky <leon(a)kernel.org>
> > > > Sent: Tuesday, November 11, 2025 5:58 PM
> > > >
> > > > From: Leon Romanovsky <leonro(a)nvidia.com>
> > >
> > > not required with only your own s-o-b
> >
> > That's automatically appended when the sender and signer don't match.
> > It's not uncommon for developers to send from a kernel.org email but
> > sign off with a corporate account, or the other way around.
>
> Good to know.

Yes, in addition, I am used to separating code authorship from my
open-source activity. Code belongs to my employer and this is why the
corporate address is used as the author, but all emails and
communications are coming from my kernel.org account.

Thanks


Re: [PATCH v8 06/11] dma-buf: provide phys_vec to scatter-gather mapping routine

by Leon Romanovsky

On Wed, Nov 19, 2025 at 05:54:55AM +0000, Tian, Kevin wrote:
> > From: Leon Romanovsky <leon(a)kernel.org>
> > Sent: Tuesday, November 11, 2025 5:58 PM
> > +
> > +       if (dma->state && dma_use_iova(dma->state)) {
> > +               WARN_ON_ONCE(mapped_len != size);
>
> then "goto err_unmap_dma".

It never should happen, there is no need to provide error unwind to
something that you won't get.

>
> Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>

Thanks


Re: [PATCH v8 06/11] dma-buf: provide phys_vec to scatter-gather mapping routine

by Leon Romanovsky

On Tue, Nov 18, 2025 at 04:06:11PM -0800, Nicolin Chen wrote:
> On Tue, Nov 11, 2025 at 11:57:48AM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro(a)nvidia.com>
> >
> > Add dma_buf_map() and dma_buf_unmap() helpers to convert an array of
> > MMIO physical address ranges into scatter-gather tables with proper
> > DMA mapping.
> >
> > These common functions are a starting point and support any PCI
> > drivers creating mappings from their BAR's MMIO addresses. VFIO is one
> > case, as shortly will be RDMA. We can review existing DRM drivers to
> > refactor them separately. We hope this will evolve into routines to
> > help common DRM that include mixed CPU and MMIO mappings.
> >
> > Compared to the dma_map_resource() abuse this implementation handles
> > the complicated PCI P2P scenarios properly, especially when an IOMMU
> > is enabled:
> >
> >  - Direct bus address mapping without IOVA allocation for
> >    PCI_P2PDMA_MAP_BUS_ADDR, using pci_p2pdma_bus_addr_map(). This
> >    happens if the IOMMU is enabled but the PCIe switch ACS flags allow
> >    transactions to avoid the host bridge.
> >
> >    Further, this handles the slightly obscure case of MMIO with a
> >    phys_addr_t that is different from the physical BAR programming
> >    (bus offset). The phys_addr_t is converted to a dma_addr_t and
> >    accommodates this effect. This enables certain real systems to
> >    work, especially on ARM platforms.
> >
> >  - Mapping through host bridge with IOVA allocation and DMA_ATTR_MMIO
> >    attribute for MMIO memory regions (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE).
> >    This happens when the IOMMU is enabled and the ACS flags are forcing
> >    all traffic to the IOMMU - ie for virtualization systems.
> >
> >  - Cases where P2P is not supported through the host bridge/CPU. The
> >    P2P subsystem is the proper place to detect this and block it.
> >
> > Helper functions fill_sg_entry() and calc_sg_nents() handle the
> > scatter-gather table construction, splitting large regions into
> > UINT_MAX-sized chunks to fit within sg->length field limits.
> >
> > Since the physical address based DMA API forbids use of the CPU list
> > of the scatterlist this will produce a mangled scatterlist that has
> > a fully zero-length and NULL'd CPU list. The list is 0 length,
> > all the struct page pointers are NULL and zero sized. This is stronger
> > and more robust than the existing mangle_sg_table() technique. It is
> > a future project to migrate DMABUF as a subsystem away from using
> > scatterlist for this data structure.
> >
> > Tested-by: Alex Mastro <amastro(a)fb.com>
> > Tested-by: Nicolin Chen <nicolinc(a)nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
>
> Reviewed-by: Nicolin Chen <nicolinc(a)nvidia.com>
>
> With a nit:
>
> > +err_unmap_dma:
> > +       if (!i || !dma->state) {
> > +               ; /* Do nothing */
> > +       } else if (dma_use_iova(dma->state)) {
> > +               dma_iova_destroy(attach->dev, dma->state, mapped_len, dir,
> > +                                DMA_ATTR_MMIO);
> > +       } else {
> > +               for_each_sgtable_dma_sg(&dma->sgt, sgl, i)
> > +                       dma_unmap_phys(attach->dev, sg_dma_address(sgl),
> > +                                      sg_dma_len(sgl), dir, DMA_ATTR_MMIO);
>
> Would it be safer to skip dma_unmap_phys() the range [i, nents)?

[i, nents) is not supposed to be in SG list which we are iterating.

Thanks


Re: [PATCH 8/9] iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE

by Jason Gunthorpe

On Thu, Nov 13, 2025 at 04:05:09PM -0800, Nicolin Chen wrote:
> > -struct iopt_pages *iopt_alloc_file_pages(struct file *file, unsigned long start,
> > +struct iopt_pages *iopt_alloc_file_pages(struct file *file,
> > +                                         unsigned long start_byte,
> > +                                         unsigned long start,
> >                                           unsigned long length, bool writable);
>
> Passing in start_byte looks like a cleanup to me, aligning with
> what iopt_map_common() has.
> Since we are doing this cleanup, maybe we could follow the same
> sequence: xxx, start, length, start_byte, writable?

??

static int iopt_map_common(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
                           struct iopt_pages *pages, unsigned long *iova,
                           unsigned long length, unsigned long start_byte,
                           int iommu_prot, unsigned int flags)

Not the same arguments, we don't pass start and start_byte there?

Jason


Re: [PATCH 6/9] iommufd: Have pfn_reader process DMABUF iopt_pages

by Jason Gunthorpe

On Thu, Nov 13, 2025 at 03:39:46PM -0800, Nicolin Chen wrote:
> > @@ -1687,6 +1737,12 @@ static void __iopt_area_unfill_domain(struct iopt_area *area,
> >
> >         lockdep_assert_held(&pages->mutex);
> >
> > +       if (iopt_is_dmabuf(pages)) {
> > +               iopt_area_unmap_domain_range(area, domain, start_index,
> > +                                            last_index);
> > +               return;
> > +       }
>
> Should it be:
>         if (iopt_is_dmabuf(pages) && !iopt_dmabuf_revoked(pages)) {
> ?

All callers have already done it, let's add an assertion though..

@@ -1873,6 +1873,8 @@ static void __iopt_area_unfill_domain(struct iopt_area *area,
        lockdep_assert_held(&pages->mutex);

        if (iopt_is_dmabuf(pages)) {
+               if (WARN_ON(iopt_dmabuf_revoked(pages)))
+                       return;
                iopt_area_unmap_domain_range(area, domain, start_index,
                                             last_index);
                return;

Jason


Re: [PATCH 9/9] iommufd/selftest: Add some tests for the dmabuf flow

by Jason Gunthorpe

On Fri, Nov 07, 2025 at 11:43:56AM -0800, Nicolin Chen wrote:
> > +static void iommufd_test_dma_buf_release(struct dma_buf *dmabuf)
> > +{
> > +       struct iommufd_test_dma_buf *priv = dmabuf->priv;
> > +
> > +       kfree(priv);
> > +}
>
> Missing
>         kfree(priv->memory);
> ?

Yes, thanks

Jason
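
The leak pointed out above is the usual "outer object freed, inner buffer forgotten" pattern. A userspace analogue of a release routine that frees both is sketched here. `struct model_priv`, its `length` field, `model_priv_alloc()` and `model_priv_release()` are hypothetical; only the existence of a separately allocated `memory` member is taken from the review comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical private data that owns a second, separate allocation. */
struct model_priv {
	void *memory;  /* backing buffer owned by this object */
	size_t length; /* size of the backing buffer in bytes */
};

/* Allocate the object and its buffer; returns NULL on any failure. */
static struct model_priv *model_priv_alloc(size_t length)
{
	struct model_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return NULL;
	priv->memory = malloc(length);
	if (!priv->memory) {
		free(priv);
		return NULL;
	}
	priv->length = length;
	return priv;
}

/* Release path: free the inner buffer before the object that points to it. */
static void model_priv_release(struct model_priv *priv)
{
	if (!priv)
		return;
	free(priv->memory);
	free(priv);
}
```

Freeing only `priv` would leave the buffer unreachable, which is the userspace counterpart of the missing `kfree(priv->memory)`.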


Re: [PATCH 0/9] Initial DMABUF support for iommufd

by Jason Gunthorpe

On Tue, Nov 18, 2025 at 05:37:59AM +0000, Kasireddy, Vivek wrote:
> Hi Jason,
>
> > Subject: Re: [PATCH 0/9] Initial DMABUF support for iommufd
> >
> > On Thu, Nov 13, 2025 at 11:37:12AM -0700, Alex Williamson wrote:
> > > > The latest series for interconnect negotiation to exchange a phys_addr is:
> > > > https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy(a)intel.com
> > >
> > > If this is in development, why are we pursuing a vfio specific
> > > temporary "private interconnect" here rather than building on that
> > > work? What are the gaps/barriers/timeline?
> >
> > I broadly don't expect to see an agreement on the above for probably
> Are you planning to post your SGT mapping type patches soon, so that we
> can start discussion on the design?

It is on my list, but probably not soon enough :\

I wanted to address the remarks given and I still have to conclude
some urgent things for this merge window.

> I went ahead and tested your patches and did not notice any regressions
> with my test-cases (after adding some minor fixups). I have also added/tested
> support for IOV mapping type based on your design:
> https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/dmabuf_iov_v1

Wow, that's great!

Jason

2 months, 4 weeks


Re: [PATCH v7 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf

by Jason Gunthorpe

On Mon, Nov 17, 2025 at 08:36:20AM -0700, Alex Williamson wrote:
> On Tue, 11 Nov 2025 09:54:22 +0100
> Christian König <christian.koenig(a)amd.com> wrote:
>
> > On 11/10/25 21:42, Alex Williamson wrote:
> > > On Thu, 6 Nov 2025 16:16:45 +0200
> > > Leon Romanovsky <leon(a)kernel.org> wrote:
> > >
> > >> Changelog:
> > >> v7:
> > >> * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
> > >> to reverse loop.
> > >> * Fixed spelling errors in documentation patch.
> > >> * Rebased on top of v6.18-rc3.
> > >> * Added include to stddef.h to vfio.h, to keep uapi header file independent.
> > >
> > > I think we're winding down on review comments. It'd be great to get
> > > p2pdma and dma-buf acks on this series. Otherwise it's been posted
> > > enough that we'll assume no objections. Thanks,
> >
> > Already have it on my TODO list to take a closer look, but no idea when that will be.
> >
> > This patch set is on place 4 or 5 on a rather long list of stuff to review/finish.
>
> Hi Christian,
>
> Gentle nudge. Leon posted v8[1] last week, which is not drawing any
> new comments. Do you foresee having time for review that I should
> still hold off merging for v6.19 a bit longer? Thanks,

I really want this merged this cycle, along with the iommufd part,
which means it needs to go into your tree by very early next week on a
shared branch so I can do the iommufd part on top.

It is the last blocking kernel piece to conclude the viommu support
roll out into qemu for iommufd which quite a lot of people have been
working on for years now.

IMHO there is nothing profound in the dmabuf patch, it was written by
the expert in the new DMA API operation, and doesn't form any
troublesome API contracts. It is also the same basic code as from the
v1 in July just moved into dmabuf .c files instead of vfio .c files at
Christoph's request.

My hope is DRM folks will pick up the baton and continue to improve
this to move other drivers away from dma_map_resource(). Simona told
me people have wanted DMA API improvements for ages, now we have them,
now is the time!

Any remarks after the fact can be addressed incrementally.

If there are no concrete technical remarks please take it. 6 months is
long enough to wait for feedback.

Thanks,
Jason

2 months, 4 weeks


Re: [PATCH v8 11/11] vfio/nvgrace: Support get_dmabuf_phys

by Jason Gunthorpe

On Tue, Nov 18, 2025 at 07:59:20AM +0000, Ankit Agrawal wrote:
> +	if (nvdev->resmem.memlength && region_index == RESMEM_REGION_INDEX) {
> +		/*
> +		 * The P2P properties of the non-BAR memory is the same as the
> +		 * BAR memory, so just use the provider for index 0. Someday
> +		 * when CXL gets P2P support we could create CXLish providers
> +		 * for the non-BAR memory.
> +		 */
> +		mem_region = &nvdev->resmem;
> +	} else if (region_index == USEMEM_REGION_INDEX) {
> +		/*
> +		 * This is actually cachable memory and isn't treated as P2P in
> +		 * the chip. For now we have no way to push cachable memory
> +		 * through everything and the Grace HW doesn't care what caching
> +		 * attribute is programmed into the SMMU. So use BAR 0.
> +		 */
> +		mem_region = &nvdev->usemem;
> +	}
> +
>
> Can we replace this with nvgrace_gpu_memregion()?

Yes, looks like

But we need to preserve the comments above as well somehow.

Jason

2 months, 4 weeks


Re: [PATCH v8 10/11] vfio/pci: Add dma-buf export support for MMIO regions

by Jason Gunthorpe

On Tue, Nov 18, 2025 at 07:33:23AM +0000, Tian, Kevin wrote:
> > From: Leon Romanovsky <leon(a)kernel.org>
> > Sent: Tuesday, November 11, 2025 5:58 PM
> >
> > -	if (!new_mem)
> > +	if (!new_mem) {
> >  		vfio_pci_zap_and_down_write_memory_lock(vdev);
> > -	else
> > +		vfio_pci_dma_buf_move(vdev, true);
> > +	} else {
> >  		down_write(&vdev->memory_lock);
> > +	}
>
> shouldn't we notify move before zapping the bars? otherwise there is
> still a small window in between where the exporter already has the
> mapping cleared while the importer still keeps it...

Zapping the VMA and moving/revoking the DMABUF are independent
operations that can happen in any order. They affect different kinds
of users. The VMA zap prevents CPU access from userspace, the DMABUF
move prevents DMA access from devices.

The order has to be like the above because vfio_pci_dma_buf_move()
must be called under the memory lock and
vfio_pci_zap_and_down_write_memory_lock() gets the memory lock..

> > +static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> > +{
> > +	struct vfio_pci_dma_buf *priv = dmabuf->priv;
> > +
> > +	/*
> > +	 * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
> > +	 * The refcount prevents both.
>
> which refcount? I thought it's vdev->memory_lock preventing the race...

Refcount on the dmabuf

> > +int vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec,
> > +				struct vfio_region_dma_range *dma_ranges,
> > +				size_t nr_ranges, phys_addr_t start,
> > +				phys_addr_t len)
> > +{
> > +	phys_addr_t max_addr;
> > +	unsigned int i;
> > +
> > +	max_addr = start + len;
> > +	for (i = 0; i < nr_ranges; i++) {
> > +		phys_addr_t end;
> > +
> > +		if (!dma_ranges[i].length)
> > +			return -EINVAL;
>
> Looks redundant as there is already a check in validate_dmabuf_input().

Agree

> > +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32
> > flags,
> > +				  struct vfio_device_feature_dma_buf __user
> > *arg,
> > +				  size_t argsz)
> > +{
> > +	struct vfio_device_feature_dma_buf get_dma_buf = {};
> > +	struct vfio_region_dma_range *dma_ranges;
> > +	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> > +	struct vfio_pci_dma_buf *priv;
> > +	size_t length;
> > +	int ret;
> > +
> > +	if (!vdev->pci_ops || !vdev->pci_ops->get_dmabuf_phys)
> > +		return -EOPNOTSUPP;
> > +
> > +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> > +				 sizeof(get_dma_buf));
> > +	if (ret != 1)
> > +		return ret;
> > +
> > +	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
> > +		return -EFAULT;
> > +
> > +	if (!get_dma_buf.nr_ranges || get_dma_buf.flags)
> > +		return -EINVAL;
>
> unknown flag bits get -EOPNOTSUPP.

Agree

> > +
> > +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> > +{
> > +	struct vfio_pci_dma_buf *priv;
> > +	struct vfio_pci_dma_buf *tmp;
> > +
> > +	down_write(&vdev->memory_lock);
> > +	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm)
> > {
> > +		if (!get_file_active(&priv->dmabuf->file))
> > +			continue;
> > +
> > +		dma_resv_lock(priv->dmabuf->resv, NULL);
> > +		list_del_init(&priv->dmabufs_elm);
> > +		priv->vdev = NULL;
> > +		priv->revoked = true;
> > +		dma_buf_move_notify(priv->dmabuf);
> > +		dma_resv_unlock(priv->dmabuf->resv);
> > +		vfio_device_put_registration(&vdev->vdev);
> > +		fput(priv->dmabuf->file);
>
> dma_buf_put(priv->dmabuf), consistent with other places.

Someone else said this, I don't agree, the above got the get via
get_file_active() instead of a dma_buf version..

So we should pair with get_file_active() vs fput().

Christian rejected the idea of adding a dmabuf wrapper for
get_file_active(), oh well.

> > +struct vfio_device_feature_dma_buf {
> > +	__u32	region_index;
> > +	__u32	open_flags;
> > +	__u32	flags;
>
> Usually the 'flags' field is put in the start (following argsz if existing).

Yeah, but doesn't really matter.

Thanks,
Jason
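
The "refcount on the dmabuf" answer and the get_file_active()/fput() pairing both rest on a take-a-reference-only-if-still-alive pattern. A minimal userspace sketch of that pattern follows, using C11 atomics. `struct obj`, `obj_get_active()` and `obj_put()` are hypothetical names for illustration and are not the kernel's file reference implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; stands in for the dmabuf's file. */
struct obj {
	atomic_long refcount;
};

/* Take a reference only if the object is still alive (count > 0). */
static bool obj_get_active(struct obj *o)
{
	long old = atomic_load(&o->refcount);

	while (old > 0) {
		/* On failure 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;	/* already at zero: release is running or done */
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool obj_put(struct obj *o)
{
	long old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

In this model a cleanup loop that gets `false` from `obj_get_active()` skips the entry and leaves it to the release path, while one that gets `true` does the list removal itself and then calls `obj_put()`. That is the sense in which the refcount keeps both paths from removing the same entry.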

2 months, 4 weeks


Re: [PATCH 02/18] dma-buf: protected fence ops by RCU v3

by Christian König

On 11/14/25 11:50, Tvrtko Ursulin wrote:
>> @@ -569,12 +577,12 @@ void dma_fence_release(struct kref *kref)
>>  		spin_unlock_irqrestore(fence->lock, flags);
>>  	}
>> -	rcu_read_unlock();
>> -
>> -	if (fence->ops->release)
>> -		fence->ops->release(fence);
>> +	ops = rcu_dereference(fence->ops);
>> +	if (ops->release)
>> +		ops->release(fence);
>>  	else
>>  		dma_fence_free(fence);
>> +	rcu_read_unlock();
>
> Risk being a spin lock in the release callback will trigger a warning on PREEMPT_RT. But at least the current code base does not have anything like that AFAICS so I guess it is okay.

I don't think that this is a problem. When PREEMPT_RT is enabled both RCU and spinlocks become preemptible.

So as far as I know it is perfectly valid to grab a spinlock under an rcu read side critical section.

>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 64639e104110..77f07735f556 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -66,7 +66,7 @@ struct seq_file;
>>   */
>>  struct dma_fence {
>>  	spinlock_t *lock;
>> -	const struct dma_fence_ops *ops;
>> +	const struct dma_fence_ops __rcu *ops;
>>  	/*
>>  	 * We clear the callback list on kref_put so that by the time we
>>  	 * release the fence it is unused. No one should be adding to the
>> @@ -218,6 +218,10 @@ struct dma_fence_ops {
>>  	 * timed out. Can also return other error values on custom implementations,
>>  	 * which should be treated as if the fence is signaled. For example a hardware
>>  	 * lockup could be reported like that.
>> +	 *
>> +	 * Implementing this callback prevents the BO from detaching after
>
> s/BO/fence/
>
>> +	 * signaling and so it is mandatory for the module providing the
>> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
>>  	 */
>>  	signed long (*wait)(struct dma_fence *fence,
>>  			    bool intr, signed long timeout);
>> @@ -229,6 +233,13 @@ struct dma_fence_ops {
>>  	 * Can be called from irq context. This callback is optional. If it is
>>  	 * NULL, then dma_fence_free() is instead called as the default
>>  	 * implementation.
>> +	 *
>> +	 * Implementing this callback prevents the BO from detaching after
>
> Ditto.

Both fixed, thanks.

>
>> +	 * signaling and so it is mandatory for the module providing the
>> +	 * dma_fence_ops to stay loaded as long as the dma_fence exists.
>> +	 *
>> +	 * If the callback is implemented the memory backing the dma_fence
>> +	 * object must be freed RCU safe.
>>  	 */
>>  	void (*release)(struct dma_fence *fence);
>> @@ -418,13 +429,19 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence);
>>  static inline bool
>>  dma_fence_is_signaled_locked(struct dma_fence *fence)
>>  {
>> +	const struct dma_fence_ops *ops;
>> +
>>  	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>  		return true;
>> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
>> +	rcu_read_lock();
>> +	ops = rcu_dereference(fence->ops);
>> +	if (ops->signaled && ops->signaled(fence)) {
>> +		rcu_read_unlock();
>>  		dma_fence_signal_locked(fence);
>>  		return true;
>>  	}
>> +	rcu_read_unlock();
>>  	return false;
>>  }
>> @@ -448,13 +465,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>>  static inline bool
>>  dma_fence_is_signaled(struct dma_fence *fence)
>>  {
>> +	const struct dma_fence_ops *ops;
>> +
>>  	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>  		return true;
>> -	if (fence->ops->signaled && fence->ops->signaled(fence)) {
>> +	rcu_read_lock();
>> +	ops = rcu_dereference(fence->ops);
>> +	if (ops->signaled && ops->signaled(fence)) {
>> +		rcu_read_unlock();
>
> With the unlocked version two threads could race and one could make the fence->lock go away just around here, before the dma_fence_signal below will take it. It seems it is only safe to rcu_read_unlock before signaling if using the embedded fence (later in the series). Can you think of a downside to holding the rcu read lock to after signaling? that would make it safe I think.

Well it's good to talk about it but I think that it is not necessary to protect the lock in this particular case.

See the RCU protection is only for the fence->ops pointer, but the lock can be taken way after the fence is already signaled.

That's why I came up with the patch to move the lock into the fence in the first place.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>>  		dma_fence_signal(fence);
>>  		return true;
>>  	}
>> +	rcu_read_unlock();
>>  	return false;
>>  }
>

2 months, 4 weeks


Re: [PATCH 0/9] Initial DMABUF support for iommufd

by Jason Gunthorpe

On Thu, Nov 13, 2025 at 11:37:12AM -0700, Alex Williamson wrote:
> > The latest series for interconnect negotiation to exchange a phys_addr is:
> > https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com
>
> If this is in development, why are we pursuing a vfio specific
> temporary "private interconnect" here rather than building on that
> work? What are the gaps/barriers/timeline?

I broadly don't expect to see an agreement on the above for probably
half a year, and I see no reason to hold this up for it. Many people
are asking for this P2P support to be completed in iommufd.

Further, I think the above will be easier to work on when we have this
merged as an example that can consume it in a different way. Right now
it is too theoretical, IMHO.

> I don't see any uAPI changes here, is there any visibility to userspace
> whether IOMMUFD supports this feature or is it simply a try and fail
> approach?

So far we haven't done discoverable things beyond try and fail.

I'd be happy if the userspace folks doing libvirt or whatever came
with some requests/patches for discoverability. It is not just this
feature, but things like nesting and IOMMU driver support and so on.

> The latter makes it difficult for management tools to select
> whether to choose a VM configuration based on IOMMUFD or legacy vfio if
> p2p DMA is a requirement. Thanks,

In a lot of cases it isn't really a choice as you need iommufd to do an
accelerated vIOMMU.

But yes, it would be nice to eventually automatically use iommufd
whenever possible.

Thanks,
Jason

2 months, 4 weeks


Re: [PATCH 08/18] drm/sched: use inline locks for the drm-sched-fence

by Christian König

On 11/13/25 17:23, Philipp Stanner wrote:
> On Thu, 2025-11-13 at 15:51 +0100, Christian König wrote:
>> Using the inline lock is now the recommended way for dma_fence implementations.
>>
>> So use this approach for the scheduler fences as well just in case if
>> anybody uses this as blueprint for its own implementation.
>>
>> Also saves about 4 bytes for the external spinlock.
>
> So you changed your mind and want to keep this patch?

Actually it was you who changed my mind.

When we want to document that using the internal lock is now the norm and all implementations should switch to that if possible we should push as much as possible for using this in the driver common code as well.

Regards,
Christian.

>
> P.
>

2 months, 4 weeks


Re: Independence for dma_fences! v3

by Christian König

On 11/13/25 17:20, Philipp Stanner wrote:
> On Thu, 2025-11-13 at 15:51 +0100, Christian König wrote:
>> Hi everyone,
>>
>> dma_fences have ever lived under the tyranny dictated by the module
>> lifetime of their issuer, leading to crashes should anybody still holding
>> a reference to a dma_fence when the module of the issuer was unloaded.
>>
>> The basic problem is that when buffers are shared between drivers
>> dma_fence objects can leak into external drivers and stay there even
>> after they are signaled. The dma_resv object for example only lazy releases
>> dma_fences.
>>
>> So what happens is that when the module who originally created the dma_fence
>> unloads the dma_fence_ops function table becomes unavailable as well and so
>> any attempt to release the fence crashes the system.
>>
>> Previously various approaches have been discussed, including changing the
>> locking semantics of the dma_fence callbacks (by me) as well as using the
>> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
>> from their actual users, but none of them are actually solving all problems.
>>
>> Tvrtko did some really nice prerequisite work by protecting the returned
>> strings of the dma_fence_ops by RCU. This way dma_fence creators were
>> able to just wait for an RCU grace period after fence signaling before
>> they could be safe to free those data structures.
>>
>> Now this patch set here goes a step further and protects the whole
>> dma_fence_ops structure by RCU, so that after the fence signals the
>> pointer to the dma_fence_ops is set to NULL when there is no wait nor
>> release callback given. All functionality which use the dma_fence_ops
>> reference are put inside an RCU critical section, except for the
>> deprecated issuer specific wait and of course the optional release
>> callback.
>>
>> Additional to the RCU changes the lock protecting the dma_fence state
>> previously had to be allocated external. This set here now changes the
>> functionality to make that external lock optional and allows dma_fences
>> to use an inline lock and be self contained.
>>
>> This patch set addressed all previous code review comments and is based
>> on drm-tip, includes my changes for amdgpu as well as Matthew's patches for XE.
>>
>> Going to push the core DMA-buf changes to drm-misc-next as soon as I get
>> the appropriate rb. The driver specific changes can go upstream through
>> the driver channels as necessary.
>
> No changelog? :(

On the cover letter? For dma-buf patches we usually do that on the individual patches.

Christian.

>
> P.
>
>>
>> Please review and comment,
>> Christian.
>>
>>
>
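
The detach-on-signal idea from the cover letter can be modelled in userspace. The sketch below is an illustration under stated assumptions, not the series' code: `struct fence`, `struct fence_ops`, `fence_signal()` and `fence_is_signaled()` are hypothetical simplifications, and C11 atomics stand in for `rcu_dereference()` and the RCU assignment helpers. It does not model the grace period that makes unloading the issuer safe; it only shows the ops pointer being cleared when no wait or release callback exists, and readers loading the pointer once and tolerating NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fence;

/* Hypothetical, reduced version of an ops table. */
struct fence_ops {
	bool (*signaled)(struct fence *f);
	void (*wait)(struct fence *f);
	void (*release)(struct fence *f);
};

struct fence {
	_Atomic(const struct fence_ops *) ops;
	atomic_bool signaled_flag;
};

static void fence_signal(struct fence *f)
{
	const struct fence_ops *ops = atomic_load(&f->ops);

	atomic_store(&f->signaled_flag, true);

	/* Detach from the issuer only if no later callback needs the ops. */
	if (ops && !ops->wait && !ops->release)
		atomic_store(&f->ops, (const struct fence_ops *)NULL);
}

static bool fence_is_signaled(struct fence *f)
{
	const struct fence_ops *ops;

	if (atomic_load(&f->signaled_flag))
		return true;

	/* Load the pointer once; it may already have been cleared. */
	ops = atomic_load(&f->ops);
	if (ops && ops->signaled && ops->signaled(f)) {
		fence_signal(f);
		return true;
	}
	return false;
}

/* Example callbacks and ops tables, for illustration only. */
static bool hw_done(struct fence *f) { (void)f; return true; }
static void noop_release(struct fence *f) { (void)f; }

static const struct fence_ops detachable_ops = { .signaled = hw_done };
static const struct fence_ops pinned_ops = {
	.signaled = hw_done,
	.release = noop_release,
};
```

A fence using `detachable_ops` ends up with a NULL ops pointer once signaled, so nothing in it refers to the issuer's code any more. A fence using `pinned_ops` keeps its pointer, which corresponds to the documented requirement that a module providing wait or release callbacks stays loaded as long as its fences exist.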

2 months, 4 weeks


Re: [PATCH v2 18/20] drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_ttm_clear_buffer

by Christian König

On 11/13/25 17:05, Pierre-Eric Pelloux-Prayer wrote:
> This is the only use case for this function.
>
> ---
> v2: amdgpu_ttm_clear_buffer instead of amdgpu_clear_buffer
> ---
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>

Reviewed-by: Christian König <christian.koenig(a)amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  8 +++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 26 ++++++++++------------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h    | 15 ++++++-------
>  3 files changed, 23 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 4490b19752b8..4b9518097899 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -725,8 +725,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>  	    bo->tbo.resource->mem_type == TTM_PL_VRAM) {
>  		struct dma_fence *fence;
>
> -		r = amdgpu_fill_buffer(NULL, bo, 0, NULL, &fence, NULL,
> -				       true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> +		r = amdgpu_ttm_clear_buffer(NULL, bo, NULL, &fence, NULL,
> +					    true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
>  		if (unlikely(r))
>  			goto fail_unreserve;
>
> @@ -1324,8 +1324,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
>  	if (r)
>  		goto out;
>
> -	r = amdgpu_fill_buffer(NULL, abo, 0, &bo->base._resv, &fence, NULL,
> -			       false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> +	r = amdgpu_ttm_clear_buffer(NULL, abo, &bo->base._resv, &fence, NULL,
> +				    false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
>  	if (WARN_ON(r))
>  		goto out;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index df05768c3817..0a55bc4ea91f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -433,9 +433,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
>  	    (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
>  		struct dma_fence *wipe_fence = NULL;
>
> -		r = amdgpu_fill_buffer(entity,
> -				       abo, 0, NULL, &wipe_fence, fence,
> -				       false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> +		r = amdgpu_ttm_clear_buffer(entity,
> +					    abo, NULL, &wipe_fence, fence,
> +					    false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
>  		if (r) {
>  			goto error;
>  		} else if (wipe_fence) {
> @@ -2418,11 +2418,10 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring,
>  }
>
>  /**
> - * amdgpu_fill_buffer - fill a buffer with a given value
> + * amdgpu_ttm_clear_buffer - fill a buffer with 0
>   * @entity: optional entity to use. If NULL, the clearing entities will be
>   *          used to load-balance the partial clears
>   * @bo: the bo to fill
> - * @src_data: the value to set
>   * @resv: fences contained in this reservation will be used as dependencies.
>   * @out_fence: the fence from the last clear will be stored here. It might be
>   *             NULL if no job was run.
> @@ -2432,14 +2431,13 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring,
>   * @k_job_id: trace id
>   *
>   */
> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> -		       struct amdgpu_bo *bo,
> -		       uint32_t src_data,
> -		       struct dma_resv *resv,
> -		       struct dma_fence **out_fence,
> -		       struct dma_fence *dependency,
> -		       bool consider_clear_status,
> -		       u64 k_job_id)
> +int amdgpu_ttm_clear_buffer(struct amdgpu_ttm_buffer_entity *entity,
> +			    struct amdgpu_bo *bo,
> +			    struct dma_resv *resv,
> +			    struct dma_fence **out_fence,
> +			    struct dma_fence *dependency,
> +			    bool consider_clear_status,
> +			    u64 k_job_id)
>  {
>  	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>  	struct dma_fence *fence = NULL;
> @@ -2486,7 +2484,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
>  			goto error;
>
>  		r = amdgpu_ttm_fill_mem(ring, &entity->base,
> -					src_data, to, cur_size, resv,
> +					0, to, cur_size, resv,
>  					&next, true, k_job_id);
>  		if (r)
>  			goto error;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index e01c2173d79f..585aee9a173b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -181,14 +181,13 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring,
>  		       struct dma_resv *resv,
>  		       struct dma_fence **fence,
>  		       bool vm_needs_flush, uint32_t copy_flags);
> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> -		       struct amdgpu_bo *bo,
> -		       uint32_t src_data,
> -		       struct dma_resv *resv,
> -		       struct dma_fence **out_fence,
> -		       struct dma_fence *dependency,
> -		       bool consider_clear_status,
> -		       u64 k_job_id);
> +int amdgpu_ttm_clear_buffer(struct amdgpu_ttm_buffer_entity *entity,
> +			    struct amdgpu_bo *bo,
> +			    struct dma_resv *resv,
> +			    struct dma_fence **out_fence,
> +			    struct dma_fence *dependency,
> +			    bool consider_clear_status,
> +			    u64 k_job_id);
>
>  int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
>  void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);

2 months, 4 weeks

