The drm/ttm patch modifies TTM to support multiple contexts for pipelined
moves.
amdgpu/ttm is then updated to express dependencies between jobs explicitly,
instead of relying on the execution ordering guaranteed by the use of a
single instance.
With all of this in place, we can use multiple entities, each having access
to the available SDMA instances.
This rework is also an opportunity to merge the clear functions into a
single one and to optimize GART usage a bit.
Since v3, some patches have already been reviewed and merged separately:
- https://lists.freedesktop.org/archives/amd-gfx/2026-January/137747.html
- https://gitlab.freedesktop.org/drm/kernel/-/commit/ddf055b80a544d6f36f77be5…
This version depends on them.
v3: https://lists.freedesktop.org/archives/dri-devel/2025-November/537830.html
Pierre-Eric Pelloux-Prayer (12):
drm/amdgpu: allocate clear entities dynamically
drm/amdgpu: allocate move entities dynamically
drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer
drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences
drm/amdgpu: use multiple entities in amdgpu_move_blit
drm/amdgpu: pass all the sdma scheds to amdgpu_mman
drm/amdgpu: only use working sdma schedulers for ttm
drm/amdgpu: create multiple clear/move ttm entities
drm/amdgpu: give ttm entities access to all the sdma scheds
drm/amdgpu: get rid of amdgpu_ttm_clear_buffer
drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_ttm_clear_buffer
drm/amdgpu: split amdgpu_ttm_set_buffer_funcs_status in 2 funcs
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 17 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 329 ++++++++++--------
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 29 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 6 +-
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 13 +-
drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 8 +-
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 8 +-
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 15 +-
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 12 +-
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 11 +-
drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 14 +-
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 5 +-
drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 5 +-
drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c | 12 +-
drivers/gpu/drm/amd/amdgpu/si_dma.c | 12 +-
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 5 +-
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 +-
.../amd/display/amdgpu_dm/amdgpu_dm_plane.c | 6 +-
.../drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c | 6 +-
23 files changed, 300 insertions(+), 243 deletions(-)
--
2.43.0
The cma dma-buf heaps let userspace allocate buffers in CMA regions
without enforcing limits. Register a dmem region per cma heap and charge
against it when allocating a buffer in a cma heap.
For the default cma region, two heaps may be created for the same cma
range:
- commit 854acbe75ff4 ("dma-buf: heaps: Give default CMA heap a fixed
  name") introduced /dev/dma_heap/default_cma_region;
- commit 4f5f8baf7341 ("dma-buf: heaps: cma: Create CMA heap for each
  CMA reserved region") created a CMA heap for each CMA region, which
  might create a duplicate of the default heap, e.g.:
	/dev/dma_heap/default_cma_region
	/dev/dma_heap/reserved
Removing the legacy heap would break the user API. So handle this
special case by sharing one dmem region between the two heaps to
account charges correctly.
Signed-off-by: Eric Chanudet <echanude(a)redhat.com>
---
Following the introduction of cgroup accounting for the system heap [1],
this behavior is gated on dma_heap.mem_accounting and is disabled by
default.
dmem is chosen for CMA heaps as it allows limits to be set for the
region backing each heap. There is one caveat: the default cma range may
be accessible through two different cma heaps, which is treated as a
special case.
[1] https://lore.kernel.org/all/20260116-dmabuf-heap-system-memcg-v3-0-ecc6b62c…
---
drivers/dma-buf/heaps/cma_heap.c | 51 ++++++++++++++++++++++++++++++++++++----
1 file changed, 46 insertions(+), 5 deletions(-)
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 49cc45fb42dd7200c3c14384bcfdbe85323454b1..608af8ad6bce7fe0321da6d8f1b65a69f5d8d950 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -27,6 +27,7 @@
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
+#include <linux/cgroup_dmem.h>
#define DEFAULT_CMA_NAME "default_cma_region"
@@ -46,7 +47,9 @@ int __init dma_heap_cma_register_heap(struct cma *cma)
struct cma_heap {
struct dma_heap *heap;
struct cma *cma;
+ struct dmem_cgroup_region *cg;
};
+static struct dmem_cgroup_region *default_cma_cg;
struct cma_heap_buffer {
struct cma_heap *heap;
@@ -58,6 +61,7 @@ struct cma_heap_buffer {
pgoff_t pagecount;
int vmap_cnt;
void *vaddr;
+ struct dmem_cgroup_pool_state *pool;
};
struct dma_heap_attachment {
@@ -276,6 +280,7 @@ static void cma_heap_dma_buf_release(struct dma_buf *dmabuf)
kfree(buffer->pages);
/* release memory */
cma_release(cma_heap->cma, buffer->cma_pages, buffer->pagecount);
+ dmem_cgroup_uncharge(buffer->pool, buffer->len);
kfree(buffer);
}
@@ -319,9 +324,16 @@ static struct dma_buf *cma_heap_allocate(struct dma_heap *heap,
if (align > CONFIG_CMA_ALIGNMENT)
align = CONFIG_CMA_ALIGNMENT;
+ if (mem_accounting) {
+ ret = dmem_cgroup_try_charge(cma_heap->cg, size,
+ &buffer->pool, NULL);
+ if (ret)
+ goto free_buffer;
+ }
+
cma_pages = cma_alloc(cma_heap->cma, pagecount, align, false);
if (!cma_pages)
- goto free_buffer;
+ goto uncharge_cgroup;
/* Clear the cma pages */
if (PageHighMem(cma_pages)) {
@@ -376,6 +388,8 @@ static struct dma_buf *cma_heap_allocate(struct dma_heap *heap,
kfree(buffer->pages);
free_cma:
cma_release(cma_heap->cma, cma_pages, pagecount);
+uncharge_cgroup:
+ dmem_cgroup_uncharge(buffer->pool, size);
free_buffer:
kfree(buffer);
@@ -390,25 +404,52 @@ static int __init __add_cma_heap(struct cma *cma, const char *name)
{
struct dma_heap_export_info exp_info;
struct cma_heap *cma_heap;
+ struct dmem_cgroup_region *region;
+ int ret;
cma_heap = kzalloc(sizeof(*cma_heap), GFP_KERNEL);
if (!cma_heap)
return -ENOMEM;
cma_heap->cma = cma;
+ /*
+ * If two heaps are created for the default cma region, use the same
+ * dmem for them. They both use the same memory pool.
+ */
+ if (dev_get_cma_area(NULL) == cma && default_cma_cg)
+ region = default_cma_cg;
+ else {
+ region = dmem_cgroup_register_region(cma_get_size(cma), "cma/%s", name);
+ if (IS_ERR(region)) {
+ ret = PTR_ERR(region);
+ goto free_cma_heap;
+ }
+ }
+ cma_heap->cg = region;
+
exp_info.name = name;
exp_info.ops = &cma_heap_ops;
exp_info.priv = cma_heap;
cma_heap->heap = dma_heap_add(&exp_info);
if (IS_ERR(cma_heap->heap)) {
- int ret = PTR_ERR(cma_heap->heap);
-
- kfree(cma_heap);
- return ret;
+ ret = PTR_ERR(cma_heap->heap);
+ goto cg_unregister;
}
+ if (dev_get_cma_area(NULL) == cma && !default_cma_cg)
+ default_cma_cg = region;
+
return 0;
+
+cg_unregister:
+ /* default_cma_cg == cma_heap->cg only for the duplicate heap. */
+ if (default_cma_cg != cma_heap->cg)
+ dmem_cgroup_unregister_region(cma_heap->cg);
+free_cma_heap:
+ kfree(cma_heap);
+
+ return ret;
}
static int __init add_cma_heaps(void)
---
base-commit: 3d65e4c276b32c03450261d114e495fda03c8e97
change-id: 20260128-dmabuf-heap-cma-dmem-f4120a2df4a8
Best regards,
--
Eric Chanudet <echanude(a)redhat.com>
Changelog:
v6:
* Added Reviewed-by tags.
* Changed to a blocking wait_for_completion() in VFIO.
* Fixed a race between ->attach and move_notify, where priv->revoked is
flipped and the lock is released.
v5: https://patch.msgid.link/20260124-dmabuf-revoke-v5-0-f98fca917e96@nvidia.com
* Documented the DMA-BUF expectations around DMA unmap.
* Added wait support in VFIO for DMA unmap.
* Reordered patches.
* Improved commit messages to document even more.
v4: https://lore.kernel.org/all/20260121-dmabuf-revoke-v4-0-d311cbc8633d@nvidia…
* Changed DMA_RESV_USAGE_KERNEL to DMA_RESV_USAGE_BOOKKEEP.
* Made .invalidate_mapping() truly optional.
* Added patch which renames dma_buf_move_notify() to be
dma_buf_invalidate_mappings().
* Restored dma_buf_attachment_is_dynamic() function.
v3: https://lore.kernel.org/all/20260120-dmabuf-revoke-v3-0-b7e0b07b8214@nvidia…
* Used Jason's wordings for commits and cover letter.
* Removed IOMMUFD patch.
* Renamed dma_buf_attachment_is_revoke() to be dma_buf_attach_revocable().
* Added patch to remove CONFIG_DMABUF_MOVE_NOTIFY.
* Added Reviewed-by tags.
* Called to dma_resv_wait_timeout() after dma_buf_move_notify() in VFIO.
* Added dma_buf_attach_revocable() check to VFIO DMABUF attach function.
* Slightly changed commit messages.
v2: https://patch.msgid.link/20260118-dmabuf-revoke-v2-0-a03bb27c0875@nvidia.com
* Changed series to document the revoke semantics instead of
implementing it.
v1: https://patch.msgid.link/20260111-dmabuf-revoke-v1-0-fb4bcc8c259b@nvidia.com
-------------------------------------------------------------------------
This series is based on the latest VFIO fix, which will be sent to Linus
very soon.
https://lore.kernel.org/all/20260121-vfio-add-pin-v1-1-4e04916b17f1@nvidia.…
Thanks
-------------------------------------------------------------------------
This series documents a dma-buf “revoke” mechanism: it allows a dma-buf
exporter to explicitly invalidate (“kill”) a shared buffer after it has
been distributed to importers, so that further CPU and device access is
prevented and importers reliably observe failure.
The change in this series is to properly document and use the existing
core “revoked” state on the dma-buf object and a corresponding
exporter-triggered revoke operation.
dma-buf has quietly allowed calling move_notify on pinned dma-bufs, even
though legacy importers using dma_buf_attach() would simply ignore
these calls.
The intention was that move_notify() would tell the importer to expedite
its unmapping process; once the importer is fully finished with DMA, it
unmaps the dma-buf, which finally signals that the importer is never
going to touch the memory again. Importers that touch the memory past
their unmap() call can trigger IOMMU errors, AER and beyond; however,
read-and-discard access between move_notify() and unmap is allowed.
Thus, we can define the exporter's revoke sequence for a pinned dma-buf as:

	dma_resv_lock(dmabuf->resv, NULL);
	/* Prevent new mappings from being established */
	priv->revoked = true;
	/* Tell all importers to eventually unmap */
	dma_buf_invalidate_mappings(dmabuf);
	/* Wait for any in-progress fences on the old mapping */
	dma_resv_wait_timeout(dmabuf->resv,
			      DMA_RESV_USAGE_BOOKKEEP, false,
			      MAX_SCHEDULE_TIMEOUT);
	dma_resv_unlock(dmabuf->resv);
	/* Wait for all importers to complete their unmap */
	wait_for_completion(&priv->unmap_comp);
However, dma-buf also supports importers that do nothing on
move_notify() and will not unmap the buffer in bounded time.
Since such importers would cause the above sequence to hang, a new
mechanism is needed to detect incompatible importers.
Introduce dma_buf_attach_revocable(), which, when it returns true,
indicates that the above sequence is safe to use and will complete in
kernel-only bounded time for this attachment.
Unfortunately, dma_buf_attach_revocable() is going to fail for the
popular RDMA pinned importer, which means we cannot introduce it in
existing places using pinned move_notify() without potentially breaking
existing userspace flows.
Existing exporters that only trigger this flow for RAS errors should not
call dma_buf_attach_revocable() and will suffer an unbounded block on
the final completion, hoping that userspace will notice the RAS event
and clean things up. Without revoke support in the RDMA pinned
importers, no other non-breaking option currently seems possible.
For new exporters, like VFIO and RDMA, that have userspace-triggered
revoke events, the unbounded sleep would not be acceptable. They can
call dma_buf_attach_revocable() and will not work with the RDMA pinned
importer from day 0, preventing regressions.
In the process add documentation explaining the above details.
Thanks
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
---
Leon Romanovsky (8):
dma-buf: Rename .move_notify() callback to a clearer identifier
dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings()
dma-buf: Always build with DMABUF_MOVE_NOTIFY
vfio: Wait for dma-buf invalidation to complete
dma-buf: Make .invalidate_mapping() truly optional
dma-buf: Add dma_buf_attach_revocable()
vfio: Permit VFIO to work with pinned importers
iommufd: Add dma_buf_pin()
drivers/dma-buf/Kconfig | 12 -----
drivers/dma-buf/dma-buf.c | 69 +++++++++++++++++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 ++---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
drivers/gpu/drm/amd/amdkfd/Kconfig | 2 +-
drivers/gpu/drm/virtio/virtgpu_prime.c | 2 +-
drivers/gpu/drm/xe/tests/xe_dma_buf.c | 7 ++-
drivers/gpu/drm/xe/xe_bo.c | 2 +-
drivers/gpu/drm/xe/xe_dma_buf.c | 14 ++---
drivers/infiniband/core/umem_dmabuf.c | 13 -----
drivers/infiniband/hw/mlx5/mr.c | 2 +-
drivers/iommu/iommufd/pages.c | 11 +++-
drivers/iommu/iommufd/selftest.c | 2 +-
drivers/vfio/pci/vfio_pci_dmabuf.c | 84 ++++++++++++++++++++++-------
include/linux/dma-buf.h | 17 +++---
15 files changed, 157 insertions(+), 96 deletions(-)
---
base-commit: 61ceaf236115f20f4fdd7cf60f883ada1063349a
change-id: 20251221-dmabuf-revoke-b90ef16e4236
Best regards,
--
Leon Romanovsky <leonro(a)nvidia.com>
In a typical dma-buf use case, a dma-buf exporter makes its buffer
available to an importer by mapping it using DMA APIs such as
dma_map_sgtable() or dma_map_resource(). However, this is not desirable
in some cases where the exporter and importer are directly connected via
a physical or virtual link (or interconnect) and the importer can access
the buffer without having it DMA mapped.
To address this scenario, this patch series adds APIs to map/unmap
dmabufs via interconnects and also provides a helper to identify the
first common interconnect between the exporter and importer. It also
adds support for an IOV interconnect in the vfio-pci driver and the
Intel Xe driver.
The IOV interconnect is a virtual interconnect between an SRIOV
physical function (PF) and its virtual functions (VFs). For the IOV
interconnect, the addresses associated with a buffer are shared using an
xarray (instead of an sg_table) that is populated with entries of type
struct range.
The dma-buf patches in this series are based on ideas/suggestions
provided by Jason Gunthorpe, Christian Koenig and Thomas Hellström.
Changelog:
RFC -> RFCv2:
- Add documentation for the new dma-buf APIs and types (Thomas)
- Change the interconnect type from enum to unique pointer (Thomas)
- Moved the new dma-buf APIs to a separate file
- Store a copy of the interconnect matching data in the attachment
- Simplified the macros to create and match interconnects
- Use struct device instead of struct pci_dev in match data
- Replace DRM_INTERCONNECT_DRIVER with XE_INTERCONNECT_VRAM during
address encoding (Matt, Thomas)
- Drop is_devmem_external and instead rely on bo->dma_data.dma_addr
to check for imported VRAM BOs (Matt)
- Pass XE_PAGE_SIZE as the last parameter to xe_bo_addr (Matt)
- Add a check to prevent malicious VF from accessing other VF's
addresses (Thomas)
- Fall back to the legacy (map_dma_buf) mapping method if mapping via
an interconnect fails
Patchset overview:
Patch 1-3: Add dma-buf APIs to map/unmap and match
Patch 4: Add support for IOV interconnect in vfio-pci driver
Patch 5: Add support for IOV interconnect in Xe driver
Patch 6-8: Create and use a new dma_addr array for LMEM based
dmabuf BOs to store translated addresses (DPAs)
This series is rebased on top of the following repo:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=…
Associated Qemu patch series:
https://lore.kernel.org/qemu-devel/20251003234138.85820-1-vivek.kasireddy@i…
Associated vfio-pci patch series:
https://lore.kernel.org/dri-devel/cover.1760368250.git.leon@kernel.org/
This series is tested using the following method:
- Run Qemu with the following relevant options:
qemu-system-x86_64 -m 4096m ....
-device ioh3420,id=root_port1,bus=pcie.0
-device x3130-upstream,id=upstream1,bus=root_port1
-device xio3130-downstream,id=downstream1,bus=upstream1,chassis=9
-device xio3130-downstream,id=downstream2,bus=upstream1,chassis=10
-device vfio-pci,host=0000:03:00.1,bus=downstream1
-device virtio-gpu,max_outputs=1,blob=true,xres=1920,yres=1080,bus=downstream2
-display gtk,gl=on
-object memory-backend-memfd,id=mem1,size=4096M
-machine q35,accel=kvm,memory-backend=mem1 ...
- Run Gnome Wayland with the following options in the Guest VM:
# cat /usr/lib/udev/rules.d/61-mutter-primary-gpu.rules
ENV{DEVNAME}=="/dev/dri/card1", TAG+="mutter-device-preferred-primary", TAG+="mutter-device-disable-kms-modifiers"
# XDG_SESSION_TYPE=wayland dbus-run-session -- /usr/bin/gnome-shell --wayland --no-x11 &
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Leon Romanovsky <leonro(a)nvidia.com>
Cc: Christian Koenig <christian.koenig(a)amd.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Simona Vetter <simona.vetter(a)ffwll.ch>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Dongwon Kim <dongwon.kim(a)intel.com>
Vivek Kasireddy (8):
dma-buf: Add support for map/unmap APIs for interconnects
dma-buf: Add a helper to match interconnects between exporter/importer
dma-buf: Create and expose IOV interconnect to all exporters/importers
vfio/pci/dmabuf: Add support for IOV interconnect
drm/xe/dma_buf: Add support for IOV interconnect
drm/xe/pf: Add a helper function to get a VF's backing object in LMEM
drm/xe/bo: Create new dma_addr array for dmabuf BOs associated with
VFs
drm/xe/pt: Add an additional check for dmabuf BOs while doing bind
drivers/dma-buf/Makefile | 2 +-
drivers/dma-buf/dma-buf-interconnect.c | 164 +++++++++++++++++++++
drivers/dma-buf/dma-buf.c | 12 +-
drivers/gpu/drm/xe/xe_bo.c | 162 ++++++++++++++++++--
drivers/gpu/drm/xe/xe_bo_types.h | 6 +
drivers/gpu/drm/xe/xe_dma_buf.c | 17 ++-
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.c | 24 +++
drivers/gpu/drm/xe/xe_gt_sriov_pf_config.h | 1 +
drivers/gpu/drm/xe/xe_pt.c | 8 +
drivers/gpu/drm/xe/xe_sriov_pf_types.h | 19 +++
drivers/vfio/pci/vfio_pci_dmabuf.c | 135 ++++++++++++++++-
include/linux/dma-buf-interconnect.h | 122 +++++++++++++++
include/linux/dma-buf.h | 41 ++++++
13 files changed, 691 insertions(+), 22 deletions(-)
create mode 100644 drivers/dma-buf/dma-buf-interconnect.c
create mode 100644 include/linux/dma-buf-interconnect.h
--
2.50.1
This patch series introduces dma-buf export support for RDMA/InfiniBand
devices, enabling userspace applications to export RDMA PCI-backed
memory regions (such as device memory or mlx5 UAR pages) as dma-buf file
descriptors.
This allows PCI device memory to be shared with other kernel subsystems
(e.g., graphics or media) or between userspace processes, via the
standard dma-buf interface, avoiding unnecessary copies and enabling
efficient peer-to-peer (P2P) DMA transfers. See [1] for background on
dma-buf.
As part of this series, we introduce a new uverbs object of type FD for
dma-buf export, along with the corresponding APIs for allocation and
teardown. This object encapsulates all attributes required to export a
dma-buf.
The implementation enforces P2P-only mappings and properly manages
resource lifecycle, including:
- Cleanup during driver removal or RDMA context destruction.
- Revocation via dma_buf_move_notify() when the underlying mmap entries
are removed.
- Refactoring of common cleanup logic for reuse across FD uobject types.
The infrastructure is generic within uverbs, allowing individual drivers
to easily integrate and supply their vendor-specific implementation.
The mlx5 driver is the first consumer of this new API, providing:
- Initialization of PCI peer-to-peer DMA support.
- mlx5-specific implementations of the mmap_get_pfns and
pgoff_to_mmap_entry device operations required for dma-buf export.
[1] https://docs.kernel.org/driver-api/dma-buf.html
Signed-off-by: Yishai Hadas <yishaih(a)nvidia.com>
Signed-off-by: Edward Srouji <edwards(a)nvidia.com>
---
Changes in v2:
- Split the FD uobject refactoring into a separate patch
("RDMA: Add support for exporting dma-buf file descriptors")
- Remove redundant revoked check from attach callback. It is checked
during map
- Add pin callback that returns -EOPNOTSUPP to explicitly refuse pinned
importers
- Wait for pending fences after dma_buf_move_notify() using
dma_resv_wait_timeout() to ensure hardware has completed all in-flight
operations before proceeding
- Link to v1: https://lore.kernel.org/r/20260108-dmabuf-export-v1-0-6d47d46580d3@nvidia.c…
---
Yishai Hadas (3):
RDMA/uverbs: Support external FD uobjects
RDMA/uverbs: Add DMABUF object type and operations
RDMA/mlx5: Implement DMABUF export ops
drivers/infiniband/core/Makefile | 1 +
drivers/infiniband/core/device.c | 2 +
drivers/infiniband/core/ib_core_uverbs.c | 22 +++
drivers/infiniband/core/rdma_core.c | 63 ++++----
drivers/infiniband/core/rdma_core.h | 1 +
drivers/infiniband/core/uverbs.h | 10 ++
drivers/infiniband/core/uverbs_std_types_dmabuf.c | 176 ++++++++++++++++++++++
drivers/infiniband/core/uverbs_uapi.c | 1 +
drivers/infiniband/hw/mlx5/main.c | 72 +++++++++
include/rdma/ib_verbs.h | 9 ++
include/rdma/uverbs_types.h | 1 +
include/uapi/rdma/ib_user_ioctl_cmds.h | 10 ++
12 files changed, 342 insertions(+), 26 deletions(-)
---
base-commit: 325e3b5431ddd27c5f93156b36838a351e3b2f72
change-id: 20260108-dmabuf-export-0d598058dd1e
Best regards,
--
Edward Srouji <edwards(a)nvidia.com>
Changelog:
v3:
* Used Jason's wordings for commits and cover letter.
* Removed IOMMUFD patch.
* Renamed dma_buf_attachment_is_revoke() to be dma_buf_attach_revocable().
* Added patch to remove CONFIG_DMABUF_MOVE_NOTIFY.
* Added Reviewed-by tags.
* Called to dma_resv_wait_timeout() after dma_buf_move_notify() in VFIO.
* Added dma_buf_attach_revocable() check to VFIO DMABUF attach function.
* Slightly changed commit messages.
v2: https://patch.msgid.link/20260118-dmabuf-revoke-v2-0-a03bb27c0875@nvidia.com
* Changed series to document the revoke semantics instead of
implementing it.
v1: https://patch.msgid.link/20260111-dmabuf-revoke-v1-0-fb4bcc8c259b@nvidia.com
-------------------------------------------------------------------------
This series documents a dma-buf “revoke” mechanism: to allow a dma-buf
exporter to explicitly invalidate (“kill”) a shared buffer after it has
been distributed to importers, so that further CPU and device access is
prevented and importers reliably observe failure.
The change in this series is to properly document and use existing core
“revoked” state on the dma-buf object and a corresponding exporter-triggered
revoke operation.
dma-buf has quietly allowed calling move_notify on pinned dma-bufs, even
though legacy importers using dma_buf_attach() would simply ignore
these calls.
RDMA saw this and needed allow_peer2peer=true, so it implemented a
new-style pinned importer with an explicitly non-working move_notify()
callback.
This has been tolerable because the existing exporters are thought to
only call move_notify() on a pinned DMABUF under RAS events and we
have been willing to tolerate the UAF that results by allowing the
importer to continue to use the mapping in this rare case.
VFIO wants to implement a pin supporting exporter that will issue a
revoking move_notify() around FLRs and a few other user triggerable
operations. Since this is much more common we are not willing to
tolerate the security UAF caused by interworking with non-move_notify()
supporting drivers. Thus till now VFIO has required dynamic importers,
even though it never actually moves the buffer location.
To allow VFIO to work with pinned importers, as dma-buf was intended,
VFIO needs to detect whether an importer is legacy or RDMA-style and
does not actually implement move_notify().
Introduce a new function that exporters can call to detect these less
capable importers. VFIO can then refuse to accept them during attach.
In theory, all exporters that call move_notify() on pinned dma-bufs
should call this function; however, that would break a number of widely
used NIC/GPU flows. Thus, for now, do not spread this further than VFIO
until we can understand how much of RDMA can implement the full
semantic.
In the process clarify how move_notify is intended to be used with
pinned dma-bufs.
Thanks
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
---
Leon Romanovsky (7):
dma-buf: Rename .move_notify() callback to a clearer identifier
dma-buf: Always build with DMABUF_MOVE_NOTIFY
dma-buf: Document RDMA non-ODP invalidate_mapping() special case
dma-buf: Add check function for revoke semantics
iommufd: Pin dma-buf importer for revoke semantics
vfio: Wait for dma-buf invalidation to complete
vfio: Validate dma-buf revocation semantics
drivers/dma-buf/Kconfig | 12 -----
drivers/dma-buf/dma-buf.c | 69 +++++++++++++++++++++++------
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 +++---
drivers/gpu/drm/amd/amdkfd/Kconfig | 2 +-
drivers/gpu/drm/virtio/virtgpu_prime.c | 2 +-
drivers/gpu/drm/xe/tests/xe_dma_buf.c | 7 ++-
drivers/gpu/drm/xe/xe_dma_buf.c | 14 +++---
drivers/infiniband/core/umem_dmabuf.c | 13 +-----
drivers/infiniband/hw/mlx5/mr.c | 2 +-
drivers/iommu/iommufd/pages.c | 11 ++++-
drivers/vfio/pci/vfio_pci_dmabuf.c | 8 ++++
include/linux/dma-buf.h | 9 ++--
12 files changed, 96 insertions(+), 67 deletions(-)
---
base-commit: 9ace4753a5202b02191d54e9fdf7f9e3d02b85eb
change-id: 20251221-dmabuf-revoke-b90ef16e4236
Best regards,
--
Leon Romanovsky <leonro(a)nvidia.com>
This patch series introduces dma-buf export support for RDMA/InfiniBand
devices, enabling userspace applications to export RDMA PCI-backed
memory regions (such as device memory or mlx5 UAR pages) as dma-buf file
descriptors.
This allows PCI device memory to be shared with other kernel subsystems
(e.g., graphics or media) or between userspace processes, via the
standard dma-buf interface, avoiding unnecessary copies and enabling
efficient peer-to-peer (P2P) DMA transfers. See [1] for background on
dma-buf.
As part of this series, we introduce a new uverbs object of type FD for
dma-buf export, along with the corresponding APIs for allocation and
teardown. This object encapsulates all attributes required to export a
dma-buf.
The implementation enforces P2P-only mappings and properly manages
resource lifecycle, including:
- Cleanup during driver removal or RDMA context destruction.
- Revocation via dma_buf_move_notify() when the underlying mmap entries
are removed.
- Refactoring of common cleanup logic for reuse across FD uobject types.
The infrastructure is generic within uverbs, allowing individual drivers
to easily integrate and supply their vendor-specific implementation.
The mlx5 driver is the first consumer of this new API, providing:
- Initialization of PCI peer-to-peer DMA support.
- mlx5-specific implementations of the mmap_get_pfns and
pgoff_to_mmap_entry device operations required for dma-buf export.
[1] https://docs.kernel.org/driver-api/dma-buf.html
Signed-off-by: Yishai Hadas <yishaih(a)nvidia.com>
Signed-off-by: Edward Srouji <edwards(a)nvidia.com>
---
Yishai Hadas (2):
RDMA/uverbs: Add DMABUF object type and operations
RDMA/mlx5: Implement DMABUF export ops
drivers/infiniband/core/Makefile | 1 +
drivers/infiniband/core/device.c | 2 +
drivers/infiniband/core/ib_core_uverbs.c | 19 +++
drivers/infiniband/core/rdma_core.c | 63 ++++----
drivers/infiniband/core/rdma_core.h | 1 +
drivers/infiniband/core/uverbs.h | 10 ++
drivers/infiniband/core/uverbs_std_types_dmabuf.c | 172 ++++++++++++++++++++++
drivers/infiniband/core/uverbs_uapi.c | 1 +
drivers/infiniband/hw/mlx5/main.c | 72 +++++++++
include/rdma/ib_verbs.h | 9 ++
include/rdma/uverbs_types.h | 1 +
include/uapi/rdma/ib_user_ioctl_cmds.h | 10 ++
12 files changed, 335 insertions(+), 26 deletions(-)
---
base-commit: 325e3b5431ddd27c5f93156b36838a351e3b2f72
change-id: 20260108-dmabuf-export-0d598058dd1e
Best regards,
--
Edward Srouji <edwards(a)nvidia.com>
Changelog:
v4:
* Changed DMA_RESV_USAGE_KERNEL to DMA_RESV_USAGE_BOOKKEEP.
* Made .invalidate_mapping() truly optional.
* Added a patch that renames dma_buf_move_notify() to
dma_buf_invalidate_mappings().
* Restored dma_buf_attachment_is_dynamic() function.
v3: https://lore.kernel.org/all/20260120-dmabuf-revoke-v3-0-b7e0b07b8214@nvidia…
* Used Jason's wordings for commits and cover letter.
* Removed IOMMUFD patch.
* Renamed dma_buf_attachment_is_revoke() to dma_buf_attach_revocable().
* Added patch to remove CONFIG_DMABUF_MOVE_NOTIFY.
* Added Reviewed-by tags.
* Called dma_resv_wait_timeout() after dma_buf_move_notify() in VFIO.
* Added dma_buf_attach_revocable() check to VFIO DMABUF attach function.
* Slightly changed commit messages.
v2: https://patch.msgid.link/20260118-dmabuf-revoke-v2-0-a03bb27c0875@nvidia.com
* Changed series to document the revoke semantics instead of
implementing it.
v1: https://patch.msgid.link/20260111-dmabuf-revoke-v1-0-fb4bcc8c259b@nvidia.com
-------------------------------------------------------------------------
This series documents a dma-buf “revoke” mechanism: it allows a dma-buf
exporter to explicitly invalidate (“kill”) a shared buffer after it has
been distributed to importers, so that further CPU and device access is
prevented and importers reliably observe failure.
The change in this series is to properly document and use the existing
core “revoked” state on the dma-buf object and a corresponding
exporter-triggered revoke operation.
dma-buf has quietly allowed calling move_notify() on pinned dma-bufs,
even though legacy importers using dma_buf_attach() would simply ignore
these calls.
RDMA saw this and needed to use allow_peer2peer=true, so it implemented
a new-style pinned importer with an explicitly non-working move_notify()
callback.
This has been tolerable because the existing exporters are thought to
only call move_notify() on a pinned DMABUF under RAS events, and we have
been willing to tolerate the resulting UAF by allowing the importer to
continue using the mapping in this rare case.
VFIO wants to implement a pin-supporting exporter that will issue a
revoking move_notify() around FLRs and a few other user-triggerable
operations. Since this is much more common, we are not willing to
tolerate the security UAF caused by interworking with drivers that do
not support move_notify(). Thus, until now, VFIO has required dynamic
importers, even though it never actually moves the buffer location.
To allow VFIO to work with pinned importers, as dma-buf was intended,
we need to allow VFIO to detect whether an importer is a legacy or
RDMA one that does not actually implement move_notify().
In theory, all exporters that call move_notify() on pinned dma-bufs
should call this function; however, that would break a number of
widely used NIC/GPU flows. Thus, for now, do not spread this further
than VFIO until we can understand how much of RDMA can implement the
full semantics.
In the process, clarify how move_notify() is intended to be used with
pinned dma-bufs.
Thanks
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
---
Leon Romanovsky (8):
dma-buf: Rename .move_notify() callback to a clearer identifier
dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings()
dma-buf: Always build with DMABUF_MOVE_NOTIFY
dma-buf: Make .invalidate_mapping() truly optional
dma-buf: Add check function for revoke semantics
iommufd: Pin dma-buf importer for revoke semantics
vfio: Wait for dma-buf invalidation to complete
vfio: Validate dma-buf revocation semantics
drivers/dma-buf/Kconfig | 12 -------
drivers/dma-buf/dma-buf.c | 53 ++++++++++++++++++++++-------
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 +++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
drivers/gpu/drm/amd/amdkfd/Kconfig | 2 +-
drivers/gpu/drm/virtio/virtgpu_prime.c | 2 +-
drivers/gpu/drm/xe/tests/xe_dma_buf.c | 7 ++--
drivers/gpu/drm/xe/xe_bo.c | 2 +-
drivers/gpu/drm/xe/xe_dma_buf.c | 14 +++-----
drivers/infiniband/core/umem_dmabuf.c | 13 -------
drivers/infiniband/hw/mlx5/mr.c | 2 +-
drivers/iommu/iommufd/pages.c | 11 ++++--
drivers/iommu/iommufd/selftest.c | 2 +-
drivers/vfio/pci/vfio_pci_dmabuf.c | 13 +++++--
include/linux/dma-buf.h | 9 ++---
15 files changed, 84 insertions(+), 74 deletions(-)
---
base-commit: 9ace4753a5202b02191d54e9fdf7f9e3d02b85eb
change-id: 20251221-dmabuf-revoke-b90ef16e4236
Best regards,
--
Leon Romanovsky <leonro(a)nvidia.com>