Linaro-mm-sig January 2026

linaro-mm-sig@lists.linaro.org

18 participants
59 discussions

[PATCH v7 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers

by Leon Romanovsky

Changelog: v7: * Fixed messed VFIO patch due to rebase. v6: https://patch.msgid.link/20260130-dmabuf-revoke-v6-0-06278f9b7bf0@nvidia.com * Added Reviewed-by tags. * Changed for blocking wait_for_completion() in VFIO * Fixed race between ->attach and move_notify, where priv->revoked is flipped and lock is released. v5: https://patch.msgid.link/20260124-dmabuf-revoke-v5-0-f98fca917e96@nvidia.com * Documented the DMA-BUF expectations around DMA unmap. * Added wait support in VFIO for DMA unmap. * Reordered patches. * Improved commit messages to document even more. v4: https://lore.kernel.org/all/20260121-dmabuf-revoke-v4-0-d311cbc8633d@nvidia… * Changed DMA_RESV_USAGE_KERNEL to DMA_RESV_USAGE_BOOKKEEP. * Made .invalidate_mapping() truly optional. * Added patch which renames dma_buf_move_notify() to be dma_buf_invalidate_mappings(). * Restored dma_buf_attachment_is_dynamic() function. v3: https://lore.kernel.org/all/20260120-dmabuf-revoke-v3-0-b7e0b07b8214@nvidia… * Used Jason's wordings for commits and cover letter. * Removed IOMMUFD patch. * Renamed dma_buf_attachment_is_revoke() to be dma_buf_attach_revocable(). * Added patch to remove CONFIG_DMABUF_MOVE_NOTIFY. * Added Reviewed-by tags. * Called to dma_resv_wait_timeout() after dma_buf_move_notify() in VFIO. * Added dma_buf_attach_revocable() check to VFIO DMABUF attach function. * Slightly changed commit messages. v2: https://patch.msgid.link/20260118-dmabuf-revoke-v2-0-a03bb27c0875@nvidia.com * Changed series to document the revoke semantics instead of implementing it. v1: https://patch.msgid.link/20260111-dmabuf-revoke-v1-0-fb4bcc8c259b@nvidia.com ------------------------------------------------------------------------- This series is based on latest VFIO fix, which will be sent to Linus very soon. https://lore.kernel.org/all/20260121-vfio-add-pin-v1-1-4e04916b17f1@nvidia.… Thanks ------------------------------------------------------------------------- This series documents a dma-buf “revoke” mechanism: to allow a dma-buf exporter to explicitly invalidate (“kill”) a shared buffer after it has been distributed to importers, so that further CPU and device access is prevented and importers reliably observe failure. The change in this series is to properly document and use existing core “revoked” state on the dma-buf object and a corresponding exporter-triggered revoke operation. dma-buf has quietly allowed calling move_notify on pinned dma-bufs, even though legacy importers using dma_buf_attach() would simply ignore these calls. The intention was that move_notify() would tell the importer to expedite it's unmapping process and once the importer is fully finished with DMA it would unmap the dma-buf which finally signals that the importer is no longer ever going to touch the memory again. Importers that touch past their unmap() call can trigger IOMMU errors, AER and beyond, however read-and-discard access between move_notify() and unmap is allowed. Thus, we can define the exporter's revoke sequence for pinned dma-buf as: dma_resv_lock(dmabuf->resv, NULL); // Prevent new mappings from being established priv->revoked = true; // Tell all importers to eventually unmap dma_buf_invalidate_mappings(dmabuf); // Wait for any inprogress fences on the old mapping dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP, false, MAX_SCHEDULE_TIMEOUT); dma_resv_unlock(dmabuf->resv, NULL); // Wait for all importers to complete unmap wait_for_completion(&priv->unmapp_comp); However, dma-buf also supports importers that don't do anything on move_notify(), and will not unmap the buffer in bounded time. Since such importers would cause the above sequence to hang, a new mechanism is needed to detect incompatible importers. Introduce dma_buf_attach_revocable() which if true indicates the above sequence is safe to use and will complete in kernel-only bounded time for this attachment. Unfortunately dma_buf_attach_revocable() is going to fail for the popular RDMA pinned importer, which means we cannot introduce it to existing places using pinned move_notify() without potentially breaking existing userspace flows. Existing exporters that only trigger this flow for RAS errors should not call dma_buf_attach_revocable() and will suffer an unbounded block on the final completion, hoping that the userspace will notice the RAS and clean things up. Without revoke support on the RDMA pinned importers it doesn't seem like any other non-breaking option is currently possible. For new exporters, like VFIO and RDMA, that have userspace triggered revoke events, the unbouned sleep would not be acceptable. They can call dma_buf_attach_revocable() and will not work with the RDMA pinned importer from day 0, preventing regressions. In the process add documentation explaining the above details. Thanks Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com> --- Leon Romanovsky (8): dma-buf: Rename .move_notify() callback to a clearer identifier dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings() dma-buf: Always build with DMABUF_MOVE_NOTIFY vfio: Wait for dma-buf invalidation to complete dma-buf: Make .invalidate_mapping() truly optional dma-buf: Add dma_buf_attach_revocable() vfio: Permit VFIO to work with pinned importers iommufd: Add dma_buf_pin() drivers/dma-buf/Kconfig | 12 ----- drivers/dma-buf/dma-buf.c | 69 ++++++++++++++++++++----- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdkfd/Kconfig | 2 +- drivers/gpu/drm/virtio/virtgpu_prime.c | 2 +- drivers/gpu/drm/xe/tests/xe_dma_buf.c | 7 ++- drivers/gpu/drm/xe/xe_bo.c | 2 +- drivers/gpu/drm/xe/xe_dma_buf.c | 14 ++--- drivers/infiniband/core/umem_dmabuf.c | 13 ----- drivers/infiniband/hw/mlx5/mr.c | 2 +- drivers/iommu/iommufd/pages.c | 11 +++- drivers/iommu/iommufd/selftest.c | 2 +- drivers/vfio/pci/vfio_pci_dmabuf.c | 80 ++++++++++++++++++++++------- include/linux/dma-buf.h | 17 +++--- 15 files changed, 153 insertions(+), 96 deletions(-) --- base-commit: 61ceaf236115f20f4fdd7cf60f883ada1063349a change-id: 20251221-dmabuf-revoke-b90ef16e4236 Best regards, -- Leon Romanovsky <leonro(a)nvidia.com>

1 week

[PATCH v5 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers

by Leon Romanovsky

Changelog: v5: * Documented the DMA-BUF expectations around DMA unmap. * Added wait support in VFIO for DMA unmap. * Reordered patches. * Improved commit messages to document even more. v4: https://lore.kernel.org/all/20260121-dmabuf-revoke-v4-0-d311cbc8633d@nvidia… * Changed DMA_RESV_USAGE_KERNEL to DMA_RESV_USAGE_BOOKKEEP. * Made .invalidate_mapping() truly optional. * Added patch which renames dma_buf_move_notify() to be dma_buf_invalidate_mappings(). * Restored dma_buf_attachment_is_dynamic() function. v3: https://lore.kernel.org/all/20260120-dmabuf-revoke-v3-0-b7e0b07b8214@nvidia… * Used Jason's wordings for commits and cover letter. * Removed IOMMUFD patch. * Renamed dma_buf_attachment_is_revoke() to be dma_buf_attach_revocable(). * Added patch to remove CONFIG_DMABUF_MOVE_NOTIFY. * Added Reviewed-by tags. * Called to dma_resv_wait_timeout() after dma_buf_move_notify() in VFIO. * Added dma_buf_attach_revocable() check to VFIO DMABUF attach function. * Slightly changed commit messages. v2: https://patch.msgid.link/20260118-dmabuf-revoke-v2-0-a03bb27c0875@nvidia.com * Changed series to document the revoke semantics instead of implementing it. v1: https://patch.msgid.link/20260111-dmabuf-revoke-v1-0-fb4bcc8c259b@nvidia.com ------------------------------------------------------------------------- This series is based on latest VFIO fix, which will be sent to Linus very soon. https://lore.kernel.org/all/20260121-vfio-add-pin-v1-1-4e04916b17f1@nvidia.… Thanks ------------------------------------------------------------------------- This series documents a dma-buf “revoke” mechanism: to allow a dma-buf exporter to explicitly invalidate (“kill”) a shared buffer after it has been distributed to importers, so that further CPU and device access is prevented and importers reliably observe failure. The change in this series is to properly document and use existing core “revoked” state on the dma-buf object and a corresponding exporter-triggered revoke operation. dma-buf has quietly allowed calling move_notify on pinned dma-bufs, even though legacy importers using dma_buf_attach() would simply ignore these calls. The intention was that move_notify() would tell the importer to expedite it's unmapping process and once the importer is fully finished with DMA it would unmap the dma-buf which finally signals that the importer is no longer ever going to touch the memory again. Importers that touch past their unmap() call can trigger IOMMU errors, AER and beyond, however read-and-discard access between move_notify() and unmap is allowed. Thus, we can define the exporter's revoke sequence for pinned dma-buf as: dma_resv_lock(dmabuf->resv, NULL); // Prevent new mappings from being established priv->revoked = true; // Tell all importers to eventually unmap dma_buf_invalidate_mappings(dmabuf); // Wait for any inprogress fences on the old mapping dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP, false, MAX_SCHEDULE_TIMEOUT); dma_resv_unlock(dmabuf->resv, NULL); // Wait for all importers to complete unmap wait_for_completion(&priv->unmapp_comp); However, dma-buf also supports importers that don't do anything on move_notify(), and will not unmap the buffer in bounded time. Since such importers would cause the above sequence to hang, a new mechanism is needed to detect incompatible importers. Introduce dma_buf_attach_revocable() which if true indicates the above sequence is safe to use and will complete in kernel-only bounded time for this attachment. Unfortunately dma_buf_attach_revocable() is going to fail for the popular RDMA pinned importer, which means we cannot introduce it to existing places using pinned move_notify() without potentially breaking existing userspace flows. Existing exporters that only trigger this flow for RAS errors should not call dma_buf_attach_revocable() and will suffer an unbounded block on the final completion, hoping that the userspace will notice the RAS and clean things up. Without revoke support on the RDMA pinned importers it doesn't seem like any other non-breaking option is currently possible. For new exporters, like VFIO and RDMA, that have userspace triggered revoke events, the unbouned sleep would not be acceptable. They can call dma_buf_attach_revocable() and will not work with the RDMA pinned importer from day 0, preventing regressions. In the process add documentation explaining the above details. Thanks Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com> --- Leon Romanovsky (8): dma-buf: Rename .move_notify() callback to a clearer identifier dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings() dma-buf: Always build with DMABUF_MOVE_NOTIFY vfio: Wait for dma-buf invalidation to complete dma-buf: Make .invalidate_mapping() truly optional dma-buf: Add dma_buf_attach_revocable() vfio: Permit VFIO to work with pinned importers iommufd: Add dma_buf_pin() drivers/dma-buf/Kconfig | 12 ---- drivers/dma-buf/dma-buf.c | 69 +++++++++++++++++----- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdkfd/Kconfig | 2 +- drivers/gpu/drm/virtio/virtgpu_prime.c | 2 +- drivers/gpu/drm/xe/tests/xe_dma_buf.c | 7 +-- drivers/gpu/drm/xe/xe_bo.c | 2 +- drivers/gpu/drm/xe/xe_dma_buf.c | 14 ++--- drivers/infiniband/core/umem_dmabuf.c | 13 ----- drivers/infiniband/hw/mlx5/mr.c | 2 +- drivers/iommu/iommufd/pages.c | 11 +++- drivers/iommu/iommufd/selftest.c | 2 +- drivers/vfio/pci/vfio_pci_dmabuf.c | 90 +++++++++++++++++++++++------ include/linux/dma-buf.h | 17 +++--- 15 files changed, 164 insertions(+), 95 deletions(-) --- base-commit: 61ceaf236115f20f4fdd7cf60f883ada1063349a change-id: 20251221-dmabuf-revoke-b90ef16e4236 Best regards, -- Leon Romanovsky <leonro(a)nvidia.com>

4 weeks, 1 day

[PATCH v6 0/8] dma-buf: Use revoke mechanism to invalidate shared buffers

by Leon Romanovsky

Changelog: v6: * Added Reviewed-by tags. * Changed for blocking wait_for_completion() in VFIO * Fixed race between ->attach and move_notify, where priv->revoked is flipped and lock is released. v5: https://patch.msgid.link/20260124-dmabuf-revoke-v5-0-f98fca917e96@nvidia.com * Documented the DMA-BUF expectations around DMA unmap. * Added wait support in VFIO for DMA unmap. * Reordered patches. * Improved commit messages to document even more. v4: https://lore.kernel.org/all/20260121-dmabuf-revoke-v4-0-d311cbc8633d@nvidia… * Changed DMA_RESV_USAGE_KERNEL to DMA_RESV_USAGE_BOOKKEEP. * Made .invalidate_mapping() truly optional. * Added patch which renames dma_buf_move_notify() to be dma_buf_invalidate_mappings(). * Restored dma_buf_attachment_is_dynamic() function. v3: https://lore.kernel.org/all/20260120-dmabuf-revoke-v3-0-b7e0b07b8214@nvidia… * Used Jason's wordings for commits and cover letter. * Removed IOMMUFD patch. * Renamed dma_buf_attachment_is_revoke() to be dma_buf_attach_revocable(). * Added patch to remove CONFIG_DMABUF_MOVE_NOTIFY. * Added Reviewed-by tags. * Called to dma_resv_wait_timeout() after dma_buf_move_notify() in VFIO. * Added dma_buf_attach_revocable() check to VFIO DMABUF attach function. * Slightly changed commit messages. v2: https://patch.msgid.link/20260118-dmabuf-revoke-v2-0-a03bb27c0875@nvidia.com * Changed series to document the revoke semantics instead of implementing it. v1: https://patch.msgid.link/20260111-dmabuf-revoke-v1-0-fb4bcc8c259b@nvidia.com ------------------------------------------------------------------------- This series is based on latest VFIO fix, which will be sent to Linus very soon. https://lore.kernel.org/all/20260121-vfio-add-pin-v1-1-4e04916b17f1@nvidia.… Thanks ------------------------------------------------------------------------- This series documents a dma-buf “revoke” mechanism: to allow a dma-buf exporter to explicitly invalidate (“kill”) a shared buffer after it has been distributed to importers, so that further CPU and device access is prevented and importers reliably observe failure. The change in this series is to properly document and use existing core “revoked” state on the dma-buf object and a corresponding exporter-triggered revoke operation. dma-buf has quietly allowed calling move_notify on pinned dma-bufs, even though legacy importers using dma_buf_attach() would simply ignore these calls. The intention was that move_notify() would tell the importer to expedite it's unmapping process and once the importer is fully finished with DMA it would unmap the dma-buf which finally signals that the importer is no longer ever going to touch the memory again. Importers that touch past their unmap() call can trigger IOMMU errors, AER and beyond, however read-and-discard access between move_notify() and unmap is allowed. Thus, we can define the exporter's revoke sequence for pinned dma-buf as: dma_resv_lock(dmabuf->resv, NULL); // Prevent new mappings from being established priv->revoked = true; // Tell all importers to eventually unmap dma_buf_invalidate_mappings(dmabuf); // Wait for any inprogress fences on the old mapping dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP, false, MAX_SCHEDULE_TIMEOUT); dma_resv_unlock(dmabuf->resv, NULL); // Wait for all importers to complete unmap wait_for_completion(&priv->unmapp_comp); However, dma-buf also supports importers that don't do anything on move_notify(), and will not unmap the buffer in bounded time. Since such importers would cause the above sequence to hang, a new mechanism is needed to detect incompatible importers. Introduce dma_buf_attach_revocable() which if true indicates the above sequence is safe to use and will complete in kernel-only bounded time for this attachment. Unfortunately dma_buf_attach_revocable() is going to fail for the popular RDMA pinned importer, which means we cannot introduce it to existing places using pinned move_notify() without potentially breaking existing userspace flows. Existing exporters that only trigger this flow for RAS errors should not call dma_buf_attach_revocable() and will suffer an unbounded block on the final completion, hoping that the userspace will notice the RAS and clean things up. Without revoke support on the RDMA pinned importers it doesn't seem like any other non-breaking option is currently possible. For new exporters, like VFIO and RDMA, that have userspace triggered revoke events, the unbouned sleep would not be acceptable. They can call dma_buf_attach_revocable() and will not work with the RDMA pinned importer from day 0, preventing regressions. In the process add documentation explaining the above details. Thanks Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com> --- Leon Romanovsky (8): dma-buf: Rename .move_notify() callback to a clearer identifier dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings() dma-buf: Always build with DMABUF_MOVE_NOTIFY vfio: Wait for dma-buf invalidation to complete dma-buf: Make .invalidate_mapping() truly optional dma-buf: Add dma_buf_attach_revocable() vfio: Permit VFIO to work with pinned importers iommufd: Add dma_buf_pin() drivers/dma-buf/Kconfig | 12 ----- drivers/dma-buf/dma-buf.c | 69 +++++++++++++++++++----- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdkfd/Kconfig | 2 +- drivers/gpu/drm/virtio/virtgpu_prime.c | 2 +- drivers/gpu/drm/xe/tests/xe_dma_buf.c | 7 ++- drivers/gpu/drm/xe/xe_bo.c | 2 +- drivers/gpu/drm/xe/xe_dma_buf.c | 14 ++--- drivers/infiniband/core/umem_dmabuf.c | 13 ----- drivers/infiniband/hw/mlx5/mr.c | 2 +- drivers/iommu/iommufd/pages.c | 11 +++- drivers/iommu/iommufd/selftest.c | 2 +- drivers/vfio/pci/vfio_pci_dmabuf.c | 84 ++++++++++++++++++++++------- include/linux/dma-buf.h | 17 +++--- 15 files changed, 157 insertions(+), 96 deletions(-) --- base-commit: 61ceaf236115f20f4fdd7cf60f883ada1063349a change-id: 20251221-dmabuf-revoke-b90ef16e4236 Best regards, -- Leon Romanovsky <leonro(a)nvidia.com>

1 month

Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete

by Jason Gunthorpe

On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote: > Bear me if it's an ignorant question. > > The commit msg of patch6 says that VFIO doesn't tolerate unbounded > wait, which is the reason behind the 2nd timeout wait here. As far as I understand dmabuf design a fence wait should complete eventually under kernel control, because these sleeps are sprinkled all around the kernel today. I suspect that is not actually true for every HW, probably something like "shader programs can run forever technically". We can argue if those cases should not report revocable either, but at least this will work "correctly" even if it takes a huge amount of time. I wouldn't mind seeing a shorter timeout and print on the fence too just in case. Jason

1 month

Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete

by Leon Romanovsky

On Thu, Jan 29, 2026 at 08:13:18AM +0000, Tian, Kevin wrote: > > From: Leon Romanovsky <leon(a)kernel.org> > > Sent: Thursday, January 29, 2026 3:34 PM > > > > On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote: > > > > From: Jason Gunthorpe <jgg(a)ziepe.ca> > > > > Sent: Wednesday, January 28, 2026 12:28 AM > > > > > > > > On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote: > > > > > > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct > > > > vfio_pci_core_device *vdev, bool revoked) > > > > > > > dma_resv_lock(priv->dmabuf->resv, NULL); > > > > > > > priv->revoked = revoked; > > > > > > > dma_buf_invalidate_mappings(priv- > > >dmabuf); > > > > > > > + dma_resv_wait_timeout(priv->dmabuf->resv, > > > > > > > + > > DMA_RESV_USAGE_BOOKKEEP, > > > > false, > > > > > > > + > > MAX_SCHEDULE_TIMEOUT); > > > > > > > dma_resv_unlock(priv->dmabuf->resv); > > > > > > > + if (revoked) { > > > > > > > + kref_put(&priv->kref, > > > > vfio_pci_dma_buf_done); > > > > > > > + /* Let's wait till all DMA unmap are > > > > completed. */ > > > > > > > + wait = wait_for_completion_timeout( > > > > > > > + &priv->comp, > > secs_to_jiffies(1)); > > > > > > > > > > > > Is the 1-second constant sufficient for all hardware, or should the > > > > > > invalidate_mappings() contract require the callback to block until > > > > > > speculative reads are strictly fenced? I'm wondering about a case > > where > > > > > > a device's firmware has a high response latency, perhaps due to > > internal > > > > > > management tasks like error recovery or thermal and it exceeds the > > 1s > > > > > > timeout. > > > > > > > > > > > > If the device is in the middle of a large DMA burst and the firmware is > > > > > > slow to flush the internal pipelines to a fully "quiesced" > > > > > > read-and-discard state, reclaiming the memory at exactly 1.001 > > seconds > > > > > > risks triggering platform-level faults.. > > > > > > > > > > > > Since the wen explicitly permit these speculative reads until unmap is > > > > > > complete, relying on a hardcoded timeout in the exporter seems to > > > > > > introduce a hardware-dependent race condition that could > > compromise > > > > > > system stability via IOMMU errors or AER faults. > > > > > > > > > > > > Should the importer instead be required to guarantee that all > > > > > > speculative access has ceased before the invalidation call returns? > > > > > > > > > > It is guaranteed by the dma_resv_wait_timeout() call above. That call > > > > ensures > > > > > that the hardware has completed all pending operations. The 1‑second > > > > delay is > > > > > meant to catch cases where an in-kernel DMA unmap call is missing, > > which > > > > should > > > > > not trigger any DMA activity at that point. > > > > > > > > Christian may know actual examples, but my general feeling is he was > > > > worrying about drivers that have pushed the DMABUF to visibility on > > > > the GPU and the move notify & fences only shoot down some access. So > > > > it has to wait until the DMABUF is finally unmapped. > > > > > > > > Pranjal's example should be covered by the driver adding a fence and > > > > then the unbounded fence wait will complete it. > > > > > > > > > > Bear me if it's an ignorant question. > > > > > > The commit msg of patch6 says that VFIO doesn't tolerate unbounded > > > wait, which is the reason behind the 2nd timeout wait here. > > > > It is not accurate. A second timeout is present both in the > > description of patch 6 and in VFIO implementation. The difference is > > that the timeout is enforced within VFIO. > > > > > > > > Then why is "the unbounded fence wait" not a problem in the same > > > code path? the use of MAX_SCHEDULE_TIMEOUT imply a worst-case > > > timeout in hundreds of years... > > > > "An unbounded fence wait" is a different class of wait. It indicates broken > > hardware that continues to issue DMA transactions even after it has been > > told to > > stop. > > > > The second wait exists to catch software bugs or misuse, where the dma-buf > > importer has misrepresented its capabilities. > > > > Okay I see. > > > > > > > and it'd be helpful to put some words in the code based on what's > > > discussed here. > > > > We've documented as much as we can in dma_buf_attach_revocable() and > > dma_buf_invalidate_mappings(). Do you have any suggestions on what else > > should be added here? > > > > the selection of 1s? It is indirectly written in description of WARN_ON(), but let's add more. What about the following? diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c index 93795ad2e025..948ba75288c6 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c @@ -357,7 +357,13 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) dma_resv_unlock(priv->dmabuf->resv); if (revoked) { kref_put(&priv->kref, vfio_pci_dma_buf_done); - /* Let's wait till all DMA unmap are completed. */ + /* + * Let's wait for 1 second till all DMA unmap + * are completed. It is supposed to catch dma-buf + * importers which lied about their support + * of dmabuf revoke. See dma_buf_invalidate_mappings() + * for the expected behaviour, + */ wait = wait_for_completion_timeout( &priv->comp, secs_to_jiffies(1)); /* > > then, > > Reviewed-by: Kevin Tian <kevin.tian(a)intel.com> Thanks

1 month

Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete

by Leon Romanovsky

On Thu, Jan 29, 2026 at 07:06:37AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg(a)ziepe.ca> > > Sent: Wednesday, January 28, 2026 12:28 AM > > > > On Tue, Jan 27, 2026 at 10:58:35AM +0200, Leon Romanovsky wrote: > > > > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct > > vfio_pci_core_device *vdev, bool revoked) > > > > > dma_resv_lock(priv->dmabuf->resv, NULL); > > > > > priv->revoked = revoked; > > > > > dma_buf_invalidate_mappings(priv->dmabuf); > > > > > + dma_resv_wait_timeout(priv->dmabuf->resv, > > > > > + DMA_RESV_USAGE_BOOKKEEP, > > false, > > > > > + MAX_SCHEDULE_TIMEOUT); > > > > > dma_resv_unlock(priv->dmabuf->resv); > > > > > + if (revoked) { > > > > > + kref_put(&priv->kref, > > vfio_pci_dma_buf_done); > > > > > + /* Let's wait till all DMA unmap are > > completed. */ > > > > > + wait = wait_for_completion_timeout( > > > > > + &priv->comp, secs_to_jiffies(1)); > > > > > > > > Is the 1-second constant sufficient for all hardware, or should the > > > > invalidate_mappings() contract require the callback to block until > > > > speculative reads are strictly fenced? I'm wondering about a case where > > > > a device's firmware has a high response latency, perhaps due to internal > > > > management tasks like error recovery or thermal and it exceeds the 1s > > > > timeout. > > > > > > > > If the device is in the middle of a large DMA burst and the firmware is > > > > slow to flush the internal pipelines to a fully "quiesced" > > > > read-and-discard state, reclaiming the memory at exactly 1.001 seconds > > > > risks triggering platform-level faults.. > > > > > > > > Since the wen explicitly permit these speculative reads until unmap is > > > > complete, relying on a hardcoded timeout in the exporter seems to > > > > introduce a hardware-dependent race condition that could compromise > > > > system stability via IOMMU errors or AER faults. > > > > > > > > Should the importer instead be required to guarantee that all > > > > speculative access has ceased before the invalidation call returns? > > > > > > It is guaranteed by the dma_resv_wait_timeout() call above. That call > > ensures > > > that the hardware has completed all pending operations. The 1‑second > > delay is > > > meant to catch cases where an in-kernel DMA unmap call is missing, which > > should > > > not trigger any DMA activity at that point. > > > > Christian may know actual examples, but my general feeling is he was > > worrying about drivers that have pushed the DMABUF to visibility on > > the GPU and the move notify & fences only shoot down some access. So > > it has to wait until the DMABUF is finally unmapped. > > > > Pranjal's example should be covered by the driver adding a fence and > > then the unbounded fence wait will complete it. > > > > Bear me if it's an ignorant question. > > The commit msg of patch6 says that VFIO doesn't tolerate unbounded > wait, which is the reason behind the 2nd timeout wait here. It is not accurate. A second timeout is present both in the description of patch 6 and in VFIO implementation. The difference is that the timeout is enforced within VFIO. > > Then why is "the unbounded fence wait" not a problem in the same > code path? the use of MAX_SCHEDULE_TIMEOUT imply a worst-case > timeout in hundreds of years... "An unbounded fence wait" is a different class of wait. It indicates broken hardware that continues to issue DMA transactions even after it has been told to stop. The second wait exists to catch software bugs or misuse, where the dma-buf importer has misrepresented its capabilities. > > and it'd be helpful to put some words in the code based on what's > discussed here. We've documented as much as we can in dma_buf_attach_revocable() and dma_buf_invalidate_mappings(). Do you have any suggestions on what else should be added here? Thanks

1 month

Re: [PATCH v5 4/8] vfio: Wait for dma-buf invalidation to complete

by Leon Romanovsky

On Mon, Jan 26, 2026 at 08:53:57PM +0000, Pranjal Shrivastava wrote: > On Sat, Jan 24, 2026 at 09:14:16PM +0200, Leon Romanovsky wrote: > > From: Leon Romanovsky <leonro(a)nvidia.com> > > > > dma-buf invalidation is handled asynchronously by the hardware, so VFIO > > must wait until all affected objects have been fully invalidated. > > > > In addition, the dma-buf exporter is expecting that all importers unmap any > > buffers they previously mapped. > > > > Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions") > > Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com> > > --- > > drivers/vfio/pci/vfio_pci_dmabuf.c | 71 ++++++++++++++++++++++++++++++++++++-- > > 1 file changed, 68 insertions(+), 3 deletions(-) <...> > > @@ -333,7 +359,37 @@ void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) > > dma_resv_lock(priv->dmabuf->resv, NULL); > > priv->revoked = revoked; > > dma_buf_invalidate_mappings(priv->dmabuf); > > + dma_resv_wait_timeout(priv->dmabuf->resv, > > + DMA_RESV_USAGE_BOOKKEEP, false, > > + MAX_SCHEDULE_TIMEOUT); > > dma_resv_unlock(priv->dmabuf->resv); > > + if (revoked) { > > + kref_put(&priv->kref, vfio_pci_dma_buf_done); > > + /* Let's wait till all DMA unmap are completed. */ > > + wait = wait_for_completion_timeout( > > + &priv->comp, secs_to_jiffies(1)); > > Is the 1-second constant sufficient for all hardware, or should the > invalidate_mappings() contract require the callback to block until > speculative reads are strictly fenced? I'm wondering about a case where > a device's firmware has a high response latency, perhaps due to internal > management tasks like error recovery or thermal and it exceeds the 1s > timeout. > > If the device is in the middle of a large DMA burst and the firmware is > slow to flush the internal pipelines to a fully "quiesced" > read-and-discard state, reclaiming the memory at exactly 1.001 seconds > risks triggering platform-level faults.. > > Since the wen explicitly permit these speculative reads until unmap is > complete, relying on a hardcoded timeout in the exporter seems to > introduce a hardware-dependent race condition that could compromise > system stability via IOMMU errors or AER faults. > > Should the importer instead be required to guarantee that all > speculative access has ceased before the invalidation call returns? It is guaranteed by the dma_resv_wait_timeout() call above. That call ensures that the hardware has completed all pending operations. The 1‑second delay is meant to catch cases where an in-kernel DMA unmap call is missing, which should not trigger any DMA activity at that point. So yes, one second is more than sufficient. Thanks > > Thanks > Praan > > > + /* > > + * If you see this WARN_ON, it means that > > + * importer didn't call unmap in response to > > + * dma_buf_invalidate_mappings() which is not > > + * allowed. > > + */ > > + WARN(!wait, > > + "Timed out waiting for DMABUF unmap, importer has a broken invalidate_mapping()"); > > + } else { > > + /* > > + * Kref is initialize again, because when revoke > > + * was performed the reference counter was decreased > > + * to zero to trigger completion. > > + */ > > + kref_init(&priv->kref); > > + /* > > + * There is no need to wait as no mapping was > > + * performed when the previous status was > > + * priv->revoked == true. > > + */ > > + reinit_completion(&priv->comp); > > + } > > } > > fput(priv->dmabuf->file); > > } > > @@ -346,6 +402,8 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev) > > > > down_write(&vdev->memory_lock); > > list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { > > + unsigned long wait; > > + > > if (!get_file_active(&priv->dmabuf->file)) > > continue; > > > > @@ -354,7 +412,14 @@ void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev) > > priv->vdev = NULL; > > priv->revoked = true; > > dma_buf_invalidate_mappings(priv->dmabuf); > > + dma_resv_wait_timeout(priv->dmabuf->resv, > > + DMA_RESV_USAGE_BOOKKEEP, false, > > + MAX_SCHEDULE_TIMEOUT); > > dma_resv_unlock(priv->dmabuf->resv); > > + kref_put(&priv->kref, vfio_pci_dma_buf_done); > > + wait = wait_for_completion_timeout(&priv->comp, > > + secs_to_jiffies(1)); > > + WARN_ON(!wait); > > vfio_device_put_registration(&vdev->vdev); > > fput(priv->dmabuf->file); > > } > > > > -- > > 2.52.0 > > > > >

1 month

Independence for dma_fences! v6

by Christian König

Hi everyone, dma_fences have ever lived under the tyranny dictated by the module lifetime of their issuer, leading to crashes should anybody still holding a reference to a dma_fence when the module of the issuer was unloaded. The basic problem is that when buffer are shared between drivers dma_fence objects can leak into external drivers and stay there even after they are signaled. The dma_resv object for example only lazy releases dma_fences. So what happens is that when the module who originally created the dma_fence unloads the dma_fence_ops function table becomes unavailable as well and so any attempt to release the fence crashes the system. Previously various approaches have been discussed, including changing the locking semantics of the dma_fence callbacks (by me) as well as using the drm scheduler as intermediate layer (by Sima) to disconnect dma_fences from their actual users, but none of them are actually solving all problems. Tvrtko did some really nice prerequisite work by protecting the returned strings of the dma_fence_ops by RCU. This way dma_fence creators where able to just wait for an RCU grace period after fence signaling before they could be save to free those data structures. Now this patch set here goes a step further and protects the whole dma_fence_ops structure by RCU, so that after the fence signals the pointer to the dma_fence_ops is set to NULL when there is no wait nor release callback given. All functionality which use the dma_fence_ops reference are put inside an RCU critical section, except for the deprecated issuer specific wait and of course the optional release callback. Additional to the RCU changes the lock protecting the dma_fence state previously had to be allocated external. This set here now changes the functionality to make that external lock optional and allows dma_fences to use an inline lock and be self contained. v4: Rebases the whole set on upstream changes, especially the cleanup from Philip in patch "drm/amdgpu: independence for the amdkfd_fence!". Adding two patches which brings the DMA-fence self tests up to date. The first selftest changes removes the mock_wait and so actually starts testing the default behavior instead of some hacky implementation in the test. This one got upstreamed independent of this set. The second drops the mock_fence as well and tests the new RCU and inline spinlock functionality. v5: Rebase on top of drm-misc-next instead of drm-tip, leave out all driver changes for now since those should go through the driver specific paths anyway. Address a few more review comments, especially some rebase mess and typos. And finally fix one more bug found by AMDs CI system. v6: Minor style changes, re-ordered patch #1, dropped the scheduler fence change for now Please review and comment, Christian.

1 month

Re: [PATCH v5 6/8] dma-buf: Add dma_buf_attach_revocable()

by Jason Gunthorpe

On Mon, Jan 26, 2026 at 08:38:44PM +0000, Pranjal Shrivastava wrote: > I noticed that Patch 5 removes the invalidate_mappings stub from > umem_dmabuf.c, effectively making the callback NULL for an RDMA > importer. Consequently, dma_buf_attach_revocable() (introduced here) > will return false for these importers. Yes, that is the intention. > Since the cover letter mentions that VFIO will use > dma_buf_attach_revocable() to prevent unbounded waits, this appears to > effectively block paths like the VFIO-export -> RDMA-import path.. It remains usable with the ODP path and people are using that right now. > Given that RDMA is a significant consumer of dma-bufs, are there plans > to implement proper revocation support in the IB/RDMA core (umem_dmabuf)? This depends on each HW, they need a way to implement the revoke semantic. I can't guess what is possible, but I would hope that most HW could at least do a revoke on a real MR. Eg a MR rereg operation to a kernel owned empty PD is an effective "revoke", and MR rereg is at least defined by standards so HW should implement it. > It would be good to know if there's a plan for bringing such importers > into compliance with the new revocation semantics so they can interop > with VFIO OR are we completely ruling out users like RDMA / IB importing > any DMABUFs exported by VFIO? It will be driver dependent, there is no one shot update here. Jason

1 month

Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations

by Leon Romanovsky

On Thu, Jan 08, 2026 at 01:11:14PM +0200, Edward Srouji wrote: > From: Yishai Hadas <yishaih(a)nvidia.com> > > Expose DMABUF functionality to userspace through the uverbs interface, > enabling InfiniBand/RDMA devices to export PCI based memory regions > (e.g. device memory) as DMABUF file descriptors. This allows > zero-copy sharing of RDMA memory with other subsystems that support the > dma-buf framework. > > A new UVERBS_OBJECT_DMABUF object type and allocation method were > introduced. > > During allocation, uverbs invokes the driver to supply the > rdma_user_mmap_entry associated with the given page offset (pgoff). > > Based on the returned rdma_user_mmap_entry, uverbs requests the driver > to provide the corresponding physical-memory details as well as the > driver’s PCI provider information. > > Using this information, dma_buf_export() is called; if it succeeds, > uobj->object is set to the underlying file pointer returned by the > dma-buf framework. > > The file descriptor number follows the standard uverbs allocation flow, > but the file pointer comes from the dma-buf subsystem, including its own > fops and private data. > > Because of this, alloc_begin_fd_uobject() must handle cases where > fd_type->fops is NULL, and both alloc_commit_fd_uobject() and > alloc_abort_fd_uobject() must account for whether filp->private_data > exists, since it is only populated after a successful dma_buf_export(). > > When an mmap entry is removed, uverbs iterates over its associated > DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that > their importers are notified. > > The same procedure applies during the disassociate flow; final cleanup > occurs when the application closes the file. > > Signed-off-by: Yishai Hadas <yishaih(a)nvidia.com> > Signed-off-by: Edward Srouji <edwards(a)nvidia.com> > --- > drivers/infiniband/core/Makefile | 1 + > drivers/infiniband/core/device.c | 2 + > drivers/infiniband/core/ib_core_uverbs.c | 19 +++ > drivers/infiniband/core/rdma_core.c | 63 ++++---- > drivers/infiniband/core/rdma_core.h | 1 + > drivers/infiniband/core/uverbs.h | 10 ++ > drivers/infiniband/core/uverbs_std_types_dmabuf.c | 172 ++++++++++++++++++++++ > drivers/infiniband/core/uverbs_uapi.c | 1 + > include/rdma/ib_verbs.h | 9 ++ > include/rdma/uverbs_types.h | 1 + > include/uapi/rdma/ib_user_ioctl_cmds.h | 10 ++ > 11 files changed, 263 insertions(+), 26 deletions(-) <...> > +static struct sg_table * > +uverbs_dmabuf_map(struct dma_buf_attachment *attachment, > + enum dma_data_direction dir) > +{ > + struct ib_uverbs_dmabuf_file *priv = attachment->dmabuf->priv; > + > + dma_resv_assert_held(priv->dmabuf->resv); > + > + if (priv->revoked) > + return ERR_PTR(-ENODEV); > + > + return dma_buf_phys_vec_to_sgt(attachment, priv->provider, > + &priv->phys_vec, 1, priv->phys_vec.len, > + dir); > +} > + > +static void uverbs_dmabuf_unmap(struct dma_buf_attachment *attachment, > + struct sg_table *sgt, > + enum dma_data_direction dir) > +{ > + dma_buf_free_sgt(attachment, sgt, dir); > +} Unfortunately, it is not enough. Exporters should count their map<->unmap calls and make sure that they are equal. See this VFIO change https://lore.kernel.org/kvm/20260124-dmabuf-revoke-v5-4-f98fca917e96@nvidia… Thanks

1 month, 1 week

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig January 2026