On 2025-11-26 08:19, Philipp Stanner wrote:
> The return code of dma_fence_signal() is not really useful as there is
> nothing reasonable to do if a fence was already signaled. That return
> code shall be removed from the kernel.
>
> Ignore dma_fence_signal()'s return code.
I think this is not correct. Looking at the comment in
evict_process_worker, we use the return value to decide a race
condition where multiple threads are trying to signal the eviction
fence. Only one of them should schedule the restore work. And the other
ones need to increment the reference count to keep evictions balanced.
Regards,
Felix
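For illustration, a minimal sketch of the pattern being described (names are approximations, not the exact evict_process_worker code):

	/*
	 * Hedged sketch: only the caller whose dma_fence_signal() returned 0
	 * actually signaled the fence and therefore schedules the restore
	 * work; the losers of the race take an extra reference so that
	 * eviction/restore stays balanced.
	 */
	if (signal_eviction_fence(p) == 0)
		queue_delayed_work(kfd_restore_wq, &p->restore_work,
				   msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
	else
		kref_get(&p->ref);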
>
> Suggested-by: Christian König <christian.koenig(a)amd.com>
> Signed-off-by: Philipp Stanner <phasta(a)kernel.org>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index ddfe30c13e9d..950fafa4b3c3 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
> static int signal_eviction_fence(struct kfd_process *p)
> {
> struct dma_fence *ef;
> - int ret;
>
> rcu_read_lock();
> ef = dma_fence_get_rcu_safe(&p->ef);
> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
> if (!ef)
> return -EINVAL;
>
> - ret = dma_fence_signal(ef);
> + dma_fence_signal(ef);
> dma_fence_put(ef);
>
> - return ret;
> + return 0;
> }
>
> static void evict_process_worker(struct work_struct *work)
This series is the start of adding full DMABUF support to
iommufd. Currently it is limited to only work with VFIO's DMABUF exporter.
It sits on top of Leon's series to add a DMABUF exporter to VFIO:
https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.c…
The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fd's, but
otherwise works the same as it does today for a memfd. The user can select
a slice of the FD to map into the ioas and if the underlying alignment
requirements are met it will be placed in the iommu_domain.
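As a rough illustration, a userspace mapping of a DMABUF slice could look about like this (hedged sketch; iommufd, ioas_id and dmabuf_fd are assumed to already exist, and the slice offset, length and IOVA are made-up example values):

	struct iommu_ioas_map_file map = {
		.size = sizeof(map),
		.flags = IOMMU_IOAS_MAP_FIXED_IOVA |
			 IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE,
		.ioas_id = ioas_id,
		.fd = dmabuf_fd,		/* a DMABUF fd instead of a memfd */
		.start = 1024 * 1024,		/* slice offset inside the DMABUF */
		.length = 2 * 1024 * 1024,	/* slice length */
		.iova = 0x100000000ULL,
	};

	if (ioctl(iommufd, IOMMU_IOAS_MAP_FILE, &map))
		err(1, "IOMMU_IOAS_MAP_FILE");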
Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR
memory from VFIO to an iommu_domain controlled by iommufd. This is used
for PCI Peer to Peer support in VMs, and is the last feature that the VFIO
type 1 container has that iommufd couldn't do.
The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime
control and is a use-after-free security problem.
Instead iommufd relies on revokable DMABUFs. Whenever VFIO thinks there
should be no access to the MMIO it can shoot down the mapping in iommufd
which will unmap it from the iommu_domain. There is no automatic remap,
this is a safety protocol so the kernel doesn't get stuck. Userspace is
expected to know it is doing something that will revoke the dmabuf and
map/unmap it around the activity. E.g. when QEMU goes to issue FLR it should
do the map/unmap to iommufd.
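Roughly, the expected ordering from the VMM side is something like the following (hedged sketch; ioas_unmap()/ioas_map_file() are made-up wrappers around the corresponding ioctls, and FLR via VFIO_DEVICE_RESET is only one example of a revoking operation):

	/* 1. Drop the BAR mapping from the IOAS before triggering the revoke. */
	ioas_unmap(iommufd, ioas_id, bar_iova, bar_len);

	/* 2. The operation that revokes the DMABUF, e.g. an FLR. */
	ioctl(vfio_dev_fd, VFIO_DEVICE_RESET);

	/* 3. Re-map explicitly; there is no automatic remap after a revoke. */
	ioas_map_file(iommufd, ioas_id, dmabuf_fd, 0, bar_len, bar_iova);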
Since DMABUF is missing some key general features for this use case it
relies on a "private interconnect" between VFIO and iommufd via the
vfio_pci_dma_buf_iommufd_map() call.
The call confirms the DMABUF has revoke semantics and delivers a phys_addr
for the memory suitable for use with iommu_map().
Medium term there is a desire to expand the supported DMABUFs to include
GPU drivers to support DPDK/SPDK type use cases, so future series will work
to add a general concept of revoke and a general negotiation of
interconnect to remove vfio_pci_dma_buf_iommufd_map().
I also plan another series to modify iommufd's vfio_compat to
transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI
of type1.
The latest series for interconnect negotiation to exchange a phys_addr is:
https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com
And the discussion for design of revoke is here:
https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_dmabuf
v2:
- Rebase on Leon's v9
- Fix mislocking in an iopt_fill_domain() error path
- Revise the comments around how the sub page offset works
- Remove a useless WARN_ON in iopt_pages_rw_access()
- Fixed missed memory free in the selftest
v1: https://patch.msgid.link/r/0-v1-64bed2430cdb+31b-iommufd_dmabuf_jgg@nvidia.…
Jason Gunthorpe (9):
vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
iommufd: Add DMABUF to iopt_pages
iommufd: Do not map/unmap revoked DMABUFs
iommufd: Allow a DMABUF to be revoked
iommufd: Allow MMIO pages in a batch
iommufd: Have pfn_reader process DMABUF iopt_pages
iommufd: Have iopt_map_file_pages convert the fd to a file
iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE
iommufd/selftest: Add some tests for the dmabuf flow
drivers/iommu/iommufd/io_pagetable.c | 78 +++-
drivers/iommu/iommufd/io_pagetable.h | 54 ++-
drivers/iommu/iommufd/ioas.c | 8 +-
drivers/iommu/iommufd/iommufd_private.h | 14 +-
drivers/iommu/iommufd/iommufd_test.h | 10 +
drivers/iommu/iommufd/main.c | 10 +
drivers/iommu/iommufd/pages.c | 414 ++++++++++++++++--
drivers/iommu/iommufd/selftest.c | 143 ++++++
drivers/vfio/pci/vfio_pci_dmabuf.c | 34 ++
include/linux/vfio_pci_core.h | 4 +
tools/testing/selftests/iommu/iommufd.c | 43 ++
tools/testing/selftests/iommu/iommufd_utils.h | 44 ++
12 files changed, 786 insertions(+), 70 deletions(-)
base-commit: f836737ed56db9e2d5b047c56a31e05af0f3f116
--
2.43.0
On Tue, Nov 25, 2025 at 05:11:18PM -0800, Alex Mastro wrote:
> fill_sg_entry() splits large DMA buffers into multiple scatter-gather
> entries, each holding up to UINT_MAX bytes. When calculating the DMA
> address for entries beyond the second one, the expression (i * UINT_MAX)
> causes integer overflow due to 32-bit arithmetic.
>
> This manifests when the input arg length >= 8 GiB, which results in looping
> with i >= 2.
>
> Fix by casting i to dma_addr_t before multiplication.
>
> Fixes: 3aa31a8bb11e ("dma-buf: provide phys_vec to scatter-gather mapping routine")
> Signed-off-by: Alex Mastro <amastro(a)fb.com>
> ---
> More color about how I discovered this in [1] for the commit at [2]:
>
> [1] https://lore.kernel.org/all/aSZHO6otK0Heh+Qj@devgpu015.cco6.facebook.com
> [2] https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.c…
> ---
> drivers/dma-buf/dma-buf-mapping.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
AlexW, can you pick this up?
Thanks,
Jason
On Wed, Nov 26, 2025 at 08:08:24AM -0800, Alex Mastro wrote:
> On Wed, Nov 26, 2025 at 01:12:40PM +0000, Pranjal Shrivastava wrote:
> > On Tue, Nov 25, 2025 at 04:18:03PM -0800, Alex Mastro wrote:
> > > On Thu, Nov 20, 2025 at 11:28:25AM +0200, Leon Romanovsky wrote:
> > > > +static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
> > > > + dma_addr_t addr)
> > > > +{
> > > > + unsigned int len, nents;
> > > > + int i;
> > > > +
> > > > + nents = DIV_ROUND_UP(length, UINT_MAX);
> > > > + for (i = 0; i < nents; i++) {
> > > > + len = min_t(size_t, length, UINT_MAX);
> > > > + length -= len;
> > > > + /*
> > > > + * DMABUF abuses scatterlist to create a scatterlist
> > > > + * that does not have any CPU list, only the DMA list.
> > > > + * Always set the page related values to NULL to ensure
> > > > + * importers can't use it. The phys_addr based DMA API
> > > > + * does not require the CPU list for mapping or unmapping.
> > > > + */
> > > > + sg_set_page(sgl, NULL, 0, 0);
> > > > + sg_dma_address(sgl) = addr + i * UINT_MAX;
> > >
> > > (i * UINT_MAX) happens in 32-bit before being promoted to dma_addr_t for
> > > addition with addr. Overflows for i >=2 when length >= 8 GiB. Needs a cast:
> > >
> > > sg_dma_address(sgl) = addr + (dma_addr_t)i * UINT_MAX;
Yeah, and i should not be signed.
> > > Discovered this while debugging why dma-buf import was failing for
> > > an 8 GiB dma-buf using my earlier toy program [1]. It was surfaced by
> > > ib_umem_find_best_pgsz() returning 0 due to malformed scatterlist, which bubbles
> > > up as an EINVAL.
> > >
> >
> > Thanks a lot for testing & reporting this!
> >
> > However, I believe the casting approach is a little fragile (and
> > potentially prone to issues depending on how dma_addr_t is sized on
> > different platforms). Thus, approaching this with accumulation seems
> > better as it avoids the multiplication logic entirely, maybe something
> > like the following (untested) diff ?
>
> If the function input range is well-formed, then all values in
> [addr..addr+length) must be expressible by dma_addr_t, so I don't think overflow
> after casting is possible as long as nents is valid.
It is probably not perfect, but validate_dmabuf_input() limits length
to a valid size_t.
The signature is:
bool dma_iova_try_alloc(struct device *dev, struct dma_iova_state *state,
phys_addr_t phys, size_t size)
And that function should fail if size is too large. I think it mostly
does, but it looks like there are a few little misses:
iova_align(iovad, size + iova_off),
return ALIGN(size, iovad->granule);
etc are all unchecked math that could overflow.
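A hedged sketch of the kind of guard that seems to be missing (an illustrative fragment only, reusing the iovad/iova_off names from that function's context):

	size_t padded;

	/* Refuse sizes where the alignment padding itself would wrap. */
	if (check_add_overflow(size, iova_off, &padded) ||
	    padded > ALIGN_DOWN(SIZE_MAX, iovad->granule))
		return false;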
> That said, `nents = DIV_ROUND_UP(length, UINT_MAX)` is simply broken on any
> system where size_t is 32b. I don't know if that's a practical consideration for
> these code paths though.
Yeah, that's a good point.
Casting to u64 will trigger 64 bit device errors on 32 bit too.
// DIV_ROUND_UP that is safe at the type limits
nents = size / UINT_MAX;
if (size % UINT_MAX)
nents++;
Compiler should turn the % into bit math.
Jason
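Putting the pieces together, a hedged sketch of fill_sg_entry() with the cast, an unsigned loop counter, and the overflow-safe nents calculation (the tail of the loop is reconstructed from context, so treat this as illustrative rather than the final patch):

static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
					 dma_addr_t addr)
{
	unsigned int len, nents, i;

	/* Open-coded because DIV_ROUND_UP(length, UINT_MAX) can overflow. */
	nents = length / UINT_MAX;
	if (length % UINT_MAX)
		nents++;

	for (i = 0; i < nents; i++) {
		len = min_t(size_t, length, UINT_MAX);
		length -= len;
		/* DMA-only scatterlist, no CPU list (see the original comment). */
		sg_set_page(sgl, NULL, 0, 0);
		/* Promote i before multiplying so the offset can't wrap in 32 bit. */
		sg_dma_address(sgl) = addr + (dma_addr_t)i * UINT_MAX;
		sg_dma_len(sgl) = len;
		sgl = sg_next(sgl);
	}
	return sgl;
}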
On Wednesday, 2025-11-26 at 16:44 +0100, Philipp Stanner wrote:
> On Wed, 2025-11-26 at 16:03 +0100, Christian König wrote:
>
> > >
[...]
> > > My hope would be that in the mid-term future we'd get firmware
> > > rings
> > > that can be preempted through a firmware call for all major
> > > hardware.
> > > Then a huge share of our problems would disappear.
> >
> > At least on AMD HW pre-emption is actually horribly unreliable as
> > well.
>
> Do you mean new GPUs with firmware scheduling, or what is "HW pre-
> emption"?
>
> With firmware interfaces, my hope would be that you could simply tell
>
> stop_running_ring(nr_of_ring)
> // time slice for someone else
> start_running_ring(nr_of_ring)
>
> Thereby getting real scheduling and all that. And eliminating many
> other problems we know well from drm/sched.
It doesn't really matter whether you have firmware scheduling or not;
preemption is a hard problem on GPUs either way. CPUs have limited software
visible state that needs to be saved/restored on a context switch and
even there people start complaining now that they need to context
switch the AVX512 register set.
GPUs have megabytes of software visible state, which needs to be
saved/restored on the context switch if you want fine grained
preemption with low preemption latency. There might be points in the
command execution where you can ignore most of that state, but reaching
those points can have basically unbounded latency. So either you can
reliably save/restore lots of state or you are limited to very coarse
grained preemption with all the usual issues of timeouts and DoS
vectors.
I'm not totally up to speed with the current state across all relevant
GPUs, but until recently NVidia was the only vendor to have really
reliable fine-grained preemption.
Regards,
Lucas
On Wednesday, 2025-11-26 at 16:44 +0100, Philipp Stanner wrote:
> On Wed, 2025-11-26 at 16:03 +0100, Christian König wrote:
> >
> >
> > On 11/26/25 13:37, Philipp Stanner wrote:
> > > On Wed, 2025-11-26 at 13:31 +0100, Christian König wrote:
> > > >
>
> […]
>
> > > > Well the question is how do you detect *reliably* that there is
> > > > still forward progress?
> > >
> > > My understanding is that that's impossible since the internals of
> > > command submissions are only really understood by userspace, who
> > > submits them.
> >
> > Right, but we can still try to do our best in the kernel to mitigate
> > the situation.
> >
> > I think for now amdgpu will implement something like checking if the
> > HW still makes progress after a timeout but only a limited number of
> > re-tries until we say that's it and reset anyway.
>
> Oh oh, isn't that our dear hang_limit? :)
Not really. The hang limit is the limit on how many times a hanging
submit might be retried.
Limiting the number of timeout extensions is more of a safety net
against workloads which might appear to make progress to the kernel
driver but in reality are stuck. After all, the kernel driver can only
have limited knowledge of the GPU state and any progress check will
have limited precision with false positives/negatives being a part of
reality we have to deal with.
>
> We agree that you can never really know whether userspace just submitted
> a while(true) job, don't we? Even if some GPU register still indicates
> "progress".
Yea, this is really hardware dependent on what you can read at
runtime.
For etnaviv we define "progress" as the command frontend moving towards
the end of the command buffer. As a single draw call in valid workloads
can blow through our timeout we also use debug registers to look at the
current primitive ID within a draw call.
If userspace submits a workload that requires more than 500ms per
primitive to finish we consider this an invalid workload and go through
the reset/recovery motions.
Regards,
Lucas
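Schematically, that progress check can look something like the following (a hedged sketch with made-up register and field names; the real etnaviv code differs in the details):

struct my_gpu {
	void __iomem *mmio;
	u32 hangcheck_fe_addr;	/* frontend position at the last timeout */
	u32 hangcheck_prim_id;	/* primitive ID at the last timeout */
};

/* Returns true if the timeout should be extended, false to go reset. */
static bool gpu_made_progress(struct my_gpu *gpu)
{
	u32 fe_addr = readl(gpu->mmio + REG_FE_CURRENT_ADDR);
	u32 prim_id = readl(gpu->mmio + REG_DEBUG_PRIMITIVE_ID);

	if (fe_addr == gpu->hangcheck_fe_addr &&
	    prim_id == gpu->hangcheck_prim_id)
		return false;	/* no movement within the timeout window */

	gpu->hangcheck_fe_addr = fe_addr;
	gpu->hangcheck_prim_id = prim_id;
	return true;		/* forward progress, extend the timeout */
}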
On 11/26/25 13:37, Philipp Stanner wrote:
> On Wed, 2025-11-26 at 13:31 +0100, Christian König wrote:
>> On 11/25/25 18:02, Lucas Stach wrote:
>>>>> I agree that distinguishing the use case that way is not ideal.
>>>>> However, who has the knowledge of how the hardware is being used by
>>>>> customers / users, if not the driver?
>>>>
>>>> Well the end user.
>>>>
>>>> Maybe we should move the whole timeout topic into the DRM layer or the scheduler component.
>>>>
>>>> Something like 2 seconds default (which BTW is the default on Windows as well), which can be overridden on a global, per device, per queue name basis.
>>>>
>>>> And 10 seconds maximum with only a warning that a non-default timeout is used and everything above 10 seconds taints the kernel and should really only be used for testing/debugging.
>>>
>>> The question really is what you want to do after you hit the (lowered)
>>> timeout? Users get grumpy if you block things for 10 seconds, but they
>>> get equally if not more grumpy when you kick out a valid workload that
>>> just happens to need a lot of GPU time.
>>
>> Yeah, exactly that summarizes the problem pretty well.
>>
>>> Fences are only defined to signal eventually, with no real concept of a
>>> timeout. IMO all timeouts waiting for fences should be long enough to
>>> only be considered last resort. You may want to give the user some
>>> indication of a failed fence wait instead of stalling indefinitely, but
>>> you really only want to do this after a quite long timeout, not in a
>>> sense of "Sorry, I ran out of patience after 2 seconds".
>>>
>>> Sure memory management depends on fences making forward progress, but
>>> mm also depends on scheduled writeback making forward progress. You
>>> don't kick out writeback requests after an arbitrary timeout just
>>> because the backing storage happens to be loaded heavily.
>>>
>>> This BTW is also why etnaviv has always had a quite short timeout of
>>> 500ms, with the option to extend the timeout when the GPU is still
>>> making progress. We don't ever want to shoot down valid workloads (we
>>> have some that need a few seconds to upload textures, etc on our wimpy
>>> GPU), but you also don't want to wait multiple seconds until you detect
>>> a real GPU hang.
>>
>> That is a really good point. We considered that as well, but then abandoned the idea, see below for the background.
>>
>> What we could also do is set a flag on the fence when a process is killed and then wait for that fence to signal so that it can clean up. Going to prototype that.
>>
>>> So we use the short scheduler timeout to check in on the GPU and see if
>>> it is still making progress (for graphics workloads by looking at the
>>> frontend position within the command buffer and current primitive ID).
>>> If we can deduce that the GPU is stuck we do the usual reset/recovery
>>> dance within a reasonable reaction time, acceptable to users hitting a
>>> real GPU hang. But if the GPU is making progress we will give an
>>> infinite number of timeout extensions with no global timeout at all,
>>> only fulfilling the eventual signaling guarantee of the fence.
>>
>> Well the question is how do you detect *reliably* that there is still forward progress?
>
> My understanding is that that's impossible since the internals of
> command submissions are only really understood by userspace, who
> submits them.
Right, but we can still try to do our best in the kernel to mitigate the situation.
I think for now amdgpu will implement something like checking if the HW still makes progress after a timeout, but only with a limited number of re-tries until we say that's it and reset anyway.
> I think the long-term solution can only be fully fledged GPU scheduling
> with preemption. That's why we don't need such a timeout mechanism for
> userspace processes: the scheduler simply interrupts and lets someone
> else run.
Yeah absolutely.
>
> My hope would be that in the mid-term future we'd get firmware rings
> that can be preempted through a firmware call for all major hardware.
> Then a huge share of our problems would disappear.
At least on AMD HW pre-emption is actually horribly unreliable as well.
Userspace basically needs to co-operate and provide a buffer into which the state is saved on a pre-emption.
> With the current situation, IDK either. My impression so far is that
> letting the drivers and driver programmers decide is the least bad
> choice.
Yeah, agree. It's the least evil thing we can do.
But I now have a plan how to proceed :)
Thanks for the input,
Christian.
>
>
> P.
On 11/26/25 14:19, Philipp Stanner wrote:
> Barely anyone uses dma_fence_signal()'s (and similar functions') return
> code. Checking it is pretty much useless anyways, because what are you
> going to do if a fence was already signal it? Unsignal it and signal it
> again? ;p
Reviewed-by: Christian König <christian.koenig(a)amd.com> for the entire series.
Please push to drm-misc-next or leave me a note when I should pick it up.
> Removing the return code simplifies the API and makes it easier for me
> to sit on top with Rust DmaFence.
BTW, I have an rb for embedding the lock and I'm now writing test cases.
When that is done you should be able to base the Rust DmaFence abstraction on that as well.
Regards,
Christian.
>
> Philipp Stanner (6):
> dma-buf/dma-fence: Add dma_fence_test_signaled_flag()
> amd/amdkfd: Ignore return code of dma_fence_signal()
> drm/gpu/xe: Ignore dma_fence_signal() return code
> dma-buf: Don't misuse dma_fence_signal()
> drm/ttm: Remove return check of dma_fence_signal()
> dma-buf/dma-fence: Remove return code of signaling-functions
>
> drivers/dma-buf/dma-fence.c | 59 ++++++-------------
> drivers/dma-buf/st-dma-fence.c | 7 +--
> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 +-
> .../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 3 +-
> drivers/gpu/drm/xe/xe_hw_fence.c | 5 +-
> include/linux/dma-fence.h | 33 ++++++++---
> 6 files changed, 53 insertions(+), 59 deletions(-)
>
On 11/25/25 11:56, Philipp Stanner wrote:
>>>>
>>>> The GPU scheduler has a very similar define, MAX_WAIT_SCHED_ENTITY_Q_EMPTY which is currently just 1 second.
>>>>
>>>> The real question is what is the maximum amount of time we can wait for the HW before we should trigger a timeout?
>>>
>>> That's a question only the drivers can answer, which is why I like to
>>> think that setting global constants constraining all parties is not the
>>> right thing to do.
>>
>> Exactly, that's the reason why I bring that up. I think that having drivers be in charge of timeouts is the wrong approach.
>>
>> See, the reason why we have the timeout (and documented that it is a must have) is because we have both core memory management as well as desktop responsiveness depending on it.
>
> Good and well, but then patch 4 becomes even more problematic:
>
> So we'd just have drivers fire warnings, and then they would still have
> the freedom to set timeouts for drm/sched, as long as those timeouts
> are smaller than your new global constant.
>
> Why then not remove drm/sched's timeout parameter API completely and
> always use your maximum value internally in drm/sched? Or maybe
> truncate it with a warning?
I have considered that as well, but then thought that we should at least give end users the possibility to override the timeout while still tainting the kernel so that we know about this in bug reports, core dumps etc...
> "Maximum timeout parameter exceeded, truncating to %ld.\n"
>
> I suppose some drivers want even higher responsiveness than those 2
> seconds.
As far as I know some medical use cases for example have timeouts like 100-200ms. But again that is the use case and not the driver.
> I do believe that more of the driver folks should be made aware of this
> intended change.
I have no real intention of actually pushing those patches, at least not as they are. I just wanted to kick off some discussion.
>>
>>> What is even your motivation? What problem does this solve? Is the OOM
>>> killer currently hanging for anyone? Can you link a bug report?
>>
>> I'm not sure if we have an external bug report (we have an internal one), but for amdgpu there were customer complaints that 10 seconds is too long.
>>
>> So we changed it to 2 seconds for amdgpu, and now there are complaints from internal AMD teams that 2 seconds is too short.
>>
>> While working on that I realized that the timeout is actually not driver dependent at all.
>>
>> What can maybe be argued is that a desktop system should have a shorter timeout than some server, but that one driver needs a different timeout than another doesn't really make sense to me.
>>
>> I mean, what is actually HW dependent about the requirement that I need a responsive desktop system?
>
> I suppose some drivers are indeed only used for server hardware. And
> for compute you might not care about responsiveness as long as your
> result drops off at some point. But there's cloud gaming, too..
Good point with the cloud gaming.
> I agree that distinguishing the use case that way is not ideal.
> However, who has the knowledge of how the hardware is being used by
> customers / users, if not the driver?
Well the end user.
Maybe we should move the whole timeout topic into the DRM layer or the scheduler component.
Something like 2 seconds default (which BTW is the default on Windows as well), which can be overridden on a global, per device, per queue name basis.
And 10 seconds maximum with only a warning that a non-default timeout is used and everything above 10 seconds taints the kernel and should really only be used for testing/debugging.
Thoughts?
Regards,
Christian.
>
>
> P.
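For what it's worth, a hedged sketch of what such a subsystem-level policy could look like (made-up names, not an existing drm/sched API; it only illustrates the default/override/taint idea above):

#define SCHED_TIMEOUT_DEFAULT	(2 * HZ)	/* ~2 seconds */
#define SCHED_TIMEOUT_MAX	(10 * HZ)	/* hard upper bound */

static long sched_pick_timeout(long requested)
{
	if (requested <= 0)
		return SCHED_TIMEOUT_DEFAULT;

	if (requested != SCHED_TIMEOUT_DEFAULT)
		pr_warn("using non-default job timeout of %ld jiffies\n",
			requested);

	if (requested > SCHED_TIMEOUT_MAX) {
		/* Only for testing/debugging, so make it visible in reports. */
		pr_warn("job timeout above %d jiffies, tainting kernel\n",
			SCHED_TIMEOUT_MAX);
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
	}

	return requested;
}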