Linaro-mm-sig November 2025

linaro-mm-sig@lists.linaro.org

19 participants
100 discussions

Re: [PATCH v3 27/28] drm/amdgpu: get rid of amdgpu_ttm_clear_buffer

by Christian König

On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote: > It's doing the same thing as amdgpu_fill_buffer(src_data=0), so drop it. > > The only caveat is that amdgpu_res_cleared() return value is only valid > right after allocation. > > --- > v2: introduce new "bool consider_clear_status" arg > --- > > Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> It would be better to have that ealier in the patch set, but I guess that gives you rebasing problems? Christian. > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 ++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 90 +++++----------------- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 7 +- > 3 files changed, 33 insertions(+), 80 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index 7d8d70135cc2..dccc31d0128e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -725,13 +725,17 @@ int amdgpu_bo_create(struct amdgpu_device *adev, > bo->tbo.resource->mem_type == TTM_PL_VRAM) { > struct dma_fence *fence; > > - r = amdgpu_ttm_clear_buffer(adev, bo, bo->tbo.base.resv, &fence); > + r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev), > + bo, 0, NULL, &fence, > + true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); > if (unlikely(r)) > goto fail_unreserve; > > - dma_resv_add_fence(bo->tbo.base.resv, fence, > - DMA_RESV_USAGE_KERNEL); > - dma_fence_put(fence); > + if (fence) { > + dma_resv_add_fence(bo->tbo.base.resv, fence, > + DMA_RESV_USAGE_KERNEL); > + dma_fence_put(fence); > + } > } > if (!bp->resv) > amdgpu_bo_unreserve(bo); > @@ -1323,8 +1327,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) > goto out; > > r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev), > - abo, 0, &bo->base._resv, > - &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); > + abo, 0, &bo->base._resv, &fence, > + false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); > if (WARN_ON(r)) > goto out; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 39cfe2dbdf03..c65c411ce26e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -459,7 +459,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, > > r = amdgpu_fill_buffer(adev, entity, > abo, 0, NULL, &wipe_fence, > - AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); > + false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); > if (r) { > goto error; > } else if (wipe_fence) { > @@ -2459,79 +2459,28 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, > } > > /** > - * amdgpu_ttm_clear_buffer - clear memory buffers > + * amdgpu_fill_buffer - fill a buffer with a given value > * @adev: amdgpu device object > - * @bo: amdgpu buffer object > - * @resv: reservation object > - * @fence: dma_fence associated with the operation > + * @entity: optional entity to use. If NULL, the clearing entities will be > + * used to load-balance the partial clears > + * @bo: the bo to fill > + * @src_data: the value to set > + * @resv: fences contained in this reservation will be used as dependencies. > + * @out_fence: the fence from the last clear will be stored here. It might be > + * NULL if no job was run. > + * @dependency: optional input dependency fence. > + * @consider_clear_status: true if region reported as cleared by amdgpu_res_cleared() > + * are skipped. > + * @k_job_id: trace id > * > - * Clear the memory buffer resource. > - * > - * Returns: > - * 0 for success or a negative error code on failure. > */ > -int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev, > - struct amdgpu_bo *bo, > - struct dma_resv *resv, > - struct dma_fence **fence) > -{ > - struct amdgpu_ttm_buffer_entity *entity; > - struct amdgpu_res_cursor cursor; > - u64 addr; > - int r = 0; > - > - if (!adev->mman.buffer_funcs_enabled) > - return -EINVAL; > - > - if (!fence) > - return -EINVAL; > - entity = &adev->mman.clear_entities[0]; > - *fence = dma_fence_get_stub(); > - > - amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor); > - > - mutex_lock(&entity->lock); > - while (cursor.remaining) { > - struct dma_fence *next = NULL; > - u64 size; > - > - if (amdgpu_res_cleared(&cursor)) { > - amdgpu_res_next(&cursor, cursor.size); > - continue; > - } > - > - /* Never clear more than 256MiB at once to avoid timeouts */ > - size = min(cursor.size, 256ULL << 20); > - > - r = amdgpu_ttm_map_buffer(adev, entity, > - &bo->tbo, bo->tbo.resource, &cursor, > - 1, false, false, &size, &addr); > - if (r) > - goto err; > - > - r = amdgpu_ttm_fill_mem(adev, entity, 0, addr, size, resv, > - &next, true, > - AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); > - if (r) > - goto err; > - > - dma_fence_put(*fence); > - *fence = next; > - > - amdgpu_res_next(&cursor, size); > - } > -err: > - mutex_unlock(&entity->lock); > - > - return r; > -} > - > int amdgpu_fill_buffer(struct amdgpu_device *adev, > struct amdgpu_ttm_buffer_entity *entity, > struct amdgpu_bo *bo, > uint32_t src_data, > struct dma_resv *resv, > - struct dma_fence **f, > + struct dma_fence **out_fence, > + bool consider_clear_status, > u64 k_job_id) > { > struct dma_fence *fence = NULL; > @@ -2551,6 +2500,11 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev, > struct dma_fence *next; > uint64_t cur_size, to; > > + if (consider_clear_status && amdgpu_res_cleared(&dst)) { > + amdgpu_res_next(&dst, dst.size); > + continue; > + } > + > /* Never fill more than 256MiB at once to avoid timeouts */ > cur_size = min(dst.size, 256ULL << 20); > > @@ -2574,9 +2528,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev, > } > error: > mutex_unlock(&entity->lock); > - if (f) > - *f = dma_fence_get(fence); > - dma_fence_put(fence); > + *out_fence = fence; > return r; > } > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > index 653a4d17543e..f3bdbcec9afc 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > @@ -181,16 +181,13 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, > struct dma_resv *resv, > struct dma_fence **fence, > bool vm_needs_flush, uint32_t copy_flags); > -int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev, > - struct amdgpu_bo *bo, > - struct dma_resv *resv, > - struct dma_fence **fence); > int amdgpu_fill_buffer(struct amdgpu_device *adev, > struct amdgpu_ttm_buffer_entity *entity, > struct amdgpu_bo *bo, > uint32_t src_data, > struct dma_resv *resv, > - struct dma_fence **f, > + struct dma_fence **out_fence, > + bool consider_clear_status, > u64 k_job_id); > struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev); >

2 months, 2 weeks

Re: [PATCH 6/9] iommufd: Have pfn_reader process DMABUF iopt_pages

by Jason Gunthorpe

On Thu, Nov 20, 2025 at 08:04:37AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg(a)nvidia.com> > > Sent: Saturday, November 8, 2025 12:50 AM > > + > > +static int pfn_reader_fill_dmabuf(struct pfn_reader_dmabuf *dmabuf, > > + struct pfn_batch *batch, > > + unsigned long start_index, > > + unsigned long last_index) > > +{ > > + unsigned long start = dmabuf->start_offset + start_index * PAGE_SIZE; > > + > > + /* > > + * This works in PAGE_SIZE indexes, if the dmabuf is sliced and > > + * starts/ends at a sub page offset then the batch to domain code will > > + * adjust it. > > + */ > > dmabuf->start_offset comes from pages->dmabuf.start, which is initialized as: > > pages->dmabuf.start = start - start_byte; > > so it's always page-aligned. Where is the sub-page offset coming from? I need to go over this again to check it, this sub-page stuff is a bit convoluted. start_offset should include the sub page offset here.. > > @@ -1687,6 +1737,12 @@ static void __iopt_area_unfill_domain(struct > > iopt_area *area, > > > > lockdep_assert_held(&pages->mutex); > > > > + if (iopt_is_dmabuf(pages)) { > > + iopt_area_unmap_domain_range(area, domain, start_index, > > + last_index); > > + return; > > + } > > + > > this belongs to patch3? This is part of programming the domain with the dmabuf, the patch3 was about the revoke which is a slightly different topic though they are both similar. Thanks, Jason

2 months, 2 weeks

Re: [PATCH 2/9] iommufd: Add DMABUF to iopt_pages

by Jason Gunthorpe

On Thu, Nov 20, 2025 at 07:55:04AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg(a)nvidia.com> > > Sent: Saturday, November 8, 2025 12:50 AM > > > > > > @@ -2031,7 +2155,10 @@ int iopt_pages_rw_access(struct iopt_pages > > *pages, unsigned long start_byte, > > if ((flags & IOMMUFD_ACCESS_RW_WRITE) && !pages->writable) > > return -EPERM; > > > > - if (pages->type == IOPT_ADDRESS_FILE) > > + if (iopt_is_dmabuf(pages)) > > + return -EINVAL; > > + > > probably also add helpers for other types, e.g.: > > iopt_is_user() > iopt_is_memfd() The helper was to integrate the IS_ENABLED() check for DMABUF, there are not so many others uses, I think leave it to not bloat the patch. > > + if (pages->type != IOPT_ADDRESS_USER) > > return iopt_pages_rw_slow(pages, start_index, last_index, > > start_byte % PAGE_SIZE, data, > > length, > > flags); > > -- > > then the following WARN_ON() becomes useless: > > if (IS_ENABLED(CONFIG_IOMMUFD_TEST) && > WARN_ON(pages->type != IOPT_ADDRESS_USER)) > return -EINVAL; Yep Thanks, Jason

2 months, 2 weeks

Re: [PATCH v3 09/28] drm/amdgpu: pass the entity to use to ttm public functions

by Christian König

On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote: > This way the caller can select the one it wants to use. > > Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> I'm wondering if it wouldn't make sense to put a pointer to adev into each amdgpu_ttm_buffer_entity. But that is maybe something for another patch. For now: Reviewed-by: Christian König <christian.koenig(a)amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 34 +++++++++---------- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 16 +++++---- > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 +- > 5 files changed, 32 insertions(+), 28 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > index 3636b757c974..a050167e76a4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > @@ -37,7 +37,8 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size, > > stime = ktime_get(); > for (i = 0; i < n; i++) { > - r = amdgpu_copy_buffer(adev, saddr, daddr, size, NULL, &fence, > + r = amdgpu_copy_buffer(adev, &adev->mman.default_entity, > + saddr, daddr, size, NULL, &fence, > false, 0); > if (r) > goto exit_do_move; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index 926a3f09a776..858eb9fa061b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -1322,8 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) > if (r) > goto out; > > - r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true, > - AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); > + r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv, > + &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); > if (WARN_ON(r)) > goto out; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 3d850893b97f..1d3afad885da 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -359,7 +359,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, > write_compress_disable)); > } > > - r = amdgpu_copy_buffer(adev, from, to, cur_size, resv, > + r = amdgpu_copy_buffer(adev, entity, from, to, cur_size, resv, > &next, true, copy_flags); > if (r) > goto error; > @@ -414,8 +414,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, > (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) { > struct dma_fence *wipe_fence = NULL; > > - r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence, > - false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); > + r = amdgpu_fill_buffer(&adev->mman.move_entity, > + abo, 0, NULL, &wipe_fence, > + AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); > if (r) { > goto error; > } else if (wipe_fence) { > @@ -2258,7 +2259,9 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev, > DMA_RESV_USAGE_BOOKKEEP); > } > > -int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset, > +int amdgpu_copy_buffer(struct amdgpu_device *adev, > + struct amdgpu_ttm_buffer_entity *entity, > + uint64_t src_offset, > uint64_t dst_offset, uint32_t byte_count, > struct dma_resv *resv, > struct dma_fence **fence, > @@ -2282,7 +2285,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset, > max_bytes = adev->mman.buffer_funcs->copy_max_bytes; > num_loops = DIV_ROUND_UP(byte_count, max_bytes); > num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8); > - r = amdgpu_ttm_prepare_job(adev, &adev->mman.move_entity, num_dw, > + r = amdgpu_ttm_prepare_job(adev, entity, num_dw, > resv, vm_needs_flush, &job, > AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER); > if (r) > @@ -2411,22 +2414,18 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, > return r; > } > > -int amdgpu_fill_buffer(struct amdgpu_bo *bo, > - uint32_t src_data, > - struct dma_resv *resv, > - struct dma_fence **f, > - bool delayed, > - u64 k_job_id) > +int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity, > + struct amdgpu_bo *bo, > + uint32_t src_data, > + struct dma_resv *resv, > + struct dma_fence **f, > + u64 k_job_id) > { > struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); > - struct amdgpu_ttm_buffer_entity *entity; > struct dma_fence *fence = NULL; > struct amdgpu_res_cursor dst; > int r; > > - entity = delayed ? &adev->mman.clear_entity : > - &adev->mman.move_entity; > - > if (!adev->mman.buffer_funcs_enabled) { > dev_err(adev->dev, > "Trying to clear memory with ring turned off.\n"); > @@ -2443,13 +2442,14 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, > /* Never fill more than 256MiB at once to avoid timeouts */ > cur_size = min(dst.size, 256ULL << 20); > > - r = amdgpu_ttm_map_buffer(adev, &adev->mman.default_entity, > + r = amdgpu_ttm_map_buffer(adev, entity, > &bo->tbo, bo->tbo.resource, &dst, > 1, false, &cur_size, &to); > if (r) > goto error; > > - r = amdgpu_ttm_fill_mem(adev, entity, src_data, to, cur_size, resv, > + r = amdgpu_ttm_fill_mem(adev, entity, > + src_data, to, cur_size, resv, > &next, true, k_job_id); > if (r) > goto error; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > index 41bbc25680a2..9288599c9c46 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > @@ -167,7 +167,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev); > void amdgpu_ttm_fini(struct amdgpu_device *adev); > void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, > bool enable); > -int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset, > +int amdgpu_copy_buffer(struct amdgpu_device *adev, > + struct amdgpu_ttm_buffer_entity *entity, > + uint64_t src_offset, > uint64_t dst_offset, uint32_t byte_count, > struct dma_resv *resv, > struct dma_fence **fence, > @@ -175,12 +177,12 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset, > int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, > struct dma_resv *resv, > struct dma_fence **fence); > -int amdgpu_fill_buffer(struct amdgpu_bo *bo, > - uint32_t src_data, > - struct dma_resv *resv, > - struct dma_fence **fence, > - bool delayed, > - u64 k_job_id); > +int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity, > + struct amdgpu_bo *bo, > + uint32_t src_data, > + struct dma_resv *resv, > + struct dma_fence **f, > + u64 k_job_id); > > int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo); > void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo); > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > index ade1d4068d29..9c76f1ba0e55 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > @@ -157,7 +157,8 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys, > goto out_unlock; > } > > - r = amdgpu_copy_buffer(adev, gart_s, gart_d, size * PAGE_SIZE, > + r = amdgpu_copy_buffer(adev, entity, > + gart_s, gart_d, size * PAGE_SIZE, > NULL, &next, true, 0); > if (r) { > dev_err(adev->dev, "fail %d to copy memory\n", r);

2 months, 2 weeks

Re: [PATCH v3 06/28] drm/amdgpu: add amdgpu_ttm_job_submit helper

by Christian König

On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote: > Deduplicate the IB padding code and will also be used > later to check locking. > > Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> Reviewed-by: Christian König <christian.koenig(a)amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 34 ++++++++++++------------- > 1 file changed, 16 insertions(+), 18 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 17e1892c44a2..be1232b2d55e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -162,6 +162,18 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, > *placement = abo->placement; > } > > +static struct dma_fence * > +amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 num_dw) > +{ > + struct amdgpu_ring *ring; > + > + ring = adev->mman.buffer_funcs_ring; > + amdgpu_ring_pad_ib(ring, &job->ibs[0]); > + WARN_ON(job->ibs[0].length_dw > num_dw); > + > + return amdgpu_job_submit(job); > +} > + > /** > * amdgpu_ttm_map_buffer - Map memory into the GART windows > * @adev: the device being used > @@ -185,7 +197,6 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev, > { > unsigned int offset, num_pages, num_dw, num_bytes; > uint64_t src_addr, dst_addr; > - struct amdgpu_ring *ring; > struct amdgpu_job *job; > void *cpu_addr; > uint64_t flags; > @@ -240,10 +251,6 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev, > amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, > dst_addr, num_bytes, 0); > > - ring = adev->mman.buffer_funcs_ring; > - amdgpu_ring_pad_ib(ring, &job->ibs[0]); > - WARN_ON(job->ibs[0].length_dw > num_dw); > - > flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, mem); > if (tmz) > flags |= AMDGPU_PTE_TMZ; > @@ -261,7 +268,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev, > amdgpu_gart_map_vram_range(adev, pa, 0, num_pages, flags, cpu_addr); > } > > - dma_fence_put(amdgpu_job_submit(job)); > + dma_fence_put(amdgpu_ttm_job_submit(adev, job, num_dw)); > return 0; > } > > @@ -1497,10 +1504,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo, > amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, dst_addr, > PAGE_SIZE, 0); > > - amdgpu_ring_pad_ib(adev->mman.buffer_funcs_ring, &job->ibs[0]); > - WARN_ON(job->ibs[0].length_dw > num_dw); > - > - fence = amdgpu_job_submit(job); > + fence = amdgpu_ttm_job_submit(adev, job, num_dw); > > if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout)) > r = -ETIMEDOUT; > @@ -2285,11 +2289,9 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset, > byte_count -= cur_size_in_bytes; > } > > - amdgpu_ring_pad_ib(ring, &job->ibs[0]); > - WARN_ON(job->ibs[0].length_dw > num_dw); > - *fence = amdgpu_job_submit(job); > if (r) > goto error_free; > + *fence = amdgpu_ttm_job_submit(adev, job, num_dw); > > return r; > > @@ -2307,7 +2309,6 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data, > u64 k_job_id) > { > unsigned int num_loops, num_dw; > - struct amdgpu_ring *ring; > struct amdgpu_job *job; > uint32_t max_bytes; > unsigned int i; > @@ -2331,10 +2332,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data, > byte_count -= cur_size; > } > > - ring = adev->mman.buffer_funcs_ring; > - amdgpu_ring_pad_ib(ring, &job->ibs[0]); > - WARN_ON(job->ibs[0].length_dw > num_dw); > - *fence = amdgpu_job_submit(job); > + *fence = amdgpu_ttm_job_submit(adev, job, num_dw); > return 0; > } >

2 months, 2 weeks

Re: [PATCH v9 10/11] vfio/pci: Add dma-buf export support for MMIO regions

by Leon Romanovsky

On Thu, Nov 20, 2025 at 05:04:13PM -0700, Alex Williamson wrote: > On Thu, 20 Nov 2025 11:28:29 +0200 > Leon Romanovsky <leon(a)kernel.org> wrote: > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > > index 142b84b3f225..51a3bcc26f8b 100644 > > --- a/drivers/vfio/pci/vfio_pci_core.c > > +++ b/drivers/vfio/pci/vfio_pci_core.c > ... > > @@ -2487,8 +2500,11 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set, > > > > err_undo: > > list_for_each_entry_from_reverse(vdev, &dev_set->device_list, > > - vdev.dev_set_list) > > + vdev.dev_set_list) { > > + if (__vfio_pci_memory_enabled(vdev)) > > + vfio_pci_dma_buf_move(vdev, false); > > up_write(&vdev->memory_lock); > > + } > > I ran into a bug here. In the hot reset path we can have dev_sets > where one or more devices are not opened by the user. The vconfig > buffer for the device is established on open. However: > > bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev) > { > struct pci_dev *pdev = vdev->pdev; > u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]); > ... > > Leads to a NULL pointer dereference. > > I think the most straightforward fix is simply to test the open_count > on the vfio_device, which is also protected by the dev_set->lock that > we already hold here: > > --- a/drivers/vfio/pci/vfio_pci_core.c > +++ b/drivers/vfio/pci/vfio_pci_core.c > @@ -2501,7 +2501,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set, > err_undo: > list_for_each_entry_from_reverse(vdev, &dev_set->device_list, > vdev.dev_set_list) { > - if (__vfio_pci_memory_enabled(vdev)) > + if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev)) > vfio_pci_dma_buf_move(vdev, false); > up_write(&vdev->memory_lock); > } > > Any other suggestions? This should be the only reset path with this > nuance of affecting non-opened devices. Thanks, It seems right to me. Thanks > > Alex

2 months, 2 weeks

Re: [PATCH v2] dma-buf: system_heap: use larger contiguous mappings instead of per-page mmap

by Sumit Semwal

Hi Barry, On Fri, 21 Nov 2025 at 06:54, Barry Song <21cnbao(a)gmail.com> wrote: > > Hi Sumit, > > > > > Using the micro-benchmark below, we see that mmap becomes > > 3.5X faster: > > > Marcin pointed out to me off-tree that it is actually 35x faster, > not 3.5x faster. Sorry for my poor math. I assume you can fix this > when merging it? Sure, I corrected this, and is merged to drm-misc-next Thanks, Sumit. > > > > > W/ patch: > > > > ~ # ./a.out > > mmap 512MB took 200266.000 us, verify OK > > ~ # ./a.out > > mmap 512MB took 198151.000 us, verify OK > > ~ # ./a.out > > mmap 512MB took 197069.000 us, verify OK > > ~ # ./a.out > > mmap 512MB took 196781.000 us, verify OK > > ~ # ./a.out > > mmap 512MB took 198102.000 us, verify OK > > ~ # ./a.out > > mmap 512MB took 195552.000 us, verify OK > > > > W/o patch: > > > > ~ # ./a.out > > mmap 512MB took 6987470.000 us, verify OK > > ~ # ./a.out > > mmap 512MB took 6970739.000 us, verify OK > > ~ # ./a.out > > mmap 512MB took 6984383.000 us, verify OK > > ~ # ./a.out > > mmap 512MB took 6971311.000 us, verify OK > > ~ # ./a.out > > mmap 512MB took 6991680.000 us, verify OK > > > Thanks > Barry

2 months, 2 weeks

Re: [PATCH v9 10/11] vfio/pci: Add dma-buf export support for MMIO regions

by Jason Gunthorpe

On Thu, Nov 20, 2025 at 05:04:13PM -0700, Alex Williamson wrote: > @@ -2501,7 +2501,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set, > err_undo: > list_for_each_entry_from_reverse(vdev, &dev_set->device_list, > vdev.dev_set_list) { > - if (__vfio_pci_memory_enabled(vdev)) > + if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev)) > vfio_pci_dma_buf_move(vdev, false); > up_write(&vdev->memory_lock); > } > > Any other suggestions? This should be the only reset path with this > nuance of affecting non-opened devices. Thanks, Seems reasonable, but should it be in __vfio_pci_memory_enabled() just to be robust? Jason

2 months, 2 weeks

Re: [PATCH] drm/xe: Fix memory leak when handling pagefault vma

by Thomas Hellström

On Thu, 2025-11-20 at 18:14 +0200, Mika Kuoppala wrote: > When the pagefault handling code was moved to a new file, an extra > drm_exec_init() was added to the VMA path. This call is unnecessary > because > xe_validation_ctx_init() already performs a drm_exec_init(), > resulting in a > memory leak reported by kmemleak. > > Remove the redundant drm_exec_init() from the VMA pagefault handling > code. > > Fixes: fb544b844508 ("drm/xe: Implement xe_pagefault_queue_work") > Cc: Matthew Brost <matthew.brost(a)intel.com> > Cc: Stuart Summers <stuart.summers(a)intel.com> > Cc: Lucas De Marchi <lucas.demarchi(a)intel.com> > Cc: "Thomas Hellström" <thomas.hellstrom(a)linux.intel.com> > Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com> > Cc: Sumit Semwal <sumit.semwal(a)linaro.org> > Cc: "Christian König" <christian.koenig(a)amd.com> > Cc: intel-xe(a)lists.freedesktop.org > Cc: linux-media(a)vger.kernel.org > Cc: dri-devel(a)lists.freedesktop.org > Cc: linaro-mm-sig(a)lists.linaro.org > Signed-off-by: Mika Kuoppala <mika.kuoppala(a)linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com> > --- > drivers/gpu/drm/xe/xe_pagefault.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/gpu/drm/xe/xe_pagefault.c > b/drivers/gpu/drm/xe/xe_pagefault.c > index fe3e40145012..afb06598b6e1 100644 > --- a/drivers/gpu/drm/xe/xe_pagefault.c > +++ b/drivers/gpu/drm/xe/xe_pagefault.c > @@ -102,7 +102,6 @@ static int xe_pagefault_handle_vma(struct xe_gt > *gt, struct xe_vma *vma, > > /* Lock VM and BOs dma-resv */ > xe_validation_ctx_init(&ctx, &vm->xe->val, &exec, (struct > xe_val_flags) {}); > - drm_exec_init(&exec, 0, 0); > drm_exec_until_all_locked(&exec) { > err = xe_pagefault_begin(&exec, vma, tile->mem.vram, > needs_vram == 1);

2 months, 2 weeks

Re: [PATCH 5/9] iommufd: Allow MMIO pages in a batch

by Jason Gunthorpe

On Thu, Nov 20, 2025 at 07:59:19AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg(a)nvidia.com> > > Sent: Saturday, November 8, 2025 12:50 AM > > > > +enum batch_kind { > > + BATCH_CPU_MEMORY = 0, > > + BATCH_MMIO, > > +}; > > with 'CPU_MEMORY' (instead of plain 'MEMORY') implies future > support of 'DEV_MEMORY'? Maybe, but I don't have an immediate thought on this. CXL "MMIO" that is cachable is a thing but we can also label it as CPU_MEMORY. We might have something for CC shared/protected memory down the road. Thanks, Jason

2 months, 2 weeks

← Newer
1
2
3
4
5
6
7
8
9
10
Older →

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig November 2025