The drm/ttm patch modifies TTM to support multiple contexts for the pipelined moves.
Then amdgpu/ttm is updated to express dependencies between jobs explicitly, instead of relying on the ordering of execution guaranteed by the use of a single entity instance. With all of this in place, we can use multiple entities, with each having access to the available SDMA instances.
This rework also gives the opportunity to merge the clear functions into a single one and to optimize GART usage a bit.
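Conceptually, spreading work across several entities boils down to a round-robin pick, as the "round robin through clear_entities" patch does. The following is a minimal userspace sketch of that idea; the names (`buffer_entity`, `clear_entities`, `pick_clear_entity`, `NUM_CLEAR_ENTITIES`) are illustrative, not the actual amdgpu API:

```c
/* Userspace sketch of round-robin entity selection. Each clear job
 * picks the next entity, so work spreads across the available SDMA
 * instances instead of serializing on a single entity.
 */
#include <assert.h>
#include <stdatomic.h>

#define NUM_CLEAR_ENTITIES 4

struct buffer_entity {
	int id;			/* stand-in for a drm_sched_entity */
};

static struct buffer_entity clear_entities[NUM_CLEAR_ENTITIES] = {
	{0}, {1}, {2}, {3}
};
static atomic_uint next_clear;	/* monotonically increasing counter */

/* Pick the next entity; the atomic increment keeps concurrent
 * submitters from racing on the counter. */
static struct buffer_entity *pick_clear_entity(void)
{
	unsigned int idx = atomic_fetch_add(&next_clear, 1);

	return &clear_entities[idx % NUM_CLEAR_ENTITIES];
}
```

The modulo wraps the counter around the array, so four consecutive clears land on four different entities.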
(The first patch of the series has already been merged through drm-misc but I'm including it here to reduce conflicts)
For v3 I've kept the series as a whole but I've reorganized the patches so that everything up to the drm/ttm change can be merged through amd-staging-drm-next once reviewed.
v3:
- shuffled the patches: everything up to the drm/ttm patch has no dependency on the ttm change and can be merged independently
- split "drm/amdgpu: pass the entity to use to ttm functions" in 2 commits
- moved AMDGPU_GTT_NUM_TRANSFER_WINDOWS removal to its own commit
- added a ttm job submission helper
- addressed comments from Christian and Felix

v2:
- addressed comments from Christian
- dropped "drm/amdgpu: prepare amdgpu_fill_buffer to use N entities" and "drm/amdgpu: use multiple entities in amdgpu_fill_buffer"
- added "drm/amdgpu: handle resv dependencies in amdgpu_ttm_map_buffer", "drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer"
- reworked how sdma rings/scheds are passed to amdgpu_ttm

v1: https://lists.freedesktop.org/archives/dri-devel/2025-November/534517.html
Pierre-Eric Pelloux-Prayer (28):
  drm/amdgpu: give each kernel job a unique id
  drm/amdgpu: use ttm_resource_manager_cleanup
  drm/amdgpu: remove direct_submit arg from amdgpu_copy_buffer
  drm/amdgpu: remove the ring param from ttm functions
  drm/amdgpu: introduce amdgpu_ttm_buffer_entity
  drm/amdgpu: add amdgpu_ttm_job_submit helper
  drm/amdgpu: fix error handling in amdgpu_copy_buffer
  drm/amdgpu: pass the entity to use to amdgpu_ttm_map_buffer
  drm/amdgpu: pass the entity to use to ttm public functions
  drm/amdgpu: add amdgpu_device argument to ttm functions that need it
  drm/amdgpu: statically assign gart windows to ttm entities
  drm/amdgpu: remove AMDGPU_GTT_NUM_TRANSFER_WINDOWS
  drm/amdgpu: add missing lock when using ttm entities
  drm/amdgpu: check entity lock is held in amdgpu_ttm_job_submit
  drm/amdgpu: double AMDGPU_GTT_MAX_TRANSFER_SIZE
  drm/amdgpu: use larger gart window when possible
  drm/amdgpu: introduce amdgpu_sdma_set_vm_pte_scheds
  drm/amdgpu: move sched status check inside amdgpu_ttm_set_buffer_funcs_status
  drm/ttm: rework pipelined eviction fence handling
  drm/amdgpu: allocate multiple clear entities
  drm/amdgpu: allocate multiple move entities
  drm/amdgpu: round robin through clear_entities in amdgpu_fill_buffer
  drm/amdgpu: use TTM_NUM_MOVE_FENCES when reserving fences
  drm/amdgpu: use multiple entities in amdgpu_move_blit
  drm/amdgpu: pass all the sdma scheds to amdgpu_mman
  drm/amdgpu: give ttm entities access to all the sdma scheds
  drm/amdgpu: get rid of amdgpu_ttm_clear_buffer
  drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_ttm_clear_buffer
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  15 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c       |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h       |  19 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c      |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 493 +++++++++++-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h       |  58 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c       |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c       |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.h       |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c      |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  26 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c    |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c     |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c   |  12 +-
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c         |  34 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c        |  34 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c        |  34 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  41 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c      |  41 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c        |  37 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c        |  37 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |  32 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c        |  32 +-
 drivers/gpu/drm/amd/amdgpu/si_dma.c           |  34 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c         |   6 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c         |   6 +-
 drivers/gpu/drm/amd/amdgpu/vce_v1_0.c         |  12 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c      |  33 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |   3 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |   6 +-
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c  |   6 +-
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |  11 +-
 drivers/gpu/drm/ttm/tests/ttm_resource_test.c |   5 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  47 +-
 drivers/gpu/drm/ttm/ttm_bo_util.c             |  38 +-
 drivers/gpu/drm/ttm/ttm_resource.c            |  31 +-
 include/drm/ttm/ttm_resource.h                |  29 +-
 47 files changed, 706 insertions(+), 615 deletions(-)
The direct_submit argument was always false, so remove it and the dead code it guarded.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 20 +++++++------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h      |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c     |  2 +-
 4 files changed, 10 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
index 199693369c7c..02c2479a8840 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
@@ -39,7 +39,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
 	for (i = 0; i < n; i++) {
 		struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
 		r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence,
-				       false, false, 0);
+				       false, 0);
 		if (r)
 			goto exit_do_move;
 		r = dma_fence_wait(fence, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8d0043ad5336..071afbacb3d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -346,7 +346,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
 		}
 
 		r = amdgpu_copy_buffer(ring, from, to, cur_size, resv,
-				       &next, false, true, copy_flags);
+				       &next, true, copy_flags);
 		if (r)
 			goto error;
 
@@ -2203,16 +2203,13 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
 }
 
 static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
-				  bool direct_submit,
 				  unsigned int num_dw,
 				  struct dma_resv *resv,
 				  bool vm_needs_flush,
 				  struct amdgpu_job **job, bool delayed,
 				  u64 k_job_id)
 {
-	enum amdgpu_ib_pool_type pool = direct_submit ?
-		AMDGPU_IB_POOL_DIRECT :
-		AMDGPU_IB_POOL_DELAYED;
+	enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED;
 	int r;
 	struct drm_sched_entity *entity = delayed ? &adev->mman.low_pr :
 						    &adev->mman.high_pr;
@@ -2238,7 +2235,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
 int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 		       uint64_t dst_offset, uint32_t byte_count,
 		       struct dma_resv *resv,
-		       struct dma_fence **fence, bool direct_submit,
+		       struct dma_fence **fence,
 		       bool vm_needs_flush, uint32_t copy_flags)
 {
 	struct amdgpu_device *adev = ring->adev;
@@ -2248,7 +2245,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 	unsigned int i;
 	int r;
 
-	if (!direct_submit && !ring->sched.ready) {
+	if (!ring->sched.ready) {
 		dev_err(adev->dev,
 			"Trying to move memory with ring turned off.\n");
 		return -EINVAL;
@@ -2257,7 +2254,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 	max_bytes = adev->mman.buffer_funcs->copy_max_bytes;
 	num_loops = DIV_ROUND_UP(byte_count, max_bytes);
 	num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
-	r = amdgpu_ttm_prepare_job(adev, direct_submit, num_dw,
+	r = amdgpu_ttm_prepare_job(adev, num_dw,
 				   resv, vm_needs_flush, &job, false,
 				   AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
 	if (r)
@@ -2275,10 +2272,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 
 	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
 	WARN_ON(job->ibs[0].length_dw > num_dw);
-	if (direct_submit)
-		r = amdgpu_job_submit_direct(job, ring, fence);
-	else
-		*fence = amdgpu_job_submit(job);
+	*fence = amdgpu_job_submit(job);
 	if (r)
 		goto error_free;
 
@@ -2307,7 +2301,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
 	max_bytes = adev->mman.buffer_funcs->fill_max_bytes;
 	num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes);
 	num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8);
-	r = amdgpu_ttm_prepare_job(adev, false, num_dw, resv, vm_needs_flush,
+	r = amdgpu_ttm_prepare_job(adev, num_dw, resv, vm_needs_flush,
 				   &job, delayed, k_job_id);
 	if (r)
 		return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 577ee04ce0bf..50e40380fe95 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -166,7 +166,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
 int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 		       uint64_t dst_offset, uint32_t byte_count,
 		       struct dma_resv *resv,
-		       struct dma_fence **fence, bool direct_submit,
+		       struct dma_fence **fence,
 		       bool vm_needs_flush, uint32_t copy_flags);
 int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
 			    struct dma_resv *resv,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 46c84fc60af1..378af0b2aaa9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -153,7 +153,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
 	}
 
 	r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE,
-			       NULL, &next, false, true, 0);
+			       NULL, &next, true, 0);
 	if (r) {
 		dev_err(adev->dev, "fail %d to copy memory\n", r);
 		goto out_unlock;
Deduplicate the IB padding code into a helper; it will also be used later to check locking.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 34 ++++++++++++-------------
 1 file changed, 16 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 17e1892c44a2..be1232b2d55e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -162,6 +162,18 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 	*placement = abo->placement;
 }
 
+static struct dma_fence *
+amdgpu_ttm_job_submit(struct amdgpu_device *adev, struct amdgpu_job *job, u32 num_dw)
+{
+	struct amdgpu_ring *ring;
+
+	ring = adev->mman.buffer_funcs_ring;
+	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
+	WARN_ON(job->ibs[0].length_dw > num_dw);
+
+	return amdgpu_job_submit(job);
+}
+
 /**
  * amdgpu_ttm_map_buffer - Map memory into the GART windows
  * @adev: the device being used
@@ -185,7 +197,6 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
 {
 	unsigned int offset, num_pages, num_dw, num_bytes;
 	uint64_t src_addr, dst_addr;
-	struct amdgpu_ring *ring;
 	struct amdgpu_job *job;
 	void *cpu_addr;
 	uint64_t flags;
@@ -240,10 +251,6 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
 	amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
 				dst_addr, num_bytes, 0);
 
-	ring = adev->mman.buffer_funcs_ring;
-	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
-	WARN_ON(job->ibs[0].length_dw > num_dw);
-
 	flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, mem);
 	if (tmz)
 		flags |= AMDGPU_PTE_TMZ;
@@ -261,7 +268,7 @@ static int amdgpu_ttm_map_buffer(struct amdgpu_device *adev,
 		amdgpu_gart_map_vram_range(adev, pa, 0, num_pages, flags, cpu_addr);
 	}
 
-	dma_fence_put(amdgpu_job_submit(job));
+	dma_fence_put(amdgpu_ttm_job_submit(adev, job, num_dw));
 	return 0;
 }
 
@@ -1497,10 +1504,7 @@ static int amdgpu_ttm_access_memory_sdma(struct ttm_buffer_object *bo,
 	amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, dst_addr, PAGE_SIZE, 0);
 
-	amdgpu_ring_pad_ib(adev->mman.buffer_funcs_ring, &job->ibs[0]);
-	WARN_ON(job->ibs[0].length_dw > num_dw);
-
-	fence = amdgpu_job_submit(job);
+	fence = amdgpu_ttm_job_submit(adev, job, num_dw);
 
 	if (!dma_fence_wait_timeout(fence, false, adev->sdma_timeout))
 		r = -ETIMEDOUT;
@@ -2285,11 +2289,9 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
 		byte_count -= cur_size_in_bytes;
 	}
 
-	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
-	WARN_ON(job->ibs[0].length_dw > num_dw);
-	*fence = amdgpu_job_submit(job);
 	if (r)
 		goto error_free;
+	*fence = amdgpu_ttm_job_submit(adev, job, num_dw);
 
 	return r;
 
@@ -2307,7 +2309,6 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
 				u64 k_job_id)
 {
 	unsigned int num_loops, num_dw;
-	struct amdgpu_ring *ring;
 	struct amdgpu_job *job;
 	uint32_t max_bytes;
 	unsigned int i;
@@ -2331,10 +2332,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev, uint32_t src_data,
 		byte_count -= cur_size;
 	}
 
-	ring = adev->mman.buffer_funcs_ring;
-	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
-	WARN_ON(job->ibs[0].length_dw > num_dw);
-	*fence = amdgpu_job_submit(job);
+	*fence = amdgpu_ttm_job_submit(adev, job, num_dw);
 	return 0;
 }
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> Deduplicate the IB padding code into a helper; it will also be used
> later to check locking.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>
This way the caller can select the entity it wants to use.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 34 +++++++++----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h      | 16 +++++----
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c     |  3 +-
 5 files changed, 32 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
index 3636b757c974..a050167e76a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
@@ -37,7 +37,8 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
 	stime = ktime_get();
 	for (i = 0; i < n; i++) {
-		r = amdgpu_copy_buffer(adev, saddr, daddr, size, NULL, &fence,
+		r = amdgpu_copy_buffer(adev, &adev->mman.default_entity,
+				       saddr, daddr, size, NULL, &fence,
 				       false, 0);
 		if (r)
 			goto exit_do_move;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 926a3f09a776..858eb9fa061b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1322,8 +1322,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
 	if (r)
 		goto out;
 
-	r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true,
-			       AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
+	r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv,
+			       &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
 	if (WARN_ON(r))
 		goto out;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 3d850893b97f..1d3afad885da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -359,7 +359,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
 					       write_compress_disable));
 	}
 
-	r = amdgpu_copy_buffer(adev, from, to, cur_size, resv,
+	r = amdgpu_copy_buffer(adev, entity, from, to, cur_size, resv,
 			       &next, true, copy_flags);
 	if (r)
 		goto error;
@@ -414,8 +414,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
 	    (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
 		struct dma_fence *wipe_fence = NULL;
 
-		r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence,
-				       false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
+		r = amdgpu_fill_buffer(&adev->mman.move_entity,
+				       abo, 0, NULL, &wipe_fence,
+				       AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
 		if (r) {
 			goto error;
 		} else if (wipe_fence) {
@@ -2258,7 +2259,9 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
 					     DMA_RESV_USAGE_BOOKKEEP);
 }
 
-int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
+int amdgpu_copy_buffer(struct amdgpu_device *adev,
+		       struct amdgpu_ttm_buffer_entity *entity,
+		       uint64_t src_offset,
 		       uint64_t dst_offset, uint32_t byte_count,
 		       struct dma_resv *resv,
 		       struct dma_fence **fence,
@@ -2282,7 +2285,7 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
 	max_bytes = adev->mman.buffer_funcs->copy_max_bytes;
 	num_loops = DIV_ROUND_UP(byte_count, max_bytes);
 	num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
-	r = amdgpu_ttm_prepare_job(adev, &adev->mman.move_entity, num_dw,
+	r = amdgpu_ttm_prepare_job(adev, entity, num_dw,
 				   resv, vm_needs_flush, &job,
 				   AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
 	if (r)
@@ -2411,22 +2414,18 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
 	return r;
 }
 
-int amdgpu_fill_buffer(struct amdgpu_bo *bo,
-		       uint32_t src_data,
-		       struct dma_resv *resv,
-		       struct dma_fence **f,
-		       bool delayed,
-		       u64 k_job_id)
+int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
+		       struct amdgpu_bo *bo,
+		       uint32_t src_data,
+		       struct dma_resv *resv,
+		       struct dma_fence **f,
+		       u64 k_job_id)
 {
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
-	struct amdgpu_ttm_buffer_entity *entity;
 	struct dma_fence *fence = NULL;
 	struct amdgpu_res_cursor dst;
 	int r;
 
-	entity = delayed ? &adev->mman.clear_entity :
-			   &adev->mman.move_entity;
-
 	if (!adev->mman.buffer_funcs_enabled) {
 		dev_err(adev->dev,
 			"Trying to clear memory with ring turned off.\n");
@@ -2443,13 +2442,14 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
 		/* Never fill more than 256MiB at once to avoid timeouts */
 		cur_size = min(dst.size, 256ULL << 20);
 
-		r = amdgpu_ttm_map_buffer(adev, &adev->mman.default_entity,
+		r = amdgpu_ttm_map_buffer(adev, entity,
 					  &bo->tbo, bo->tbo.resource, &dst,
 					  1, false, &cur_size, &to);
 		if (r)
 			goto error;
 
-		r = amdgpu_ttm_fill_mem(adev, entity, src_data, to, cur_size, resv,
+		r = amdgpu_ttm_fill_mem(adev, entity,
+					src_data, to, cur_size, resv,
 					&next, true, k_job_id);
 		if (r)
 			goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 41bbc25680a2..9288599c9c46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -167,7 +167,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev);
 void amdgpu_ttm_fini(struct amdgpu_device *adev);
 void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
 					bool enable);
-int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
+int amdgpu_copy_buffer(struct amdgpu_device *adev,
+		       struct amdgpu_ttm_buffer_entity *entity,
+		       uint64_t src_offset,
 		       uint64_t dst_offset, uint32_t byte_count,
 		       struct dma_resv *resv,
 		       struct dma_fence **fence,
@@ -175,12 +177,12 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev, uint64_t src_offset,
 int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
 			    struct dma_resv *resv,
 			    struct dma_fence **fence);
-int amdgpu_fill_buffer(struct amdgpu_bo *bo,
-		       uint32_t src_data,
-		       struct dma_resv *resv,
-		       struct dma_fence **fence,
-		       bool delayed,
-		       u64 k_job_id);
+int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
+		       struct amdgpu_bo *bo,
+		       uint32_t src_data,
+		       struct dma_resv *resv,
+		       struct dma_fence **f,
+		       u64 k_job_id);
 
 int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
 void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index ade1d4068d29..9c76f1ba0e55 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -157,7 +157,8 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
 		goto out_unlock;
 	}
 
-	r = amdgpu_copy_buffer(adev, gart_s, gart_d, size * PAGE_SIZE,
+	r = amdgpu_copy_buffer(adev, entity,
+			       gart_s, gart_d, size * PAGE_SIZE,
 			       NULL, &next, true, 0);
 	if (r) {
 		dev_err(adev->dev, "fail %d to copy memory\n", r);
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
> This way the caller can select the entity it wants to use.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

I'm wondering if it wouldn't make sense to put a pointer to adev into each amdgpu_ttm_buffer_entity.

But that is maybe something for another patch. For now:

Reviewed-by: Christian König <christian.koenig@amd.com>
Until now ttm stored a single pipelined eviction fence, which meant drivers had to use a single entity for these evictions.
To lift this requirement, this commit allows up to 8 entities to be used.
Ideally a dma_resv object would have been used as a container for the eviction fences, but the locking rules make this complex: all dma_resv objects share the same ww_class, which means "Attempting to lock more mutexes after ww_acquire_done." is an error.
One alternative considered was to introduce a 2nd ww_class for specific resvs holding a single "transient" lock (i.e. the resv lock would only be held for a short period, without taking any other locks).
The other option, implemented here, is to statically reserve a fence array and extend the existing code to deal with N fences instead of 1.
The driver is still responsible for reserving the correct number of fence slots.
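The fence-array option can be sketched in userspace C as follows. This is only an illustration of the scheme under stated assumptions: the field name `eviction_fences`, the constant `TTM_NUM_MOVE_FENCES`, and the simplified `fence`/`resource_manager` structs stand in for the real TTM types, and real fences are refcounted dma_fence objects rather than plain structs:

```c
/* Sketch of extending a single pipelined-eviction fence to a fixed
 * array. A signaled fence's slot can be reused; once every slot holds
 * an unsignaled fence, the caller must wait on one before adding more.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TTM_NUM_MOVE_FENCES 8

struct fence {
	bool signaled;
};

struct resource_manager {
	/* was: struct fence *move; -- a single pipelined eviction fence */
	struct fence *eviction_fences[TTM_NUM_MOVE_FENCES];
};

/* Try to store a new eviction fence in a free or already-signaled
 * slot; returns false when all slots are busy, meaning the caller has
 * to wait on one of the existing fences first. */
static bool add_eviction_fence(struct resource_manager *man,
			       struct fence *fence)
{
	for (size_t i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
		if (!man->eviction_fences[i] ||
		    man->eviction_fences[i]->signaled) {
			man->eviction_fences[i] = fence;
			return true;
		}
	}
	return false;
}
```

Statically sizing the array keeps the locking trivial compared to the dma_resv-container approach, at the cost of an upper bound on concurrent eviction entities.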
---
v2:
- simplified code
- dropped n_fences
- name changes

v3:
- use ttm_resource_manager_cleanup
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  | 11 +++--
 drivers/gpu/drm/ttm/tests/ttm_resource_test.c |  5 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  | 47 ++++++++++---------
 drivers/gpu/drm/ttm/ttm_bo_util.c             | 38 ++++++++++++---
 drivers/gpu/drm/ttm/ttm_resource.c            | 31 +++++++-----
 include/drm/ttm/ttm_resource.h                | 29 ++++++++----
 6 files changed, 104 insertions(+), 57 deletions(-)
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
index 3148f5d3dbd6..8f71906c4238 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
@@ -651,7 +651,7 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
 	int err;
 	man = ttm_manager_type(priv->ttm_dev, mem_type);
-	man->move = dma_fence_get_stub();
+	man->eviction_fences[0] = dma_fence_get_stub();
 
 	bo = ttm_bo_kunit_init(test, test->priv, size, NULL);
 	bo->type = bo_type;
@@ -668,7 +668,7 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, ctx.bytes_moved, size);
 
 	ttm_bo_put(bo);
-	dma_fence_put(man->move);
+	dma_fence_put(man->eviction_fences[0]);
 }
 
 static const struct ttm_bo_validate_test_case ttm_bo_validate_wait_cases[] = {
@@ -732,9 +732,9 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
 
 	spin_lock_init(&fence_lock);
 	man = ttm_manager_type(priv->ttm_dev, fst_mem);
-	man->move = alloc_mock_fence(test);
+	man->eviction_fences[0] = alloc_mock_fence(test);
 
-	task = kthread_create(threaded_fence_signal, man->move, "move-fence-signal");
+	task = kthread_create(threaded_fence_signal, man->eviction_fences[0], "move-fence-signal");
 	if (IS_ERR(task))
 		KUNIT_FAIL(test, "Couldn't create move fence signal task\n");
 
@@ -742,7 +742,8 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
 	err = ttm_bo_validate(bo, placement_val, &ctx_val);
 	dma_resv_unlock(bo->base.resv);
 
-	dma_fence_wait_timeout(man->move, false, MAX_SCHEDULE_TIMEOUT);
+	dma_fence_wait_timeout(man->eviction_fences[0], false, MAX_SCHEDULE_TIMEOUT);
+	man->eviction_fences[0] = NULL;
 	KUNIT_EXPECT_EQ(test, err, 0);
 	KUNIT_EXPECT_EQ(test, ctx_val.bytes_moved, size);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
index e6ea2bd01f07..c0e4e35e0442 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
@@ -207,6 +207,7 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
 	struct ttm_resource_test_priv *priv = test->priv;
 	struct ttm_resource_manager *man;
 	size_t size = SZ_16K;
+	int i;
 
 	man = kunit_kzalloc(test, sizeof(*man), GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, man);
@@ -216,8 +217,8 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
 	KUNIT_ASSERT_PTR_EQ(test, man->bdev, priv->devs->ttm_dev);
 	KUNIT_ASSERT_EQ(test, man->size, size);
 	KUNIT_ASSERT_EQ(test, man->usage, 0);
-	KUNIT_ASSERT_NULL(test, man->move);
-	KUNIT_ASSERT_NOT_NULL(test, &man->move_lock);
+	for (i = 0; i < TTM_NUM_MOVE_FENCES; i++)
+		KUNIT_ASSERT_NULL(test, man->eviction_fences[i]);
 
 	for (int i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		KUNIT_ASSERT_TRUE(test, list_empty(&man->lru[i]));
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f4d9e68b21e7..0b3732ed6f6c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -658,34 +658,35 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo)
 EXPORT_SYMBOL(ttm_bo_unpin);
 /*
- * Add the last move fence to the BO as kernel dependency and reserve a new
- * fence slot.
+ * Add the pipelined eviction fences to the BO as kernel dependency and reserve new
+ * fence slots.
  */
-static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
-				 struct ttm_resource_manager *man,
-				 bool no_wait_gpu)
+static int ttm_bo_add_pipelined_eviction_fences(struct ttm_buffer_object *bo,
+						struct ttm_resource_manager *man,
+						bool no_wait_gpu)
 {
 	struct dma_fence *fence;
-	int ret;
+	int i;
 
-	spin_lock(&man->move_lock);
-	fence = dma_fence_get(man->move);
-	spin_unlock(&man->move_lock);
+	spin_lock(&man->eviction_lock);
+	for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
+		fence = man->eviction_fences[i];
+		if (!fence)
+			continue;
 
-	if (!fence)
-		return 0;
-
-	if (no_wait_gpu) {
-		ret = dma_fence_is_signaled(fence) ? 0 : -EBUSY;
-		dma_fence_put(fence);
-		return ret;
+		if (no_wait_gpu) {
+			if (!dma_fence_is_signaled(fence)) {
+				spin_unlock(&man->eviction_lock);
+				return -EBUSY;
+			}
+		} else {
+			dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
+		}
 	}
+	spin_unlock(&man->eviction_lock);
 
-	dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
-
-	ret = dma_resv_reserve_fences(bo->base.resv, 1);
-	dma_fence_put(fence);
-	return ret;
+	/* TODO: this call should be removed. */
+	return dma_resv_reserve_fences(bo->base.resv, 1);
 }
 /**
@@ -718,7 +719,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
 	int i, ret;
 
 	ticket = dma_resv_locking_ctx(bo->base.resv);
-	ret = dma_resv_reserve_fences(bo->base.resv, 1);
+	ret = dma_resv_reserve_fences(bo->base.resv, TTM_NUM_MOVE_FENCES);
 	if (unlikely(ret))
 		return ret;
 
@@ -757,7 +758,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
 		return ret;
 	}
 
-	ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
+	ret = ttm_bo_add_pipelined_eviction_fences(bo, man, ctx->no_wait_gpu);
 	if (unlikely(ret)) {
 		ttm_resource_free(bo, res);
 		if (ret == -EBUSY)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index acbbca9d5c92..2ff35d55e462 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -258,7 +258,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	ret = dma_resv_trylock(&fbo->base.base._resv);
 	WARN_ON(!ret);
 
-	ret = dma_resv_reserve_fences(&fbo->base.base._resv, 1);
+	ret = dma_resv_reserve_fences(&fbo->base.base._resv, TTM_NUM_MOVE_FENCES);
 	if (ret) {
 		dma_resv_unlock(&fbo->base.base._resv);
 		kfree(fbo);
@@ -646,20 +646,44 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
 {
 	struct ttm_device *bdev = bo->bdev;
 	struct ttm_resource_manager *from;
+	struct dma_fence *tmp;
+	int i;
 	from = ttm_manager_type(bdev, bo->resource->mem_type);
 
 	/**
 	 * BO doesn't have a TTM we need to bind/unbind. Just remember
-	 * this eviction and free up the allocation
+	 * this eviction and free up the allocation.
+	 * The fence will be saved in the first free slot or in the slot
+	 * already used to store a fence from the same context. Since
+	 * drivers can't use more than TTM_NUM_MOVE_FENCES contexts for
+	 * evictions we should always find a slot to use.
 	 */
-	spin_lock(&from->move_lock);
-	if (!from->move || dma_fence_is_later(fence, from->move)) {
-		dma_fence_put(from->move);
-		from->move = dma_fence_get(fence);
+	spin_lock(&from->eviction_lock);
+	for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
+		tmp = from->eviction_fences[i];
+		if (!tmp)
+			break;
+		if (fence->context != tmp->context)
+			continue;
+		if (dma_fence_is_later(fence, tmp)) {
+			dma_fence_put(tmp);
+			break;
+		}
+		goto unlock;
+	}
+	if (i < TTM_NUM_MOVE_FENCES) {
+		from->eviction_fences[i] = dma_fence_get(fence);
+	} else {
+		WARN(1, "not enough fence slots for all fence contexts");
+		spin_unlock(&from->eviction_lock);
+		dma_fence_wait(fence, false);
+		goto end;
 	}
-	spin_unlock(&from->move_lock);
 
+unlock:
+	spin_unlock(&from->eviction_lock);
+end:
 	ttm_resource_free(bo, &bo->resource);
 }
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index e2c82ad07eb4..62c34cafa387 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -523,14 +523,15 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
 {
 	unsigned i;
 
-	spin_lock_init(&man->move_lock);
 	man->bdev = bdev;
 	man->size = size;
 	man->usage = 0;
 
 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		INIT_LIST_HEAD(&man->lru[i]);
-	man->move = NULL;
+	spin_lock_init(&man->eviction_lock);
+	for (i = 0; i < TTM_NUM_MOVE_FENCES; i++)
+		man->eviction_fences[i] = NULL;
 }
 EXPORT_SYMBOL(ttm_resource_manager_init);
@@ -551,7 +552,7 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 		.no_wait_gpu = false,
 	};
 	struct dma_fence *fence;
-	int ret;
+	int ret, i;
 
 	do {
 		ret = ttm_bo_evict_first(bdev, man, &ctx);
@@ -561,18 +562,24 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 	if (ret && ret != -ENOENT)
 		return ret;
 
-	spin_lock(&man->move_lock);
-	fence = dma_fence_get(man->move);
-	spin_unlock(&man->move_lock);
+	ret = 0;
 
-	if (fence) {
-		ret = dma_fence_wait(fence, false);
-		dma_fence_put(fence);
-		if (ret)
-			return ret;
+	spin_lock(&man->eviction_lock);
+	for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
+		fence = man->eviction_fences[i];
+		if (fence && !dma_fence_is_signaled(fence)) {
+			dma_fence_get(fence);
+			spin_unlock(&man->eviction_lock);
+			ret = dma_fence_wait(fence, false);
+			dma_fence_put(fence);
+			if (ret)
+				return ret;
+			spin_lock(&man->eviction_lock);
+		}
 	}
+	spin_unlock(&man->eviction_lock);
 
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL(ttm_resource_manager_evict_all);
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index f49daa504c36..50e6added509 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -50,6 +50,15 @@ struct io_mapping;
 struct sg_table;
 struct scatterlist;
 
+/**
+ * define TTM_NUM_MOVE_FENCES - How many entities can be used for evictions
+ *
+ * Pipelined evictions can be spread on multiple entities. This
+ * is the max number of entities that can be used by the driver
+ * for that purpose.
+ */
+#define TTM_NUM_MOVE_FENCES 8
+
 /**
  * enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
  */
@@ -180,8 +189,8 @@ struct ttm_resource_manager_func {
  * @size: Size of the managed region.
  * @bdev: ttm device this manager belongs to
  * @func: structure pointer implementing the range manager. See above
- * @move_lock: lock for move fence
- * @move: The fence of the last pipelined move operation.
+ * @eviction_lock: lock for eviction fences
+ * @eviction_fences: The fences of the last pipelined move operations.
  * @lru: The lru list for this memory type.
  *
  * This structure is used to identify and manage memory types for a device.
@@ -195,12 +204,12 @@ struct ttm_resource_manager {
 	struct ttm_device *bdev;
 	uint64_t size;
 	const struct ttm_resource_manager_func *func;
-	spinlock_t move_lock;
 
-	/*
-	 * Protected by @move_lock.
+	/* This is very similar to a dma_resv object, but locking rules make
+	 * it difficult to use one in this context.
 	 */
-	struct dma_fence *move;
+	spinlock_t eviction_lock;
+	struct dma_fence *eviction_fences[TTM_NUM_MOVE_FENCES];
 
 	/*
 	 * Protected by the bdev->lru_lock.
@@ -421,8 +430,12 @@ static inline bool ttm_resource_manager_used(struct ttm_resource_manager *man)
 static inline void
 ttm_resource_manager_cleanup(struct ttm_resource_manager *man)
 {
-	dma_fence_put(man->move);
-	man->move = NULL;
+	int i;
+
+	for (i = 0; i < TTM_NUM_MOVE_FENCES; i++) {
+		dma_fence_put(man->eviction_fences[i]);
+		man->eviction_fences[i] = NULL;
+	}
 }
void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk);
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
Reviewed-by: Christian König christian.koenig@amd.com
Going to push separately to drm-misc-next on Monday.
Regards, Christian.
Hey,
Den 2025-11-21 kl. 16:12, skrev Christian König:
Going to push separately to drm-misc-next on Monday.
Pushing this broke drm-tip, the amd driver fails to build, as it's not using the eviction_fences array.
Kind regards, ~Maarten Lankhorst
On 11/26/25 16:34, Maarten Lankhorst wrote:
Pushing this broke drm-tip, the amd driver fails to build, as it's not using the eviction_fences array.
Thanks for the note! But hui? We changed amdgpu to not touch the move fence.
Give me a second.
Thanks, Christian.
Kind regards, ~Maarten Lankhorst
Hey,
Den 2025-11-26 kl. 16:36, skrev Christian König:
Thanks for the note! But hui? We changed amdgpu to not touch the move fence.
Give me a second.

commit 13bec21f5f4cdabdf06725e5a8dee0b9b56ff671 (HEAD -> drm-tip, drm-tip/drm-tip, drm-tip/HEAD)
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Nov 26 13:13:03 2025 +0100
drm-tip: 2025y-11m-26d-12h-12m-41s UTC integration manifest
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2188:34: error: ‘struct ttm_resource_manager’ has no member named ‘move’
 2188 |         dma_fence_put(man->move);
      |                          ^~
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2189:20: error: ‘struct ttm_resource_manager’ has no member named ‘move’
 2189 |         man->move = NULL;
      |            ^~
Is what I see.
Kind regards, ~Maarten Lankhorst
On 11/26/25 16:39, Maarten Lankhorst wrote:
Ah, crap, I know what's going on.
The patch to remove those lines is queued up to go upstream through amd-staging-drm-next instead of drm-misc-next.
I will push this patch to drm-misc-next and sync up with Alex that it shouldn't go upstream through amd-staging-drm-next.
Going to build test drm-tip the next time.
Thanks, Christian.
Kind regards, ~Maarten Lankhorst
Hey,
Den 2025-11-26 kl. 16:48, skrev Christian König:
On 11/26/25 16:39, Maarten Lankhorst wrote:
Hey,
Den 2025-11-26 kl. 16:36, skrev Christian König:
On 11/26/25 16:34, Maarten Lankhorst wrote:
Hey,
Den 2025-11-21 kl. 16:12, skrev Christian König:
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
Until now ttm stored a single pipelined eviction fence which means drivers had to use a single entity for these evictions.
To lift this requirement, this commit allows up to 8 entities to be used.
Ideally a dma_resv object would have been used as a container of the eviction fences, but the locking rules makes it complex. dma_resv all have the same ww_class, which means "Attempting to lock more mutexes after ww_acquire_done." is an error.
One alternative considered was to introduced a 2nd ww_class for specific resv to hold a single "transient" lock (= the resv lock would only be held for a short period, without taking any other locks).
The other option, is to statically reserve a fence array, and extend the existing code to deal with N fences, instead of 1.
The driver is still responsible to reserve the correct number of fence slots.
v2:
- simplified code
- dropped n_fences
- name changes
v3: use ttm_resource_manager_cleanup
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
Going to push separately to drm-misc-next on Monday.
Pushing this broke drm-tip, the amd driver fails to build, as it's not using the eviction_fences array.
Thanks for the note! But hui? We changed amdgpu to not touch the move fence.
Give me a second.

commit 13bec21f5f4cdabdf06725e5a8dee0b9b56ff671 (HEAD -> drm-tip, drm-tip/drm-tip, drm-tip/HEAD)
Author: Christian König christian.koenig@amd.com
Date:   Wed Nov 26 13:13:03 2025 +0100

    drm-tip: 2025y-11m-26d-12h-12m-41s UTC integration manifest

drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2188:34: error: ‘struct ttm_resource_manager’ has no member named ‘move’
 2188 |         dma_fence_put(man->move);
      |                          ^~
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2189:20: error: ‘struct ttm_resource_manager’ has no member named ‘move’
 2189 |         man->move = NULL;
      |            ^~
Is what I see.
Ah, crap, I know what's going on.
The patch to remove those lines is queued up to go upstream through amd-staging-drm-next instead of drm-misc-next.
I will push this patch to drm-misc-next and sync up with Alex that it shouldn't go upstream through amd-staging-drm-next.
Going to build test drm-tip the next time.
Thank you, drm-tip now builds cleanly again!
Thanks, Christian.
Kind regards, ~Maarten Lankhorst
It's doing the same thing as amdgpu_fill_buffer(src_data=0), so drop it.
The only caveat is that the amdgpu_res_cleared() return value is only valid right after allocation.
v2: introduce new "bool consider_clear_status" arg
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 90 +++++-----------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h    |  7 +-
 3 files changed, 33 insertions(+), 80 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 7d8d70135cc2..dccc31d0128e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -725,13 +725,17 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	    bo->tbo.resource->mem_type == TTM_PL_VRAM) {
 		struct dma_fence *fence;
 
-		r = amdgpu_ttm_clear_buffer(adev, bo, bo->tbo.base.resv, &fence);
+		r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
+				       bo, 0, NULL, &fence,
+				       true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
 		if (unlikely(r))
 			goto fail_unreserve;
 
-		dma_resv_add_fence(bo->tbo.base.resv, fence,
-				   DMA_RESV_USAGE_KERNEL);
-		dma_fence_put(fence);
+		if (fence) {
+			dma_resv_add_fence(bo->tbo.base.resv, fence,
+					   DMA_RESV_USAGE_KERNEL);
+			dma_fence_put(fence);
+		}
 	}
 	if (!bp->resv)
 		amdgpu_bo_unreserve(bo);
@@ -1323,8 +1327,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
 		goto out;
 
 	r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
-			       abo, 0, &bo->base._resv,
-			       &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
+			       abo, 0, &bo->base._resv, &fence,
+			       false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
 	if (WARN_ON(r))
 		goto out;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 39cfe2dbdf03..c65c411ce26e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -459,7 +459,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
 
 		r = amdgpu_fill_buffer(adev, entity,
 				       abo, 0, NULL, &wipe_fence,
-				       AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
+				       false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
 		if (r) {
 			goto error;
 		} else if (wipe_fence) {
@@ -2459,79 +2459,28 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
 }
 
 /**
- * amdgpu_ttm_clear_buffer - clear memory buffers
+ * amdgpu_fill_buffer - fill a buffer with a given value
  * @adev: amdgpu device object
- * @bo: amdgpu buffer object
- * @resv: reservation object
- * @fence: dma_fence associated with the operation
+ * @entity: optional entity to use. If NULL, the clearing entities will be
+ *	used to load-balance the partial clears
+ * @bo: the bo to fill
+ * @src_data: the value to set
+ * @resv: fences contained in this reservation will be used as dependencies.
+ * @out_fence: the fence from the last clear will be stored here. It might be
+ *	NULL if no job was run.
+ * @dependency: optional input dependency fence.
+ * @consider_clear_status: true if region reported as cleared by amdgpu_res_cleared()
+ *	are skipped.
+ * @k_job_id: trace id
  *
- * Clear the memory buffer resource.
- *
- * Returns:
- * 0 for success or a negative error code on failure.
  */
-int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
-			    struct amdgpu_bo *bo,
-			    struct dma_resv *resv,
-			    struct dma_fence **fence)
-{
-	struct amdgpu_ttm_buffer_entity *entity;
-	struct amdgpu_res_cursor cursor;
-	u64 addr;
-	int r = 0;
-
-	if (!adev->mman.buffer_funcs_enabled)
-		return -EINVAL;
-
-	if (!fence)
-		return -EINVAL;
-
-	entity = &adev->mman.clear_entities[0];
-	*fence = dma_fence_get_stub();
-
-	amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
-
-	mutex_lock(&entity->lock);
-	while (cursor.remaining) {
-		struct dma_fence *next = NULL;
-		u64 size;
-
-		if (amdgpu_res_cleared(&cursor)) {
-			amdgpu_res_next(&cursor, cursor.size);
-			continue;
-		}
-
-		/* Never clear more than 256MiB at once to avoid timeouts */
-		size = min(cursor.size, 256ULL << 20);
-
-		r = amdgpu_ttm_map_buffer(adev, entity,
-					  &bo->tbo, bo->tbo.resource, &cursor,
-					  1, false, false, &size, &addr);
-		if (r)
-			goto err;
-
-		r = amdgpu_ttm_fill_mem(adev, entity, 0, addr, size, resv,
-					&next, true,
-					AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
-		if (r)
-			goto err;
-
-		dma_fence_put(*fence);
-		*fence = next;
-
-		amdgpu_res_next(&cursor, size);
-	}
-err:
-	mutex_unlock(&entity->lock);
-
-	return r;
-}
-
 int amdgpu_fill_buffer(struct amdgpu_device *adev,
 		       struct amdgpu_ttm_buffer_entity *entity,
 		       struct amdgpu_bo *bo,
 		       uint32_t src_data,
 		       struct dma_resv *resv,
-		       struct dma_fence **f,
+		       struct dma_fence **out_fence,
+		       bool consider_clear_status,
 		       u64 k_job_id)
 {
 	struct dma_fence *fence = NULL;
@@ -2551,6 +2500,11 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
 		struct dma_fence *next;
 		uint64_t cur_size, to;
 
+		if (consider_clear_status && amdgpu_res_cleared(&dst)) {
+			amdgpu_res_next(&dst, dst.size);
+			continue;
+		}
+
 		/* Never fill more than 256MiB at once to avoid timeouts */
 		cur_size = min(dst.size, 256ULL << 20);
 
@@ -2574,9 +2528,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
 	}
 error:
 	mutex_unlock(&entity->lock);
-	if (f)
-		*f = dma_fence_get(fence);
-	dma_fence_put(fence);
+	*out_fence = fence;
 	return r;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 653a4d17543e..f3bdbcec9afc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -181,16 +181,13 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
 		       struct dma_resv *resv,
 		       struct dma_fence **fence,
 		       bool vm_needs_flush,
 		       uint32_t copy_flags);
-int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
-			    struct amdgpu_bo *bo,
-			    struct dma_resv *resv,
-			    struct dma_fence **fence);
 int amdgpu_fill_buffer(struct amdgpu_device *adev,
 		       struct amdgpu_ttm_buffer_entity *entity,
 		       struct amdgpu_bo *bo,
 		       uint32_t src_data,
 		       struct dma_resv *resv,
-		       struct dma_fence **f,
+		       struct dma_fence **out_fence,
+		       bool consider_clear_status,
 		       u64 k_job_id);
 struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
On 11/21/25 11:12, Pierre-Eric Pelloux-Prayer wrote:
It's doing the same thing as amdgpu_fill_buffer(src_data=0), so drop it.
The only caveat is that amdgpu_res_cleared() return value is only valid right after allocation.
v2: introduce new "bool consider_clear_status" arg
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com
It would be better to have that earlier in the patch set, but I guess that gives you rebasing problems?
Christian.
This is the only use case for this function.
v2: amdgpu_ttm_clear_buffer instead of amdgpu_clear_buffer
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com
Reviewed-by: Christian König christian.koenig@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 +++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 27 ++++++++++------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h    | 15 ++++++------
 3 files changed, 25 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index dccc31d0128e..ac1727c3634a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -725,9 +725,9 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	    bo->tbo.resource->mem_type == TTM_PL_VRAM) {
 		struct dma_fence *fence;
 
-		r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
-				       bo, 0, NULL, &fence,
-				       true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
+		r = amdgpu_ttm_clear_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
+					    bo, NULL, &fence,
+					    true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
 		if (unlikely(r))
 			goto fail_unreserve;
 
@@ -1326,9 +1326,9 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
 	if (r)
 		goto out;
 
-	r = amdgpu_fill_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
-			       abo, 0, &bo->base._resv, &fence,
-			       false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
+	r = amdgpu_ttm_clear_buffer(adev, amdgpu_ttm_next_clear_entity(adev),
+				    abo, &bo->base._resv, &fence,
+				    false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
 	if (WARN_ON(r))
 		goto out;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c65c411ce26e..1cc72fd94a4c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -457,9 +457,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
 	    (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
 		struct dma_fence *wipe_fence = NULL;
 
-		r = amdgpu_fill_buffer(adev, entity,
-				       abo, 0, NULL, &wipe_fence,
-				       false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
+		r = amdgpu_ttm_clear_buffer(adev, entity,
+					    abo, NULL, &wipe_fence,
+					    false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
 		if (r) {
 			goto error;
 		} else if (wipe_fence) {
@@ -2459,29 +2459,26 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_device *adev,
 }
 
 /**
- * amdgpu_fill_buffer - fill a buffer with a given value
+ * amdgpu_ttm_clear_buffer - fill a buffer with 0
  * @adev: amdgpu device object
  * @entity: optional entity to use. If NULL, the clearing entities will be
  *	used to load-balance the partial clears
  * @bo: the bo to fill
- * @src_data: the value to set
  * @resv: fences contained in this reservation will be used as dependencies.
  * @out_fence: the fence from the last clear will be stored here. It might be
  *	NULL if no job was run.
- * @dependency: optional input dependency fence.
  * @consider_clear_status: true if region reported as cleared by amdgpu_res_cleared()
  *	are skipped.
  * @k_job_id: trace id
  *
 */
-int amdgpu_fill_buffer(struct amdgpu_device *adev,
-		       struct amdgpu_ttm_buffer_entity *entity,
-		       struct amdgpu_bo *bo,
-		       uint32_t src_data,
-		       struct dma_resv *resv,
-		       struct dma_fence **out_fence,
-		       bool consider_clear_status,
-		       u64 k_job_id)
+int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
+			    struct amdgpu_ttm_buffer_entity *entity,
+			    struct amdgpu_bo *bo,
+			    struct dma_resv *resv,
+			    struct dma_fence **out_fence,
+			    bool consider_clear_status,
+			    u64 k_job_id)
 {
 	struct dma_fence *fence = NULL;
 	struct amdgpu_res_cursor dst;
@@ -2516,7 +2513,7 @@ int amdgpu_fill_buffer(struct amdgpu_device *adev,
 			goto error;
 
 		r = amdgpu_ttm_fill_mem(adev, entity,
-					src_data, to, cur_size, resv,
+					0, to, cur_size, resv,
					&next, true, k_job_id);
 		if (r)
 			goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index f3bdbcec9afc..fba205c1b5d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -181,14 +181,13 @@ int amdgpu_copy_buffer(struct amdgpu_device *adev,
 		       struct dma_resv *resv,
 		       struct dma_fence **fence,
 		       bool vm_needs_flush,
 		       uint32_t copy_flags);
-int amdgpu_fill_buffer(struct amdgpu_device *adev,
-		       struct amdgpu_ttm_buffer_entity *entity,
-		       struct amdgpu_bo *bo,
-		       uint32_t src_data,
-		       struct dma_resv *resv,
-		       struct dma_fence **out_fence,
-		       bool consider_clear_status,
-		       u64 k_job_id);
+int amdgpu_ttm_clear_buffer(struct amdgpu_device *adev,
+			    struct amdgpu_ttm_buffer_entity *entity,
+			    struct amdgpu_bo *bo,
+			    struct dma_resv *resv,
+			    struct dma_fence **out_fence,
+			    bool consider_clear_status,
+			    u64 k_job_id);
 struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct amdgpu_device *adev);
 
 int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);