The drm/ttm patch modifies TTM to support multiple contexts for the pipelined moves.
Then amdgpu/ttm is updated to express dependencies between jobs explicitly, instead of relying on the execution ordering guaranteed by the use of a single entity instance. With all of this in place, we can use multiple entities, each having access to the available SDMA instances.
This rework is also an opportunity to merge the clear functions into a single one and to optimize GART usage a bit.
(The first patch of the series has already been merged through drm-misc but I'm including it here to reduce conflicts)
Pierre-Eric Pelloux-Prayer (20):
  drm/amdgpu: give each kernel job a unique id
  drm/ttm: rework pipelined eviction fence handling
  drm/amdgpu: remove direct_submit arg from amdgpu_copy_buffer
  drm/amdgpu: introduce amdgpu_ttm_entity
  drm/amdgpu: pass the entity to use to ttm functions
  drm/amdgpu: statically assign gart windows to ttm entities
  drm/amdgpu: allocate multiple clear entities
  drm/amdgpu: allocate multiple move entities
  drm/amdgpu: pass optional dependency to amdgpu_fill_buffer
  drm/amdgpu: prepare amdgpu_fill_buffer to use N entities
  drm/amdgpu: use multiple entities in amdgpu_fill_buffer
  drm/amdgpu: use TTM_FENCES_MAX_SLOT_COUNT
  drm/amdgpu: use multiple entities in amdgpu_move_blit
  drm/amdgpu: pass all the sdma rings to amdgpu_mman
  drm/amdgpu: introduce amdgpu_sdma_set_vm_pte_scheds
  drm/amdgpu: give ttm entities access to all the sdma scheds
  drm/amdgpu: get rid of amdgpu_ttm_clear_buffer
  drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_clear_buffer
  drm/amdgpu: use larger gart window when possible
  drm/amdgpu: double AMDGPU_GTT_MAX_TRANSFER_SIZE
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c |   7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c       |  23 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h       |  19 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c      |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |  19 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 496 ++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h       |  51 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c       |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c       |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c       |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c      |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  24 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c    |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c     |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c   |  12 +-
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c         |  10 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c        |  10 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c        |  10 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  17 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c      |  17 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c        |  10 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c        |  10 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |  10 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c        |  10 +-
 drivers/gpu/drm/amd/amdgpu/si_dma.c           |  10 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c         |   6 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c         |   6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c      |  31 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |   2 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |   2 +-
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_wb.c  |   2 +-
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |  13 +-
 drivers/gpu/drm/ttm/tests/ttm_resource_test.c |   5 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  56 +-
 drivers/gpu/drm/ttm/ttm_bo_util.c             |  36 +-
 drivers/gpu/drm/ttm/ttm_resource.c            |  45 +-
 include/drm/ttm/ttm_resource.h                |  34 +-
 45 files changed, 651 insertions(+), 414 deletions(-)
Until now, TTM stored a single pipelined eviction fence, which meant drivers had to use a single entity for these evictions.
To lift this requirement, this commit allows up to 8 entities to be used.
Ideally a dma_resv object would have been used as a container for the eviction fences, but the locking rules make this complex: dma_resv objects all share the same ww_class, which means "Attempting to lock more mutexes after ww_acquire_done" is an error.
One alternative considered was to introduce a second ww_class for specific resv objects holding a single "transient" lock (i.e. the resv lock would only be held for a short period, without taking any other locks).
The other option is to statically reserve a fence array and extend the existing code to deal with N fences instead of 1.
The driver is still responsible for reserving the correct number of fence slots.
Lastly ttm_resource_manager.pipelined_eviction.n_fences is initialized to 1, so the new behavior is opt-in.
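The opt-in described above can be pictured with a small userspace model. This is a sketch only: `toy_manager`, `toy_manager_init` and `toy_default_n_fences` are invented stand-ins for the real `ttm_resource_manager` code, not the kernel API; only the field names `n_fences`/`fences` and `TTM_FENCES_MAX_SLOT_COUNT` come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the new per-manager fence array: slots start empty
 * and n_fences defaults to 1, matching the old single-fence behavior.
 * A driver opting in raises n_fences up to TTM_FENCES_MAX_SLOT_COUNT. */
#define TTM_FENCES_MAX_SLOT_COUNT 8

struct toy_manager {
	struct {
		int n_fences;
		void *fences[TTM_FENCES_MAX_SLOT_COUNT];
	} pipelined_eviction;
};

static void toy_manager_init(struct toy_manager *man)
{
	int i;

	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++)
		man->pipelined_eviction.fences[i] = NULL;
	/* Opt-in default: one fence slot, the pre-patch behavior. */
	man->pipelined_eviction.n_fences = 1;
}

static int toy_default_n_fences(void)
{
	struct toy_manager man;

	toy_manager_init(&man);
	return man.pipelined_eviction.n_fences;
}
```

A driver wanting multiple eviction entities would then raise the count after init, which is what the later amdgpu patches in the series do for their move/clear entities.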
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  8 ++-
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  | 13 +++--
 drivers/gpu/drm/ttm/tests/ttm_resource_test.c |  5 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  | 56 ++++++++++++-------
 drivers/gpu/drm/ttm/ttm_bo_util.c             | 36 ++++++++++--
 drivers/gpu/drm/ttm/ttm_resource.c            | 45 ++++++++++-----
 include/drm/ttm/ttm_resource.h                | 34 ++++++++---
 7 files changed, 139 insertions(+), 58 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 326476089db3..c66f00434991 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2156,7 +2156,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
 {
 	struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
 	uint64_t size;
-	int r;
+	int r, i;
 
 	if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
 	    adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
@@ -2190,8 +2190,10 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
 	} else {
 		drm_sched_entity_destroy(&adev->mman.high_pr);
 		drm_sched_entity_destroy(&adev->mman.low_pr);
-		dma_fence_put(man->move);
-		man->move = NULL;
+		for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++) {
+			dma_fence_put(man->pipelined_eviction.fences[i]);
+			man->pipelined_eviction.fences[i] = NULL;
+		}
 	}
 
 	/* this just adjusts TTM size idea, which sets lpfn to the correct value */
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
index 3148f5d3dbd6..1396674e1923 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
@@ -651,7 +651,8 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
 	int err;
 
 	man = ttm_manager_type(priv->ttm_dev, mem_type);
-	man->move = dma_fence_get_stub();
+	man->pipelined_eviction.n_fences = 1;
+	man->pipelined_eviction.fences[0] = dma_fence_get_stub();
 
 	bo = ttm_bo_kunit_init(test, test->priv, size, NULL);
 	bo->type = bo_type;
@@ -668,7 +669,7 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, ctx.bytes_moved, size);
 
 	ttm_bo_put(bo);
-	dma_fence_put(man->move);
+	dma_fence_put(man->pipelined_eviction.fences[0]);
 }
 
 static const struct ttm_bo_validate_test_case ttm_bo_validate_wait_cases[] = {
@@ -732,9 +733,10 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
 
 	spin_lock_init(&fence_lock);
 	man = ttm_manager_type(priv->ttm_dev, fst_mem);
-	man->move = alloc_mock_fence(test);
+	man->pipelined_eviction.n_fences = 1;
+	man->pipelined_eviction.fences[0] = alloc_mock_fence(test);
 
-	task = kthread_create(threaded_fence_signal, man->move, "move-fence-signal");
+	task = kthread_create(threaded_fence_signal, man->pipelined_eviction.fences[0], "move-fence-signal");
 	if (IS_ERR(task))
 		KUNIT_FAIL(test, "Couldn't create move fence signal task\n");
 
@@ -742,7 +744,8 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
 	err = ttm_bo_validate(bo, placement_val, &ctx_val);
 	dma_resv_unlock(bo->base.resv);
 
-	dma_fence_wait_timeout(man->move, false, MAX_SCHEDULE_TIMEOUT);
+	dma_fence_wait_timeout(man->pipelined_eviction.fences[0], false, MAX_SCHEDULE_TIMEOUT);
+	man->pipelined_eviction.fences[0] = NULL;
 
 	KUNIT_EXPECT_EQ(test, err, 0);
 	KUNIT_EXPECT_EQ(test, ctx_val.bytes_moved, size);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
index e6ea2bd01f07..6dfdf759a491 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
@@ -207,6 +207,7 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
 	struct ttm_resource_test_priv *priv = test->priv;
 	struct ttm_resource_manager *man;
 	size_t size = SZ_16K;
+	int i;
 
 	man = kunit_kzalloc(test, sizeof(*man), GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, man);
@@ -216,8 +217,8 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
 	KUNIT_ASSERT_PTR_EQ(test, man->bdev, priv->devs->ttm_dev);
 	KUNIT_ASSERT_EQ(test, man->size, size);
 	KUNIT_ASSERT_EQ(test, man->usage, 0);
-	KUNIT_ASSERT_NULL(test, man->move);
-	KUNIT_ASSERT_NOT_NULL(test, &man->move_lock);
+	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++)
+		KUNIT_ASSERT_NULL(test, man->pipelined_eviction.fences[i]);
 
 	for (int i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		KUNIT_ASSERT_TRUE(test, list_empty(&man->lru[i]));
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f4d9e68b21e7..bc6d4a6c6d70 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -658,34 +658,48 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo)
 EXPORT_SYMBOL(ttm_bo_unpin);
 
 /*
- * Add the last move fence to the BO as kernel dependency and reserve a new
- * fence slot.
+ * Add the pipelined eviction fences to the BO as kernel dependency and
+ * reserve new fence slots.
  */
-static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
-				 struct ttm_resource_manager *man,
-				 bool no_wait_gpu)
+static int ttm_bo_add_pipelined_eviction_fences(struct ttm_buffer_object *bo,
+						struct ttm_resource_manager *man,
+						bool no_wait_gpu)
 {
+	struct dma_fence *fences_to_add[TTM_FENCES_MAX_SLOT_COUNT] = {};
 	struct dma_fence *fence;
-	int ret;
+	bool all_signaled = true, signaled;
+	int i, n = 0;
 
-	spin_lock(&man->move_lock);
-	fence = dma_fence_get(man->move);
-	spin_unlock(&man->move_lock);
+	spin_lock(&man->pipelined_eviction.lock);
+	for (i = 0; i < man->pipelined_eviction.n_fences; i++) {
+		fence = man->pipelined_eviction.fences[i];
+		if (!fence)
+			continue;
+		signaled = dma_fence_is_signaled(fence);
 
-	if (!fence)
+		if (signaled) {
+			dma_fence_put(man->pipelined_eviction.fences[i]);
+			man->pipelined_eviction.fences[i] = NULL;
+		} else {
+			all_signaled = false;
+			if (no_wait_gpu) {
+				spin_unlock(&man->pipelined_eviction.lock);
+				return -EBUSY;
+			}
+			fences_to_add[n++] = dma_fence_get(fence);
+		}
+	}
+	spin_unlock(&man->pipelined_eviction.lock);
+
+	if (all_signaled)
 		return 0;
 
-	if (no_wait_gpu) {
-		ret = dma_fence_is_signaled(fence) ? 0 : -EBUSY;
-		dma_fence_put(fence);
-		return ret;
+	for (i = 0; i < n; i++) {
+		dma_resv_add_fence(bo->base.resv, fences_to_add[i], DMA_RESV_USAGE_KERNEL);
+		dma_fence_put(fences_to_add[i]);
 	}
 
-	dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
-
-	ret = dma_resv_reserve_fences(bo->base.resv, 1);
-	dma_fence_put(fence);
-	return ret;
+	return dma_resv_reserve_fences(bo->base.resv, TTM_FENCES_MAX_SLOT_COUNT);
 }
 
 /**
@@ -718,7 +732,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
 	int i, ret;
 
 	ticket = dma_resv_locking_ctx(bo->base.resv);
-	ret = dma_resv_reserve_fences(bo->base.resv, 1);
+	ret = dma_resv_reserve_fences(bo->base.resv, TTM_FENCES_MAX_SLOT_COUNT);
 	if (unlikely(ret))
 		return ret;
 
@@ -757,7 +771,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
 			return ret;
 	}
 
-	ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
+	ret = ttm_bo_add_pipelined_eviction_fences(bo, man, ctx->no_wait_gpu);
 	if (unlikely(ret)) {
 		ttm_resource_free(bo, res);
 		if (ret == -EBUSY)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index acbbca9d5c92..ada8af965acf 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -258,7 +258,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	ret = dma_resv_trylock(&fbo->base.base._resv);
 	WARN_ON(!ret);
 
-	ret = dma_resv_reserve_fences(&fbo->base.base._resv, 1);
+	ret = dma_resv_reserve_fences(&fbo->base.base._resv, TTM_FENCES_MAX_SLOT_COUNT);
 	if (ret) {
 		dma_resv_unlock(&fbo->base.base._resv);
 		kfree(fbo);
@@ -646,6 +646,8 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
 {
 	struct ttm_device *bdev = bo->bdev;
 	struct ttm_resource_manager *from;
+	struct dma_fence *tmp;
+	int i, free_slot = -1;
 
 	from = ttm_manager_type(bdev, bo->resource->mem_type);
 
@@ -653,13 +655,35 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
 	 * BO doesn't have a TTM we need to bind/unbind. Just remember
 	 * this eviction and free up the allocation
 	 */
-	spin_lock(&from->move_lock);
-	if (!from->move || dma_fence_is_later(fence, from->move)) {
-		dma_fence_put(from->move);
-		from->move = dma_fence_get(fence);
+	spin_lock(&from->pipelined_eviction.lock);
+	for (i = 0; i < from->pipelined_eviction.n_fences; i++) {
+		tmp = from->pipelined_eviction.fences[i];
+		if (!tmp) {
+			if (free_slot < 0)
+				free_slot = i;
+			continue;
+		}
+		if (fence->context != tmp->context)
+			continue;
+		if (dma_fence_is_later(fence, tmp)) {
+			dma_fence_put(tmp);
+			free_slot = i;
+			break;
+		}
+		goto unlock;
+	}
+	if (free_slot >= 0) {
+		from->pipelined_eviction.fences[free_slot] = dma_fence_get(fence);
+	} else {
+		WARN(1, "not enough fence slots for all fence contexts");
+		spin_unlock(&from->pipelined_eviction.lock);
+		dma_fence_wait(fence, false);
+		goto end;
 	}
-	spin_unlock(&from->move_lock);
 
+unlock:
+	spin_unlock(&from->pipelined_eviction.lock);
+end:
 	ttm_resource_free(bo, &bo->resource);
 }
 
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index e2c82ad07eb4..ae0d4621cc55 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -523,14 +523,19 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
 {
 	unsigned i;
 
-	spin_lock_init(&man->move_lock);
 	man->bdev = bdev;
 	man->size = size;
 	man->usage = 0;
 
 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		INIT_LIST_HEAD(&man->lru[i]);
-	man->move = NULL;
+	spin_lock_init(&man->pipelined_eviction.lock);
+	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++)
+		man->pipelined_eviction.fences[i] = NULL;
+	/* Can be overridden by drivers that want to use more than 1 entity
+	 * for moves and evictions (limited to TTM_FENCES_MAX_SLOT_COUNT).
+	 */
+	man->pipelined_eviction.n_fences = 1;
 }
 EXPORT_SYMBOL(ttm_resource_manager_init);
 
@@ -551,7 +556,7 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 		.no_wait_gpu = false,
 	};
 	struct dma_fence *fence;
-	int ret;
+	int ret, i;
 
 	do {
 		ret = ttm_bo_evict_first(bdev, man, &ctx);
@@ -561,18 +566,32 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 	if (ret && ret != -ENOENT)
 		return ret;
 
-	spin_lock(&man->move_lock);
-	fence = dma_fence_get(man->move);
-	spin_unlock(&man->move_lock);
+	ret = 0;
 
-	if (fence) {
-		ret = dma_fence_wait(fence, false);
-		dma_fence_put(fence);
-		if (ret)
-			return ret;
-	}
+	do {
+		fence = NULL;
 
-	return 0;
+		spin_lock(&man->pipelined_eviction.lock);
+		for (i = 0; i < man->pipelined_eviction.n_fences; i++) {
+			fence = man->pipelined_eviction.fences[i];
+			man->pipelined_eviction.fences[i] = NULL;
+			if (fence)
+				break;
+		}
+		spin_unlock(&man->pipelined_eviction.lock);
+
+		if (fence) {
+			ret = dma_fence_wait(fence, false);
+			dma_fence_put(fence);
+
+			if (ret)
+				break;
+		} else {
+			break;
+		}
+	} while (1);
+
+	return ret;
 }
 EXPORT_SYMBOL(ttm_resource_manager_evict_all);
 
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index f49daa504c36..898c429b37ad 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -50,6 +50,15 @@ struct io_mapping;
 struct sg_table;
 struct scatterlist;
 
+/**
+ * define TTM_FENCES_MAX_SLOT_COUNT - How many entities can be used for evictions
+ *
+ * Pipelined evictions can be spread on multiple entities. This
+ * is the max number of entities that can be used by the driver
+ * for that purpose.
+ */
+#define TTM_FENCES_MAX_SLOT_COUNT 8
+
 /**
  * enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
  */
@@ -180,8 +189,10 @@ struct ttm_resource_manager_func {
  * @size: Size of the managed region.
  * @bdev: ttm device this manager belongs to
  * @func: structure pointer implementing the range manager. See above
- * @move_lock: lock for move fence
- * @move: The fence of the last pipelined move operation.
+ * @pipelined_eviction.lock: lock for eviction fences
+ * @pipelined_eviction.n_fences: The number of fences allowed in the array. If
+ * 0, pipelined evictions aren't used.
+ * @pipelined_eviction.fences: The fences of the last pipelined move operations.
  * @lru: The lru list for this memory type.
  *
  * This structure is used to identify and manage memory types for a device.
@@ -195,12 +206,15 @@ struct ttm_resource_manager {
 	struct ttm_device *bdev;
 	uint64_t size;
 	const struct ttm_resource_manager_func *func;
-	spinlock_t move_lock;
 
-	/*
-	 * Protected by @move_lock.
+	/* This is very similar to a dma_resv object, but locking rules make
+	 * it difficult to use it in this context.
 	 */
-	struct dma_fence *move;
+	struct {
+		spinlock_t lock;
+		int n_fences;
+		struct dma_fence *fences[TTM_FENCES_MAX_SLOT_COUNT];
+	} pipelined_eviction;
 
 	/*
 	 * Protected by the bdev->lru_lock.
@@ -421,8 +435,12 @@ static inline bool ttm_resource_manager_used(struct ttm_resource_manager *man)
 static inline void
 ttm_resource_manager_cleanup(struct ttm_resource_manager *man)
 {
-	dma_fence_put(man->move);
-	man->move = NULL;
+	int i;
+
+	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++) {
+		dma_fence_put(man->pipelined_eviction.fences[i]);
+		man->pipelined_eviction.fences[i] = NULL;
+	}
 }
 
 void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk);
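The per-context slot logic added to ttm_bo_move_pipeline_evict() in the patch above can be modeled in userspace to see the intended behavior. This is a sketch only: the toy_* names are invented, and struct toy_fence keeps just the context/seqno pair that the real dma_fence comparison relies on.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_N_SLOTS 8

/* Minimal stand-in for struct dma_fence: one timeline (context) and a
 * sequence number on that timeline. */
struct toy_fence {
	unsigned long context;
	unsigned long seqno;
};

/* Mirrors the loop in ttm_bo_move_pipeline_evict(): one slot per fence
 * context; a slot is only replaced by a later fence from the same
 * context.  Returns the slot used, or -1 when every slot belongs to
 * another context (the patch WARNs and waits synchronously there). */
static int toy_store_fence(struct toy_fence *slots[TOY_N_SLOTS],
			   struct toy_fence *fence)
{
	int i, free_slot = -1;

	for (i = 0; i < TOY_N_SLOTS; i++) {
		struct toy_fence *tmp = slots[i];

		if (!tmp) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (fence->context != tmp->context)
			continue;
		if (fence->seqno > tmp->seqno)
			slots[i] = fence;	/* later fence replaces */
		return i;			/* same context handled */
	}
	if (free_slot >= 0) {
		slots[free_slot] = fence;
		return free_slot;
	}
	return -1;
}
```

Fences from distinct contexts (i.e. distinct entities) land in distinct slots, while a later fence from an already-tracked context only overwrites its own slot, which is exactly why up to TTM_FENCES_MAX_SLOT_COUNT entities can be tracked at once.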
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
Until now, TTM stored a single pipelined eviction fence, which meant drivers had to use a single entity for these evictions.
To lift this requirement, this commit allows up to 8 entities to be used.
Ideally a dma_resv object would have been used as a container for the eviction fences, but the locking rules make this complex: dma_resv objects all share the same ww_class, which means "Attempting to lock more mutexes after ww_acquire_done" is an error.
One alternative considered was to introduce a second ww_class for specific resv objects holding a single "transient" lock (i.e. the resv lock would only be held for a short period, without taking any other locks).
The other option is to statically reserve a fence array and extend the existing code to deal with N fences instead of 1.
The driver is still responsible for reserving the correct number of fence slots.
Lastly ttm_resource_manager.pipelined_eviction.n_fences is initialized to 1, so the new behavior is opt-in.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  8 ++-
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  | 13 +++--
 drivers/gpu/drm/ttm/tests/ttm_resource_test.c |  5 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  | 56 ++++++++++++-------
 drivers/gpu/drm/ttm/ttm_bo_util.c             | 36 ++++++++++--
 drivers/gpu/drm/ttm/ttm_resource.c            | 45 ++++++++++-----
 include/drm/ttm/ttm_resource.h                | 34 ++++++++---
 7 files changed, 139 insertions(+), 58 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 326476089db3..c66f00434991 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2156,7 +2156,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
 {
 	struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
 	uint64_t size;
-	int r;
+	int r, i;

 	if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
 	    adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
@@ -2190,8 +2190,10 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
 	} else {
 		drm_sched_entity_destroy(&adev->mman.high_pr);
 		drm_sched_entity_destroy(&adev->mman.low_pr);
-		dma_fence_put(man->move);
-		man->move = NULL;
+		for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++) {
+			dma_fence_put(man->pipelined_eviction.fences[i]);
+			man->pipelined_eviction.fences[i] = NULL;
+		}
 	}

 	/* this just adjusts TTM size idea, which sets lpfn to the correct value */
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
index 3148f5d3dbd6..1396674e1923 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
@@ -651,7 +651,8 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
 	int err;

 	man = ttm_manager_type(priv->ttm_dev, mem_type);
-	man->move = dma_fence_get_stub();
+	man->pipelined_eviction.n_fences = 1;
+	man->pipelined_eviction.fences[0] = dma_fence_get_stub();

 	bo = ttm_bo_kunit_init(test, test->priv, size, NULL);
 	bo->type = bo_type;
@@ -668,7 +669,7 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, ctx.bytes_moved, size);

 	ttm_bo_put(bo);
-	dma_fence_put(man->move);
+	dma_fence_put(man->pipelined_eviction.fences[0]);
 }

 static const struct ttm_bo_validate_test_case ttm_bo_validate_wait_cases[] = {
@@ -732,9 +733,10 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)

 	spin_lock_init(&fence_lock);
 	man = ttm_manager_type(priv->ttm_dev, fst_mem);
-	man->move = alloc_mock_fence(test);
+	man->pipelined_eviction.n_fences = 1;
+	man->pipelined_eviction.fences[0] = alloc_mock_fence(test);

-	task = kthread_create(threaded_fence_signal, man->move, "move-fence-signal");
+	task = kthread_create(threaded_fence_signal, man->pipelined_eviction.fences[0], "move-fence-signal");
 	if (IS_ERR(task))
 		KUNIT_FAIL(test, "Couldn't create move fence signal task\n");

@@ -742,7 +744,8 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
 	err = ttm_bo_validate(bo, placement_val, &ctx_val);
 	dma_resv_unlock(bo->base.resv);

-	dma_fence_wait_timeout(man->move, false, MAX_SCHEDULE_TIMEOUT);
+	dma_fence_wait_timeout(man->pipelined_eviction.fences[0], false, MAX_SCHEDULE_TIMEOUT);
+	man->pipelined_eviction.fences[0] = NULL;

 	KUNIT_EXPECT_EQ(test, err, 0);
 	KUNIT_EXPECT_EQ(test, ctx_val.bytes_moved, size);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
index e6ea2bd01f07..6dfdf759a491 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
@@ -207,6 +207,7 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
 	struct ttm_resource_test_priv *priv = test->priv;
 	struct ttm_resource_manager *man;
 	size_t size = SZ_16K;
+	int i;

 	man = kunit_kzalloc(test, sizeof(*man), GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, man);
@@ -216,8 +217,8 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
 	KUNIT_ASSERT_PTR_EQ(test, man->bdev, priv->devs->ttm_dev);
 	KUNIT_ASSERT_EQ(test, man->size, size);
 	KUNIT_ASSERT_EQ(test, man->usage, 0);
-	KUNIT_ASSERT_NULL(test, man->move);
-	KUNIT_ASSERT_NOT_NULL(test, &man->move_lock);
+	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++)
+		KUNIT_ASSERT_NULL(test, man->pipelined_eviction.fences[i]);

 	for (int i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		KUNIT_ASSERT_TRUE(test, list_empty(&man->lru[i]));
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f4d9e68b21e7..bc6d4a6c6d70 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -658,34 +658,48 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo)
 EXPORT_SYMBOL(ttm_bo_unpin);

 /*
- * Add the last move fence to the BO as kernel dependency and reserve a new
- * fence slot.
+ * Add the pipelined eviction fences to the BO as kernel dependency and
+ * reserve new fence slots.
  */
-static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
-				 struct ttm_resource_manager *man,
-				 bool no_wait_gpu)
+static int ttm_bo_add_pipelined_eviction_fences(struct ttm_buffer_object *bo,
+						struct ttm_resource_manager *man,
+						bool no_wait_gpu)
 {
+	struct dma_fence *fences_to_add[TTM_FENCES_MAX_SLOT_COUNT] = {};
 	struct dma_fence *fence;
-	int ret;
+	bool all_signaled = true, signaled;
+	int i, n = 0;
-	spin_lock(&man->move_lock);
-	fence = dma_fence_get(man->move);
-	spin_unlock(&man->move_lock);
+	spin_lock(&man->pipelined_eviction.lock);
+	for (i = 0; i < man->pipelined_eviction.n_fences; i++) {
+		fence = man->pipelined_eviction.fences[i];
+		if (!fence)
+			continue;
+		signaled = dma_fence_is_signaled(fence);

-	if (!fence)
+		if (signaled) {
+			dma_fence_put(man->pipelined_eviction.fences[i]);
+			man->pipelined_eviction.fences[i] = NULL;
Please completely drop that; only check whether the fences are signaled when the no_wait_gpu flag is set.
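What this suggestion amounts to can be sketched as a userspace model. The toy_* names and the deps_added counter are invented for illustration; "signaled" is reduced to a flag. The point is that the fence array is left untouched and the signaled state is only consulted to decide the -EBUSY case.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_N_SLOTS 8
#define TOY_EBUSY 16

struct toy_fence {
	int signaled;
};

/* Model of the simplified dependency-adding helper: no pruning of
 * signaled fences from the array; with no_wait_gpu an unsignaled fence
 * means -EBUSY, otherwise every unsignaled fence becomes a dependency
 * (modeled here by counting into *deps_added). */
static int toy_add_eviction_fences(struct toy_fence *fences[TOY_N_SLOTS],
				   int no_wait_gpu, int *deps_added)
{
	int i;

	*deps_added = 0;
	for (i = 0; i < TOY_N_SLOTS; i++) {
		if (!fences[i] || fences[i]->signaled)
			continue;
		if (no_wait_gpu)
			return -TOY_EBUSY;
		(*deps_added)++;
	}
	return 0;
}
```

Skipping the pruning keeps the common path a read-only scan under the lock, at the cost of occasionally adding an already-signaled fence as a dependency, which dma_resv handles fine.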
+		} else {
+			all_signaled = false;
+			if (no_wait_gpu) {
+				spin_unlock(&man->pipelined_eviction.lock);
+				return -EBUSY;
+			}
+			fences_to_add[n++] = dma_fence_get(fence);
+		}
+	}
+	spin_unlock(&man->pipelined_eviction.lock);
+
+	if (all_signaled)
 		return 0;
-	if (no_wait_gpu) {
-		ret = dma_fence_is_signaled(fence) ? 0 : -EBUSY;
-		dma_fence_put(fence);
-		return ret;
+	for (i = 0; i < n; i++) {
+		dma_resv_add_fence(bo->base.resv, fences_to_add[i], DMA_RESV_USAGE_KERNEL);
+		dma_fence_put(fences_to_add[i]);
 	}

-	dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
-
-	ret = dma_resv_reserve_fences(bo->base.resv, 1);
-	dma_fence_put(fence);
-	return ret;
+	return dma_resv_reserve_fences(bo->base.resv, TTM_FENCES_MAX_SLOT_COUNT);
Please separate out a patch where the call to dma_resv_reserve_fences() is removed here.
 }

 /**
@@ -718,7 +732,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
 	int i, ret;

 	ticket = dma_resv_locking_ctx(bo->base.resv);
-	ret = dma_resv_reserve_fences(bo->base.resv, 1);
+	ret = dma_resv_reserve_fences(bo->base.resv, TTM_FENCES_MAX_SLOT_COUNT);
 	if (unlikely(ret))
 		return ret;

@@ -757,7 +771,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
 			return ret;
 	}

-	ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
+	ret = ttm_bo_add_pipelined_eviction_fences(bo, man, ctx->no_wait_gpu);
 	if (unlikely(ret)) {
 		ttm_resource_free(bo, res);
 		if (ret == -EBUSY)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index acbbca9d5c92..ada8af965acf 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -258,7 +258,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	ret = dma_resv_trylock(&fbo->base.base._resv);
 	WARN_ON(!ret);
-	ret = dma_resv_reserve_fences(&fbo->base.base._resv, 1);
+	ret = dma_resv_reserve_fences(&fbo->base.base._resv, TTM_FENCES_MAX_SLOT_COUNT);
 	if (ret) {
 		dma_resv_unlock(&fbo->base.base._resv);
 		kfree(fbo);
@@ -646,6 +646,8 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
 {
 	struct ttm_device *bdev = bo->bdev;
 	struct ttm_resource_manager *from;
+	struct dma_fence *tmp;
+	int i, free_slot = -1;

 	from = ttm_manager_type(bdev, bo->resource->mem_type);

@@ -653,13 +655,35 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
 	 * BO doesn't have a TTM we need to bind/unbind. Just remember
 	 * this eviction and free up the allocation
 	 */
-	spin_lock(&from->move_lock);
-	if (!from->move || dma_fence_is_later(fence, from->move)) {
-		dma_fence_put(from->move);
-		from->move = dma_fence_get(fence);
+	spin_lock(&from->pipelined_eviction.lock);
+	for (i = 0; i < from->pipelined_eviction.n_fences; i++) {
+		tmp = from->pipelined_eviction.fences[i];
+		if (!tmp) {
+			if (free_slot < 0)
+				free_slot = i;
+			continue;
Just break here.
+		}
+		if (fence->context != tmp->context)
+			continue;
+		if (dma_fence_is_later(fence, tmp)) {
+			dma_fence_put(tmp);
+			free_slot = i;
+			break;
+		}
+		goto unlock;
+	}
+	if (free_slot >= 0) {
Drop free_slot and check i here.
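One way to read these two comments, as a userspace sketch (the toy_* names are invented; this assumes slots fill front to back, so the first empty slot ends the search):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_N_SLOTS 8

struct toy_fence {
	unsigned long context;
	unsigned long seqno;
};

/* Slot update without a separate free_slot variable: break on the first
 * empty slot or on a same-context fence to replace, then test i. */
static int toy_store_fence_simplified(struct toy_fence *slots[TOY_N_SLOTS],
				      struct toy_fence *fence)
{
	int i;

	for (i = 0; i < TOY_N_SLOTS; i++) {
		struct toy_fence *tmp = slots[i];

		if (!tmp)
			break;			/* first free slot */
		if (fence->context != tmp->context)
			continue;
		if (fence->seqno <= tmp->seqno)
			return i;		/* older fence, keep slot */
		break;				/* replace in slot i */
	}
	if (i == TOY_N_SLOTS)
		return -1;			/* no slot available */
	slots[i] = fence;
	return i;
}
```

The early break is only safe if an empty slot implies no occupied slot after it, which holds as long as slots are never cleared back to NULL out of order (the other review comment about never NULLing the fences points the same way).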
+		from->pipelined_eviction.fences[free_slot] = dma_fence_get(fence);
+	} else {
+		WARN(1, "not enough fence slots for all fence contexts");
+		spin_unlock(&from->pipelined_eviction.lock);
+		dma_fence_wait(fence, false);
+		goto end;
 	}
-	spin_unlock(&from->move_lock);

+unlock:
+	spin_unlock(&from->pipelined_eviction.lock);
+end:
 	ttm_resource_free(bo, &bo->resource);
 }

diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index e2c82ad07eb4..ae0d4621cc55 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -523,14 +523,19 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
 {
 	unsigned i;
-	spin_lock_init(&man->move_lock);
 	man->bdev = bdev;
 	man->size = size;
 	man->usage = 0;

 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		INIT_LIST_HEAD(&man->lru[i]);
-	man->move = NULL;
+	spin_lock_init(&man->pipelined_eviction.lock);
+	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++)
+		man->pipelined_eviction.fences[i] = NULL;
+	/* Can be overridden by drivers that want to use more than 1 entity
+	 * for moves and evictions (limited to TTM_FENCES_MAX_SLOT_COUNT).
+	 */
+	man->pipelined_eviction.n_fences = 1;
 }
 EXPORT_SYMBOL(ttm_resource_manager_init);

@@ -551,7 +556,7 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 		.no_wait_gpu = false,
 	};
 	struct dma_fence *fence;
-	int ret;
+	int ret, i;

 	do {
 		ret = ttm_bo_evict_first(bdev, man, &ctx);

@@ -561,18 +566,32 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 	if (ret && ret != -ENOENT)
 		return ret;

-	spin_lock(&man->move_lock);
-	fence = dma_fence_get(man->move);
-	spin_unlock(&man->move_lock);
+	ret = 0;

-	if (fence) {
-		ret = dma_fence_wait(fence, false);
-		dma_fence_put(fence);
-		if (ret)
-			return ret;
-	}
+	do {
+		fence = NULL;

-	return 0;
+		spin_lock(&man->pipelined_eviction.lock);
+		for (i = 0; i < man->pipelined_eviction.n_fences; i++) {
+			fence = man->pipelined_eviction.fences[i];
+			man->pipelined_eviction.fences[i] = NULL;
Drop that. We should never set man->pipelined_eviction.fences to NULL.
Potentially even initialize all move fences with a stub fence.
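The stub-fence idea can be sketched in userspace (the toy_* names are made up; the static toy_stub models the always-signaled fence returned by dma_fence_get_stub()): slots never hold NULL, so readers can skip the NULL checks entirely.

```c
#include <assert.h>

#define TOY_N_SLOTS 8

struct toy_fence {
	int signaled;
};

/* Always-signaled stub standing in for dma_fence_get_stub(). */
static struct toy_fence toy_stub = { 1 };

static void toy_init_slots(struct toy_fence *slots[TOY_N_SLOTS])
{
	int i;

	for (i = 0; i < TOY_N_SLOTS; i++)
		slots[i] = &toy_stub;
}

/* With stub initialization, scanning for pending work needs no NULL
 * test: every slot is a valid fence. */
static int toy_has_pending(struct toy_fence *slots[TOY_N_SLOTS])
{
	int i;

	for (i = 0; i < TOY_N_SLOTS; i++)
		if (!slots[i]->signaled)
			return 1;
	return 0;
}
```

A consumer such as evict_all would then wait on each slot and put a fresh stub reference back, instead of clearing the slot to NULL.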
+			if (fence)
+				break;
+		}
+		spin_unlock(&man->pipelined_eviction.lock);
+
+		if (fence) {
+			ret = dma_fence_wait(fence, false);
+			dma_fence_put(fence);
+
+			if (ret)
+				break;
+		} else {
+			break;
+		}
+	} while (1);
+
+	return ret;
 }
 EXPORT_SYMBOL(ttm_resource_manager_evict_all);

diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index f49daa504c36..898c429b37ad 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -50,6 +50,15 @@ struct io_mapping;
 struct sg_table;
 struct scatterlist;

+/**
+ * define TTM_FENCES_MAX_SLOT_COUNT - How many entities can be used for evictions
+ *
+ * Pipelined evictions can be spread on multiple entities. This
+ * is the max number of entities that can be used by the driver
+ * for that purpose.
+ */
+#define TTM_FENCES_MAX_SLOT_COUNT 8
+
Make that TTM_NUM_MOVE_FENCES.
 /**
  * enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
  */
@@ -180,8 +189,10 @@ struct ttm_resource_manager_func {
  * @size: Size of the managed region.
  * @bdev: ttm device this manager belongs to
  * @func: structure pointer implementing the range manager. See above
- * @move_lock: lock for move fence
- * @move: The fence of the last pipelined move operation.
+ * @pipelined_eviction.lock: lock for eviction fences
+ * @pipelined_eviction.n_fences: The number of fences allowed in the array. If
+ * 0, pipelined evictions aren't used.
+ * @pipelined_eviction.fences: The fences of the last pipelined move operations.
  * @lru: The lru list for this memory type.
  *
  * This structure is used to identify and manage memory types for a device.
@@ -195,12 +206,15 @@ struct ttm_resource_manager {
 	struct ttm_device *bdev;
 	uint64_t size;
 	const struct ttm_resource_manager_func *func;
-	spinlock_t move_lock;

-	/*
-	 * Protected by @move_lock.
+	/* This is very similar to a dma_resv object, but locking rules make
+	 * it difficult to use it in this context.
 	 */
-	struct dma_fence *move;
+	struct {
+		spinlock_t lock;
+		int n_fences;
+		struct dma_fence *fences[TTM_FENCES_MAX_SLOT_COUNT];
+	} pipelined_eviction;
Drop the separate structure, just make move an array instead.
And also drop n_fences. Just always take a look at all fences.
Regards, Christian.
/* * Protected by the bdev->lru_lock. @@ -421,8 +435,12 @@ static inline bool ttm_resource_manager_used(struct ttm_resource_manager *man) static inline void ttm_resource_manager_cleanup(struct ttm_resource_manager *man) {
- dma_fence_put(man->move);
- man->move = NULL;
- int i;
- for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++) {
dma_fence_put(man->pipelined_eviction.fences[i]);man->pipelined_eviction.fences[i] = NULL;- }
} void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk);
On 04/11/2025 at 15:12, Christian König wrote:
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
Until now ttm stored a single pipelined eviction fence, which meant drivers had to use a single entity for these evictions.

To lift this requirement, this commit allows up to 8 entities to be used.

Ideally a dma_resv object would have been used as a container for the eviction fences, but the locking rules make it complex: all dma_resv objects share the same ww_class, which means "attempting to lock more mutexes after ww_acquire_done" is an error.

One alternative considered was to introduce a second ww_class for specific resv objects holding a single "transient" lock (i.e. the resv lock would only be held for a short period, without taking any other locks).

The other option is to statically reserve a fence array and extend the existing code to deal with N fences instead of 1.

The driver is still responsible for reserving the correct number of fence slots.

Lastly, ttm_resource_manager.pipelined_eviction.n_fences is initialized to 1, so the new behavior is opt-in.
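The "deal with N fences instead of 1" part can be sketched in plain C. This is a hypothetical userspace model (struct fence, fence_wait and N_FENCES are stand-ins, not the kernel types) of the drain loop the patch adds to ttm_resource_manager_evict_all(): pick one pending fence while the lock is held, wait for it outside the lock, repeat until no fence is left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an 8-slot pipelined-eviction fence array. */
#define N_FENCES 8

struct fence {
	bool signaled;
};

/* Stand-in for dma_fence_wait(): always succeeds in this model. */
static int fence_wait(struct fence *f)
{
	f->signaled = true;
	return 0;
}

static int drain_eviction_fences(struct fence *fences[N_FENCES])
{
	int i, ret = 0;

	do {
		struct fence *fence = NULL;

		/* spin_lock(&man->pipelined_eviction.lock) would go here */
		for (i = 0; i < N_FENCES; i++) {
			fence = fences[i];
			fences[i] = NULL;
			if (fence)
				break;
		}
		/* lock dropped here, before the potentially long wait */
		if (!fence)
			break;
		ret = fence_wait(fence);
	} while (ret == 0);

	return ret;
}
```

The important property, mirrored from the patch, is that the lock is never held across a wait.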
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  8 ++-
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  | 13 +++--
 drivers/gpu/drm/ttm/tests/ttm_resource_test.c |  5 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  | 56 ++++++++++++-------
 drivers/gpu/drm/ttm/ttm_bo_util.c             | 36 ++++++++++--
 drivers/gpu/drm/ttm/ttm_resource.c            | 45 ++++++++-----
 include/drm/ttm/ttm_resource.h                | 34 ++++++---
 7 files changed, 139 insertions(+), 58 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 326476089db3..c66f00434991 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2156,7 +2156,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
 {
 	struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
 	uint64_t size;
-	int r;
+	int r, i;

 	if (!adev->mman.initialized || amdgpu_in_reset(adev) ||
 	    adev->mman.buffer_funcs_enabled == enable || adev->gmc.is_app_apu)
@@ -2190,8 +2190,10 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
 	} else {
 		drm_sched_entity_destroy(&adev->mman.high_pr);
 		drm_sched_entity_destroy(&adev->mman.low_pr);
-		dma_fence_put(man->move);
-		man->move = NULL;
+		for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++) {
+			dma_fence_put(man->pipelined_eviction.fences[i]);
+			man->pipelined_eviction.fences[i] = NULL;
+		}
 	}

 	/* this just adjusts TTM size idea, which sets lpfn to the correct value */
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
index 3148f5d3dbd6..1396674e1923 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c
@@ -651,7 +651,8 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
 	int err;

 	man = ttm_manager_type(priv->ttm_dev, mem_type);
-	man->move = dma_fence_get_stub();
+	man->pipelined_eviction.n_fences = 1;
+	man->pipelined_eviction.fences[0] = dma_fence_get_stub();

 	bo = ttm_bo_kunit_init(test, test->priv, size, NULL);
 	bo->type = bo_type;
@@ -668,7 +669,7 @@ static void ttm_bo_validate_move_fence_signaled(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, ctx.bytes_moved, size);

 	ttm_bo_put(bo);
-	dma_fence_put(man->move);
+	dma_fence_put(man->pipelined_eviction.fences[0]);
 }
 static const struct ttm_bo_validate_test_case ttm_bo_validate_wait_cases[] = {
@@ -732,9 +733,10 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
 	spin_lock_init(&fence_lock);

 	man = ttm_manager_type(priv->ttm_dev, fst_mem);
-	man->move = alloc_mock_fence(test);
+	man->pipelined_eviction.n_fences = 1;
+	man->pipelined_eviction.fences[0] = alloc_mock_fence(test);

-	task = kthread_create(threaded_fence_signal, man->move, "move-fence-signal");
+	task = kthread_create(threaded_fence_signal, man->pipelined_eviction.fences[0], "move-fence-signal");
 	if (IS_ERR(task))
 		KUNIT_FAIL(test, "Couldn't create move fence signal task\n");
@@ -742,7 +744,8 @@ static void ttm_bo_validate_move_fence_not_signaled(struct kunit *test)
 	err = ttm_bo_validate(bo, placement_val, &ctx_val);
 	dma_resv_unlock(bo->base.resv);

-	dma_fence_wait_timeout(man->move, false, MAX_SCHEDULE_TIMEOUT);
+	dma_fence_wait_timeout(man->pipelined_eviction.fences[0], false, MAX_SCHEDULE_TIMEOUT);
+	man->pipelined_eviction.fences[0] = NULL;

 	KUNIT_EXPECT_EQ(test, err, 0);
 	KUNIT_EXPECT_EQ(test, ctx_val.bytes_moved, size);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
index e6ea2bd01f07..6dfdf759a491 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
@@ -207,6 +207,7 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
 	struct ttm_resource_test_priv *priv = test->priv;
 	struct ttm_resource_manager *man;
 	size_t size = SZ_16K;
+	int i;

 	man = kunit_kzalloc(test, sizeof(*man), GFP_KERNEL);
 	KUNIT_ASSERT_NOT_NULL(test, man);
@@ -216,8 +217,8 @@ static void ttm_resource_manager_init_basic(struct kunit *test)
 	KUNIT_ASSERT_PTR_EQ(test, man->bdev, priv->devs->ttm_dev);
 	KUNIT_ASSERT_EQ(test, man->size, size);
 	KUNIT_ASSERT_EQ(test, man->usage, 0);
-	KUNIT_ASSERT_NULL(test, man->move);
-	KUNIT_ASSERT_NOT_NULL(test, &man->move_lock);
+	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++)
+		KUNIT_ASSERT_NULL(test, man->pipelined_eviction.fences[i]);

 	for (int i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		KUNIT_ASSERT_TRUE(test, list_empty(&man->lru[i]));
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index f4d9e68b21e7..bc6d4a6c6d70 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -658,34 +658,48 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo)
 EXPORT_SYMBOL(ttm_bo_unpin);

 /*
- * Add the last move fence to the BO as kernel dependency and reserve a new
- * fence slot.
+ * Add the pipelined eviction fences to the BO as kernel dependency and
+ * reserve new fence slots.
  */
-static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
-				 struct ttm_resource_manager *man,
-				 bool no_wait_gpu)
+static int ttm_bo_add_pipelined_eviction_fences(struct ttm_buffer_object *bo,
+						struct ttm_resource_manager *man,
+						bool no_wait_gpu)
 {
+	struct dma_fence *fences_to_add[TTM_FENCES_MAX_SLOT_COUNT] = {};
 	struct dma_fence *fence;
-	int ret;
+	bool all_signaled = true, signaled;
+	int i, n = 0;

-	spin_lock(&man->move_lock);
-	fence = dma_fence_get(man->move);
-	spin_unlock(&man->move_lock);
+	spin_lock(&man->pipelined_eviction.lock);
+	for (i = 0; i < man->pipelined_eviction.n_fences; i++) {
+		fence = man->pipelined_eviction.fences[i];
+		if (!fence)
+			continue;
+		signaled = dma_fence_is_signaled(fence);
+		if (signaled) {
+			dma_fence_put(man->pipelined_eviction.fences[i]);
+			man->pipelined_eviction.fences[i] = NULL;

Please completely drop that, only check if the fences are signaled when the no_wait_gpu flag is set.

ok.

+		} else {
+			all_signaled = false;
+			if (no_wait_gpu) {
+				spin_unlock(&man->pipelined_eviction.lock);
+				return -EBUSY;
+			}
+			fences_to_add[n++] = dma_fence_get(fence);
+		}
+	}
+	spin_unlock(&man->pipelined_eviction.lock);

-	if (!fence)
+	if (all_signaled)
 		return 0;

-	if (no_wait_gpu) {
-		ret = dma_fence_is_signaled(fence) ? 0 : -EBUSY;
-		dma_fence_put(fence);
-		return ret;
+	for (i = 0; i < n; i++) {
+		dma_resv_add_fence(bo->base.resv, fences_to_add[i], DMA_RESV_USAGE_KERNEL);
+		dma_fence_put(fences_to_add[i]);
 	}

-	dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
-	ret = dma_resv_reserve_fences(bo->base.resv, 1);
-	dma_fence_put(fence);
-	return ret;
+	return dma_resv_reserve_fences(bo->base.resv, TTM_FENCES_MAX_SLOT_COUNT);

Please separate out a patch where the call to dma_resv_reserve_fences() is removed here.

Can you remind me why it's not needed?
 }

 /**
@@ -718,7 +732,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
 	int i, ret;

 	ticket = dma_resv_locking_ctx(bo->base.resv);
-	ret = dma_resv_reserve_fences(bo->base.resv, 1);
+	ret = dma_resv_reserve_fences(bo->base.resv, TTM_FENCES_MAX_SLOT_COUNT);
 	if (unlikely(ret))
 		return ret;
@@ -757,7 +771,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
 			return ret;
 		}

-		ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
+		ret = ttm_bo_add_pipelined_eviction_fences(bo, man, ctx->no_wait_gpu);
 		if (unlikely(ret)) {
 			ttm_resource_free(bo, res);
 			if (ret == -EBUSY)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index acbbca9d5c92..ada8af965acf 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -258,7 +258,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 	ret = dma_resv_trylock(&fbo->base.base._resv);
 	WARN_ON(!ret);

-	ret = dma_resv_reserve_fences(&fbo->base.base._resv, 1);
+	ret = dma_resv_reserve_fences(&fbo->base.base._resv, TTM_FENCES_MAX_SLOT_COUNT);
 	if (ret) {
 		dma_resv_unlock(&fbo->base.base._resv);
 		kfree(fbo);
@@ -646,6 +646,8 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
 {
 	struct ttm_device *bdev = bo->bdev;
 	struct ttm_resource_manager *from;
+	struct dma_fence *tmp;
+	int i, free_slot = -1;

 	from = ttm_manager_type(bdev, bo->resource->mem_type);
@@ -653,13 +655,35 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
 	 * BO doesn't have a TTM we need to bind/unbind.  Just remember
 	 * this eviction and free up the allocation
 	 */
-	spin_lock(&from->move_lock);
-	if (!from->move || dma_fence_is_later(fence, from->move)) {
-		dma_fence_put(from->move);
-		from->move = dma_fence_get(fence);
+	spin_lock(&from->pipelined_eviction.lock);
+	for (i = 0; i < from->pipelined_eviction.n_fences; i++) {
+		tmp = from->pipelined_eviction.fences[i];
+		if (!tmp) {
+			if (free_slot < 0)
+				free_slot = i;
+			continue;

Just break here.

The logic here is to reuse context slots. Even if slot 0 is empty, I need to use slot 1 if slot 1's context is the same as fence->context.

This way, we're guaranteed to find a slot for all contexts used by the driver.

+		}
+		if (fence->context != tmp->context)
+			continue;
+		if (dma_fence_is_later(fence, tmp)) {
+			dma_fence_put(tmp);
+			free_slot = i;
+			break;
+		}
+		goto unlock;
+	}
+	if (free_slot >= 0) {

Drop free_slot and check i here.

+		from->pipelined_eviction.fences[free_slot] = dma_fence_get(fence);
+	} else {
+		WARN(1, "not enough fence slots for all fence contexts");
+		spin_unlock(&from->pipelined_eviction.lock);
+		dma_fence_wait(fence, false);
+		goto end;
 	}
-	spin_unlock(&from->move_lock);
+unlock:
+	spin_unlock(&from->pipelined_eviction.lock);
+end:
 	ttm_resource_free(bo, &bo->resource);
 }
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index e2c82ad07eb4..ae0d4621cc55 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -523,14 +523,19 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
 {
 	unsigned i;
-	spin_lock_init(&man->move_lock);
 	man->bdev = bdev;
 	man->size = size;
 	man->usage = 0;

 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		INIT_LIST_HEAD(&man->lru[i]);
-	man->move = NULL;
+	spin_lock_init(&man->pipelined_eviction.lock);
+	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++)
+		man->pipelined_eviction.fences[i] = NULL;
+	/* Can be overridden by drivers that want to use more than 1 entity
+	 * for moves and evictions (limited to TTM_FENCES_MAX_SLOT_COUNT).
+	 */
+	man->pipelined_eviction.n_fences = 1;
 }
 EXPORT_SYMBOL(ttm_resource_manager_init);
@@ -551,7 +556,7 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 		.no_wait_gpu = false,
 	};
 	struct dma_fence *fence;
-	int ret;
+	int ret, i;

 	do {
 		ret = ttm_bo_evict_first(bdev, man, &ctx);
@@ -561,18 +566,32 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 	if (ret && ret != -ENOENT)
 		return ret;

-	spin_lock(&man->move_lock);
-	fence = dma_fence_get(man->move);
-	spin_unlock(&man->move_lock);
+	ret = 0;

-	if (fence) {
-		ret = dma_fence_wait(fence, false);
-		dma_fence_put(fence);
-		if (ret)
-			return ret;
-	}
+	do {
+		fence = NULL;

-	return 0;
+		spin_lock(&man->pipelined_eviction.lock);
+		for (i = 0; i < man->pipelined_eviction.n_fences; i++) {
+			fence = man->pipelined_eviction.fences[i];
+			man->pipelined_eviction.fences[i] = NULL;

Drop that. We should never set man->pipelined_eviction.fences to NULL.

Potentially even initialize all move fences with a stub fence.

+			if (fence)
+				break;
+		}
+		spin_unlock(&man->pipelined_eviction.lock);
+		if (fence) {
+			ret = dma_fence_wait(fence, false);
+			dma_fence_put(fence);
+			if (ret)
+				break;
+		} else {
+			break;
+		}
+	} while (1);
+
+	return ret;
 }
 EXPORT_SYMBOL(ttm_resource_manager_evict_all);
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index f49daa504c36..898c429b37ad 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -50,6 +50,15 @@ struct io_mapping;
 struct sg_table;
 struct scatterlist;

+/**
+ * define TTM_FENCES_MAX_SLOT_COUNT - How many entities can be used for evictions
+ *
+ * Pipelined evictions can be spread on multiple entities. This
+ * is the max number of entities that can be used by the driver
+ * for that purpose.
+ */
+#define TTM_FENCES_MAX_SLOT_COUNT 8

Make that TTM_NUM_MOVE_FENCES.

Ok.

 /**
  * enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
  */
@@ -180,8 +189,10 @@ struct ttm_resource_manager_func {
  * @size: Size of the managed region.
  * @bdev: ttm device this manager belongs to
  * @func: structure pointer implementing the range manager. See above
- * @move_lock: lock for move fence
- * @move: The fence of the last pipelined move operation.
+ * @pipelined_eviction.lock: lock for eviction fences
+ * @pipelined_eviction.n_fences: The number of fences allowed in the array. If
+ * 0, pipelined evictions aren't used.
+ * @pipelined_eviction.fences: The fences of the last pipelined move operation.
  * @lru: The lru list for this memory type.
  *
  * This structure is used to identify and manage memory types for a device.
@@ -195,12 +206,15 @@ struct ttm_resource_manager {
 	struct ttm_device *bdev;
 	uint64_t size;
 	const struct ttm_resource_manager_func *func;
-	spinlock_t move_lock;
-	/*
-	 * Protected by @move_lock.
+	/* This is very similar to a dma_resv object, but locking rules make
+	 * it difficult to use it in this context.
 	 */
-	struct dma_fence *move;
+	struct {
+		spinlock_t lock;
+		int n_fences;
+		struct dma_fence *fences[TTM_FENCES_MAX_SLOT_COUNT];
+	} pipelined_eviction;

Drop the separate structure, just make move an array instead.

IMO pipelined_eviction.fences and pipelined_eviction.lock are clearer when reading the code than moves and move_lock, but if you prefer I'll switch back to the old names.

And also drop n_fences. Just always take a look at all fences.

OK.

Thanks,
Pierre-Eric

Regards,
Christian.
 /*
  * Protected by the bdev->lru_lock.
@@ -421,8 +435,12 @@ static inline bool ttm_resource_manager_used(struct ttm_resource_manager *man)
 static inline void
 ttm_resource_manager_cleanup(struct ttm_resource_manager *man)
 {
-	dma_fence_put(man->move);
-	man->move = NULL;
+	int i;
+
+	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++) {
+		dma_fence_put(man->pipelined_eviction.fences[i]);
+		man->pipelined_eviction.fences[i] = NULL;
+	}
 }

 void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk);
On 11/5/25 11:34, Pierre-Eric Pelloux-Prayer wrote:
+		} else {
+			all_signaled = false;
+			if (no_wait_gpu) {
+				spin_unlock(&man->pipelined_eviction.lock);
+				return -EBUSY;
+			}
+			fences_to_add[n++] = dma_fence_get(fence);
+		}
+	}
+	spin_unlock(&man->pipelined_eviction.lock);

+	if (all_signaled)
 		return 0;

-	if (no_wait_gpu) {
-		ret = dma_fence_is_signaled(fence) ? 0 : -EBUSY;
-		dma_fence_put(fence);
-		return ret;
+	for (i = 0; i < n; i++) {
+		dma_resv_add_fence(bo->base.resv, fences_to_add[i], DMA_RESV_USAGE_KERNEL);
+		dma_fence_put(fences_to_add[i]);
 	}
-	dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
-	ret = dma_resv_reserve_fences(bo->base.resv, 1);
-	dma_fence_put(fence);
-	return ret;
+	return dma_resv_reserve_fences(bo->base.resv, TTM_FENCES_MAX_SLOT_COUNT);
Please separate out a patch where the call to dma_resv_reserve_fences() is removed here.
Can you remind me why it's not needed?
Some years ago the dma_resv object had a fixed field for an exclusive fence.
When we removed that we sprinkled calls like "dma_resv_reserve_fences(bo->base.resv, 1)" all over the place where we previously used this exclusive fence slot to prevent things from going boom!
It could be that some old drivers like radeon or qxl still rely on that somewhere, but that would then clearly be a driver bug.
What we could do is to either leave it alone or remove it, but changing it to reserving TTM_FENCES_MAX_SLOT_COUNT is clearly not correct.
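To make the slot accounting concrete, here is a toy model (hypothetical userspace types, not the kernel API): since adding a fence replaces an existing fence of the same context, the number of resv slots a caller actually needs is the number of distinct fence contexts it may add, which is usually far less than a fixed TTM_FENCES_MAX_SLOT_COUNT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not kernel code: count distinct fence contexts among the
 * fences to add; that is the number of slots worth reserving. */
#define MAX_FENCES 8

struct fence {
	uint64_t context;
};

static int resv_slots_needed(struct fence *fences[], int n)
{
	uint64_t seen[MAX_FENCES];
	int i, j, count = 0;

	for (i = 0; i < n; i++) {
		int dup = 0;

		if (!fences[i])
			continue;
		for (j = 0; j < count; j++)
			if (seen[j] == fences[i]->context)
				dup = 1;
		if (!dup && count < MAX_FENCES)
			seen[count++] = fences[i]->context;
	}
	return count;
}
```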
[...]
@@ -653,13 +655,35 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo,
-	spin_lock(&from->move_lock);
-	if (!from->move || dma_fence_is_later(fence, from->move)) {
-		dma_fence_put(from->move);
-		from->move = dma_fence_get(fence);
+	spin_lock(&from->pipelined_eviction.lock);
+	for (i = 0; i < from->pipelined_eviction.n_fences; i++) {
+		tmp = from->pipelined_eviction.fences[i];
+		if (!tmp) {
+			if (free_slot < 0)
+				free_slot = i;
+			continue;
Just break here.
The logic here is to reuse context slots. Even if slot 0 is empty, I need to use slot 1 if slot 1's context is the same as fence->context.
Good point, but slot 0 should never be empty. Since we fill the slots starting from 0 and incrementing, once you hit a NULL slot all the following ones are NULL as well; before that you only see slots whose context doesn't match.
This way, we're guaranteed to find a slot for all contexts used by the driver.
+		}
+		if (fence->context != tmp->context)
+			continue;
+		if (dma_fence_is_later(fence, tmp)) {
+			dma_fence_put(tmp);
+			free_slot = i;
+			break;
+		}
+		goto unlock;
+	}
+	if (free_slot >= 0) {
Drop free_slot and check i here.
+		from->pipelined_eviction.fences[free_slot] = dma_fence_get(fence);
+	} else {
+		WARN(1, "not enough fence slots for all fence contexts");
+		spin_unlock(&from->pipelined_eviction.lock);
+		dma_fence_wait(fence, false);
+		goto end;
 	}
-	spin_unlock(&from->move_lock);
+unlock:
+	spin_unlock(&from->pipelined_eviction.lock);
+end:
 	ttm_resource_free(bo, &bo->resource);
 }
[...]
+		spin_lock(&man->pipelined_eviction.lock);
+		for (i = 0; i < man->pipelined_eviction.n_fences; i++) {
+			fence = man->pipelined_eviction.fences[i];
+ man->pipelined_eviction.fences[i] = NULL;
Drop that. We should never set man->pipelined_eviction.fences to NULL.
Why?
To simplify the logic while filling the slots.
dma_fences are made to be kept around for a long time.
[...]
-	spinlock_t move_lock;
-	/*
-	 * Protected by @move_lock.
+	/* This is very similar to a dma_resv object, but locking rules make
+	 * it difficult to use it in this context.
 	 */
-	struct dma_fence *move;
+	struct {
+		spinlock_t lock;
+		int n_fences;
+		struct dma_fence *fences[TTM_FENCES_MAX_SLOT_COUNT];
+	} pipelined_eviction;
Drop the separate structure, just make move an array instead.
IMO pipelined_eviction.fences and pipelined_eviction.lock is clearer when reading the code than moves and move_lock but if you prefer I'll switch back to the old names.
The name "pipelined_eviction" is just a bit too long. Maybe just eviction_fences and eviction_fences_lock?
Regards, Christian.
And also drop n_fences. Just always take a look at all fences.
OK.
Thanks, Pierre-Eric
Regards, Christian.
[...]
 static inline void
 ttm_resource_manager_cleanup(struct ttm_resource_manager *man)
 {
-	dma_fence_put(man->move);
-	man->move = NULL;
+	int i;
+
+	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++) {
+		dma_fence_put(man->pipelined_eviction.fences[i]);
+		man->pipelined_eviction.fences[i] = NULL;
+	}
 }

 void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk);
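The context-slot reuse rule debated above can be modeled in plain C. This is a hypothetical userspace sketch (struct fence carries only context and seqno, standing in for struct dma_fence): reuse the slot whose context matches and keep only the later fence, otherwise fall back to the first free slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-manager eviction fence slots. */
#define TTM_FENCES_MAX_SLOT_COUNT 8

struct fence {
	uint64_t context;
	uint64_t seqno;
};

/* Returns the slot index used, or -1 if every slot holds another context. */
static int remember_eviction_fence(struct fence *slots[], struct fence *fence)
{
	int i, free_slot = -1;

	for (i = 0; i < TTM_FENCES_MAX_SLOT_COUNT; i++) {
		struct fence *tmp = slots[i];

		if (!tmp) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (tmp->context != fence->context)
			continue;
		/* Same context: keep only the later fence. */
		if (fence->seqno > tmp->seqno)
			slots[i] = fence;
		return i;
	}
	if (free_slot < 0)
		return -1;
	slots[free_slot] = fence;
	return free_slot;
}
```

With at most one slot per fence context, a driver using up to TTM_FENCES_MAX_SLOT_COUNT entities is guaranteed to find a slot for every fence it produces.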
It was always false.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 20 +++++++------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h      |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c     |  2 +-
 4 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
index 199693369c7c..02c2479a8840 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
@@ -39,7 +39,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size,
 	for (i = 0; i < n; i++) {
 		struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
 		r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence,
-				       false, false, 0);
+				       false, 0);
 		if (r)
 			goto exit_do_move;
 		r = dma_fence_wait(fence, false);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c66f00434991..fce22712396b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -354,7 +354,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
 		}

 		r = amdgpu_copy_buffer(ring, from, to, cur_size, resv,
-				       &next, false, true, copy_flags);
+				       &next, true, copy_flags);
 		if (r)
 			goto error;
@@ -2211,16 +2211,13 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
 }

 static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
-				  bool direct_submit,
 				  unsigned int num_dw,
 				  struct dma_resv *resv,
 				  bool vm_needs_flush,
 				  struct amdgpu_job **job,
 				  bool delayed, u64 k_job_id)
 {
-	enum amdgpu_ib_pool_type pool = direct_submit ?
-		AMDGPU_IB_POOL_DIRECT :
-		AMDGPU_IB_POOL_DELAYED;
+	enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED;
 	int r;
 	struct drm_sched_entity *entity = delayed ? &adev->mman.low_pr :
 		&adev->mman.high_pr;
@@ -2246,7 +2243,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev,
 int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 		       uint64_t dst_offset, uint32_t byte_count,
 		       struct dma_resv *resv,
-		       struct dma_fence **fence, bool direct_submit,
+		       struct dma_fence **fence,
 		       bool vm_needs_flush, uint32_t copy_flags)
 {
 	struct amdgpu_device *adev = ring->adev;
@@ -2256,7 +2253,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 	unsigned int i;
 	int r;

-	if (!direct_submit && !ring->sched.ready) {
+	if (!ring->sched.ready) {
 		dev_err(adev->dev,
 			"Trying to move memory with ring turned off.\n");
 		return -EINVAL;
@@ -2265,7 +2262,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 	max_bytes = adev->mman.buffer_funcs->copy_max_bytes;
 	num_loops = DIV_ROUND_UP(byte_count, max_bytes);
 	num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8);
-	r = amdgpu_ttm_prepare_job(adev, direct_submit, num_dw,
+	r = amdgpu_ttm_prepare_job(adev, num_dw,
 				   resv, vm_needs_flush, &job, false,
 				   AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER);
 	if (r)
@@ -2283,10 +2280,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
 	WARN_ON(job->ibs[0].length_dw > num_dw);
-	if (direct_submit)
-		r = amdgpu_job_submit_direct(job, ring, fence);
-	else
-		*fence = amdgpu_job_submit(job);
+	*fence = amdgpu_job_submit(job);
 	if (r)
 		goto error_free;
@@ -2315,7 +2309,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data,
 	max_bytes = adev->mman.buffer_funcs->fill_max_bytes;
 	num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes);
 	num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8);
-	r = amdgpu_ttm_prepare_job(adev, false, num_dw, resv, vm_needs_flush,
+	r = amdgpu_ttm_prepare_job(adev, num_dw, resv, vm_needs_flush,
 				   &job, delayed, k_job_id);
 	if (r)
 		return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 577ee04ce0bf..50e40380fe95 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -166,7 +166,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev,
 int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 		       uint64_t dst_offset, uint32_t byte_count,
 		       struct dma_resv *resv,
-		       struct dma_fence **fence, bool direct_submit,
+		       struct dma_fence **fence,
 		       bool vm_needs_flush, uint32_t copy_flags);
 int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
 			    struct dma_resv *resv,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 46c84fc60af1..378af0b2aaa9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -153,7 +153,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys,
 	}

 	r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE,
-			       NULL, &next, false, true, 0);
+			       NULL, &next, true, 0);
 	if (r) {
 		dev_err(adev->dev, "fail %d to copy memory\n", r);
 		goto out_unlock;
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
It was always false.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>
NULL, &next, false, true, 0);
if (r) { dev_err(adev->dev, "fail %d to copy memory\n", r); goto out_unlock;NULL, &next, true, 0);
This way the caller can select the entity it wants to use.
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 75 +++++++++++-------- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 16 ++-- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 +- 5 files changed, 60 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c index 02c2479a8840..b59040a8771f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c @@ -38,7 +38,8 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size, stime = ktime_get(); for (i = 0; i < n; i++) { struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; - r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence, + r = amdgpu_copy_buffer(ring, &adev->mman.default_entity.base, + saddr, daddr, size, NULL, &fence, false, 0); if (r) goto exit_do_move; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index e08f58de4b17..c06c132a753c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1321,8 +1321,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) if (r) goto out;
- r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true, - AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); + r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv, + &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); if (WARN_ON(r)) goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 94e909905c64..bd35bea4b573 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -164,6 +164,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
/** * amdgpu_ttm_map_buffer - Map memory into the GART windows + * @entity: entity to run the window setup job * @bo: buffer object to map * @mem: memory object to map * @mm_cur: range to map @@ -176,7 +177,8 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, * Setup one of the GART windows to access a specific piece of memory or return * the physical address for local memory. */ -static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, +static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity, + struct ttm_buffer_object *bo, struct ttm_resource *mem, struct amdgpu_res_cursor *mm_cur, unsigned int window, struct amdgpu_ring *ring, @@ -224,7 +226,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8); num_bytes = num_pages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
- r = amdgpu_job_alloc_with_ib(adev, &adev->mman.default_entity.base, + r = amdgpu_job_alloc_with_ib(adev, entity, AMDGPU_FENCE_OWNER_UNDEFINED, num_dw * 4 + num_bytes, AMDGPU_IB_POOL_DELAYED, &job, @@ -274,6 +276,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, /** * amdgpu_ttm_copy_mem_to_mem - Helper function for copy * @adev: amdgpu device + * @entity: entity to run the jobs * @src: buffer/address where to read from * @dst: buffer/address where to write to * @size: number of bytes to copy @@ -288,6 +291,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, */ __attribute__((nonnull)) static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, + struct drm_sched_entity *entity, const struct amdgpu_copy_mem *src, const struct amdgpu_copy_mem *dst, uint64_t size, bool tmz, @@ -320,12 +324,14 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, cur_size = min3(src_mm.size, dst_mm.size, 256ULL << 20);
/* Map src to window 0 and dst to window 1. */ - r = amdgpu_ttm_map_buffer(src->bo, src->mem, &src_mm, + r = amdgpu_ttm_map_buffer(entity, + src->bo, src->mem, &src_mm, 0, ring, tmz, &cur_size, &from); if (r) goto error;
- r = amdgpu_ttm_map_buffer(dst->bo, dst->mem, &dst_mm, + r = amdgpu_ttm_map_buffer(entity, + dst->bo, dst->mem, &dst_mm, 1, ring, tmz, &cur_size, &to); if (r) goto error; @@ -353,7 +359,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, write_compress_disable)); }
- r = amdgpu_copy_buffer(ring, from, to, cur_size, resv, + r = amdgpu_copy_buffer(ring, entity, from, to, cur_size, resv, &next, true, copy_flags); if (r) goto error; @@ -394,7 +400,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, src.offset = 0; dst.offset = 0;
- r = amdgpu_ttm_copy_mem_to_mem(adev, &src, &dst, + r = amdgpu_ttm_copy_mem_to_mem(adev, + &adev->mman.move_entity.base, + &src, &dst, new_mem->size, amdgpu_bo_encrypted(abo), bo->base.resv, &fence); @@ -406,8 +414,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) { struct dma_fence *wipe_fence = NULL;
- r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence, - false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); + r = amdgpu_fill_buffer(&adev->mman.move_entity, + abo, 0, NULL, &wipe_fence, + AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); if (r) { goto error; } else if (wipe_fence) { @@ -2223,16 +2232,15 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable) }
static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev, + struct drm_sched_entity *entity, unsigned int num_dw, struct dma_resv *resv, bool vm_needs_flush, struct amdgpu_job **job, - bool delayed, u64 k_job_id) + u64 k_job_id) { enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED; int r; - struct drm_sched_entity *entity = delayed ? &adev->mman.clear_entity.base : - &adev->mman.move_entity.base; r = amdgpu_job_alloc_with_ib(adev, entity, AMDGPU_FENCE_OWNER_UNDEFINED, num_dw * 4, pool, job, k_job_id); @@ -2252,7 +2260,9 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev, DMA_RESV_USAGE_BOOKKEEP); }
-int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, +int amdgpu_copy_buffer(struct amdgpu_ring *ring, + struct drm_sched_entity *entity, + uint64_t src_offset, uint64_t dst_offset, uint32_t byte_count, struct dma_resv *resv, struct dma_fence **fence, @@ -2274,8 +2284,8 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, max_bytes = adev->mman.buffer_funcs->copy_max_bytes; num_loops = DIV_ROUND_UP(byte_count, max_bytes); num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8); - r = amdgpu_ttm_prepare_job(adev, num_dw, - resv, vm_needs_flush, &job, false, + r = amdgpu_ttm_prepare_job(adev, entity, num_dw, + resv, vm_needs_flush, &job, AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER); if (r) return r; @@ -2304,11 +2314,13 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, return r; }
-static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data, +static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, + struct drm_sched_entity *entity, + uint32_t src_data, uint64_t dst_addr, uint32_t byte_count, struct dma_resv *resv, struct dma_fence **fence, - bool vm_needs_flush, bool delayed, + bool vm_needs_flush, u64 k_job_id) { struct amdgpu_device *adev = ring->adev; @@ -2321,8 +2333,8 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data, max_bytes = adev->mman.buffer_funcs->fill_max_bytes; num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes); num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8); - r = amdgpu_ttm_prepare_job(adev, num_dw, resv, vm_needs_flush, - &job, delayed, k_job_id); + r = amdgpu_ttm_prepare_job(adev, entity, num_dw, resv, + vm_needs_flush, &job, k_job_id); if (r) return r;
@@ -2386,13 +2398,14 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, /* Never clear more than 256MiB at once to avoid timeouts */ size = min(cursor.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &cursor, + r = amdgpu_ttm_map_buffer(&adev->mman.clear_entity.base, + &bo->tbo, bo->tbo.resource, &cursor, 1, ring, false, &size, &addr); if (r) goto err;
- r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv, - &next, true, true, + r = amdgpu_ttm_fill_mem(ring, &adev->mman.clear_entity.base, 0, addr, size, resv, + &next, true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); if (r) goto err; @@ -2408,12 +2421,12 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, return r; }
-int amdgpu_fill_buffer(struct amdgpu_bo *bo, - uint32_t src_data, - struct dma_resv *resv, - struct dma_fence **f, - bool delayed, - u64 k_job_id) +int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, + struct amdgpu_bo *bo, + uint32_t src_data, + struct dma_resv *resv, + struct dma_fence **f, + u64 k_job_id) { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; @@ -2437,13 +2450,15 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, /* Never fill more than 256MiB at once to avoid timeouts */ cur_size = min(dst.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &dst, + r = amdgpu_ttm_map_buffer(&entity->base, + &bo->tbo, bo->tbo.resource, &dst, 1, ring, false, &cur_size, &to); if (r) goto error;
- r = amdgpu_ttm_fill_mem(ring, src_data, to, cur_size, resv, - &next, true, delayed, k_job_id); + r = amdgpu_ttm_fill_mem(ring, &entity->base, + src_data, to, cur_size, resv, + &next, true, k_job_id); if (r) goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index f83313bc0afb..a3be0e06e1e7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -167,7 +167,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev); void amdgpu_ttm_fini(struct amdgpu_device *adev); void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable); -int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, +int amdgpu_copy_buffer(struct amdgpu_ring *ring, + struct drm_sched_entity *entity, + uint64_t src_offset, uint64_t dst_offset, uint32_t byte_count, struct dma_resv *resv, struct dma_fence **fence, @@ -175,12 +177,12 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, struct dma_resv *resv, struct dma_fence **fence); -int amdgpu_fill_buffer(struct amdgpu_bo *bo, - uint32_t src_data, - struct dma_resv *resv, - struct dma_fence **fence, - bool delayed, - u64 k_job_id); +int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, + struct amdgpu_bo *bo, + uint32_t src_data, + struct dma_resv *resv, + struct dma_fence **f, + u64 k_job_id);
int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo); void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 1d35a89999f7..dfcba2c4580d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -157,7 +157,8 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys, goto out_unlock; }
- r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE, + r = amdgpu_copy_buffer(ring, &entity->base, + gart_s, gart_d, size * PAGE_SIZE, NULL, &next, true, 0); if (r) { dev_err(adev->dev, "fail %d to copy memory\n", r);
If the fill job depends on a previous fence, the caller can now pass it in to guarantee the correct ordering of the jobs.
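To illustrate the ordering rule this introduces, here is a toy userspace model (the struct and function names are invented for the example, not the drm_sched API): a job carrying an optional dependency fence must not be picked for execution until that fence has signaled.

```c
#include <stddef.h>

/* Toy model of a job with an optional dependency fence, mirroring the
 * optional 'dep' parameter this patch threads into the map/fill path. */
struct toy_fence {
	int signaled;
};

struct toy_job {
	struct toy_fence *dep;	/* optional: NULL means no dependency */
};

/* A scheduler following this rule only runs the job once its
 * dependency (if any) has signaled, which gives the desired ordering. */
static int toy_job_runnable(const struct toy_job *job)
{
	return job->dep == NULL || job->dep->signaled;
}
```

With an unsignaled dependency the job stays pending; once the previous fence signals, it becomes runnable.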
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22 ++++++++++++++++------ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 + 3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index e7b2cae031b3..be3532134e46 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1322,7 +1322,7 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) goto out;
r = amdgpu_fill_buffer(&adev->mman.clear_entities[0], abo, 0, &bo->base._resv, - &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); + &fence, NULL, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); if (WARN_ON(r)) goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 1b3945513157..b94ac16c785b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -173,6 +173,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, * @tmz: if we should setup a TMZ enabled mapping * @size: in number of bytes to map, out number of bytes mapped * @addr: resulting address inside the MC address space + * @dep: optional dependency * * Setup one of the GART windows to access a specific piece of memory or return * the physical address for local memory. @@ -182,7 +183,8 @@ static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity, struct ttm_resource *mem, struct amdgpu_res_cursor *mm_cur, unsigned int window, struct amdgpu_ring *ring, - bool tmz, uint64_t *size, uint64_t *addr) + bool tmz, uint64_t *size, uint64_t *addr, + struct dma_fence *dep) { struct amdgpu_device *adev = ring->adev; unsigned int offset, num_pages, num_dw, num_bytes; @@ -234,6 +236,9 @@ static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity, if (r) return r;
+ if (dep) + drm_sched_job_add_dependency(&job->base, dma_fence_get(dep)); + src_addr = num_dw * 4; src_addr += job->ibs[0].gpu_addr;
@@ -326,13 +331,15 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, /* Map src to window 0 and dst to window 1. */ r = amdgpu_ttm_map_buffer(&entity->base, src->bo, src->mem, &src_mm, - entity->gart_window_id0, ring, tmz, &cur_size, &from); + entity->gart_window_id0, ring, tmz, &cur_size, &from, + NULL); if (r) goto error;
r = amdgpu_ttm_map_buffer(&entity->base, dst->bo, dst->mem, &dst_mm, - entity->gart_window_id1, ring, tmz, &cur_size, &to); + entity->gart_window_id1, ring, tmz, &cur_size, &to, + NULL); if (r) goto error;
@@ -415,7 +422,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, struct dma_fence *wipe_fence = NULL;
r = amdgpu_fill_buffer(&adev->mman.move_entities[0], - abo, 0, NULL, &wipe_fence, + abo, 0, NULL, &wipe_fence, fence, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); if (r) { goto error; @@ -2443,7 +2450,8 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
r = amdgpu_ttm_map_buffer(&entity->base, &bo->tbo, bo->tbo.resource, &cursor, - entity->gart_window_id1, ring, false, &size, &addr); + entity->gart_window_id1, ring, false, &size, &addr, + NULL); if (r) goto err;
@@ -2469,6 +2477,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, uint32_t src_data, struct dma_resv *resv, struct dma_fence **f, + struct dma_fence *dependency, u64 k_job_id) { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); @@ -2496,7 +2505,8 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, r = amdgpu_ttm_map_buffer(&entity->base, &bo->tbo, bo->tbo.resource, &dst, entity->gart_window_id1, ring, false, - &cur_size, &to); + &cur_size, &to, + dependency); if (r) goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index 2874f054e869..9dd2a76a5641 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -186,6 +186,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, uint32_t src_data, struct dma_resv *resv, struct dma_fence **f, + struct dma_fence *dependency, u64 k_job_id);
int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
We want to use multiple entities for clears to enable parallelism. To achieve this, let amdgpu_fill_buffer select the entity to use when possible: callers that have no preference pass a NULL entity.
We can also simplify the signature and remove the resv param: amdgpu_move_blit is the only caller that doesn't use it, and it's also the only caller that needs to specify an entity.
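The convention this creates can be sketched as a small compile-checked model (the types and helper are illustrative, not the driver's): a NULL entity means "pick one for me and attach the fences to the BO's resv", while a caller-provided entity gets the last fence returned, so an out-fence pointer only makes sense together with an explicit entity.

```c
#include <stddef.h>

struct toy_entity { int id; };
struct toy_fence { int seqno; };

/* Mirrors the WARN_ON_ONCE((entity && !f) || (!entity && f)) sanity
 * check the series adds: entity and out-fence must come as a pair. */
static int fill_args_valid(const struct toy_entity *entity,
			   struct toy_fence **f)
{
	return (entity != NULL) == (f != NULL);
}
```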
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +---- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 54 ++++++++++++++++++---- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 - 3 files changed, 47 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index be3532134e46..4a69324bb730 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1287,7 +1287,6 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev); - struct dma_fence *fence = NULL; struct amdgpu_bo *abo; int r;
@@ -1317,18 +1316,12 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) adev->in_suspend || drm_dev_is_unplugged(adev_to_drm(adev))) goto out;
- r = dma_resv_reserve_fences(&bo->base._resv, 1); - if (r) - goto out; - - r = amdgpu_fill_buffer(&adev->mman.clear_entities[0], abo, 0, &bo->base._resv, - &fence, NULL, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); + r = amdgpu_fill_buffer(NULL, abo, 0, NULL, + NULL, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); if (WARN_ON(r)) goto out;
amdgpu_vram_mgr_set_cleared(bo->resource); - dma_resv_add_fence(&bo->base._resv, fence, DMA_RESV_USAGE_KERNEL); - dma_fence_put(fence);
out: dma_resv_unlock(&bo->base._resv); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index b94ac16c785b..c357a6d9763a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -184,7 +184,8 @@ static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity, struct amdgpu_res_cursor *mm_cur, unsigned int window, struct amdgpu_ring *ring, bool tmz, uint64_t *size, uint64_t *addr, - struct dma_fence *dep) + struct dma_fence *dep, + struct dma_resv *resv) { struct amdgpu_device *adev = ring->adev; unsigned int offset, num_pages, num_dw, num_bytes; @@ -239,6 +240,10 @@ static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity, if (dep) drm_sched_job_add_dependency(&job->base, dma_fence_get(dep));
+ if (resv) + drm_sched_job_add_resv_dependencies(&job->base, resv, + DMA_RESV_USAGE_BOOKKEEP); + src_addr = num_dw * 4; src_addr += job->ibs[0].gpu_addr;
@@ -332,14 +337,14 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, r = amdgpu_ttm_map_buffer(&entity->base, src->bo, src->mem, &src_mm, entity->gart_window_id0, ring, tmz, &cur_size, &from, - NULL); + NULL, NULL); if (r) goto error;
r = amdgpu_ttm_map_buffer(&entity->base, dst->bo, dst->mem, &dst_mm, entity->gart_window_id1, ring, tmz, &cur_size, &to, - NULL); + NULL, NULL); if (r) goto error;
@@ -422,7 +427,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, struct dma_fence *wipe_fence = NULL;
r = amdgpu_fill_buffer(&adev->mman.move_entities[0], - abo, 0, NULL, &wipe_fence, fence, + abo, 0, &wipe_fence, fence, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); if (r) { goto error; @@ -2451,7 +2456,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, r = amdgpu_ttm_map_buffer(&entity->base, &bo->tbo, bo->tbo.resource, &cursor, entity->gart_window_id1, ring, false, &size, &addr, - NULL); + NULL, NULL); if (r) goto err;
@@ -2472,10 +2477,21 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, return r; }
+/** + * amdgpu_fill_buffer - fill a buffer with a given value + * @entity: optional entity to use. If NULL, the clearing entities will be + * used to load-balance the partial clears + * @bo: the bo to fill + * @src_data: the value to set + * @f: optional out fence. If @entity is NULL, this must be NULL and the + * fences from each partial clear will be added to the &dma_resv. + * @dependency: optional input dependency fence. + * @k_job_id: trace id + * + */ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, struct amdgpu_bo *bo, uint32_t src_data, - struct dma_resv *resv, struct dma_fence **f, struct dma_fence *dependency, u64 k_job_id) @@ -2483,15 +2499,29 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; struct dma_fence *fence = NULL; + struct dma_resv *resv = NULL; struct amdgpu_res_cursor dst; int r;
+ /* The fences will be either added to the resv object or the last fence + * will be returned to the caller. In the latter case, all fill jobs will + * be executed on the same ring. + */ + WARN_ON_ONCE((entity && !f) || (!entity && f)); if (!adev->mman.buffer_funcs_enabled) { dev_err(adev->dev, "Trying to clear memory with ring turned off.\n"); return -EINVAL; }
+ if (!entity) { + entity = &adev->mman.clear_entities[0]; + resv = &bo->tbo.base._resv; + r = dma_resv_reserve_fences(resv, 1); + if (r) + return r; + } + amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst);
mutex_lock(&entity->gart_window_lock); @@ -2506,7 +2536,8 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, &bo->tbo, bo->tbo.resource, &dst, entity->gart_window_id1, ring, false, &cur_size, &to, - dependency); + dependency, + resv); if (r) goto error;
@@ -2516,8 +2547,13 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, if (r) goto error;
- dma_fence_put(fence); - fence = next; + if (resv) { + dma_resv_add_fence(resv, next, DMA_RESV_USAGE_KERNEL); + dma_fence_put(next); + } else { + dma_fence_put(fence); + fence = next; + }
amdgpu_res_next(&dst, cur_size); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index 9dd2a76a5641..38df2b5b4bc7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -184,7 +184,6 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, struct amdgpu_bo *bo, uint32_t src_data, - struct dma_resv *resv, struct dma_fence **f, struct dma_fence *dependency, u64 k_job_id);
The benefit of using multiple entities is that multiple fill jobs can run in parallel. Otherwise, even if an entity has access to multiple engines, a burst of N independent jobs will all run on the same engine, because an entity guarantees that the execution order matches the submission order.
Callers can opt out of this behavior by passing the entity they want to use (see amdgpu_move_blit).
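A minimal userspace sketch of the round-robin pick this patch uses (the constant and names are illustrative): an atomic counter spreads successive independent clears across the entity pool, in the spirit of the atomic_inc_return() % num_clear_entities selection in the diff below.

```c
#include <stdatomic.h>

#define TOY_NUM_CLEAR_ENTITIES 4	/* stand-in for adev->mman.num_clear_entities */

static atomic_int toy_next_clear_entity;	/* zero-initialized, like atomic_set(..., 0) */

/* Each independent partial clear picks the next entity, so a burst of
 * jobs fans out over all schedulers instead of serializing on one. */
static int toy_pick_clear_entity(void)
{
	return atomic_fetch_add(&toy_next_clear_entity, 1) % TOY_NUM_CLEAR_ENTITIES;
}
```

(atomic_fetch_add returns the old value where the kernel's atomic_inc_return returns the new one; the resulting distribution over entities is the same.)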
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 84 ++++++++++++++++++------- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 + 2 files changed, 64 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index c357a6d9763a..839ea8c7f6be 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2224,6 +2224,7 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable) adev->mman.clear_entities = kcalloc(num_clear_entities, sizeof(struct amdgpu_ttm_entity), GFP_KERNEL); + atomic_set(&adev->mman.next_clear_entity, 0); if (!adev->mman.clear_entities) goto error_free_entity;
@@ -2498,10 +2499,12 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; + struct dma_fence *fences[TTM_FENCES_MAX_SLOT_COUNT] = {}; struct dma_fence *fence = NULL; struct dma_resv *resv = NULL; struct amdgpu_res_cursor dst; - int r; + uint64_t cur_size, to; + int r, e, n_fences;
/* The fences will be either added to the resv object or the last fence * will be returned to the caller. In the latter case, all fill jobs will @@ -2515,53 +2518,92 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, }
if (!entity) { - entity = &adev->mman.clear_entities[0]; resv = &bo->tbo.base._resv; - r = dma_resv_reserve_fences(resv, 1); + + /* Determine how much fences we're going to add to the + * resv object. + */ + n_fences = 0; + amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst); + while (dst.remaining) { + cur_size = min(dst.size, 256ULL << 20); + + n_fences += 1; + amdgpu_res_next(&dst, cur_size); + } + if (n_fences == 0) + return 0; + + /* One slot per entity at most. */ + n_fences = MIN(n_fences, adev->mman.num_clear_entities); + + r = dma_resv_reserve_fences(resv, n_fences); if (r) return r; + } else { + mutex_lock(&entity->gart_window_lock); }
amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst);
- mutex_lock(&entity->gart_window_lock); while (dst.remaining) { - struct dma_fence *next; - uint64_t cur_size, to; - /* Never fill more than 256MiB at once to avoid timeouts */ cur_size = min(dst.size, 256ULL << 20);
+ if (resv) { + /* Pick a new entity for each partial clear so they can + * execute in parallel. + */ + e = atomic_inc_return(&adev->mman.next_clear_entity) % + adev->mman.num_clear_entities; + entity = &adev->mman.clear_entities[e]; + mutex_lock(&entity->gart_window_lock); + } + r = amdgpu_ttm_map_buffer(&entity->base, &bo->tbo, bo->tbo.resource, &dst, entity->gart_window_id1, ring, false, &cur_size, &to, dependency, resv); - if (r) + if (r) { + mutex_unlock(&entity->gart_window_lock); goto error; + }
r = amdgpu_ttm_fill_mem(ring, &entity->base, src_data, to, cur_size, resv, - &next, true, k_job_id); - if (r) + &fence, true, k_job_id); + if (r) { + mutex_unlock(&entity->gart_window_lock); goto error; - - if (resv) { - dma_resv_add_fence(resv, next, DMA_RESV_USAGE_KERNEL); - dma_fence_put(next); - } else { - dma_fence_put(fence); - fence = next; }
amdgpu_res_next(&dst, cur_size); + + if (resv) { + /* Delay the addition of the fences to resv, otherwise the next partial + * clears will depend on this one. + */ + fences[e] = fence; + mutex_unlock(&entity->gart_window_lock); + } else { + dma_fence_put(*f); + *f = fence; + } } error: - mutex_unlock(&entity->gart_window_lock); - if (f) - *f = dma_fence_get(fence); - dma_fence_put(fence); + if (resv) { + for (e = 0; e < adev->mman.num_clear_entities; e++) { + if (fences[e]) { + dma_resv_add_fence(resv, fences[e], DMA_RESV_USAGE_KERNEL); + dma_fence_put(fences[e]); + } + } + } else { + mutex_unlock(&entity->gart_window_lock); + } + return r; }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index 38df2b5b4bc7..3fc31c7c6bfe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -73,6 +73,7 @@ struct amdgpu_mman {
struct amdgpu_ttm_entity default_entity; /* has no gart windows */ struct amdgpu_ttm_entity *clear_entities; + atomic_t next_clear_entity; u32 num_clear_entities; struct amdgpu_ttm_entity move_entities[TTM_FENCES_MAX_SLOT_COUNT]; u32 num_move_entities;
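To make the fence-slot estimate in this patch concrete, here is a hedged userspace restatement (the helper name is invented for the example): one fence per 256 MiB partial clear, clamped to at most one resv slot per clear entity.

```c
#include <stdint.h>

/* One fence per 256 MiB chunk (matching the "never clear more than
 * 256MiB at once" cap), at most one slot per clear entity. */
static int toy_count_fence_slots(uint64_t bo_size, int num_clear_entities)
{
	const uint64_t chunk = 256ULL << 20;
	uint64_t n = (bo_size + chunk - 1) / chunk;	/* ceil division */

	return n < (uint64_t)num_clear_entities ? (int)n : num_clear_entities;
}
```

For example, a 1 GiB BO splits into four 256 MiB chunks, so four fence slots are reserved when at least four clear entities exist.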
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
The benefit of using multiple entities is that multiple fill jobs can run in parallel. Otherwise, even if an entity has access to multiple engines, a burst of N independent jobs will all run on the same engine, because an entity guarantees that the execution order matches the submission order.
Callers can opt out of this behavior by passing the entity they want to use (see amdgpu_move_blit).
That still sounds like a really bad idea to me.
First of all, we can't reserve so many fence slots in the release handler; previously we basically just relied on the fact that the BO would most likely be mostly idle.
I think we should just use a single SDMA engine for each clear and distribute clearing different BOs over multiple engines.
Regards, Christian.
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 84 ++++++++++++++++++------- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 + 2 files changed, 64 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c357a6d9763a..839ea8c7f6be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2224,6 +2224,7 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable)
 	adev->mman.clear_entities = kcalloc(num_clear_entities,
 					    sizeof(struct amdgpu_ttm_entity),
 					    GFP_KERNEL);
+	atomic_set(&adev->mman.next_clear_entity, 0);
 	if (!adev->mman.clear_entities)
 		goto error_free_entity;
 
@@ -2498,10 +2499,12 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity,
 {
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+	struct dma_fence *fences[TTM_FENCES_MAX_SLOT_COUNT] = {};
 	struct dma_fence *fence = NULL;
 	struct dma_resv *resv = NULL;
 	struct amdgpu_res_cursor dst;
-	int r;
+	uint64_t cur_size, to;
+	int r, e, n_fences;
 
 	/* The fences will be either added to the resv object or the last fence
 	 * will be returned to the caller. In the latter case, all fill jobs will
@@ -2515,53 +2518,92 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity,
 	}
 
 	if (!entity) {
-		entity = &adev->mman.clear_entities[0];
 		resv = &bo->tbo.base._resv;
-		r = dma_resv_reserve_fences(resv, 1);
+		/* Determine how much fences we're going to add to the
+		 * resv object.
+		 */
+		n_fences = 0;
+		amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst);
+		while (dst.remaining) {
+			cur_size = min(dst.size, 256ULL << 20);
+			n_fences += 1;
+			amdgpu_res_next(&dst, cur_size);
+		}
+		if (n_fences == 0)
+			return 0;
+		/* One slot per entity at most. */
+		n_fences = MIN(n_fences, adev->mman.num_clear_entities);
+		r = dma_resv_reserve_fences(resv, n_fences);
 		if (r)
 			return r;
+	} else {
+		mutex_lock(&entity->gart_window_lock);
 	}
 
 	amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst);
-	mutex_lock(&entity->gart_window_lock);
 	while (dst.remaining) {
-		struct dma_fence *next;
-		uint64_t cur_size, to;
-
 		/* Never fill more than 256MiB at once to avoid timeouts */
 		cur_size = min(dst.size, 256ULL << 20);
 
+		if (resv) {
+			/* Pick a new entity for each partial clear so they can
+			 * execute in parallel.
+			 */
+			e = atomic_inc_return(&adev->mman.next_clear_entity) %
+				adev->mman.num_clear_entities;
+			entity = &adev->mman.clear_entities[e];
+			mutex_lock(&entity->gart_window_lock);
+		}
+
 		r = amdgpu_ttm_map_buffer(&entity->base,
 					  &bo->tbo, bo->tbo.resource, &dst,
 					  entity->gart_window_id1, ring, false,
 					  &cur_size, &to, dependency, resv);
-		if (r)
+		if (r) {
+			mutex_unlock(&entity->gart_window_lock);
 			goto error;
+		}
 
 		r = amdgpu_ttm_fill_mem(ring, &entity->base, src_data, to,
 					cur_size, resv,
-					&next, true, k_job_id);
-		if (r)
+					&fence, true, k_job_id);
+		if (r) {
+			mutex_unlock(&entity->gart_window_lock);
 			goto error;
+		}
 
-		if (resv) {
-			dma_resv_add_fence(resv, next, DMA_RESV_USAGE_KERNEL);
-			dma_fence_put(next);
-		} else {
-			dma_fence_put(fence);
-			fence = next;
-		}
 		amdgpu_res_next(&dst, cur_size);
+
+		if (resv) {
+			/* Delay the addition of the fences to resv, otherwise the next partial
+			 * clears will depend on this one.
+			 */
+			fences[e] = fence;
+			mutex_unlock(&entity->gart_window_lock);
+		} else {
+			dma_fence_put(*f);
+			*f = fence;
+		}
 	}
 error:
-	mutex_unlock(&entity->gart_window_lock);
-	if (f)
-		*f = dma_fence_get(fence);
-	dma_fence_put(fence);
+	if (resv) {
+		for (e = 0; e < adev->mman.num_clear_entities; e++) {
+			if (fences[e]) {
+				dma_resv_add_fence(resv, fences[e], DMA_RESV_USAGE_KERNEL);
+				dma_fence_put(fences[e]);
+			}
+		}
+	} else {
+		mutex_unlock(&entity->gart_window_lock);
+	}
 
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 38df2b5b4bc7..3fc31c7c6bfe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -73,6 +73,7 @@ struct amdgpu_mman {
 	struct amdgpu_ttm_entity default_entity; /* has no gart windows */
 	struct amdgpu_ttm_entity *clear_entities;
+	atomic_t next_clear_entity;
 	u32 num_clear_entities;
 	struct amdgpu_ttm_entity move_entities[TTM_FENCES_MAX_SLOT_COUNT];
 	u32 num_move_entities;
On 04/11/2025 at 17:40, Christian König wrote:
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
The benefit of using multiple entities is that multiple fill jobs can run in parallel. Otherwise, even if the entity has access to multiple engines, a burst of N independent jobs will all run on the same engine, because an entity guarantees that the execution order matches the submission order.
Callers can opt out of this behavior by passing the entity they want to use (see amdgpu_move_blit).
That still sounds like a really bad idea to me.
First of all, we can't reserve so many fence slots in the release handler; previously we basically just relied on the fact that the BO will most likely be mostly idle.
I think we should just use a single SDMA engine for each clear and distribute clearing different BOs over multiple engines.
So N clear entities, each one having access to a single engine. And all jobs to clear a single BO go to the same entity?
Is that what you mean?
Pierre-Eric
On 11/5/25 11:39, Pierre-Eric Pelloux-Prayer wrote:
On 04/11/2025 at 17:40, Christian König wrote:
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
The benefit of using multiple entities is that multiple fill jobs can run in parallel. Otherwise, even if the entity has access to multiple engines, a burst of N independent jobs will all run on the same engine, because an entity guarantees that the execution order matches the submission order.
Callers can opt out of this behavior by passing the entity they want to use (see amdgpu_move_blit).
That still sounds like a really bad idea to me.
First of all, we can't reserve so many fence slots in the release handler; previously we basically just relied on the fact that the BO will most likely be mostly idle.
I think we should just use a single SDMA engine for each clear and distribute clearing different BOs over multiple engines.
So N clear entities, each one having access to a single engine. And all jobs to clear a single BO go to the same entity?
Is that what you mean?
More or less.
N clear entities, each one has access to all engines. When a BO needs to be cleared it picks the next best entity and submits the jobs.
This way clear entities still load balance with moves and page table updates but we can keep the clearing logic simple.
Christian.
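The fence-slot tradeoff between the two schemes can be made concrete with a hedged standalone sketch (assumed helper names; only the 256 MiB chunk size and the one-slot-per-entity cap come from the patch): per-chunk entity switching needs up to one resv slot per entity, while one entity per BO serializes the chunks so the last fence alone covers them.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE (256ULL << 20) /* never fill more than 256MiB at once */

/* Number of fill chunks needed for a BO of the given size. */
static unsigned int n_chunks(uint64_t bo_size)
{
	return (unsigned int)((bo_size + CHUNK_SIZE - 1) / CHUNK_SIZE);
}

/* Patch's scheme: chunks round-robin over entities, so up to one fence
 * slot per entity must be reserved in the resv object. */
static unsigned int slots_multi_entity(uint64_t bo_size, unsigned int n_entities)
{
	unsigned int c = n_chunks(bo_size);

	return c < n_entities ? c : n_entities;
}

/* One-entity-per-BO scheme: the entity serializes the chunks, so the
 * last fence implicitly covers all earlier ones and one slot suffices. */
static unsigned int slots_single_entity(uint64_t bo_size)
{
	return bo_size ? 1 : 0;
}
```

For example, a 1 GiB BO split into four 256 MiB chunks over four entities needs four slots under the patch's scheme but only one under the single-entity-per-BO scheme, which is the release-handler concern raised earlier in the thread.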
On 05/11/2025 at 14:03, Christian König wrote:
On 11/5/25 11:39, Pierre-Eric Pelloux-Prayer wrote:
On 04/11/2025 at 17:40, Christian König wrote:
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
The benefit of using multiple entities is that multiple fill jobs can run in parallel. Otherwise, even if the entity has access to multiple engines, a burst of N independent jobs will all run on the same engine, because an entity guarantees that the execution order matches the submission order.
Callers can opt out of this behavior by passing the entity they want to use (see amdgpu_move_blit).
That still sounds like a really bad idea to me.
First of all, we can't reserve so many fence slots in the release handler; previously we basically just relied on the fact that the BO will most likely be mostly idle.
I think we should just use a single SDMA engine for each clear and distribute clearing different BOs over multiple engines.
So N clear entities, each one having access to a single engine. And all jobs to clear a single BO go to the same entity?
Is that what you mean?
More or less.
N clear entities, each one has access to all engines. When a BO needs to be cleared it picks the next best entity and submits the jobs.
This way clear entities still load balance with moves and page table updates but we can keep the clearing logic simple.
OK, I'll drop this change from the patchset as a tradeoff: we lose a bit of performance (because if a BO is built of N regions, the N clears will execute sequentially) but it keeps the code simple.
Pierre-Eric
It's doing the same thing as amdgpu_fill_buffer(src_data=0), so drop it.
The only caveat is that the amdgpu_res_cleared() return value is only valid right after allocation.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 9 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 86 ++++------------------ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 - 3 files changed, 18 insertions(+), 80 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 4a69324bb730..410e9b68ff81 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -723,15 +723,10 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED && bo->tbo.resource->mem_type == TTM_PL_VRAM) { - struct dma_fence *fence; - - r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, &fence); + r = amdgpu_fill_buffer(NULL, bo, 0, NULL, + NULL, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); if (unlikely(r)) goto fail_unreserve; - - dma_resv_add_fence(bo->tbo.base.resv, fence, - DMA_RESV_USAGE_KERNEL); - dma_fence_put(fence); } if (!bp->resv) amdgpu_bo_unreserve(bo); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index d88bdb2ac083..1f553c56f31d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2412,75 +2412,6 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, return 0; }
-/** - * amdgpu_ttm_clear_buffer - clear memory buffers - * @bo: amdgpu buffer object - * @resv: reservation object - * @fence: dma_fence associated with the operation - * - * Clear the memory buffer resource. - * - * Returns: - * 0 for success or a negative error code on failure. - */ -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, - struct dma_resv *resv, - struct dma_fence **fence) -{ - struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); - struct amdgpu_ring *ring = adev->mman.buffer_funcs_rings[0]; - struct amdgpu_ttm_entity *entity; - struct amdgpu_res_cursor cursor; - u64 addr; - int r = 0; - - if (!adev->mman.buffer_funcs_enabled) - return -EINVAL; - - if (!fence) - return -EINVAL; - entity = &adev->mman.clear_entities[0]; - *fence = dma_fence_get_stub(); - - amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor); - - mutex_lock(&entity->gart_window_lock); - while (cursor.remaining) { - struct dma_fence *next = NULL; - u64 size; - - if (amdgpu_res_cleared(&cursor)) { - amdgpu_res_next(&cursor, cursor.size); - continue; - } - - /* Never clear more than 256MiB at once to avoid timeouts */ - size = min(cursor.size, 256ULL << 20); - - r = amdgpu_ttm_map_buffer(&entity->base, - &bo->tbo, bo->tbo.resource, &cursor, - entity->gart_window_id1, ring, false, &size, &addr, - NULL, NULL); - if (r) - goto err; - - r = amdgpu_ttm_fill_mem(ring, &entity->base, 0, addr, size, resv, - &next, true, - AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); - if (r) - goto err; - - dma_fence_put(*fence); - *fence = next; - - amdgpu_res_next(&cursor, size); - } -err: - mutex_unlock(&entity->gart_window_lock); - - return r; -} - /** * amdgpu_fill_buffer - fill a buffer with a given value * @entity: optional entity to use. If NULL, the clearing entities will be @@ -2508,6 +2439,9 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, struct amdgpu_res_cursor dst; uint64_t cur_size, to; int r, e, n_fences; + /* The clear flag is only valid directly after allocation. 
*/ + bool consider_clear_flag = + src_data == 0 && k_job_id == AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER;
/* The fences will be either added to the resv object or the last fence * will be returned to the caller. In the latter case, all fill jobs will @@ -2531,6 +2465,11 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, while (dst.remaining) { cur_size = min(dst.size, 256ULL << 20);
+ if (consider_clear_flag && amdgpu_res_cleared(&dst)) { + amdgpu_res_next(&dst, dst.size); + continue; + } + n_fences += 1; amdgpu_res_next(&dst, cur_size); } @@ -2550,6 +2489,11 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst);
while (dst.remaining) { + if (consider_clear_flag && amdgpu_res_cleared(&dst)) { + amdgpu_res_next(&dst, dst.size); + continue; + } + /* Never fill more than 256MiB at once to avoid timeouts */ cur_size = min(dst.size, 256ULL << 20);
@@ -2574,8 +2518,10 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, goto error; }
+ r = amdgpu_ttm_fill_mem(ring, &entity->base, - src_data, to, cur_size, resv, + src_data, to, cur_size, + resv, &fence, true, k_job_id); if (r) { mutex_unlock(&entity->gart_window_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index c059a3d52b57..97e73919cb0c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -182,9 +182,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, struct dma_resv *resv, struct dma_fence **fence, bool vm_needs_flush, uint32_t copy_flags); -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, - struct dma_resv *resv, - struct dma_fence **fence); int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, struct amdgpu_bo *bo, uint32_t src_data,
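The skip-already-cleared behavior this patch folds into the fill path can be sketched in isolation (hypothetical standalone code; `CHUNK` and the region struct are assumptions, and the skip test stands in for the `consider_clear_flag && amdgpu_res_cleared()` check above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK (256ULL << 20) /* never fill more than 256MiB at once */

/* Stand-in for one amdgpu_res_cursor step: a VRAM region with the
 * "already cleared on allocation" flag. */
struct region {
	uint64_t size;
	bool cleared; /* analogue of amdgpu_res_cleared(), valid right after allocation */
};

/* Count the fill jobs a clear of the whole BO would submit, skipping
 * regions the allocator already cleared. */
static unsigned int count_fill_jobs(const struct region *regions, int n)
{
	unsigned int jobs = 0;

	for (int i = 0; i < n; i++) {
		uint64_t left;

		if (regions[i].cleared)
			continue; /* nothing to do, like the consider_clear_flag path */

		/* Split the remainder into at-most-256MiB fill jobs. */
		for (left = regions[i].size; left; left -= left < CHUNK ? left : CHUNK)
			jobs++;
	}
	return jobs;
}
```

The caveat from the commit message carries over directly: this count is only meaningful while the cleared flags are still trustworthy, i.e. right after allocation.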
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
It's doing the same thing as amdgpu_fill_buffer(src_data=0), so drop it.
The only caveat is that the amdgpu_res_cleared() return value is only valid right after allocation.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 9 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 86 ++++------------------ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 - 3 files changed, 18 insertions(+), 80 deletions(-)
@@ -2508,6 +2439,9 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity,
 	struct amdgpu_res_cursor dst;
 	uint64_t cur_size, to;
 	int r, e, n_fences;
+	/* The clear flag is only valid directly after allocation. */
+	bool consider_clear_flag =
+		src_data == 0 && k_job_id == AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER;
Absolutely clear NAK to that.
Christian.
On 05/11/2025 at 09:46, Christian König wrote:
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
It's doing the same thing as amdgpu_fill_buffer(src_data=0), so drop it.
The only caveat is that the amdgpu_res_cleared() return value is only valid right after allocation.
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 9 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 86 ++++------------------ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 - 3 files changed, 18 insertions(+), 80 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 4a69324bb730..410e9b68ff81 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -723,15 +723,10 @@ int amdgpu_bo_create(struct amdgpu_device *adev, if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED && bo->tbo.resource->mem_type == TTM_PL_VRAM) {
struct dma_fence *fence;r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, &fence);
r = amdgpu_fill_buffer(NULL, bo, 0, NULL, if (unlikely(r)) goto fail_unreserve;NULL, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
dma_resv_add_fence(bo->tbo.base.resv, fence,DMA_RESV_USAGE_KERNEL); } if (!bp->resv) amdgpu_bo_unreserve(bo);dma_fence_put(fence);diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index d88bdb2ac083..1f553c56f31d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2412,75 +2412,6 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, return 0; } -/**
- amdgpu_ttm_clear_buffer - clear memory buffers
- @bo: amdgpu buffer object
- @resv: reservation object
- @fence: dma_fence associated with the operation
- Clear the memory buffer resource.
- Returns:
- 0 for success or a negative error code on failure.
- */
-int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
struct dma_resv *resv,struct dma_fence **fence)-{
- struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
- struct amdgpu_ring *ring = adev->mman.buffer_funcs_rings[0];
- struct amdgpu_ttm_entity *entity;
- struct amdgpu_res_cursor cursor;
- u64 addr;
- int r = 0;
- if (!adev->mman.buffer_funcs_enabled) - return -EINVAL;
- if (!fence) - return -EINVAL;
- entity = &adev->mman.clear_entities[0];
- *fence = dma_fence_get_stub();
- amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
- mutex_lock(&entity->gart_window_lock);
- while (cursor.remaining) {
- struct dma_fence *next = NULL; - u64 size;
- if (amdgpu_res_cleared(&cursor)) { - amdgpu_res_next(&cursor, cursor.size); - continue; - }
- /* Never clear more than 256MiB at once to avoid timeouts */ - size = min(cursor.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(&entity->base, - &bo->tbo, bo->tbo.resource, &cursor, - entity->gart_window_id1, ring, false, &size, &addr, - NULL, NULL); - if (r) - goto err;
- r = amdgpu_ttm_fill_mem(ring, &entity->base, 0, addr, size, resv, - &next, true, - AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); - if (r) - goto err;
- dma_fence_put(*fence); - *fence = next;
- amdgpu_res_next(&cursor, size); - }
-err:
- mutex_unlock(&entity->gart_window_lock);
- return r;
-}
/** * amdgpu_fill_buffer - fill a buffer with a given value * @entity: optional entity to use. If NULL, the clearing entities will be @@ -2508,6 +2439,9 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, struct amdgpu_res_cursor dst; uint64_t cur_size, to; int r, e, n_fences;
+ /* The clear flag is only valid directly after allocation. */ + bool consider_clear_flag = + src_data == 0 && k_job_id == AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER;
Absolutely clear NAK to that.
I suppose the NAK applies to the check, not the general idea of the patch? In that case, would passing "bool consider_clear_flag" as a parameter be ok?
Pierre-Eric
Christian.
/* The fences will be either added to the resv object or the last fence * will be returned to the caller. In the latter case, all fill jobs will @@ -2531,6 +2465,11 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, while (dst.remaining) { cur_size = min(dst.size, 256ULL << 20); + if (consider_clear_flag && amdgpu_res_cleared(&dst)) { + amdgpu_res_next(&dst, dst.size); + continue; + }
n_fences += 1; amdgpu_res_next(&dst, cur_size); } @@ -2550,6 +2489,11 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst); while (dst.remaining) { + if (consider_clear_flag && amdgpu_res_cleared(&dst)) { + amdgpu_res_next(&dst, dst.size); + continue; + }
/* Never fill more than 256MiB at once to avoid timeouts */ cur_size = min(dst.size, 256ULL << 20);
@@ -2574,8 +2518,10 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, goto error; }
+ r = amdgpu_ttm_fill_mem(ring, &entity->base, - src_data, to, cur_size, resv, + src_data, to, cur_size, + resv, &fence, true, k_job_id); if (r) { mutex_unlock(&entity->gart_window_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index c059a3d52b57..97e73919cb0c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -182,9 +182,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, struct dma_resv *resv, struct dma_fence **fence, bool vm_needs_flush, uint32_t copy_flags); -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, - struct dma_resv *resv, - struct dma_fence **fence); int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, struct amdgpu_bo *bo, uint32_t src_data,
On 11/5/25 10:30, Pierre-Eric Pelloux-Prayer wrote:
On 05/11/2025 at 09:46, Christian König wrote:
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
It's doing the same thing as amdgpu_fill_buffer(src_data=0), so drop it.
The only caveat is that amdgpu_res_cleared() return value is only valid right after allocation.
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 9 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 86 ++++------------------ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 3 - 3 files changed, 18 insertions(+), 80 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 4a69324bb730..410e9b68ff81 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -723,15 +723,10 @@ int amdgpu_bo_create(struct amdgpu_device *adev, if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED && bo->tbo.resource->mem_type == TTM_PL_VRAM) { - struct dma_fence *fence;
- r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, &fence); + r = amdgpu_fill_buffer(NULL, bo, 0, NULL, + NULL, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); if (unlikely(r)) goto fail_unreserve;
- dma_resv_add_fence(bo->tbo.base.resv, fence, - DMA_RESV_USAGE_KERNEL); - dma_fence_put(fence); } if (!bp->resv) amdgpu_bo_unreserve(bo); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index d88bdb2ac083..1f553c56f31d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2412,75 +2412,6 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, return 0; } -/**
- amdgpu_ttm_clear_buffer - clear memory buffers
- @bo: amdgpu buffer object
- @resv: reservation object
- @fence: dma_fence associated with the operation
- Clear the memory buffer resource.
- Returns:
- 0 for success or a negative error code on failure.
- */
-int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, - struct dma_resv *resv, - struct dma_fence **fence) -{ - struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); - struct amdgpu_ring *ring = adev->mman.buffer_funcs_rings[0]; - struct amdgpu_ttm_entity *entity; - struct amdgpu_res_cursor cursor; - u64 addr; - int r = 0;
- if (!adev->mman.buffer_funcs_enabled) - return -EINVAL;
- if (!fence) - return -EINVAL; - entity = &adev->mman.clear_entities[0]; - *fence = dma_fence_get_stub();
- amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
- mutex_lock(&entity->gart_window_lock); - while (cursor.remaining) { - struct dma_fence *next = NULL; - u64 size;
- if (amdgpu_res_cleared(&cursor)) { - amdgpu_res_next(&cursor, cursor.size); - continue; - }
- /* Never clear more than 256MiB at once to avoid timeouts */ - size = min(cursor.size, 256ULL << 20);
- r = amdgpu_ttm_map_buffer(&entity->base, - &bo->tbo, bo->tbo.resource, &cursor, - entity->gart_window_id1, ring, false, &size, &addr, - NULL, NULL); - if (r) - goto err;
- r = amdgpu_ttm_fill_mem(ring, &entity->base, 0, addr, size, resv, - &next, true, - AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); - if (r) - goto err;
- dma_fence_put(*fence); - *fence = next;
- amdgpu_res_next(&cursor, size); - } -err: - mutex_unlock(&entity->gart_window_lock);
- return r; -}
/** * amdgpu_fill_buffer - fill a buffer with a given value * @entity: optional entity to use. If NULL, the clearing entities will be @@ -2508,6 +2439,9 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, struct amdgpu_res_cursor dst; uint64_t cur_size, to; int r, e, n_fences;
+ /* The clear flag is only valid directly after allocation. */ + bool consider_clear_flag = + src_data == 0 && k_job_id == AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER;
Absolutely clear NAK to that.
I suppose the NAK applies to the check, not the general idea of the patch?
Correct.
In that case, would passing "bool consider_clear_flag" as a parameter be ok ?
And then determining the k_job_id based on this new flag? Yeah, that sounds much cleaner.
Christian.
Pierre-Eric
Christian.
/* The fences will be either added to the resv object or the last fence * will be returned to the caller. In the latter case, all fill jobs will @@ -2531,6 +2465,11 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, while (dst.remaining) { cur_size = min(dst.size, 256ULL << 20); + if (consider_clear_flag && amdgpu_res_cleared(&dst)) { + amdgpu_res_next(&dst, dst.size); + continue; + }
n_fences += 1; amdgpu_res_next(&dst, cur_size); } @@ -2550,6 +2489,11 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &dst); while (dst.remaining) { + if (consider_clear_flag && amdgpu_res_cleared(&dst)) { + amdgpu_res_next(&dst, dst.size); + continue; + }
/* Never fill more than 256MiB at once to avoid timeouts */ cur_size = min(dst.size, 256ULL << 20); @@ -2574,8 +2518,10 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, goto error; } + r = amdgpu_ttm_fill_mem(ring, &entity->base, - src_data, to, cur_size, resv, + src_data, to, cur_size, + resv, &fence, true, k_job_id); if (r) { mutex_unlock(&entity->gart_window_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index c059a3d52b57..97e73919cb0c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -182,9 +182,6 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, struct dma_resv *resv, struct dma_fence **fence, bool vm_needs_flush, uint32_t copy_flags); -int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, - struct dma_resv *resv, - struct dma_fence **fence); int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, struct amdgpu_bo *bo, uint32_t src_data,
This is the only use case for this function.
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 +++---- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 25 ++++++++++------------ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 11 +++++----- 3 files changed, 20 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 410e9b68ff81..9dc262cac39f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -723,8 +723,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED && bo->tbo.resource->mem_type == TTM_PL_VRAM) { - r = amdgpu_fill_buffer(NULL, bo, 0, NULL, - NULL, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); + r = amdgpu_clear_buffer(NULL, bo, NULL, + NULL, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); if (unlikely(r)) goto fail_unreserve; } @@ -1311,8 +1311,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) adev->in_suspend || drm_dev_is_unplugged(adev_to_drm(adev))) goto out;
- r = amdgpu_fill_buffer(NULL, abo, 0, NULL, - NULL, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); + r = amdgpu_clear_buffer(NULL, abo, NULL, + NULL, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); if (WARN_ON(r)) goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 1f553c56f31d..ac2857314d68 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -431,9 +431,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) { struct dma_fence *wipe_fence = NULL;
- r = amdgpu_fill_buffer(entity, - abo, 0, &wipe_fence, fence, - AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); + r = amdgpu_clear_buffer(entity, + abo, &wipe_fence, fence, + AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); if (r) { goto error; } else if (wipe_fence) { @@ -2413,23 +2413,21 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, }
/** - * amdgpu_fill_buffer - fill a buffer with a given value + * amdgpu_clear_buffer - fill a buffer with 0 * @entity: optional entity to use. If NULL, the clearing entities will be * used to load-balance the partial clears * @bo: the bo to fill - * @src_data: the value to set * @f: optional out fence. If @entity is NULL, this must be NULL and the * fences from each partial clear will be added to the &dma_resv. * @dependency: optional input dependency fence. * @k_job_id: trace id * */ -int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, - struct amdgpu_bo *bo, - uint32_t src_data, - struct dma_fence **f, - struct dma_fence *dependency, - u64 k_job_id) +int amdgpu_clear_buffer(struct amdgpu_ttm_entity *entity, + struct amdgpu_bo *bo, + struct dma_fence **f, + struct dma_fence *dependency, + u64 k_job_id) { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); struct amdgpu_ring *ring = adev->mman.buffer_funcs_rings[0]; @@ -2440,8 +2438,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, uint64_t cur_size, to; int r, e, n_fences; /* The clear flag is only valid directly after allocation. */ - bool consider_clear_flag = - src_data == 0 && k_job_id == AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER; + bool consider_clear_flag = k_job_id == AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER;
/* The fences will be either added to the resv object or the last fence * will be returned to the caller. In the latter case, all fill jobs will @@ -2520,7 +2517,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity,
r = amdgpu_ttm_fill_mem(ring, &entity->base, - src_data, to, cur_size, + 0, to, cur_size, resv, &fence, true, k_job_id); if (r) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index 97e73919cb0c..b685bf207e43 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -182,12 +182,11 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, struct dma_resv *resv, struct dma_fence **fence, bool vm_needs_flush, uint32_t copy_flags); -int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, - struct amdgpu_bo *bo, - uint32_t src_data, - struct dma_fence **f, - struct dma_fence *dependency, - u64 k_job_id); +int amdgpu_clear_buffer(struct amdgpu_ttm_entity *entity, + struct amdgpu_bo *bo, + struct dma_fence **f, + struct dma_fence *dependency, + u64 k_job_id);
int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo); void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
On 11/4/25 09:35, Pierre-Eric Pelloux-Prayer wrote:
This is the only use case for this function.
amdgpu_ttm_clear_buffer please, but apart from that looks good to me.
Regards, Christian.
Signed-off-by: Pierre-Eric Pelloux-Prayer pierre-eric.pelloux-prayer@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 +++---- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 25 ++++++++++------------ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 11 +++++----- 3 files changed, 20 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 410e9b68ff81..9dc262cac39f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -723,8 +723,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev, if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED && bo->tbo.resource->mem_type == TTM_PL_VRAM) {
- r = amdgpu_fill_buffer(NULL, bo, 0, NULL, - NULL, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); + r = amdgpu_clear_buffer(NULL, bo, NULL, + NULL, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); if (unlikely(r)) goto fail_unreserve; } @@ -1311,8 +1311,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) adev->in_suspend || drm_dev_is_unplugged(adev_to_drm(adev))) goto out;
- r = amdgpu_fill_buffer(NULL, abo, 0, NULL, - NULL, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); + r = amdgpu_clear_buffer(NULL, abo, NULL, + NULL, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); if (WARN_ON(r)) goto out;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 1f553c56f31d..ac2857314d68 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -431,9 +431,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) { struct dma_fence *wipe_fence = NULL;
- r = amdgpu_fill_buffer(entity, - abo, 0, &wipe_fence, fence, - AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); + r = amdgpu_clear_buffer(entity, + abo, &wipe_fence, fence, + AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); if (r) { goto error; } else if (wipe_fence) { @@ -2413,23 +2413,21 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, } /**
- * amdgpu_fill_buffer - fill a buffer with a given value
+ * amdgpu_clear_buffer - fill a buffer with 0
* @entity: optional entity to use. If NULL, the clearing entities will be * used to load-balance the partial clears
* @bo: the bo to fill
- * @src_data: the value to set
* @f: optional out fence. If @entity is NULL, this must be NULL and the * fences from each partial clear will be added to the &dma_resv.
* @dependency: optional input dependency fence.
* @k_job_id: trace id
* */
-int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, - struct amdgpu_bo *bo, - uint32_t src_data, - struct dma_fence **f, - struct dma_fence *dependency, - u64 k_job_id)
+int amdgpu_clear_buffer(struct amdgpu_ttm_entity *entity, + struct amdgpu_bo *bo, + struct dma_fence **f, + struct dma_fence *dependency, + u64 k_job_id)
{ struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); struct amdgpu_ring *ring = adev->mman.buffer_funcs_rings[0]; @@ -2440,8 +2438,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, uint64_t cur_size, to; int r, e, n_fences; /* The clear flag is only valid directly after allocation. */
- bool consider_clear_flag = - src_data == 0 && k_job_id == AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER;
+ bool consider_clear_flag = k_job_id == AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER;
/* The fences will be either added to the resv object or the last fence * will be returned to the caller. In the latter case, all fill jobs will @@ -2520,7 +2517,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, r = amdgpu_ttm_fill_mem(ring, &entity->base,
- src_data, to, cur_size,
+ 0, to, cur_size,
resv, &fence, true, k_job_id); if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index 97e73919cb0c..b685bf207e43 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h @@ -182,12 +182,11 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, struct dma_resv *resv, struct dma_fence **fence, bool vm_needs_flush, uint32_t copy_flags);
-int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, - struct amdgpu_bo *bo, - uint32_t src_data, - struct dma_fence **f, - struct dma_fence *dependency, - u64 k_job_id);
+int amdgpu_clear_buffer(struct amdgpu_ttm_entity *entity, + struct amdgpu_bo *bo, + struct dma_fence **f, + struct dma_fence *dependency, + u64 k_job_id);
int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo); void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);