From: Amber Lin <Amber.Lin@amd.com>
[ Upstream commit f3820e9d356132e18405cd7606e22dc87ccfa6d1 ]
When KFD asks CP to preempt queues, CP not only preempts the CP queues but also requests SDMA to preempt the SDMA queues, using the UNMAP_LATENCY timeout. queue_preemption_timeout_ms defaults to 9000 ms and can be configured via a module parameter, but KFD_UNMAP_LATENCY_MS is hard-coded to 4000 ms. This patch ties KFD_UNMAP_LATENCY_MS to queue_preemption_timeout_ms so that on a slow system such as an emulator, both CP and SDMA slowness are taken into account.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What changed
  - Replaces hard-coded `KFD_UNMAP_LATENCY_MS (4000)` with a value derived from the existing module parameter `queue_preemption_timeout_ms`: `((queue_preemption_timeout_ms - queue_preemption_timeout_ms / 10) >> 1)` in `drivers/gpu/drm/amd/amdkfd/kfd_priv.h:120`. This budgets ~45% of the total preemption timeout for each of the two SDMA engines, leaving ~10% for CP overhead, per the new comment in `drivers/gpu/drm/amd/amdkfd/kfd_priv.h:114`.
  - `queue_preemption_timeout_ms` is already a public module parameter with default 9000 ms in `drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:833`, documented at `drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:835`, and declared for KFD use at `drivers/gpu/drm/amd/amdkfd/kfd_priv.h:195`.
- Why it matters (bug and impact)
  - When KFD asks CP to preempt queues, CP also requests SDMA to preempt SDMA queues with an UNMAP latency. The driver waits for the CP fence using `queue_preemption_timeout_ms` (see `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:2402`), but previously SDMA’s UNMAP latency was fixed at 4000 ms. This mismatch can cause spurious preemption timeouts on slow systems (e.g., emulators) or when users tune the module parameter, leading to preempt failures and potential error paths like “The cp might be in an unrecoverable state due to an unsuccessful queues preemption.”
  - By tying `KFD_UNMAP_LATENCY_MS` to `queue_preemption_timeout_ms`, the SDMA preemption budget scales consistently with the CP fence wait, avoiding premature timeouts and improving reliability.
- Where the new value is used
  - Programmed into MES/PM4 packets (units of 100 ms): `packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY_MS / 100;` in `drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_vi.c:129` and `drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c:205`.
  - Passed as the timeout when destroying MQDs (preempt/unmap paths): calls to `mqd_mgr->destroy_mqd(..., KFD_UNMAP_LATENCY_MS, ...)` in `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:884`, `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:996`, and `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:1175`.
  - Used for resetting hung queues via `hqd_reset(..., KFD_UNMAP_LATENCY_MS)` in `drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:2230`.
- Stable criteria assessment
  - Fixes a real-world reliability issue (timeouts/mismatched budgets) that affects users, especially on slow systems and when `queue_preemption_timeout_ms` is tuned.
  - Change is small, contained to a single macro in one header (`kfd_priv.h`) with clear rationale and no architectural refactoring.
  - Side effects are minimal: default behavior remains effectively unchanged (for 9000 ms, `KFD_UNMAP_LATENCY_MS` becomes 4050 ms; when quantized to 100 ms units it still programs 40), while non-default configurations become consistent and safer.
  - Touches KFD/amdgpu preemption logic but only adjusts a timeout parameter already designed to be user-configurable; no new features introduced.
Given the above, this is a low-risk, correctness-improving timeout alignment and a good candidate for backporting to stable.
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 67694bcd94646..d01ef5ac07666 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -111,7 +111,14 @@
 
 #define KFD_KERNEL_QUEUE_SIZE 2048
 
-#define KFD_UNMAP_LATENCY_MS (4000)
+/* KFD_UNMAP_LATENCY_MS is the timeout CP waiting for SDMA preemption. One XCC
+ * can be associated to 2 SDMA engines. queue_preemption_timeout_ms is the time
+ * driver waiting for CP returning the UNMAP_QUEUE fence. Thus the math is
+ * queue_preemption_timeout_ms = sdma_preemption_time * 2 + cp workload
+ * The format here makes CP workload 10% of total timeout
+ */
+#define KFD_UNMAP_LATENCY_MS \
+	((queue_preemption_timeout_ms - queue_preemption_timeout_ms / 10) >> 1)
 
 #define KFD_MAX_SDMA_QUEUES 128