[AMD Public Use]
-----Original Message-----
From: Tuikov, Luben <Luben.Tuikov@amd.com>
Sent: Wednesday, May 12, 2021 1:03 PM
To: amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben <Luben.Tuikov@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; stable@vger.kernel.org
Subject: [PATCH 1/2] drm/amdgpu: Don't query CE and UE errors

On the QUERY2 IOCTL, don't query the counts of correctable (CE) and uncorrectable (UE) errors. When RAS is enabled and supported on Vega20 server boards, this query takes a prohibitively long time, in O(n^3), which slows the system down to the point of being unusable when a GUI is up.

Fixes: ae363a212b14 ("drm/amdgpu: Add a new flag to AMDGPU_CTX_OP_QUERY_STATE2")
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>

drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 26 ++++++++++++-----------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 01fe60fedcbe..d481a33f4eaf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -363,19 +363,19 @@ static int amdgpu_ctx_query2(struct amdgpu_device *adev,
 		out->state.flags |= AMDGPU_CTX_QUERY2_FLAGS_GUILTY;
/*query ue count*/
-	ras_counter = amdgpu_ras_query_error_count(adev, false);
-	/*ras counter is monotonic increasing*/
-	if (ras_counter != ctx->ras_counter_ue) {
-		out->state.flags |= AMDGPU_CTX_QUERY2_FLAGS_RAS_UE;
-		ctx->ras_counter_ue = ras_counter;
-	}
-	/*query ce count*/
-	ras_counter = amdgpu_ras_query_error_count(adev, true);
-	if (ras_counter != ctx->ras_counter_ce) {
-		out->state.flags |= AMDGPU_CTX_QUERY2_FLAGS_RAS_CE;
-		ctx->ras_counter_ce = ras_counter;
-	}
+	/* ras_counter = amdgpu_ras_query_error_count(adev, false); */
+	/* /*ras counter is monotonic increasing*/ */
+	/* if (ras_counter != ctx->ras_counter_ue) { */
+	/* 	out->state.flags |= AMDGPU_CTX_QUERY2_FLAGS_RAS_UE; */
+	/* 	ctx->ras_counter_ue = ras_counter; */
+	/* } */
+	/* /*query ce count*/ */
+	/* ras_counter = amdgpu_ras_query_error_count(adev, true); */
+	/* if (ras_counter != ctx->ras_counter_ce) { */
+	/* 	out->state.flags |= AMDGPU_CTX_QUERY2_FLAGS_RAS_CE; */
+	/* 	ctx->ras_counter_ce = ras_counter; */
+	/* } */

Rather than commenting this out, just drop it in patch 1, and then re-add this in patch 2.
Alex

 	mutex_unlock(&mgr->lock);
 	return 0;
-- 
2.31.1.527.g2d677e5b15
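
For anyone following the thread without the source at hand, the logic in the hunk under discussion can be sketched in isolation as below. This is only an illustration, not the driver code: the struct, the flag values, and the function name are hypothetical stand-ins, and the expensive amdgpu_ras_query_error_count() call is replaced by plain parameters so the compare-and-remember pattern can be seen on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones live in the amdgpu UAPI header. */
#define SKETCH_FLAGS_RAS_CE (1u << 0)
#define SKETCH_FLAGS_RAS_UE (1u << 1)

/* Hypothetical stand-in for the per-context state kept by the driver. */
struct sketch_ctx {
	uint64_t ras_counter_ce;	/* last CE count seen by this context */
	uint64_t ras_counter_ue;	/* last UE count seen by this context */
};

/*
 * Report a flag for each counter that moved since this context last
 * looked, and remember the new value. In the driver, ce_now and ue_now
 * would come from the error-count query that the patch stops calling.
 */
static uint32_t sketch_ras_flags(struct sketch_ctx *ctx,
				 uint64_t ce_now, uint64_t ue_now)
{
	uint32_t flags = 0;

	assert(ctx != NULL);

	if (ue_now != ctx->ras_counter_ue) {
		flags |= SKETCH_FLAGS_RAS_UE;
		ctx->ras_counter_ue = ue_now;
	}
	if (ce_now != ctx->ras_counter_ce) {
		flags |= SKETCH_FLAGS_RAS_CE;
		ctx->ras_counter_ce = ce_now;
	}
	return flags;
}
```

The comparison itself is cheap; per the commit message, the cost is in obtaining the counts, which is why the patch targets the query rather than the flag logic.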