Re: [PATCH 1/3] drm/amdgpu: Forward soft recovery errors to userspace

1 Aug 2024

Hi,
I just ran into an issue again where a soft recovery was not reported
to userspace properly, so apps kept submitting hanging work in a loop,
which ended up hanging the entire machine. It seems this patch never
made it into amd-staging-drm-next. Given that it has a Reviewed-by,
was this just an oversight, or are there blockers to pushing it that
I missed?
If not, I'd be grateful if the patch could get merged.
Thanks,
Friedrich
On 08.03.24 09:33, Christian König wrote:
...
On 07.03.24 at 20:04, Joshua Ashton wrote:
...
As we discussed before[1], soft recovery should be
forwarded to userspace, or we can get into a really
bad state where apps keep submitting hanging
command buffers, cascading into a hard reset.
Marek, you have been in favor of this for a long time, so I would like
to ask you to put your Reviewed-by on it, and I will just push it into
our internal kernel branch.
Regards,
Christian.
...
[1]: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Cc: Friedrich Vock <friedrich.vock@gmx.de>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Christian König <christian.koenig@amd.com>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: stable@vger.kernel.org

 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4b3000c21ef2..aebf59855e9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -262,9 +262,8 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
      struct dma_fence *fence = NULL;
      int r;
-    /* Ignore soft recovered fences here */
      r = drm_sched_entity_error(s_entity);
-    if (r && r != -ENODATA)
+    if (r)
          goto error;
      if (!fence && job->gang_submit)
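
For readers skimming the hunk, the behavioural change can be sketched in isolation. This is only an illustrative userspace model, not driver code: the two function names are hypothetical, and `entity_error` stands in for the value returned by drm_sched_entity_error(). Going by the removed comment, -ENODATA is the error associated with soft-recovered fences.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check in amdgpu_job_prepare_job().
 * entity_error stands in for the result of drm_sched_entity_error().
 */

/* Before the patch: -ENODATA (soft recovery) is ignored, the job proceeds. */
bool old_check_rejects_job(int entity_error)
{
	return entity_error && entity_error != -ENODATA;
}

/* After the patch: any non-zero entity error takes the error path. */
bool new_check_rejects_job(int entity_error)
{
	return entity_error != 0;
}
```

Under this model, an entity that has seen a soft recovery keeps getting its jobs prepared with the old check, while the new check sends them down the error path like any other entity error.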

    
