Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active") ensured that active jobs are returned to the pending list when extending the timeout. However, it didn't use the pending list's lock to manipulate the list, which causes a race condition as the scheduler's workqueues are running.
Hold the lock while manipulating the scheduler's pending list to prevent a race.
Cc: stable@vger.kernel.org Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active") Signed-off-by: Maíra Canal mcanal@igalia.com --- Hi,
I'm proposing this workaround patch to address the race-condition caused by manipulating the pending list without using its lock. Although I understand this isn't a complete solution (see [1]), it's not reasonable to backport the new DRM stat series [2] to the stable branches.
Therefore, I believe the best solution is backporting this fix to the stable branches, which will fix the race and will keep adding the job back to the pending list (which will avoid most memory leaks).
[1] https://lore.kernel.org/dri-devel/bcc0ed477f8a6f3bb06665b1756bcb98fb7af871.c... [2] https://lore.kernel.org/dri-devel/20250530-sched-skip-reset-v2-0-c40a8d2d8da...
Best Regards, - Maíra --- drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 76a3a3e517d8..71e2e6b9d713 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -35,6 +35,7 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job *sched_job) { struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job); + struct drm_gpu_scheduler *sched = sched_job->sched; struct etnaviv_gpu *gpu = submit->gpu; u32 dma_addr, primid = 0; int change; @@ -89,7 +90,9 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job return DRM_GPU_SCHED_STAT_NOMINAL;
out_no_timeout: - list_add(&sched_job->list, &sched_job->sched->pending_list); + spin_lock(&sched->job_list_lock); + list_add(&sched_job->list, &sched->pending_list); + spin_unlock(&sched->job_list_lock); return DRM_GPU_SCHED_STAT_NOMINAL; }
On Mon, 2025-06-02 at 10:22 -0300, Maíra Canal wrote:
Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active") ensured that active jobs are returned to the pending list when extending the timeout. However, it didn't use the pending list's lock to manipulate the list, which causes a race condition as the scheduler's workqueues are running.
Hold the lock while manipulating the scheduler's pending list to prevent a race.
Cc: stable@vger.kernel.org Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still active")
Could also contain a "Closes: " with the link to the appropriate message from thread [1] from below.
You might also include "Reported-by: Philipp" since I technically first described the problem. But no hard feelings on that
Signed-off-by: Maíra Canal mcanal@igalia.com
Reviewed-by: Philipp Stanner phasta@kernel.org
Hi,
I'm proposing this workaround patch to address the race-condition caused by manipulating the pending list without using its lock. Although I understand this isn't a complete solution (see [1]), it's not reasonable to backport the new DRM stat series [2] to the stable branches.
Therefore, I believe the best solution is backporting this fix to the stable branches, which will fix the race and will keep adding the job back to the pending list (which will avoid most memory leaks).
[1] https://lore.kernel.org/dri-devel/bcc0ed477f8a6f3bb06665b1756bcb98fb7af871.c... [2] https://lore.kernel.org/dri-devel/20250530-sched-skip-reset-v2-0-c40a8d2d8da...
Best Regards,
- Maíra
drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 76a3a3e517d8..71e2e6b9d713 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -35,6 +35,7 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job *sched_job) { struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
- struct drm_gpu_scheduler *sched = sched_job->sched;
struct etnaviv_gpu *gpu = submit->gpu; u32 dma_addr, primid = 0; int change; @@ -89,7 +90,9 @@ static enum drm_gpu_sched_stat etnaviv_sched_timedout_job(struct drm_sched_job return DRM_GPU_SCHED_STAT_NOMINAL; out_no_timeout:
- list_add(&sched_job->list, &sched_job->sched->pending_list);
- spin_lock(&sched->job_list_lock);
- list_add(&sched_job->list, &sched->pending_list);
- spin_unlock(&sched->job_list_lock);
return DRM_GPU_SCHED_STAT_NOMINAL; }
linux-stable-mirror@lists.linaro.org