[Linaro-mm-sig] Re: [PATCH 1/4] dma-buf/fence: give some reasonable maximum signaling timeout

26 Nov 2025


      On Wednesday, 26.11.2025 at 16:44 +0100, Philipp Stanner wrote:
...
On Wed, 2025-11-26 at 16:03 +0100, Christian König wrote:
...
On 11/26/25 13:37, Philipp Stanner wrote:
...
On Wed, 2025-11-26 at 13:31 +0100, Christian König wrote:
...
[…]
...
...
...
Well, the question is how do you *reliably* detect that there is
still forward progress?
My understanding is that that's impossible since the internals of
command submissions are only really understood by userspace, who
submits them.
Right, but we can still try to do our best in the kernel to mitigate
the situation.
I think for now amdgpu will implement something like checking if the
HW still makes progress after a timeout but only a limited number of
re-tries until we say that's it and reset anyway.
Oh oh, isn't that our dear hang_limit? :)
Not really. The hang limit is the limit on how many times a hanging
submit might be retried.
Limiting the number of timeout extensions is more of a safety net
against workloads which might appear to make progress to the kernel
driver but in reality are stuck. After all, the kernel driver can only
have limited knowledge of the GPU state and any progress check will
have limited precision with false positives/negatives being a part of
reality we have to deal with.
...
We agree that you can never really know whether userspace just submitted
a while(true) job, don't we? Even if some GPU register still indicates
"progress".
Yea, this is really hardware dependent on what you can read at
runtime.
For etnaviv we define "progress" as the command frontend moving towards
the end of the command buffer. As a single draw call in valid workloads
can blow through our timeout we also use debug registers to look at the
current primitive ID within a draw call.
If userspace submits a workload that requires more than 500ms per
primitive to finish we consider this an invalid workload and go through
the reset/recovery motions.
Regards,
Lucas
