On Wed, 2025-11-26 at 16:03 +0100, Christian König wrote:
On 11/26/25 13:37, Philipp Stanner wrote:
On Wed, 2025-11-26 at 13:31 +0100, Christian König wrote:
[…]
Well, the question is: how do you *reliably* detect that there is still forward progress?
My understanding is that that's impossible, since the internals of command submissions are only really understood by the userspace that submits them.
Right, but we can still try to do our best in the kernel to mitigate the situation.
I think for now amdgpu will implement something like checking whether the HW still makes progress after a timeout, but with only a limited number of re-tries until we say that's it and reset anyway.
Oh oh, isn't that our dear hang_limit? :)
We agree that you can never really know whether userspace just submitted a while(true) job, don't we? Even if some GPU register still indicates "progress".
I think the long-term solution can only be fully fledged GPU scheduling with preemption. That's why we don't need such a timeout mechanism for userspace processes on the CPU: the scheduler simply interrupts them and lets someone else run.
Yeah absolutely.
My hope would be that in the mid-term future we'd get firmware rings that can be preempted through a firmware call for all major hardware. Then a huge share of our problems would disappear.
At least on AMD HW, pre-emption is actually horribly unreliable as well.
Do you mean new GPUs with firmware scheduling, or what is "HW pre-emption"?
With firmware interfaces, my hope would be that you could simply tell the firmware:

stop_running_ring(nr_of_ring)   // time slice for someone else
start_running_ring(nr_of_ring)
Thereby getting real scheduling and all that. And eliminating many other problems we know well from drm/sched.
Userspace basically needs to co-operate and provide a buffer into which the state is saved on a pre-emption.
That's uncool. With CPU preemption, all that is done automatically via the process's pages.
P.