My dma-fence lockdep annotations caught an inversion because we
allocate memory where we really shouldn't:
kmem_cache_alloc+0x2b/0x6d0
amdgpu_fence_emit+0x30/0x330 [amdgpu]
amdgpu_ib_schedule+0x306/0x550 [amdgpu]
amdgpu_job_run+0x10f/0x260 [amdgpu]
drm_sched_main+0x1b9/0x490 [gpu_sched]
kthread+0x12e/0x150
The trouble right now is that lockdep only validates allocations against
GFP_FS, which would be good enough for shrinkers. But for mmu_notifiers
we actually need GFP_ATOMIC (i.e. no direct reclaim at all), since they
can be called from any page laundering, even if GFP_NOFS or GFP_NOIO are
set.
I guess we should improve the lockdep annotations for
fs_reclaim_acquire/release.
Of course the real fix is to properly preallocate this fence and stuff it
into the amdgpu job structure (a rough sketch follows below the v2
backtraces). But GFP_ATOMIC gets the lockdep splat out of the way for now.
v2: Two more allocations in scheduler paths.
First one:
__kmalloc+0x58/0x720
amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
amdgpu_job_dependency+0xf9/0x120 [amdgpu]
drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
drm_sched_main+0xf9/0x490 [gpu_sched]
Second one:
kmem_cache_alloc+0x2b/0x6d0
amdgpu_sync_fence+0x7e/0x110 [amdgpu]
amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
amdgpu_job_dependency+0xf9/0x120 [amdgpu]
drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
drm_sched_main+0xf9/0x490 [gpu_sched]
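For reference, a rough sketch of the preallocation fix mentioned above
(hypothetical, the job->fence field and its placement are made up for
illustration):

	/* at job creation, in process context, where GFP_KERNEL is fine */
	job->fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
	if (!job->fence)
		return -ENOMEM;

	/* amdgpu_fence_emit() then consumes the preallocated fence in the
	 * scheduler thread instead of allocating there:
	 */
	fence = job->fence;
	job->fence = NULL;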
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d878fe7fee51..055b47241bb1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
uint32_t seq;
int r;
- fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
+ fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
if (fence == NULL)
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index fe92dcd94d4a..fdcd6659f5ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
return amdgpu_sync_fence(sync, ring->vmid_wait, false);
- fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
+ fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
if (!fences)
return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index b87ca171986a..330476cc0c86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -168,7 +168,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f,
if (amdgpu_sync_add_later(sync, f, explicit))
return 0;
- e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
+ e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
if (!e)
return -ENOMEM;
--
2.26.2
This is a bit tricky: since ->notifier_lock is held while calling
dma_fence_wait(), we must ensure that the read side (i.e.
dma_fence_begin_signalling()) nests on the same side of that lock. If we
mix this up, lockdep complains, and that's again why we want to have
these annotations.
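Getting it wrong would look like this (illustrative only): taking the
read side outside of ->notifier_lock inverts against the mmu notifier
path, which holds ->notifier_lock while waiting on fences:

	/* wrong: dma_fence_map read side taken before ->notifier_lock */
	fence_cookie = dma_fence_begin_signalling();
	mutex_lock(&p->adev->notifier_lock);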
A nice side effect of this is that, because of the fs_reclaim priming for
dma_fence_enable, lockdep now automatically checks for us that nothing in
here allocates memory, without even having to run any userptr workloads.
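E.g. something like the following inside the annotated section now splats
immediately (illustrative only), since GFP_KERNEL acquires fs_reclaim,
which is primed against dma_fence_map:

	fence_cookie = dma_fence_begin_signalling();

	/* lockdep splat: allocating in a fence signalling critical
	 * section can deadlock against reclaim
	 */
	tmp = kmalloc(16, GFP_KERNEL);

	dma_fence_end_signalling(fence_cookie);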
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index a25fb59c127c..e109666aec14 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1212,6 +1212,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
struct amdgpu_job *job;
uint64_t seq;
int r;
+ bool fence_cookie;
job = p->job;
p->job = NULL;
@@ -1226,6 +1227,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
*/
mutex_lock(&p->adev->notifier_lock);
+ fence_cookie = dma_fence_begin_signalling();
+
/* If userptr are invalidated after amdgpu_cs_parser_bos(), return
* -EAGAIN, drmIoctl in libdrm will restart the amdgpu_cs_ioctl.
*/
@@ -1262,12 +1265,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
+ dma_fence_end_signalling(fence_cookie);
mutex_unlock(&p->adev->notifier_lock);
return 0;
error_abort:
drm_sched_job_cleanup(&job->base);
+ dma_fence_end_signalling(fence_cookie);
mutex_unlock(&p->adev->notifier_lock);
error_unlock:
--
2.26.2
If the scheduler rt thread gets stuck on a mutex that we're holding
while waiting for gpu workloads to complete, we have a problem.
Add dma-fence annotations so that lockdep can check this for us.
I've tried to review this quite carefully, and I think it's at the
right spot. But obviously I'm no expert on the drm scheduler.
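The deadlock spelled out (illustrative only):

	/* some other thread, holding a lock while waiting on a job fence */
	mutex_lock(&m);
	dma_fence_wait(fence);	/* signalled by drm_sched_main */
	mutex_unlock(&m);

	/* drm_sched_main, inside the fence signalling critical section */
	mutex_lock(&m);		/* stuck: the holder above waits on us */

With the annotation in place lockdep can flag the mutex_lock() in the
scheduler thread against the dma_fence_wait() under the same lock,
without the hang having to actually happen.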
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 2f319102ae9f..06a736e506ad 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -763,9 +763,12 @@ static int drm_sched_main(void *param)
struct sched_param sparam = {.sched_priority = 1};
struct drm_gpu_scheduler *sched = (struct drm_gpu_scheduler *)param;
int r;
+ bool fence_cookie;
sched_setscheduler(current, SCHED_FIFO, &sparam);
+ fence_cookie = dma_fence_begin_signalling();
+
while (!kthread_should_stop()) {
struct drm_sched_entity *entity = NULL;
struct drm_sched_fence *s_fence;
@@ -823,6 +826,9 @@ static int drm_sched_main(void *param)
wake_up(&sched->job_scheduled);
}
+
+ dma_fence_end_signalling(fence_cookie);
+
return 0;
}
--
2.26.2
This is a bit disappointing since we need to split the annotations
over all the different parts.
I was considering just leaking the critical section into the
->atomic_commit_tail callback of each driver. But that would mean we'd
need to pass the fence_cookie into each driver (there's a total of 13
implementations of this hook right now), so that would be a bad flag day,
and also a bit of a leaky abstraction (the rejected signature change is
sketched below).
Hence just do it function-by-function.
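For reference, the rejected alternative would have been a signature
change along these lines (hypothetical, not part of this patch):

	/* in struct drm_mode_config_helper_funcs - all 13 implementations
	 * would have to be converted in one go, hence the flag day
	 */
	void (*atomic_commit_tail)(struct drm_atomic_state *state,
				   bool fence_cookie);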
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/drm_atomic_helper.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 7cd7fe0d57b4..bfcc7857a9a1 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1549,6 +1549,7 @@ EXPORT_SYMBOL(drm_atomic_helper_wait_for_flip_done);
void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
{
struct drm_device *dev = old_state->dev;
+ bool fence_cookie = dma_fence_begin_signalling();
drm_atomic_helper_commit_modeset_disables(dev, old_state);
@@ -1560,6 +1561,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
drm_atomic_helper_commit_hw_done(old_state);
+ dma_fence_end_signalling(fence_cookie);
+
drm_atomic_helper_wait_for_vblanks(dev, old_state);
drm_atomic_helper_cleanup_planes(dev, old_state);
@@ -1579,6 +1582,7 @@ EXPORT_SYMBOL(drm_atomic_helper_commit_tail);
void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
{
struct drm_device *dev = old_state->dev;
+ bool fence_cookie = dma_fence_begin_signalling();
drm_atomic_helper_commit_modeset_disables(dev, old_state);
@@ -1591,6 +1595,8 @@ void drm_atomic_helper_commit_tail_rpm(struct drm_atomic_state *old_state)
drm_atomic_helper_commit_hw_done(old_state);
+ dma_fence_end_signalling(fence_cookie);
+
drm_atomic_helper_wait_for_vblanks(dev, old_state);
drm_atomic_helper_cleanup_planes(dev, old_state);
@@ -1606,6 +1612,9 @@ static void commit_tail(struct drm_atomic_state *old_state)
ktime_t start;
s64 commit_time_ms;
unsigned int i, new_self_refresh_mask = 0;
+ bool fence_cookie;
+
+ fence_cookie = dma_fence_begin_signalling();
funcs = dev->mode_config.helper_private;
@@ -1634,6 +1643,8 @@ static void commit_tail(struct drm_atomic_state *old_state)
if (new_crtc_state->self_refresh_active)
new_self_refresh_mask |= BIT(i);
+ dma_fence_end_signalling(fence_cookie);
+
if (funcs && funcs->atomic_commit_tail)
funcs->atomic_commit_tail(old_state);
else
@@ -1789,6 +1800,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
bool nonblock)
{
int ret;
+ bool fence_cookie;
if (state->async_update) {
ret = drm_atomic_helper_prepare_planes(dev, state);
@@ -1811,6 +1823,8 @@ int drm_atomic_helper_commit(struct drm_device *dev,
if (ret)
return ret;
+ fence_cookie = dma_fence_begin_signalling();
+
if (!nonblock) {
ret = drm_atomic_helper_wait_for_fences(dev, state, true);
if (ret)
@@ -1848,6 +1862,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
*/
drm_atomic_state_get(state);
+ dma_fence_end_signalling(fence_cookie);
if (nonblock)
queue_work(system_unbound_wq, &state->commit_work);
else
@@ -1856,6 +1871,7 @@ int drm_atomic_helper_commit(struct drm_device *dev,
return 0;
err:
+ dma_fence_end_signalling(fence_cookie);
drm_atomic_helper_cleanup_planes(dev, state);
return ret;
}
--
2.26.2
This is rather overkill, since currently all drivers call this from
hardirq (or at least timers). But maybe in the future we're going to
have threaded irq handlers and whatnot; it doesn't hurt to be prepared.
Plus this is an easy start for sprinkling these fence annotations into
shared code.
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
drivers/gpu/drm/drm_vblank.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 85e5f2db1608..93a5bba5f665 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -24,6 +24,7 @@
* OTHER DEALINGS IN THE SOFTWARE.
*/
+#include <linux/dma-fence.h>
#include <linux/export.h>
#include <linux/moduleparam.h>
@@ -1908,7 +1909,7 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
{
struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
unsigned long irqflags;
- bool disable_irq;
+ bool disable_irq, fence_cookie;
if (drm_WARN_ON_ONCE(dev, !drm_dev_has_vblank(dev)))
return false;
@@ -1916,6 +1917,8 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
if (drm_WARN_ON(dev, pipe >= dev->num_crtcs))
return false;
+ fence_cookie = dma_fence_begin_signalling();
+
spin_lock_irqsave(&dev->event_lock, irqflags);
/* Need timestamp lock to prevent concurrent execution with
@@ -1928,6 +1931,7 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
if (!vblank->enabled) {
spin_unlock(&dev->vblank_time_lock);
spin_unlock_irqrestore(&dev->event_lock, irqflags);
+ dma_fence_end_signalling(fence_cookie);
return false;
}
@@ -1953,6 +1957,8 @@ bool drm_handle_vblank(struct drm_device *dev, unsigned int pipe)
if (disable_irq)
vblank_disable_fn(&vblank->disable_timer);
+ dma_fence_end_signalling(fence_cookie);
+
return true;
}
EXPORT_SYMBOL(drm_handle_vblank);
--
2.26.2
Hi all,
I've dragged my feet for years on this, hoping that cross-release lockdep
would do this for us, but well that never really happened unfortunately.
So here we are.
Cc'ed quite a pile of people, since this is about the cross-driver
contract around dma_fences, which is heavily used for dma_buf. I'm
hearing more noises that rdma folks are looking into this as well, hence
they're also on cc.
There's a bunch of different parts to this RFC:
- The annotations themselves, in the 2nd patch after the prep patch to add
might_sleep annotations. Commit message has all the motivation for what
kind of deadlocks I want to catch, best you just read it.
Since lockdep doesn't understand cross-release natively we need to
cobble something together using rwlocks and a few more tricks, but from
the test rollout in a few places in drm/vkms, amdgpu & i915 I think what
I have now seems to actually work. Downside is that we have to
explicitly annotate all code involved in eventual dma_fence signalling.
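The resulting pattern, as used by the annotation patches later in this
series, is simply:

	bool fence_cookie = dma_fence_begin_signalling();

	/* code that must eventually signal the published dma_fence,
	 * without blocking on memory allocation or on locks held by
	 * anyone who might wait on that fence
	 */

	dma_fence_end_signalling(fence_cookie);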
- Second important part is locking down the current dma-fence cross-driver
contract, using lockdep priming like we already do for dma_resv_lock.
I've just started with my own take on what we probably need to make the
current code work (-ish), but both amdgpu and i915 have issues with
that. So this needs some careful discussions, and also some thought on
how we land it all eventually to not break lockdep completely for
everyone.
The important patch for that is "dma-fence: prime lockdep annotations"
plus of course the various annotations patches and driver hacks to
highlight some of the issues caught.
Note that depending upon what exactly we end up deciding, we might need
to improve the annotations for fs_reclaim_acquire/release: for
dma_fence_wait in mmu notifiers we can only allow GFP_NOWAIT (afaiui),
and currently fs_reclaim_acquire/release only has a lockdep class for
__GFP_FS; we'd maybe need to add another one for __GFP_DIRECT_RECLAIM
in general.
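The priming itself follows the same idea as the dma_resv_lock priming:
teach lockdep the reclaim -> dma_fence dependency once at init time,
roughly like this (sketch of the idea only, see the actual patch for the
details):

	/* record that reclaim may wait on fences, so that allocating
	 * while inside a fence signalling section is an inversion
	 */
	fs_reclaim_acquire(GFP_KERNEL);
	__dma_fence_might_wait();
	fs_reclaim_release(GFP_KERNEL);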
- Finally there are clearly some gaps in the current dma_fence driver
interfaces: Amdgpu's hang recovery is essentially impossible to fix
as-is - it needs to reset the display state, and you can't get at modeset
locks from tdr without deadlock potential. i915 has an internal trick
for this issue (but it stops working once we involve real cross-driver
fences), but then for i915 modeset reset is only needed on very ancient
gen2/3. Modern hw is a lot more reasonable.
I'm kinda hoping that the annotations and priming for basic command
submission and atomic modeset paths could be merged soonish, while the
tdr side clearly needs a pile more work to get going. But since we have
to explicitly annotate all code paths anyway, we can hide bugs in
e.g. tdr code by simply not yet annotating those functions.
I'm trying to lay out at least one idea for solving the tdr issue in the
patch titled "drm/scheduler: use dma-fence annotations in tdr work".
Finally, once we have some agreement on where we're going with all this,
we also need some documentation. Currently that's missing because I don't
want to re-edit the text all the time while we still figure out the
details of the exact cross-driver semantics.
My goal here is that with this we can lock down the cross-driver contract
for the last bit of the dma_buf/resv/fence story and make sure this stops
being such a wobbly thing where everyone just does whatever they feel
like.
Ideas, thoughts, reviews, testing (with specific annotations for that
driver) on other drivers very much welcome.
Cheers, Daniel
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: linux-rdma@vger.kernel.org
Cc: amd-gfx@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Christian König <christian.koenig@amd.com>
Daniel Vetter (17):
dma-fence: add might_sleep annotation to _wait()
dma-fence: basic lockdep annotations
dma-fence: prime lockdep annotations
drm/vkms: Annotate vblank timer
drm/vblank: Annotate with dma-fence signalling section
drm/atomic-helper: Add dma-fence annotations
drm/amdgpu: add dma-fence annotations to atomic commit path
drm/scheduler: use dma-fence annotations in main thread
drm/amdgpu: use dma-fence annotations in cs_submit()
drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code
drm/amdgpu: DC also loves to allocate stuff where it shouldn't
drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail
drm/scheduler: use dma-fence annotations in tdr work
drm/amdgpu: use dma-fence annotations for gpu reset code
Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset"
drm/amdgpu: gpu recovery does full modesets
drm/i915: Annotate dma_fence_work
drivers/dma-buf/dma-fence.c | 56 +++++++++++++++++++
drivers/dma-buf/dma-resv.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 ++++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +-
drivers/gpu/drm/amd/amdgpu/atom.c | 2 +-
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 18 +++++-
drivers/gpu/drm/amd/display/dc/core/dc.c | 4 +-
drivers/gpu/drm/drm_atomic_helper.c | 16 ++++++
drivers/gpu/drm/drm_vblank.c | 8 ++-
drivers/gpu/drm/i915/i915_sw_fence_work.c | 3 +
drivers/gpu/drm/scheduler/sched_main.c | 11 ++++
drivers/gpu/drm/vkms/vkms_crtc.c | 8 ++-
include/linux/dma-fence.h | 13 +++++
16 files changed, 160 insertions(+), 13 deletions(-)
--
2.26.2
On Thu, May 28, 2020 at 11:54 PM Luben Tuikov <luben.tuikov@amd.com> wrote:
>
> On 2020-05-12 4:59 a.m., Daniel Vetter wrote:
> > Design is similar to the lockdep annotations for workers, but with
> > some twists:
> >
> > - We use a read-lock for the execution/worker/completion side, so that
> > this explicit annotation can be more liberally sprinkled around.
> > With read locks lockdep isn't going to complain if the read-side
> > isn't nested the same way under all circumstances, so ABBA deadlocks
> > are ok. Which they are, since this is an annotation only.
> >
> > - We're using non-recursive lockdep read lock mode, since in recursive
> > read lock mode lockdep does not catch read side hazards. And we
> > _very_ much want read side hazards to be caught. For full details of
> > this limitation see
> >
> > commit e91498589746065e3ae95d9a00b068e525eec34f
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date: Wed Aug 23 13:13:11 2017 +0200
> >
> > locking/lockdep/selftests: Add mixed read-write ABBA tests
> >
> > - To allow nesting of the read-side explicit annotations we explicitly
> > keep track of the nesting. lock_is_held() allows us to do that.
> >
> > - The wait-side annotation is a write lock, and entirely done within
> > dma_fence_wait() for everyone by default.
> >
> > - To be able to freely annotate helper functions I want to make it ok
> > to call dma_fence_begin/end_signalling from soft/hardirq context.
> > First attempt was using the hardirq locking context for the write
> > side in lockdep, but this forces all normal spinlocks nested within
> > dma_fence_begin/end_signalling to be irq-safe spinlocks. That's bollocks.
> >
> > The approach now is to simply check in_atomic(), and for these cases
> > entirely rely on the might_sleep() check in dma_fence_wait(). That
> > will catch any wrong nesting against spinlocks from soft/hardirq
> > contexts.
> >
> > The idea here is that every code path that's critical for eventually
> > signalling a dma_fence should be annotated with
> > dma_fence_begin/end_signalling. The annotation ideally starts right
> > after a dma_fence is published (added to a dma_resv, exposed as a
> > sync_file fd, attached to a drm_syncobj fd, or anything else that
> > makes the dma_fence visible to other kernel threads), up to and
> > including the dma_fence_wait(). Examples are irq handlers, the
> > scheduler rt threads, the tail of execbuf (after the corresponding
> > fences are visible), any workers that end up signalling dma_fences and
> > really anything else. Not annotated should be code paths that only
> > complete fences opportunistically as the gpu progresses, like e.g.
> > shrinker/eviction code.
> >
> > The main class of deadlocks this is supposed to catch are:
> >
> > Thread A:
> >
> > mutex_lock(A);
> > mutex_unlock(A);
> >
> > dma_fence_signal();
> >
> > Thread B:
> >
> > mutex_lock(A);
> > dma_fence_wait();
> > mutex_unlock(A);
> >
> > Thread B is blocked on A signalling the fence, but A never gets around
> > to that because it cannot acquire the lock A.
> >
> > Note that dma_fence_wait() is allowed to be nested within
> > dma_fence_begin/end_signalling sections. To allow this to happen the
> > read lock needs to be upgraded to a write lock, which means that if any
> > other lock is acquired between the dma_fence_begin_signalling() call and
> > the call to dma_fence_wait(), and is still held, this will result in an
> > immediate lockdep complaint. The only other option would be to not
> > annotate such calls, defeating the point. Therefore these annotations
> > cannot be sprinkled over the code entirely mindlessly to avoid false
> > positives.
> >
> > v2: handle soft/hardirq ctx better against write side and don't forget
> > EXPORT_SYMBOL, drivers can't use this otherwise.
> >
> > Cc: linux-media@vger.kernel.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Cc: linux-rdma@vger.kernel.org
> > Cc: amd-gfx@lists.freedesktop.org
> > Cc: intel-gfx@lists.freedesktop.org
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> > drivers/dma-buf/dma-fence.c | 53 +++++++++++++++++++++++++++++++++++++
> > include/linux/dma-fence.h | 12 +++++++++
> > 2 files changed, 65 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 6802125349fb..d5c0fd2efc70 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -110,6 +110,52 @@ u64 dma_fence_context_alloc(unsigned num)
> > }
> > EXPORT_SYMBOL(dma_fence_context_alloc);
> >
> > +#ifdef CONFIG_LOCKDEP
> > +struct lockdep_map dma_fence_lockdep_map = {
> > + .name = "dma_fence_map"
> > +};
> > +
> > +bool dma_fence_begin_signalling(void)
> > +{
> > + /* explicitly nesting ... */
> > + if (lock_is_held_type(&dma_fence_lockdep_map, 1))
> > + return true;
> > +
> > + /* rely on might_sleep check for soft/hardirq locks */
> > + if (in_atomic())
> > + return true;
> > +
> > + /* ... and non-recursive readlock */
> > + lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_);
> > +
> > + return false;
> > +}
> > +EXPORT_SYMBOL(dma_fence_begin_signalling);
>
> Hi Daniel,
>
> This is great work and could help a lot.
>
> If you invert the result of dma_fence_begin_signalling()
> then it would naturally mean "locked", i.e. whether we need to
> later release "dma_fence_lockdep_map". Then,
> in dma_fence_end_signalling(), you can call the "cookie"
> argument "locked" and simply do:
>
> void dma_fence_end_signalling(bool locked)
> {
> if (locked)
> lock_release(&dma_fence_lockdep_map, _RET_IP_);
> }
> EXPORT_SYMBOL(dma_fence_end_signalling);
>
> It'll be more natural to understand as well.
It's intentionally called cookie so callers don't start doing funny
stuff with it. The thing is, after begin_signalling you are _always_
in the locked state. It's just that because of limitations with
lockdep we need to play a few tricks, and in some cases we do not take
the lockdep map. There's 2 cases:
- lockdep map already taken - we want recursive readlock semantics for
this, but lockdep does not correctly check recursive read locks. Hence
we only use readlock, and make sure we do not actually nest upon
ourselves with this explicit check.
- when we're in atomic sections - lockdep gets pissed at us if we take
the read lock in hard/softirq sections because of hard/softirq ctx
mismatch (lockdep thinks it's a real lock, but we don't treat it as
one). Simplest fix was to rely on the might_sleep check in patch 1
(already merged).
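In other words, the cookie just encodes whether this particular begin/end
pair actually took the lockdep map, so nesting stays balanced (per the
implementation quoted above):

	bool outer, inner;

	outer = dma_fence_begin_signalling(); /* acquires dma_fence_map */
	inner = dma_fence_begin_signalling(); /* already held: no-op */
	...
	dma_fence_end_signalling(inner);      /* no-op */
	dma_fence_end_signalling(outer);      /* releases dma_fence_map */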
The commit message mentions this already a bit, but I'll try to
explain this implementation detail tersely in the kerneldoc too in the
next round.
Thanks, Daniel
>
> Regards,
> Luben
>
> > +
> > +void dma_fence_end_signalling(bool cookie)
> > +{
> > + if (cookie)
> > + return;
> > +
> > + lock_release(&dma_fence_lockdep_map, _RET_IP_);
> > +}
> > +EXPORT_SYMBOL(dma_fence_end_signalling);
> > +
> > +void __dma_fence_might_wait(void)
> > +{
> > + bool tmp;
> > +
> > + tmp = lock_is_held_type(&dma_fence_lockdep_map, 1);
> > + if (tmp)
> > + lock_release(&dma_fence_lockdep_map, _THIS_IP_);
> > + lock_map_acquire(&dma_fence_lockdep_map);
> > + lock_map_release(&dma_fence_lockdep_map);
> > + if (tmp)
> > + lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_);
> > +}
> > +#endif
> > +
> > +
> > /**
> > * dma_fence_signal_locked - signal completion of a fence
> > * @fence: the fence to signal
> > @@ -170,14 +216,19 @@ int dma_fence_signal(struct dma_fence *fence)
> > {
> > unsigned long flags;
> > int ret;
> > + bool tmp;
> >
> > if (!fence)
> > return -EINVAL;
> >
> > + tmp = dma_fence_begin_signalling();
> > +
> > spin_lock_irqsave(fence->lock, flags);
> > ret = dma_fence_signal_locked(fence);
> > spin_unlock_irqrestore(fence->lock, flags);
> >
> > + dma_fence_end_signalling(tmp);
> > +
> > return ret;
> > }
> > EXPORT_SYMBOL(dma_fence_signal);
> > @@ -211,6 +262,8 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
> > if (timeout > 0)
> > might_sleep();
> >
> > + __dma_fence_might_wait();
> > +
> > trace_dma_fence_wait_start(fence);
> > if (fence->ops->wait)
> > ret = fence->ops->wait(fence, intr, timeout);
> > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> > index 3347c54f3a87..3f288f7db2ef 100644
> > --- a/include/linux/dma-fence.h
> > +++ b/include/linux/dma-fence.h
> > @@ -357,6 +357,18 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
> > } while (1);
> > }
> >
> > +#ifdef CONFIG_LOCKDEP
> > +bool dma_fence_begin_signalling(void);
> > +void dma_fence_end_signalling(bool cookie);
> > +#else
> > +static inline bool dma_fence_begin_signalling(void)
> > +{
> > + return true;
> > +}
> > +static inline void dma_fence_end_signalling(bool cookie) {}
> > +static inline void __dma_fence_might_wait(void) {}
> > +#endif
> > +
> > int dma_fence_signal(struct dma_fence *fence);
> > int dma_fence_signal_locked(struct dma_fence *fence);
> > signed long dma_fence_default_wait(struct dma_fence *fence,
> >
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, May 28, 2020 at 3:37 PM Thomas Hellström (Intel)
<thomas_os@shipmail.org> wrote:
>
> On 2020-05-12 10:59, Daniel Vetter wrote:
> > Design is similar to the lockdep annotations for workers, but with
> > some twists:
> >
> > - We use a read-lock for the execution/worker/completion side, so that
> > this explicit annotation can be more liberally sprinkled around.
> > With read locks lockdep isn't going to complain if the read-side
> > isn't nested the same way under all circumstances, so ABBA deadlocks
> > are ok. Which they are, since this is an annotation only.
> >
> > - We're using non-recursive lockdep read lock mode, since in recursive
> > read lock mode lockdep does not catch read side hazards. And we
> > _very_ much want read side hazards to be caught. For full details of
> > this limitation see
> >
> > commit e91498589746065e3ae95d9a00b068e525eec34f
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date: Wed Aug 23 13:13:11 2017 +0200
> >
> > locking/lockdep/selftests: Add mixed read-write ABBA tests
> >
> > - To allow nesting of the read-side explicit annotations we explicitly
> > keep track of the nesting. lock_is_held() allows us to do that.
> >
> > - The wait-side annotation is a write lock, and entirely done within
> > dma_fence_wait() for everyone by default.
> >
> > - To be able to freely annotate helper functions I want to make it ok
> > to call dma_fence_begin/end_signalling from soft/hardirq context.
> > First attempt was using the hardirq locking context for the write
> > side in lockdep, but this forces all normal spinlocks nested within
> > dma_fence_begin/end_signalling to be irq-safe spinlocks. That's bollocks.
> >
> > The approach now is to simply check in_atomic(), and for these cases
> > entirely rely on the might_sleep() check in dma_fence_wait(). That
> > will catch any wrong nesting against spinlocks from soft/hardirq
> > contexts.
> >
> > The idea here is that every code path that's critical for eventually
> > signalling a dma_fence should be annotated with
> > dma_fence_begin/end_signalling. The annotation ideally starts right
> > after a dma_fence is published (added to a dma_resv, exposed as a
> > sync_file fd, attached to a drm_syncobj fd, or anything else that
> > makes the dma_fence visible to other kernel threads), up to and
> > including the dma_fence_wait(). Examples are irq handlers, the
> > scheduler rt threads, the tail of execbuf (after the corresponding
> > fences are visible), any workers that end up signalling dma_fences and
> > really anything else. Not annotated should be code paths that only
> > complete fences opportunistically as the gpu progresses, like e.g.
> > shrinker/eviction code.
> >
> > The main class of deadlocks this is supposed to catch are:
> >
> > Thread A:
> >
> > mutex_lock(A);
> > mutex_unlock(A);
> >
> > dma_fence_signal();
> >
> > Thread B:
> >
> > mutex_lock(A);
> > dma_fence_wait();
> > mutex_unlock(A);
> >
> > Thread B is blocked on A signalling the fence, but A never gets around
> > to that because it cannot acquire the lock A.
> >
> > Note that dma_fence_wait() is allowed to be nested within
> > dma_fence_begin/end_signalling sections. To allow this to happen the
> > read lock needs to be upgraded to a write lock, which means that if any
> > other lock is acquired between the dma_fence_begin_signalling() call and
> > the call to dma_fence_wait(), and is still held, this will result in an
> > immediate lockdep complaint. The only other option would be to not
> > annotate such calls, defeating the point. Therefore these annotations
> > cannot be sprinkled over the code entirely mindlessly to avoid false
> > positives.
> >
> > v2: handle soft/hardirq ctx better against write side and don't forget
> > EXPORT_SYMBOL, drivers can't use this otherwise.
> >
> > Cc: linux-media@vger.kernel.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Cc: linux-rdma@vger.kernel.org
> > Cc: amd-gfx@lists.freedesktop.org
> > Cc: intel-gfx@lists.freedesktop.org
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>
> LGTM. Perhaps some in-code documentation on how the new functions are
> meant to be used is called for.
See cover letter, that's going to be done for the next round. For this one
here I just wanted to showcase a bit how it's used in a few different
places, mostly selected to get as much feedback from across different
drivers. Hence e.g. annotating drm/scheduler.
> Otherwise for patch 2 and 3,
>
> Reviewed-by: Thomas Hellstrom <thomas.hellstrom@intel.com>
I think I'll just cc you for the next round with docs, so you can make
sure it looks ok :-)
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch