On Thu, Jul 09, 2020 at 08:32:41AM +0100, Daniel Stone wrote:
Hi,
On Wed, 8 Jul 2020 at 16:13, Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Wed, Jul 8, 2020 at 4:57 PM Christian König christian.koenig@amd.com wrote:
Could we merge this controlled by a separate config option?
This way we could have the checks upstream without having to fix all the stuff before we do this?
Since it's fully opt-in annotations nothing blows up if we don't merge any annotations. So we could start merging the first 3 patches. After that the fun starts ...
My rough idea was that first I'd try to tackle display, thus far there's 2 actual issues in drivers:
- amdgpu has some dma_resv_lock in commit_tail, plus a kmalloc. I
think those should be fairly easy to fix (I'd try a stab at them even)
- vmwgfx has a full on locking inversion with dma_resv_lock in
commit_tail, and that one is functional. Not just reading something which we can safely assume to be invariant anyway (like the tmz flag for amdgpu, or whatever it was).
I've done a pile more annotations patches for other atomic drivers now, so hopefully that flushes out any remaining offenders here. Since some of the annotations are in helper code worst case we might need a dev->mode_config.broken_atomic_commit flag to disable them. At least for now I have 0 plans to merge any of these while there's known unsolved issues. Maybe if some drivers take forever to get fixed we can then apply some duct-tape for the atomic helper annotation patch. Instead of a flag we can also copypasta the atomic_commit_tail hook, leaving the annotations out and adding a huge warning about that.
How about an opt-in drm_driver DRIVER_DEADLOCK_HAPPY flag? At first this could just disable the annotations and nothing else, but as we see the annotations gaining real-world testing and maturity, we could eventually make it taint the kernel.
You can do that pretty much per-driver, since the annotations are pretty much per-driver. No annotations in your code, no lockdep splat. Only if there's some dma_fence_begin/end_signalling() calls is there even the chance of a problem.
E.g. this round has the i915 patch dropped and *traraaaa* intel-gfx-ci is happy (or well at least a lot happier, there's some noise in there that's probably not from my stuff).
So I guess if amd wants this, we could do an DRM_AMDGPU_MOAR_LOCKDEP Kconfig or similar. I haven't tested, but I think as long as we don't merge any of the amdgpu specific patches, there's no splat in amdgpu. So with that I think that's plenty enough opt-in for each driver. The only problem is a bit shared helper code like atomic helpers and drm scheduler. There we might need some opt-out (I don't think merging makes sense when most of the users are still broken). -Daniel