Hi everyone,
dma_fences have ever lived under the tyranny dictated by the module lifetime of their issuer, leading to crashes should anybody still holding a reference to a dma_fence when the module of the issuer was unloaded.
The basic problem is that when buffer are shared between drivers dma_fence objects can leak into external drivers and stay there even after they are signaled. The dma_resv object for example only lazy releases dma_fences.
So what happens is that when the module who originally created the dma_fence unloads the dma_fence_ops function table becomes unavailable as well and so any attempt to release the fence crashes the system.
Previously various approaches have been discussed, including changing the locking semantics of the dma_fence callbacks (by me) as well as using the drm scheduler as intermediate layer (by Sima) to disconnect dma_fences from their actual users, but none of them are actually solving all problems.
Tvrtko did some really nice prerequisite work by protecting the returned strings of the dma_fence_ops by RCU. This way dma_fence creators where able to just wait for an RCU grace period after fence signaling before they could be save to free those data structures.
Now this patch set here goes a step further and protects the whole dma_fence_ops structure by RCU, so that after the fence signals the pointer to the dma_fence_ops is set to NULL when there is no wait nor release callback given. All functionality which use the dma_fence_ops reference are put inside an RCU critical section, except for the deprecated issuer specific wait and of course the optional release callback.
Additional to the RCU changes the lock protecting the dma_fence state previously had to be allocated external. This set here now changes the functionality to make that external lock optional and allows dma_fences to use an inline lock and be self contained.
v4:
Rebases the whole set on upstream changes, especially the cleanup from Philip in patch "drm/amdgpu: independence for the amdkfd_fence!".
Adding two patches which brings the DMA-fence self tests up to date. The first selftest changes removes the mock_wait and so actually starts testing the default behavior instead of some hacky implementation in the test. This one should probably go upstream independent of this set. The second drops the mock_fence as well and tests the new RCU and inline spinlock functionality.
Especially the first patch still needs a Reviewed-by, apart from that I think I've addressed all review comments.
The plan is to push the core DMA-buf changes to drm-misc-next and then the driver specific changes through the driver channels as approprite.
Please review and comment, Christian.