On Wed, Sep 30, 2020 at 11:39:06AM +0200, Michel Dänzer wrote:
> On 2020-03-17 10:21 p.m., Jason Ekstrand wrote:
> > Explicit synchronization is the future. At least, that seems to be what
> > most userspace APIs are agreeing on at this point. However, most of our
> > Linux APIs (both userspace and kernel UAPI) are currently built around
> > implicit synchronization with dma-buf. While work is ongoing to change
> > many of the userspace APIs and protocols to an explicit synchronization
> > model, switching over piecemeal is difficult due to the number of
> > potential components involved. On the kernel side, many drivers use
> > dma-buf including GPU (3D/compute), display, v4l, and others. In
> > userspace, we have X11, several Wayland compositors, 3D drivers, compute
> > drivers (OpenCL etc.), media encode/decode, and the list goes on.
> >
> > This patch provides a path forward by allowing userspace to manually
> > manage the fences attached to a dma-buf. Alternatively, one can think
> > of this as making dma-buf's implicit synchronization simply a carrier
> > for an explicit fence. This is accomplished by adding two IOCTLs to
> > dma-buf for importing and exporting a sync file to/from the dma-buf.
> > This way a userspace component which is uses explicit synchronization,
> > such as a Vulkan driver, can manually set the write fence on a buffer
> > before handing it off to an implicitly synchronized component such as a
> > Wayland compositor or video encoder. In this way, each of the different
> > components can be upgraded to an explicit synchronization model one at a
> > time as long as the userspace pieces connecting them are aware of it and
> > import/export fences at the right times.
> >
> > There is a potential race condition with this API if userspace is not
> > careful. A typical use case for implicit synchronization is to wait for
> > the dma-buf to be ready, use it, and then signal it for some other
> > component. Because a sync_file cannot be created until it is guaranteed
> > to complete in finite time, userspace can only signal the dma-buf after
> > it has already submitted the work which uses it to the kernel and has
> > received a sync_file back. There is no way to atomically submit a
> > wait-use-signal operation. This is not, however, really a problem with
> > this API so much as it is a problem with explicit synchronization
> > itself. The way this is typically handled is to have very explicit
> > ownership transfer points in the API or protocol which ensure that only
> > one component is using it at any given time. Both X11 (via the PRESENT
> > extension) and Wayland provide such ownership transfer points via
> > explicit present and idle messages.
> >
> > The decision was intentionally made in this patch to make the import and
> > export operations IOCTLs on the dma-buf itself rather than as a DRM
> > IOCTL. This makes it the import/export operation universal across all
> > components which use dma-buf including GPU, display, v4l, and others.
> > It also means that a userspace component can do the import/export
> > without access to the DRM fd which may be tricky to get in cases where
> > the client communicates with DRM via a userspace API such as OpenGL or
> > Vulkan. At a future date we may choose to add direct import/export APIs
> > to components such as drm_syncobj to avoid allocating a file descriptor
> > and going through two ioctls. However, that seems to be something of a
> > micro-optimization as import/export operations are likely to happen at a
> > rate of a few per frame of rendered or decoded video.
> >
> > v2 (Jason Ekstrand):
> > - Use a wrapper dma_fence_array of all fences including the new one
> > when importing an exclusive fence.
> >
> > v3 (Jason Ekstrand):
> > - Lock around setting shared fences as well as exclusive
> > - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
> > - Initialize ret to 0 in dma_buf_wait_sync_file
> >
> > v4 (Jason Ekstrand):
> > - Use the new dma_resv_get_singleton helper
> >
> > v5 (Jason Ekstrand):
> > - Rename the IOCTLs to import/export rather than wait/signal
> > - Drop the WRITE flag and always get/set the exclusive fence
> >
> > Signed-off-by: Jason Ekstrand <jason(a)jlekstrand.net>
>
> What's the status of this? DMA_BUF_IOCTL_EXPORT_SYNC_FILE would be useful
> for Wayland compositors to wait for client buffers to become ready without
> being prone to getting delayed by later HW access to them, so it would be
> nice to merge that at least (if DMA_BUF_IOCTL_IMPORT_SYNC_FILE is still
> controversial).
I think the missing bits are just the usual stuff
- igt testcases
- userspace using the new ioctls
- review of the entire pile
I don't think there's any fundamental objections aside from "no one ever
pushed this over the finish line".
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Tue, Oct 27, 2020 at 12:52:30PM +0000, Parav Pandit wrote:
>
> > From: hch(a)lst.de <hch(a)lst.de>
> > Sent: Tuesday, October 27, 2020 1:41 PM
> >
> > On Mon, Oct 26, 2020 at 05:23:48AM +0000, Parav Pandit wrote:
> > > Hi Christoph,
> > >
> > > > From: Jakub Kicinski <kuba(a)kernel.org>
> > > > Sent: Saturday, October 24, 2020 11:45 PM
> > > >
> > > > CC: rdma, looks like rdma from the stack trace
> > > >
> > > > On Fri, 23 Oct 2020 20:07:17 -0700 syzbot wrote:
> > > > > syzbot has found a reproducer for the following issue on:
> > > > >
> > > > > HEAD commit: 3cb12d27 Merge tag 'net-5.10-rc1' of
> > git://git.kernel.org/..
> > >
> > > In [1] you mentioned that dma_mask should not be set for dma_virt_ops.
> > > So patch [2] removed it.
> > >
> > > But check to validate the dma mask for all dma_ops was added in [3].
> > >
> > > What is the right way? Did I misunderstood your comment about
> > dma_mask in [1]?
> >
> > No, I did not say we don't need the mask. I said copying over the various
> > dma-related fields from the parent is bogus.
> >
> > I think rxe (and ther other drivers/infiniband/sw drivers) need a simple
> > dma_coerce_mask_and_coherent and nothing else.
>
> I see. Does below fix make sense?
> Is DMA_MASK_NONE correct?
DMA_MASK_NONE is gone in 5.10. I think you want DMA_BIT_MASK(64).
That isn't actually correct for 32-bit platforms, but good enough.
On Wed, Oct 28, 2020 at 09:44:15AM +0100, Boris Brezillon wrote:
> On Tue, 27 Oct 2020 22:49:22 +0100
> Daniel Vetter <daniel.vetter(a)ffwll.ch> wrote:
>
> > When we forward an mmap to the dma_buf exporter, they get to own
> > everything. Unfortunately drm_gem_mmap_obj() overwrote
> > vma->vm_private_data after the driver callback, wreaking the
> > exporter complete. This was noticed because vb2_common_vm_close blew
> > up on mali gpu with panfrost after commit 26d3ac3cb04d
> > ("drm/shmem-helpers: Redirect mmap for imported dma-buf").
> >
> > Unfortunately drm_gem_mmap_obj also acquires a surplus reference that
> > we need to drop in shmem helpers, which is a bit of a mislayer
> > situation. Maybe the entire dma_buf_mmap forwarding should be pulled
> > into core gem code.
> >
> > Note that the only two other drivers which forward mmap in their own
> > code (etnaviv and exynos) get this somewhat right by overwriting the
> > gem mmap code. But they seem to still have the leak. This might be a
> > good excuse to move these drivers over to shmem helpers completely.
> >
> > Note to stable team: There's a trivial context conflict with
> > d693def4fd1c ("drm: Remove obsolete GEM and PRIME callbacks from
> > struct drm_driver"). I'm assuming it's clear where to put the first
> > hunk, otherwise I can send a 5.9 version of this.
> >
> > Cc: Christian König <christian.koenig(a)amd.com>
> > Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> > Cc: Lucas Stach <l.stach(a)pengutronix.de>
> > Cc: Russell King <linux+etnaviv(a)armlinux.org.uk>
> > Cc: Christian Gmeiner <christian.gmeiner(a)gmail.com>
> > Cc: Inki Dae <inki.dae(a)samsung.com>
> > Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
> > Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
> > Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
> > Fixes: 26d3ac3cb04d ("drm/shmem-helpers: Redirect mmap for imported dma-buf")
> > Cc: Boris Brezillon <boris.brezillon(a)collabora.com>
>
> Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com>
Patch pushed to drm-misc-fixes, thanks for taking a look.
-Daniel
>
> > Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
> > Cc: Gerd Hoffmann <kraxel(a)redhat.com>
> > Cc: Rob Herring <robh(a)kernel.org>
> > Cc: dri-devel(a)lists.freedesktop.org
> > Cc: linux-media(a)vger.kernel.org
> > Cc: linaro-mm-sig(a)lists.linaro.org
> > Cc: <stable(a)vger.kernel.org> # v5.9+
> > Reported-and-tested-by: piotr.oniszczuk(a)gmail.com
> > Cc: piotr.oniszczuk(a)gmail.com
> > Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
> > ---
> > drivers/gpu/drm/drm_gem.c | 4 ++--
> > drivers/gpu/drm/drm_gem_shmem_helper.c | 7 ++++++-
> > 2 files changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> > index 1da67d34e55d..d586068f5509 100644
> > --- a/drivers/gpu/drm/drm_gem.c
> > +++ b/drivers/gpu/drm/drm_gem.c
> > @@ -1076,6 +1076,8 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
> > */
> > drm_gem_object_get(obj);
> >
> > + vma->vm_private_data = obj;
> > +
> > if (obj->funcs->mmap) {
> > ret = obj->funcs->mmap(obj, vma);
> > if (ret) {
> > @@ -1096,8 +1098,6 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
> > vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
> > }
> >
> > - vma->vm_private_data = obj;
> > -
> > return 0;
> > }
> > EXPORT_SYMBOL(drm_gem_mmap_obj);
> > diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > index fb11df7aced5..8233bda4692f 100644
> > --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> > +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > @@ -598,8 +598,13 @@ int drm_gem_shmem_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
> > /* Remove the fake offset */
> > vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
> >
> > - if (obj->import_attach)
> > + if (obj->import_attach) {
> > + /* Drop the reference drm_gem_mmap_obj() acquired.*/
> > + drm_gem_object_put(obj);
> > + vma->vm_private_data = NULL;
> > +
> > return dma_buf_mmap(obj->dma_buf, vma, 0);
> > + }
> >
> > shmem = to_drm_gem_shmem_obj(obj);
> >
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
When we forward an mmap to the dma_buf exporter, they get to own
everything. Unfortunately drm_gem_mmap_obj() overwrote
vma->vm_private_data after the driver callback, wreaking the
exporter complete. This was noticed because vb2_common_vm_close blew
up on mali gpu with panfrost after commit 26d3ac3cb04d
("drm/shmem-helpers: Redirect mmap for imported dma-buf").
Unfortunately drm_gem_mmap_obj also acquires a surplus reference that
we need to drop in shmem helpers, which is a bit of a mislayer
situation. Maybe the entire dma_buf_mmap forwarding should be pulled
into core gem code.
Note that the only two other drivers which forward mmap in their own
code (etnaviv and exynos) get this somewhat right by overwriting the
gem mmap code. But they seem to still have the leak. This might be a
good excuse to move these drivers over to shmem helpers completely.
Note to stable team: There's a trivial context conflict with
d693def4fd1c ("drm: Remove obsolete GEM and PRIME callbacks from
struct drm_driver"). I'm assuming it's clear where to put the first
hunk, otherwise I can send a 5.9 version of this.
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Lucas Stach <l.stach(a)pengutronix.de>
Cc: Russell King <linux+etnaviv(a)armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Fixes: 26d3ac3cb04d ("drm/shmem-helpers: Redirect mmap for imported dma-buf")
Cc: Boris Brezillon <boris.brezillon(a)collabora.com>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Gerd Hoffmann <kraxel(a)redhat.com>
Cc: Rob Herring <robh(a)kernel.org>
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v5.9+
Reported-and-tested-by: piotr.oniszczuk(a)gmail.com
Cc: piotr.oniszczuk(a)gmail.com
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/drm_gem.c | 4 ++--
drivers/gpu/drm/drm_gem_shmem_helper.c | 7 ++++++-
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 1da67d34e55d..d586068f5509 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1076,6 +1076,8 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
*/
drm_gem_object_get(obj);
+ vma->vm_private_data = obj;
+
if (obj->funcs->mmap) {
ret = obj->funcs->mmap(obj, vma);
if (ret) {
@@ -1096,8 +1098,6 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
}
- vma->vm_private_data = obj;
-
return 0;
}
EXPORT_SYMBOL(drm_gem_mmap_obj);
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index fb11df7aced5..8233bda4692f 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -598,8 +598,13 @@ int drm_gem_shmem_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
/* Remove the fake offset */
vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
- if (obj->import_attach)
+ if (obj->import_attach) {
+ /* Drop the reference drm_gem_mmap_obj() acquired.*/
+ drm_gem_object_put(obj);
+ vma->vm_private_data = NULL;
+
return dma_buf_mmap(obj->dma_buf, vma, 0);
+ }
shmem = to_drm_gem_shmem_obj(obj);
--
2.28.0
On Mon, Oct 26, 2020 at 05:23:48AM +0000, Parav Pandit wrote:
> Hi Christoph,
>
> > From: Jakub Kicinski <kuba(a)kernel.org>
> > Sent: Saturday, October 24, 2020 11:45 PM
> >
> > CC: rdma, looks like rdma from the stack trace
> >
> > On Fri, 23 Oct 2020 20:07:17 -0700 syzbot wrote:
> > > syzbot has found a reproducer for the following issue on:
> > >
> > > HEAD commit: 3cb12d27 Merge tag 'net-5.10-rc1' of git://git.kernel.org/..
>
> In [1] you mentioned that dma_mask should not be set for dma_virt_ops.
> So patch [2] removed it.
>
> But check to validate the dma mask for all dma_ops was added in [3].
>
> What is the right way? Did I misunderstood your comment about dma_mask in [1]?
No, I did not say we don't need the mask. I said copying over the
various dma-related fields from the parent is bogus.
I think rxe (and ther other drivers/infiniband/sw drivers) need a simple
dma_coerce_mask_and_coherent and nothing else.
From: Rob Clark <robdclark(a)chromium.org>
This doesn't remove *all* the struct_mutex, but it covers the worst
of it, ie. shrinker/madvise/free/retire. The submit path still uses
struct_mutex, but it still needs *something* serialize a portion of
the submit path, and lock_stat mostly just shows the lock contention
there being with other submits. And there are a few other bits of
struct_mutex usage in less critical paths (debugfs, etc). But this
seems like a reasonable step in the right direction.
v2: teach lockdep about shrinker locking patters (danvet) and
convert to obj->resv locking (danvet)
v3: fix get_vaddr locking for legacy userspace (relocs), devcoredump,
and rd/hangrd
v4: couple minor review comments (krh), fix deadlock with imported
dma-buf's (ie. from v4l2, etc)
Rob Clark (23):
drm/msm: Fix a couple incorrect usages of get_vaddr_active()
drm/msm/gem: Add obj->lock wrappers
drm/msm/gem: Rename internal get_iova_locked helper
drm/msm/gem: Move prototypes to msm_gem.h
drm/msm/gem: Add some _locked() helpers
drm/msm/gem: Move locking in shrinker path
drm/msm/submit: Move copy_from_user ahead of locking bos
drm/msm: Do rpm get sooner in the submit path
drm/msm/gem: Switch over to obj->resv for locking
drm/msm: Use correct drm_gem_object_put() in fail case
drm/msm: Drop chatty trace
drm/msm: Move update_fences()
drm/msm: Add priv->mm_lock to protect active/inactive lists
drm/msm: Document and rename preempt_lock
drm/msm: Protect ring->submits with it's own lock
drm/msm: Refcount submits
drm/msm: Remove obj->gpu
drm/msm: Drop struct_mutex from the retire path
drm/msm: Drop struct_mutex in free_object() path
drm/msm: Remove msm_gem_free_work
drm/msm: Drop struct_mutex in madvise path
drm/msm: Drop struct_mutex in shrinker path
drm/msm: Don't implicit-sync if only a single ring
drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 6 +-
drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 12 +-
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c | 1 +
drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c | 1 +
drivers/gpu/drm/msm/dsi/dsi_host.c | 1 +
drivers/gpu/drm/msm/msm_debugfs.c | 7 +
drivers/gpu/drm/msm/msm_drv.c | 21 +-
drivers/gpu/drm/msm/msm_drv.h | 73 +-----
drivers/gpu/drm/msm/msm_fbdev.c | 1 +
drivers/gpu/drm/msm/msm_gem.c | 271 +++++++++++-----------
drivers/gpu/drm/msm/msm_gem.h | 133 +++++++++--
drivers/gpu/drm/msm/msm_gem_shrinker.c | 81 ++-----
drivers/gpu/drm/msm/msm_gem_submit.c | 164 ++++++++-----
drivers/gpu/drm/msm/msm_gpu.c | 110 +++++----
drivers/gpu/drm/msm/msm_gpu.h | 5 +-
drivers/gpu/drm/msm/msm_rd.c | 2 +-
drivers/gpu/drm/msm/msm_ringbuffer.c | 3 +-
drivers/gpu/drm/msm/msm_ringbuffer.h | 13 +-
19 files changed, 506 insertions(+), 405 deletions(-)
--
2.26.2
i915 does tons of allocations from this worker, which lockdep catches.
Also generic infrastructure like this with big potential for how
dma_fence or other cross driver contracts work, really should be
reviewed on dri-devel. Implementing custom wheels for everything
within the driver is a classic case of "platform problem" [1]. Which in
upstream we really shouldn't have.
Since there's no quick way to solve these splats (dma_fence_work is
used a bunch in basic buffer management and command submission) like
for amdgpu, I'm giving up at this point here. Annotating i915
scheduler and gpu reset could would be interesting, but since lockdep
is one-shot we can't see what surprises would lurk there.
1: https://lwn.net/Articles/443531/
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: linux-rdma(a)vger.kernel.org
Cc: amd-gfx(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/i915/i915_sw_fence_work.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index a3a81bb8f2c3..5b74acadaef5 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -17,12 +17,15 @@ static void fence_work(struct work_struct *work)
{
struct dma_fence_work *f = container_of(work, typeof(*f), work);
int err;
+ bool fence_cookie;
+ fence_cookie = dma_fence_begin_signalling();
err = f->ops->work(f);
if (err)
dma_fence_set_error(&f->dma, err);
fence_complete(f);
+ dma_fence_end_signalling(fence_cookie);
dma_fence_put(&f->dma);
}
--
2.28.0