RFC
I don't have a good name for this yet and I did not spend
any time on documentation (for that reason).
We create fences (out-fences) as part of operation execution. These
are short-lived objects, so we want to release all of their memory
as soon as operation execution is completed or the operation gets
cancelled/deleted via ioctl().
This creates a bit of a problem. DMA fences are refcounted objects
and the exporter never knows when an importer takes a fence reference
or drops it, so the exporter never knows when the fence will be
destroyed. That should not be a problem for refcounted objects, but
here comes the twist...
operation A - creates and exports out fence X
... user-space imports fence X
operation A - finishes execution, signals fence X
kfree operation A, put dma_fence
DMA fences are designed to borrow the spinlock that they then use to
protect their own struct dma_fence members:
struct dma_fence {
	spinlock_t *lock;
	const struct dma_fence_ops *ops;
	.....
};

void dma_fence_init(struct dma_fence *fence,
		    const struct dma_fence_ops *ops,
		    spinlock_t *lock,
		    u64 context,
		    u64 seqno);
So the `lock` must have at least the same lifespan as the DMA fence
that borrows it, which is impossible to guarantee in our case. When
we kfree the operation A struct we also kfree the ->lock that the
operation lends to the DMA fence, which outlives operation A
(depending on what fence importers do and when they drop their
imported fence refcounts).
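A minimal sketch of the pattern that goes wrong (struct op,
op_fence_ops and the allocation details are hypothetical, for
illustration only):

struct op {
	spinlock_t lock;		/* lent to the out fence */
	struct dma_fence *out_fence;	/* refcounted, may outlive op */
};

static int op_create(struct op *op, u64 context, u64 seqno)
{
	op->out_fence = kzalloc(sizeof(*op->out_fence), GFP_KERNEL);
	if (!op->out_fence)
		return -ENOMEM;

	spin_lock_init(&op->lock);
	/* The fence borrows op->lock for its entire lifetime */
	dma_fence_init(op->out_fence, &op_fence_ops, &op->lock,
		       context, seqno);
	return 0;
}

static void op_destroy(struct op *op)
{
	dma_fence_signal(op->out_fence);
	dma_fence_put(op->out_fence);	/* importers may still hold refs */
	kfree(op);			/* ->lock dies while the fence may live on */
}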
This patch adds a new member to struct dma_fence: __lock_inplace.
This is a lock that the DMA fence will use to protect its own data
when it cannot reliably borrow a lock from an outside object.
I also had a patch that puts the inplace and borrowed locks into an
unnamed union and adds one more dma_fence_flag_bits entry to
distinguish between fences with borrowed and inplace locks:
struct dma_fence {
	union {
		spinlock_t *lock;
		spinlock_t __lock_inplace;
	};
	...
};
And then instead of locking/unlocking ->lock directly we would use
dma_fence_lock_irqsave()/dma_fence_unlock_irqrestore() macros, which
would check the fence flags and use either the borrowed or the
inplace lock.
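A rough sketch of what those macros could have looked like (the
DMA_FENCE_FLAG_INPLACE_LOCK_BIT name is made up, it is not part of
this patch):

#define dma_fence_lock_irqsave(fence, irqflags)				\
do {									\
	if (test_bit(DMA_FENCE_FLAG_INPLACE_LOCK_BIT, &(fence)->flags))	\
		spin_lock_irqsave(&(fence)->__lock_inplace, irqflags);	\
	else								\
		spin_lock_irqsave((fence)->lock, irqflags);		\
} while (0)

#define dma_fence_unlock_irqrestore(fence, irqflags)			\
do {									\
	if (test_bit(DMA_FENCE_FLAG_INPLACE_LOCK_BIT, &(fence)->flags))	\
		spin_unlock_irqrestore(&(fence)->__lock_inplace, irqflags); \
	else								\
		spin_unlock_irqrestore((fence)->lock, irqflags);	\
} while (0)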
But after seeing how often drivers directly access fence ->lock I
decided to scratch that approach and just add an extra spinlock member.
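With dma_fence_inplace_lock_init() from the patch below, the
hypothetical op_create() from the earlier sketch stops lending
anything to the fence:

static int op_create(struct op *op, u64 context, u64 seqno)
{
	op->out_fence = kzalloc(sizeof(*op->out_fence), GFP_KERNEL);
	if (!op->out_fence)
		return -ENOMEM;

	/*
	 * The fence is protected by its own embedded lock, which lives
	 * exactly as long as the fence itself, so kfree(op) can no
	 * longer pull the lock out from under fence importers.
	 */
	dma_fence_inplace_lock_init(op->out_fence, &op_fence_ops,
				    context, seqno);
	return 0;
}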
Not-Yet-Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/dma-buf/dma-fence.c | 10 ++++++++++
include/linux/dma-fence.h | 6 ++++++
2 files changed, 16 insertions(+)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 066400ed8841..7ae40b8adb73 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -958,3 +958,13 @@ dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 	trace_dma_fence_init(fence);
 }
 EXPORT_SYMBOL(dma_fence_init);
+
+void dma_fence_inplace_lock_init(struct dma_fence *fence,
+				 const struct dma_fence_ops *ops,
+				 u64 context, u64 seqno)
+{
+	spin_lock_init(&fence->__lock_inplace);
+
+	dma_fence_init(fence, ops, &fence->__lock_inplace, context, seqno);
+}
+EXPORT_SYMBOL(dma_fence_inplace_lock_init);
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 1ea691753bd3..6b15a0d2eccf 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -64,6 +64,8 @@ struct dma_fence_cb;
  */
 struct dma_fence {
 	spinlock_t *lock;
+	spinlock_t __lock_inplace;
+
 	const struct dma_fence_ops *ops;
 	/*
 	 * We clear the callback list on kref_put so that by the time we
@@ -262,6 +264,10 @@ struct dma_fence_ops {
 void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
 		    spinlock_t *lock, u64 context, u64 seqno);
 
+void dma_fence_inplace_lock_init(struct dma_fence *fence,
+				 const struct dma_fence_ops *ops,
+				 u64 context, u64 seqno);
+
 void dma_fence_release(struct kref *kref);
 void dma_fence_free(struct dma_fence *fence);
 void dma_fence_describe(struct dma_fence *fence, struct seq_file *seq);
--
2.36.1.124.g0e6072fb45-goog
Hi,
On Wed, 25 May 2022 at 15:07, Simon Ser <contact@emersion.fr> wrote:
> On Wednesday, May 25th, 2022 at 15:51, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > You can add that to the list of reasons why compositors need to stop
> > > using buffers with unsignaled fences. There's plenty of other reasons
> > > there already (the big one being that otherwise slow clients can slow
> > > down the compositor, even if the compositor uses a high priority context
> > > and the HW supports preemption).
> >
> >
> > Yeah that's tbh another reason why I think we shouldn't do umf as a
> > transparent thing - compositors need to get better anyway, so we might as
> > well take this as a chance to do this right.
>
> As a compositor dev, I agree -- we should definitely be smarter about
> this. Note, it would help a lot to have a good way to integrate the
> waits into a poll(2) event loop.
The same holds for Weston. We're currently working through a bunch of
internal infrastructure to be able to handle this. Mutter (aka GNOME)
is also really well-placed to be able to do this.
Having pollable waits would be really useful, but I don't think it's
essential. In my strawman I'm just waking up at the usual
just-before-repaint point and checking; if it doesn't make it for this
frame then we'll wait for the next frame. If someone submits buffers
which take 4 repaint periods to clear then we'll have 3 'unnecessary'
wakeups, but given the GPU is already slammed, I don't think it's an
efficiency problem. (I don't know if all the other compositor people
share this view.)
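For fences which reach the compositor as sync_file fds, that
just-before-repaint check can be a zero-timeout poll(2). A minimal
userspace sketch, assuming the client's out-fence has been exported
as a sync_file fd:

#include <poll.h>
#include <stdbool.h>

/*
 * Non-blocking check of an exported fence: a sync_file fd reports
 * POLLIN once the underlying fence has signalled, and a timeout of 0
 * means we never stall the repaint loop; an unready buffer is simply
 * deferred to the next frame.
 */
static bool fence_ready(int sync_file_fd)
{
	struct pollfd pfd = {
		.fd = sync_file_fd,
		.events = POLLIN,
	};

	return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
}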
Cheers,
Daniel
On Wed, May 25, 2022 at 03:28:41PM +0200, Michel Dänzer wrote:
> On 2022-05-25 15:05, Daniel Vetter wrote:
> > On Tue, May 17, 2022 at 12:28:17PM +0200, Christian König wrote:
> >> Am 09.05.22 um 16:10 schrieb Daniel Vetter:
> >>> On Mon, May 09, 2022 at 08:56:41AM +0200, Christian König wrote:
> >>>> Am 04.05.22 um 12:08 schrieb Daniel Vetter:
> >>>>>
> >>>>> If the goal is specifically atomic kms, then there's an entire can of
> >>>>> worms there that I really don't want to think about, but it exists: We
> >>>>> have dma_fence as out-fences from atomic commit, and that's already
> >>>>> massively broken since most drivers allocate some memory or at least take
> >>>>> locks which can allocate memory in their commit path. Like i2c. Putting a
> >>>>> userspace memory fence as in-fence in there makes that problem
> >>>>> substantially worse, since at least in theory you're just not allowed to
> >>>>> might_fault in atomic_commit_tail.
> >>>> Yes, that's unfortunately one of the goals as well and yes I completely
> >>>> agree on the can of worms. But I think I've solved that.
> >>>>
> >>>> What I do in the patch set is to enforce that the out fence is a user fence
> >>>> when the driver supports user in fences as well.
> >>>>
> >>>> Since user fences don't have the memory management dependency, drivers can
> >>>> actually allocate memory or call I2C functions which take locks that have
> >>>> memory allocation dependencies.
> >>>>
> >>>> Or do I miss some other reason why you can't fault or allocate memory in
> >>>> atomic_commit_tail? At least lockdep seems to be happy about that now.
> >>> The problem is a bit that this breaks the uapi already. At least if the
> >>> goal is to have this all be perfectly transparent for userspace - as soon
> >>> as you have multi-gpu setups going on at least.
> >>
> >> Question here is why do you think there is a UAPI break? We currently wait
> >> in a work item already, so where exactly is the problem?
> >
> > It's a bit washy, but dma_fence and hence implicit sync is supposed to
> > finish in finite time. umf just doesn't.
> >
> > Ofc in reality you can still flood your compositor and they're not very
> > robust, but with umf it's trivial to just hang your compositor forever and
> > nothing happens.
>
> You can add that to the list of reasons why compositors need to stop
> using buffers with unsignaled fences. There's plenty of other reasons
> there already (the big one being that otherwise slow clients can slow
> down the compositor, even if the compositor uses a high priority context
> and the HW supports preemption).
Yeah that's tbh another reason why I think we shouldn't do umf as a
transparent thing - compositors need to get better anyway, so we might as
well take this as a chance to do this right.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch