Consider a fence design where signal() consumes self. Now consider this:
impl FenceCb for MyCallback { fn called(&mut self) { // Can't move the fence out, so we have to put an Option<T> just to be able // to move. if let Some(f) = self.some_fence.take() { f.signal(); } }This used to be the case when our version of the job queue used the "proxy fence" design:
// Callback on the hw fence impl FenceCb for MyCallback { fn called(&mut self) { if let Some(f) = self.submit_fence.take() { f.signal(); }I'm pretty sure lockdep won't like it anyway, because this is nested locking of the same lock class. For such proxies, we'll need to teach lockdep about the nesting like has been recently done on dma_fence_array & co. But I'm digressing.
Yeah, but this is more about resource transfer in general, not this pattern specifically.
I agree that this has issues, and yes, lockdep complained back then :)
The thing is, there's so many aspects that could go wrong because of the context this callback is called in. Nested locking is one of them, the fact we can't sleep is another. And with rust it's even worse, because of the implicit drops that will happen when you take ownership of resources (taking sleeping locks to remove resources from a dataset for instance).
So, by passing self by value to the ::callback(), you're basically telling users "hey, BTW, don't forget to defer the drop to some workqueue if you think it's not atomic-safe". And how can users know that the thing they're about to drop can be dropped in atomic context?
Can’t we create a token type that signals we’re in atomic context? If we pass this token type as an argument, perhaps lockdep can check it for us?
Perhaps
enum SomeFancyName { Atomic(AtomicToken) NotAtomic, }
signaled(self, atomic: AtomicToken) { .. }
They basically have to audit the ::drop() of all the resources they embed in their type implementing FenceCb. Not only that, but they also have to design the thing so the deferral of this ::drop() doesn't allocate, because, obviously, allocating in atomic context is tricky/fallible. AFAIK, none of this can be spot at compile-time (I remember Gary/Danilo mentioning that we could teach the klint about some of these rules). This would leave us with runtime checks like might_sleep(), but most of the C putters (xxx_put(object)) don't have might_sleep() in the path where the decref doesn't lead to a refcnt=0 situation.
TLDR; Call this PTSD if you want, but this is the sort of bugs I struggled with on the C side, and I can predict that the exact same will happen in rust drivers if we expose the FenceCb as it is designed here and we don't have a way to check the soundness of the FenceCb implementations at compile time.
The other option (the one I've been advocating for from the start), is to not let drivers implement FenceCb (make it private), but instead have a bunch of implementations that we know are safe. Here's a list of implementations that I think would unblock most of the drivers use cases:
- wakeup a thread
- complete a completion object
- schedule a WorkItem
- schedule a kthread_worker (once we get a proper rust abstraction for
that)
This can also work too, I guess.
It doesn't mean we can't have optimized FenceCb implementations that do a lot more in the callback() path instead of deferring to a workqueue/thread, but at least those would have to be implemented in dma_fence.rs, and the dma_fence.rs maintainers can then carefully audit the code as part of the review process, which we know is not really the case when changes touch drivers code only.
FWIW, I think the FenceProxy design you were describing falls into this "must be carefully audited" bucket, and should be implemented in dma_fence.rs.
}
Although this is not the case anymore, since we phased out this design given Christian's recent work. Still, we should ideally not require Option<T> here in general just to make resource transfer possible.I see. OTOH, don't we need to make this inner data movable if we want to cancel the FenceCb before the fence is signaled anyway? And that's most certainly a case we have in the teardown path.
Can you expand a bit on what you mean here?
Never mind, I was confusing two different iterations of the code here. I thought the Option<T> you were mentioning was in FenceCbRegistration<T>, with some explicit ::cancel() function that would return Option<T> so the user can get its resources back when it cancels the registration, and also know whether the callback was called or not. But this is all gone now, and all we can do is drop the registration, which will automatically drop the inner T.