On Fri, May 03, 2024 at 06:54:22PM +0700, Bui Quang Minh wrote:
> [...]
> Root cause:
> AFAIK, eventpoll (epoll) does not increase the registered file's reference.
> To ensure the safety, when the registered file is deallocated in __fput,
> eventpoll_release is called to unregister the file from epoll. When calling
> poll on epoll, epoll will loop through registered files and call vfs_poll on
> these files. In the file's poll, file is guaranteed to be alive, however, as
> epoll does not increase the registered file's reference, the file may be
> dying
> and it's not safe the get the file for later use. And dma_buf_poll violates
> this. In the dma_buf_poll, it tries to get_file to use in the callback. This
> leads to a race where the dmabuf->file can be fput twice.
>
> Here is the race occurs in the above proof-of-concept
>
> close(dmabuf->file)
> __fput_sync (f_count == 1, last ref)
> f_count-- (f_count == 0 now)
> __fput
> epoll_wait
> vfs_poll(dmabuf->file)
> get_file(dmabuf->file)(f_count == 1)
> eventpoll_release
> dmabuf->file deallocation
> fput(dmabuf->file) (f_count == 1)
> f_count--
> dmabuf->file deallocation
>
> I am not familiar with the dma_buf so I don't know the proper fix for the
> issue. About the rule that don't get the file for later use in poll callback
> of
> file, I wonder if it is there when only select/poll exist or just after
> epoll
> appears.
>
> I hope the analysis helps us to fix the issue.
Thanks for doing this analysis! I suspect at least a start of a fix
would be this:
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 8fe5aa67b167..15e8f74ee0f2 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
if (events & EPOLLOUT) {
/* Paired with fput in dma_buf_poll_cb */
- get_file(dmabuf->file);
-
- if (!dma_buf_poll_add_cb(resv, true, dcb))
+ if (!atomic_long_inc_not_zero(&dmabuf->file) &&
+ !dma_buf_poll_add_cb(resv, true, dcb))
/* No callback queued, wake up any other waiters */
dma_buf_poll_cb(NULL, &dcb->cb);
else
@@ -290,9 +289,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
if (events & EPOLLIN) {
/* Paired with fput in dma_buf_poll_cb */
- get_file(dmabuf->file);
-
- if (!dma_buf_poll_add_cb(resv, false, dcb))
+ if (!atomic_long_inc_not_zero(&dmabuf->file) &&
+ !dma_buf_poll_add_cb(resv, false, dcb))
/* No callback queued, wake up any other waiters */
dma_buf_poll_cb(NULL, &dcb->cb);
else
But this ends up leaving "active" non-zero, and at close time it runs
into:
BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
But the bottom line is that get_file() is unsafe to use in some places,
one of which appears to be in the poll handler. There are maybe some
other fragile places too, like in drivers/gpu/drm/vmwgfx/ttm_object.c:
static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf)
{
return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L;
}
Which I also note involves a dmabuf...
Due to this issue I've proposed fixing get_file() to detect pathological states:
https://lore.kernel.org/lkml/20240502222252.work.690-kees@kernel.org/
But that has run into some push-back. I'm hoping that seeing this epoll
example will help illustrate what needs fixing a little better.
I think the best current proposal is to just WARN sooner instead of a
full refcount_t implementation:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8dfd53b52744..e09107d0a3d6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1040,7 +1040,8 @@ struct file_handle {
static inline struct file *get_file(struct file *f)
{
- atomic_long_inc(&f->f_count);
+ long prior = atomic_long_fetch_inc_relaxed(&f->f_count);
+ WARN_ONCE(!prior, "struct file::f_count incremented from zero; use-after-free condition present!\n");
return f;
}
What's the right way to deal with the dmabuf situation? (And I suspect
it applies to get_dma_buf_unless_doomed() as well...)
-Kees
--
Kees Cook
On Fri, May 03, 2024 at 01:35:09PM -0600, Jens Axboe wrote:
> On 5/3/24 1:22 PM, Kees Cook wrote:
> > On Fri, May 03, 2024 at 12:49:11PM -0600, Jens Axboe wrote:
> >> On 5/3/24 12:26 PM, Kees Cook wrote:
> >>> Thanks for doing this analysis! I suspect at least a start of a fix
> >>> would be this:
> >>>
> >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> >>> index 8fe5aa67b167..15e8f74ee0f2 100644
> >>> --- a/drivers/dma-buf/dma-buf.c
> >>> +++ b/drivers/dma-buf/dma-buf.c
> >>> @@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
> >>>
> >>> if (events & EPOLLOUT) {
> >>> /* Paired with fput in dma_buf_poll_cb */
> >>> - get_file(dmabuf->file);
> >>> -
> >>> - if (!dma_buf_poll_add_cb(resv, true, dcb))
> >>> + if (!atomic_long_inc_not_zero(&dmabuf->file) &&
> >>> + !dma_buf_poll_add_cb(resv, true, dcb))
> >>> /* No callback queued, wake up any other waiters */
> >>
> >> Don't think this is sane at all. I'm assuming you meant:
> >>
> >> atomic_long_inc_not_zero(&dmabuf->file->f_count);
> >
> > Oops, yes, sorry. I was typed from memory instead of copy/paste.
>
> Figured :-)
>
> >> but won't fly as you're not under RCU in the first place. And what
> >> protects it from being long gone before you attempt this anyway? This is
> >> sane way to attempt to fix it, it's completely opposite of what sane ref
> >> handling should look like.
> >>
> >> Not sure what the best fix is here, seems like dma-buf should hold an
> >> actual reference to the file upfront rather than just stash a pointer
> >> and then later _hope_ that it can just grab a reference. That seems
> >> pretty horrible, and the real source of the issue.
> >
> > AFAICT, epoll just doesn't hold any references at all. It depends,
> > I think, on eventpoll_release() (really eventpoll_release_file())
> > synchronizing with epoll_wait() (but I don't see how this happens, and
> > the race seems to be against ep_item_poll() ...?)
> >
> > I'm really confused about how eventpoll manages the lifetime of polled
> > fds.
>
> epoll doesn't hold any references, and it's got some ugly callback to
> deal with that. It's not ideal, nor pretty, but that's how it currently
> works. See eventpoll_release() and how it's called. This means that
> epoll itself is supposedly safe from the file going away, even though it
> doesn't hold a reference to it.
Right -- what remains unclear to me is how struct file lifetime is
expected to work in the struct file_operations::poll callbacks. Because
using get_file() there looks clearly unsafe...
> Except that in this case, the file is already gone by the time
> eventpoll_release() is called. Which presumably is some interaction with
> the somewhat suspicious file reference management that dma-buf is doing.
> But I didn't look into that much, outside of noting it looks a bit
> suspect.
Not yet, though. Here's (one) race state from the analysis. I added lines
for the dma_fence_add_callback()/dma_buf_poll_cb() case, since that's
the case that would escape any eventpoll_release/epoll_wait
synchronization (if it exists?):
close(dmabuf->file)
__fput_sync (f_count == 1, last ref)
f_count-- (f_count == 0 now)
__fput
epoll_wait
vfs_poll(dmabuf->file)
get_file(dmabuf->file)(f_count == 1)
dma_fence_add_callback()
eventpoll_release
dmabuf->file deallocation
dma_buf_poll_cb()
fput(dmabuf->file) (f_count == 1)
f_count--
dmabuf->file deallocation
Without fences to create a background callback, we just do a double-free:
close(dmabuf->file)
__fput_sync (f_count == 1, last ref)
f_count-- (f_count == 0 now)
__fput
epoll_wait
vfs_poll(dmabuf->file)
get_file(dmabuf->file)(f_count == 1)
dma_buf_poll_cb()
fput(dmabuf->file) (f_count == 1)
f_count--
eventpoll_release
dmabuf->file deallocation
eventpoll_release
dmabuf->file deallocation
get_file(), via epoll_wait()->vfs_poll()->dma_buf_poll(), has raised
f_count again. Then eventpoll_release() is doing things to remove
dmabuf->file from the eventpoll lists, but I *think* this is synchronized
so that an epoll_wait() will only call .poll handlers with a valid
(though possibly f_count==0) file, but I can't figure out where that
happens. (If it's not happening, we have a much bigger problem, but I
imagine we'd see massive corruption all the time, which we don't.)
So, yeah, I can't figure out how eventpoll_release() and epoll_wait()
are expected to behave safely for .poll handlers.
Regardless, for the simple case: it seems like it's just totally illegal
to use get_file() in a poll handler. Is this known/expected? And if so,
how can dmabuf possibly deal with that?
--
Kees Cook
On Sun, 5 May 2024 at 03:50, Christian Brauner <brauner(a)kernel.org> wrote:
>
> And I agree with you that for some instances it's valid to take another
> reference to a file from f_op->poll() but then they need to use
> get_file_active() imho and simply handle the case where f_count is zero.
I think this is
(a) practically impossible to find (since most f_count updates are in
various random helpers)
(b) not tenable in the first place, since *EVERYBODY* does a f_count
update as part of the bog-standard pollwait
So (b) means that the notion of "warn if somebody increments f_count
from zero" is broken to begin with - but it's doubly broken because it
wouldn't find anything *anyway*, since this never happens in any
normal situation.
And (a) means that any non-automatic finding of this is practically impossible.
> And we need to document that in Documentation/filesystems/file.rst or
> locking.rst.
WHY?
Why cannot you and Al just admit that the problem is in epoll. Always
has been, always will be.
The fact is, it's not dma-buf that is violating any rules. It's epoll.
It's calling out to random driver functions with a file pointer that
is no longer valid.
It really is that simple.
I don't see why you are arguing for "unknown number of drivers - we
know at least *one* - have to be fixed for a bug that is in epoll".
If it was *easy* to fix, and if it was *easy* to validate, then sure.
But that just isn't the case.
In contrast, in epoll it's *trivial* to fix the one case where it does
a VFS call-out, and just say "you have to follow the rules".
So explain to me again why you want to mess up the driver interface
and everybody who has a '.poll()' function, and not just fix the ONE
clearly buggy piece of code.
Because dammit,. epoll is clearly buggy. It's not enough to say "the
file allocation isn't going away", and claim that that means that it's
not buggy - when the file IS NO LONGER VALID!
Linus
On Fri, May 03, 2024 at 12:49:11PM -0600, Jens Axboe wrote:
> On 5/3/24 12:26 PM, Kees Cook wrote:
> > Thanks for doing this analysis! I suspect at least a start of a fix
> > would be this:
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 8fe5aa67b167..15e8f74ee0f2 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll)
> >
> > if (events & EPOLLOUT) {
> > /* Paired with fput in dma_buf_poll_cb */
> > - get_file(dmabuf->file);
> > -
> > - if (!dma_buf_poll_add_cb(resv, true, dcb))
> > + if (!atomic_long_inc_not_zero(&dmabuf->file) &&
> > + !dma_buf_poll_add_cb(resv, true, dcb))
> > /* No callback queued, wake up any other waiters */
>
> Don't think this is sane at all. I'm assuming you meant:
>
> atomic_long_inc_not_zero(&dmabuf->file->f_count);
Oops, yes, sorry. I was typed from memory instead of copy/paste.
> but won't fly as you're not under RCU in the first place. And what
> protects it from being long gone before you attempt this anyway? This is
> sane way to attempt to fix it, it's completely opposite of what sane ref
> handling should look like.
>
> Not sure what the best fix is here, seems like dma-buf should hold an
> actual reference to the file upfront rather than just stash a pointer
> and then later _hope_ that it can just grab a reference. That seems
> pretty horrible, and the real source of the issue.
AFAICT, epoll just doesn't hold any references at all. It depends,
I think, on eventpoll_release() (really eventpoll_release_file())
synchronizing with epoll_wait() (but I don't see how this happens, and
the race seems to be against ep_item_poll() ...?)
I'm really confused about how eventpoll manages the lifetime of polled
fds.
> > Due to this issue I've proposed fixing get_file() to detect pathological states:
> > https://lore.kernel.org/lkml/20240502222252.work.690-kees@kernel.org/
>
> I don't think this would catch this case, as the memory could just be
> garbage at this point.
It catches it just fine! :) I tested it against the published PoC.
And for cases where further allocations have progressed far enough to
corrupt the freed struct file and render the check pointless, nothing
different has happened than what happens today. At least now we have a
window to catch the situation across the time frame before it is both
reallocated _and_ the contents at the f_count offset gets changed to
non-zero.
--
Kees Cook
Hi
Am 02.05.24 um 14:00 schrieb Boris Brezillon:
> On Thu, 2 May 2024 13:59:41 +0200
> Boris Brezillon <boris.brezillon(a)collabora.com> wrote:
>
>> Hi Thomas,
>>
>> On Thu, 2 May 2024 13:51:16 +0200
>> Thomas Zimmermann <tzimmermann(a)suse.de> wrote:
>>
>>> Hi,
>>>
>>> ignoring my r-b on patch 1, I'd like to rethink the current patches in
>>> general.
>>>
>>> I think drm_gem_shmem_pin() should become the locked version of _pin(),
>>> so that drm_gem_shmem_object_pin() can call it directly. The existing
>>> _pin_unlocked() would not be needed any longer. Same for the _unpin()
>>> functions. This change would also fix the consistency with the semantics
>>> of the shmem _vmap() functions, which never take reservation locks.
>>>
>>> There are only two external callers of drm_gem_shmem_pin(): the test
>>> case and panthor. These assume that drm_gem_shmem_pin() acquires the
>>> reservation lock. The test case should likely call drm_gem_pin()
>>> instead. That would acquire the reservation lock and the test would
>>> validate that shmem's pin helper integrates well into the overall GEM
>>> framework. The way panthor uses drm_gem_shmem_pin() looks wrong to me.
>>> For now, it could receive a wrapper that takes the lock and that's it.
>> I do agree that the current inconsistencies in the naming is
>> troublesome (sometimes _unlocked, sometimes _locked, with the version
>> without any suffix meaning either _locked or _unlocked depending on
>> what the suffixed version does), and that's the very reason I asked
>> Dmitry to address that in his shrinker series [1]. So, ideally I'd
>> prefer if patches from Dmitry's series were applied instead of
>> trying to fix that here (IIRC, we had an ack from Maxime).
> With the link this time :-).
>
> [1]https://lore.kernel.org/lkml/20240105184624.508603-1-dmitry.osipenko@coll…
Thanks. I remember these patches. Somehow I thought they would have been
merged already. I wasn't super happy about the naming changes in patch
5, because the names of the GEM object callbacks do no longer correspond
with their implementations. But anyway.
If we go that direction, we should here simply push drm_gem_shmem_pin()
and drm_gem_shmem_unpin() into panthor and update the shmem tests with
drm_gem_pin(). Panfrost and lima would call drm_gem_shmem_pin_locked().
IMHO we should not promote the use of drm_gem_shmem_object_*()
functions, as they are meant to be callbacks for struct
drm_gem_object_funcs. (Auto-generating them would be nice.)
Best regards
Thomas
>
>> Regards,
>>
>> Boris
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
Hi,
ignoring my r-b on patch 1, I'd like to rethink the current patches in
general.
I think drm_gem_shmem_pin() should become the locked version of _pin(),
so that drm_gem_shmem_object_pin() can call it directly. The existing
_pin_unlocked() would not be needed any longer. Same for the _unpin()
functions. This change would also fix the consistency with the semantics
of the shmem _vmap() functions, which never take reservation locks.
There are only two external callers of drm_gem_shmem_pin(): the test
case and panthor. These assume that drm_gem_shmem_pin() acquires the
reservation lock. The test case should likely call drm_gem_pin()
instead. That would acquire the reservation lock and the test would
validate that shmem's pin helper integrates well into the overall GEM
framework. The way panthor uses drm_gem_shmem_pin() looks wrong to me.
For now, it could receive a wrapper that takes the lock and that's it.
Best regards
Thomas
Am 01.05.24 um 08:55 schrieb Adrián Larumbe:
> This is v3 of https://lore.kernel.org/dri-devel/20240424090429.57de7d1c@collabora.com/
>
> The goal of this patch series is fixing a deadlock upon locking the dma reservation
> of a DRM gem object when pinning it, at a prime import operation.
>
> Changes from v2:
> - Removed comment explaining reason why an already-locked
> pin function replaced the locked variant inside Panfrost's
> object pin callback.
> - Moved already-assigned attachment warning into generic
> already-locked gem object pin function
>
> Adrián Larumbe (2):
> drm/panfrost: Fix dma_resv deadlock at drm object pin time
> drm/gem-shmem: Add import attachment warning to locked pin function
>
> drivers/gpu/drm/drm_gem_shmem_helper.c | 2 ++
> drivers/gpu/drm/lima/lima_gem.c | 2 +-
> drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +-
> 3 files changed, 4 insertions(+), 2 deletions(-)
>
>
> base-commit: 75b68f22e39aafb22f3d8e3071e1aba73560788c
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
Hi
Am 01.05.24 um 08:55 schrieb Adrián Larumbe:
> When Panfrost must pin an object that is being prepared a dma-buf
> attachment for on behalf of another driver, the core drm gem object pinning
> code already takes a lock on the object's dma reservation.
>
> However, Panfrost GEM object's pinning callback would eventually try taking
> the lock on the same dma reservation when delegating pinning of the object
> onto the shmem subsystem, which led to a deadlock.
>
> This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which throws
> the following recursive locking situation:
>
> weston/3440 is trying to acquire lock:
> ffff000000e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper]
> but task is already holding lock:
> ffff000000e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm]
>
> Fix it by assuming the object's reservation had already been locked by the
> time we reach panfrost_gem_pin.
Maybe say that the reservation lock has been taken in drm_gem_pin()
>
> Do the same thing for the Lima driver, as it most likely suffers from the
> same issue.
Please split this patch into one for panfrost and one for lima. To each
patch, you can add
Reviewed-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Best regards
Thomas
>
> Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
> Cc: Dmitry Osipenko <dmitry.osipenko(a)collabora.com>
> Cc: Boris Brezillon <boris.brezillon(a)collabora.com>
> Cc: Steven Price <steven.price(a)arm.com>
> Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()")
> Signed-off-by: Adrián Larumbe <adrian.larumbe(a)collabora.com>
> ---
> drivers/gpu/drm/lima/lima_gem.c | 2 +-
> drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
> index 7ea244d876ca..c4e0f9faaa47 100644
> --- a/drivers/gpu/drm/lima/lima_gem.c
> +++ b/drivers/gpu/drm/lima/lima_gem.c
> @@ -185,7 +185,7 @@ static int lima_gem_pin(struct drm_gem_object *obj)
> if (bo->heap_size)
> return -EINVAL;
>
> - return drm_gem_shmem_pin(&bo->base);
> + return drm_gem_shmem_object_pin(obj);
> }
>
> static int lima_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
> index d47b40b82b0b..f268bd5c2884 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gem.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
> @@ -192,7 +192,7 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
> if (bo->is_heap)
> return -EINVAL;
>
> - return drm_gem_shmem_pin(&bo->base);
> + return drm_gem_shmem_object_pin(obj);
> }
>
> static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj)
--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
On Fri, Apr 26, 2024 at 07:22:34PM +0200, Alexandre Mergnat wrote:
> Add audio clock wrapper and audio tuner control.
Please submit patches using subject lines reflecting the style for the
subsystem, this makes it easier for people to identify relevant patches.
Look at what existing commits in the area you're changing are doing and
make sure your subject lines visually resemble what they're doing.
There's no need to resubmit to fix this alone.
On 26/04/2024 19:22, Alexandre Mergnat wrote:
> regulators:
> type: object
> $ref: /schemas/regulator/mediatek,mt6357-regulator.yaml
> @@ -83,6 +111,12 @@ examples:
> interrupt-controller;
> #interrupt-cells = <2>;
>
> + audio-codec {
> + mediatek,micbias0-microvolt = <1700000>;
> + mediatek,micbias1-microvolt = <1700000>;
> + vaud28-supply = <&mt6357_vaud28_reg>;
And now you should see how odd it looks. Supplies are part of entire
chip, not subblock, even if they supply dedicated domain within that chip.
That's why I asked to put it in the parent node.
Best regards,
Krzysztof