On Sat, 4 May 2024 at 02:37, Christian Brauner brauner@kernel.org wrote:
--- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -244,13 +244,18 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll) if (!dmabuf || !dmabuf->resv) return EPOLLERR;
if (!get_file_active(&dmabuf->file))
return EPOLLERR;
[...]
I *really* don't think anything that touches dma-buf.c can possibly be right.
This is not a dma-buf.c bug.
This is purely an epoll bug.
Lookie here, the fundamental issue is that epoll can call '->poll()' on a file descriptor that is being closed concurrently.
That means that *ANY* driver that relies on *any* data structure that is managed by the lifetime of the 'struct file' will have problems.
Look, here's sock_poll():
static __poll_t sock_poll(struct file *file, poll_table *wait) { struct socket *sock = file->private_data;
and that first line looks about as innocent as it possibly can, right?
Now, imagine that this is called from 'epoll' concurrently with the file being closed for the last time (but it just hasn't _quite_ reached eventpoll_release() yet).
Now, imagine that the kernel is built with preemption, and the epoll thread gets preempted _just_ before it loads 'file->private_data'.
Furthermore, the machine is under heavy load, and it just stays off its CPU a long time.
Now, during this TOTALLY INNOCENT sock_poll(), in another thread, the file closing completes, eventpoll_release() finishes, and the preemption of the poll() thing just takes so long that you go through an RCU period too, so that the actual file has been released too.
So now that totally innoced file->private_data load in the poll() is probably going to get random data.
Yes, the file is allocated as SLAB_TYPESAFE_BY_RCU, so it's probably still a file. Not guaranteed, even the slab will get fully free'd in some situations. And yes, the above case is impossible to hit in practice. You have to hit quite the small race window with an operation that practically never happens in the first place.
But my point is that the fact that the problem with file->f_count lifetimes happens for that dmabuf driver is not the fault of the dmabuf code. Not at all.
It is *ENTIRELY* a bug in epoll, and the dmabuf code is probably just easier to hit because it has a poll() function that does things that have longer lifetimes than most things, and interacts more directly with that f_count.
So I really don't understand why Al thinks this is "dmabuf does bad things with f_count". It damn well does not. dma-buf is the GOOD GUY here. It's doing things *PROPERLY*. It's taking refcounts like it damn well should.
The fact that it takes ref-counts on something that the epoll code has messed up is *NOT* its fault.
Linus