If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the following kernel warning appears:
WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8 Call trace: kiocb_set_cancel_fn+0x9c/0xa8 ffs_epfile_read_iter+0x144/0x1d0 io_read+0x19c/0x498 io_issue_sqe+0x118/0x27c io_submit_sqes+0x25c/0x5fc __arm64_sys_io_uring_enter+0x104/0xab0 invoke_syscall+0x58/0x11c el0_svc_common+0xb4/0xf4 do_el0_svc+0x2c/0xb0 el0_svc+0x2c/0xa4 el0t_64_sync_handler+0x68/0xb4 el0t_64_sync+0x1a4/0x1a8
Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is submitted by libaio.
Suggested-by: Jens Axboe axboe@kernel.dk Cc: Christoph Hellwig hch@lst.de Cc: Avi Kivity avi@scylladb.com Cc: Sandeep Dhavale dhavale@google.com Cc: Jens Axboe axboe@kernel.dk Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Kent Overstreet kent.overstreet@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Bart Van Assche bvanassche@acm.org --- fs/aio.c | 9 ++++++++- include/linux/fs.h | 2 ++ 2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/aio.c b/fs/aio.c index bb2ff48991f3..da18dbcfcb22 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -593,6 +593,13 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel) struct kioctx *ctx = req->ki_ctx; unsigned long flags;
+ /* + * kiocb didn't come from aio or is neither a read nor a write, hence + * ignore it. + */ + if (!(iocb->ki_flags & IOCB_AIO_RW)) + return; + if (WARN_ON_ONCE(!list_empty(&req->ki_list))) return;
@@ -1509,7 +1516,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) req->ki_complete = aio_complete_rw; req->private = NULL; req->ki_pos = iocb->aio_offset; - req->ki_flags = req->ki_filp->f_iocb_flags; + req->ki_flags = req->ki_filp->f_iocb_flags | IOCB_AIO_RW; if (iocb->aio_flags & IOCB_FLAG_RESFD) req->ki_flags |= IOCB_EVENTFD; if (iocb->aio_flags & IOCB_FLAG_IOPRIO) { diff --git a/include/linux/fs.h b/include/linux/fs.h index c9deac59c29a..d47306fe1121 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -352,6 +352,8 @@ enum rw_hint { * unrelated IO (like cache flushing, new IO generation, etc). */ #define IOCB_DIO_CALLER_COMP (1 << 22) +/* kiocb is a read or write operation submitted by fs/aio.c. */ +#define IOCB_AIO_RW (1 << 23)
/* for use in trace events */ #define TRACE_IOCB_STRINGS \
On 2/15/24 1:47 PM, Bart Van Assche wrote:
If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the following kernel warning appears:
WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8 Call trace: kiocb_set_cancel_fn+0x9c/0xa8 ffs_epfile_read_iter+0x144/0x1d0 io_read+0x19c/0x498 io_issue_sqe+0x118/0x27c io_submit_sqes+0x25c/0x5fc __arm64_sys_io_uring_enter+0x104/0xab0 invoke_syscall+0x58/0x11c el0_svc_common+0xb4/0xf4 do_el0_svc+0x2c/0xb0 el0_svc+0x2c/0xa4 el0t_64_sync_handler+0x68/0xb4 el0t_64_sync+0x1a4/0x1a8
Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is submitted by libaio.
Like I said weeks ago, let's please get this fix in NOW and we can debate what to do about cancelations in general for aio separately. This patch 1 is a real fix, and it'd be silly to keep this stalled while the other stuff is ongoing.
This isn't a critique at at you Bart, really just wanted to reply to teh cover letter but there isn't one.
Christian, can you queue this up for 6.8 and mark it for stable?
On Thu, 15 Feb 2024 12:47:38 -0800, Bart Van Assche wrote:
If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the following kernel warning appears:
WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8 Call trace: kiocb_set_cancel_fn+0x9c/0xa8 ffs_epfile_read_iter+0x144/0x1d0 io_read+0x19c/0x498 io_issue_sqe+0x118/0x27c io_submit_sqes+0x25c/0x5fc __arm64_sys_io_uring_enter+0x104/0xab0 invoke_syscall+0x58/0x11c el0_svc_common+0xb4/0xf4 do_el0_svc+0x2c/0xb0 el0_svc+0x2c/0xa4 el0t_64_sync_handler+0x68/0xb4 el0t_64_sync+0x1a4/0x1a8
[...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree. Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs.fixes
[1/2] fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio https://git.kernel.org/vfs/vfs/c/b820de741ae4
On Thu, Feb 15, 2024 at 12:47:38PM -0800, Bart Van Assche wrote:
void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel) { struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw); struct kioctx *ctx = req->ki_ctx; unsigned long flags;
- /*
* kiocb didn't come from aio or is neither a read nor a write, hence
* ignore it.
*/
- if (!(iocb->ki_flags & IOCB_AIO_RW))
return;
- if (WARN_ON_ONCE(!list_empty(&req->ki_list))) return;
If I understand correctly, this patch is supposed to fix a memory safety bug when kiocb_set_cancel_fn() is called on a kiocb that is owned by io_uring instead of legacy AIO. However, the kiocb still gets accessed as an aio_kiocb at the very beginning of the function, so it's still broken:
struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw); struct kioctx *ctx = req->ki_ctx;
I'm also wondering why "ignore" is the right fix. The USB gadget driver sees that it has asynchronous I/O (kiocb::ki_complete != NULL) and then tries to set a cancellation function. What is the expected behavior when the I/O is owned by io_uring? Should it perhaps call into io_uring to set a cancellation function with io_uring? Or is the concept of cancellation functions indeed specific to legacy AIO, and nothing should be done with io_uring I/O?
- Eric
On 3/4/24 11:10, Eric Biggers wrote:
If I understand correctly, this patch is supposed to fix a memory safety bug when kiocb_set_cancel_fn() is called on a kiocb that is owned by io_uring instead of legacy AIO. However, the kiocb still gets accessed as an aio_kiocb at the very beginning of the function, so it's still broken:
struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw); struct kioctx *ctx = req->ki_ctx;
Hi Eric,
Thanks for having reported this. I agree that this needs to be fixed.
I'm also wondering why "ignore" is the right fix. The USB gadget driver sees that it has asynchronous I/O (kiocb::ki_complete != NULL) and then tries to set a cancellation function. What is the expected behavior when the I/O is owned by io_uring? Should it perhaps call into io_uring to set a cancellation function with io_uring? Or is the concept of cancellation functions indeed specific to legacy AIO, and nothing should be done with io_uring I/O?
As far as I know no Linux user space interface for submitting I/O supports cancellation of read or write requests other than the AIO io_cancel() system call.
It would make it easier to maintain the kernel if I/O cancellation support would be removed. However, there is existing user space code that depends on USB I/O cancellation so I'm not sure how to proceed to remove AIO io_cancel() support from the kernel.
Thanks,
Bart.
On 3/4/24 12:43 PM, Bart Van Assche wrote:
I'm also wondering why "ignore" is the right fix. The USB gadget driver sees that it has asynchronous I/O (kiocb::ki_complete != NULL) and then tries to set a cancellation function. What is the expected behavior when the I/O is owned by io_uring? Should it perhaps call into io_uring to set a cancellation function with io_uring? Or is the concept of cancellation functions indeed specific to legacy AIO, and nothing should be done with io_uring I/O?
As far as I know no Linux user space interface for submitting I/O supports cancellation of read or write requests other than the AIO io_cancel() system call.
Not true, see previous reply (on both points in this email). The kernel in general does not support cancelation of regular file/storage IO that has submitted. That includes aio. There are many reasons for this.
For anything but that, you can most certainly cancel inflight IO with io_uring, be it to a socket, pipe, whatever.
The problem here isn't that only aio supports cancelations, it's that the code to do so is a bad hack.
On 3/4/24 12:21, Jens Axboe wrote:
On 3/4/24 12:43 PM, Bart Van Assche wrote:
As far as I know no Linux user space interface for submitting I/O supports cancellation of read or write requests other than the AIO io_cancel() system call.
Not true, see previous reply (on both points in this email). The kernel in general does not support cancelation of regular file/storage IO that has submitted. That includes aio. There are many reasons for this.
For anything but that, you can most certainly cancel inflight IO with io_uring, be it to a socket, pipe, whatever.
The problem here isn't that only aio supports cancelations, it's that the code to do so is a bad hack.
What I meant is that the AIO code is the only code I know of that supports cancelling I/O from user space after the I/O has been submitted to the driver that will process the I/O request (e.g. a USB driver). Is my understanding correct that io_uring cancellation involves setting the IO_WQ_WORK_CANCEL flag and also that that flag is ignored by io_wq_submit_work() after io_assign_file() has been called? The AIO code supports cancelling I/O after call_read_iter() or call_write_iter() has been called.
Thanks,
Bart.
On 3/5/24 1:43 PM, Bart Van Assche wrote:
On 3/4/24 12:21, Jens Axboe wrote:
On 3/4/24 12:43 PM, Bart Van Assche wrote:
As far as I know no Linux user space interface for submitting I/O supports cancellation of read or write requests other than the AIO io_cancel() system call.
Not true, see previous reply (on both points in this email). The kernel in general does not support cancelation of regular file/storage IO that has submitted. That includes aio. There are many reasons for this.
For anything but that, you can most certainly cancel inflight IO with io_uring, be it to a socket, pipe, whatever.
The problem here isn't that only aio supports cancelations, it's that the code to do so is a bad hack.
What I meant is that the AIO code is the only code I know of that supports cancelling I/O from user space after the I/O has been submitted to the driver that will process the I/O request (e.g. a USB driver). Is
Right, we never offered that in general in the kernel. Like I said, it's just a hack what is there for that.
my understanding correct that io_uring cancellation involves setting the IO_WQ_WORK_CANCEL flag and also that that flag is ignored by io_wq_submit_work() after io_assign_file() has been called?
No, that's only for requests that go via io-wq, which is generally not the fast path and most workloads will never see that.
For anything else, cancelation can very much happen at any time.
The AIO code supports cancelling I/O after call_read_iter() or call_write_iter() has been called.
...for the above hacky case, and that's it, not as a generic thing.
On 3/4/24 12:10 PM, Eric Biggers wrote:
On Thu, Feb 15, 2024 at 12:47:38PM -0800, Bart Van Assche wrote:
void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel) { struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw); struct kioctx *ctx = req->ki_ctx; unsigned long flags;
- /*
* kiocb didn't come from aio or is neither a read nor a write, hence
* ignore it.
*/
- if (!(iocb->ki_flags & IOCB_AIO_RW))
return;
- if (WARN_ON_ONCE(!list_empty(&req->ki_list))) return;
If I understand correctly, this patch is supposed to fix a memory safety bug when kiocb_set_cancel_fn() is called on a kiocb that is owned by io_uring instead of legacy AIO. However, the kiocb still gets accessed as an aio_kiocb at the very beginning of the function, so it's still broken:
struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw); struct kioctx *ctx = req->ki_ctx;
Doesn't matter, they are both just pointer math. But it'd look cleaner if it was below.
I'm also wondering why "ignore" is the right fix. The USB gadget driver sees that it has asynchronous I/O (kiocb::ki_complete != NULL) and then tries to set a cancellation function. What is the expected behavior when the I/O is owned by io_uring? Should it perhaps call into io_uring to set a cancellation function with io_uring? Or is the concept of cancellation functions indeed specific to legacy AIO, and nothing should be done with io_uring I/O?
Because the ->ki_cancel() is a hack, as demonstrated by this issue in teh first place, which is a gross layering violation. io_uring supports proper cancelations, invoked from userspace. It would never have worked with this scheme.
On Mon, Mar 04, 2024 at 01:09:15PM -0700, Jens Axboe wrote:
If I understand correctly, this patch is supposed to fix a memory safety bug when kiocb_set_cancel_fn() is called on a kiocb that is owned by io_uring instead of legacy AIO. However, the kiocb still gets accessed as an aio_kiocb at the very beginning of the function, so it's still broken:
struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw); struct kioctx *ctx = req->ki_ctx;
Doesn't matter, they are both just pointer math. But it'd look cleaner if it was below.
It dereferences the pointer.
I'm also wondering why "ignore" is the right fix. The USB gadget driver sees that it has asynchronous I/O (kiocb::ki_complete != NULL) and then tries to set a cancellation function. What is the expected behavior when the I/O is owned by io_uring? Should it perhaps call into io_uring to set a cancellation function with io_uring? Or is the concept of cancellation functions indeed specific to legacy AIO, and nothing should be done with io_uring I/O?
Because the ->ki_cancel() is a hack, as demonstrated by this issue in teh first place, which is a gross layering violation. io_uring supports proper cancelations, invoked from userspace. It would never have worked with this scheme.
Maybe kiocb_set_cancel_fn() should have a comment that explains this?
- Eric
On 3/4/24 1:49 PM, Eric Biggers wrote:
On Mon, Mar 04, 2024 at 01:09:15PM -0700, Jens Axboe wrote:
If I understand correctly, this patch is supposed to fix a memory safety bug when kiocb_set_cancel_fn() is called on a kiocb that is owned by io_uring instead of legacy AIO. However, the kiocb still gets accessed as an aio_kiocb at the very beginning of the function, so it's still broken:
struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw); struct kioctx *ctx = req->ki_ctx;
Doesn't matter, they are both just pointer math. But it'd look cleaner if it was below.
It dereferences the pointer.
Oops yes, was too focused on the container_of(). We should move them down, one for clarity and one for not dereferencing it.
I'm also wondering why "ignore" is the right fix. The USB gadget driver sees that it has asynchronous I/O (kiocb::ki_complete != NULL) and then tries to set a cancellation function. What is the expected behavior when the I/O is owned by io_uring? Should it perhaps call into io_uring to set a cancellation function with io_uring? Or is the concept of cancellation functions indeed specific to legacy AIO, and nothing should be done with io_uring I/O?
Because the ->ki_cancel() is a hack, as demonstrated by this issue in teh first place, which is a gross layering violation. io_uring supports proper cancelations, invoked from userspace. It would never have worked with this scheme.
Maybe kiocb_set_cancel_fn() should have a comment that explains this?
It should have a big fat comment that nobody should be using it. At least the gadget stuff is the only one doing it, and we haven't grown a new one in decades, thankfully.
linux-stable-mirror@lists.linaro.org