commit f011c9cf04c06f16b24f583d313d3c012e589e50 upstream.
The submit queue polling threads are userland threads that just never exit to the userland. When creating the thread with IORING_SETUP_SQ_AFF, the affinity of the poller thread is set to the cpu specified in sq_thread_cpu. However, this CPU can be outside of the cpuset defined by the cgroup cpuset controller. This violates the rules defined by the cpuset controller and is a potential issue for realtime applications.
In b7ed6d8ffd6 we fixed the default affinity of the poller thread, in case no explicit pinning is required by inheriting the one of the creating task. In case of explicit pinning, the check is more complicated, as also a cpu outside of the parent cpumask is allowed. We implemented this by using cpuset_cpus_allowed (that has support for cgroup cpusets) and testing if the requested cpu is in the set.
Fixes: 37d1e2e3642e ("io_uring: move SQPOLL thread io-wq forked worker") Signed-off-by: Felix Moessbauer felix.moessbauer@siemens.com Link: https://lore.kernel.org/r/20240909150036.55921-1-felix.moessbauer@siemens.co... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 8ed2c65529714..6b6fd244233f8 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -56,6 +56,7 @@ #include <linux/mm.h> #include <linux/mman.h> #include <linux/percpu.h> +#include <linux/cpuset.h> #include <linux/slab.h> #include <linux/blkdev.h> #include <linux/bvec.h> @@ -8746,10 +8747,12 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx, return 0;
if (p->flags & IORING_SETUP_SQ_AFF) { + struct cpumask allowed_mask; int cpu = p->sq_thread_cpu;
ret = -EINVAL; - if (cpu >= nr_cpu_ids || !cpu_online(cpu)) + cpuset_cpus_allowed(current, &allowed_mask); + if (!cpumask_test_cpu(cpu, &allowed_mask)) goto err_sqpoll; sqd->sq_cpu = cpu; } else {
From: Jens Axboe axboe@kernel.dk
commit a09c17240bdf2e9fa6d0591afa9448b59785f7d4 upstream.
A recent commit ensured that SQPOLL cannot be setup with a CPU that isn't in the current tasks cpuset, but it also dropped testing whether the CPU is valid in the first place. Without that, if a task passes in a CPU value that is too high, the following KASAN splat can get triggered:
BUG: KASAN: stack-out-of-bounds in io_sq_offload_create+0x858/0xaa4 Read of size 8 at addr ffff800089bc7b90 by task wq-aff.t/1391
CPU: 4 UID: 1000 PID: 1391 Comm: wq-aff.t Not tainted 6.11.0-rc7-00227-g371c468f4db6 #7080 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace.part.0+0xcc/0xe0 show_stack+0x14/0x1c dump_stack_lvl+0x58/0x74 print_report+0x16c/0x4c8 kasan_report+0x9c/0xe4 __asan_report_load8_noabort+0x1c/0x24 io_sq_offload_create+0x858/0xaa4 io_uring_setup+0x1394/0x17c4 __arm64_sys_io_uring_setup+0x6c/0x180 invoke_syscall+0x6c/0x260 el0_svc_common.constprop.0+0x158/0x224 do_el0_svc+0x3c/0x5c el0_svc+0x34/0x70 el0t_64_sync_handler+0x118/0x124 el0t_64_sync+0x168/0x16c
The buggy address belongs to stack of task wq-aff.t/1391 and is located at offset 48 in frame: io_sq_offload_create+0x0/0xaa4
This frame has 1 object: [32, 40) 'allowed_mask'
The buggy address belongs to the virtual mapping at [ffff800089bc0000, ffff800089bc9000) created by: kernel_clone+0x124/0x7e0
The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff0000d740af80 pfn:0x11740a memcg:ffff0000c2706f02 flags: 0xbffe00000000000(node=0|zone=2|lastcpupid=0x1fff) raw: 0bffe00000000000 0000000000000000 dead000000000122 0000000000000000 raw: ffff0000d740af80 0000000000000000 00000001ffffffff ffff0000c2706f02 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff800089bc7a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff800089bc7b00: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
ffff800089bc7b80: 00 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
^ ffff800089bc7c00: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 ffff800089bc7c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f3
Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202409161632.cbeeca0d-lkp@intel.com Fixes: f011c9cf04c0 ("io_uring/sqpoll: do not allow pinning outside of cpuset") Tested-by: Felix Moessbauer felix.moessbauer@siemens.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Felix Moessbauer felix.moessbauer@siemens.com --- io_uring/io_uring.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 6b6fd244233f8..a260852a0490c 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -8751,6 +8751,8 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx, int cpu = p->sq_thread_cpu;
ret = -EINVAL; + if (cpu >= nr_cpu_ids || !cpu_online(cpu)) + goto err_sqpoll; cpuset_cpus_allowed(current, &allowed_mask); if (!cpumask_test_cpu(cpu, &allowed_mask)) goto err_sqpoll;
commit 7f44beadcc11adb98220556d2ddbe9c97aa6d42d upstream.
Putting the cpumask on the stack is deprecated for a long time (since 2d3854a37e8), as these can be big. Given that, change the on-stack allocation of allowed_mask to be dynamically allocated.
Fixes: f011c9cf04c0 ("io_uring/sqpoll: do not allow pinning outside of cpuset") Signed-off-by: Felix Moessbauer felix.moessbauer@siemens.com Link: https://lore.kernel.org/r/20240916111150.1266191-1-felix.moessbauer@siemens.... Signed-off-by: Jens Axboe axboe@kernel.dk --- io_uring/io_uring.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index a260852a0490c..12aade2ac68ea 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -8747,15 +8747,22 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx, return 0;
if (p->flags & IORING_SETUP_SQ_AFF) { - struct cpumask allowed_mask; + cpumask_var_t allowed_mask; int cpu = p->sq_thread_cpu;
ret = -EINVAL; if (cpu >= nr_cpu_ids || !cpu_online(cpu)) goto err_sqpoll; - cpuset_cpus_allowed(current, &allowed_mask); - if (!cpumask_test_cpu(cpu, &allowed_mask)) + ret = -ENOMEM; + if (!alloc_cpumask_var(&allowed_mask, GFP_KERNEL)) + goto err_sqpoll; + ret = -EINVAL; + cpuset_cpus_allowed(current, allowed_mask); + if (!cpumask_test_cpu(cpu, allowed_mask)) { + free_cpumask_var(allowed_mask); goto err_sqpoll; + } + free_cpumask_var(allowed_mask); sqd->sq_cpu = cpu; } else { sqd->sq_cpu = -1;
linux-stable-mirror@lists.linaro.org