On 1/19/22 07:43, Tadeusz Struk wrote:
Looks like after this change there is a time window, when task_struct->se.cfs_rq can be NULL. This can be exploited to trigger null-ptr-deref by calling setpriority on that task.
Looks like isn't good enough, either there is, in which case you explain the window, or there isn't in which case what are we doing here?
There surely is something wrong, otherwise it wouldn't crash. I will try to narrow down the reproducer to better understand what causes the fault.
The race is between sched_post_fork() and setpriority(PRIO_PGRP). In the reproducer, the main process spawns 3 new threads, each of which calls setpriority(PRIO_PGRP, 0, -20), waits, and exits. For each new thread, copy_process() is invoked, which calls sched_fork() and, finally, sched_post_fork().
setpriority(PRIO_PGRP)->set_one_prio() can be called for a thread in the group that is still being created by copy_process(), and for which sched_post_fork() has not been executed yet. This triggers a null pointer dereference in reweight_entity(), because it tries to access the CFS run queue pointer, which has not been set yet, resulting in a crash as below:
KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7]
CPU: 0 PID: 2392 Comm: reduced_repro Not tainted 5.16.0-11201-gb42c5a161ea3-dirty #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
RIP: 0010:reweight_entity+0x15d/0x440
RSP: 0018:ffffc900035dfcf8 EFLAGS: 00010006
Call Trace:
 <TASK>
 reweight_task+0xde/0x1c0
 set_load_weight+0x21c/0x2b0
 set_user_nice.part.0+0x2d1/0x519
 set_user_nice.cold+0x8/0xd
 set_one_prio+0x24f/0x263
 __do_sys_setpriority+0x2d3/0x640
 __x64_sys_setpriority+0x84/0x8b
 do_syscall_64+0x35/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>
---[ end trace 9dc80a9d378ed00a ]---
Before the change in question, the rq pointer was set in sched_fork(), which copy_process() calls much earlier than sched_post_fork(), before the new task is added to the thread_group.
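To make the ordering argument concrete, here is a small user-space model of it. This is only a sketch, not kernel code: struct model_task, struct model_rq, the "visible" flag and model_has_window() are all hypothetical names, and the fork path is reduced to three steps (sched_fork() analogue, task becomes findable through its group, sched_post_fork() analogue).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct model_rq {
	int load;
};

struct model_task {
	struct model_rq *cfs_rq;	/* models task_struct->se.cfs_rq */
	bool visible;			/* true once other threads can find the task */
};

/* Where the run queue pointer gets assigned in the simplified fork path. */
enum model_order {
	MODEL_RQ_IN_FORK,	/* old behaviour: set in the sched_fork() step */
	MODEL_RQ_IN_POST_FORK,	/* new behaviour: set in the sched_post_fork() step */
};

/*
 * Walk through the three steps of the simplified fork path and report
 * whether there is any step after which the task is visible to other
 * threads while its run queue pointer is still NULL.
 */
static bool model_has_window(enum model_order order)
{
	static struct model_rq rq;
	struct model_task t = { .cfs_rq = NULL, .visible = false };
	bool window = false;

	/* Step 1: sched_fork() analogue, task is not visible yet. */
	if (order == MODEL_RQ_IN_FORK)
		t.cfs_rq = &rq;
	if (t.visible && t.cfs_rq == NULL)
		window = true;

	/* Step 2: the task is added to its group and becomes findable. */
	t.visible = true;
	if (t.visible && t.cfs_rq == NULL)
		window = true;

	/* Step 3: sched_post_fork() analogue. */
	if (order == MODEL_RQ_IN_POST_FORK)
		t.cfs_rq = &rq;
	if (t.visible && t.cfs_rq == NULL)
		window = true;

	/* In both orderings the task ends up fully set up. */
	assert(t.visible && t.cfs_rq != NULL);
	return window;
}
```

In this model, only the MODEL_RQ_IN_POST_FORK ordering leaves a step at which the task is visible with a NULL pointer, which corresponds to the window in which setpriority(PRIO_PGRP) can reach it.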
A stripped down version of the syzbot reproducer can be found here: https://termbin.com/axkq
I can consistently reproduce the issue with it in 2-3 runs.
There are two possible fixes: either set p->se.cfs_rq to a dummy rq in sched_fork(), or return from set_one_prio() without doing anything if the rq is NULL, as is done in the patch. I will update the description and resend it tomorrow.
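The two options can be sketched in user space as follows. This is a simplified model under stated assumptions, not the patch itself: all model_* names are hypothetical, the load accounting is invented for illustration, and the guard sits in a model helper rather than in set_one_prio().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct model_rq {
	long load;
};

struct model_task {
	struct model_rq *cfs_rq;	/* models task_struct->se.cfs_rq */
	long weight;
};

/* Dummy run queue option: a placeholder that is always safe to dereference. */
static struct model_rq model_dummy_rq;

/* sched_fork() analogue for the dummy rq option. */
static void model_sched_fork_with_dummy(struct model_task *t)
{
	t->cfs_rq = &model_dummy_rq;
	t->weight = 0;
}

/* sched_post_fork() analogue: attach the task to its real run queue. */
static void model_sched_post_fork(struct model_task *t, struct model_rq *rq)
{
	assert(rq != NULL);
	t->cfs_rq = rq;
}

/*
 * NULL check option: do nothing if the run queue pointer has not been
 * set yet. Returns true if the weight was applied, false if skipped.
 */
static bool model_set_weight_guarded(struct model_task *t, long weight)
{
	if (t->cfs_rq == NULL)
		return false;

	/* Replace the task's old contribution to the queue load with the new one. */
	t->cfs_rq->load += weight - t->weight;
	t->weight = weight;
	return true;
}
```

With the guard, a call that arrives before model_sched_post_fork() is skipped and the task keeps its old weight. With the dummy rq, the same call succeeds but updates the placeholder queue instead of a real one.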