Syzbot found a GPF in reweight_entity. This has been bisected to commit c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group") Looks like after this change there is a time window, when task_struct->se.cfs_rq can be NULL. This can be exploited to trigger null-ptr-deref by calling setpriority on that task.
Cc: Ingo Molnar mingo@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Juri Lelli juri.lelli@redhat.com Cc: Vincent Guittot vincent.guittot@linaro.org Cc: Dietmar Eggemann dietmar.eggemann@arm.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Ben Segall bsegall@google.com Cc: Mel Gorman mgorman@suse.de Cc: Daniel Bristot de Oliveira bristot@redhat.com Cc: Zhang Qiao zhangqiao22@huawei.com Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=9d9c27adc674e3a7932b22b61c79a02da82cbdc... Fixes: c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group") Signed-off-by: Tadeusz Struk tadeusz.struk@linaro.org --- kernel/sched/fair.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 095b0aa378df..196f8cee3f9b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3042,6 +3042,9 @@ dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { } static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, unsigned long weight) { + if (!cfs_rq) + return; + if (se->on_rq) { /* commit outstanding execution time */ if (cfs_rq->curr == se)
On Wed, 19 Jan 2022 at 02:24, Tadeusz Struk tadeusz.struk@linaro.org wrote:
Syzbot found a GPF in reweight_entity. This has been bisected to commit c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group") Looks like after this change there is a time window, when task_struct->se.cfs_rq can be NULL. This can be exploited to trigger null-ptr-deref by calling setpriority on that task.
Cc: Ingo Molnar mingo@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Juri Lelli juri.lelli@redhat.com Cc: Vincent Guittot vincent.guittot@linaro.org Cc: Dietmar Eggemann dietmar.eggemann@arm.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Ben Segall bsegall@google.com Cc: Mel Gorman mgorman@suse.de Cc: Daniel Bristot de Oliveira bristot@redhat.com Cc: Zhang Qiao zhangqiao22@huawei.com Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=9d9c27adc674e3a7932b22b61c79a02da82cbdc... Fixes: c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
The sha1 doesn't look correct.
Signed-off-by: Tadeusz Struk tadeusz.struk@linaro.org
kernel/sched/fair.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 095b0aa378df..196f8cee3f9b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3042,6 +3042,9 @@ dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { } static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, unsigned long weight) {
if (!cfs_rq)
return;
if (se->on_rq) { /* commit outstanding execution time */ if (cfs_rq->curr == se)
-- 2.34.1
On Tue, Jan 18, 2022 at 05:24:17PM -0800, Tadeusz Struk wrote:
Syzbot found a GPF in reweight_entity. This has been bisected to commit c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
That's a stable commit, the real commit is 4ef0c5c6b5ba1f38f0ea1cedad0cad722f00c14a
Looks like after this change there is a time window, when task_struct->se.cfs_rq can be NULL. This can be exploited to trigger null-ptr-deref by calling setpriority on that task.
Looks like isn't good enough, either there is, in which case you explain the window, or there isn't in which case what are we doing here?
On 1/19/22 01:32, Peter Zijlstra wrote:
On Tue, Jan 18, 2022 at 05:24:17PM -0800, Tadeusz Struk wrote:
Syzbot found a GPF in reweight_entity. This has been bisected to commit c85c6fadbef0 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
That's a stable commit, the real commit is 4ef0c5c6b5ba1f38f0ea1cedad0cad722f00c14a
This is what syzbot bisected it to. I will reference the original commit in the next version.
Looks like after this change there is a time window, when task_struct->se.cfs_rq can be NULL. This can be exploited to trigger null-ptr-deref by calling setpriority on that task.
Looks like isn't good enough, either there is, in which case you explain the window, or there isn't in which case what are we doing here?
There surely is something wrong, otherwise it wouldn't crash. I will try to narrow down the reproducer to better understand what causes the fault.
On 1/19/22 07:43, Tadeusz Struk wrote:
Looks like after this change there is a time window, when task_struct->se.cfs_rq can be NULL. This can be exploited to trigger null-ptr-deref by calling setpriority on that task.
Looks like isn't good enough, either there is, in which case you explain the window, or there isn't in which case what are we doing here?
There surely is something wrong, otherwise it wouldn't crash. I will try to narrow down the reproducer to better understand what causes the fault.
The race is between sched_post_fork() and setpriority(PRIO_PGRP) The scenario is that the main process spawns 3 new threads, which then call setpriority(PRIO_PGRP, 0, -20), wait, and exit. For each of the new thread the copy_process() gets invoked, which then calls sched_fork() and finally sched_post_fork().
There is a possibility that setpriority(PRIO_PGRP)->set_one_prio() will be called for a thread in the group that is just being created by copy_process(), and for which the sched_post_fork() has not been executed yet. This will trigger a null pointer dereference in reweight_entity() because it will try to access the CFS run queue pointer, which hasn't been set, resulting it a crash as below:
KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7] CPU: 0 PID: 2392 Comm: reduced_repro Not tainted 5.16.0-11201-gb42c5a161ea3-dirty #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014 RIP: 0010:reweight_entity+0x15d/0x440 RSP: 0018:ffffc900035dfcf8 EFLAGS: 00010006 Call Trace: <TASK> reweight_task+0xde/0x1c0 set_load_weight+0x21c/0x2b0 set_user_nice.part.0+0x2d1/0x519 set_user_nice.cold+0x8/0xd set_one_prio+0x24f/0x263 __do_sys_setpriority+0x2d3/0x640 __x64_sys_setpriority+0x84/0x8b do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ---[ end trace 9dc80a9d378ed00a ]---
Before the mentioned change the rq pointer has been set in sched_fork(), which is called much earlier in copy_process() as opposed to sched_post_fork(), before the new task is added to the thread_group.
A stripped down version of the sysbot reproducer can be found here: https://termbin.com/axkq
I can consistently reproduce the issue with it in 2-3 runs.
The solution is either we set the pointer p->se.cfs_rq to a dummy rq in sched_fork(), or return from the set_one_prio() without doing anything if the rq is NULL, as it is done in the patch. I will update the description and resend it tomorrow.
linux-stable-mirror@lists.linaro.org