Syzbot found a GPF in reweight_entity(). This has been bisected to commit
4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group").
There is a race between sched_post_fork() and setpriority(PRIO_PGRP)
within a thread group that causes a null-ptr-deref in reweight_entity()
in CFS. The scenario is that the main process spawns a number of new
threads, which then call setpriority(PRIO_PGRP, 0, -20), wait, and exit.
For each of the new threads copy_process() gets invoked, which adds
the new task_struct and calls sched_post_fork() for it.
In the above scenario there is a possibility that setpriority(PRIO_PGRP)
and set_one_prio() will be called for a thread in the group that is just
being created by copy_process(), and for which sched_post_fork() has
not been executed yet. This will trigger a null pointer dereference in
reweight_entity(), as it will try to access the run queue pointer, which
hasn't been set. This results in a crash as shown below:
KASAN: null-ptr-deref in range [0x00000000000000a0-0x00000000000000a7]
CPU: 0 PID: 2392 Comm: reduced_repro Not tainted 5.16.0-11201-gb42c5a161ea3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35
RIP: 0010:reweight_entity+0x15d/0x440
RSP: 0018:ffffc900035dfcf8 EFLAGS: 00010006
Call Trace:
<TASK>
reweight_task+0xde/0x1c0
set_load_weight+0x21c/0x2b0
set_user_nice.part.0+0x2d1/0x519
set_user_nice.cold+0x8/0xd
set_one_prio+0x24f/0x263
__do_sys_setpriority+0x2d3/0x640
__x64_sys_setpriority+0x84/0x8b
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
---[ end trace 9dc80a9d378ed00a ]---
Before the mentioned change the cfs_rq pointer for the task was
set in sched_fork(), which is called much earlier in copy_process(),
before the new task is added to the thread_group.
Now it is done in sched_post_fork(), which is called after that.
To fix the issue, remove the update_load param from the
set_load_weight() function and call reweight_task() only if the task
doesn't have the TASK_NEW flag set.
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: Zhang Qiao <zhangqiao22(a)huawei.com>
Cc: stable(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=9d9c27adc674e3a7932b22b61c79a02da82cbd…
Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: syzbot+af7a719bc92395ee41b3(a)syzkaller.appspotmail.com
Reviewed-by: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk(a)linaro.org>
---
Changes in v5:
- Changed the order of local variables declaration in set_load_weight
to comply with the coding standard
Changes in v4:
- Removed the update_load param from set_load_weight() and call
reweight_task() based on the TASK_NEW flag
Changes in v3:
- Removed the new check and changed the update_load condition from
always true to true if p->state != TASK_NEW
Changes in v2:
- Added a check in set_user_nice(), and return from there if the task
is not fully setup instead of returning from reweight_entity()
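For reviewers, the v4/v5 approach can be sketched as a small user-space
model. This is only an illustration, not kernel code: every model_* name
is hypothetical, the MODEL_TASK_NEW value is illustrative, and the real
set_load_weight() also handles idle-policy tasks, inv_weight and the
sched class check, all of which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; the flag value is illustrative. */
#define MODEL_TASK_NEW 0x0800u

struct model_cfs_rq {
	unsigned long load;
};

struct model_task {
	unsigned int state;
	unsigned long weight;
	struct model_cfs_rq *cfs_rq;	/* NULL until the post-fork step runs */
};

/* Models reweight_task(): it dereferences the run queue pointer. */
static void model_reweight_task(struct model_task *p, unsigned long weight)
{
	assert(p->cfs_rq != NULL);
	p->cfs_rq->load = p->cfs_rq->load - p->weight + weight;
	p->weight = weight;
}

/* Models the patched set_load_weight(): no update_load parameter. */
static void model_set_load_weight(struct model_task *p, unsigned long weight)
{
	bool update_load = !(p->state & MODEL_TASK_NEW);

	if (update_load)
		model_reweight_task(p, weight);
	else
		p->weight = weight;	/* new task: run queue is not touched */
}

/* Returns 1 if both the "new task" and "set up task" paths behave. */
static int model_demo(void)
{
	struct model_cfs_rq rq = { .load = 0 };
	struct model_task t = {
		.state = MODEL_TASK_NEW,
		.weight = 1024,
		.cfs_rq = NULL,
	};

	/* Renice racing with fork: cfs_rq is still NULL, must not crash. */
	model_set_load_weight(&t, 88761);
	if (t.weight != 88761)
		return 0;

	/* Post-fork step done: run queue attached, TASK_NEW cleared. */
	t.cfs_rq = &rq;
	rq.load = t.weight;
	t.state = 0;
	model_set_load_weight(&t, 1024);

	return t.weight == 1024 && rq.load == 1024;
}
```

In the model, as in the patch, the decision is derived from the task
state inside the function, so callers cannot pass a stale or wrong
update_load value.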
---
kernel/sched/core.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 848eaa0efe0e..fcf0c180617c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1214,8 +1214,9 @@ int tg_nop(struct task_group *tg, void *data)
}
#endif
-static void set_load_weight(struct task_struct *p, bool update_load)
+static void set_load_weight(struct task_struct *p)
{
+ bool update_load = !(READ_ONCE(p->__state) & TASK_NEW);
int prio = p->static_prio - MAX_RT_PRIO;
struct load_weight *load = &p->se.load;
@@ -4406,7 +4407,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
p->static_prio = NICE_TO_PRIO(0);
p->prio = p->normal_prio = p->static_prio;
- set_load_weight(p, false);
+ set_load_weight(p);
/*
* We don't need the reset flag anymore after the fork. It has
@@ -6921,7 +6922,7 @@ void set_user_nice(struct task_struct *p, long nice)
put_prev_task(rq, p);
p->static_prio = NICE_TO_PRIO(nice);
- set_load_weight(p, true);
+ set_load_weight(p);
old_prio = p->prio;
p->prio = effective_prio(p);
@@ -7212,7 +7213,7 @@ static void __setscheduler_params(struct task_struct *p,
*/
p->rt_priority = attr->sched_priority;
p->normal_prio = normal_prio(p);
- set_load_weight(p, true);
+ set_load_weight(p);
}
/*
@@ -9445,7 +9446,7 @@ void __init sched_init(void)
#endif
}
- set_load_weight(&init_task, false);
+ set_load_weight(&init_task);
/*
* The boot idle thread does lazy MMU switching as well:
--
2.34.1
Since SUBLEVEL overflowed the 8-bit field reserved for it in
LINUX_VERSION_CODE, we have no reliable way to tell the current
SUBLEVEL in source code.
This makes backport work significantly harder when dealing
with changes in stable releases.
Define LINUX_VERSION_MAJOR, LINUX_VERSION_PATCHLEVEL and
LINUX_VERSION_SUBLEVEL so we can continue to get the proper SUBLEVEL
in source without breaking the stable ABI, which redefining the
KERNEL_VERSION bit fields would do.
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
---
For some context: https://lore.kernel.org/backports/bb0ae37aa770e016463706d557fec1c5205bc6a9.…
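As a rough illustration of the problem and of how the new macros could
be consumed, here is a stand-alone sketch. The 4.9.300 version numbers
are an assumption chosen for the example, the macro values are written
out by hand rather than taken from a generated version.h, and the helper
function names are hypothetical.

```c
#include <assert.h>

#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hand-written stand-ins for a generated version.h (assumed 4.9.300). */
#define LINUX_VERSION_CODE       KERNEL_VERSION(4, 9, 255)
#define LINUX_VERSION_MAJOR      4
#define LINUX_VERSION_PATCHLEVEL 9
#define LINUX_VERSION_SUBLEVEL   300

/* SUBLEVEL as recoverable from LINUX_VERSION_CODE: stuck at 255. */
static int sublevel_from_code(void)
{
	return LINUX_VERSION_CODE & 0xff;
}

/* SUBLEVEL as provided by the new macro: the real value. */
static int sublevel_from_macro(void)
{
	return LINUX_VERSION_SUBLEVEL;
}

/*
 * Why the code field cannot simply hold 300: it would spill into the
 * PATCHLEVEL bits and become indistinguishable from 4.10.44.
 */
static int unclamped_code_collides(void)
{
	return KERNEL_VERSION(4, 9, 300) == KERNEL_VERSION(4, 10, 44);
}
```

Backport code that needs to key off a stable release could then compare
LINUX_VERSION_SUBLEVEL directly instead of decoding LINUX_VERSION_CODE.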
---
Makefile | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 99d37c23495e..8132f81d94d8 100644
--- a/Makefile
+++ b/Makefile
@@ -1142,7 +1142,10 @@ endef
define filechk_version.h
(echo \#define LINUX_VERSION_CODE $(shell \
expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 255); \
- echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))';)
+ echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))'; \
+ echo \#define LINUX_VERSION_MAJOR $(VERSION); \
+ echo \#define LINUX_VERSION_PATCHLEVEL $(PATCHLEVEL); \
+ echo \#define LINUX_VERSION_SUBLEVEL $(SUBLEVEL);)
endef
$(version_h): $(srctree)/Makefile FORCE
--
2.35.1