Thomas Gleixner <tglx@linutronix.de> writes:
On Fri, May 06 2022 at 09:15, Eric W. Biederman wrote:
 * the init task will end up wanting to create kthreads, which, if
 * we schedule it before we create kthreadd, will OOPS.
 */
- pid = kernel_thread(kernel_init, NULL, CLONE_FS);
- pid = user_mode_thread(kernel_init, NULL, CLONE_FS);
So init does not have PF_KTHREAD set anymore, which causes this to go sideways with a NULL pointer dereference in get_mm_counter() on next:
Well, not after the change above, but in a later patch, yes.
Patch 1/7 really gets us back to the previous status quo, where I introduced the breakage.
 get_mm_counter include/linux/mm.h:1996 [inline]
 get_mm_rss include/linux/mm.h:2049 [inline]
 task_nr_scan_windows.isra.0+0x23/0x120 kernel/sched/fair.c:1123
 task_scan_min kernel/sched/fair.c:1144 [inline]
 task_scan_start+0x6c/0x400 kernel/sched/fair.c:1150
 task_tick_numa kernel/sched/fair.c:2944 [inline]
 task_tick_fair+0xaeb/0xef0 kernel/sched/fair.c:11186
 scheduler_tick+0x20a/0x5e0 kernel/sched/core.c:5380
https://lore.kernel.org/lkml/0000000000008a9fbb05dea76400@google.com
because the fence in task_tick_numa():
	if ((curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
		return;
is no longer sufficient. It also needs to bail out if !curr->mm.
Agreed. I proposed a patch to do just that a little while ago.
I'm worried that there are more of these issues lurking. Haven't looked yet.
I looked earlier and I missed this one. I am going to look again today, along with applying the obvious fix to task_tick_numa.
I don't think there are many, but when code has evolved into a shape that is not easy to understand, things occasionally slip through as the abstractions are made clear. The reason to rework the code and make it clear is that once code has evolved to the point of having many subtle issues, any change becomes brittle.
Eric