Hey everyone,
This is a follow-up to the do_fork() cleanup from last cycle based on a short discussion this was merged. Last cycle we removed copy_thread_tls() and the associated Kconfig option for each architecture. Now we are only left with copy_thread(). Part of this work was removing the old do_fork() legacy clone()-style calling convention in favor of the new struct kernel_clone args calling convention. The only remaining function callable outside of kernel/fork.c is _do_fork(). It doesn't really follow the naming of kernel-internal syscall helpers as Christoph righly pointed out. Switch all callers and references to kernel_clone() and remove _do_fork() once and for all.
For all architectures I have done a full git rebase v5.9-rc1 -x "make -j31". There were no built failures and the changes were fairly mechanical.
The only helpers we have left now are kernel_thread() and kernel_clone() where kernel_thread() just calls kernel_clone().
Thanks! Christian
Christian Brauner (11): fork: introduce kernel_clone() h8300: switch to kernel_clone() ia64: switch to kernel_clone() m68k: switch to kernel_clone() nios2: switch to kernel_clone() sparc: switch to kernel_clone() x86: switch to kernel_clone() kprobes: switch to kernel_clone() kgdbts: switch to kernel_clone() tracing: switch to kernel_clone() sched: remove _do_fork()
Documentation/trace/histogram.rst | 4 +- arch/h8300/kernel/process.c | 2 +- arch/ia64/kernel/process.c | 4 +- arch/m68k/kernel/process.c | 10 ++-- arch/nios2/kernel/process.c | 2 +- arch/sparc/kernel/process.c | 6 +-- arch/x86/kernel/sys_ia32.c | 2 +- drivers/misc/kgdbts.c | 48 +++++++++---------- include/linux/sched/task.h | 2 +- kernel/fork.c | 14 +++--- samples/kprobes/kprobe_example.c | 6 +-- samples/kprobes/kretprobe_example.c | 4 +- .../test.d/dynevent/add_remove_kprobe.tc | 2 +- .../test.d/dynevent/clear_select_events.tc | 2 +- .../test.d/dynevent/generic_clear_event.tc | 2 +- .../test.d/ftrace/func-filter-stacktrace.tc | 4 +- .../ftrace/test.d/kprobe/add_and_remove.tc | 2 +- .../ftrace/test.d/kprobe/busy_check.tc | 2 +- .../ftrace/test.d/kprobe/kprobe_args.tc | 4 +- .../ftrace/test.d/kprobe/kprobe_args_comm.tc | 2 +- .../test.d/kprobe/kprobe_args_string.tc | 4 +- .../test.d/kprobe/kprobe_args_symbol.tc | 10 ++-- .../ftrace/test.d/kprobe/kprobe_args_type.tc | 2 +- .../ftrace/test.d/kprobe/kprobe_ftrace.tc | 14 +++--- .../ftrace/test.d/kprobe/kprobe_multiprobe.tc | 2 +- .../test.d/kprobe/kprobe_syntax_errors.tc | 12 ++--- .../ftrace/test.d/kprobe/kretprobe_args.tc | 4 +- .../selftests/ftrace/test.d/kprobe/profile.tc | 2 +- 28 files changed, 87 insertions(+), 87 deletions(-)
base-commit: 9123e3a74ec7b934a4a099e98af6a61c2f80bbf5
The old _do_fork() helper doesn't follow naming conventions of in-kernel helpers for syscalls. The process creation cleanup in [1] didn't change the name to something more reasonable mainly because _do_fork() was used in quite a few places. So sending this as a separate series seemed the better strategy.
This commit renames _do_fork() to kernel_clone() but keeps _do_fork() as a simple static inline wrapper around kernel_clone(). Follow-up patches will switch each caller of _do_fork() and each place where it is referenced over to kernel_clone(). After all these changes are done, we can remove _do_fork() completely and will only be left with kernel_clone().
[1]: 9ba27414f2ec ("Merge tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux") Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@kernel.org Cc: "Peter Zijlstra (Intel)" peterz@infradead.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- include/linux/sched/task.h | 6 +++++- kernel/fork.c | 14 +++++++------- 2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index a98965007eef..d9ef07359c96 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -83,7 +83,11 @@ extern void do_group_exit(int); extern void exit_files(struct task_struct *); extern void exit_itimers(struct signal_struct *);
-extern long _do_fork(struct kernel_clone_args *kargs); +extern int kernel_clone(struct kernel_clone_args *kargs); +static inline long _do_fork(struct kernel_clone_args *kargs) +{ + return kernel_clone(kargs); +} struct task_struct *fork_idle(int); struct mm_struct *copy_init_mm(void); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); diff --git a/kernel/fork.c b/kernel/fork.c index 4d32190861bd..34e37cee239f 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2384,7 +2384,7 @@ struct mm_struct *copy_init_mm(void) * * args->exit_signal is expected to be checked for sanity by the caller. */ -long _do_fork(struct kernel_clone_args *args) +int kernel_clone(struct kernel_clone_args *args) { u64 clone_flags = args->flags; struct completion vfork; @@ -2477,7 +2477,7 @@ pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags) .stack_size = (unsigned long)arg, };
- return _do_fork(&args); + return kernel_clone(&args); }
#ifdef __ARCH_WANT_SYS_FORK @@ -2488,7 +2488,7 @@ SYSCALL_DEFINE0(fork) .exit_signal = SIGCHLD, };
- return _do_fork(&args); + return kernel_clone(&args); #else /* can not support in nommu mode */ return -EINVAL; @@ -2504,7 +2504,7 @@ SYSCALL_DEFINE0(vfork) .exit_signal = SIGCHLD, };
- return _do_fork(&args); + return kernel_clone(&args); } #endif
@@ -2542,7 +2542,7 @@ SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp, .tls = tls, };
- return _do_fork(&args); + return kernel_clone(&args); } #endif
@@ -2700,7 +2700,7 @@ SYSCALL_DEFINE2(clone3, struct clone_args __user *, uargs, size_t, size) if (!clone3_args_valid(&kargs)) return -EINVAL;
- return _do_fork(&kargs); + return kernel_clone(&kargs); } #endif
@@ -2863,7 +2863,7 @@ int unshare_fd(unsigned long unshare_flags, unsigned int max_fds, /* * unshare allows a process to 'unshare' part of the process * context which was originally shared using clone. copy_* - * functions used by _do_fork() cannot be used here directly + * functions used by kernel_clone() cannot be used here directly * because they modify an inactive task_struct that is being * constructed. Here we are modifying the current, active, * task_struct.
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: Greentime Hu green.hu@gmail.com Cc: Yoshinori Sato ysato@users.sourceforge.jp Cc: uclinux-h8-devel@lists.sourceforge.jp Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- arch/h8300/kernel/process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c index 83ce3caf7313..aea0a40b77a9 100644 --- a/arch/h8300/kernel/process.c +++ b/arch/h8300/kernel/process.c @@ -172,5 +172,5 @@ asmlinkage int sys_clone(unsigned long __user *args) kargs.exit_signal = (lower_32_bits(clone_flags) & CSIGNAL); kargs.stack = newsp;
- return _do_fork(&kargs); + return kernel_clone(&kargs); }
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: Tony Luck tony.luck@intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: linux-ia64@vger.kernel.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- arch/ia64/kernel/process.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c index f19cb97c0098..126f72490ec7 100644 --- a/arch/ia64/kernel/process.c +++ b/arch/ia64/kernel/process.c @@ -310,7 +310,7 @@ ia64_load_extra (struct task_struct *task) * * <clone syscall> <some kernel call frames> * sys_clone : - * _do_fork _do_fork + * kernel_clone kernel_clone * copy_thread copy_thread * * This means that the stack layout is as follows: @@ -455,7 +455,7 @@ asmlinkage long ia64_clone(unsigned long clone_flags, unsigned long stack_start, .tls = tls, };
- return _do_fork(&args); + return kernel_clone(&args); }
static void
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: Kars de Jong jongk@linux-m68k.org Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: linux-m68k@lists.linux-m68k.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- arch/m68k/kernel/process.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c index 6492a2c54dbc..08359a6e058f 100644 --- a/arch/m68k/kernel/process.c +++ b/arch/m68k/kernel/process.c @@ -107,10 +107,10 @@ void flush_thread(void) * on top of pt_regs, which means that sys_clone() arguments would be * buried. We could, of course, copy them, but it's too costly for no * good reason - generic clone() would have to copy them *again* for - * _do_fork() anyway. So in this case it's actually better to pass pt_regs * - * and extract arguments for _do_fork() from there. Eventually we might - * go for calling _do_fork() directly from the wrapper, but only after we - * are finished with _do_fork() prototype conversion. + * kernel_clone() anyway. So in this case it's actually better to pass pt_regs * + * and extract arguments for kernel_clone() from there. Eventually we might + * go for calling kernel_clone() directly from the wrapper, but only after we + * are finished with kernel_clone() prototype conversion. */ asmlinkage int m68k_clone(struct pt_regs *regs) { @@ -125,7 +125,7 @@ asmlinkage int m68k_clone(struct pt_regs *regs) .tls = regs->d5, };
- return _do_fork(&args); + return kernel_clone(&args); }
/*
On Tue, Aug 18, 2020 at 7:37 PM Christian Brauner christian.brauner@ubuntu.com wrote:
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: Kars de Jong jongk@linux-m68k.org Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: linux-m68k@lists.linux-m68k.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com
Acked-by: Geert Uytterhoeven geert@linux-m68k.org
Gr{oetje,eeting}s,
Geert
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: Ley Foon Tan ley.foon.tan@intel.com Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- arch/nios2/kernel/process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c index 88a4ec03edab..4ffe857e6ada 100644 --- a/arch/nios2/kernel/process.c +++ b/arch/nios2/kernel/process.c @@ -266,5 +266,5 @@ asmlinkage int nios2_clone(unsigned long clone_flags, unsigned long newsp, .tls = tls, };
- return _do_fork(&args); + return kernel_clone(&args); }
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: "David S. Miller" davem@davemloft.net Cc: sparclinux@vger.kernel.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- arch/sparc/kernel/process.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/sparc/kernel/process.c b/arch/sparc/kernel/process.c index 5234b5ccc0b9..0442ab00518d 100644 --- a/arch/sparc/kernel/process.c +++ b/arch/sparc/kernel/process.c @@ -25,7 +25,7 @@ asmlinkage long sparc_fork(struct pt_regs *regs) .stack = regs->u_regs[UREG_FP], };
- ret = _do_fork(&args); + ret = kernel_clone(&args);
/* If we get an error and potentially restart the system * call, we're screwed because copy_thread() clobbered @@ -50,7 +50,7 @@ asmlinkage long sparc_vfork(struct pt_regs *regs) .stack = regs->u_regs[UREG_FP], };
- ret = _do_fork(&args); + ret = kernel_clone(&args);
/* If we get an error and potentially restart the system * call, we're screwed because copy_thread() clobbered @@ -96,7 +96,7 @@ asmlinkage long sparc_clone(struct pt_regs *regs) else args.stack = regs->u_regs[UREG_FP];
- ret = _do_fork(&args); + ret = kernel_clone(&args);
/* If we get an error and potentially restart the system * call, we're screwed because copy_thread() clobbered
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: x86@kernel.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- arch/x86/kernel/sys_ia32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/sys_ia32.c b/arch/x86/kernel/sys_ia32.c index 720cde885042..6cf65397d225 100644 --- a/arch/x86/kernel/sys_ia32.c +++ b/arch/x86/kernel/sys_ia32.c @@ -251,6 +251,6 @@ COMPAT_SYSCALL_DEFINE5(ia32_clone, unsigned long, clone_flags, .tls = tls_val, };
- return _do_fork(&args); + return kernel_clone(&args); } #endif /* CONFIG_IA32_EMULATION */
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: Mauro Carvalho Chehab mchehab+huawei@kernel.org Cc: Alexandre Chartre alexandre.chartre@oracle.com Cc: Jonathan Corbet corbet@lwn.net Cc: Masami Hiramatsu mhiramat@kernel.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- samples/kprobes/kprobe_example.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c index 240f2435ce6f..a02f53836ee1 100644 --- a/samples/kprobes/kprobe_example.c +++ b/samples/kprobes/kprobe_example.c @@ -2,13 +2,13 @@ /* * NOTE: This example is works on x86 and powerpc. * Here's a sample kernel module showing the use of kprobes to dump a - * stack trace and selected registers when _do_fork() is called. + * stack trace and selected registers when kernel_clone() is called. * * For more information on theory of operation of kprobes, see * Documentation/staging/kprobes.rst * * You will see the trace data in /var/log/messages and on the console - * whenever _do_fork() is invoked to create a new process. + * whenever kernel_clone() is invoked to create a new process. */
#include <linux/kernel.h> @@ -16,7 +16,7 @@ #include <linux/kprobes.h>
#define MAX_SYMBOL_LEN 64 -static char symbol[MAX_SYMBOL_LEN] = "_do_fork"; +static char symbol[MAX_SYMBOL_LEN] = "kernel_clone"; module_param_string(symbol, symbol, sizeof(symbol), 0644);
/* For each probe you need to allocate a kprobe structure */
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: Douglas Anderson dianders@chromium.org Cc: Arnd Bergmann arnd@arndb.de Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Daniel Thompson daniel.thompson@linaro.org Cc: Jason Wessel jason.wessel@windriver.com Cc: kgdb-bugreport@lists.sourceforge.net Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- drivers/misc/kgdbts.c | 48 +++++++++++++++++++++---------------------- 1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c index d5d2af4d10e6..945701bce553 100644 --- a/drivers/misc/kgdbts.c +++ b/drivers/misc/kgdbts.c @@ -33,16 +33,16 @@ * You can also specify optional tests: * N## = Go to sleep with interrupts of for ## seconds * to test the HW NMI watchdog - * F## = Break at do_fork for ## iterations + * F## = Break at kernel_clone for ## iterations * S## = Break at sys_open for ## iterations * I## = Run the single step test ## iterations * - * NOTE: that the do_fork and sys_open tests are mutually exclusive. + * NOTE: that the kernel_clone and sys_open tests are mutually exclusive. * * To invoke the kgdb test suite from boot you use a kernel start * argument as follows: * kgdbts=V1 kgdbwait - * Or if you wanted to perform the NMI test for 6 seconds and do_fork + * Or if you wanted to perform the NMI test for 6 seconds and kernel_clone * test for 100 forks, you could use: * kgdbts=V1N6F100 kgdbwait * @@ -74,7 +74,7 @@ * echo kgdbts=V1S10000 > /sys/module/kgdbts/parameters/kgdbts * fg # and hit control-c * fg # and hit control-c - * ## This tests break points on do_fork + * ## This tests break points on kernel_clone * while [ 1 ] ; do date > /dev/null ; done & * while [ 1 ] ; do date > /dev/null ; done & * echo kgdbts=V1F1000 > /sys/module/kgdbts/parameters/kgdbts @@ -209,8 +209,8 @@ static unsigned long lookup_addr(char *arg) addr = (unsigned long)kgdbts_break_test; else if (!strcmp(arg, "sys_open")) addr = (unsigned long)do_sys_open; - else if (!strcmp(arg, "do_fork")) - addr = (unsigned long)_do_fork; + else if (!strcmp(arg, "kernel_clone")) + addr = (unsigned long)kernel_clone; else if (!strcmp(arg, "hw_break_val")) addr = (unsigned long)&hw_break_val; addr = (unsigned long) dereference_function_descriptor((void *)addr); @@ -310,7 +310,7 @@ static int check_and_rewind_pc(char *put_str, char *arg)
if (arch_needs_sstep_emulation && sstep_addr && ip + offset == sstep_addr && - ((!strcmp(arg, "sys_open") || !strcmp(arg, "do_fork")))) { + ((!strcmp(arg, "sys_open") || !strcmp(arg, "kernel_clone")))) { /* This is special case for emulated single step */ v2printk("Emul: rewind hit single step bp\n"); restart_from_top_after_write = 1; @@ -596,19 +596,19 @@ static struct test_struct singlestep_break_test[] = { };
/* - * Test for hitting a breakpoint at do_fork for what ever the number + * Test for hitting a breakpoint at kernel_clone for what ever the number * of iterations required by the variable repeat_test. */ -static struct test_struct do_fork_test[] = { +static struct test_struct do_kernel_clone_test[] = { { "?", "S0*" }, /* Clear break points */ - { "do_fork", "OK", sw_break, }, /* set sw breakpoint */ + { "kernel_clone", "OK", sw_break, }, /* set sw breakpoint */ { "c", "T0*", NULL, get_thread_id_continue }, /* Continue */ - { "do_fork", "OK", sw_rem_break }, /*remove breakpoint */ - { "g", "do_fork", NULL, check_and_rewind_pc }, /* check location */ + { "kernel_clone", "OK", sw_rem_break }, /*remove breakpoint */ + { "g", "kernel_clone", NULL, check_and_rewind_pc }, /* check location */ { "write", "OK", write_regs, emul_reset }, /* Write registers */ { "s", "T0*", emul_sstep_get, emul_sstep_put }, /* Single step */ - { "g", "do_fork", NULL, check_single_step }, - { "do_fork", "OK", sw_break, }, /* set sw breakpoint */ + { "g", "kernel_clone", NULL, check_single_step }, + { "kernel_clone", "OK", sw_break, }, /* set sw breakpoint */ { "7", "T0*", skip_back_repeat_test }, /* Loop based on repeat_test */ { "D", "OK", NULL, final_ack_set }, /* detach and unregister I/O */ { "", "", get_cont_catch, put_cont_catch }, @@ -935,11 +935,11 @@ static void run_bad_read_test(void) kgdb_breakpoint(); }
-static void run_do_fork_test(void) +static void run_kernel_clone_test(void) { init_simple_test(); - ts.tst = do_fork_test; - ts.name = "do_fork_test"; + ts.tst = do_kernel_clone_test; + ts.name = "do_kernel_clone_test"; /* Activate test with initial breakpoint */ kgdb_breakpoint(); } @@ -967,7 +967,7 @@ static void run_singlestep_break_test(void) static void kgdbts_run_tests(void) { char *ptr; - int fork_test = 0; + int clone_test = 0; int do_sys_open_test = 0; int sstep_test = 1000; int nmi_sleep = 0; @@ -981,7 +981,7 @@ static void kgdbts_run_tests(void)
ptr = strchr(config, 'F'); if (ptr) - fork_test = simple_strtol(ptr + 1, NULL, 10); + clone_test = simple_strtol(ptr + 1, NULL, 10); ptr = strchr(config, 'S'); if (ptr) do_sys_open_test = simple_strtol(ptr + 1, NULL, 10); @@ -1025,16 +1025,16 @@ static void kgdbts_run_tests(void) run_nmi_sleep_test(nmi_sleep); }
- /* If the do_fork test is run it will be the last test that is + /* If the kernel_clone test is run it will be the last test that is * executed because a kernel thread will be spawned at the very * end to unregister the debug hooks. */ - if (fork_test) { - repeat_test = fork_test; - printk(KERN_INFO "kgdbts:RUN do_fork for %i breakpoints\n", + if (clone_test) { + repeat_test = clone_test; + printk(KERN_INFO "kgdbts:RUN kernel_clone for %i breakpoints\n", repeat_test); kthread_run(kgdbts_unreg_thread, NULL, "kgdbts_unreg"); - run_do_fork_test(); + run_kernel_clone_test(); return; }
The old _do_fork() helper is removed in favor of the new kernel_clone() helper. The latter adheres to naming conventions for kernel internal syscall helpers.
Cc: Mauro Carvalho Chehab mchehab+huawei@kernel.org Cc: Alexandre Chartre alexandre.chartre@oracle.com Cc: Jonathan Corbet corbet@lwn.net Cc: Thomas Gleixner tglx@linutronix.de Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Shuah Khan shuah@kernel.org Cc: Ingo Molnar mingo@redhat.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Xiao Yang yangx.jy@cn.fujitsu.com Cc: Tom Zanussi zanussi@kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- Documentation/trace/histogram.rst | 4 ++-- samples/kprobes/kretprobe_example.c | 4 ++-- .../ftrace/test.d/dynevent/add_remove_kprobe.tc | 2 +- .../ftrace/test.d/dynevent/clear_select_events.tc | 2 +- .../ftrace/test.d/dynevent/generic_clear_event.tc | 2 +- .../ftrace/test.d/ftrace/func-filter-stacktrace.tc | 4 ++-- .../ftrace/test.d/kprobe/add_and_remove.tc | 2 +- .../selftests/ftrace/test.d/kprobe/busy_check.tc | 2 +- .../selftests/ftrace/test.d/kprobe/kprobe_args.tc | 4 ++-- .../ftrace/test.d/kprobe/kprobe_args_comm.tc | 2 +- .../ftrace/test.d/kprobe/kprobe_args_string.tc | 4 ++-- .../ftrace/test.d/kprobe/kprobe_args_symbol.tc | 10 +++++----- .../ftrace/test.d/kprobe/kprobe_args_type.tc | 2 +- .../ftrace/test.d/kprobe/kprobe_ftrace.tc | 14 +++++++------- .../ftrace/test.d/kprobe/kprobe_multiprobe.tc | 2 +- .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc | 12 ++++++------ .../ftrace/test.d/kprobe/kretprobe_args.tc | 4 ++-- .../selftests/ftrace/test.d/kprobe/profile.tc | 2 +- 18 files changed, 39 insertions(+), 39 deletions(-)
diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst index 8408670d0328..f93333524a44 100644 --- a/Documentation/trace/histogram.rst +++ b/Documentation/trace/histogram.rst @@ -1495,7 +1495,7 @@ Extended error information #
{ stacktrace: - _do_fork+0x18e/0x330 + kernel_clone+0x18e/0x330 kernel_thread+0x29/0x30 kthreadd+0x154/0x1b0 ret_from_fork+0x3f/0x70 @@ -1588,7 +1588,7 @@ Extended error information SYSC_sendto+0xef/0x170 } hitcount: 88 { stacktrace: - _do_fork+0x18e/0x330 + kernel_clone+0x18e/0x330 SyS_clone+0x19/0x20 entry_SYSCALL_64_fastpath+0x12/0x6a } hitcount: 244 diff --git a/samples/kprobes/kretprobe_example.c b/samples/kprobes/kretprobe_example.c index 78a2da6fb3cd..0c40f7236989 100644 --- a/samples/kprobes/kretprobe_example.c +++ b/samples/kprobes/kretprobe_example.c @@ -8,7 +8,7 @@ * * usage: insmod kretprobe_example.ko func=<func_name> * - * If no func_name is specified, _do_fork is instrumented + * If no func_name is specified, kernel_clone is instrumented * * For more information on theory of operation of kretprobes, see * Documentation/staging/kprobes.rst @@ -26,7 +26,7 @@ #include <linux/limits.h> #include <linux/sched.h>
-static char func_name[NAME_MAX] = "_do_fork"; +static char func_name[NAME_MAX] = "kernel_clone"; module_param_string(func, func_name, NAME_MAX, S_IRUGO); MODULE_PARM_DESC(func, "Function to kretprobe; this module will report the" " function's execution time"); diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc index 68550f97d3c3..3bcd4c3624ee 100644 --- a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc +++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc @@ -6,7 +6,7 @@ echo 0 > events/enable echo > dynamic_events
-PLACE=_do_fork +PLACE=kernel_clone
echo "p:myevent1 $PLACE" >> dynamic_events echo "r:myevent2 $PLACE" >> dynamic_events diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc b/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc index c969be9eb7de..438961971b7e 100644 --- a/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc +++ b/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc @@ -6,7 +6,7 @@ echo 0 > events/enable echo > dynamic_events
-PLACE=_do_fork +PLACE=kernel_clone
setup_events() { echo "p:myevent1 $PLACE" >> dynamic_events diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc b/tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc index 16d543eaac88..a8603bd23e0d 100644 --- a/tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc +++ b/tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc @@ -6,7 +6,7 @@ echo 0 > events/enable echo > dynamic_events
-PLACE=_do_fork +PLACE=kernel_clone
setup_events() { echo "p:myevent1 $PLACE" >> dynamic_events diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc index 0f41e441c203..98305d76bd04 100644 --- a/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc +++ b/tools/testing/selftests/ftrace/test.d/ftrace/func-filter-stacktrace.tc @@ -4,9 +4,9 @@ # requires: set_ftrace_filter # flags: instance
-echo _do_fork:stacktrace >> set_ftrace_filter +echo kernel_clone:stacktrace >> set_ftrace_filter
-grep -q "_do_fork:stacktrace:unlimited" set_ftrace_filter +grep -q "kernel_clone:stacktrace:unlimited" set_ftrace_filter
(echo "forked"; sleep 1)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/add_and_remove.tc b/tools/testing/selftests/ftrace/test.d/kprobe/add_and_remove.tc index eba858c21815..9737cd0578a7 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/add_and_remove.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/add_and_remove.tc @@ -3,7 +3,7 @@ # description: Kprobe dynamic event - adding and removing # requires: kprobe_events
-echo p:myevent _do_fork > kprobe_events +echo p:myevent kernel_clone > kprobe_events grep myevent kprobe_events test -d events/kprobes/myevent echo > kprobe_events diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/busy_check.tc b/tools/testing/selftests/ftrace/test.d/kprobe/busy_check.tc index d10bf4f05bc8..f9a40af76888 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/busy_check.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/busy_check.tc @@ -3,7 +3,7 @@ # description: Kprobe dynamic event - busy event check # requires: kprobe_events
-echo p:myevent _do_fork > kprobe_events +echo p:myevent kernel_clone > kprobe_events test -d events/kprobes/myevent echo 1 > events/kprobes/myevent/enable echo > kprobe_events && exit_fail # this must fail diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args.tc index 61f2ac441aec..eb543d3cfe5f 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args.tc @@ -3,13 +3,13 @@ # description: Kprobe dynamic event with arguments # requires: kprobe_events
-echo 'p:testprobe _do_fork $stack $stack0 +0($stack)' > kprobe_events +echo 'p:testprobe kernel_clone $stack $stack0 +0($stack)' > kprobe_events grep testprobe kprobe_events | grep -q 'arg1=$stack arg2=$stack0 arg3=+0($stack)' test -d events/kprobes/testprobe
echo 1 > events/kprobes/testprobe/enable ( echo "forked") -grep testprobe trace | grep '_do_fork' | \ +grep testprobe trace | grep 'kernel_clone' | \ grep -q 'arg1=0x[[:xdigit:]]* arg2=0x[[:xdigit:]]* arg3=0x[[:xdigit:]]*$'
echo 0 > events/kprobes/testprobe/enable diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_comm.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_comm.tc index 05aaeed6987f..4e5b63be51c9 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_comm.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_comm.tc @@ -5,7 +5,7 @@
grep -A1 "fetcharg:" README | grep -q "$comm" || exit_unsupported # this is too old
-echo 'p:testprobe _do_fork comm=$comm ' > kprobe_events +echo 'p:testprobe kernel_clone comm=$comm ' > kprobe_events grep testprobe kprobe_events | grep -q 'comm=$comm' test -d events/kprobes/testprobe
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc index b5fa05443b39..a1d70588ab21 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc @@ -30,13 +30,13 @@ esac : "Test get argument (1)" echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable -echo "p:test _do_fork" >> kprobe_events +echo "p:test kernel_clone" >> kprobe_events grep -qe "testprobe.* arg1="test"" trace
echo 0 > events/kprobes/testprobe/enable : "Test get argument (2)" echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events echo 1 > events/kprobes/testprobe/enable -echo "p:test _do_fork" >> kprobe_events +echo "p:test kernel_clone" >> kprobe_events grep -qe "testprobe.* arg1="test" arg2="test"" trace
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc index b8c75a3d003c..bd25dd0ba0d0 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_symbol.tc @@ -14,12 +14,12 @@ elif ! grep "$SYMBOL$" /proc/kallsyms; then fi
: "Test get basic types symbol argument" -echo "p:testprobe_u _do_fork arg1=@linux_proc_banner:u64 arg2=@linux_proc_banner:u32 arg3=@linux_proc_banner:u16 arg4=@linux_proc_banner:u8" > kprobe_events -echo "p:testprobe_s _do_fork arg1=@linux_proc_banner:s64 arg2=@linux_proc_banner:s32 arg3=@linux_proc_banner:s16 arg4=@linux_proc_banner:s8" >> kprobe_events +echo "p:testprobe_u kernel_clone arg1=@linux_proc_banner:u64 arg2=@linux_proc_banner:u32 arg3=@linux_proc_banner:u16 arg4=@linux_proc_banner:u8" > kprobe_events +echo "p:testprobe_s kernel_clone arg1=@linux_proc_banner:s64 arg2=@linux_proc_banner:s32 arg3=@linux_proc_banner:s16 arg4=@linux_proc_banner:s8" >> kprobe_events if grep -q "x8/16/32/64" README; then - echo "p:testprobe_x _do_fork arg1=@linux_proc_banner:x64 arg2=@linux_proc_banner:x32 arg3=@linux_proc_banner:x16 arg4=@linux_proc_banner:x8" >> kprobe_events + echo "p:testprobe_x kernel_clone arg1=@linux_proc_banner:x64 arg2=@linux_proc_banner:x32 arg3=@linux_proc_banner:x16 arg4=@linux_proc_banner:x8" >> kprobe_events fi -echo "p:testprobe_bf _do_fork arg1=@linux_proc_banner:b8@4/32" >> kprobe_events +echo "p:testprobe_bf kernel_clone arg1=@linux_proc_banner:b8@4/32" >> kprobe_events echo 1 > events/kprobes/enable (echo "forked") echo 0 > events/kprobes/enable @@ -27,7 +27,7 @@ grep "testprobe_[usx]:.* arg1=.* arg2=.* arg3=.* arg4=.*" trace grep "testprobe_bf:.* arg1=.*" trace
: "Test get string symbol argument" -echo "p:testprobe_str _do_fork arg1=@linux_proc_banner:string" > kprobe_events +echo "p:testprobe_str kernel_clone arg1=@linux_proc_banner:string" > kprobe_events echo 1 > events/kprobes/enable (echo "forked") echo 0 > events/kprobes/enable diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_type.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_type.tc index 0610e0b5587c..91fcce1c241c 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_type.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_type.tc @@ -4,7 +4,7 @@ # requires: kprobe_events "x8/16/32/64":README
gen_event() { # Bitsize - echo "p:testprobe _do_fork $stack0:s$1 $stack0:u$1 $stack0:x$1 $stack0:b4@4/$1" + echo "p:testprobe kernel_clone $stack0:s$1 $stack0:u$1 $stack0:x$1 $stack0:b4@4/$1" }
check_types() { # s-type u-type x-type bf-type width diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_ftrace.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_ftrace.tc index 81d8b58c03bc..0d179094191f 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_ftrace.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_ftrace.tc @@ -5,29 +5,29 @@
# prepare echo nop > current_tracer -echo _do_fork > set_ftrace_filter -echo 'p:testprobe _do_fork' > kprobe_events +echo kernel_clone > set_ftrace_filter +echo 'p:testprobe kernel_clone' > kprobe_events
# kprobe on / ftrace off echo 1 > events/kprobes/testprobe/enable echo > trace ( echo "forked") grep testprobe trace -! grep '_do_fork <-' trace +! grep 'kernel_clone <-' trace
# kprobe on / ftrace on echo function > current_tracer echo > trace ( echo "forked") grep testprobe trace -grep '_do_fork <-' trace +grep 'kernel_clone <-' trace
# kprobe off / ftrace on echo 0 > events/kprobes/testprobe/enable echo > trace ( echo "forked") ! grep testprobe trace -grep '_do_fork <-' trace +grep 'kernel_clone <-' trace
# kprobe on / ftrace on echo 1 > events/kprobes/testprobe/enable @@ -35,11 +35,11 @@ echo function > current_tracer echo > trace ( echo "forked") grep testprobe trace -grep '_do_fork <-' trace +grep 'kernel_clone <-' trace
# kprobe on / ftrace off echo nop > current_tracer echo > trace ( echo "forked") grep testprobe trace -! grep '_do_fork <-' trace +! grep 'kernel_clone <-' trace diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_multiprobe.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_multiprobe.tc index 366b7e1b6718..45d90b6c763d 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_multiprobe.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_multiprobe.tc @@ -4,7 +4,7 @@ # requires: kprobe_events "Create/append/":README
# Choose 2 symbols for target -SYM1=_do_fork +SYM1=kernel_clone SYM2=do_exit EVENT_NAME=kprobes/testevent
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc index b4d834675e59..c02ea50d63ea 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc @@ -86,15 +86,15 @@ esac
# multiprobe errors if grep -q "Create/append/" README && grep -q "imm-value" README; then -echo 'p:kprobes/testevent _do_fork' > kprobe_events +echo 'p:kprobes/testevent kernel_clone' > kprobe_events check_error '^r:kprobes/testevent do_exit' # DIFF_PROBE_TYPE
# Explicitly use printf "%s" to not interpret \1 -printf "%s" 'p:kprobes/testevent _do_fork abcd=\1' > kprobe_events -check_error 'p:kprobes/testevent _do_fork ^bcd=\1' # DIFF_ARG_TYPE -check_error 'p:kprobes/testevent _do_fork ^abcd=\1:u8' # DIFF_ARG_TYPE -check_error 'p:kprobes/testevent _do_fork ^abcd="foo"' # DIFF_ARG_TYPE -check_error '^p:kprobes/testevent _do_fork abcd=\1' # SAME_PROBE +printf "%s" 'p:kprobes/testevent kernel_clone abcd=\1' > kprobe_events +check_error 'p:kprobes/testevent kernel_clone ^bcd=\1' # DIFF_ARG_TYPE +check_error 'p:kprobes/testevent kernel_clone ^abcd=\1:u8' # DIFF_ARG_TYPE +check_error 'p:kprobes/testevent kernel_clone ^abcd="foo"' # DIFF_ARG_TYPE +check_error '^p:kprobes/testevent kernel_clone abcd=\1' # SAME_PROBE fi
exit 0 diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_args.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_args.tc index 523fde6d1aa5..7ae492c204a4 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_args.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_args.tc @@ -4,14 +4,14 @@ # requires: kprobe_events
# Add new kretprobe event -echo 'r:testprobe2 _do_fork $retval' > kprobe_events +echo 'r:testprobe2 kernel_clone $retval' > kprobe_events grep testprobe2 kprobe_events | grep -q 'arg1=$retval' test -d events/kprobes/testprobe2
echo 1 > events/kprobes/testprobe2/enable ( echo "forked")
-cat trace | grep testprobe2 | grep -q '<- _do_fork' +cat trace | grep testprobe2 | grep -q '<- kernel_clone'
echo 0 > events/kprobes/testprobe2/enable echo '-:testprobe2' >> kprobe_events diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc b/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc index ff6c44adc8a0..c4093fc1a773 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/profile.tc @@ -4,7 +4,7 @@ # requires: kprobe_events
! grep -q 'myevent' kprobe_profile -echo p:myevent _do_fork > kprobe_events +echo p:myevent kernel_clone > kprobe_events grep -q 'myevent[[:space:]]*0[[:space:]]*0$' kprobe_profile echo 1 > events/kprobes/myevent/enable ( echo "forked" )
Now that all callers of _do_fork() have been switched to kernel_clone() remove the _do_fork() helper.
Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- include/linux/sched/task.h | 4 ---- 1 file changed, 4 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index d9ef07359c96..44a798bf21b4 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -84,10 +84,6 @@ extern void exit_files(struct task_struct *); extern void exit_itimers(struct signal_struct *);
extern int kernel_clone(struct kernel_clone_args *kargs); -static inline long _do_fork(struct kernel_clone_args *kargs) -{ - return kernel_clone(kargs); -} struct task_struct *fork_idle(int); struct mm_struct *copy_init_mm(void); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
The only remaining function callable outside of kernel/fork.c is _do_fork(). It doesn't really follow the naming of kernel-internal syscall helpers as Christoph righly pointed out. Switch all callers and references to kernel_clone() and remove _do_fork() once and for all.
My only concern is around return type. long, int, pid_t ... can we choose one and stick to it? pid_t is probably the right return type within the kernel, despite the return type of clone3(). It'll save us some work if we ever go through the hassle of growing pid_t beyond 31-bit.
On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
The only remaining function callable outside of kernel/fork.c is _do_fork(). It doesn't really follow the naming of kernel-internal syscall helpers as Christoph righly pointed out. Switch all callers and references to kernel_clone() and remove _do_fork() once and for all.
My only concern is around return type. long, int, pid_t ... can we choose one and stick to it? pid_t is probably the right return type within the kernel, despite the return type of clone3(). It'll save us some work if we ever go through the hassle of growing pid_t beyond 31-bit.
It should be safe to switch kernel_clone() to return pid_t. (Afair, the syscall wrappers all have "long" as return type. (I think Linus provided some more details on that in another mail. Also see include/linux/syscalls.h. So the return type for clone3() is really somewhat a userspace thing, I think.)
I wonder whether I should take the opportunity and switch the advertised flag arguments for the legacy clone() syscalls and kernel_thread() from unsigned long to unsigned int so we can get rid of the lower .flags = (lower_32_bits(clone_flags) & ~CSIGNAL), calls I added to fix sign extension issues glibc ran into...
Christian
On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
The only remaining function callable outside of kernel/fork.c is _do_fork(). It doesn't really follow the naming of kernel-internal syscall helpers as Christoph righly pointed out. Switch all callers and references to kernel_clone() and remove _do_fork() once and for all.
My only concern is around return type. long, int, pid_t ... can we choose one and stick to it? pid_t is probably the right return type within the kernel, despite the return type of clone3(). It'll save us some work if we ever go through the hassle of growing pid_t beyond 31-bit.
We have at least the futex ABI restricting PID space to 30 bits.
On Wed, Aug 19, 2020 at 09:43:40AM +0200, peterz@infradead.org wrote:
On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
The only remaining function callable outside of kernel/fork.c is _do_fork(). It doesn't really follow the naming of kernel-internal syscall helpers as Christoph righly pointed out. Switch all callers and references to kernel_clone() and remove _do_fork() once and for all.
My only concern is around return type. long, int, pid_t ... can we choose one and stick to it? pid_t is probably the right return type within the kernel, despite the return type of clone3(). It'll save us some work if we ever go through the hassle of growing pid_t beyond 31-bit.
We have at least the futex ABI restricting PID space to 30 bits.
Ok, looking into kernel/futex.c I see
pid_t pid = uval & FUTEX_TID_MASK;
which is probably what this referes to and /proc/sys/kernel/threads-max is restricted to FUTEX_TID_MASK.
Afaict, that doesn't block switching kernel_clone() to return pid_t. It can't create anything > FUTEX_TID_MASK anyway without yelling EAGAIN at userspace. But it means that _if_ we were to change the size of pid_t we'd likely need a new futex API.
Christian
On Wed, Aug 19, 2020 at 10:45:56AM +0200, Christian Brauner wrote:
On Wed, Aug 19, 2020 at 09:43:40AM +0200, peterz@infradead.org wrote:
On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
The only remaining function callable outside of kernel/fork.c is _do_fork(). It doesn't really follow the naming of kernel-internal syscall helpers as Christoph righly pointed out. Switch all callers and references to kernel_clone() and remove _do_fork() once and for all.
My only concern is around return type. long, int, pid_t ... can we choose one and stick to it? pid_t is probably the right return type within the kernel, despite the return type of clone3(). It'll save us some work if we ever go through the hassle of growing pid_t beyond 31-bit.
We have at least the futex ABI restricting PID space to 30 bits.
Ok, looking into kernel/futex.c I see
pid_t pid = uval & FUTEX_TID_MASK;
which is probably what this referes to and /proc/sys/kernel/threads-max is restricted to FUTEX_TID_MASK.
Afaict, that doesn't block switching kernel_clone() to return pid_t. It can't create anything > FUTEX_TID_MASK anyway without yelling EAGAIN at userspace. But it means that _if_ we were to change the size of pid_t we'd likely need a new futex API.
Yes, there would be a lot of work to do to increase the size of pid_t. I'd just like to not do anything to make that harder _now_. Stick to using pid_t within the kernel.
Matthew Wilcox willy@infradead.org writes:
On Wed, Aug 19, 2020 at 10:45:56AM +0200, Christian Brauner wrote:
On Wed, Aug 19, 2020 at 09:43:40AM +0200, peterz@infradead.org wrote:
On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
The only remaining function callable outside of kernel/fork.c is _do_fork(). It doesn't really follow the naming of kernel-internal syscall helpers as Christoph righly pointed out. Switch all callers and references to kernel_clone() and remove _do_fork() once and for all.
My only concern is around return type. long, int, pid_t ... can we choose one and stick to it? pid_t is probably the right return type within the kernel, despite the return type of clone3(). It'll save us some work if we ever go through the hassle of growing pid_t beyond 31-bit.
We have at least the futex ABI restricting PID space to 30 bits.
Ok, looking into kernel/futex.c I see
pid_t pid = uval & FUTEX_TID_MASK;
which is probably what this referes to and /proc/sys/kernel/threads-max is restricted to FUTEX_TID_MASK.
Afaict, that doesn't block switching kernel_clone() to return pid_t. It can't create anything > FUTEX_TID_MASK anyway without yelling EAGAIN at userspace. But it means that _if_ we were to change the size of pid_t we'd likely need a new futex API.
Yes, there would be a lot of work to do to increase the size of pid_t. I'd just like to not do anything to make that harder _now_. Stick to using pid_t within the kernel.
Just so people are aware. If you look in include/linux/threads.h you can see that the maximum value of PID_MAX_LIMIT limits pids to 22 bits.
Further the design decisions of pids keeps us densly using pids. So I expect it will be a while before we even come close to using 30 bits of pid space.
At the same time I do agree that it makes sense to use a consistent type in the kernel to make it easier to read and update the code.
Eric
On Wed, Aug 19, 2020 at 08:32:59AM -0500, Eric W. Biederman wrote:
Matthew Wilcox willy@infradead.org writes:
On Wed, Aug 19, 2020 at 10:45:56AM +0200, Christian Brauner wrote:
On Wed, Aug 19, 2020 at 09:43:40AM +0200, peterz@infradead.org wrote:
On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
The only remaining function callable outside of kernel/fork.c is _do_fork(). It doesn't really follow the naming of kernel-internal syscall helpers as Christoph righly pointed out. Switch all callers and references to kernel_clone() and remove _do_fork() once and for all.
My only concern is around return type. long, int, pid_t ... can we choose one and stick to it? pid_t is probably the right return type within the kernel, despite the return type of clone3(). It'll save us some work if we ever go through the hassle of growing pid_t beyond 31-bit.
We have at least the futex ABI restricting PID space to 30 bits.
Ok, looking into kernel/futex.c I see
pid_t pid = uval & FUTEX_TID_MASK;
which is probably what this referes to and /proc/sys/kernel/threads-max is restricted to FUTEX_TID_MASK.
Afaict, that doesn't block switching kernel_clone() to return pid_t. It can't create anything > FUTEX_TID_MASK anyway without yelling EAGAIN at userspace. But it means that _if_ we were to change the size of pid_t we'd likely need a new futex API.
Yes, there would be a lot of work to do to increase the size of pid_t. I'd just like to not do anything to make that harder _now_. Stick to using pid_t within the kernel.
Just so people are aware. If you look in include/linux/threads.h you can see that the maximum value of PID_MAX_LIMIT limits pids to 22 bits.
Further the design decisions of pids keeps us densly using pids. So I expect it will be a while before we even come close to using 30 bits of pid space.
Also because it's simply annoying to have to type really large pid numbers on the shell. Yes yes, that's a very privileged developer-centric complaint but it matters when you have to do a quick kill -9. Chromebook users obviously won't care about how large their pids are for sure.
Tbf, related to discussions last year, systemd now actually raises the default limit from ~33000 to 4194304. Which seems like an ok compromise.
Christian
Christian Brauner christian.brauner@ubuntu.com writes:
On Wed, Aug 19, 2020 at 08:32:59AM -0500, Eric W. Biederman wrote:
Matthew Wilcox willy@infradead.org writes:
On Wed, Aug 19, 2020 at 10:45:56AM +0200, Christian Brauner wrote:
On Wed, Aug 19, 2020 at 09:43:40AM +0200, peterz@infradead.org wrote:
On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote: > The only remaining function callable outside of kernel/fork.c is > _do_fork(). It doesn't really follow the naming of kernel-internal > syscall helpers as Christoph righly pointed out. Switch all callers and > references to kernel_clone() and remove _do_fork() once and for all.
My only concern is around return type. long, int, pid_t ... can we choose one and stick to it? pid_t is probably the right return type within the kernel, despite the return type of clone3(). It'll save us some work if we ever go through the hassle of growing pid_t beyond 31-bit.
We have at least the futex ABI restricting PID space to 30 bits.
Ok, looking into kernel/futex.c I see
pid_t pid = uval & FUTEX_TID_MASK;
which is probably what this referes to and /proc/sys/kernel/threads-max is restricted to FUTEX_TID_MASK.
Afaict, that doesn't block switching kernel_clone() to return pid_t. It can't create anything > FUTEX_TID_MASK anyway without yelling EAGAIN at userspace. But it means that _if_ we were to change the size of pid_t we'd likely need a new futex API.
Yes, there would be a lot of work to do to increase the size of pid_t. I'd just like to not do anything to make that harder _now_. Stick to using pid_t within the kernel.
Just so people are aware. If you look in include/linux/threads.h you can see that the maximum value of PID_MAX_LIMIT limits pids to 22 bits.
Further the design decisions of pids keeps us densly using pids. So I expect it will be a while before we even come close to using 30 bits of pid space.
Also because it's simply annoying to have to type really large pid numbers on the shell. Yes yes, that's a very privileged developer-centric complaint but it matters when you have to do a quick kill -9. Chromebook users obviously won't care about how large their pids are for sure.
Actually that is one of the reasons (possibly the primary reason) that we have chosen to keep pid numbers dense.
There may be fewer users of unix shells then their used to be, and we may now have pidfds. But until people stop using pids in shells it is a very valid reason to keep them densly packed.
Tbf, related to discussions last year, systemd now actually raises the default limit from ~33000 to 4194304. Which seems like an ok compromise.
Intereseting. I had not heard of that. That seems a strange choice for systemd rather than a system administrator to make. Of course any design decision that requires manual intervention to get large systems to work is probably a bad one.
Eric
From: Eric W. Biederman
Sent: 19 August 2020 16:01
...
Further the design decisions of pids keeps us densly using pids. So I expect it will be a while before we even come close to using 30 bits of pid space.
Also because it's simply annoying to have to type really large pid numbers on the shell. Yes yes, that's a very privileged developer-centric complaint but it matters when you have to do a quick kill -9. Chromebook users obviously won't care about how large their pids are for sure.
Actually that is one of the reasons (possibly the primary reason) that we have chosen to keep pid numbers dense.
It also helps keep the ps output under 80 cols.
There may be fewer users of unix shells then their used to be, and we may now have pidfds. But until people stop using pids in shells it is a very valid reason to keep them densly packed.
ISTM that the upper limit should be increased automatically when the number of allocated pids gets large enough that they are likely to run out (or get reused very quickly).
Does linux have an O(1) (or do I mean o(1)) pid allocator? Or does it have to do a linear scan to find a gap??
I made the NetBSD pid allocator/lookup use pid_array[pid & mask] then check the high bits matched (incremented on allocate). With a FIFO free list through the unused entries. Fairly easy to double the size and 'unzip' when getting full. And then allocate extra high bits to keep plenty of free values in circulation.
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, Aug 19, 2020 at 03:41:48PM +0000, David Laight wrote:
Does linux have an O(1) (or do I mean o(1)) pid allocator? Or does it have to do a linear scan to find a gap??
O(log(n)). It uses the IDR allocator, so 'n' in this case is the number of PIDs currently allocated, and it's log_64 rather than log_2 (which makes no difference to O() but does make a bit of a difference to performance)
From: Matthew Wilcox
Sent: 19 August 2020 16:45
On Wed, Aug 19, 2020 at 03:41:48PM +0000, David Laight wrote:
Does linux have an O(1) (or do I mean o(1)) pid allocator? Or does it have to do a linear scan to find a gap??
O(log(n)). It uses the IDR allocator, so 'n' in this case is the number of PIDs currently allocated, and it's log_64 rather than log_2 (which makes no difference to O() but does make a bit of a difference to performance)
Still worse that O(1) - when that is just replacing a variable with a value read out of an array. Made pid lookup a trivial O(1) as well.
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Wed, Aug 19, 2020 at 03:55:47PM +0000, David Laight wrote:
From: Matthew Wilcox
Sent: 19 August 2020 16:45
On Wed, Aug 19, 2020 at 03:41:48PM +0000, David Laight wrote:
Does linux have an O(1) (or do I mean o(1)) pid allocator? Or does it have to do a linear scan to find a gap??
O(log(n)). It uses the IDR allocator, so 'n' in this case is the number of PIDs currently allocated, and it's log_64 rather than log_2 (which makes no difference to O() but does make a bit of a difference to performance)
Still worse that O(1) - when that is just replacing a variable with a value read out of an array. Made pid lookup a trivial O(1) as well.
You'd be surprised. We replaced the custom PID allocator with the generic IDR allocator a few years ago and got a pretty decent speedup.
If you think you can do better, then submit patches. You have to support all the existing use cases, of course.
linux-kselftest-mirror@lists.linaro.org