It is preferrable to reject unknown flags within rseq unregistration rather than to ignore them. It is an oversight caused by the fact that the check for unknown flags is after the rseq unregister flag check.
Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra (Intel) peterz@infradead.org Cc: "Paul E. McKenney" paulmck@linux.ibm.com Cc: Boqun Feng boqun.feng@gmail.com Cc: "H . Peter Anvin" hpa@zytor.com Cc: Paul Turner pjt@google.com Cc: linux-api@vger.kernel.org Cc: stable@vger.kernel.org --- kernel/rseq.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/rseq.c b/kernel/rseq.c index 27c48eb7de40..a4f86a9d6937 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -310,6 +310,8 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len, int ret;
if (flags & RSEQ_FLAG_UNREGISTER) { + if (flags & ~RSEQ_FLAG_UNREGISTER) + return -EINVAL; /* Unregister rseq for current thread. */ if (current->rseq != rseq || !current->rseq) return -EINVAL;
It has been reported by Google that rseq is not behaving properly with respect to clone when CLONE_VM is used without CLONE_THREAD. It keeps the prior thread's rseq TLS registered when the TLS of the thread has moved, so the kernel deals with the wrong TLS.
The approach of clearing the per task-struct rseq registration on clone with CLONE_THREAD flag is incomplete. It does not cover the use-case of clone with CLONE_VM set, but without CLONE_THREAD.
Looking more closely at each of the clone flags:
- CLONE_THREAD, - CLONE_VM, - CLONE_SETTLS.
It appears that the flag we really want to track is CLONE_SETTLS, which moves the location of the TLS for the child, making the rseq registration point to the wrong TLS.
Suggested-by: "H . Peter Anvin" hpa@zytor.com Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra (Intel) peterz@infradead.org Cc: "Paul E. McKenney" paulmck@linux.ibm.com Cc: Boqun Feng boqun.feng@gmail.com Cc: "H . Peter Anvin" hpa@zytor.com Cc: Paul Turner pjt@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: linux-api@vger.kernel.org Cc: stable@vger.kernel.org --- include/linux/sched.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index 9f51932bd543..76bf55b5cccf 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1919,11 +1919,11 @@ static inline void rseq_migrate(struct task_struct *t)
/* * If parent process has a registered restartable sequences area, the - * child inherits. Only applies when forking a process, not a thread. + * child inherits. Unregister rseq for a clone with CLONE_SETTLS set. */ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) { - if (clone_flags & CLONE_THREAD) { + if (clone_flags & CLONE_SETTLS) { t->rseq = NULL; t->rseq_sig = 0; t->rseq_event_mask = 0;
There is an ongoing discussion on the choice of flag we want to care about here. Therefore, please don't pull this patch until we reach an agreement.
Thanks,
Mathieu
----- On Sep 13, 2019, at 11:12 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
It has been reported by Google that rseq is not behaving properly with respect to clone when CLONE_VM is used without CLONE_THREAD. It keeps the prior thread's rseq TLS registered when the TLS of the thread has moved, so the kernel deals with the wrong TLS.
The approach of clearing the per task-struct rseq registration on clone with CLONE_THREAD flag is incomplete. It does not cover the use-case of clone with CLONE_VM set, but without CLONE_THREAD.
Looking more closely at each of the clone flags:
- CLONE_THREAD,
- CLONE_VM,
- CLONE_SETTLS.
It appears that the flag we really want to track is CLONE_SETTLS, which moves the location of the TLS for the child, making the rseq registration point to the wrong TLS.
Suggested-by: "H . Peter Anvin" hpa@zytor.com Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra (Intel) peterz@infradead.org Cc: "Paul E. McKenney" paulmck@linux.ibm.com Cc: Boqun Feng boqun.feng@gmail.com Cc: "H . Peter Anvin" hpa@zytor.com Cc: Paul Turner pjt@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: linux-api@vger.kernel.org Cc: stable@vger.kernel.org
include/linux/sched.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index 9f51932bd543..76bf55b5cccf 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1919,11 +1919,11 @@ static inline void rseq_migrate(struct task_struct *t)
/*
- If parent process has a registered restartable sequences area, the
- child inherits. Only applies when forking a process, not a thread.
- child inherits. Unregister rseq for a clone with CLONE_SETTLS set.
*/ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) {
- if (clone_flags & CLONE_THREAD) {
- if (clone_flags & CLONE_SETTLS) { t->rseq = NULL; t->rseq_sig = 0; t->rseq_event_mask = 0;
-- 2.17.1
----- On Sep 14, 2019, at 10:21 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
There is an ongoing discussion on the choice of flag we want to care about here. Therefore, please don't pull this patch until we reach an agreement.
Following discussion with Neel Natu (Google) and Paul Turner (Google), I plan to modify this patch, and unregister RSEQ on clone CLONE_VM for the following reasons:
1) CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM to be set. Therefore, just checking for CLONE_VM covers all CLONE_THREAD uses,
2) There is the possibility of an unlikely scenario where CLONE_SETTLS is used without CLONE_VM. In order to be an issue, it would require that the rseq TLS is in a shared memory area.
I do not plan on adding CLONE_SETTLS to the set of clone flags which unregister RSEQ, because it would require that we also unregister RSEQ on set_thread_area(2) and arch_prctl(2) ARCH_SET_FS for completeness. So rather than doing a partial solution, it appears better to let user-space explicitly perform rseq unregistration across clone if needed in scenarios where CLONE_VM is not set.
Thoughts ?
Thanks,
Mathieu
Thanks,
Mathieu
----- On Sep 13, 2019, at 11:12 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
It has been reported by Google that rseq is not behaving properly with respect to clone when CLONE_VM is used without CLONE_THREAD. It keeps the prior thread's rseq TLS registered when the TLS of the thread has moved, so the kernel deals with the wrong TLS.
The approach of clearing the per task-struct rseq registration on clone with CLONE_THREAD flag is incomplete. It does not cover the use-case of clone with CLONE_VM set, but without CLONE_THREAD.
Looking more closely at each of the clone flags:
- CLONE_THREAD,
- CLONE_VM,
- CLONE_SETTLS.
It appears that the flag we really want to track is CLONE_SETTLS, which moves the location of the TLS for the child, making the rseq registration point to the wrong TLS.
Suggested-by: "H . Peter Anvin" hpa@zytor.com Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra (Intel) peterz@infradead.org Cc: "Paul E. McKenney" paulmck@linux.ibm.com Cc: Boqun Feng boqun.feng@gmail.com Cc: "H . Peter Anvin" hpa@zytor.com Cc: Paul Turner pjt@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: linux-api@vger.kernel.org Cc: stable@vger.kernel.org
include/linux/sched.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h index 9f51932bd543..76bf55b5cccf 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1919,11 +1919,11 @@ static inline void rseq_migrate(struct task_struct *t)
/*
- If parent process has a registered restartable sequences area, the
- child inherits. Only applies when forking a process, not a thread.
- child inherits. Unregister rseq for a clone with CLONE_SETTLS set.
*/ static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags) {
- if (clone_flags & CLONE_THREAD) {
- if (clone_flags & CLONE_SETTLS) { t->rseq = NULL; t->rseq_sig = 0; t->rseq_event_mask = 0;
-- 2.17.1
-- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
glibc 2.30 introduces gettid() in public headers, which clashes with the internal static definition within rseq selftests.
Rename gettid() to rseq_gettid() to eliminate this symbol name clash.
Reported-by: Tommi T. Rantala tommi.t.rantala@nokia.com Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Shuah Khan skhan@linuxfoundation.org Cc: Tommi T. Rantala tommi.t.rantala@nokia.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra (Intel) peterz@infradead.org Cc: "Paul E. McKenney" paulmck@linux.ibm.com Cc: Boqun Feng boqun.feng@gmail.com Cc: "H . Peter Anvin" hpa@zytor.com Cc: Paul Turner pjt@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: stable@vger.kernel.org --- tools/testing/selftests/rseq/param_test.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/rseq/param_test.c b/tools/testing/selftests/rseq/param_test.c index eec2663261f2..e8a657a5f48a 100644 --- a/tools/testing/selftests/rseq/param_test.c +++ b/tools/testing/selftests/rseq/param_test.c @@ -15,7 +15,7 @@ #include <errno.h> #include <stddef.h>
-static inline pid_t gettid(void) +static inline pid_t rseq_gettid(void) { return syscall(__NR_gettid); } @@ -373,11 +373,12 @@ void *test_percpu_spinlock_thread(void *arg) rseq_percpu_unlock(&data->lock, cpu); #ifndef BENCHMARK if (i != 0 && !(i % (reps / 10))) - printf_verbose("tid %d: count %lld\n", (int) gettid(), i); + printf_verbose("tid %d: count %lld\n", + (int) rseq_gettid(), i); #endif } printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", - (int) gettid(), nr_abort, signals_delivered); + (int) rseq_gettid(), nr_abort, signals_delivered); if (!opt_disable_rseq && thread_data->reg && rseq_unregister_current_thread()) abort(); @@ -454,11 +455,12 @@ void *test_percpu_inc_thread(void *arg) } while (rseq_unlikely(ret)); #ifndef BENCHMARK if (i != 0 && !(i % (reps / 10))) - printf_verbose("tid %d: count %lld\n", (int) gettid(), i); + printf_verbose("tid %d: count %lld\n", + (int) rseq_gettid(), i); #endif } printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", - (int) gettid(), nr_abort, signals_delivered); + (int) rseq_gettid(), nr_abort, signals_delivered); if (!opt_disable_rseq && thread_data->reg && rseq_unregister_current_thread()) abort(); @@ -605,7 +607,7 @@ void *test_percpu_list_thread(void *arg) }
printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", - (int) gettid(), nr_abort, signals_delivered); + (int) rseq_gettid(), nr_abort, signals_delivered); if (!opt_disable_rseq && rseq_unregister_current_thread()) abort();
@@ -796,7 +798,7 @@ void *test_percpu_buffer_thread(void *arg) }
printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", - (int) gettid(), nr_abort, signals_delivered); + (int) rseq_gettid(), nr_abort, signals_delivered); if (!opt_disable_rseq && rseq_unregister_current_thread()) abort();
@@ -1011,7 +1013,7 @@ void *test_percpu_memcpy_buffer_thread(void *arg) }
printf_verbose("tid %d: number of rseq abort: %d, signals delivered: %u\n", - (int) gettid(), nr_abort, signals_delivered); + (int) rseq_gettid(), nr_abort, signals_delivered); if (!opt_disable_rseq && rseq_unregister_current_thread()) abort();
linux-stable-mirror@lists.linaro.org