While integrating rseq into glibc and replacing glibc's sched_getcpu implementation with rseq, glibc's tests discovered an issue with incorrect __rseq_abi.cpu_id field value right after the first time a newly created process issues sched_setaffinity.
For the records, it triggers after building glibc and running tests, and then issuing:
for x in {1..2000} ; do posix/tst-affinity-static & done
and shows up as:
error: Unexpected CPU 2, expected 0 error: Unexpected CPU 2, expected 0 error: Unexpected CPU 2, expected 0 error: Unexpected CPU 2, expected 0 error: Unexpected CPU 138, expected 0 error: Unexpected CPU 138, expected 0 error: Unexpected CPU 138, expected 0 error: Unexpected CPU 138, expected 0
This is caused by the scheduler invoking __set_task_cpu() directly from sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which is done by set_task_cpu().
Add the missing rseq_migrate() to both functions. The only other direct use of __set_task_cpu() is done by init_idle(), which does not involve a user-space task.
Based on my testing with the glibc test-case, just adding rseq_migrate() to wake_up_new_task() is sufficient to fix the observed issue. Also add it to sched_fork() to keep things consistent.
The reason why this never triggered so far with the rseq/basic_test selftest is unclear.
The current use of sched_getcpu(3) does not typically require it to be always accurate. However, use of the __rseq_abi.cpu_id field within rseq critical sections requires it to be accurate. If it is not accurate, it can cause corruption in the per-cpu data targeted by rseq critical sections in user-space.
Link: https://sourceware.org/pipermail/libc-alpha/2020-July/115816.html Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Tested-By: Florian Weimer fweimer@redhat.com Cc: Peter Zijlstra (Intel) peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Florian Weimer fw@deneb.enyo.de Cc: "Paul E. McKenney" paulmck@linux.ibm.com Cc: Boqun Feng boqun.feng@gmail.com Cc: "H . Peter Anvin" hpa@zytor.com Cc: Paul Turner pjt@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Neel Natu neelnatu@google.com Cc: linux-api@vger.kernel.org Cc: stable@vger.kernel.org # v4.18+ --- kernel/sched/core.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ca5db40392d4..86a855bd4d90 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2962,6 +2962,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) * Silence PROVE_RCU. */ raw_spin_lock_irqsave(&p->pi_lock, flags); + rseq_migrate(p); /* * We're setting the CPU for the first time, we don't migrate, * so use __set_task_cpu(). @@ -3026,6 +3027,7 @@ void wake_up_new_task(struct task_struct *p) * as we're not fully set-up yet. */ p->recent_used_cpu = task_cpu(p); + rseq_migrate(p); __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0)); #endif rq = __task_rq_lock(p, &rf);
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: ce3614daabea8a2d01c1dd17ae41d1ec5e5ae7db Gitweb: https://git.kernel.org/tip/ce3614daabea8a2d01c1dd17ae41d1ec5e5ae7db Author: Mathieu Desnoyers mathieu.desnoyers@efficios.com AuthorDate: Mon, 06 Jul 2020 16:49:10 -04:00 Committer: Peter Zijlstra peterz@infradead.org CommitterDate: Wed, 08 Jul 2020 11:38:50 +02:00
sched: Fix unreliable rseq cpu_id for new tasks
While integrating rseq into glibc and replacing glibc's sched_getcpu implementation with rseq, glibc's tests discovered an issue with incorrect __rseq_abi.cpu_id field value right after the first time a newly created process issues sched_setaffinity.
For the records, it triggers after building glibc and running tests, and then issuing:
for x in {1..2000} ; do posix/tst-affinity-static & done
and shows up as:
error: Unexpected CPU 2, expected 0 error: Unexpected CPU 2, expected 0 error: Unexpected CPU 2, expected 0 error: Unexpected CPU 2, expected 0 error: Unexpected CPU 138, expected 0 error: Unexpected CPU 138, expected 0 error: Unexpected CPU 138, expected 0 error: Unexpected CPU 138, expected 0
This is caused by the scheduler invoking __set_task_cpu() directly from sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which is done by set_task_cpu().
Add the missing rseq_migrate() to both functions. The only other direct use of __set_task_cpu() is done by init_idle(), which does not involve a user-space task.
Based on my testing with the glibc test-case, just adding rseq_migrate() to wake_up_new_task() is sufficient to fix the observed issue. Also add it to sched_fork() to keep things consistent.
The reason why this never triggered so far with the rseq/basic_test selftest is unclear.
The current use of sched_getcpu(3) does not typically require it to be always accurate. However, use of the __rseq_abi.cpu_id field within rseq critical sections requires it to be accurate. If it is not accurate, it can cause corruption in the per-cpu data targeted by rseq critical sections in user-space.
Reported-By: Florian Weimer fweimer@redhat.com Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Tested-By: Florian Weimer fweimer@redhat.com Cc: stable@vger.kernel.org # v4.18+ Link: https://lkml.kernel.org/r/20200707201505.2632-1-mathieu.desnoyers@efficios.c... --- kernel/sched/core.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 950ac45..e15543c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2965,6 +2965,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) * Silence PROVE_RCU. */ raw_spin_lock_irqsave(&p->pi_lock, flags); + rseq_migrate(p); /* * We're setting the CPU for the first time, we don't migrate, * so use __set_task_cpu(). @@ -3029,6 +3030,7 @@ void wake_up_new_task(struct task_struct *p) * as we're not fully set-up yet. */ p->recent_used_cpu = task_cpu(p); + rseq_migrate(p); __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0)); #endif rq = __task_rq_lock(p, &rf);
linux-stable-mirror@lists.linaro.org