On Wed, Aug 02, 2023 at 10:14:51AM -0700, Linus Torvalds wrote:
Two quick comments, both of them "this code is a bit odd" rather than anything else.
Good to get eyes on this code, so thank you very much!!!
On Tue, 1 Aug 2023 at 12:11, Paul E. McKenney <paulmck@kernel.org> wrote:
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
Why is this file called "tasks.h"?
It's not a header file. It makes no sense. It's full of C code. It's included in only one place. It's just _weird_.
You are right, it is weird.
This is a holdover from when I was much more concerned about being criticized for having #ifdef in a .c file, and pretty much every line in this file is under some combination or another of #ifdefs. This concern led to kernel/rcu/tree_plugin.h being set up in this way back when preemptible RCU was introduced, and for good or for bad I just kept following that pattern.
We could convert this to a .c file, keep the #ifdefs, drop some instances of "static", add a bunch of declarations, and maybe (or maybe not) push a function or two into some .h file for performance/inlining reasons. Me, I would prefer to leave it alone, but we can certainly change it.
However, more relevantly:
mutex_unlock(&rtp->tasks_gp_mutex);
set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
Isn't the tasks_gp_mutex the thing that protects the gp state here? Shouldn't it be after setting?
Much of the gp state is protected by being accessed only by the gp kthread. But there is a window in time where the gp might be driven directly out of the synchronize_rcu_tasks() call. That window in time does not have a definite end, so this ->tasks_gp_mutex does the needed mutual exclusion during the transition of gp processing to the newly created gp kthread.
rcuwait_wait_event(&rtp->cbs_wait, (needgpcb = rcu_tasks_need_gpcb(rtp)), TASK_IDLE);
Also, looking at rcu_tasks_need_gpcb() that is now called outside the lock, it does something quite odd.
The state of each callback list is protected by the ->lock field of the rcu_tasks_percpu structure. Yes, rcu_segcblist_n_cbs() is invoked in rcu_tasks_need_gpcb() outside of the lock, but it is designed for lockless use. If the count is modified just after the check, then either there will be a later wakeup or we will just uselessly acquire that ->lock this one time.
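To illustrate the lockless-check-then-lock pattern described above, here is a minimal userspace sketch in C11 with pthreads. The names (struct cb_queue, cb_queue_init(), cb_enqueue(), cb_drain_if_needed()) are hypothetical, and this is not the kernel's rcu_segcblist code; it only shows why a racy read of the count is acceptable when the real work is redone under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-CPU callback list (not the kernel's
 * rcu_tasks_percpu): the list is changed only under ->lock, but its count
 * is mirrored into an atomic so it can be peeked at without the lock. */
struct cb_queue {
	pthread_mutex_t lock;
	long pending;		/* protected by ->lock */
	atomic_long n_cbs;	/* written under ->lock, read locklessly */
};

static void cb_queue_init(struct cb_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->pending = 0;
	atomic_init(&q->n_cbs, 0);
}

static void cb_enqueue(struct cb_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->pending++;
	atomic_store_explicit(&q->n_cbs, q->pending, memory_order_relaxed);
	pthread_mutex_unlock(&q->lock);
}

/*
 * Lockless peek first; take the lock only if there appears to be work.
 * If an enqueue races with the peek, a stale zero is harmless provided the
 * enqueuer issues a wakeup afterwards (assumed, not modeled here), and a
 * stale nonzero costs only one useless lock acquisition.
 */
static long cb_drain_if_needed(struct cb_queue *q)
{
	long drained;

	if (atomic_load_explicit(&q->n_cbs, memory_order_relaxed) == 0)
		return 0;

	pthread_mutex_lock(&q->lock);
	drained = q->pending;	/* authoritative count, read under the lock */
	q->pending = 0;
	atomic_store_explicit(&q->n_cbs, 0, memory_order_relaxed);
	pthread_mutex_unlock(&q->lock);
	assert(drained >= 0);
	return drained;
}
```

A caller would invoke cb_queue_init() once, then cb_enqueue() from producers and cb_drain_if_needed() from the consumer.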
Also, ncbs records the number of callbacks seen in that first loop, and that count is then used later, when its value might be stale. This might result in a collapse back to single-callback-queue operation and a later expansion back up. Except that at this point we are still in single-CPU mode, so there should not be any lock contention, which means that there should still be but a single callback queue. The transition itself is protected by ->cbs_gbl_lock.
At the very top, the function does
for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) {
and 'smp_load_acquire()' is all about saying "everything *after* this load is ordered".
But the way it is done in that loop, it is indeed done at the beginning of the loop, but then it's done *after* the loop too, so the last smp_load_acquire seems a bit nonsensical.
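A small C11 sketch makes the "done *after* the loop too" point concrete by counting how many times the acquire load runs. The variable dequeue_lim, the counter n_loads, and the functions below are hypothetical stand-ins, not kernel code. With a limit of 4, the version with the load in the loop condition performs five loads (four passing tests plus the final failing one), while the hoisted version performs exactly one.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared limit, standing in for ->percpu_dequeue_lim. */
static atomic_int dequeue_lim = 4;
static int n_loads;	/* how many times the acquire load has run */

static int load_limit(void)
{
	n_loads++;
	return atomic_load_explicit(&dequeue_lim, memory_order_acquire);
}

/* Limit re-read in the loop condition: once per iteration, plus once more
 * for the final failing test, so limit + 1 acquire loads in total. */
static int scan_reload(void)
{
	int cpu, visited = 0;

	for (cpu = 0; cpu < load_limit(); cpu++)
		visited++;
	return visited;
}

/* Limit read once up front: everything after it is ordered after this
 * single acquire load, and the loop bound cannot change mid-scan. */
static int scan_hoisted(void)
{
	int cpu, visited = 0;
	int limit = load_limit();

	for (cpu = 0; cpu < limit; cpu++)
		visited++;
	assert(visited == limit);
	return visited;
}
```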
If you want to load a value and say "this value is now sensible for everything that follows", I think you should load it *first*. No?
IOW, wouldn't the whole sequence make more sense as
dequeue_limit = smp_load_acquire(&rtp->percpu_dequeue_lim);
for (cpu = 0; cpu < dequeue_limit; cpu++) {
and say that everything in rcu_tasks_need_gpcb() is ordered wrt the initial limit on entry?
I dunno. That use of "smp_load_acquire()" just seems odd. Memory ordering is hard to understand to begin with, but then when you have things like loops that do the same ordered load multiple times, it goes from "hard to understand" to positively confusing.
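An acquire load on a limit like this is usually paired with a release store on the writer side, which publishes initialized per-queue state before the larger limit becomes visible. The following generic C11 sketch shows that pairing, with the limit read once on entry as suggested above. Everything in it (queue_len[], queue_lim, MAX_QUEUES, expand_queues(), total_queued()) is hypothetical; it assumes a single writer and a limit that only grows, and this thread does not show how the kernel's writer side actually updates ->percpu_dequeue_lim.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_QUEUES 8	/* hypothetical upper bound on the number of queues */

/* Hypothetical per-queue data, initialized before being made visible. */
static long queue_len[MAX_QUEUES];
static atomic_int queue_lim = 1;	/* number of queues readers may use */

/* Writer (assumed to be the only one): fill in the new queues first, then
 * publish the larger limit with a release store, so a reader whose acquire
 * load sees the new limit also sees the initialization. */
static void expand_queues(int new_lim, long initial_len)
{
	int i;
	int old_lim = atomic_load_explicit(&queue_lim, memory_order_relaxed);

	assert(new_lim <= MAX_QUEUES);
	for (i = old_lim; i < new_lim; i++)
		queue_len[i] = initial_len;
	atomic_store_explicit(&queue_lim, new_lim, memory_order_release);
}

/* Reader: one acquire load on entry; every queue access below is ordered
 * after it, so each queue below the snapshot is already initialized. */
static long total_queued(void)
{
	int i;
	long total = 0;
	int lim = atomic_load_explicit(&queue_lim, memory_order_acquire);

	for (i = 0; i < lim; i++)
		total += queue_len[i];
	return total;
}
```

Because the reader takes a single snapshot of the limit, its whole scan is consistent with one published state, which is the property the hoisted load buys.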
Excellent point. I am queueing that change with your Suggested-by. If testing goes well, it will be as shown below.
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 83049a893de5..94bb5abdbb37 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -432,6 +432,7 @@ static void rcu_barrier_tasks_generic(struct rcu_tasks *rtp)
 static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 {
 	int cpu;
+	int dequeue_limit;
 	unsigned long flags;
 	bool gpdone = poll_state_synchronize_rcu(rtp->percpu_dequeue_gpseq);
 	long n;
@@ -439,7 +440,8 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp)
 	long ncbsnz = 0;
 	int needgpcb = 0;
 
-	for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) {
+	dequeue_limit = smp_load_acquire(&rtp->percpu_dequeue_lim);
+	for (cpu = 0; cpu < dequeue_limit; cpu++) {
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu);
 
 		/* Advance and accelerate any new callbacks. */