From: Frederic Weisbecker frederic@kernel.org
[ Upstream commit 81f6d49cce2d2fe507e3fddcc4a6db021d9c2e7b ]
Expedited RCU grace periods invoke sync_rcu_exp_select_node_cpus(), which takes two passes over the leaf rcu_node structure's CPUs. The first pass gathers up the current CPU and CPUs that are in dynticks idle mode. The workqueue will report a quiescent state on their behalf later. The second pass sends IPIs to the rest of the CPUs, but excludes the current CPU, incorrectly assuming it has been included in the first pass's list of CPUs.
Unfortunately the current CPU may have changed between the first and second pass, due to the fact that the various rcu_node structures' ->lock fields have been dropped, thus momentarily enabling preemption. This means that if the second pass's CPU was not on the first pass's list, it will be ignored completely. There will be no IPI sent to it, and there will be no reporting of quiescent states on its behalf. Unfortunately, the expedited grace period will nevertheless be waiting for that CPU to report a quiescent state, but with that CPU having no reason to believe that such a report is needed.
The result will be an expedited grace period stall.
Fix this by no longer excluding the current CPU from consideration during the second pass.
Fixes: b9ad4d6ed18e ("rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus()") Reviewed-by: Neeraj Upadhyay quic_neeraju@quicinc.com Signed-off-by: Frederic Weisbecker frederic@kernel.org Cc: Uladzislau Rezki urezki@gmail.com Cc: Neeraj Upadhyay quic_neeraju@quicinc.com Cc: Boqun Feng boqun.feng@gmail.com Cc: Josh Triplett josh@joshtriplett.org Cc: Joel Fernandes joel@joelfernandes.org Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/rcu/tree_exp.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 4c4d7683a4e5b..173e3ce607900 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -382,6 +382,7 @@ retry_ipi: continue; } if (get_cpu() == cpu) { + mask_ofl_test |= mask; put_cpu(); continue; }