On Tue, 2023-08-01 at 12:11 -0700, Paul E. McKenney wrote:
On Tue, Aug 01, 2023 at 10:32:45AM -0700, Guenter Roeck wrote:
Please see below for my preferred fix. Does this work for you guys?
Back to figuring out why recent kernels occasionally blow up all rcutorture guest OSes...
Thanx, Paul
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 7294be62727b..2d5b8385c357 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -570,10 +570,12 @@ static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
 	if (unlikely(midboot)) {
 		needgpcb = 0x2;
 	} else {
+		mutex_unlock(&rtp->tasks_gp_mutex);
 		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
 		rcuwait_wait_event(&rtp->cbs_wait,
 				   (needgpcb = rcu_tasks_need_gpcb(rtp)),
 				   TASK_IDLE);
+		mutex_lock(&rtp->tasks_gp_mutex);
 	}
 
 	if (needgpcb & 0x2) {
Your preferred fix looks good to me.
With the original code I can quite easily reproduce the problem on my system every 10 reboots or so. With your fix in place the problem no longer occurs.
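
For anyone following along, here is a rough userspace sketch of the pattern the patch applies: release the outer mutex before blocking in a wait, then retake it once the wait completes. This is only an illustration with pthreads, not the kernel code. The names gp_mutex, cbs_lock, cbs_wait, have_cbs, gp_worker() and run_demo() are all made up for the example; gp_mutex stands in for tasks_gp_mutex and the condition variable stands in for the rcuwait.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel names. */
static pthread_mutex_t gp_mutex = PTHREAD_MUTEX_INITIALIZER; /* plays tasks_gp_mutex */
static pthread_mutex_t cbs_lock = PTHREAD_MUTEX_INITIALIZER; /* protects have_cbs */
static pthread_cond_t cbs_wait = PTHREAD_COND_INITIALIZER;   /* plays the rcuwait */
static bool have_cbs;

static void *gp_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&gp_mutex);

	/* Drop the outer mutex so others can take it while we sleep. */
	pthread_mutex_unlock(&gp_mutex);

	pthread_mutex_lock(&cbs_lock);
	while (!have_cbs)
		pthread_cond_wait(&cbs_wait, &cbs_lock);
	pthread_mutex_unlock(&cbs_lock);

	/* Retake the outer mutex before doing the real work. */
	pthread_mutex_lock(&gp_mutex);
	/* ... work that needs gp_mutex would go here ... */
	pthread_mutex_unlock(&gp_mutex);
	return NULL;
}

/* Returns 1 once a second party has taken gp_mutex and the worker has
 * finished, or -1 if the worker thread could not be created. */
int run_demo(void)
{
	pthread_t t;
	int got = 0;

	have_cbs = false;
	if (pthread_create(&t, NULL, gp_worker, NULL) != 0)
		return -1;

	/* Second party: needs gp_mutex before it can post any work. */
	pthread_mutex_lock(&gp_mutex);
	got = 1;
	pthread_mutex_unlock(&gp_mutex);

	/* Post work and wake the worker. */
	pthread_mutex_lock(&cbs_lock);
	have_cbs = true;
	pthread_cond_signal(&cbs_wait);
	pthread_mutex_unlock(&cbs_lock);

	pthread_join(t, NULL);
	return got;
}
```

If the unlock/lock pair around the wait in gp_worker() is removed, run_demo() can hang whenever the worker gets gp_mutex first: the worker sleeps holding the mutex, and the second party blocks on that mutex before it ever sets have_cbs. Build with -pthread and call run_demo() from a main() of your own to try it.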