On Wed, Feb 24, 2021 at 04:14:25PM -0800, Paul E. McKenney wrote:
On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
I managed to recollect some pieces of my brain. So keep the above, but let's change point 10:
10) CPU 0 enqueues its second callback, this time with interrupts
    enabled, so it can wake ->nocb_gp_kthread directly. It does so by
    calling __wake_nocb_gp(), which also cancels the pending timer that
    got queued in step 2. But that doesn't reset CPU 0's
    ->nocb_defer_wakeup, which is still set to RCU_NOCB_WAKE. So CPU 0's
    ->nocb_defer_wakeup and CPU 0's ->nocb_timer are now desynchronized.

11) ->nocb_gp_kthread associates the callback queued in 10 with a new
    grace period, arranges for it to start and sleeps on it.

12) The grace period ends, ->nocb_gp_kthread awakens and wakes up
    CPU 0's ->nocb_cb_kthread, which invokes the callback queued in 10.

13) CPU 0 enqueues its third callback, this time with interrupts
    disabled, so it tries to queue a deferred wakeup. However,
    ->nocb_defer_wakeup holds a stale RCU_NOCB_WAKE value, which
    prevents CPU 0's ->nocb_timer, cancelled in 10, from being armed.

14) CPU 0 has its pending callback, and it may go unnoticed until some
    other CPU wakes up ->nocb_gp_kthread or CPU 0 calls an explicit
    deferred wakeup caller such as idle entry.
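
To make the desynchronization easier to follow, here is a toy userspace
model of the sequence above. It is only a sketch, not the kernel code:
struct model_rdp, model_wake_defer(), model_wake_direct() and
model_third_callback_stranded() are hypothetical names invented for
illustration. It assumes that the deferred path arms the timer only when
no deferred wakeup is already recorded, that the timer from step 2 came
from an earlier deferred wakeup, and that the direct path cancels the
timer. The reset_flag argument is a purely illustrative switch showing
what changes if the direct path also cleared the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel symbols. */
enum model_wake { MODEL_WAKE_NOT = 0, MODEL_WAKE = 1 };

struct model_rdp {
	enum model_wake defer_wakeup;	/* stands in for ->nocb_defer_wakeup */
	bool timer_pending;		/* stands in for ->nocb_timer being armed */
	int pending_cbs;		/* callbacks the GP kthread has not seen yet */
};

/* Deferred path (interrupts disabled): arm the timer only if the flag is clear. */
static void model_wake_defer(struct model_rdp *rdp)
{
	if (rdp->defer_wakeup == MODEL_WAKE_NOT)
		rdp->timer_pending = true;
	if (rdp->defer_wakeup < MODEL_WAKE)
		rdp->defer_wakeup = MODEL_WAKE;
}

/*
 * Direct path (interrupts enabled): cancel the timer and let the GP kthread
 * pick up everything queued so far. With reset_flag == false the flag is
 * left untouched, which is the behaviour described in step 10.
 */
static void model_wake_direct(struct model_rdp *rdp, bool reset_flag)
{
	rdp->timer_pending = false;
	if (reset_flag)
		rdp->defer_wakeup = MODEL_WAKE_NOT;	/* illustrative only */
	rdp->pending_cbs = 0;
}

/* Returns true if the third callback is left pending with no timer armed. */
bool model_third_callback_stranded(bool reset_flag)
{
	struct model_rdp rdp = { MODEL_WAKE_NOT, false, 0 };

	/* Assumed step 2: an earlier callback queued with a deferred wakeup. */
	rdp.pending_cbs++;
	model_wake_defer(&rdp);

	/* Step 10: second callback, direct wakeup, timer gets cancelled. */
	rdp.pending_cbs++;
	model_wake_direct(&rdp, reset_flag);
	assert(!rdp.timer_pending);

	/* Third callback: interrupts disabled, so try a deferred wakeup. */
	rdp.pending_cbs++;
	model_wake_defer(&rdp);

	return rdp.pending_cbs > 0 && !rdp.timer_pending;
}
```

In this model, model_third_callback_stranded(false) reports the stranded
callback because the stale flag keeps the timer from being re-armed,
while model_third_callback_stranded(true) does not.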
I hope I'm not missing something this time...
Thank you, that does sound plausible. I guess I can see how rcutorture might have missed this one!
I must admit it requires a lot of stars to be aligned :-)
Thanks.