Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm

3 Mar 2021

On Tue, Mar 02, 2021 at 06:06:43PM -0800, Paul E. McKenney wrote:
...
On Wed, Mar 03, 2021 at 02:35:33AM +0100, Frederic Weisbecker wrote:
...
On Tue, Mar 02, 2021 at 10:17:29AM -0800, Paul E. McKenney wrote:
...
On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
OK, how about if I queue a temporary commit (shown below) that just
calls out the first scenario so that I can start testing, and you get
me more detail on the second scenario?  I can then update the commit.
Sure. Meanwhile, here is an attempt at a nocb_bypass_timer-based
scenario. It's overly hairy, and perhaps I picture more power
in the hands of callback advancing in nocb_cb_wait() than it
really has:
Thank you very much!
I must defer looking through this in detail until I am more awake,
but I do very much like the fine-grained exposition.
					Thanx, Paul

...

     0.  CPU 0's ->nocb_cb_kthread just called rcu_do_batch() and
         executed all the ready callbacks. Its segcblist is now
         entirely empty. It's preempted while calling local_bh_enable().

     1.  A new callback is enqueued on CPU 0 with IRQs enabled, so
         the ->nocb_gp_kthread for CPUs 0-2 is awakened. Then a storm
         of callback enqueues follows on CPU 0 and even reaches the
         bypass queue. Note that ->nocb_gp_kthread is also associated
         with CPU 0.

     2.  CPU 0 queues one last bypass callback.

     3.  The ->nocb_gp_kthread wakes up and associates a grace period
         with the whole queue of regular callbacks on CPU 0. It also
         tries to flush the bypass queue of CPU 0, but the bypass lock
         is contended due to the concurrent enqueuing in the previous
         step 2, so the flush fails.

     4.  This ->nocb_gp_kthread arms its ->nocb_bypass_timer and goes
         to sleep waiting for the end of this future grace period.

     5.  This grace period elapses before the ->nocb_bypass_timer
         fires. This is normally improbable given that the timer is set
         for only two jiffies, but timers can be delayed. Besides, it
         is possible that the kernel was built with
         CONFIG_RCU_STRICT_GRACE_PERIOD=y.

     6.  The grace period ends, so rcu_gp_kthread awakens the
         ->nocb_gp_kthread, but it doesn't get a chance to run on a CPU
         for a while.

     7.  CPU 0's ->nocb_cb_kthread gets back to the CPU after its
         preemption. As it notices the newly completed grace period, it
         advances the callbacks and executes them. Then it gets
         preempted again in local_bh_enable().

     8.  A new callback enqueue on CPU 0 itself flushes the bypass queue
         because CPU 0's ->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy.

     9.  CPUs from other ->nocb_gp_kthread groups (above CPU 2) start and
         complete a few grace periods. CPU 0's ->nocb_gp_kthread still
         hasn't had an opportunity to run on a CPU, and its
         ->nocb_bypass_timer still hasn't fired.

     10. CPU 0's ->nocb_cb_kthread wakes up from preemption. It notices
         the new grace periods that have elapsed, advances all the
         callbacks and executes them. Then it goes to sleep waiting for
         invocable callbacks.
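To make the bypass interplay of steps 2-4 and 8 concrete, here is a minimal single-threaded C model. Everything in it is hypothetical: struct model_rdp, the model_*() helpers, the single atomic_flag standing in for the bypass lock (the real code has separate locks), and the per-jiffy rule that simply resets the count on each new jiffy are assumptions for illustration, not the kernel's actual nocb code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical, single-threaded model of the bypass handling in the
 * scenario.  Not the kernel's struct rcu_data: one lock stands in for
 * both the bypass lock and the regular-list lock.
 */
struct model_rdp {
	atomic_flag bypass_lock;   /* stands in for the bypass lock */
	int n_bypass;              /* callbacks waiting in the bypass queue */
	int n_regular;             /* callbacks on the regular (segmented) list */
	unsigned long last_jiffy;  /* jiffy of the most recent enqueue */
	int nobypass_count;        /* regular-list enqueues during last_jiffy */
	bool bypass_timer_armed;   /* stands in for ->nocb_bypass_timer */
};

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set(lock);
}

static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/* Caller holds bypass_lock: move all bypass callbacks to the regular list. */
static void model_flush_bypass_locked(struct model_rdp *rdp)
{
	rdp->n_regular += rdp->n_bypass;
	rdp->n_bypass = 0;
}

/*
 * GP-kthread side (steps 3-4): give up if the lock is contended and arm
 * the bypass timer so that the flush is retried later.
 */
static bool model_gp_try_flush(struct model_rdp *rdp)
{
	if (!model_trylock(&rdp->bypass_lock)) {
		rdp->bypass_timer_armed = true;
		return false;
	}
	model_flush_bypass_locked(rdp);
	model_unlock(&rdp->bypass_lock);
	return true;
}

/*
 * Enqueue side (steps 2 and 8).  While fewer than @lim callbacks have
 * gone to the regular list during the current jiffy, flush the bypass
 * queue and enqueue on the regular list; otherwise use the bypass queue.
 * Simplification: the count resets to zero on every new jiffy.
 * Returns true if the callback went to the bypass queue.
 */
static bool model_enqueue(struct model_rdp *rdp, unsigned long jiffy, int lim)
{
	bool bypassed;

	assert(lim > 0);
	while (!model_trylock(&rdp->bypass_lock))
		;	/* enqueuers wait for the lock rather than give up */
	if (jiffy != rdp->last_jiffy) {
		rdp->last_jiffy = jiffy;
		rdp->nobypass_count = 0;
	}
	if (rdp->nobypass_count < lim) {
		rdp->nobypass_count++;
		model_flush_bypass_locked(rdp);
		rdp->n_regular++;
		bypassed = false;
	} else {
		rdp->n_bypass++;
		bypassed = true;
	}
	model_unlock(&rdp->bypass_lock);
	return bypassed;
}
```

In this model, once a concurrent lock holder makes model_gp_try_flush() fail, bypass_timer_armed stays set even after a later model_enqueue() has emptied the bypass queue on its own. That is the stale-timer situation that steps 4 and 8 set up; nothing in the model disarms the timer.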

I'm just not so sure about the above point 10. Even though a few grace periods
have elapsed, the callback queued in 8 is in RCU_NEXT_TAIL at this
point. Perhaps one more grace period is necessary after that.
Anyway, I need to be more awake as well before checking that again.
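The doubt about point 10 can be illustrated with a deliberately simplified model of the segmented callback list. It keeps only three segments (the real list distinguishes RCU_DONE_TAIL, RCU_WAIT_TAIL, RCU_NEXT_READY_TAIL and RCU_NEXT_TAIL), uses plain counters and ignores grace-period sequence wraparound; struct model_cblist and the model_*() helpers are hypothetical. It also assumes that advancing never assigns a grace period by itself, which is exactly the open question, so it illustrates the concern rather than describing what nocb_cb_wait() actually does.

```c
#include <assert.h>

/*
 * Hypothetical, simplified model of a segmented callback list: three
 * segments instead of four, plain counters, no sequence wraparound.
 */
enum model_seg { M_DONE, M_WAIT, M_NEXT, M_NSEGS };

struct model_cblist {
	int n[M_NSEGS];         /* number of callbacks in each segment */
	unsigned long wait_gp;  /* grace period that M_WAIT is waiting for */
};

/*
 * Move M_WAIT to M_DONE once its grace period has completed.  M_NEXT
 * has no grace period assigned yet, so it never moves here, however
 * large @completed is.
 */
static void model_advance(struct model_cblist *l, unsigned long completed)
{
	if (l->n[M_WAIT] && l->wait_gp <= completed) {
		l->n[M_DONE] += l->n[M_WAIT];
		l->n[M_WAIT] = 0;
	}
}

/*
 * Tie the M_NEXT callbacks to grace period @gp, which must not have
 * completed yet.  If M_WAIT is already occupied, keep the later of the
 * two grace periods, which can only delay invocation, never hasten it.
 */
static void model_accelerate(struct model_cblist *l, unsigned long gp,
			     unsigned long completed)
{
	assert(gp > completed);
	if (!l->n[M_NEXT])
		return;
	if (!l->n[M_WAIT] || gp > l->wait_gp)
		l->wait_gp = gp;
	l->n[M_WAIT] += l->n[M_NEXT];
	l->n[M_NEXT] = 0;
}
```

In this model, a callback that only reaches M_NEXT in step 8 stays uninvokable across any number of grace periods completed in step 9. It becomes invokable only after model_accelerate() ties it to a grace period and that grace period also completes, i.e. one more grace period, which matches the suspicion above.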
