On Fri, Aug 25, 2023 at 07:15:44PM +0800, Huacai Chen wrote:
Hi, Paul,
On Fri, Aug 25, 2023 at 2:28 AM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 11:43:04PM +0800, Huacai Chen wrote:
Hi, Paul,
On Thu, Aug 24, 2023 at 9:24 PM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 08:40:00PM +0800, Huacai Chen wrote:
Hi, Paul,
On Thu, Aug 24, 2023 at 7:40 PM Paul E. McKenney paulmck@kernel.org wrote:
On Thu, Aug 24, 2023 at 10:50:41AM +0800, Huacai Chen wrote:
> Hi, Paul,
> On Thu, Aug 24, 2023 at 6:41 AM Paul E. McKenney paulmck@kernel.org wrote:
> > On Thu, Aug 24, 2023 at 12:03:25AM +0200, Thomas Gleixner wrote:
> > > On Thu, Aug 17 2023 at 16:06, Huacai Chen wrote:
> > > > On Thu, Aug 17, 2023 at 3:27 AM Joel Fernandes joel@joelfernandes.org wrote:
> > > >> > If do_update_jiffies_64() cannot be used in NMI context,
> > > >>
> > > >> Can you not make the jiffies update conditional on whether it is
> > > >> called within NMI context?
> > >
> > > Which solves what? If KGDB has a breakpoint in the jiffies lock held
> > > region then you still dead lock.
> > >
> > > >> I dislike that..
> > > > Is this acceptable?
> > > >
> > > > void rcu_cpu_stall_reset(void)
> > > > {
> > > >         unsigned long delta;
> > > >
> > > >         delta = nsecs_to_jiffies(ktime_get_ns() - ktime_get_coarse_ns());
> > > >
> > > >         WRITE_ONCE(rcu_state.jiffies_stall,
> > > >                    jiffies + delta + rcu_jiffies_till_stall_check());
> > > > }
> > > >
> > > > This can update jiffies_stall without updating jiffies (but has the
> > > > same effect).
> > >
> > > Now you traded the potential dead lock on jiffies lock for a potential
> > > live lock vs. tk_core.seq. Not really an improvement, right?
> > >
> > > The only way you can do the above is something like the incomplete and
> > > uncompiled below. NMI safe and therefore livelock proof time interfaces
> > > exist for a reason.
> >
> > Just for completeness, another approach, with its own advantages
> > and disadvantages, is to add something like ULONG_MAX/4 to
> > rcu_state.jiffies_stall, but also set a counter indicating that this
> > has been done. Then RCU's force-quiescent processing could decrement
> > that counter (if non-zero) and reset rcu_state.jiffies_stall when it
> > does reach zero.
> >
> > Setting the counter to three should cover most cases, but "live by the
> > heuristic, die by the heuristic". ;-)
> >
> > It would be good to have some indication when gdb exited, but things
> > like the gdb "next" command can make that "interesting" when applied to
> > a long-running function.
> The original code adds ULONG_MAX/2, so adding ULONG_MAX/4 may not make
> much difference? The simplest way is adding 300*HZ, but Joel dislikes
> that.
I am not seeing the ULONG_MAX/2, so could you please point me to that original code?
Maybe I misunderstand something; by "the original code" I mean the code before commit a80be428fbc1f1f3bc9ed924 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()").
Yes, my suggestion would essentially revert that patch. It would compensate by resetting rcu_state.jiffies_stall after a few calls to rcu_gp_fqs().
Alternatively, we could simply provide a way for gdb users to manually disable RCU CPU stall warnings at the beginning of their debug sessions and to manually re-enable them when they are done.
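[Editor's note: as a minimal sketch of that manual option, the kgdb-prefixed hook names below are invented for illustration; rcu_cpu_stall_suppress is the existing knob behind the rcupdate.rcu_cpu_stall_suppress module parameter, which sysrq already toggles via rcu_sysrq_start()/rcu_sysrq_end().]

/*
 * Illustrative only: hook names are made up; rcu_cpu_stall_suppress is
 * the existing flag that already suppresses RCU CPU stall warnings.
 */
extern int rcu_cpu_stall_suppress;

/* Debugger attached: silence RCU CPU stall warnings for the session. */
void kgdb_rcu_stall_suppress(void)
{
	WRITE_ONCE(rcu_cpu_stall_suppress, 1);
}

/* Debug session over: re-arm stall warnings. */
void kgdb_rcu_stall_unsuppress(void)
{
	WRITE_ONCE(rcu_cpu_stall_suppress, 0);
}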
This problem is not KGDB-specific (though it was first found in the KGDB case), so I want to fix it in the rcu code rather than in the kgdb code.
Sure, for example, there is also PowerPC XMON.
But this problem also is not RCU-specific. There are also hardlockups, softlockups, workqueue lockups, networking timeouts, and who knows what all else.
Plus, and again to Thomas's point, gdb breakpoints can happen anywhere. For example, immediately after RCU computes the RCU CPU stall time for a new grace period, and right before it stores it. The gdb callout updates rcu_state.jiffies_stall, but that update is overwritten with a stale value as soon as the system starts back up.
Low probability, to be sure, but there are quite a few places in the kernel right after a read from some timebase or another, and many (perhaps all) of these can see similar stale-time-use problems.
The only way I know of to avoid these sorts of false positives is for the user to manually suppress all timeouts (perhaps using a kernel-boot parameter for your early-boot case), do the gdb work, and then unsuppress all stalls. Even that won't work for networking, because the other system's clock will be running throughout.
In other words, from what I know now, there is no perfect solution. Therefore, there are sharp limits to the complexity of any solution that I will be willing to accept.
I think the simplest solution is (I hope Joel will not be angry):
void rcu_cpu_stall_reset(void)
{
	WRITE_ONCE(rcu_state.jiffies_stall, jiffies + 300 * HZ);
}
300s is the upper limit of "stall timeout" we can configure (RCU_CPU_STALL_TIMEOUT in kernel/rcu/Kconfig.debug), so it isn't just a "magic number". In practice, 300s is also enough for any normal kgdb operation. And compared to "resetting after a few calls to rcu_gp_fqs()", this simple solution means "automatically resetting after 300s".
Please keep in mind that the long-ago choice of 300s did not take things like kernel debuggers into account. But that 300s limit still makes sense in the absence of things like kernel debuggers. So this code is what takes up the difference.
If this is completely unacceptable, I prefer Thomas's tick_estimate_stale_jiffies() solution.
Thomas's tick_estimate_stale_jiffies() does have its merits, but the advantage of ULONG_MAX/4 is that you don't have to care about jiffies being stale.
Thanx, Paul
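[Editor's note: for reference, an uncompiled sketch of how Thomas's tick_estimate_stale_jiffies() from the patch quoted at the end of this thread could be wired into rcu_cpu_stall_reset(), along the lines of Huacai's earlier delta-based version; this is an illustration, not a posted patch.]

/* Uncompiled sketch: compensate for however many ticks jiffies has missed. */
void rcu_cpu_stall_reset(void)
{
	WRITE_ONCE(rcu_state.jiffies_stall,
		   jiffies + tick_estimate_stale_jiffies() +
		   rcu_jiffies_till_stall_check());
}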
The advantage of ULONG_MAX/4 over ULONG_MAX/2 is that the time_after() and time_before() macros have ULONG_MAX/4 slop in either direction before giving you the wrong answer. You can get nearly the same result using ULONG_MAX/2, but it requires a bit more care. And even on 32-bit HZ=1000 systems, ULONG_MAX/4 gets you more than 12 days of gdb session or jiffies-update delay before you start getting false positives.
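[Editor's note: to make the slop argument concrete, here is a small userspace sketch, modeling a 32-bit jiffies counter with uint32_t, that shows the comparison staying correct across wraparound and reproduces the 12-day figure; it is an illustration, not kernel code.]

#include <stdint.h>
#include <stdio.h>

/* 32-bit analogue of the kernel's time_after(a, b): "a is after b". */
static int time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

int main(void)
{
	uint32_t quarter = UINT32_MAX / 4;	/* the ULONG_MAX/4 offset */
	uint32_t now = UINT32_MAX - 10;		/* just before wraparound */
	uint32_t stall = now + quarter;		/* wraps, yet compares correctly */

	/* Prints 0: the bumped stall time is still seen as in the future. */
	printf("stall already due? %d\n", time_after32(now, stall));

	/* Roughly 12 days of slack on a 32-bit HZ=1000 system. */
	printf("ULONG_MAX/4 jiffies ~ %lu days at HZ=1000\n",
	       (unsigned long)quarter / 1000 / 3600 / 24);
	return 0;
}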
Then things can be reset after (say) 3 calls to rcu_gp_fqs() and also the current reset at the beginning of a grace period, which is in record_gp_stall_check_time().
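[Editor's note: a minimal, uncompiled sketch of that bookkeeping; the nr_fqs_jiffies_stall field and the helper name are invented for illustration and this is not an actual posted patch.]

/*
 * rcu_cpu_stall_reset() pushes the stall time far into the future and
 * arms a countdown that force-quiescent-state processing later uses to
 * restore a sane value.
 */
void rcu_cpu_stall_reset(void)
{
	WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 3);
	WRITE_ONCE(rcu_state.jiffies_stall, jiffies + ULONG_MAX / 4);
}

/* Called from rcu_gp_fqs(): undo the bump after a few FQS scans. */
static void rcu_fqs_check_stall_reset(void)
{
	unsigned long nr = READ_ONCE(rcu_state.nr_fqs_jiffies_stall);

	if (!nr)
		return;
	WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, nr - 1);
	if (nr == 1)	/* Countdown expired: recompute the stall time. */
		WRITE_ONCE(rcu_state.jiffies_stall,
			   jiffies + rcu_jiffies_till_stall_check());
}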
It would be better if RCU could get notified at both ends of the debug session, but given gdb commands such as "next", along with Thomas's point about gdb breakpoints being pretty much anywhere, this might or might not be so helpful in real life. But worth looking into.
Thanx, Paul
> Huacai
>
> > Thanx, Paul
> >
> > > Thanks,
> > >
> > >         tglx
> > > ---
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -51,6 +51,13 @@ struct tick_sched *tick_get_tick_sched(i
> > >   */
> > >  static ktime_t last_jiffies_update;
> > >
> > > +unsigned long tick_estimate_stale_jiffies(void)
> > > +{
> > > +	ktime_t delta = ktime_get_mono_fast_ns() - READ_ONCE(last_jiffies_update);
> > > +
> > > +	return delta < 0 ? 0 : div_s64(delta, TICK_NSEC);
> > > +}
> > > +
> > >  /*
> > >   * Must be called with interrupts disabled !
> > >   */