On Sun, Aug 13, 2023 at 08:24:39PM +0000, Joel Fernandes wrote:
On Sun, Aug 13, 2023 at 06:34:27PM +0200, Greg KH wrote:
On Sun, Aug 13, 2023 at 03:15:34AM +0000, Joel Fernandes (Google) wrote:
From: Joel Fernandes <joel@joelfernandes.org>
During shutdown of rcutorture, the shutdown thread in rcu_torture_cleanup() calls torture_cleanup_begin(), which sets fullstop to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for readers and fakewriters to break out of their main while loop and start shutting down.
Once out of their main loop, they call torture_kthread_stopping(), which in turn waits for kthread_stop() to be called. However, rcu_torture_cleanup() has not yet called kthread_stop() on those threads; it does that a bit later. Before it gets a chance to do so, torture_kthread_stopping() calls schedule_timeout_interruptible(1) in a tight loop. Tracing confirmed that this makes the timer softirq constantly execute timer callbacks without ever returning to the softirq exit path, so it is essentially "locked up". If the softirq preempts the shutdown thread, kthread_stop() may never be called.
This commit improves the situation dramatically by increasing the timeout passed to schedule_timeout_interruptible() to 1/20th of a second. This keeps the timer softirq from locking up a CPU, and everything works fine. Testing has shown 100 runs of TREE07 passing reliably, which was not the case before because of RCU stalls.
Cc: Paul McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: <stable@vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
 kernel/torture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Any hint as to what the git commit ids in Linus's tree are for this and the other patches you just sent? I kind of need that to keep track of things...
Apologies, I added the SHA to the 5.15 ones but not 5.10. Here they are for 5.10:
1/3 d52d3a2bf408ff86f3a79560b5cce80efb340239 ("torture: Fix hang during kthread shutdown phase")
2/3 a1ff03cd6fb9c501fff63a4a2bface9adcfa81cd ("tick: Detect and fix jiffies update stall")
3/3 62c1256d544747b38e77ca9b5bfe3a26f9592576 ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped")
In case you wish to pull them in via git, I have uploaded them to:
Git: https://github.com/joelagnel/linux-kernel.git
Branch: rcu/linux-5.10.y.aug13.greg
Can you resend these with the git sha1 in the message like you did for 5.15.y (but the correct one) so I can take them that way? My scripts are set up for email, not github pulls :)
thanks,
greg k-h