On Thu, Jul 27, 2023 at 07:39:54AM -0700, Guenter Roeck wrote:
On 7/27/23 07:06, Paul E. McKenney wrote:
On Thu, Jul 27, 2023 at 09:26:52AM -0400, Joel Fernandes wrote:
On Jul 27, 2023, at 7:35 AM, Pavel Machek <pavel@denx.de> wrote:
Hi!
This is the start of the stable review cycle for the 6.4.7 release. There are 227 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 27 Jul 2023 10:44:26 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.7-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y and the diffstat can be found below.
I saw this when running rcutorture; this one happened in the TREE04 configuration. This is likely due to the stuttering issues we are discussing in the other thread. Anyway, I am just making a note here while I continue to look into it.
So is the stuttering new in 6.4.7?
No, it is an old feature in the RCU torture tests, but it is dependent on timing. Something changed in recent kernels that is making the issues with it more likely. It's hard to bisect, as a failure sometimes takes hours to show up.
Other than that, all tests pass: Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
...or do you still believe 6.4.7 is okay to release?
As such, it should be OK. However, naturally I am not happy that the RCU testing is intermittently failing. These issues have been seen in the last several 6.4 stable releases, so since those were released, maybe this one can be too? The fix for stuttering is currently being reviewed.
Or, to look at it another way, the stuttering fix is specific to torture testing. Would we really want to hold up a -stable release only because rcutorture occasionally gives a false-positive failure on certain types of systems?
No. However, on an unrelated note, RCU tests in linux-next sometimes result in apparent hangs or long runtimes.
[ 0.778841] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[ 0.779011] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[ 0.797998] Running RCU synchronous self tests
[ 0.798209] Running RCU synchronous self tests
[ 0.912368] smpboot: CPU0: AMD Opteron 63xx class CPU (family: 0x15, model: 0x2, stepping: 0x0)
[ 0.923398] RCU Tasks: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1.
[ 0.925419] Running RCU-tasks wait API self tests
(hangs until aborted). This is seen primarily with Opteron CPUs, but also with others such as Haswell, Icelake-Server, and pentium3. It is all but impossible to bisect because it doesn't happen all the time. All I was able to figure out was that it has to do with RCU changes in linux-next. I'd be much more concerned about that.
First I have heard of this, so thank you for letting me know.
About what fraction of the time does this happen?
Thanx, Paul
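Paul's question about the failure fraction is what decides how expensive bisecting an intermittent hang would be. The following back-of-the-envelope sketch is not from the thread: it assumes each test run is independent and fails with a fixed probability p (an idealization; timing-dependent failures like these may not behave that way), and the helper name `runs_needed` is hypothetical. It computes how many consecutive clean runs are needed before a bisect step can be marked "good" with a chosen confidence, from the condition (1 - p)^n <= 1 - confidence.

```python
import math


def runs_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Hypothetical helper: smallest number of independent runs n such that,
    if each run fails with probability p_fail, at least one failure would be
    observed with probability >= confidence.

    Assumes independent runs with a fixed per-run failure probability.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # P(no failure in n runs) = (1 - p_fail) ** n; require it <= 1 - confidence,
    # i.e. n >= log(1 - confidence) / log(1 - p_fail). Both logs are negative.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.02):
        print(f"failure rate {p:.2f}: {runs_needed(p)} clean runs per 'good' verdict")
```

For example, a hang that shows up in 1 of 10 boots needs 29 clean boots per "good" verdict at 95% confidence, and a bisection over about a thousand commits takes roughly ten such steps, which is why an unknown or low failure rate makes bisection so painful.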