On 4/12/22 12:28, john.p.donnelly@oracle.com wrote:
On 4/11/22 4:07 PM, Waiman Long wrote:
On 4/11/22 17:03, john.p.donnelly@oracle.com wrote:
I have reached out to Waiman and he suggested this for our next test pass:
1ee326196c6658 locking/rwsem: Always try to wake waiters in out_nolock path
Does this commit help to avoid the lockup problem?
Commit 1ee326196c6658 fixes a potential missed wakeup problem when a reader first in the wait queue is interrupted out without acquiring the lock. It is actually not a fix for commit d257cc8cb8d5. However, this commit changes the out_nolock path behavior of writers by leaving the handoff bit set when the wait queue isn't empty. That likely makes the missed wakeup problem easier to reproduce.
Cheers, Longman
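For readers following along, the missed wakeup described above can be sketched with a toy model in C. This is not kernel code: `toy_sem`, `toy_out_nolock`, `toy_head_acquires` and `toy_scenario` are hypothetical names, and the model assumes a single queue where a set handoff flag reserves the lock for the head waiter. It only shows why a waiter left at the head of the queue can sleep forever if nobody wakes it after the previous head is interrupted out.

```c
#include <stdbool.h>

#define TOY_MAX_WAITERS 4

/* Toy stand-in for a semaphore; none of these names exist in the kernel. */
struct toy_sem {
    bool locked;                    /* lock is currently held */
    bool handoff;                   /* lock is reserved for the head waiter */
    int  nwaiters;                  /* number of queued waiters */
    bool asleep[TOY_MAX_WAITERS];   /* asleep[0] is the head of the queue */
};

/* The head waiter is interrupted and leaves without taking the lock. */
static void toy_out_nolock(struct toy_sem *s, bool wake_next)
{
    /* Shift the remaining waiters up by one slot. */
    for (int i = 1; i < s->nwaiters; i++)
        s->asleep[i - 1] = s->asleep[i];
    s->nwaiters--;

    if (s->nwaiters == 0) {
        s->handoff = false;         /* nobody left to hand off to */
        return;
    }
    /* Handoff stays set for the new head; it only helps if that waiter runs. */
    if (wake_next)
        s->asleep[0] = false;
}

/* The head waiter can take a free lock only if it is awake. */
static bool toy_head_acquires(struct toy_sem *s)
{
    if (s->nwaiters == 0 || s->locked || s->asleep[0])
        return false;
    s->locked = true;
    s->handoff = false;
    return true;
}

/* Returns true if the queue makes progress after the head is interrupted. */
bool toy_scenario(bool wake_next)
{
    struct toy_sem s = {
        .locked = false, .handoff = true, .nwaiters = 2,
        .asleep = { false, true },  /* head is awake, second waiter sleeps */
    };
    toy_out_nolock(&s, wake_next);
    return toy_head_acquires(&s);
}
```

In this model, `toy_scenario(false)` leaves the lock free but the new head asleep, so nothing makes progress, while `toy_scenario(true)` lets the new head take the lock.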
Hi,
We are testing now.
ETA for fio soak test completion is ~15hr from now.
I wanted to share the stack traces for future reference + occurrences.
I am looking forward to your testing results tomorrow.
Cheers, Longman
Hi
Our 24hr fio soak test with:
1ee326196c6658 locking/rwsem: Always try to wake waiters in out_nolock path
applied to 5.15.30 passed.
I suggest you append 1ee326196c6658 with:
cc: stable
Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
I'll leave the details of how to do that up to the core maintainers ;-)
Thanks for the test.
The patch is already in the tip tree, so it may not be easy to add a Fixes tag to it. Anyway, I will encourage the stable tree maintainers to take it, as it does fix a problem as shown in your test.
Cheers, Longman