On Wed, Mar 10, 2021 at 10:43:33AM -0800, Linus Torvalds wrote:
Just a note to the stable tree: this commit has been reverted upstream, because it causes a huge performance drop (admittedly on a load and setup that may not be all that relevant to most people).
It was applied to 4.4, 4.9 and 4.12, because the commit it was marked as "fixing" is from 2012, but it turns out that the early exit from the loop in that commit was very much intentional, and very much shows up on scalability benchmarks.
I don't think this is likely to be a big deal for the stable kernels - we're basically talking tuning for special cases, and while it is reverted in my tree now, the "correct" thing to do is likely to be a bit more flexible than either "exit loop immediately" or "loop for as long as we have contention".
In practice, most machines probably won't see either case - or it will at least be rare enough that you can't tell.
The machine that reports a huge performance drop was a multi-socket machine under fairly extreme conditions, and these contention issues are often close to exponential - a smaller machine (or a slightly less extreme load) would never see the issue at all either way.
See
https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/
for details if you care. I don't think this has to necessarily be undone in the stable trees, this email is more of an incidental note just as a heads-up.
Thanks for the details, I'll look into reverting it in a future stable release.
greg k-h