David Howells <dhowells@redhat.com> wrote:
> Here are some miscellaneous fixes and changes for netfslib and cifs, if you could consider pulling them. All the bugs fixed were observed in cifs, so they should probably go through the cifs tree unless Christian would much prefer for them to go through the VFS tree.
Hi David,
your commit 2b1424cd131c ("netfs: Fix wait/wake to be consistent about the waitqueue used") has given me serious headaches; it has caused outages in our web hosting clusters (yet again: every Linux release since 6.9 has had serious netfs regressions). Your patch was backported to 6.15 as commit 329ba1cb402a in 6.15.3 (why oh why??), so the bugs it introduced are now "available" to all Linux stable users.
The problem we see is that writes to certain files never complete. It appears to be related to the cachefiles subrequest never reporting completion (we use Ceph with cachefiles).
I have tried applying the fixes in this pull request, which sounded promising, but the problem persists. The only thing that helps is reverting 2b1424cd131c completely; everything works fine with 6.15.5 plus the revert.
What do you need from me in order to analyze the bug?
Max