On Thu, Oct 12, 2023 at 09:54:40AM -0700, poester wrote:
Since rolling out 6.1.56 we have been experiencing file corruption over NFSv3. We bisected it down to
f16fd0b11f0f NFS: Fix error handling for O_DIRECT write scheduling
But that doesn't revert cleanly, so we ended up reverting all NFS changes from 6.1.56, after which the corruption no longer occurs. Namely:
edd1f0614510 NFS: More fixes for nfs_direct_write_reschedule_io()
d4729af1c73c NFS: Use the correct commit info in nfs_join_page_group()
1f49386d6779 NFS: More O_DIRECT accounting fixes for error paths
4d98038e5bd9 NFS: Fix O_DIRECT locking issues
f16fd0b11f0f NFS: Fix error handling for O_DIRECT write scheduling
The test case is fairly easily reproduced for us:
dd if=testfile of=testfile2 oflag=direct; md5sum testfile*
shows different md5sums for the two files on 6.1.56+ kernels. Interestingly, on 6.5.7 this problem does not occur even though it contains the same O_DIRECT patch as f16fd0b11f0f.
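
A slightly fuller version of the test case above could be sketched as follows. This is not from the original report: the helper names (direct_copy, same_content, first_diff_offset) are hypothetical, and it assumes GNU dd (for oflag=direct) and a POSIX cmp. It only wraps the same dd invocation and adds a byte-level comparison, so that a mismatch can be located rather than just detected.

```shell
# Copy $1 to $2 using O_DIRECT writes, exactly as in the reported test case.
# Needs a filesystem that supports O_DIRECT (for example the NFSv3 mount).
direct_copy() {
    dd if="$1" of="$2" oflag=direct 2>/dev/null
}

# Succeed (exit 0) only if both files have identical contents.
same_content() {
    cmp -s "$1" "$2"
}

# Print the 1-based offset of the first differing byte, or nothing if
# no differing byte is found. cmp -l lists "offset byte1 byte2" per
# difference; only the first line is kept.
first_diff_offset() {
    cmp -l "$1" "$2" 2>/dev/null | head -n 1 | awk '{print $1}'
}
```

On an affected mount, something like `direct_copy testfile testfile2; same_content testfile testfile2 || first_diff_offset testfile testfile2` would be expected to print an offset on a corrupting kernel and nothing on a good one, but that is an assumption based on the report, not a verified result.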
We opened a bugzilla on this:
https://bugzilla.kernel.org/show_bug.cgi?id=217999
But this seems like a critical issue to us which should likely be addressed in 6.1.58.
I don't touch bugzilla, but I'll go revert these now and push out a -rc release with the reverts. You aren't the only one who has reported this, and it would be good to get it resolved.
thanks!
greg k-h