#regzbot introduced: 3ee1a1fc3981
Dear maintainers,
I think I have found a cifs regression in the 6.10 kernel series, which leads certain programs to write corrupt data.
After upgrading from kernel 6.9.12 to 6.10.6, flatpak and ostree are now writing bad gpg signatures when exporting signed packages or signing their repository metadata/summary files, whenever the repository is on a cifs mount. Instead of writing the signature data, null bytes are written in its place.
Furthermore, ffmpeg and mkvmerge are now intermittently writing corrupt files to cifs mounts.
No error is reported by the applications or the kernel when it happens. In the case of flatpak, the problem isn't revealed until something tries to use the repository and finds signatures full of null bytes. (Of course, this means the affected repositories have been rendered useless.) In the case of ffmpeg and mkvmerge, the problem isn't revealed until someone plays the video file and reaches a corrupt section.
A kernel bisect reveals this:
3ee1a1fc39819906f04d6c62c180e760cd3a689d is the first bad commit
commit 3ee1a1fc39819906f04d6c62c180e760cd3a689d
Author: David Howells <dhowells@redhat.com>
Date:   Fri Oct 6 18:29:59 2023 +0100

    cifs: Cut over to using netfslib
I was unable to determine whether 6.11.0-rc4 fixes it, due to another cifs bug in that version (which I hope to report soon).
An strace of flatpak (which uses libostree) shows it generating correct signatures internally, but behaving differently on cifs vs. ext4 when working with memory-mapped temp files, in which the signatures are stored before being written to their final outputs. Here's where I reported my initial findings to those projects:
https://github.com/flatpak/flatpak/issues/5911
https://github.com/ostreedev/ostree/issues/3288
Debian Testing and Unstable kernels (6.10.4-1 and 6.10.6-1) are affected: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1079394
The following reproducer script consistently triggers the problem for me. Run it with two arguments: a path on a cifs mount where an ostree repo should be created, and a GPG key ID with which to sign a commit.
#!/bin/sh
set -e

if [ "$#" -lt 2 ] || [ "$1" = "-h" ] ; then
    echo "usage: $(basename "$0") <repo-dir> <gpg-key-id>"
    exit 2
fi

repo=$1
keyid=$2
src="./foo"

echo "creating ostree repo at $repo"
ostree init --repo="$repo"

echo "creating source file tree at $src"
mkdir -p "$src"
echo hi > "$src"/hello

ostree commit --repo="$repo" --branch=foo --gpg-sign="$keyid" "$src"

if ostree show --repo="$repo" foo; then
    echo ---
    echo success!
else
    echo ---
    ostree show --repo="$repo" --print-detached-metadata-key=ostree.gpgsigs foo
    echo failure!
    echo look for null bytes in the above commit signature
fi
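Since the failure mode is null bytes written where signature data should be, a quick way to check a suspect file is to count its NUL bytes. The following is only a minimal sketch: the function name count_nul_bytes is hypothetical and not part of the reproducer above. Many binary files legitimately contain some NUL bytes, so a nonzero count is a hint to look closer, not proof of corruption.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not from the report):
# print the number of NUL bytes contained in the file given as $1.
count_nul_bytes() {
    # tr -cd '\000' deletes every byte except NUL, wc -c counts what is left,
    # and the final tr strips the padding some wc implementations add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

For example, running count_nul_bytes on a file that is expected to hold a signature and comparing the result with the file's total size (wc -c) shows whether the file consists entirely of null bytes.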
On Sat, 24 Aug 2024 18:50:40 -0700, Forest wrote:
> I was unable to determine whether 6.11.0-rc4 fixes it, due to another cifs bug in that version (which I hope to report soon).
That bug is now reported:
https://lore.kernel.org/linux-cifs/37fncjpgsq45becdf2pdju0idf3hj3dtmb@sonic....
A pair of patches considered in that bug's discussion allowed me to test this regression on 3e9bff3bbe13, which is one commit ahead of v6.11-rc5. The mkvmerge output corruption is still present.
On Sat, 24 Aug 2024 18:50:40 -0700, Forest wrote:
> I think I have found a cifs regression in the 6.10 kernel series, which leads certain programs to write corrupt data.
> [...]
> 3ee1a1fc39819906f04d6c62c180e760cd3a689d is the first bad commit
Write corruption still exists in 6.11.0-rc6.
Bad ostree signatures may be fixed in 6.11.0-rc6. (My reproducer didn't trigger it in that version.)
Forest <forestix@nom.one> wrote:
> Write corruption still exists in 6.11.0-rc6.
Can you try adding this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Unfortunately, it managed to miss -rc6 because Linus released early before the PR could be sent to him.
David
On Wed, 04 Sep 2024 23:01:51 +0100, David Howells wrote:
> Forest <forestix@nom.one> wrote:
>> Write corruption still exists in 6.11.0-rc6.
> Can you try adding this:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
That patch looks promising. With it, I've run my tests 2-3 times more than usual, and there has been no sign of the corrupt writes so far. Thank you!
> Unfortunately, it managed to miss -rc6 because Linus released early before the PR could be sent to him.
Will these fixes be applied to the 6.10 series as well?
On Thu, Sep 5, 2024 at 2:32 AM Forest <forestix@nom.one> wrote:
> On Wed, 04 Sep 2024 23:01:51 +0100, David Howells wrote:
>> Forest <forestix@nom.one> wrote:
>>> Write corruption still exists in 6.11.0-rc6.
>> Can you try adding this:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c26096ee0278c5e765009c5eee427bbafe6dc090
> That patch looks promising. With it, I've run my tests 2-3 times more than usual, and there has been no sign of the corrupt writes so far. Thank you!
>> Unfortunately, it managed to miss -rc6 because Linus released early before the PR could be sent to him.
> Will these fixes be applied to the 6.10 series as well?
It is queued for 6.10 stable, based on a recent email from Greg KH. See the email titled:
[PATCH 6.10 181/184] mm: Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()
linux-stable-mirror@lists.linaro.org