+linux-fsdevel
On Mon, Oct 20, 2025 at 9:28 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Oct 20, 2025 at 1:59 AM Xing Guo <higuoxing@gmail.com> wrote:
> >
> > Test with fsync:
>
> I doubt people will be reading this giant log. Please bisect it instead. Since it's not reproducible when /tmp is backed by tmpfs it's probably some change in vfs or in the file system that your laptop is using for /tmp. It changes a user visible behavior of the file system and needs to be investigated, since it may affect more code than just this selftest.
The dmesg output was certainly too much, but I filtered all that out. Here are the relevant pieces of the strace log.
BEFORE (FAILING)
================
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.Pf280c", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
write(4, "# comment\n test_with_spaces "..., 175) = 175
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.Pf280c", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
read(5, "", 8192) = 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- THIS IS BAD, NO CONTENTS
close(5) = 0
close(4) = 0
unlink("/tmp/bpf_arg_parsing_test.Pf280c") = 0

WITH SYNC
=========
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.UK5nUq", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
write(4, "# comment\n test_with_spaces "..., 175) = 175
fsync(4) = 0
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.UK5nUq", O_RDONLY) = 5
fstat(5, {st_mode=S_IFREG|0600, st_size=175, ...}) = 0
read(5, "# comment\n test_with_spaces "..., 8192) = 175
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- GOOD, because fsync(4) before second openat()
read(5, "", 8192) = 0
close(5) = 0
close(4) = 0
unlink("/tmp/bpf_arg_parsing_test.UK5nUq") = 0

WITH CLOSE
==========
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.WavYEa", O_RDWR|O_CREAT|O_EXCL, 0600) = 4
fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
write(4, "# comment\n test_with_spaces "..., 175) = 175
close(4) = 0
openat(AT_FDCWD, "/tmp/bpf_arg_parsing_test.WavYEa", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=175, ...}) = 0
read(4, "# comment\n test_with_spaces "..., 8192) = 175
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- GOOD, because close(4) before second openat()
read(4, "", 8192) = 0
close(4) = 0
unlink("/tmp/bpf_arg_parsing_test.WavYEa") = 0

So, as can be seen above, the kernel does see the write(4, <175 bytes of content>) in all cases (so libc's fflush(fp) works as expected), but without either fsync(4) or close(4), the kernel won't return those 175 bytes when we open() the same file again (getting FD 5 this time); even fstat(5) reports st_size=0.
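For FS folks who want to poke at this without building the BPF selftests, here's a rough standalone sketch of the same syscall sequence as the BEFORE trace. It is not the selftest's code: the payload, the repro.XXXXXX file name and read_back_via_new_fd() are made up for illustration. With the expected semantics it should read back the full payload and see a matching st_size through the second FD; if the affected setup behaves like the trace, both would presumably come back as 0 there.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative payload; the real test writes ~175 bytes of arg-list text. */
static const char payload[] = "# comment\n test_with_spaces\n";

/*
 * Hypothetical helper mirroring the BEFORE trace: mkstemp() (openat with
 * O_RDWR|O_CREAT|O_EXCL, 0600), write() through that FD, then open() the
 * same path O_RDONLY while the first FD is still open, and fstat()/read()
 * through the new FD. Returns the byte count read via the second FD
 * (-1 on error); *size_seen gets st_size from fstat() on the second FD
 * (-1 if unknown).
 */
static ssize_t read_back_via_new_fd(const char *dir, off_t *size_seen)
{
	char path[4096], buf[8192];
	size_t want = sizeof(payload) - 1;
	ssize_t got = -1;
	struct stat st;
	int wfd, rfd;

	assert(dir != NULL && size_seen != NULL);
	*size_seen = -1;
	if (snprintf(path, sizeof(path), "%s/repro.XXXXXX", dir) >= (int)sizeof(path))
		return -1;

	wfd = mkstemp(path);
	if (wfd < 0)
		return -1;
	if (write(wfd, payload, want) != (ssize_t)want)
		goto out;

	rfd = open(path, O_RDONLY);     /* no fsync(wfd) or close(wfd) before this */
	if (rfd < 0)
		goto out;
	if (fstat(rfd, &st) == 0)
		*size_seen = st.st_size;
	got = read(rfd, buf, sizeof(buf));
	close(rfd);
out:
	close(wfd);
	unlink(path);
	return got;
}
```

E.g. call read_back_via_new_fd("/tmp", &size) once with /tmp on the affected file system and once on tmpfs, and compare the results.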
Is that reasonable kernel behavior? I don't know; it would be good for FS folks to double-check/confirm. The complication here is that we have two FDs open against the same underlying file (so my assumption is that the kernel should share the underlying page cache data), and the documentation I've found isn't particularly clear on the guarantees in that case.
write()'s man page states:
POSIX requires that a read(2) which can be proved to occur after a write() has returned returns the new data. Note that not all file systems are POSIX conforming.
(but this doesn't clarify whether this applies only within the same *FD*)
POSIX itself says:
Writes can be serialized with respect to other reads and writes.
If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.
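To make the "different processes" scenario concrete, here's a rough sketch (hypothetical names, not from the test) where writer and reader are separate processes, each doing its own open() of the same path. The writer keeps its FD open, with no fsync() or close(), until the reader is done, and a pipe message serves as the proof that the write() returned before the read() starts. Whether POSIX intends its guarantee to hold across independent open()s is exactly the open question, so treat this as an illustration of the scenario rather than of a guarantee.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: a child process open()s the file and writes msg,
 * then signals the parent over the "done" pipe while keeping its FD open.
 * The parent then does its own open() + read(). The child only exits (and
 * thereby closes its FD) after the parent closes the "go" pipe.
 * Returns the byte count read by the parent (-1 on error).
 */
static ssize_t read_after_other_process_write(const char *dir,
					      const char *msg, size_t len)
{
	char path[4096], buf[8192], c;
	int done[2], go[2], status, fd;
	ssize_t got = -1;
	pid_t pid;

	assert(dir != NULL && msg != NULL);
	if (snprintf(path, sizeof(path), "%s/posix_rw.XXXXXX", dir) >= (int)sizeof(path))
		return -1;
	fd = mkstemp(path);             /* only used to create the file */
	if (fd < 0)
		return -1;
	close(fd);
	if (pipe(done) < 0)
		goto out;
	if (pipe(go) < 0)
		goto out_done;

	pid = fork();
	if (pid < 0)
		goto out_go;
	if (pid == 0) {
		int wfd = open(path, O_WRONLY);
		int ok = wfd >= 0 && write(wfd, msg, len) == (ssize_t)len;

		close(go[1]);
		if (write(done[1], ok ? "1" : "0", 1) != 1)  /* "write() returned" */
			_exit(2);
		while (read(go[0], &c, 1) > 0)
			;               /* block until the parent closes go[1] */
		_exit(ok ? 0 : 1);      /* wfd is only closed here */
	}

	close(done[1]);
	close(go[0]);
	if (read(done[0], &c, 1) == 1 && c == '1') {
		fd = open(path, O_RDONLY);  /* reader's own, independent open() */
		if (fd >= 0) {
			got = read(fd, buf, sizeof(buf));
			close(fd);
		}
	}
	close(go[1]);                       /* let the child exit */
	waitpid(pid, &status, 0);
	close(done[0]);
	unlink(path);
	return got;

out_go:
	close(go[0]);
	close(go[1]);
out_done:
	close(done[0]);
	close(done[1]);
out:
	unlink(path);
	return got;
}
```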
But again, no mention of multiple FDs opened against the same underlying file.
So it's unclear, which is why it would be nice for FS folks to double-check. It's certainly a change in behavior; this used to work reliably before. [0] is the source code of the test (note that we have now added fsync(); without it, the test is broken).
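For reference, the workaround boils down to something like the following. This is a simplified sketch, not the selftest code at [0]: write_tmp_file() and the stdio sequence are assumptions inferred from the trace (the fcntl(F_GETFL) right after openat() is consistent with glibc's fdopen()), and the path prefix is just borrowed from the trace. The point is the fsync() after fflush(), while the FILE stays open.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Template borrowed from the trace for illustration only. */
#define TMP_TEMPLATE "/tmp/bpf_arg_parsing_test.XXXXXX"

/*
 * Hypothetical helper: create a temp file, write `contents` via stdio,
 * fflush() (libc buffer -> kernel) and then fsync() (the workaround)
 * before anyone reopens the path. On success returns the still-open
 * FILE * and fills `path`; returns NULL on failure.
 */
static FILE *write_tmp_file(const char *contents, char *path, size_t path_sz)
{
	FILE *fp;
	int fd;

	assert(contents != NULL && path != NULL && path_sz >= sizeof(TMP_TEMPLATE));
	strcpy(path, TMP_TEMPLATE);
	fd = mkstemp(path);
	if (fd < 0)
		return NULL;
	fp = fdopen(fd, "w");
	if (!fp) {
		close(fd);
		unlink(path);
		return NULL;
	}
	if (fputs(contents, fp) == EOF || fflush(fp) != 0 ||
	    fsync(fileno(fp)) != 0) {
		fclose(fp);
		unlink(path);
		return NULL;
	}
	return fp;
}
```

A caller would then fopen(path, "r") and parse the file while the returned FILE * is still open, and fclose() both and unlink(path) when done.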
[0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/pr...