On Mon, Nov 11, 2024 at 11:20:17PM +0800, yangerkun wrote:
在 2024/11/11 22:39, Chuck Lever III 写道:
On Nov 10, 2024, at 9:36 PM, Yu Kuai yukuai1@huaweicloud.com wrote: I'm in the cc list ,so I assume you saw my set, then I don't know why you're ignoring my concerns.
- next_offset is 32-bit and can overflow in a long-time running
machine. 2) Once next_offset overflows, readdir will skip the files that offset is bigger.
I'm sorry, I'm a little busy these days, so I haven't responded to this series of emails.
In that case, that entry won't be visible via getdents(3) until the directory is re-opened or the process does an lseek(fd, 0, SEEK_SET).
Yes.
That is the proper and expected behavior. I suspect you will see exactly that behavior with ext4 and 32-bit directory offsets, for example.
Emm...
For this case like this:
- mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
- open /tmp/dir with fd1
- readdir and get /tmp/dir/file1
- rm /tmp/dir/file2
- touch /tmp/dir/file2
- loop 4~5 for 2^32 times
- readdir /tmp/dir with fd1
For tmpfs now, we may see no /tmp/dir/file2, since the offset has been overflow, for ext4 it is ok... So we think this will be a problem.
I constructed a simple test program using the above steps:
/* * 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2 * 2. open /tmp/dir with fd1 * 3. readdir and get /tmp/dir/file1 * 4. rm /tmp/dir/file2 * 5. touch /tmp/dir/file2 * 6. loop 4~5 for 2^32 times * 7. readdir /tmp/dir with fd1 */
#include <sys/types.h> #include <sys/stat.h>
#include <dirent.h> #include <errno.h> #include <fcntl.h> #include <unistd.h> #include <stdbool.h> #include <stdio.h> #include <string.h>
static void list_directory(DIR *dirp) { struct dirent *de;
errno = 0; do { de = readdir(dirp); if (!de) break;
printf("d_off: %lld\n", de->d_off); printf("d_name: %s\n", de->d_name); } while (true);
if (errno) perror("readdir"); else printf("EOD\n"); }
int main(int argc, char **argv) { unsigned long i; DIR *dirp; int ret;
/* 1. */ ret = mkdir("/tmp/dir", 0755); if (ret < 0) { perror("mkdir"); return 1; }
ret = creat("/tmp/dir/file1", 0644); if (ret < 0) { perror("creat"); return 1; } close(ret);
ret = creat("/tmp/dir/file2", 0644); if (ret < 0) { perror("creat"); return 1; } close(ret);
/* 2. */ errno = 0; dirp = opendir("/tmp/dir"); if (!dirp) { if (errno) perror("opendir"); else fprintf(stderr, "EOD\n"); closedir(dirp); return 1; }
/* 3. */ errno = 0; do { struct dirent *de;
de = readdir(dirp); if (!de) { if (errno) { perror("readdir"); closedir(dirp); return 1; } break; } if (strcmp(de->d_name, "file1") == 0) { printf("Found 'file1'\n"); break; } } while (true);
/* run the test. */ for (i = 0; i < 10000; i++) { /* 4. */ ret = unlink("/tmp/dir/file2"); if (ret < 0) { perror("unlink"); closedir(dirp); return 1; }
/* 5. */ ret = creat("/tmp/dir/file2", 0644); if (ret < 0) { perror("creat"); fprintf(stderr, "i = %lu\n", i); closedir(dirp); return 1; } close(ret); }
/* 7. */ printf("\ndirectory after test:\n"); list_directory(dirp);
/* cel. */ rewinddir(dirp); printf("\ndirectory after rewind:\n"); list_directory(dirp);
closedir(dirp); return 0; }
Does that not directly address your concern? Or do you mean that Erkun's patch introduces a new issue?
Yes, to be honest, my personal feeling is a problem. But for 64bit, it may never been trigger.
I ran the test program above on this kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-te...
Note that it has a patch to restrict the range of directory offset values for tmpfs to 2..4096.
I did not observe any unexpected behavior after the offset values wrapped. At step 7, I can always see file2, and its offset is always 4. At step "cel" I can see all expected directory entries.
I tested on v6.12-rc7 with the same range restriction but using Maple tree and 64-bit offsets. No unexpected behavior there either.
So either we're still missing something, or there is no problem. My only theory is maybe it's an issue with an implicit integer sign conversion, and we should restrict the offset range to 2..S32_MAX.
I can try testing with a range of (U32_MAX - 4096)..(U32_MAX).
If there is a problem here, please construct a reproducer against this patch set and post it.
Invitation still stands: if you have a solid reproducer, please post it.