On Nov 11, 2024, at 10:20 AM, yangerkun yangerkun@huaweicloud.com wrote:
在 2024/11/11 22:39, Chuck Lever III 写道:
On Nov 10, 2024, at 9:36 PM, Yu Kuai yukuai1@huaweicloud.com wrote:
Hi,
在 2024/11/11 8:52, cel@kernel.org 写道:
From: yangerkun yangerkun@huawei.com [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ] After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below).
- create 5000 files(1 2 3...) under one dir
- call readdir(man 3 readdir) once, and get one entry
- rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
- loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition)
We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data
Please notice this requires last_index should never overflow, otherwise readdir will be messed up.
It would help your cause if you could be more specific than "messed up".
now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file. Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: yangerkun yangerkun@huawei.com Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com Reviewed-by: Chuck Lever chuck.lever@oracle.com [brauner: only update last_index after seek when offset is zero like Jan suggested] Signed-off-by: Christian Brauner brauner@kernel.org Link: https://nvd.nist.gov/vuln/detail/CVE-2024-46701 [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com
fs/libfs.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/fs/libfs.c b/fs/libfs.c index a87005c89534..b59ff0dfea1f 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -449,6 +449,14 @@ void simple_offset_destroy(struct offset_ctx *octx) xa_destroy(&octx->xa); } +static int offset_dir_open(struct inode *inode, struct file *file) +{
- struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
- file->private_data = (void *)ctx->next_offset;
- return 0;
+}
Looks like xarray is still used.
That's not going to change, as several folks have already explained.
I'm in the cc list ,so I assume you saw my set, then I don't know why you're ignoring my concerns.
- next_offset is 32-bit and can overflow in a long-time running
machine. 2) Once next_offset overflows, readdir will skip the files that offset is bigger.
I'm sorry, I'm a little busy these days, so I haven't responded to this series of emails.
In that case, that entry won't be visible via getdents(3) until the directory is re-opened or the process does an lseek(fd, 0, SEEK_SET).
Yes.
That is the proper and expected behavior. I suspect you will see exactly that behavior with ext4 and 32-bit directory offsets, for example.
Emm...
For this case like this:
- mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
- open /tmp/dir with fd1
- readdir and get /tmp/dir/file1
- rm /tmp/dir/file2
- touch /tmp/dir/file2
- loop 4~5 for 2^32 times
- readdir /tmp/dir with fd1
For tmpfs now, we may see no /tmp/dir/file2, since the offset has been overflow, for ext4 it is ok... So we think this will be a problem.
Does that not directly address your concern? Or do you mean that Erkun's patch introduces a new issue?
Yes, to be honest, my personal feeling is a problem. But for 64bit, it may never been trigger.
Thanks for confirming.
In that case, the preferred way to handle it is to fix the issue in upstream, and then backport that fix to LTS. Dependence on 64-bit offsets to avoid a failure case should be considered a workaround, not a real fix, IMHO.
Do you have a few moments to address it, or if not I will see to it.
I think reducing the xa_limit in simple_offset_add() to, say, 2..16 would make the reproducer fire almost immediately.
If there is a problem here, please construct a reproducer against this patch set and post it.
Thanks, Kuai
/**
- offset_dir_llseek - Advance the read position of a directory descriptor
- @file: an open directory whose position is to be updated
@@ -462,6 +470,9 @@ void simple_offset_destroy(struct offset_ctx *octx) */ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) {
- struct inode *inode = file->f_inode;
- struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
- switch (whence) { case SEEK_CUR: offset += file->f_pos;
@@ -475,8 +486,9 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) } /* In this case, ->private_data is protected by f_pos_lock */
- file->private_data = NULL;
- return vfs_setpos(file, offset, U32_MAX);
- if (!offset)
- file->private_data = (void *)ctx->next_offset;
- return vfs_setpos(file, offset, LONG_MAX);
} static struct dentry *offset_find_next(struct xa_state *xas) @@ -505,7 +517,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) inode->i_ino, fs_umode_to_dtype(inode->i_mode)); } -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index) { struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode); XA_STATE(xas, &so_ctx->xa, ctx->pos); @@ -514,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) while (true) { dentry = offset_find_next(&xas); if (!dentry)
- return ERR_PTR(-ENOENT);
- return;
- if (dentry2offset(dentry) >= last_index) {
- dput(dentry);
- return;
- } if (!offset_dir_emit(ctx, dentry)) { dput(dentry);
- break;
- return; } dput(dentry); ctx->pos = xas.xa_index + 1; }
- return NULL;
} /** @@ -551,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) static int offset_readdir(struct file *file, struct dir_context *ctx) { struct dentry *dir = file->f_path.dentry;
- long last_index = (long)file->private_data; lockdep_assert_held(&d_inode(dir)->i_rwsem); if (!dir_emit_dots(file, ctx)) return 0;
- /* In this case, ->private_data is protected by f_pos_lock */
- if (ctx->pos == DIR_OFFSET_MIN)
- file->private_data = NULL;
- else if (file->private_data == ERR_PTR(-ENOENT))
- return 0;
- file->private_data = offset_iterate_dir(d_inode(dir), ctx);
- offset_iterate_dir(d_inode(dir), ctx, last_index); return 0;
} const struct file_operations simple_offset_dir_operations = {
- .open = offset_dir_open, .llseek = offset_dir_llseek, .iterate_shared = offset_readdir, .read = generic_read_dir,
-- Chuck Lever
-- Chuck Lever