[RFC PATCH 0/6 6.6] Address rename/readdir bugs in fs/libfs.c


cel@kernel.org

11 Nov 2024

12:52 a.m.

From: Chuck Lever chuck.lever@oracle.com

Address several bugs in v6.6's libfs/shmemfs, including CVE-2024-46701.

Link: https://lore.kernel.org/stable/976C0DD5-4337-4C7D-92C6-A38C2EC335A4@oracle.c...

I'm still running the usual set of regression tests, but so far this set looks stable. I'm interested in hearing review comments and test results.

Branch for testing: https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-6....

Chuck Lever (5):
  libfs: Define a minimum directory offset
  libfs: Add simple_offset_empty()
  libfs: Fix simple_offset_rename_exchange()
  libfs: Add simple_offset_rename() API
  shmem: Fix shmem_rename2()

yangerkun (1):
  libfs: fix infinite directory reads for offset dir

 fs/libfs.c         | 135 +++++++++++++++++++++++++++++++++++++--------
 include/linux/fs.h |   3 +
 mm/shmem.c         |   7 +--
 3 files changed, 119 insertions(+), 26 deletions(-)

-- 2.47.0


cel@kernel.org

11 Nov 2024

12:52 a.m.

New subject: [RFC PATCH 1/6 6.6] libfs: Define a minimum directory offset

From: Chuck Lever chuck.lever@oracle.com

[ Upstream commit 7beea725a8ca412c6190090ce7c3a13b169592a1 ]

This value is used in several places, so make it a symbolic constant.

Reviewed-by: Jan Kara jack@suse.cz
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Link: https://lore.kernel.org/r/170820142741.6328.12428356024575347885.stgit@91.11...
Signed-off-by: Christian Brauner brauner@kernel.org
Stable-dep-of: ecba88a3b32d ("libfs: Add simple_offset_empty()")
Signed-off-by: Chuck Lever chuck.lever@oracle.com
---
 fs/libfs.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index dc0f7519045f..4a2205afcc88 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -239,6 +239,11 @@ const struct inode_operations simple_dir_inode_operations = {
 };
 EXPORT_SYMBOL(simple_dir_inode_operations);

+/* 0 is '.', 1 is '..', so always start with offset 2 or more */
+enum {
+	DIR_OFFSET_MIN	= 2,
+};
+
 static void offset_set(struct dentry *dentry, u32 offset)
 {
 	dentry->d_fsdata = (void *)((uintptr_t)(offset));
@@ -260,9 +265,7 @@ void simple_offset_init(struct offset_ctx *octx)
 {
 	xa_init_flags(&octx->xa, XA_FLAGS_ALLOC1);
 	lockdep_set_class(&octx->xa.xa_lock, &simple_offset_xa_lock);
-
-	/* 0 is '.', 1 is '..', so always start with offset 2 */
-	octx->next_offset = 2;
+	octx->next_offset = DIR_OFFSET_MIN;
 }

 /**
@@ -275,7 +278,7 @@ void simple_offset_init(struct offset_ctx *octx)
  */
 int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
 {
-	static const struct xa_limit limit = XA_LIMIT(2, U32_MAX);
+	static const struct xa_limit limit = XA_LIMIT(DIR_OFFSET_MIN, U32_MAX);
 	u32 offset;
 	int ret;

@@ -480,7 +483,7 @@ static int offset_readdir(struct file *file, struct dir_context *ctx)
 		return 0;

 	/* In this case, ->private_data is protected by f_pos_lock */
-	if (ctx->pos == 2)
+	if (ctx->pos == DIR_OFFSET_MIN)
 		file->private_data = NULL;
 	else if (file->private_data == ERR_PTR(-ENOENT))
 		return 0;

-- 2.47.0

cel@kernel.org

12:52 a.m.

New subject: [RFC PATCH 2/6 6.6] libfs: Add simple_offset_empty()

From: Chuck Lever chuck.lever@oracle.com

[ Upstream commit ecba88a3b32d733d41e27973e25b2bc580f64281 ]

For simple filesystems that use directory offset mapping, rely strictly on the directory offset map to tell when a directory has no children.

After this patch is applied, the emptiness test holds only the RCU read lock when the directory being tested has no children.

In addition, this adds another layer of confirmation that simple_offset_add/remove() are working as expected.

Reviewed-by: Jan Kara jack@suse.cz
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Link: https://lore.kernel.org/r/170820143463.6328.7872919188371286951.stgit@91.116...
Signed-off-by: Christian Brauner brauner@kernel.org
Stable-dep-of: 5a1a25be995e ("libfs: Add simple_offset_rename() API")
Signed-off-by: Chuck Lever chuck.lever@oracle.com
---
 fs/libfs.c         | 32 ++++++++++++++++++++++++++++++++
 include/linux/fs.h |  1 +
 mm/shmem.c         |  4 ++--
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 4a2205afcc88..66b428f3fc41 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -312,6 +312,38 @@ void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
 	offset_set(dentry, 0);
 }

+/**
+ * simple_offset_empty - Check if a dentry can be unlinked
+ * @dentry: dentry to be tested
+ *
+ * Returns 0 if @dentry is a non-empty directory; otherwise returns 1.
+ */
+int simple_offset_empty(struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+	struct offset_ctx *octx;
+	struct dentry *child;
+	unsigned long index;
+	int ret = 1;
+
+	if (!inode || !S_ISDIR(inode->i_mode))
+		return ret;
+
+	index = DIR_OFFSET_MIN;
+	octx = inode->i_op->get_offset_ctx(inode);
+	xa_for_each(&octx->xa, index, child) {
+		spin_lock(&child->d_lock);
+		if (simple_positive(child)) {
+			spin_unlock(&child->d_lock);
+			ret = 0;
+			break;
+		}
+		spin_unlock(&child->d_lock);
+	}
+
+	return ret;
+}
+
 /**
  * simple_offset_rename_exchange - exchange rename with directory offsets
  * @old_dir: parent of dentry being moved
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6c3d86532e3f..5104405ce3e6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3197,6 +3197,7 @@ struct offset_ctx {
 void simple_offset_init(struct offset_ctx *octx);
 int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry);
 void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry);
+int simple_offset_empty(struct dentry *dentry);
 int simple_offset_rename_exchange(struct inode *old_dir,
 				  struct dentry *old_dentry,
 				  struct inode *new_dir,
diff --git a/mm/shmem.c b/mm/shmem.c
index 5d076022da24..e0d014eaaf73 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3373,7 +3373,7 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)

 static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 {
-	if (!simple_empty(dentry))
+	if (!simple_offset_empty(dentry))
 		return -ENOTEMPTY;

 	drop_nlink(d_inode(dentry));
@@ -3430,7 +3430,7 @@ static int shmem_rename2(struct mnt_idmap *idmap,
 		return simple_offset_rename_exchange(old_dir, old_dentry,
 						     new_dir, new_dentry);

-	if (!simple_empty(new_dentry))
+	if (!simple_offset_empty(new_dentry))
 		return -ENOTEMPTY;

 	if (flags & RENAME_WHITEOUT) {

-- 2.47.0
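
For readers following along, the emptiness rule in the patch above can be sketched in user space. This is only a toy model under stated assumptions: struct toy_entry, struct toy_dir, TOY_SLOTS and toy_offset_empty() are hypothetical names, a fixed array stands in for the xarray, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DIR_OFFSET_MIN 2	/* 0 is ".", 1 is "..", as in the series */
#define TOY_SLOTS      16	/* hypothetical capacity of the toy map */

/* Hypothetical stand-in for a dentry: only tracks whether it is live. */
struct toy_entry {
	bool positive;
};

/* Hypothetical stand-in for struct offset_ctx: slot index == offset. */
struct toy_dir {
	struct toy_entry *slot[TOY_SLOTS];
};

/* Returns 1 if no live child is mapped at or above DIR_OFFSET_MIN, else 0. */
static int toy_offset_empty(const struct toy_dir *dir)
{
	assert(dir != NULL);

	for (size_t i = DIR_OFFSET_MIN; i < TOY_SLOTS; i++) {
		if (dir->slot[i] && dir->slot[i]->positive)
			return 0;
	}
	return 1;
}
```

In this model a directory whose map holds only stale (non-positive) entries still counts as empty, which mirrors the simple_positive() check in the patch.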

cel@kernel.org

12:52 a.m.

New subject: [RFC PATCH 3/6 6.6] libfs: Fix simple_offset_rename_exchange()

From: Chuck Lever chuck.lever@oracle.com

[ Upstream commit 23cdd0eed3f1fff3af323092b0b88945a7950d8e ]

User space expects the replacement (old) directory entry to have the same directory offset after the rename.

Suggested-by: Christian Brauner brauner@kernel.org
Fixes: a2e459555c5f ("shmem: stable directory offsets")
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Link: https://lore.kernel.org/r/20240415152057.4605-2-cel@kernel.org
Signed-off-by: Christian Brauner brauner@kernel.org
[ cel: adjusted to apply to origin/linux-6.6.y ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com
---
 fs/libfs.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 66b428f3fc41..9fec0113a83f 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -294,6 +294,18 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
 	return 0;
 }

+static int simple_offset_replace(struct offset_ctx *octx, struct dentry *dentry,
+				 u32 offset)
+{
+	void *ret;
+
+	ret = xa_store(&octx->xa, offset, dentry, GFP_KERNEL);
+	if (xa_is_err(ret))
+		return xa_err(ret);
+	offset_set(dentry, offset);
+	return 0;
+}
+
 /**
  * simple_offset_remove - Remove an entry to a directory's offset map
  * @octx: directory offset ctx to be updated
@@ -351,6 +363,9 @@ int simple_offset_empty(struct dentry *dentry)
  * @new_dir: destination parent
  * @new_dentry: destination dentry
  *
+ * This API preserves the directory offset values. Caller provides
+ * appropriate serialization.
+ *
  * Returns zero on success. Otherwise a negative errno is returned and the
  * rename is rolled back.
  */
@@ -368,11 +383,11 @@ int simple_offset_rename_exchange(struct inode *old_dir,
 	simple_offset_remove(old_ctx, old_dentry);
 	simple_offset_remove(new_ctx, new_dentry);

-	ret = simple_offset_add(new_ctx, old_dentry);
+	ret = simple_offset_replace(new_ctx, old_dentry, new_index);
 	if (ret)
 		goto out_restore;

-	ret = simple_offset_add(old_ctx, new_dentry);
+	ret = simple_offset_replace(old_ctx, new_dentry, old_index);
 	if (ret) {
 		simple_offset_remove(new_ctx, old_dentry);
 		goto out_restore;
@@ -387,10 +402,8 @@ int simple_offset_rename_exchange(struct inode *old_dir,
 	return 0;

 out_restore:
-	offset_set(old_dentry, old_index);
-	xa_store(&old_ctx->xa, old_index, old_dentry, GFP_KERNEL);
-	offset_set(new_dentry, new_index);
-	xa_store(&new_ctx->xa, new_index, new_dentry, GFP_KERNEL);
+	(void)simple_offset_replace(old_ctx, old_dentry, old_index);
+	(void)simple_offset_replace(new_ctx, new_dentry, new_index);
 	return ret;
 }

-- 2.47.0
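
The offset-preserving exchange in the patch above can be illustrated with a small user-space model. It is a sketch, not the kernel code: the toy_* names and the fixed-size slot array are hypothetical, and error handling and rollback are left out because array stores cannot fail here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 16	/* hypothetical capacity of the toy map */

/* Hypothetical stand-ins for a dentry and a per-directory offset map. */
struct toy_entry {
	const char *name;
	unsigned int offset;	/* 0 means "not mapped" */
};

struct toy_dir {
	struct toy_entry *slot[TOY_SLOTS];
};

/* Store @e at a caller-chosen offset, in the spirit of simple_offset_replace(). */
static void toy_replace(struct toy_dir *dir, struct toy_entry *e,
			unsigned int offset)
{
	assert(offset < TOY_SLOTS);
	dir->slot[offset] = e;
	e->offset = offset;
}

/* Drop @e from the map and clear its recorded offset. */
static void toy_remove(struct toy_dir *dir, struct toy_entry *e)
{
	dir->slot[e->offset] = NULL;
	e->offset = 0;
}

/* Swap two entries so that each one inherits the other's offset. */
static void toy_rename_exchange(struct toy_dir *old_dir, struct toy_entry *old_e,
				struct toy_dir *new_dir, struct toy_entry *new_e)
{
	unsigned int old_index = old_e->offset;
	unsigned int new_index = new_e->offset;

	toy_remove(old_dir, old_e);
	toy_remove(new_dir, new_e);
	toy_replace(new_dir, old_e, new_index);
	toy_replace(old_dir, new_e, old_index);
}
```

The point of the model is that neither directory's set of occupied offsets changes across the exchange; only the entries stored at those offsets swap.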

cel@kernel.org

12:52 a.m.

New subject: [RFC PATCH 4/6 6.6] libfs: Add simple_offset_rename() API

From: Chuck Lever chuck.lever@oracle.com

[ Upstream commit 5a1a25be995e1014abd01600479915683e356f5c ]

I'm about to fix a tmpfs rename bug that requires the use of internal simple_offset helpers that are not available in mm/shmem.c

Signed-off-by: Chuck Lever chuck.lever@oracle.com
Link: https://lore.kernel.org/r/20240415152057.4605-3-cel@kernel.org
Signed-off-by: Christian Brauner brauner@kernel.org
Stable-dep-of: ad191eb6d694 ("shmem: Fix shmem_rename2()")
Signed-off-by: Chuck Lever chuck.lever@oracle.com
---
 fs/libfs.c         | 21 +++++++++++++++++++++
 include/linux/fs.h |  2 ++
 mm/shmem.c         |  3 +--
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 9fec0113a83f..b2dcb15d993a 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -356,6 +356,27 @@ int simple_offset_empty(struct dentry *dentry)
 	return ret;
 }

+/**
+ * simple_offset_rename - handle directory offsets for rename
+ * @old_dir: parent directory of source entry
+ * @old_dentry: dentry of source entry
+ * @new_dir: parent_directory of destination entry
+ * @new_dentry: dentry of destination
+ *
+ * Caller provides appropriate serialization.
+ *
+ * Returns zero on success, a negative errno value on failure.
+ */
+int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
+			 struct inode *new_dir, struct dentry *new_dentry)
+{
+	struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
+	struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
+
+	simple_offset_remove(old_ctx, old_dentry);
+	return simple_offset_add(new_ctx, old_dentry);
+}
+
 /**
  * simple_offset_rename_exchange - exchange rename with directory offsets
  * @old_dir: parent of dentry being moved
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5104405ce3e6..e4d139fcaad0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3198,6 +3198,8 @@ void simple_offset_init(struct offset_ctx *octx);
 int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry);
 void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry);
 int simple_offset_empty(struct dentry *dentry);
+int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
+			 struct inode *new_dir, struct dentry *new_dentry);
 int simple_offset_rename_exchange(struct inode *old_dir,
 				  struct dentry *old_dentry,
 				  struct inode *new_dir,
diff --git a/mm/shmem.c b/mm/shmem.c
index e0d014eaaf73..8e8998152a0f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3439,8 +3439,7 @@ static int shmem_rename2(struct mnt_idmap *idmap,
 			return error;
 	}

-	simple_offset_remove(shmem_get_offset_ctx(old_dir), old_dentry);
-	error = simple_offset_add(shmem_get_offset_ctx(new_dir), old_dentry);
+	error = simple_offset_rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (error)
 		return error;

-- 2.47.0

cel@kernel.org

12:52 a.m.

New subject: [RFC PATCH 5/6 6.6] shmem: Fix shmem_rename2()

From: Chuck Lever chuck.lever@oracle.com

[ Upstream commit ad191eb6d6942bb835a0b20b647f7c53c1d99ca4 ]

When renaming onto an existing directory entry, user space expects the replacement entry to have the same directory offset as the original one.

Link: https://gitlab.alpinelinux.org/alpine/aports/-/issues/15966
Fixes: a2e459555c5f ("shmem: stable directory offsets")
Signed-off-by: Chuck Lever chuck.lever@oracle.com
Link: https://lore.kernel.org/r/20240415152057.4605-4-cel@kernel.org
Signed-off-by: Christian Brauner brauner@kernel.org
Signed-off-by: Chuck Lever chuck.lever@oracle.com
---
 fs/libfs.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index b2dcb15d993a..a87005c89534 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -365,6 +365,9 @@ int simple_offset_empty(struct dentry *dentry)
  *
  * Caller provides appropriate serialization.
  *
+ * User space expects the directory offset value of the replaced
+ * (new) directory entry to be unchanged after a rename.
+ *
  * Returns zero on success, a negative errno value on failure.
  */
 int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
@@ -372,8 +375,14 @@ int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
 {
 	struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
 	struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
+	long new_offset = dentry2offset(new_dentry);

 	simple_offset_remove(old_ctx, old_dentry);
+
+	if (new_offset) {
+		offset_set(new_dentry, 0);
+		return simple_offset_replace(new_ctx, old_dentry, new_offset);
+	}
 	return simple_offset_add(new_ctx, old_dentry);
 }

-- 2.47.0
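
As a rough illustration of the rename-over-existing rule in the patch above, here is a user-space sketch. The toy_* names, the fixed-size array and the simple forward-scanning allocator are assumptions made for the example; the kernel uses an xarray with cyclic allocation instead.

```c
#include <assert.h>
#include <stddef.h>

#define DIR_OFFSET_MIN 2	/* 0 is ".", 1 is ".." */
#define TOY_SLOTS      16	/* hypothetical capacity of the toy map */

struct toy_entry {
	unsigned int offset;	/* 0 means "not in any map" */
};

struct toy_dir {
	struct toy_entry *slot[TOY_SLOTS];
	unsigned int next_offset;	/* where the next allocation starts */
};

/* Allocate the first free offset at or after next_offset; -1 if full. */
static int toy_add(struct toy_dir *dir, struct toy_entry *e)
{
	for (unsigned int i = dir->next_offset; i < TOY_SLOTS; i++) {
		if (!dir->slot[i]) {
			dir->slot[i] = e;
			e->offset = i;
			dir->next_offset = i + 1;
			return 0;
		}
	}
	return -1;
}

static void toy_remove(struct toy_dir *dir, struct toy_entry *e)
{
	if (e->offset)
		dir->slot[e->offset] = NULL;
	e->offset = 0;
}

/*
 * If the destination already has an offset, the moved entry inherits it;
 * otherwise a fresh offset is allocated.
 */
static int toy_rename(struct toy_dir *old_dir, struct toy_entry *old_e,
		      struct toy_dir *new_dir, struct toy_entry *new_e)
{
	unsigned int new_offset = new_e->offset;

	toy_remove(old_dir, old_e);
	if (new_offset) {
		new_e->offset = 0;
		new_dir->slot[new_offset] = old_e;
		old_e->offset = new_offset;
		return 0;
	}
	return toy_add(new_dir, old_e);
}
```

With this rule, replacing an existing name leaves the set of offsets in the destination directory unchanged, so a reader positioned past that offset does not see the name again.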

cel@kernel.org

12:52 a.m.

New subject: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

From: yangerkun yangerkun@huawei.com

[ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ]

After tmpfs switched its directory operations from simple_dir_operations to simple_offset_dir_operations, every rename inserts the new dentry into the destination directory's maple tree (&SHMEM_I(inode)->dir_offsets->mt) under a free key starting at octx->next_offset, and then sets next_offset to that key + 1. When renames happen concurrently with readdir, readdir can loop forever, which makes generic/736 in xfstests fail (details below).

1. Create 5000 files (1 2 3...) under one directory.
2. Call readdir (man 3 readdir) once and get one entry.
3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry).
4. Repeat steps 2-3 until readdir returns nothing or we have looped too many times (tmpfs ends the test via the second condition).

We use the same logic as commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it: record last_index when the directory is opened, and do not emit any entry whose index is >= last_index. The file->private_data field already used by offset directories can hold this value directly, and we also update last_index when we llseek the directory file.

Fixes: a2e459555c5f ("shmem: stable directory offsets")
Signed-off-by: yangerkun yangerkun@huawei.com
Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com
Reviewed-by: Chuck Lever chuck.lever@oracle.com
[brauner: only update last_index after seek when offset is zero like Jan suggested]
Signed-off-by: Christian Brauner brauner@kernel.org
Link: https://nvd.nist.gov/vuln/detail/CVE-2024-46701
[ cel: adjusted to apply to origin/linux-6.6.y ]
Signed-off-by: Chuck Lever chuck.lever@oracle.com
---
 fs/libfs.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index a87005c89534..b59ff0dfea1f 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -449,6 +449,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
 	xa_destroy(&octx->xa);
 }

+static int offset_dir_open(struct inode *inode, struct file *file)
+{
+	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
+
+	file->private_data = (void *)ctx->next_offset;
+	return 0;
+}
+
 /**
  * offset_dir_llseek - Advance the read position of a directory descriptor
  * @file: an open directory whose position is to be updated
@@ -462,6 +470,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
  */
 static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 {
+	struct inode *inode = file->f_inode;
+	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
+
 	switch (whence) {
 	case SEEK_CUR:
 		offset += file->f_pos;
@@ -475,8 +486,9 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 	}

 	/* In this case, ->private_data is protected by f_pos_lock */
-	file->private_data = NULL;
-	return vfs_setpos(file, offset, U32_MAX);
+	if (!offset)
+		file->private_data = (void *)ctx->next_offset;
+	return vfs_setpos(file, offset, LONG_MAX);
 }

 static struct dentry *offset_find_next(struct xa_state *xas)
@@ -505,7 +517,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
 			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
 }

-static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
+static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
 {
 	struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode);
 	XA_STATE(xas, &so_ctx->xa, ctx->pos);
@@ -514,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
 	while (true) {
 		dentry = offset_find_next(&xas);
 		if (!dentry)
-			return ERR_PTR(-ENOENT);
+			return;
+
+		if (dentry2offset(dentry) >= last_index) {
+			dput(dentry);
+			return;
+		}

 		if (!offset_dir_emit(ctx, dentry)) {
 			dput(dentry);
-			break;
+			return;
 		}

 		dput(dentry);
 		ctx->pos = xas.xa_index + 1;
 	}
-	return NULL;
 }

 /**
@@ -551,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
 static int offset_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct dentry *dir = file->f_path.dentry;
+	long last_index = (long)file->private_data;

 	lockdep_assert_held(&d_inode(dir)->i_rwsem);

 	if (!dir_emit_dots(file, ctx))
 		return 0;

-	/* In this case, ->private_data is protected by f_pos_lock */
-	if (ctx->pos == DIR_OFFSET_MIN)
-		file->private_data = NULL;
-	else if (file->private_data == ERR_PTR(-ENOENT))
-		return 0;
-	file->private_data = offset_iterate_dir(d_inode(dir), ctx);
+	offset_iterate_dir(d_inode(dir), ctx, last_index);
 	return 0;
 }

 const struct file_operations simple_offset_dir_operations = {
+	.open		= offset_dir_open,
 	.llseek		= offset_dir_llseek,
 	.iterate_shared	= offset_readdir,
 	.read		= generic_read_dir,

-- 2.47.0
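
The readdir/rename loop described in this patch, and the effect of the last_index snapshot, can be modelled in user space. The sketch below is an assumption-laden toy, not the xfstests case: NFILES, struct toy_dir and the toy_* helpers are hypothetical, each rename simply moves the entry to a fresh offset at the end (the behaviour the commit message describes for a rename to a new name), and the loop is bounded by max_loops so the unfixed variant terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DIR_OFFSET_MIN 2UL
#define NFILES         50	/* hypothetical; the real test uses 5000 */

struct toy_dir {
	unsigned long off[NFILES];	/* current offset of each file */
	unsigned long next_offset;	/* next free offset; only grows here */
};

static void toy_dir_init(struct toy_dir *d)
{
	for (size_t i = 0; i < NFILES; i++)
		d->off[i] = DIR_OFFSET_MIN + i;
	d->next_offset = DIR_OFFSET_MIN + NFILES;
}

/* Model a rename to a fresh name: the entry gets a new offset at the end. */
static void toy_rename(struct toy_dir *d, size_t file)
{
	d->off[file] = d->next_offset++;
}

/* Index of the file with the smallest offset >= pos, or -1 if none. */
static long toy_find_next(const struct toy_dir *d, unsigned long pos)
{
	long best = -1;

	for (size_t i = 0; i < NFILES; i++) {
		if (d->off[i] >= pos && (best < 0 || d->off[i] < d->off[best]))
			best = (long)i;
	}
	return best;
}

/* Run the readdir+rename loop; returns how many entries were emitted. */
static unsigned long toy_readdir_loop(bool use_last_index,
				      unsigned long max_loops)
{
	struct toy_dir d;
	unsigned long pos = DIR_OFFSET_MIN, emitted = 0, last_index;

	toy_dir_init(&d);
	last_index = d.next_offset;	/* snapshot taken at "open" */

	while (emitted < max_loops) {
		long f = toy_find_next(&d, pos);

		if (f < 0)
			break;
		if (use_last_index && d.off[f] >= last_index)
			break;		/* entry appeared after open: stop */
		pos = d.off[f] + 1;
		emitted++;
		toy_rename(&d, (size_t)f);	/* rename(entry, "TEMPFILE") */
		toy_rename(&d, (size_t)f);	/* rename("TEMPFILE", entry) */
	}
	return emitted;
}
```

In this model the bounded variant emits each file exactly once, while the unbounded variant keeps finding the re-inserted entries until it hits the loop cap.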

Yu Kuai

2:36 a.m.

New subject: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

Hi,

On 2024/11/11 8:52, cel@kernel.org wrote:
> From: yangerkun yangerkun@huawei.com
>
> [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ]
>
> [...]
>
> We use the same logic as commit 9b378f6ad48cf ("btrfs: fix infinite
> directory reads") to fix it: record last_index when the directory is
> opened, and do not emit any entry whose index is >= last_index. The
> file->private_data

Please note that this requires that last_index never overflows; otherwise readdir will be messed up.

> field already used by offset directories can hold this value directly,
> and we also update last_index when we llseek the directory file.
>
> [...]
>
> +static int offset_dir_open(struct inode *inode, struct file *file)
> +{
> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
> +
> +	file->private_data = (void *)ctx->next_offset;
> +	return 0;
> +}

Looks like xarray is still used.

I'm on the cc list, so I assume you saw my set; I don't understand why you're ignoring my concerns:

1) next_offset is 32-bit and can overflow on a long-running machine.
2) Once next_offset overflows, readdir will skip files whose offsets are larger.

Thanks, Kuai
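
To make the overflow concern above concrete, here is a minimal user-space sketch. It assumes a plain 32-bit counter that wraps through ordinary unsigned arithmetic; toy_advance() and toy_visible() are hypothetical names, and the kernel's xarray allocator has its own wrap handling that this model does not try to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Advance a 32-bit cursor; unsigned arithmetic wraps to 0 after UINT32_MAX. */
static uint32_t toy_advance(uint32_t next_offset)
{
	return next_offset + 1u;
}

/* Visibility rule from the patch: emit only entries below last_index. */
static bool toy_visible(uint32_t entry_offset, long last_index)
{
	return (long)entry_offset < last_index;
}
```

In this model, an entry at offset 5 is visible to a reader whose snapshot is 100, but once the counter has wrapped, a newly taken snapshot is smaller than the offsets of entries that already exist, so toy_visible() rejects them.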


Chuck Lever III

2:39 p.m.

New subject: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

On Nov 10, 2024, at 9:36 PM, Yu Kuai yukuai1@huaweicloud.com wrote:
> Hi,
>
> On 2024/11/11 8:52, cel@kernel.org wrote:
>> From: yangerkun yangerkun@huawei.com
>>
>> [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ]
>>
>> [...]
>>
>> We use the same logic as commit 9b378f6ad48cf ("btrfs: fix infinite
>> directory reads") to fix it: record last_index when the directory is
>> opened, and do not emit any entry whose index is >= last_index. The
>> file->private_data
>
> Please note that this requires that last_index never overflows;
> otherwise readdir will be messed up.

It would help your cause if you could be more specific than "messed up".

>> [...]
>>
>> +static int offset_dir_open(struct inode *inode, struct file *file)
>> +{
>> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>> +
>> +	file->private_data = (void *)ctx->next_offset;
>> +	return 0;
>> +}
>
> Looks like xarray is still used.

That's not going to change, as several folks have already explained.

> I'm on the cc list, so I assume you saw my set; I don't understand why
> you're ignoring my concerns:
>
> 1) next_offset is 32-bit and can overflow on a long-running machine.
> 2) Once next_offset overflows, readdir will skip files whose offsets
>    are larger.

In that case, that entry won't be visible via getdents(3) until the directory is re-opened or the process does an lseek(fd, 0, SEEK_SET).

That is the proper and expected behavior. I suspect you will see exactly that behavior with ext4 and 32-bit directory offsets, for example.

Does that not directly address your concern? Or do you mean that Erkun's patch introduces a new issue?

If there is a problem here, please construct a reproducer against this patch set and post it.
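
For reference, the "re-open or seek back to zero" behaviour mentioned above can be exercised from user space roughly as follows. This is a sketch under assumptions: it uses POSIX rewinddir() on a DIR stream rather than a raw lseek(fd, 0, SEEK_SET), the scratch path under /tmp and the helper names are hypothetical, and error handling is minimal.

```c
#define _POSIX_C_SOURCE 200809L	/* exposes mkdtemp() */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Count entries other than "." and ".." from the current stream position. */
static int count_remaining(DIR *dir)
{
	struct dirent *de;
	int n = 0;

	assert(dir != NULL);
	while ((de = readdir(dir)) != NULL) {
		if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
			n++;
	}
	return n;
}

/* Create an empty file @name inside directory @path; returns 0 on success. */
static int touch_in(const char *path, const char *name)
{
	char buf[512];
	int fd;

	snprintf(buf, sizeof(buf), "%s/%s", path, name);
	fd = open(buf, O_CREAT | O_WRONLY, 0600);
	return fd < 0 ? -1 : close(fd);
}

/* Remove @name from @path, ignoring errors (best-effort cleanup). */
static void remove_in(const char *path, const char *name)
{
	char buf[512];

	snprintf(buf, sizeof(buf), "%s/%s", path, name);
	(void)unlink(buf);
}

/*
 * Open a directory, drain it, create a second file, rewind, and count
 * again. Returns the entry count after the rewind, or -1 on error.
 */
static int entries_after_rewind(void)
{
	char tmpl[] = "/tmp/offset-demo-XXXXXX";	/* hypothetical scratch dir */
	char *path = mkdtemp(tmpl);
	DIR *dir;
	int n = -1;

	if (!path)
		return -1;
	if (touch_in(path, "file1") == 0 && (dir = opendir(path)) != NULL) {
		(void)count_remaining(dir);	/* drain the stream; sees file1 */
		if (touch_in(path, "file2") == 0) {
			rewinddir(dir);		/* restart from the beginning */
			n = count_remaining(dir);
		}
		closedir(dir);
	}
	remove_in(path, "file1");
	remove_in(path, "file2");
	(void)rmdir(path);
	return n;
}
```

POSIX specifies that rewinddir() makes the stream reflect the current state of the directory, so the second pass is expected to report both files regardless of when file2 was created relative to the original open.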


-- Chuck Lever

yangerkun

3:20 p.m.

New subject: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

在 2024/11/11 22:39, Chuck Lever III 写道:

...

...
On Nov 10, 2024, at 9:36 PM, Yu Kuai yukuai1@huaweicloud.com wrote:

Hi,

在 2024/11/11 8:52, cel@kernel.org 写道:

...
From: yangerkun yangerkun@huawei.com [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ] After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below).

create 5000 files(1 2 3...) under one dir

call readdir(man 3 readdir) once, and get one entry

rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)

loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition)

We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data

Please notice this requires last_index should never overflow, otherwise readdir will be messed up.

It would help your cause if you could be more specific than "messed up".

...
...
now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file. Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: yangerkun yangerkun@huawei.com Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com Reviewed-by: Chuck Lever chuck.lever@oracle.com [brauner: only update last_index after seek when offset is zero like Jan suggested] Signed-off-by: Christian Brauner brauner@kernel.org Link: https://nvd.nist.gov/vuln/detail/CVE-2024-46701 [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com

 fs/libfs.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index a87005c89534..b59ff0dfea1f 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -449,6 +449,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
 	xa_destroy(&octx->xa);
 }

+static int offset_dir_open(struct inode *inode, struct file *file)
+{
+	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
+
+	file->private_data = (void *)ctx->next_offset;
+	return 0;
+}

Looks like xarray is still used.

That's not going to change, as several folks have already explained.

...
I'm on the cc list, so I assume you saw my set, and I don't know why you're ignoring my concerns.

...

1) next_offset is 32-bit and can overflow on a long-running machine.
2) Once next_offset overflows, readdir will skip the files whose offset is bigger.
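The concern in the second point can be sketched in a few lines of userspace C. This is only a model of the comparison the patch adds (emit an entry only while its offset is below the limit sampled at open time), not a reproduction on a real kernel; `entry_visible` and `count_visible` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the added check: only entries below last_index are emitted. */
static bool entry_visible(uint32_t entry_offset, uint32_t last_index)
{
	return entry_offset < last_index;
}

/*
 * Count how many existing entries a fresh open would list, given the
 * allocator's next_offset at open time (which becomes last_index).
 */
static unsigned int count_visible(const uint32_t *offsets, unsigned int n,
				  uint32_t next_offset_at_open)
{
	unsigned int visible = 0;

	for (unsigned int i = 0; i < n; i++)
		if (entry_visible(offsets[i], next_offset_at_open))
			visible++;
	return visible;
}
```

For example, with existing entries at offsets 2, 1000, and 4000000000, an open while next_offset is 4000000001 lists all three in this model, whereas an open after next_offset has wrapped around to 3 lists only the first.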

I'm sorry, I'm a little busy these days, so I haven't responded to this series of emails.

...

In that case, that entry won't be visible via getdents(2) until the directory is re-opened or the process does an lseek(fd, 0, SEEK_SET).

Yes.

...

That is the proper and expected behavior. I suspect you will see exactly that behavior with ext4 and 32-bit directory offsets, for example.

Emm...

Consider a case like this:

1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
2. open /tmp/dir with fd1
3. readdir and get /tmp/dir/file1
4. rm /tmp/dir/file2
5. touch /tmp/dir/file2
6. loop 4~5 for 2^32 times
7. readdir /tmp/dir with fd1

For tmpfs now, we may not see /tmp/dir/file2, since the offset has overflowed; for ext4 it is OK... So we think this will be a problem.

...

Does that not directly address your concern? Or do you mean that Erkun's patch introduces a new issue?

Yes. To be honest, my personal feeling is that this is a problem. But on 64-bit, it may never be triggered.

...

If there is a problem here, please construct a reproducer against this patch set and post it.

...
Thanks, Kuai

...

+
 /**
  * offset_dir_llseek - Advance the read position of a directory descriptor
  * @file: an open directory whose position is to be updated
@@ -462,6 +470,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
  */
 static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 {
+	struct inode *inode = file->f_inode;
+	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
+
 	switch (whence) {
 	case SEEK_CUR:
 		offset += file->f_pos;
@@ -475,8 +486,9 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 	}

 	/* In this case, ->private_data is protected by f_pos_lock */
-	file->private_data = NULL;
-	return vfs_setpos(file, offset, U32_MAX);
+	if (!offset)
+		file->private_data = (void *)ctx->next_offset;
+	return vfs_setpos(file, offset, LONG_MAX);
 }

 static struct dentry *offset_find_next(struct xa_state *xas)
@@ -505,7 +517,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
 			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
 }

-static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
+static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
 {
 	struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode);
 	XA_STATE(xas, &so_ctx->xa, ctx->pos);
@@ -514,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
 	while (true) {
 		dentry = offset_find_next(&xas);
 		if (!dentry)
-			return ERR_PTR(-ENOENT);
+			return;
+
+		if (dentry2offset(dentry) >= last_index) {
+			dput(dentry);
+			return;
+		}

 		if (!offset_dir_emit(ctx, dentry)) {
 			dput(dentry);
-			break;
+			return;
 		}

 		dput(dentry);
 		ctx->pos = xas.xa_index + 1;
 	}
-	return NULL;
 }

 /**
@@ -551,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
 static int offset_readdir(struct file *file, struct dir_context *ctx)
 {
 	struct dentry *dir = file->f_path.dentry;
+	long last_index = (long)file->private_data;

 	lockdep_assert_held(&d_inode(dir)->i_rwsem);

 	if (!dir_emit_dots(file, ctx))
 		return 0;

-	/* In this case, ->private_data is protected by f_pos_lock */
-	if (ctx->pos == DIR_OFFSET_MIN)
-		file->private_data = NULL;
-	else if (file->private_data == ERR_PTR(-ENOENT))
-		return 0;
-	file->private_data = offset_iterate_dir(d_inode(dir), ctx);
+	offset_iterate_dir(d_inode(dir), ctx, last_index);
 	return 0;
 }

 const struct file_operations simple_offset_dir_operations = {
+	.open		= offset_dir_open,
 	.llseek		= offset_dir_llseek,
 	.iterate_shared	= offset_readdir,
 	.read		= generic_read_dir,

-- Chuck Lever

Chuck Lever III

3:34 p.m.

New subject: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

...

On Nov 11, 2024, at 10:20 AM, yangerkun yangerkun@huaweicloud.com wrote:

On 2024/11/11 22:39, Chuck Lever III wrote:

...
...
On Nov 10, 2024, at 9:36 PM, Yu Kuai yukuai1@huaweicloud.com wrote:

Hi,

On 2024/11/11 8:52, cel@kernel.org wrote:

...
From: yangerkun yangerkun@huawei.com [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ] After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below).

create 5000 files(1 2 3...) under one dir

call readdir(man 3 readdir) once, and get one entry

rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)

loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition)

We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data

Please notice this requires last_index should never overflow, otherwise readdir will be messed up.

It would help your cause if you could be more specific than "messed up".

...
...
now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file. Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: yangerkun yangerkun@huawei.com Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com Reviewed-by: Chuck Lever chuck.lever@oracle.com [brauner: only update last_index after seek when offset is zero like Jan suggested] Signed-off-by: Christian Brauner brauner@kernel.org Link: https://nvd.nist.gov/vuln/detail/CVE-2024-46701 [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com

fs/libfs.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/fs/libfs.c b/fs/libfs.c index a87005c89534..b59ff0dfea1f 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -449,6 +449,14 @@ void simple_offset_destroy(struct offset_ctx *octx) xa_destroy(&octx->xa); } +static int offset_dir_open(struct inode *inode, struct file *file) +{

struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);

file->private_data = (void *)ctx->next_offset;

return 0;

+}

Looks like xarray is still used.

That's not going to change, as several folks have already explained.

...
I'm in the cc list ,so I assume you saw my set, then I don't know why you're ignoring my concerns.

1) next_offset is 32-bit and can overflow on a long-running machine.
2) Once next_offset overflows, readdir will skip the files whose offset is bigger.

I'm sorry, I'm a little busy these days, so I haven't responded to this series of emails.

...
In that case, that entry won't be visible via getdents(3) until the directory is re-opened or the process does an lseek(fd, 0, SEEK_SET).

Yes.

...
That is the proper and expected behavior. I suspect you will see exactly that behavior with ext4 and 32-bit directory offsets, for example.

Emm...

Consider a case like this:

1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
2. open /tmp/dir with fd1
3. readdir and get /tmp/dir/file1
4. rm /tmp/dir/file2
5. touch /tmp/dir/file2
6. loop 4~5 for 2^32 times
7. readdir /tmp/dir with fd1

For tmpfs now, we may not see /tmp/dir/file2, since the offset has overflowed; for ext4 it is OK... So we think this will be a problem.

...
Does that not directly address your concern? Or do you mean that Erkun's patch introduces a new issue?

Yes. To be honest, my personal feeling is that this is a problem. But on 64-bit, it may never be triggered.

Thanks for confirming.

In that case, the preferred way to handle it is to fix the issue in upstream, and then backport that fix to LTS. Dependence on 64-bit offsets to avoid a failure case should be considered a workaround, not a real fix, IMHO.

Do you have a few moments to address it, or if not I will see to it.

I think reducing the xa_limit in simple_offset_add() to, say, 2..16 would make the reproducer fire almost immediately.
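A rough userspace model shows why a range as small as 2..16 would wrap almost immediately under the unlink/create loop. This sketch does not call the kernel's xarray API; `sim_alloc`, `sim_alloc_cyclic`, and `iterations_until_wrap` are hypothetical names, and the return convention (1 when the search wrapped) is only loosely modeled on a cyclic ID allocator.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_MIN   2u			/* lowest offset in the reduced range */
#define SIM_MAX   16u			/* highest offset in the reduced range */
#define SIM_RANGE (SIM_MAX - SIM_MIN + 1u)

struct sim_alloc {
	bool used[SIM_MAX + 1u];	/* used[i] is true while offset i is live */
	uint32_t next;			/* where the next search starts */
};

static void sim_alloc_init(struct sim_alloc *a)
{
	for (uint32_t i = 0; i <= SIM_MAX; i++)
		a->used[i] = false;
	a->next = SIM_MIN;
}

/*
 * Cyclic allocation: search upward from a->next, wrapping to SIM_MIN.
 * Stores the new offset in *out. Returns 1 if the search wrapped,
 * 0 if it did not, and -1 if every offset in the range is in use.
 */
static int sim_alloc_cyclic(struct sim_alloc *a, uint32_t *out)
{
	uint32_t start = a->next;
	int wrapped = 0;

	if (start > SIM_MAX) {
		start = SIM_MIN;
		wrapped = 1;
	}
	for (uint32_t tries = 0; tries < SIM_RANGE; tries++) {
		uint32_t idx = (start - SIM_MIN) + tries;

		if (idx >= SIM_RANGE) {
			idx -= SIM_RANGE;
			wrapped = 1;
		}
		if (a->used[SIM_MIN + idx])
			continue;
		a->used[SIM_MIN + idx] = true;
		a->next = SIM_MIN + idx + 1u;
		*out = SIM_MIN + idx;
		return wrapped;
	}
	return -1;
}

/* Count unlink/create iterations on file2 until the allocator first wraps. */
static unsigned int iterations_until_wrap(void)
{
	struct sim_alloc a;
	uint32_t file1, file2;
	unsigned int iters = 0;

	sim_alloc_init(&a);
	(void)sim_alloc_cyclic(&a, &file1);	/* file1 takes the first offset */
	(void)sim_alloc_cyclic(&a, &file2);	/* file2 takes the second */
	(void)file1;

	for (;;) {
		a.used[file2] = false;			/* unlink file2 */
		iters++;
		if (sim_alloc_cyclic(&a, &file2) != 0)	/* create file2 again */
			return iters;
	}
}
```

With these bounds the model wraps on the 14th iteration, compared with roughly 2^32 iterations for the full 32-bit range.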

...

...
If there is a problem here, please construct a reproducer against this patch set and post it.

...
Thanks, Kuai

...


-- Chuck Lever

yangerkun

12 Nov 12 Nov

3:43 a.m.

New subject: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

On 2024/11/11 23:34, Chuck Lever III wrote:

...

...
On Nov 11, 2024, at 10:20 AM, yangerkun yangerkun@huaweicloud.com wrote:

On 2024/11/11 22:39, Chuck Lever III wrote:

...
...
On Nov 10, 2024, at 9:36 PM, Yu Kuai yukuai1@huaweicloud.com wrote:

Hi,

On 2024/11/11 8:52, cel@kernel.org wrote:

...
From: yangerkun yangerkun@huawei.com [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ] After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below).

create 5000 files(1 2 3...) under one dir

call readdir(man 3 readdir) once, and get one entry

rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)

loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition)

We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data

Please notice this requires last_index should never overflow, otherwise readdir will be messed up.

It would help your cause if you could be more specific than "messed up".

...
...
now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file. Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: yangerkun yangerkun@huawei.com Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com Reviewed-by: Chuck Lever chuck.lever@oracle.com [brauner: only update last_index after seek when offset is zero like Jan suggested] Signed-off-by: Christian Brauner brauner@kernel.org Link: https://nvd.nist.gov/vuln/detail/CVE-2024-46701 [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com

fs/libfs.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/fs/libfs.c b/fs/libfs.c index a87005c89534..b59ff0dfea1f 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -449,6 +449,14 @@ void simple_offset_destroy(struct offset_ctx *octx) xa_destroy(&octx->xa); } +static int offset_dir_open(struct inode *inode, struct file *file) +{

struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);

file->private_data = (void *)ctx->next_offset;

return 0;

+}

Looks like xarray is still used.

That's not going to change, as several folks have already explained.

...
I'm in the cc list ,so I assume you saw my set, then I don't know why you're ignoring my concerns.

1) next_offset is 32-bit and can overflow on a long-running machine.
2) Once next_offset overflows, readdir will skip the files whose offset is bigger.

I'm sorry, I'm a little busy these days, so I haven't responded to this series of emails.

...
In that case, that entry won't be visible via getdents(3) until the directory is re-opened or the process does an lseek(fd, 0, SEEK_SET).

Yes.

...
That is the proper and expected behavior. I suspect you will see exactly that behavior with ext4 and 32-bit directory offsets, for example.

Emm...

Consider a case like this:

1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
2. open /tmp/dir with fd1
3. readdir and get /tmp/dir/file1
4. rm /tmp/dir/file2
5. touch /tmp/dir/file2
6. loop 4~5 for 2^32 times
7. readdir /tmp/dir with fd1

For tmpfs now, we may not see /tmp/dir/file2, since the offset has overflowed; for ext4 it is OK... So we think this will be a problem.

...
Does that not directly address your concern? Or do you mean that Erkun's patch introduces a new issue?

Yes. To be honest, my personal feeling is that this is a problem. But on 64-bit, it may never be triggered.

Thanks for confirming.

In that case, the preferred way to handle it is to fix the issue in upstream, and then backport that fix to LTS. Dependence on 64-bit offsets to avoid a failure case should be considered a workaround, not a real fix, IMHO.

Yes.

...

Do you have a few moments to address it, or if not I will see to it.

Please go ahead and do this, since I am quite busy until the end of this month... Sorry.

...

I think reducing the xa_limit in simple_offset_add() to, say, 2..16 would make the reproducer fire almost immediately.

Yes.

...

...
...
If there is a problem here, please construct a reproducer against this patch set and post it.

...
Thanks, Kuai

...


-- Chuck Lever

Chuck Lever III

3:37 p.m.

New subject: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

...

On Nov 11, 2024, at 10:43 PM, yangerkun yangerkun@huaweicloud.com wrote:

On 2024/11/11 23:34, Chuck Lever III wrote:

...
...
On Nov 11, 2024, at 10:20 AM, yangerkun yangerkun@huaweicloud.com wrote:

On 2024/11/11 22:39, Chuck Lever III wrote:

...
...
On Nov 10, 2024, at 9:36 PM, Yu Kuai yukuai1@huaweicloud.com wrote:

Hi,

On 2024/11/11 8:52, cel@kernel.org wrote:

...
From: yangerkun yangerkun@huawei.com [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ] After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below).

create 5000 files(1 2 3...) under one dir

call readdir(man 3 readdir) once, and get one entry

rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)

loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition)

We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data

Please notice this requires last_index should never overflow, otherwise readdir will be messed up.

It would help your cause if you could be more specific than "messed up".

...
...
now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file. Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: yangerkun yangerkun@huawei.com Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com Reviewed-by: Chuck Lever chuck.lever@oracle.com [brauner: only update last_index after seek when offset is zero like Jan suggested] Signed-off-by: Christian Brauner brauner@kernel.org Link: https://nvd.nist.gov/vuln/detail/CVE-2024-46701 [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever chuck.lever@oracle.com

fs/libfs.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/fs/libfs.c b/fs/libfs.c index a87005c89534..b59ff0dfea1f 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -449,6 +449,14 @@ void simple_offset_destroy(struct offset_ctx *octx) xa_destroy(&octx->xa); } +static int offset_dir_open(struct inode *inode, struct file *file) +{

struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);

file->private_data = (void *)ctx->next_offset;

return 0;

+}

Looks like xarray is still used.

That's not going to change, as several folks have already explained.

...
I'm in the cc list ,so I assume you saw my set, then I don't know why you're ignoring my concerns.

1) next_offset is 32-bit and can overflow on a long-running machine.
2) Once next_offset overflows, readdir will skip the files whose offset is bigger.

I'm sorry, I'm a little busy these days, so I haven't responded to this series of emails.

...
In that case, that entry won't be visible via getdents(3) until the directory is re-opened or the process does an lseek(fd, 0, SEEK_SET).

Yes.

...
That is the proper and expected behavior. I suspect you will see exactly that behavior with ext4 and 32-bit directory offsets, for example.

Emm...

Consider a case like this:

1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
2. open /tmp/dir with fd1
3. readdir and get /tmp/dir/file1
4. rm /tmp/dir/file2
5. touch /tmp/dir/file2
6. loop 4~5 for 2^32 times
7. readdir /tmp/dir with fd1

For tmpfs now, we may not see /tmp/dir/file2, since the offset has overflowed; for ext4 it is OK... So we think this will be a problem.

...
Does that not directly address your concern? Or do you mean that Erkun's patch introduces a new issue?

Yes. To be honest, my personal feeling is that this is a problem. But on 64-bit, it may never be triggered.

Thanks for confirming. In that case, the preferred way to handle it is to fix the issue in upstream, and then backport that fix to LTS. Dependence on 64-bit offsets to avoid a failure case should be considered a workaround, not a real fix, IMHO.

Yes.

...
Do you have a few moments to address it, or if not I will see to it.

Please go ahead and do this, since I am quite busy until the end of this month... Sorry.

No worries!

...

...
I think reducing the xa_limit in simple_offset_add() to, say, 2..16 would make the reproducer fire almost immediately.

Yes.

...
...
...
If there is a problem here, please construct a reproducer against this patch set and post it.

...
Thanks, Kuai

...


-- Chuck Lever

Chuck Lever

13 Nov 13 Nov

3:17 p.m.

New subject: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

On Mon, Nov 11, 2024 at 11:20:17PM +0800, yangerkun wrote:

...

On 2024/11/11 22:39, Chuck Lever III wrote:

...
...
On Nov 10, 2024, at 9:36 PM, Yu Kuai yukuai1@huaweicloud.com wrote:

I'm on the cc list, so I assume you saw my set, and I don't know why you're ignoring my concerns.

1) next_offset is 32-bit and can overflow on a long-running machine.
2) Once next_offset overflows, readdir will skip the files whose offset is bigger.

I'm sorry, I'm a little busy these days, so I haven't responded to this series of emails.

...
In that case, that entry won't be visible via getdents(3) until the directory is re-opened or the process does an lseek(fd, 0, SEEK_SET).

Yes.

...
That is the proper and expected behavior. I suspect you will see exactly that behavior with ext4 and 32-bit directory offsets, for example.

Emm...

Consider a case like this:

1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
2. open /tmp/dir with fd1
3. readdir and get /tmp/dir/file1
4. rm /tmp/dir/file2
5. touch /tmp/dir/file2
6. loop 4~5 for 2^32 times
7. readdir /tmp/dir with fd1

For tmpfs now, we may not see /tmp/dir/file2, since the offset has overflowed; for ext4 it is OK... So we think this will be a problem.

I constructed a simple test program using the above steps:

/*
 * 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
 * 2. open /tmp/dir with fd1
 * 3. readdir and get /tmp/dir/file1
 * 4. rm /tmp/dir/file2
 * 5. touch /tmp/dir/file2
 * 6. loop 4~5 for 2^32 times
 * 7. readdir /tmp/dir with fd1
 */

#include <sys/types.h>
#include <sys/stat.h>

#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static void list_directory(DIR *dirp)
{
	struct dirent *de;

	errno = 0;
	do {
		de = readdir(dirp);
		if (!de)
			break;

		/* d_off is an off_t; cast so it matches the %lld format */
		printf("d_off:  %lld\n", (long long)de->d_off);
		printf("d_name: %s\n", de->d_name);
	} while (true);

	if (errno)
		perror("readdir");
	else
		printf("EOD\n");
}

int main(int argc, char **argv)
{
	unsigned long i;
	DIR *dirp;
	int ret;

	/* 1. */
	ret = mkdir("/tmp/dir", 0755);
	if (ret < 0) {
		perror("mkdir");
		return 1;
	}

	ret = creat("/tmp/dir/file1", 0644);
	if (ret < 0) {
		perror("creat");
		return 1;
	}
	close(ret);

	ret = creat("/tmp/dir/file2", 0644);
	if (ret < 0) {
		perror("creat");
		return 1;
	}
	close(ret);

	/* 2. */
	errno = 0;
	dirp = opendir("/tmp/dir");
	if (!dirp) {
		if (errno)
			perror("opendir");
		else
			fprintf(stderr, "EOD\n");
		/* dirp is NULL here, so there is nothing to close */
		return 1;
	}

	/* 3. */
	errno = 0;
	do {
		struct dirent *de;

		de = readdir(dirp);
		if (!de) {
			if (errno) {
				perror("readdir");
				closedir(dirp);
				return 1;
			}
			break;
		}
		if (strcmp(de->d_name, "file1") == 0) {
			printf("Found 'file1'\n");
			break;
		}
	} while (true);

	/* run the test. */
	for (i = 0; i < 10000; i++) {
		/* 4. */
		ret = unlink("/tmp/dir/file2");
		if (ret < 0) {
			perror("unlink");
			closedir(dirp);
			return 1;
		}

		/* 5. */
		ret = creat("/tmp/dir/file2", 0644);
		if (ret < 0) {
			perror("creat");
			fprintf(stderr, "i = %lu\n", i);
			closedir(dirp);
			return 1;
		}
		close(ret);
	}

	/* 7. */
	printf("\ndirectory after test:\n");
	list_directory(dirp);

	/* cel. */
	rewinddir(dirp);
	printf("\ndirectory after rewind:\n");
	list_directory(dirp);

	closedir(dirp);
	return 0;
}

...

...
Does that not directly address your concern? Or do you mean that Erkun's patch introduces a new issue?

Yes. To be honest, my personal feeling is that this is a problem. But on 64-bit, it may never be triggered.

I ran the test program above on this kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-te...

Note that it has a patch to restrict the range of directory offset values for tmpfs to 2..4096.

I did not observe any unexpected behavior after the offset values wrapped. At step 7, I can always see file2, and its offset is always 4. At step "cel" I can see all expected directory entries.

I tested on v6.12-rc7 with the same range restriction but using Maple tree and 64-bit offsets. No unexpected behavior there either.

So either we're still missing something, or there is no problem. My only theory is maybe it's an issue with an implicit integer sign conversion, and we should restrict the offset range to 2..S32_MAX.

I can try testing with a range of (U32_MAX - 4096)..(U32_MAX).
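One way the sign-conversion theory could play out is sketched below in userspace C. It assumes, purely for illustration, that both the entry offset and the sampled limit are compared as a signed `long`, and it models a 32-bit `long` with `int32_t` and a 64-bit `long` with `int64_t`. The helper names are hypothetical, and nothing here establishes that the kernel actually misbehaves this way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret a u32 as a two's-complement s32 without relying on
 * implementation-defined narrowing conversions. */
static int32_t as_s32(uint32_t v)
{
	if (v <= (uint32_t)INT32_MAX)
		return (int32_t)v;
	return (int32_t)(v - 0x80000000u) + INT32_MIN;
}

/* Model of "offset >= last_index" when long is 64 bits wide:
 * every u32 value stays non-negative. */
static bool skipped_with_64bit_long(uint32_t entry_offset, uint32_t next_offset)
{
	int64_t last_index = (int64_t)next_offset;

	return (int64_t)entry_offset >= last_index;
}

/* Same comparison when long is 32 bits wide: values above S32_MAX
 * turn negative. */
static bool skipped_with_32bit_long(uint32_t entry_offset, uint32_t next_offset)
{
	int32_t last_index = as_s32(next_offset);

	return as_s32(entry_offset) >= last_index;
}
```

In this model, an entry at offset 5 with a sampled limit of 0x80000000 is emitted in the 64-bit case but skipped in the 32-bit case, which is the kind of difference that capping the range at S32_MAX would avoid.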

...

...
If there is a problem here, please construct a reproducer against this patch set and post it.

Invitation still stands: if you have a solid reproducer, please post it.

-- Chuck Lever

Yu Kuai

16 Nov 16 Nov

7:22 a.m.

New subject: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

Hi,

On 2024/11/13 23:17, Chuck Lever wrote:

...

On Mon, Nov 11, 2024 at 11:20:17PM +0800, yangerkun wrote:

...
On 2024/11/11 22:39, Chuck Lever III wrote:

...
...
On Nov 10, 2024, at 9:36 PM, Yu Kuai yukuai1@huaweicloud.com wrote:

I'm on the cc list, so I assume you saw my set, and I don't know why you're ignoring my concerns.

1) next_offset is 32-bit and can overflow on a long-running machine.
2) Once next_offset overflows, readdir will skip the files whose offset is bigger.

I'm sorry, I'm a little busy these days, so I haven't responded to this series of emails.

...
In that case, that entry won't be visible via getdents(3) until the directory is re-opened or the process does an lseek(fd, 0, SEEK_SET).

Yes.

...
That is the proper and expected behavior. I suspect you will see exactly that behavior with ext4 and 32-bit directory offsets, for example.

Emm...

Consider a case like this:

1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
2. open /tmp/dir with fd1
3. readdir and get /tmp/dir/file1
4. rm /tmp/dir/file2
5. touch /tmp/dir/file2
6. loop 4~5 for 2^32 times
7. readdir /tmp/dir with fd1

With tmpfs as it stands, we may not see /tmp/dir/file2, because the offset has overflowed; with ext4 it is fine. So we think this is a problem.

I constructed a simple test program using the above steps:

/*
 * 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
 * 2. open /tmp/dir with fd1
 * 3. readdir and get /tmp/dir/file1
 * 4. rm /tmp/dir/file2
 * 5. touch /tmp/dir/file2
 * 6. loop 4~5 for 2^32 times
 * 7. readdir /tmp/dir with fd1
 */

#include <sys/types.h>
#include <sys/stat.h>

#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Print every entry from the current directory position to end of directory. */
static void list_directory(DIR *dirp)
{
	struct dirent *de;

	errno = 0;
	do {
		de = readdir(dirp);
		if (!de)
			break;

		/* d_off is an off_t; cast it so the format matches everywhere. */
		printf("d_off:  %lld\n", (long long)de->d_off);
		printf("d_name: %s\n", de->d_name);
	} while (true);

	if (errno)
		perror("readdir");
	else
		printf("EOD\n");
}

int main(int argc, char **argv)
{
	unsigned long i;
	DIR *dirp;
	int ret;

	/* 1. */
	ret = mkdir("/tmp/dir", 0755);
	if (ret < 0) {
		perror("mkdir");
		return 1;
	}

	ret = creat("/tmp/dir/file1", 0644);
	if (ret < 0) {
		perror("creat");
		return 1;
	}
	close(ret);

	ret = creat("/tmp/dir/file2", 0644);
	if (ret < 0) {
		perror("creat");
		return 1;
	}
	close(ret);

	/* 2. */
	dirp = opendir("/tmp/dir");
	if (!dirp) {
		/* dirp is NULL here, so there is nothing to close. */
		perror("opendir");
		return 1;
	}

	/* 3. */
	errno = 0;
	do {
		struct dirent *de;

		de = readdir(dirp);
		if (!de) {
			if (errno) {
				perror("readdir");
				closedir(dirp);
				return 1;
			}
			break;
		}
		if (strcmp(de->d_name, "file1") == 0) {
			printf("Found 'file1'\n");
			break;
		}
	} while (true);

	/* run the test. */
	for (i = 0; i < 10000; i++) {
		/* 4. */
		ret = unlink("/tmp/dir/file2");
		if (ret < 0) {
			perror("unlink");
			closedir(dirp);
			return 1;
		}

		/* 5. */
		ret = creat("/tmp/dir/file2", 0644);
		if (ret < 0) {
			perror("creat");
			fprintf(stderr, "i = %lu\n", i);
			closedir(dirp);
			return 1;
		}
		close(ret);
	}

	/* 7. */
	printf("\ndirectory after test:\n");
	list_directory(dirp);

	/* cel. */
	rewinddir(dirp);
	printf("\ndirectory after rewind:\n");
	list_directory(dirp);

	closedir(dirp);
	return 0;
}

...
...
Does that not directly address your concern? Or do you mean that Erkun's patch introduces a new issue?

Yes; to be honest, my personal feeling is that this is a problem. But with 64-bit offsets, it may never be triggered.

I ran the test program above on this kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-te...

Note that it has a patch to restrict the range of directory offset values for tmpfs to 2..4096.

I did not observe any unexpected behavior after the offset values wrapped. At step 7, I can always see file2, and its offset is always 4. At step "cel" I can see all expected directory entries.

Then, did you investigate further or not?

...

I tested on v6.12-rc7 with the same range restriction but using Maple tree and 64-bit offsets. No unexpected behavior there either.

So either we're still missing something, or there is no problem. My only theory is maybe it's an issue with an implicit integer sign conversion, and we should restrict the offset range to 2..S32_MAX.

I can try testing with a range of (U32_MAX - 4096)..(U32_MAX).

You can try the following reproducer; it's much simpler. First, apply the following patch (on the latest kernel):

diff --git a/fs/libfs.c b/fs/libfs.c
index a168ece5cc61..7c1a5982a0c8 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -291,7 +291,7 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
 		return -EBUSY;
 
 	ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN,
-				 LONG_MAX, &octx->next_offset, GFP_KERNEL);
+				 256, &octx->next_offset, GFP_KERNEL);
 	if (ret < 0)
 		return ret;

Then create a new tmpfs directory and, inside it, run:

[root@fedora test-libfs]# for ((i=0; i<256; ++i)); do touch $i; done
touch: cannot touch '255': Device or resource busy
[root@fedora test-libfs]# ls
0 103 109 114 12 125 130 136 141 147 152 158 163 169 174 18 185 190 196 200 206 211 217 222 228 233 239 244 25 26 31 37 42 48 53 59 64 7 75 80 86 91 97
1 104 11 115 120 126 131 137 142 148 153 159 164 17 175 180 186 191 197 201 207 212 218 223 229 234 24 245 250 27 32 38 43 49 54 6 65 70 76 81 87 92 98
10 105 110 116 121 127 132 138 143 149 154 16 165 170 176 181 187 192 198 202 208 213 219 224 23 235 240 246 251 28 33 39 44 5 55 60 66 71 77 82 88 93 99
100 106 111 117 122 128 133 139 144 15 155 160 166 171 177 182 188 193 199 203 209 214 22 225 230 236 241 247 252 29 34 4 45 50 56 61 67 72 78 83 89 94
101 107 112 118 123 129 134 14 145 150 156 161 167 172 178 183 189 194 2 204 21 215 220 226 231 237 242 248 253 3 35 40 46 51 57 62 68 73 79 84 9 95
102 108 113 119 124 13 135 140 146 151 157 162 168 173 179 184 19 195 20 205 210 216 221 227 232 238 243 249 254 30 36 41 47 52 58 63 69 74 8 85 90 96
[root@fedora test-libfs]# rm -f 0
[root@fedora test-libfs]# touch 255
[root@fedora test-libfs]# ls
255
[root@fedora test-libfs]#

I don't think I have to explain why the second ls can only show the file 255...
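For readers who want the mechanism spelled out, here is a minimal userspace model of the session above. It is a sketch, not kernel code. It assumes that offsets are handed out cyclically in the range 2..256, and that a directory listing samples next_offset at open time and stops at the first live entry whose offset is at or above that value. All names (dir_model, dir_add, dir_remove, dir_list) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define OFFSET_MIN 2UL		/* stands in for DIR_OFFSET_MIN */
#define OFFSET_MAX 256UL	/* upper bound from the debugging patch */
#define SLOT_FREE (-1)

struct dir_model {
	int file_at[OFFSET_MAX + 1];	/* file id stored at each offset */
	unsigned long next_offset;	/* where the next search starts */
};

static void dir_init(struct dir_model *d)
{
	assert(d != NULL);
	for (unsigned long off = 0; off <= OFFSET_MAX; off++)
		d->file_at[off] = SLOT_FREE;
	d->next_offset = OFFSET_MIN;
}

/*
 * Cyclic allocation: search [next_offset, MAX] first, then wrap around
 * to [MIN, next_offset). Returns the offset, or 0 when the range is
 * full (modelling -EBUSY).
 */
static unsigned long dir_add(struct dir_model *d, int file)
{
	unsigned long start = d->next_offset < OFFSET_MIN ?
			      OFFSET_MIN : d->next_offset;
	unsigned long off;

	for (off = start; off <= OFFSET_MAX; off++)
		if (d->file_at[off] == SLOT_FREE)
			goto found;
	for (off = OFFSET_MIN; off < start && off <= OFFSET_MAX; off++)
		if (d->file_at[off] == SLOT_FREE)
			goto found;
	return 0;
found:
	d->file_at[off] = file;
	d->next_offset = off + 1;
	return off;
}

static void dir_remove(struct dir_model *d, int file)
{
	for (unsigned long off = OFFSET_MIN; off <= OFFSET_MAX; off++)
		if (d->file_at[off] == file)
			d->file_at[off] = SLOT_FREE;
}

/*
 * Models "open, then read to the end": the bound is sampled when the
 * listing starts, and the walk stops at the first live entry at or
 * past it. Returns the number of entries seen; the first file id seen
 * is stored through first_seen when that pointer is not NULL.
 */
static int dir_list(const struct dir_model *d, int *first_seen)
{
	unsigned long bound = d->next_offset;
	int count = 0;

	for (unsigned long off = OFFSET_MIN; off <= OFFSET_MAX; off++) {
		if (d->file_at[off] == SLOT_FREE)
			continue;
		if (off >= bound)
			break;
		if (count++ == 0 && first_seen)
			*first_seen = d->file_at[off];
	}
	return count;
}
```

In this model, files 0..254 take offsets 2..256 and leave next_offset at 257, so creating 255 fails and a listing sees all 255 entries. After removing file 0 and creating 255 again, the allocator wraps and reuses offset 2, next_offset drops to 3, and the next listing stops right after offset 2, so only file 255 is returned.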

Thanks, Kuai

...

...
...
If there is a problem here, please construct a reproducer against this patch set and post it.

Invitation still stands: if you have a solid reproducer, please post it.


linux-stable-mirror@lists.linaro.org
