This is a backport of all the work that lead up to the work that Linus made on eventfs. I trust Linus's version more so than the versions in 6.6 and 6.7. There may be plenty of hidden issues due to the design.
This is the update for 6.7. It includes Linus's updates as well as all the patches leading up to them.
I ran these through my full test suite that I use before sending anyting to Linus, althouh I did not run my "bisect" test that walks through the patches. The tests were just run on the end result. I'm currently running my 6.6 version through my tests.
Erick Archer (1): eventfs: Use kcalloc() instead of kzalloc()
Linus Torvalds (7): tracefs: remove stale 'update_gid' code eventfs: Initialize the tracefs inode properly tracefs: Avoid using the ei->dentry pointer unnecessarily tracefs: dentry lookup crapectomy eventfs: Remove unused d_parent pointer field eventfs: Clean up dentry ops and add revalidate function eventfs: Get rid of dentry pointers without refcounts
Steven Rostedt (Google) (15): eventfs: Remove "lookup" parameter from create_dir/file_dentry() eventfs: Stop using dcache_readdir() for getdents() tracefs/eventfs: Use root and instance inodes as default ownership eventfs: Have eventfs_iterate() stop immediately if ei->is_freed is set eventfs: Do ctx->pos update for all iterations in eventfs_iterate() eventfs: Read ei->entries before ei->children in eventfs_iterate() eventfs: Shortcut eventfs_iterate() by skipping entries already read eventfs: Have the inodes all for files and directories all be the same eventfs: Do not create dentries nor inodes in iterate_shared eventfs: Save directory inodes in the eventfs_inode structure tracefs: Zero out the tracefs_inode when allocating it eventfs: Warn if an eventfs_inode is freed without is_freed being set eventfs: Restructure eventfs_inode structure to be more condensed eventfs: Remove fsnotify*() functions from lookup() eventfs: Keep all directory links at 1
---- fs/tracefs/event_inode.c | 905 ++++++++++++++++------------------------------- fs/tracefs/inode.c | 286 +++++++-------- fs/tracefs/internal.h | 48 ++- 3 files changed, 451 insertions(+), 788 deletions(-)
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The "lookup" parameter is a way to differentiate the call to create_file/dir_dentry() from when it's just a lookup (no need to up the dentry refcount) and accessed via a readdir (need to up the refcount).
But reality, it just makes the code more complex. Just up the refcount and let the caller decide to dput() the result or not.
Link: https://lore.kernel.org/linux-trace-kernel/20240103102553.17a19cea@gandalf.l... Link: https://lore.kernel.org/linux-trace-kernel/20240104015435.517502710@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Ajay Kaher akaher@vmware.com Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Christian Brauner brauner@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 55 +++++++++++++++------------------------- 1 file changed, 20 insertions(+), 35 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index f0677ea0ec24..c360300fb866 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -390,16 +390,14 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) * @mode: The mode of the file. * @data: The data to use to set the inode of the file with on open() * @fops: The fops of the file to be created. - * @lookup: If called by the lookup routine, in which case, dput() the created dentry. * * Create a dentry for a file of an eventfs_inode @ei and place it into the - * address located at @e_dentry. If the @e_dentry already has a dentry, then - * just do a dget() on it and return. Otherwise create the dentry and attach it. + * address located at @e_dentry. */ static struct dentry * create_file_dentry(struct eventfs_inode *ei, int idx, struct dentry *parent, const char *name, umode_t mode, void *data, - const struct file_operations *fops, bool lookup) + const struct file_operations *fops) { struct eventfs_attr *attr = NULL; struct dentry **e_dentry = &ei->d_children[idx]; @@ -414,9 +412,7 @@ create_file_dentry(struct eventfs_inode *ei, int idx, } /* If the e_dentry already has a dentry, use it */ if (*e_dentry) { - /* lookup does not need to up the ref count */ - if (!lookup) - dget(*e_dentry); + dget(*e_dentry); mutex_unlock(&eventfs_mutex); return *e_dentry; } @@ -441,13 +437,12 @@ create_file_dentry(struct eventfs_inode *ei, int idx, * way to being freed, don't return it. If e_dentry is NULL * it means it was already freed. */ - if (ei->is_freed) + if (ei->is_freed) { dentry = NULL; - else + } else { dentry = *e_dentry; - /* The lookup does not need to up the dentry refcount */ - if (dentry && !lookup) dget(dentry); + } mutex_unlock(&eventfs_mutex); return dentry; } @@ -465,9 +460,6 @@ create_file_dentry(struct eventfs_inode *ei, int idx, } mutex_unlock(&eventfs_mutex);
- if (lookup) - dput(dentry); - return dentry; }
@@ -500,13 +492,12 @@ static void eventfs_post_create_dir(struct eventfs_inode *ei) * @pei: The eventfs_inode parent of ei. * @ei: The eventfs_inode to create the directory for * @parent: The dentry of the parent of this directory - * @lookup: True if this is called by the lookup code * * This creates and attaches a directory dentry to the eventfs_inode @ei. */ static struct dentry * create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, - struct dentry *parent, bool lookup) + struct dentry *parent) { struct dentry *dentry = NULL;
@@ -518,11 +509,9 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, return NULL; } if (ei->dentry) { - /* If the dentry already has a dentry, use it */ + /* If the eventfs_inode already has a dentry, use it */ dentry = ei->dentry; - /* lookup does not need to up the ref count */ - if (!lookup) - dget(dentry); + dget(dentry); mutex_unlock(&eventfs_mutex); return dentry; } @@ -542,7 +531,7 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, * way to being freed. */ dentry = ei->dentry; - if (dentry && !lookup) + if (dentry) dget(dentry); mutex_unlock(&eventfs_mutex); return dentry; @@ -562,9 +551,6 @@ create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, } mutex_unlock(&eventfs_mutex);
- if (lookup) - dput(dentry); - return dentry; }
@@ -589,8 +575,8 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, struct eventfs_inode *ei; struct dentry *ei_dentry = NULL; struct dentry *ret = NULL; + struct dentry *d; const char *name = dentry->d_name.name; - bool created = false; umode_t mode; void *data; int idx; @@ -626,13 +612,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, ret = simple_lookup(dir, dentry, flags); if (IS_ERR(ret)) goto out; - create_dir_dentry(ei, ei_child, ei_dentry, true); - created = true; - break; - } - - if (created) + d = create_dir_dentry(ei, ei_child, ei_dentry); + dput(d); goto out; + }
for (i = 0; i < ei->nr_entries; i++) { entry = &ei->entries[i]; @@ -650,8 +633,8 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, ret = simple_lookup(dir, dentry, flags); if (IS_ERR(ret)) goto out; - create_file_dentry(ei, i, ei_dentry, name, mode, cdata, - fops, true); + d = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); + dput(d); break; } } @@ -768,9 +751,10 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) inode_lock(parent->d_inode); list_for_each_entry_srcu(ei_child, &ei->children, list, srcu_read_lock_held(&eventfs_srcu)) { - d = create_dir_dentry(ei, ei_child, parent, false); + d = create_dir_dentry(ei, ei_child, parent); if (d) { ret = add_dentries(&dentries, d, cnt); + dput(d); if (ret < 0) break; cnt++; @@ -790,9 +774,10 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) mutex_unlock(&eventfs_mutex); if (r <= 0) continue; - d = create_file_dentry(ei, i, parent, name, mode, cdata, fops, false); + d = create_file_dentry(ei, i, parent, name, mode, cdata, fops); if (d) { ret = add_dentries(&dentries, d, cnt); + dput(d); if (ret < 0) break; cnt++;
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree. Subject: [v6.7][PATCH 01/23] eventfs: Remove "lookup" parameter from create_dir/file_dentry() Link: https://lore.kernel.org/stable/20240204011827.061811169%40goodmis.org
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The eventfs creates dynamically allocated dentries and inodes. Using the dcache_readdir() logic for its own directory lookups requires hiding the cursor of the dcache logic and playing games to allow the dcache_readdir() to still have access to the cursor while the eventfs saved what it created and what it needs to release.
Instead, just have eventfs have its own iterate_shared callback function that will fill in the dent entries. This simplifies the code quite a bit.
Link: https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Ajay Kaher akaher@vmware.com Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Christian Brauner brauner@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 194 +++++++++++++-------------------------- 1 file changed, 64 insertions(+), 130 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index c360300fb866..41af56f44f0a 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -52,9 +52,7 @@ enum { static struct dentry *eventfs_root_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags); -static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); -static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx); -static int eventfs_release(struct inode *inode, struct file *file); +static int eventfs_iterate(struct file *file, struct dir_context *ctx);
static void update_attr(struct eventfs_attr *attr, struct iattr *iattr) { @@ -148,11 +146,9 @@ static const struct inode_operations eventfs_file_inode_operations = { };
static const struct file_operations eventfs_file_operations = { - .open = dcache_dir_open_wrapper, .read = generic_read_dir, - .iterate_shared = dcache_readdir_wrapper, + .iterate_shared = eventfs_iterate, .llseek = generic_file_llseek, - .release = eventfs_release, };
/* Return the evenfs_inode of the "events" directory */ @@ -643,128 +639,87 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, return ret; }
-struct dentry_list { - void *cursor; - struct dentry **dentries; -}; - -/** - * eventfs_release - called to release eventfs file/dir - * @inode: inode to be released - * @file: file to be released (not used) - */ -static int eventfs_release(struct inode *inode, struct file *file) -{ - struct tracefs_inode *ti; - struct dentry_list *dlist = file->private_data; - void *cursor; - int i; - - ti = get_tracefs(inode); - if (!(ti->flags & TRACEFS_EVENT_INODE)) - return -EINVAL; - - if (WARN_ON_ONCE(!dlist)) - return -EINVAL; - - for (i = 0; dlist->dentries && dlist->dentries[i]; i++) { - dput(dlist->dentries[i]); - } - - cursor = dlist->cursor; - kfree(dlist->dentries); - kfree(dlist); - file->private_data = cursor; - return dcache_dir_close(inode, file); -} - -static int add_dentries(struct dentry ***dentries, struct dentry *d, int cnt) -{ - struct dentry **tmp; - - tmp = krealloc(*dentries, sizeof(d) * (cnt + 2), GFP_NOFS); - if (!tmp) - return -1; - tmp[cnt] = d; - tmp[cnt + 1] = NULL; - *dentries = tmp; - return 0; -} - -/** - * dcache_dir_open_wrapper - eventfs open wrapper - * @inode: not used - * @file: dir to be opened (to create it's children) - * - * Used to dynamic create file/dir with-in @file, all the - * file/dir will be created. If already created then references - * will be increased +/* + * Walk the children of a eventfs_inode to fill in getdents(). */ -static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) +static int eventfs_iterate(struct file *file, struct dir_context *ctx) { const struct file_operations *fops; + struct inode *f_inode = file_inode(file); const struct eventfs_entry *entry; struct eventfs_inode *ei_child; struct tracefs_inode *ti; struct eventfs_inode *ei; - struct dentry_list *dlist; - struct dentry **dentries = NULL; - struct dentry *parent = file_dentry(file); - struct dentry *d; - struct inode *f_inode = file_inode(file); - const char *name = parent->d_name.name; + struct dentry *ei_dentry = NULL; + struct dentry *dentry; + const char *name; umode_t mode; - void *data; - int cnt = 0; int idx; - int ret; - int i; - int r; + int ret = -EINVAL; + int ino; + int i, r, c; + + if (!dir_emit_dots(file, ctx)) + return 0;
ti = get_tracefs(f_inode); if (!(ti->flags & TRACEFS_EVENT_INODE)) return -EINVAL;
- if (WARN_ON_ONCE(file->private_data)) - return -EINVAL; + c = ctx->pos - 2;
idx = srcu_read_lock(&eventfs_srcu);
mutex_lock(&eventfs_mutex); ei = READ_ONCE(ti->private); + if (ei && !ei->is_freed) + ei_dentry = READ_ONCE(ei->dentry); mutex_unlock(&eventfs_mutex);
- if (!ei) { - srcu_read_unlock(&eventfs_srcu, idx); - return -EINVAL; - } - - - data = ei->data; + if (!ei || !ei_dentry) + goto out;
- dlist = kmalloc(sizeof(*dlist), GFP_KERNEL); - if (!dlist) { - srcu_read_unlock(&eventfs_srcu, idx); - return -ENOMEM; - } + ret = 0;
- inode_lock(parent->d_inode); + /* + * Need to create the dentries and inodes to have a consistent + * inode number. + */ list_for_each_entry_srcu(ei_child, &ei->children, list, srcu_read_lock_held(&eventfs_srcu)) { - d = create_dir_dentry(ei, ei_child, parent); - if (d) { - ret = add_dentries(&dentries, d, cnt); - dput(d); - if (ret < 0) - break; - cnt++; + + if (c > 0) { + c--; + continue; } + + if (ei_child->is_freed) + continue; + + name = ei_child->name; + + dentry = create_dir_dentry(ei, ei_child, ei_dentry); + if (!dentry) + goto out; + ino = dentry->d_inode->i_ino; + dput(dentry); + + if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) + goto out; + ctx->pos++; }
for (i = 0; i < ei->nr_entries; i++) { - void *cdata = data; + void *cdata = ei->data; + + if (c > 0) { + c--; + continue; + } + entry = &ei->entries[i]; name = entry->name; + mutex_lock(&eventfs_mutex); /* If ei->is_freed, then the event itself may be too */ if (!ei->is_freed) @@ -774,42 +729,21 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) mutex_unlock(&eventfs_mutex); if (r <= 0) continue; - d = create_file_dentry(ei, i, parent, name, mode, cdata, fops); - if (d) { - ret = add_dentries(&dentries, d, cnt); - dput(d); - if (ret < 0) - break; - cnt++; - } - } - inode_unlock(parent->d_inode); - srcu_read_unlock(&eventfs_srcu, idx); - ret = dcache_dir_open(inode, file);
- /* - * dcache_dir_open() sets file->private_data to a dentry cursor. - * Need to save that but also save all the dentries that were - * opened by this function. - */ - dlist->cursor = file->private_data; - dlist->dentries = dentries; - file->private_data = dlist; - return ret; -} + dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); + if (!dentry) + goto out; + ino = dentry->d_inode->i_ino; + dput(dentry);
-/* - * This just sets the file->private_data back to the cursor and back. - */ -static int dcache_readdir_wrapper(struct file *file, struct dir_context *ctx) -{ - struct dentry_list *dlist = file->private_data; - int ret; + if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) + goto out; + ctx->pos++; + } + ret = 1; + out: + srcu_read_unlock(&eventfs_srcu, idx);
- file->private_data = dlist->cursor; - ret = dcache_readdir(file, ctx); - dlist->cursor = file->private_data; - file->private_data = dlist; return ret; }
From: "Steven Rostedt (Google)" rostedt@goodmis.org
Instead of walking the dentries on mount/remount to update the gid values of all the dentries if a gid option is specified on mount, just update the root inode. Add .getattr, .setattr, and .permissions on the tracefs inode operations to update the permissions of the files and directories.
For all files and directories in the top level instance:
/sys/kernel/tracing/*
It will use the root inode as the default permissions. The inode that represents: /sys/kernel/tracing (or wherever it is mounted).
When an instance is created:
mkdir /sys/kernel/tracing/instance/foo
The directory "foo" and all its files and directories underneath will use the default of what foo is when it was created. A remount of tracefs will not affect it.
If a user were to modify the permissions of any file or directory in tracefs, it will also no longer be modified by a change in ownership of a remount.
The events directory, if it is in the top level instance, will use the tracefs root inode as the default ownership for itself and all the files and directories below it.
For the events directory in an instance ("foo"), it will keep the ownership of what it was when it was created, and that will be used as the default ownership for the files and directories beneath it.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wjVdGkjDXBbvLn2wbZnqP4UsH46... Link: https://lore.kernel.org/linux-trace-kernel/20240103215016.1e0c9811@gandalf.l...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 79 +++++++++++++++- fs/tracefs/inode.c | 198 ++++++++++++++++++++++----------------- fs/tracefs/internal.h | 3 + 3 files changed, 190 insertions(+), 90 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 41af56f44f0a..72912b5f9a90 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -45,6 +45,7 @@ enum { EVENTFS_SAVE_MODE = BIT(16), EVENTFS_SAVE_UID = BIT(17), EVENTFS_SAVE_GID = BIT(18), + EVENTFS_TOPLEVEL = BIT(19), };
#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1) @@ -115,10 +116,17 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, * The events directory dentry is never freed, unless its * part of an instance that is deleted. It's attr is the * default for its child files and directories. - * Do not update it. It's not used for its own mode or ownership + * Do not update it. It's not used for its own mode or ownership. */ - if (!ei->is_events) + if (ei->is_events) { + /* But it still needs to know if it was modified */ + if (iattr->ia_valid & ATTR_UID) + ei->attr.mode |= EVENTFS_SAVE_UID; + if (iattr->ia_valid & ATTR_GID) + ei->attr.mode |= EVENTFS_SAVE_GID; + } else { update_attr(&ei->attr, iattr); + }
} else { name = dentry->d_name.name; @@ -136,9 +144,66 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, return ret; }
+static void update_top_events_attr(struct eventfs_inode *ei, struct dentry *dentry) +{ + struct inode *inode; + + /* Only update if the "events" was on the top level */ + if (!ei || !(ei->attr.mode & EVENTFS_TOPLEVEL)) + return; + + /* Get the tracefs root inode. */ + inode = d_inode(dentry->d_sb->s_root); + ei->attr.uid = inode->i_uid; + ei->attr.gid = inode->i_gid; +} + +static void set_top_events_ownership(struct inode *inode) +{ + struct tracefs_inode *ti = get_tracefs(inode); + struct eventfs_inode *ei = ti->private; + struct dentry *dentry; + + /* The top events directory doesn't get automatically updated */ + if (!ei || !ei->is_events || !(ei->attr.mode & EVENTFS_TOPLEVEL)) + return; + + dentry = ei->dentry; + + update_top_events_attr(ei, dentry); + + if (!(ei->attr.mode & EVENTFS_SAVE_UID)) + inode->i_uid = ei->attr.uid; + + if (!(ei->attr.mode & EVENTFS_SAVE_GID)) + inode->i_gid = ei->attr.gid; +} + +static int eventfs_get_attr(struct mnt_idmap *idmap, + const struct path *path, struct kstat *stat, + u32 request_mask, unsigned int flags) +{ + struct dentry *dentry = path->dentry; + struct inode *inode = d_backing_inode(dentry); + + set_top_events_ownership(inode); + + generic_fillattr(idmap, request_mask, inode, stat); + return 0; +} + +static int eventfs_permission(struct mnt_idmap *idmap, + struct inode *inode, int mask) +{ + set_top_events_ownership(inode); + return generic_permission(idmap, inode, mask); +} + static const struct inode_operations eventfs_root_dir_inode_operations = { .lookup = eventfs_root_lookup, .setattr = eventfs_set_attr, + .getattr = eventfs_get_attr, + .permission = eventfs_permission, };
static const struct inode_operations eventfs_file_inode_operations = { @@ -174,6 +239,8 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry) } while (!ei->is_events); mutex_unlock(&eventfs_mutex);
+ update_top_events_attr(ei, dentry); + return ei; }
@@ -887,6 +954,14 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry uid = d_inode(dentry->d_parent)->i_uid; gid = d_inode(dentry->d_parent)->i_gid;
+ /* + * If the events directory is of the top instance, then parent + * is NULL. Set the attr.mode to reflect this and its permissions will + * default to the tracefs root dentry. + */ + if (!parent) + ei->attr.mode = EVENTFS_TOPLEVEL; + /* This is used as the default ownership of the files and directories */ ei->attr.uid = uid; ei->attr.gid = gid; diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index bc86ffdb103b..e1b172c0e091 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -91,6 +91,7 @@ static int tracefs_syscall_mkdir(struct mnt_idmap *idmap, struct inode *inode, struct dentry *dentry, umode_t mode) { + struct tracefs_inode *ti; char *name; int ret;
@@ -98,6 +99,15 @@ static int tracefs_syscall_mkdir(struct mnt_idmap *idmap, if (!name) return -ENOMEM;
+ /* + * This is a new directory that does not take the default of + * the rootfs. It becomes the default permissions for all the + * files and directories underneath it. + */ + ti = get_tracefs(inode); + ti->flags |= TRACEFS_INSTANCE_INODE; + ti->private = inode; + /* * The mkdir call can call the generic functions that create * the files within the tracefs system. It is up to the individual @@ -141,10 +151,76 @@ static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry) return ret; }
-static const struct inode_operations tracefs_dir_inode_operations = { +static void set_tracefs_inode_owner(struct inode *inode) +{ + struct tracefs_inode *ti = get_tracefs(inode); + struct inode *root_inode = ti->private; + + /* + * If this inode has never been referenced, then update + * the permissions to the superblock. + */ + if (!(ti->flags & TRACEFS_UID_PERM_SET)) + inode->i_uid = root_inode->i_uid; + + if (!(ti->flags & TRACEFS_GID_PERM_SET)) + inode->i_gid = root_inode->i_gid; +} + +static int tracefs_permission(struct mnt_idmap *idmap, + struct inode *inode, int mask) +{ + set_tracefs_inode_owner(inode); + return generic_permission(idmap, inode, mask); +} + +static int tracefs_getattr(struct mnt_idmap *idmap, + const struct path *path, struct kstat *stat, + u32 request_mask, unsigned int flags) +{ + struct inode *inode = d_backing_inode(path->dentry); + + set_tracefs_inode_owner(inode); + generic_fillattr(idmap, request_mask, inode, stat); + return 0; +} + +static int tracefs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, + struct iattr *attr) +{ + unsigned int ia_valid = attr->ia_valid; + struct inode *inode = d_inode(dentry); + struct tracefs_inode *ti = get_tracefs(inode); + + if (ia_valid & ATTR_UID) + ti->flags |= TRACEFS_UID_PERM_SET; + + if (ia_valid & ATTR_GID) + ti->flags |= TRACEFS_GID_PERM_SET; + + return simple_setattr(idmap, dentry, attr); +} + +static const struct inode_operations tracefs_instance_dir_inode_operations = { .lookup = simple_lookup, .mkdir = tracefs_syscall_mkdir, .rmdir = tracefs_syscall_rmdir, + .permission = tracefs_permission, + .getattr = tracefs_getattr, + .setattr = tracefs_setattr, +}; + +static const struct inode_operations tracefs_dir_inode_operations = { + .lookup = simple_lookup, + .permission = tracefs_permission, + .getattr = tracefs_getattr, + .setattr = tracefs_setattr, +}; + +static const struct inode_operations tracefs_file_inode_operations = { + .permission = tracefs_permission, + .getattr = tracefs_getattr, + .setattr = tracefs_setattr, };
struct inode *tracefs_get_inode(struct super_block *sb) @@ -183,87 +259,6 @@ struct tracefs_fs_info { struct tracefs_mount_opts mount_opts; };
-static void change_gid(struct dentry *dentry, kgid_t gid) -{ - if (!dentry->d_inode) - return; - dentry->d_inode->i_gid = gid; -} - -/* - * Taken from d_walk, but without he need for handling renames. - * Nothing can be renamed while walking the list, as tracefs - * does not support renames. This is only called when mounting - * or remounting the file system, to set all the files to - * the given gid. - */ -static void set_gid(struct dentry *parent, kgid_t gid) -{ - struct dentry *this_parent; - struct list_head *next; - - this_parent = parent; - spin_lock(&this_parent->d_lock); - - change_gid(this_parent, gid); -repeat: - next = this_parent->d_subdirs.next; -resume: - while (next != &this_parent->d_subdirs) { - struct tracefs_inode *ti; - struct list_head *tmp = next; - struct dentry *dentry = list_entry(tmp, struct dentry, d_child); - next = tmp->next; - - /* Note, getdents() can add a cursor dentry with no inode */ - if (!dentry->d_inode) - continue; - - spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED); - - change_gid(dentry, gid); - - /* If this is the events directory, update that too */ - ti = get_tracefs(dentry->d_inode); - if (ti && (ti->flags & TRACEFS_EVENT_INODE)) - eventfs_update_gid(dentry, gid); - - if (!list_empty(&dentry->d_subdirs)) { - spin_unlock(&this_parent->d_lock); - spin_release(&dentry->d_lock.dep_map, _RET_IP_); - this_parent = dentry; - spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_); - goto repeat; - } - spin_unlock(&dentry->d_lock); - } - /* - * All done at this level ... ascend and resume the search. - */ - rcu_read_lock(); -ascend: - if (this_parent != parent) { - struct dentry *child = this_parent; - this_parent = child->d_parent; - - spin_unlock(&child->d_lock); - spin_lock(&this_parent->d_lock); - - /* go into the first sibling still alive */ - do { - next = child->d_child.next; - if (next == &this_parent->d_subdirs) - goto ascend; - child = list_entry(next, struct dentry, d_child); - } while (unlikely(child->d_flags & DCACHE_DENTRY_KILLED)); - rcu_read_unlock(); - goto resume; - } - rcu_read_unlock(); - spin_unlock(&this_parent->d_lock); - return; -} - static int tracefs_parse_options(char *data, struct tracefs_mount_opts *opts) { substring_t args[MAX_OPT_ARGS]; @@ -336,10 +331,8 @@ static int tracefs_apply_options(struct super_block *sb, bool remount) if (!remount || opts->opts & BIT(Opt_uid)) inode->i_uid = opts->uid;
- if (!remount || opts->opts & BIT(Opt_gid)) { - /* Set all the group ids to the mount option */ - set_gid(sb->s_root, opts->gid); - } + if (!remount || opts->opts & BIT(Opt_gid)) + inode->i_gid = opts->gid;
return 0; } @@ -573,6 +566,26 @@ struct dentry *eventfs_end_creating(struct dentry *dentry) return dentry; }
+/* Find the inode that this will use for default */ +static struct inode *instance_inode(struct dentry *parent, struct inode *inode) +{ + struct tracefs_inode *ti; + + /* If parent is NULL then use root inode */ + if (!parent) + return d_inode(inode->i_sb->s_root); + + /* Find the inode that is flagged as an instance or the root inode */ + while (!IS_ROOT(parent)) { + ti = get_tracefs(d_inode(parent)); + if (ti->flags & TRACEFS_INSTANCE_INODE) + break; + parent = parent->d_parent; + } + + return d_inode(parent); +} + /** * tracefs_create_file - create a file in the tracefs filesystem * @name: a pointer to a string containing the name of the file to create. @@ -603,6 +616,7 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops) { + struct tracefs_inode *ti; struct dentry *dentry; struct inode *inode;
@@ -621,7 +635,11 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, if (unlikely(!inode)) return tracefs_failed_creating(dentry);
+ ti = get_tracefs(inode); + ti->private = instance_inode(parent, inode); + inode->i_mode = mode; + inode->i_op = &tracefs_file_inode_operations; inode->i_fop = fops ? fops : &tracefs_file_operations; inode->i_private = data; inode->i_uid = d_inode(dentry->d_parent)->i_uid; @@ -634,6 +652,7 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, static struct dentry *__create_dir(const char *name, struct dentry *parent, const struct inode_operations *ops) { + struct tracefs_inode *ti; struct dentry *dentry = tracefs_start_creating(name, parent); struct inode *inode;
@@ -651,6 +670,9 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent, inode->i_uid = d_inode(dentry->d_parent)->i_uid; inode->i_gid = d_inode(dentry->d_parent)->i_gid;
+ ti = get_tracefs(inode); + ti->private = instance_inode(parent, inode); + /* directory inodes start off with i_nlink == 2 (for "." entry) */ inc_nlink(inode); d_instantiate(dentry, inode); @@ -681,7 +703,7 @@ struct dentry *tracefs_create_dir(const char *name, struct dentry *parent) if (security_locked_down(LOCKDOWN_TRACEFS)) return NULL;
- return __create_dir(name, parent, &simple_dir_inode_operations); + return __create_dir(name, parent, &tracefs_dir_inode_operations); }
/** @@ -712,7 +734,7 @@ __init struct dentry *tracefs_create_instance_dir(const char *name, if (WARN_ON(tracefs_ops.mkdir || tracefs_ops.rmdir)) return NULL;
- dentry = __create_dir(name, parent, &tracefs_dir_inode_operations); + dentry = __create_dir(name, parent, &tracefs_instance_dir_inode_operations); if (!dentry) return NULL;
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 42bdeb471a07..12b7d0150ae9 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -5,6 +5,9 @@ enum { TRACEFS_EVENT_INODE = BIT(1), TRACEFS_EVENT_TOP_INODE = BIT(2), + TRACEFS_GID_PERM_SET = BIT(3), + TRACEFS_UID_PERM_SET = BIT(4), + TRACEFS_INSTANCE_INODE = BIT(5), };
struct tracefs_inode {
From: "Steven Rostedt (Google)" rostedt@goodmis.org
If ei->is_freed is set in eventfs_iterate(), it means that the directory that is being iterated on is in the process of being freed. Just exit the loop immediately when that is ever detected, and separate out the return of the entry->callback() from ei->is_freed.
Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.016261289@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 72912b5f9a90..0aca6910efb3 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -788,11 +788,12 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx) name = entry->name;
mutex_lock(&eventfs_mutex); - /* If ei->is_freed, then the event itself may be too */ - if (!ei->is_freed) - r = entry->callback(name, &mode, &cdata, &fops); - else - r = -1; + /* If ei->is_freed then just bail here, nothing more to do */ + if (ei->is_freed) { + mutex_unlock(&eventfs_mutex); + goto out; + } + r = entry->callback(name, &mode, &cdata, &fops); mutex_unlock(&eventfs_mutex); if (r <= 0) continue;
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The ctx->pos was only updated when it added an entry, but the "skip to current pos" check (c--) happened for every loop regardless of if the entry was added or not. This inconsistency caused readdir to be incorrect.
It was due to:
for (i = 0; i < ei->nr_entries; i++) {
if (c > 0) { c--; continue; }
mutex_lock(&eventfs_mutex); /* If ei->is_freed then just bail here, nothing more to do */ if (ei->is_freed) { mutex_unlock(&eventfs_mutex); goto out; } r = entry->callback(name, &mode, &cdata, &fops); mutex_unlock(&eventfs_mutex);
[..] ctx->pos++; }
But this can cause the iterator to return a file that was already read. That's because of the way the callback() works. Some events may not have all files, and the callback can return 0 to tell eventfs to skip the file for this directory.
for instance, we have:
# ls /sys/kernel/tracing/events/ftrace/function format hist hist_debug id inject
and
# ls /sys/kernel/tracing/events/sched/sched_switch/ enable filter format hist hist_debug id inject trigger
Where the function directory is missing "enable", "filter" and "trigger". That's because the callback() for events has:
static int event_callback(const char *name, umode_t *mode, void **data, const struct file_operations **fops) { struct trace_event_file *file = *data; struct trace_event_call *call = file->event_call;
[..]
/* * Only event directories that can be enabled should have * triggers or filters, with the exception of the "print" * event that can have a "trigger" file. */ if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) { if (call->class->reg && strcmp(name, "enable") == 0) { *mode = TRACE_MODE_WRITE; *fops = &ftrace_enable_fops; return 1; }
if (strcmp(name, "filter") == 0) { *mode = TRACE_MODE_WRITE; *fops = &ftrace_event_filter_fops; return 1; } }
if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE) || strcmp(trace_event_name(call), "print") == 0) { if (strcmp(name, "trigger") == 0) { *mode = TRACE_MODE_WRITE; *fops = &event_trigger_fops; return 1; } } [..] return 0; }
Where the function event has the TRACE_EVENT_FL_IGNORE_ENABLE set.
This means that the entries array elements for "enable", "filter" and "trigger" when called on the function event will have the callback return 0 and not 1, to tell eventfs to skip these files for it.
Because the "skip to current ctx->pos" check happened for all entries, but the ctx->pos++ only happened to entries that exist, it would confuse the reading of a directory. Which would cause:
# ls /sys/kernel/tracing/events/ftrace/function/ format hist hist hist_debug hist_debug id inject inject
The missing "enable", "filter" and "trigger" caused ls to show "hist", "hist_debug" and "inject" twice.
Update the ctx->pos for every iteration to keep its update and the "skip" update consistent. This also means that on error, the ctx->pos needs to be decremented if it was incremented without adding something.
Link: https://lore.kernel.org/all/20240104150500.38b15a62@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.172295263@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: 493ec81a8fb8e ("eventfs: Stop using dcache_readdir() for getdents()") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 0aca6910efb3..c73fb1f7ddbc 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -760,6 +760,8 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx) continue; }
+ ctx->pos++; + if (ei_child->is_freed) continue;
@@ -767,13 +769,12 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
dentry = create_dir_dentry(ei, ei_child, ei_dentry); if (!dentry) - goto out; + goto out_dec; ino = dentry->d_inode->i_ino; dput(dentry);
if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) - goto out; - ctx->pos++; + goto out_dec; }
for (i = 0; i < ei->nr_entries; i++) { @@ -784,6 +785,8 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx) continue; }
+ ctx->pos++; + entry = &ei->entries[i]; name = entry->name;
@@ -791,7 +794,7 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx) /* If ei->is_freed then just bail here, nothing more to do */ if (ei->is_freed) { mutex_unlock(&eventfs_mutex); - goto out; + goto out_dec; } r = entry->callback(name, &mode, &cdata, &fops); mutex_unlock(&eventfs_mutex); @@ -800,19 +803,23 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); if (!dentry) - goto out; + goto out_dec; ino = dentry->d_inode->i_ino; dput(dentry);
if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) - goto out; - ctx->pos++; + goto out_dec; } ret = 1; out: srcu_read_unlock(&eventfs_srcu, idx);
return ret; + + out_dec: + /* Incremented ctx->pos without adding something, reset it */ + ctx->pos--; + goto out; }
/**
From: "Steven Rostedt (Google)" rostedt@goodmis.org
In order to apply a shortcut to skip over the current ctx->pos immediately, by using the ei->entries array, the reading of that array should be first. Moving the array reading before the linked list reading will make the shortcut change diff nicer to read.
Link: https://lore.kernel.org/all/CAHk-=wiKwDUDv3+jCsv-uacDcHDVTYsXtBR9=6sGM5mqX+D... Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.333115095@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 46 ++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 23 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index c73fb1f7ddbc..a1934e0eea3b 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -752,8 +752,8 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx) * Need to create the dentries and inodes to have a consistent * inode number. */ - list_for_each_entry_srcu(ei_child, &ei->children, list, - srcu_read_lock_held(&eventfs_srcu)) { + for (i = 0; i < ei->nr_entries; i++) { + void *cdata = ei->data;
if (c > 0) { c--; @@ -762,23 +762,32 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
ctx->pos++;
- if (ei_child->is_freed) - continue; + entry = &ei->entries[i]; + name = entry->name;
- name = ei_child->name; + mutex_lock(&eventfs_mutex); + /* If ei->is_freed then just bail here, nothing more to do */ + if (ei->is_freed) { + mutex_unlock(&eventfs_mutex); + goto out_dec; + } + r = entry->callback(name, &mode, &cdata, &fops); + mutex_unlock(&eventfs_mutex); + if (r <= 0) + continue;
- dentry = create_dir_dentry(ei, ei_child, ei_dentry); + dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); if (!dentry) goto out_dec; ino = dentry->d_inode->i_ino; dput(dentry);
- if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) + if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) goto out_dec; }
- for (i = 0; i < ei->nr_entries; i++) { - void *cdata = ei->data; + list_for_each_entry_srcu(ei_child, &ei->children, list, + srcu_read_lock_held(&eventfs_srcu)) {
if (c > 0) { c--; @@ -787,27 +796,18 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
ctx->pos++;
- entry = &ei->entries[i]; - name = entry->name; - - mutex_lock(&eventfs_mutex); - /* If ei->is_freed then just bail here, nothing more to do */ - if (ei->is_freed) { - mutex_unlock(&eventfs_mutex); - goto out_dec; - } - r = entry->callback(name, &mode, &cdata, &fops); - mutex_unlock(&eventfs_mutex); - if (r <= 0) + if (ei_child->is_freed) continue;
- dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); + name = ei_child->name; + + dentry = create_dir_dentry(ei, ei_child, ei_dentry); if (!dentry) goto out_dec; ino = dentry->d_inode->i_ino; dput(dentry);
- if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) + if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) goto out_dec; } ret = 1;
From: "Steven Rostedt (Google)" rostedt@goodmis.org
As the ei->entries array is fixed for the duration of the eventfs_inode, it can be used to skip over already read entries in eventfs_iterate().
That is, if ctx->pos is greater than zero, there's no reason in doing the loop across the ei->entries array for the entries less than ctx->pos. Instead, start the lookup of the entries at the current ctx->pos.
Link: https://lore.kernel.org/all/CAHk-=wiKwDUDv3+jCsv-uacDcHDVTYsXtBR9=6sGM5mqX+D... Link: https://lore.kernel.org/linux-trace-kernel/20240104220048.494956957@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index a1934e0eea3b..fdff53d5a1f8 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -746,21 +746,15 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx) if (!ei || !ei_dentry) goto out;
- ret = 0; - /* * Need to create the dentries and inodes to have a consistent * inode number. */ - for (i = 0; i < ei->nr_entries; i++) { - void *cdata = ei->data; - - if (c > 0) { - c--; - continue; - } + ret = 0;
- ctx->pos++; + /* Start at 'c' to jump over already read entries */ + for (i = c; i < ei->nr_entries; i++, ctx->pos++) { + void *cdata = ei->data;
entry = &ei->entries[i]; name = entry->name; @@ -769,7 +763,7 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx) /* If ei->is_freed then just bail here, nothing more to do */ if (ei->is_freed) { mutex_unlock(&eventfs_mutex); - goto out_dec; + goto out; } r = entry->callback(name, &mode, &cdata, &fops); mutex_unlock(&eventfs_mutex); @@ -778,14 +772,17 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); if (!dentry) - goto out_dec; + goto out; ino = dentry->d_inode->i_ino; dput(dentry);
if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) - goto out_dec; + goto out; }
+ /* Subtract the skipped entries above */ + c -= min((unsigned int)c, (unsigned int)ei->nr_entries); + list_for_each_entry_srcu(ei_child, &ei->children, list, srcu_read_lock_held(&eventfs_srcu)) {
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The dentries and inodes are created in the readdir for the sole purpose of getting a consistent inode number. Linus stated that is unnecessary, and that all inodes can have the same inode number. For a virtual file system they are pretty meaningless.
Instead use a single unique inode number for all files and one for all directories.
Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index fdff53d5a1f8..5edf0b96758b 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -32,6 +32,10 @@ */ static DEFINE_MUTEX(eventfs_mutex);
+/* Choose something "unique" ;-) */ +#define EVENTFS_FILE_INODE_INO 0x12c4e37 +#define EVENTFS_DIR_INODE_INO 0x134b2f5 + /* * The eventfs_inode (ei) itself is protected by SRCU. It is released from * its parent's list and will have is_freed set (under eventfs_mutex). @@ -352,6 +356,9 @@ static struct dentry *create_file(const char *name, umode_t mode, inode->i_fop = fop; inode->i_private = data;
+ /* All files will have the same inode number */ + inode->i_ino = EVENTFS_FILE_INODE_INO; + ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE; d_instantiate(dentry, inode); @@ -388,6 +395,9 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent inode->i_op = &eventfs_root_dir_inode_operations; inode->i_fop = &eventfs_file_operations;
+ /* All directories will have the same inode number */ + inode->i_ino = EVENTFS_DIR_INODE_INO; + ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE;
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The original eventfs code added a wrapper around the dcache_readdir open callback and created all the dentries and inodes at open, and increment their ref count. A wrapper was added around the dcache_readdir release function to decrement all the ref counts of those created inodes and dentries. But this proved to be buggy[1] for when a kprobe was created during a dir read, it would create a dentry between the open and the release, and because the release would decrement all ref counts of all files and directories, that would include the kprobe directory that was not there to have its ref count incremented in open. This would cause the ref count to go to negative and later crash the kernel.
To solve this, the dentries and inodes that were created and had their ref count upped in open needed to be saved. That list needed to be passed from the open to the release, so that the release would only decrement the ref counts of the entries that were incremented in the open.
Unfortunately, the dcache_readdir logic was already using the file->private_data, which is the only field that can be used to pass information from the open to the release. What was done was the eventfs created another descriptor that had a void pointer to save the dcache_readdir pointer, and it wrapped all the callbacks, so that it could save the list of entries that had their ref counts incremented in the open, and pass it to the release. The wrapped callbacks would just put back the dcache_readdir pointer and call the functions it used so it could still use its data[2].
But Linus had an issue with the "hijacking" of the file->private_data (unfortunately this discussion was on a security list, so no public link). Which we finally agreed on doing everything within the iterate_shared callback and leave the dcache_readdir out of it[3]. All the information needed for the getents() could be created then.
But this ended up being buggy too[4]. The iterate_shared callback was not the right place to create the dentries and inodes. Even Christian Brauner had issues with that[5].
An attempt was to go back to creating the inodes and dentries at the open, create an array to store the information in the file->private_data, and pass that information to the other callbacks.[6]
The difference between that and the original method, is that it does not use dcache_readdir. It also does not up the ref counts of the dentries and pass them. Instead, it creates an array of a structure that saves the dentry's name and inode number. That information is used in the iterate_shared callback, and the array is freed in the dir release. The dentries and inodes created in the open are not used for the iterate_share or release callbacks. Just their names and inode numbers.
Linus did not like that either[7] and just wanted to remove the dentries being created in iterate_shared and use the hard coded inode numbers.
[ All this while Linus enjoyed an unexpected vacation during the merge window due to lack of power. ]
[1] https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.l... [2] https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.l... [3] https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.... [4] https://lore.kernel.org/all/202401152142.bfc28861-oliver.sang@intel.com/ [5] https://lore.kernel.org/all/20240111-unzahl-gefegt-433acb8a841d@brauner/ [6] https://lore.kernel.org/all/20240116114711.7e8637be@gandalf.local.home/ [7] https://lore.kernel.org/all/20240116170154.5bf0a250@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.573784051@goodmis....
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Fixes: 493ec81a8fb8 ("eventfs: Stop using dcache_readdir() for getdents()") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202401152142.bfc28861-oliver.sang@intel.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 20 +++++--------------- 1 file changed, 5 insertions(+), 15 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 5edf0b96758b..10580d6b5012 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -727,8 +727,6 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx) struct eventfs_inode *ei_child; struct tracefs_inode *ti; struct eventfs_inode *ei; - struct dentry *ei_dentry = NULL; - struct dentry *dentry; const char *name; umode_t mode; int idx; @@ -749,11 +747,11 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
mutex_lock(&eventfs_mutex); ei = READ_ONCE(ti->private); - if (ei && !ei->is_freed) - ei_dentry = READ_ONCE(ei->dentry); + if (ei && ei->is_freed) + ei = NULL; mutex_unlock(&eventfs_mutex);
- if (!ei || !ei_dentry) + if (!ei) goto out;
/* @@ -780,11 +778,7 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx) if (r <= 0) continue;
- dentry = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); - if (!dentry) - goto out; - ino = dentry->d_inode->i_ino; - dput(dentry); + ino = EVENTFS_FILE_INODE_INO;
if (!dir_emit(ctx, name, strlen(name), ino, DT_REG)) goto out; @@ -808,11 +802,7 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
name = ei_child->name;
- dentry = create_dir_dentry(ei, ei_child, ei_dentry); - if (!dentry) - goto out_dec; - ino = dentry->d_inode->i_ino; - dput(dentry); + ino = EVENTFS_DIR_INODE_INO;
if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) goto out_dec;
From: Erick Archer erick.archer@gmx.com
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.
So, use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function.
[1] https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arit...
Link: https://lore.kernel.org/linux-trace-kernel/20240115181658.4562-1-erick.arche...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Mark Rutland mark.rutland@arm.com Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: Erick Archer erick.archer@gmx.com Reviewed-by: Gustavo A. R. Silva gustavoars@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 10580d6b5012..6795fda2af19 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -97,7 +97,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, /* Preallocate the children mode array if necessary */ if (!(dentry->d_inode->i_mode & S_IFDIR)) { if (!ei->entry_attrs) { - ei->entry_attrs = kzalloc(sizeof(*ei->entry_attrs) * ei->nr_entries, + ei->entry_attrs = kcalloc(ei->nr_entries, sizeof(*ei->entry_attrs), GFP_NOFS); if (!ei->entry_attrs) { ret = -ENOMEM; @@ -874,7 +874,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode }
if (size) { - ei->d_children = kzalloc(sizeof(*ei->d_children) * size, GFP_KERNEL); + ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL); if (!ei->d_children) { kfree_const(ei->name); kfree(ei); @@ -941,7 +941,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry goto fail;
if (size) { - ei->d_children = kzalloc(sizeof(*ei->d_children) * size, GFP_KERNEL); + ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL); if (!ei->d_children) goto fail; }
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The eventfs inodes and directories are allocated when referenced. But this leaves the issue of keeping consistent inode numbers and the number is only saved in the inode structure itself. When the inode is no longer referenced, it can be freed. When the file that the inode was representing is referenced again, the inode is once again created, but the inode number needs to be the same as it was before.
Just making the inode numbers the same for all files is fine, but that does not work with directories. The find command will check for loops via the inode number and having the same inode number for directories triggers:
# find /sys/kernel/tracing find: File system loop detected; '/sys/kernel/debug/tracing/events/initcall/initcall_finish' is part of the same file system loop as '/sys/kernel/debug/tracing/events/initcall'. [..]
Linus pointed out that the eventfs_inode structure ends with a single 32bit int, and on 64 bit machines, there's likely a 4 byte hole due to alignment. We can use this hole to store the inode number for the eventfs_inode. All directories in eventfs are represented by an eventfs_inode and that data structure can hold its inode number.
That last int was also purposely placed at the end of the structure to prevent holes from within. Now that there's a 4 byte number to hold the inode, both the inode number and the last integer can be moved up in the structure for better cache locality, where the llist and rcu fields can be moved to the end as they are only used when the eventfs_inode is being deleted.
Link: https://lore.kernel.org/all/CAMuHMdXKiorg-jiuKoZpfZyDJ3Ynrfb8=X+c7x0Eewxn-YR... Link: https://lore.kernel.org/linux-trace-kernel/20240122152748.46897388@gandalf.l...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Linus Torvalds torvalds@linux-foundation.org Reported-by: Geert Uytterhoeven geert@linux-m68k.org Tested-by: Geert Uytterhoeven geert+renesas@glider.be Fixes: 53c41052ba31 ("eventfs: Have the inodes all for files and directories all be the same") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Reviewed-by: Kees Cook keescook@chromium.org --- fs/tracefs/event_inode.c | 14 +++++++++++--- fs/tracefs/internal.h | 7 ++++--- 2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 6795fda2af19..6b211522a13e 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -34,7 +34,15 @@ static DEFINE_MUTEX(eventfs_mutex);
/* Choose something "unique" ;-) */ #define EVENTFS_FILE_INODE_INO 0x12c4e37 -#define EVENTFS_DIR_INODE_INO 0x134b2f5 + +/* Just try to make something consistent and unique */ +static int eventfs_dir_ino(struct eventfs_inode *ei) +{ + if (!ei->ino) + ei->ino = get_next_ino(); + + return ei->ino; +}
/* * The eventfs_inode (ei) itself is protected by SRCU. It is released from @@ -396,7 +404,7 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent inode->i_fop = &eventfs_file_operations;
/* All directories will have the same inode number */ - inode->i_ino = EVENTFS_DIR_INODE_INO; + inode->i_ino = eventfs_dir_ino(ei);
ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE; @@ -802,7 +810,7 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
name = ei_child->name;
- ino = EVENTFS_DIR_INODE_INO; + ino = eventfs_dir_ino(ei_child);
if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR)) goto out_dec; diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 12b7d0150ae9..45397df9bb65 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -55,6 +55,10 @@ struct eventfs_inode { struct eventfs_attr *entry_attrs; struct eventfs_attr attr; void *data; + unsigned int is_freed:1; + unsigned int is_events:1; + unsigned int nr_entries:30; + unsigned int ino; /* * Union - used for deletion * @llist: for calling dput() if needed after RCU @@ -64,9 +68,6 @@ struct eventfs_inode { struct llist_node llist; struct rcu_head rcu; }; - unsigned int is_freed:1; - unsigned int is_events:1; - unsigned int nr_entries:30; };
static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
From: Linus Torvalds torvalds@linux-foundation.org
The 'eventfs_update_gid()' function is no longer called, so remove it (and the helper function it uses).
Link: https://lore.kernel.org/all/CAHk-=wj+DsZZ=2iTUkJ-Nojs9fjYMvPs1NuoM3yK7aTDtJf...
Fixes: 8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 38 -------------------------------------- fs/tracefs/internal.h | 1 - 2 files changed, 39 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 6b211522a13e..1c3dd0ad4660 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -281,44 +281,6 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode, inode->i_gid = attr->gid; }
-static void update_gid(struct eventfs_inode *ei, kgid_t gid, int level) -{ - struct eventfs_inode *ei_child; - - /* at most we have events/system/event */ - if (WARN_ON_ONCE(level > 3)) - return; - - ei->attr.gid = gid; - - if (ei->entry_attrs) { - for (int i = 0; i < ei->nr_entries; i++) { - ei->entry_attrs[i].gid = gid; - } - } - - /* - * Only eventfs_inode with dentries are updated, make sure - * all eventfs_inodes are updated. If one of the children - * do not have a dentry, this function must traverse it. - */ - list_for_each_entry_srcu(ei_child, &ei->children, list, - srcu_read_lock_held(&eventfs_srcu)) { - if (!ei_child->dentry) - update_gid(ei_child, gid, level + 1); - } -} - -void eventfs_update_gid(struct dentry *dentry, kgid_t gid) -{ - struct eventfs_inode *ei = dentry->d_fsdata; - int idx; - - idx = srcu_read_lock(&eventfs_srcu); - update_gid(ei, gid, 0); - srcu_read_unlock(&eventfs_srcu, idx); -} - /** * create_file - create a file in the tracefs filesystem * @name: the name of the file to create. diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 45397df9bb65..91c2bf0b91d9 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -82,7 +82,6 @@ struct inode *tracefs_get_inode(struct super_block *sb); struct dentry *eventfs_start_creating(const char *name, struct dentry *parent); struct dentry *eventfs_failed_creating(struct dentry *dentry); struct dentry *eventfs_end_creating(struct dentry *dentry); -void eventfs_update_gid(struct dentry *dentry, kgid_t gid); void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */
From: "Steven Rostedt (Google)" rostedt@goodmis.org
eventfs uses the tracefs_inode and assumes that it's already initialized to zero. That is, it doesn't set fields to zero (like ti->private) after getting its tracefs_inode. This causes bugs due to stale values.
Just initialize the entire structure to zero on allocation so there isn't any more surprises.
This is a partial fix to access to ti->private. The assignment still needs to be made before the dentry is instantiated.
Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.315825944@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com Suggested-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/inode.c | 6 ++++-- fs/tracefs/internal.h | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index e1b172c0e091..888e42087847 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -38,8 +38,6 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb) if (!ti) return NULL;
- ti->flags = 0; - return &ti->vfs_inode; }
@@ -779,7 +777,11 @@ static void init_once(void *foo) { struct tracefs_inode *ti = (struct tracefs_inode *) foo;
+ /* inode_init_once() calls memset() on the vfs_inode portion */ inode_init_once(&ti->vfs_inode); + + /* Zero out the rest */ + memset_after(ti, 0, vfs_inode); }
static int __init tracefs_init(void) diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 91c2bf0b91d9..7d84349ade87 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -11,9 +11,10 @@ enum { };
struct tracefs_inode { + struct inode vfs_inode; + /* The below gets initialized with memset_after(ti, 0, vfs_inode) */ unsigned long flags; void *private; - struct inode vfs_inode; };
/*
From: Linus Torvalds torvalds@linux-foundation.org
The tracefs-specific fields in the inode were not initialized before the inode was exposed to others through the dentry with 'd_instantiate()'.
Move the field initializations up to before the d_instantiate.
Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.478449628@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202401291043.e62e89dc-oliver.sang@intel.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 1c3dd0ad4660..824b1811e342 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -370,6 +370,8 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE; + /* Only directories have ti->private set to an ei, not files */ + ti->private = ei;
inc_nlink(inode); d_instantiate(dentry, inode); @@ -515,7 +517,6 @@ create_file_dentry(struct eventfs_inode *ei, int idx, static void eventfs_post_create_dir(struct eventfs_inode *ei) { struct eventfs_inode *ei_child; - struct tracefs_inode *ti;
lockdep_assert_held(&eventfs_mutex);
@@ -525,9 +526,6 @@ static void eventfs_post_create_dir(struct eventfs_inode *ei) srcu_read_lock_held(&eventfs_srcu)) { ei_child->d_parent = ei->dentry; } - - ti = get_tracefs(ei->dentry->d_inode); - ti->private = ei; }
/**
From: Linus Torvalds torvalds@linux-foundation.org
The eventfs_find_events() code tries to walk up the tree to find the event directory that a dentry belongs to, in order to then find the eventfs inode that is associated with that event directory.
However, it uses an odd combination of walking the dentry parent, looking up the eventfs inode associated with that, and then looking up the dentry from there. Repeat.
But the code shouldn't have back-pointers to dentries in the first place, and it should just walk the dentry parenthood chain directly.
Similarly, 'set_top_events_ownership()' looks up the dentry from the eventfs inode, but the only reason it wants a dentry is to look up the superblock in order to look up the root dentry.
But it already has the real filesystem inode, which has that same superblock pointer. So just pass in the superblock pointer using the information that's already there, instead of looking up extraneous data that is irrelevant.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.638645365@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 824b1811e342..e9819d719d2a 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -156,33 +156,30 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry, return ret; }
-static void update_top_events_attr(struct eventfs_inode *ei, struct dentry *dentry) +static void update_top_events_attr(struct eventfs_inode *ei, struct super_block *sb) { - struct inode *inode; + struct inode *root;
/* Only update if the "events" was on the top level */ if (!ei || !(ei->attr.mode & EVENTFS_TOPLEVEL)) return;
/* Get the tracefs root inode. */ - inode = d_inode(dentry->d_sb->s_root); - ei->attr.uid = inode->i_uid; - ei->attr.gid = inode->i_gid; + root = d_inode(sb->s_root); + ei->attr.uid = root->i_uid; + ei->attr.gid = root->i_gid; }
static void set_top_events_ownership(struct inode *inode) { struct tracefs_inode *ti = get_tracefs(inode); struct eventfs_inode *ei = ti->private; - struct dentry *dentry;
/* The top events directory doesn't get automatically updated */ if (!ei || !ei->is_events || !(ei->attr.mode & EVENTFS_TOPLEVEL)) return;
- dentry = ei->dentry; - - update_top_events_attr(ei, dentry); + update_top_events_attr(ei, inode->i_sb);
if (!(ei->attr.mode & EVENTFS_SAVE_UID)) inode->i_uid = ei->attr.uid; @@ -235,8 +232,10 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
mutex_lock(&eventfs_mutex); do { - /* The parent always has an ei, except for events itself */ - ei = dentry->d_parent->d_fsdata; + // The parent is stable because we do not do renames + dentry = dentry->d_parent; + // ... and directories always have d_fsdata + ei = dentry->d_fsdata;
/* * If the ei is being freed, the ownership of the children @@ -246,12 +245,11 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry) ei = NULL; break; } - - dentry = ei->dentry; + // Walk upwards until you find the events inode } while (!ei->is_events); mutex_unlock(&eventfs_mutex);
- update_top_events_attr(ei, dentry); + update_top_events_attr(ei, dentry->d_sb);
return ei; }
From: Linus Torvalds torvalds@linux-foundation.org
The dentry lookup for eventfs files was very broken, and had lots of signs of the old situation where the filesystem names were all created statically in the dentry tree, rather than being looked up dynamically based on the eventfs data structures.
You could see it in the naming - how it claimed to "create" dentries rather than just look up the dentries that were given it.
You could see it in various nonsensical and very incorrect operations, like using "simple_lookup()" on the dentries that were passed in, which only results in those dentries becoming negative dentries. Which meant that any other lookup would possibly return ENOENT if it saw that negative dentry before the data was then later filled in.
You could see it in the immense amount of nonsensical code that didn't actually just do lookups.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131233227.73db55e1@gandalf.l...
Cc: stable@vger.kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Mark Rutland mark.rutland@arm.com Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 275 +++++++-------------------------------- fs/tracefs/inode.c | 69 ---------- fs/tracefs/internal.h | 3 - 3 files changed, 50 insertions(+), 297 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index e9819d719d2a..04c2ab90f93e 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -230,7 +230,6 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry) { struct eventfs_inode *ei;
- mutex_lock(&eventfs_mutex); do { // The parent is stable because we do not do renames dentry = dentry->d_parent; @@ -247,7 +246,6 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry) } // Walk upwards until you find the events inode } while (!ei->is_events); - mutex_unlock(&eventfs_mutex);
update_top_events_attr(ei, dentry->d_sb);
@@ -280,11 +278,10 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode, }
/** - * create_file - create a file in the tracefs filesystem - * @name: the name of the file to create. + * lookup_file - look up a file in the tracefs filesystem + * @dentry: the dentry to look up * @mode: the permission that the file should have. * @attr: saved attributes changed by user - * @parent: parent dentry for this file. * @data: something that the caller will want to get to later on. * @fop: struct file_operations that should be used for this file. * @@ -292,13 +289,13 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode, * directory. The inode.i_private pointer will point to @data in the open() * call. */ -static struct dentry *create_file(const char *name, umode_t mode, +static struct dentry *lookup_file(struct dentry *dentry, + umode_t mode, struct eventfs_attr *attr, - struct dentry *parent, void *data, + void *data, const struct file_operations *fop) { struct tracefs_inode *ti; - struct dentry *dentry; struct inode *inode;
if (!(mode & S_IFMT)) @@ -307,15 +304,9 @@ static struct dentry *create_file(const char *name, umode_t mode, if (WARN_ON_ONCE(!S_ISREG(mode))) return NULL;
- WARN_ON_ONCE(!parent); - dentry = eventfs_start_creating(name, parent); - - if (IS_ERR(dentry)) - return dentry; - inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) - return eventfs_failed_creating(dentry); + return ERR_PTR(-ENOMEM);
/* If the user updated the directory's attributes, use them */ update_inode_attr(dentry, inode, attr, mode); @@ -329,32 +320,29 @@ static struct dentry *create_file(const char *name, umode_t mode,
ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE; - d_instantiate(dentry, inode); + + d_add(dentry, inode); fsnotify_create(dentry->d_parent->d_inode, dentry); - return eventfs_end_creating(dentry); + return dentry; };
/** - * create_dir - create a dir in the tracefs filesystem + * lookup_dir_entry - look up a dir in the tracefs filesystem + * @dentry: the directory to look up * @ei: the eventfs_inode that represents the directory to create - * @parent: parent dentry for this file. * - * This function will create a dentry for a directory represented by + * This function will look up a dentry for a directory represented by * a eventfs_inode. */ -static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent) +static struct dentry *lookup_dir_entry(struct dentry *dentry, + struct eventfs_inode *pei, struct eventfs_inode *ei) { struct tracefs_inode *ti; - struct dentry *dentry; struct inode *inode;
- dentry = eventfs_start_creating(ei->name, parent); - if (IS_ERR(dentry)) - return dentry; - inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) - return eventfs_failed_creating(dentry); + return ERR_PTR(-ENOMEM);
/* If the user updated the directory's attributes, use them */ update_inode_attr(dentry, inode, &ei->attr, @@ -371,11 +359,14 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent /* Only directories have ti->private set to an ei, not files */ ti->private = ei;
+ dentry->d_fsdata = ei; + ei->dentry = dentry; // Remove me! + inc_nlink(inode); - d_instantiate(dentry, inode); + d_add(dentry, inode); inc_nlink(dentry->d_parent->d_inode); fsnotify_mkdir(dentry->d_parent->d_inode, dentry); - return eventfs_end_creating(dentry); + return dentry; }
static void free_ei(struct eventfs_inode *ei) @@ -425,7 +416,7 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) }
/** - * create_file_dentry - create a dentry for a file of an eventfs_inode + * lookup_file_dentry - create a dentry for a file of an eventfs_inode * @ei: the eventfs_inode that the file will be created under * @idx: the index into the d_children[] of the @ei * @parent: The parent dentry of the created file. @@ -438,157 +429,21 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) * address located at @e_dentry. */ static struct dentry * -create_file_dentry(struct eventfs_inode *ei, int idx, - struct dentry *parent, const char *name, umode_t mode, void *data, +lookup_file_dentry(struct dentry *dentry, + struct eventfs_inode *ei, int idx, + umode_t mode, void *data, const struct file_operations *fops) { struct eventfs_attr *attr = NULL; struct dentry **e_dentry = &ei->d_children[idx]; - struct dentry *dentry; - - WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
- mutex_lock(&eventfs_mutex); - if (ei->is_freed) { - mutex_unlock(&eventfs_mutex); - return NULL; - } - /* If the e_dentry already has a dentry, use it */ - if (*e_dentry) { - dget(*e_dentry); - mutex_unlock(&eventfs_mutex); - return *e_dentry; - } - - /* ei->entry_attrs are protected by SRCU */ if (ei->entry_attrs) attr = &ei->entry_attrs[idx];
- mutex_unlock(&eventfs_mutex); - - dentry = create_file(name, mode, attr, parent, data, fops); + dentry->d_fsdata = ei; // NOTE: ei of _parent_ + lookup_file(dentry, mode, attr, data, fops);
- mutex_lock(&eventfs_mutex); - - if (IS_ERR_OR_NULL(dentry)) { - /* - * When the mutex was released, something else could have - * created the dentry for this e_dentry. In which case - * use that one. - * - * If ei->is_freed is set, the e_dentry is currently on its - * way to being freed, don't return it. If e_dentry is NULL - * it means it was already freed. - */ - if (ei->is_freed) { - dentry = NULL; - } else { - dentry = *e_dentry; - dget(dentry); - } - mutex_unlock(&eventfs_mutex); - return dentry; - } - - if (!*e_dentry && !ei->is_freed) { - *e_dentry = dentry; - dentry->d_fsdata = ei; - } else { - /* - * Should never happen unless we get here due to being freed. - * Otherwise it means two dentries exist with the same name. - */ - WARN_ON_ONCE(!ei->is_freed); - dentry = NULL; - } - mutex_unlock(&eventfs_mutex); - - return dentry; -} - -/** - * eventfs_post_create_dir - post create dir routine - * @ei: eventfs_inode of recently created dir - * - * Map the meta-data of files within an eventfs dir to their parent dentry - */ -static void eventfs_post_create_dir(struct eventfs_inode *ei) -{ - struct eventfs_inode *ei_child; - - lockdep_assert_held(&eventfs_mutex); - - /* srcu lock already held */ - /* fill parent-child relation */ - list_for_each_entry_srcu(ei_child, &ei->children, list, - srcu_read_lock_held(&eventfs_srcu)) { - ei_child->d_parent = ei->dentry; - } -} - -/** - * create_dir_dentry - Create a directory dentry for the eventfs_inode - * @pei: The eventfs_inode parent of ei. - * @ei: The eventfs_inode to create the directory for - * @parent: The dentry of the parent of this directory - * - * This creates and attaches a directory dentry to the eventfs_inode @ei. - */ -static struct dentry * -create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei, - struct dentry *parent) -{ - struct dentry *dentry = NULL; - - WARN_ON_ONCE(!inode_is_locked(parent->d_inode)); - - mutex_lock(&eventfs_mutex); - if (pei->is_freed || ei->is_freed) { - mutex_unlock(&eventfs_mutex); - return NULL; - } - if (ei->dentry) { - /* If the eventfs_inode already has a dentry, use it */ - dentry = ei->dentry; - dget(dentry); - mutex_unlock(&eventfs_mutex); - return dentry; - } - mutex_unlock(&eventfs_mutex); - - dentry = create_dir(ei, parent); - - mutex_lock(&eventfs_mutex); - - if (IS_ERR_OR_NULL(dentry) && !ei->is_freed) { - /* - * When the mutex was released, something else could have - * created the dentry for this e_dentry. In which case - * use that one. - * - * If ei->is_freed is set, the e_dentry is currently on its - * way to being freed. - */ - dentry = ei->dentry; - if (dentry) - dget(dentry); - mutex_unlock(&eventfs_mutex); - return dentry; - } - - if (!ei->dentry && !ei->is_freed) { - ei->dentry = dentry; - eventfs_post_create_dir(ei); - dentry->d_fsdata = ei; - } else { - /* - * Should never happen unless we get here due to being freed. - * Otherwise it means two dentries exist with the same name. - */ - WARN_ON_ONCE(!ei->is_freed); - dentry = NULL; - } - mutex_unlock(&eventfs_mutex); + *e_dentry = dentry; // Remove me
return dentry; } @@ -607,79 +462,49 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) { - const struct file_operations *fops; - const struct eventfs_entry *entry; struct eventfs_inode *ei_child; struct tracefs_inode *ti; struct eventfs_inode *ei; - struct dentry *ei_dentry = NULL; - struct dentry *ret = NULL; - struct dentry *d; const char *name = dentry->d_name.name; - umode_t mode; - void *data; - int idx; - int i; - int r;
ti = get_tracefs(dir); if (!(ti->flags & TRACEFS_EVENT_INODE)) - return NULL; - - /* Grab srcu to prevent the ei from going away */ - idx = srcu_read_lock(&eventfs_srcu); + return ERR_PTR(-EIO);
- /* - * Grab the eventfs_mutex to consistent value from ti->private. - * This s - */ mutex_lock(&eventfs_mutex); - ei = READ_ONCE(ti->private); - if (ei && !ei->is_freed) - ei_dentry = READ_ONCE(ei->dentry); - mutex_unlock(&eventfs_mutex);
- if (!ei || !ei_dentry) + ei = ti->private; + if (!ei || ei->is_freed) goto out;
- data = ei->data; - - list_for_each_entry_srcu(ei_child, &ei->children, list, - srcu_read_lock_held(&eventfs_srcu)) { + list_for_each_entry(ei_child, &ei->children, list) { if (strcmp(ei_child->name, name) != 0) continue; - ret = simple_lookup(dir, dentry, flags); - if (IS_ERR(ret)) + if (ei_child->is_freed) goto out; - d = create_dir_dentry(ei, ei_child, ei_dentry); - dput(d); + lookup_dir_entry(dentry, ei, ei_child); goto out; }
- for (i = 0; i < ei->nr_entries; i++) { - entry = &ei->entries[i]; - if (strcmp(name, entry->name) == 0) { - void *cdata = data; - mutex_lock(&eventfs_mutex); - /* If ei->is_freed, then the event itself may be too */ - if (!ei->is_freed) - r = entry->callback(name, &mode, &cdata, &fops); - else - r = -1; - mutex_unlock(&eventfs_mutex); - if (r <= 0) - continue; - ret = simple_lookup(dir, dentry, flags); - if (IS_ERR(ret)) - goto out; - d = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops); - dput(d); - break; - } + for (int i = 0; i < ei->nr_entries; i++) { + void *data; + umode_t mode; + const struct file_operations *fops; + const struct eventfs_entry *entry = &ei->entries[i]; + + if (strcmp(name, entry->name) != 0) + continue; + + data = ei->data; + if (entry->callback(name, &mode, &data, &fops) <= 0) + goto out; + + lookup_file_dentry(dentry, ei, i, mode, data, fops); + goto out; } out: - srcu_read_unlock(&eventfs_srcu, idx); - return ret; + mutex_unlock(&eventfs_mutex); + return NULL; }
/* diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 888e42087847..5c84460feeeb 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -495,75 +495,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry) return dentry; }
-/** - * eventfs_start_creating - start the process of creating a dentry - * @name: Name of the file created for the dentry - * @parent: The parent dentry where this dentry will be created - * - * This is a simple helper function for the dynamically created eventfs - * files. When the directory of the eventfs files are accessed, their - * dentries are created on the fly. This function is used to start that - * process. - */ -struct dentry *eventfs_start_creating(const char *name, struct dentry *parent) -{ - struct dentry *dentry; - int error; - - /* Must always have a parent. */ - if (WARN_ON_ONCE(!parent)) - return ERR_PTR(-EINVAL); - - error = simple_pin_fs(&trace_fs_type, &tracefs_mount, - &tracefs_mount_count); - if (error) - return ERR_PTR(error); - - if (unlikely(IS_DEADDIR(parent->d_inode))) - dentry = ERR_PTR(-ENOENT); - else - dentry = lookup_one_len(name, parent, strlen(name)); - - if (!IS_ERR(dentry) && dentry->d_inode) { - dput(dentry); - dentry = ERR_PTR(-EEXIST); - } - - if (IS_ERR(dentry)) - simple_release_fs(&tracefs_mount, &tracefs_mount_count); - - return dentry; -} - -/** - * eventfs_failed_creating - clean up a failed eventfs dentry creation - * @dentry: The dentry to clean up - * - * If after calling eventfs_start_creating(), a failure is detected, the - * resources created by eventfs_start_creating() needs to be cleaned up. In - * that case, this function should be called to perform that clean up. - */ -struct dentry *eventfs_failed_creating(struct dentry *dentry) -{ - dput(dentry); - simple_release_fs(&tracefs_mount, &tracefs_mount_count); - return NULL; -} - -/** - * eventfs_end_creating - Finish the process of creating a eventfs dentry - * @dentry: The dentry that has successfully been created. - * - * This function is currently just a place holder to match - * eventfs_start_creating(). In case any synchronization needs to be added, - * this function will be used to implement that without having to modify - * the callers of eventfs_start_creating(). - */ -struct dentry *eventfs_end_creating(struct dentry *dentry) -{ - return dentry; -} - /* Find the inode that this will use for default */ static struct inode *instance_inode(struct dentry *parent, struct inode *inode) { diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 7d84349ade87..09037e2c173d 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -80,9 +80,6 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent); struct dentry *tracefs_end_creating(struct dentry *dentry); struct dentry *tracefs_failed_creating(struct dentry *dentry); struct inode *tracefs_get_inode(struct super_block *sb); -struct dentry *eventfs_start_creating(const char *name, struct dentry *parent); -struct dentry *eventfs_failed_creating(struct dentry *dentry); -struct dentry *eventfs_end_creating(struct dentry *dentry); void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */
From: Linus Torvalds torvalds@linux-foundation.org
It's never used
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.961772428@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 4 +--- fs/tracefs/internal.h | 2 -- 2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 04c2ab90f93e..16ca8d9759b1 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -680,10 +680,8 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode INIT_LIST_HEAD(&ei->list);
mutex_lock(&eventfs_mutex); - if (!parent->is_freed) { + if (!parent->is_freed) list_add_tail(&ei->list, &parent->children); - ei->d_parent = parent->dentry; - } mutex_unlock(&eventfs_mutex);
/* Was the parent freed? */ diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 09037e2c173d..932733a2696a 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -36,7 +36,6 @@ struct eventfs_attr { * @name: the name of the directory to create * @children: link list into the child eventfs_inode * @dentry: the dentry of the directory - * @d_parent: pointer to the parent's dentry * @d_children: The array of dentries to represent the files when created * @entry_attrs: Saved mode and ownership of the @d_children * @attr: Saved mode and ownership of eventfs_inode itself @@ -51,7 +50,6 @@ struct eventfs_inode { const char *name; struct list_head children; struct dentry *dentry; /* Check is_freed to access */ - struct dentry *d_parent; struct dentry **d_children; struct eventfs_attr *entry_attrs; struct eventfs_attr attr;
From: Linus Torvalds torvalds@linux-foundation.org
In order for the dentries to stay up-to-date with the eventfs changes, just add a 'd_revalidate' function that checks the 'is_freed' bit.
Also, clean up the dentry release to actually use d_release() rather than the slightly odd d_iput() function. We don't care about the inode, all we want to do is to get rid of the refcount to the eventfs data added by dentry->d_fsdata.
It would probably be cleaner to make eventfs its own filesystem, or at least set its own dentry ops when looking up eventfs files. But as it is, only eventfs dentries use d_fsdata, so we don't really need to split these things up by use.
Another thing that might be worth doing is to make all eventfs lookups mark their dentries as not worth caching. We could do that with d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.
As it is, the dentries are all freeable, but they only tend to get freed at memory pressure rather than more proactively. But that's a separate issue.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.124644253@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 5 ++--- fs/tracefs/inode.c | 27 ++++++++++++++++++--------- fs/tracefs/internal.h | 3 ++- 3 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 16ca8d9759b1..b2285d5f3fed 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -378,13 +378,12 @@ static void free_ei(struct eventfs_inode *ei) }
/** - * eventfs_set_ei_status_free - remove the dentry reference from an eventfs_inode - * @ti: the tracefs_inode of the dentry + * eventfs_d_release - dentry is going away * @dentry: dentry which has the reference to remove. * * Remove the association between a dentry from an eventfs_inode. */ -void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry) +void eventfs_d_release(struct dentry *dentry) { struct eventfs_inode *ei; int i; diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 5c84460feeeb..d65ffad4c327 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -377,21 +377,30 @@ static const struct super_operations tracefs_super_operations = { .show_options = tracefs_show_options, };
-static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode) +/* + * It would be cleaner if eventfs had its own dentry ops. + * + * Note that d_revalidate is called potentially under RCU, + * so it can't take the eventfs mutex etc. It's fine - if + * we open a file just as it's marked dead, things will + * still work just fine, and just see the old stale case. + */ +static void tracefs_d_release(struct dentry *dentry) { - struct tracefs_inode *ti; + if (dentry->d_fsdata) + eventfs_d_release(dentry); +}
- if (!dentry || !inode) - return; +static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags) +{ + struct eventfs_inode *ei = dentry->d_fsdata;
- ti = get_tracefs(inode); - if (ti && ti->flags & TRACEFS_EVENT_INODE) - eventfs_set_ei_status_free(ti, dentry); - iput(inode); + return !(ei && ei->is_freed); }
static const struct dentry_operations tracefs_dentry_operations = { - .d_iput = tracefs_dentry_iput, + .d_revalidate = tracefs_d_revalidate, + .d_release = tracefs_d_release, };
static int trace_fill_super(struct super_block *sb, void *data, int silent) diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 932733a2696a..4b50a0668055 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -78,6 +78,7 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent); struct dentry *tracefs_end_creating(struct dentry *dentry); struct dentry *tracefs_failed_creating(struct dentry *dentry); struct inode *tracefs_get_inode(struct super_block *sb); -void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry); + +void eventfs_d_release(struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */
From: Linus Torvalds torvalds@linux-foundation.org
The eventfs inode had pointers to dentries (and child dentries) without actually holding a refcount on said pointer. That is fundamentally broken, and while eventfs tried to then maintain coherence with dentries going away by hooking into the '.d_iput' callback, that doesn't actually work since it's not ordered wrt lookups.
There were two reasonms why eventfs tried to keep a pointer to a dentry:
- the creation of a 'events' directory would actually have a stable dentry pointer that it created with tracefs_start_creating().
And it needed that dentry when tearing it all down again in eventfs_remove_events_dir().
This use is actually ok, because the special top-level events directory dentries are actually stable, not just a temporary cache of the eventfs data structures.
- the 'eventfs_inode' (aka ei) needs to stay around as long as there are dentries that refer to it.
It then used these dentry pointers as a replacement for doing reference counting: it would try to make sure that there was only ever one dentry associated with an event_inode, and keep a child dentry array around to see which dentries might still refer to the parent ei.
This gets rid of the invalid dentry pointer use, and renames the one valid case to a different name to make it clear that it's not just any random dentry.
The magic child dentry array that is kind of a "reverse reference list" is simply replaced by having child dentries take a ref to the ei. As does the directory dentries. That makes the broken use case go away.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang... Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 248 ++++++++++++--------------------------- fs/tracefs/internal.h | 7 +- 2 files changed, 78 insertions(+), 177 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index b2285d5f3fed..515fdace1eea 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -62,6 +62,35 @@ enum {
#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1)
+/* + * eventfs_inode reference count management. + * + * NOTE! We count only references from dentries, in the + * form 'dentry->d_fsdata'. There are also references from + * directory inodes ('ti->private'), but the dentry reference + * count is always a superset of the inode reference count. + */ +static void release_ei(struct kref *ref) +{ + struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref); + kfree(ei->entry_attrs); + kfree_const(ei->name); + kfree_rcu(ei, rcu); +} + +static inline void put_ei(struct eventfs_inode *ei) +{ + if (ei) + kref_put(&ei->kref, release_ei); +} + +static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei) +{ + if (ei) + kref_get(&ei->kref); + return ei; +} + static struct dentry *eventfs_root_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags); @@ -289,7 +318,8 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode, * directory. The inode.i_private pointer will point to @data in the open() * call. */ -static struct dentry *lookup_file(struct dentry *dentry, +static struct dentry *lookup_file(struct eventfs_inode *parent_ei, + struct dentry *dentry, umode_t mode, struct eventfs_attr *attr, void *data, @@ -302,7 +332,7 @@ static struct dentry *lookup_file(struct dentry *dentry, mode |= S_IFREG;
if (WARN_ON_ONCE(!S_ISREG(mode))) - return NULL; + return ERR_PTR(-EIO);
inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) @@ -321,9 +351,12 @@ static struct dentry *lookup_file(struct dentry *dentry, ti = get_tracefs(inode); ti->flags |= TRACEFS_EVENT_INODE;
+ // Files have their parent's ei as their fsdata + dentry->d_fsdata = get_ei(parent_ei); + d_add(dentry, inode); fsnotify_create(dentry->d_parent->d_inode, dentry); - return dentry; + return NULL; };
/** @@ -359,22 +392,29 @@ static struct dentry *lookup_dir_entry(struct dentry *dentry, /* Only directories have ti->private set to an ei, not files */ ti->private = ei;
- dentry->d_fsdata = ei; - ei->dentry = dentry; // Remove me! + dentry->d_fsdata = get_ei(ei);
inc_nlink(inode); d_add(dentry, inode); inc_nlink(dentry->d_parent->d_inode); fsnotify_mkdir(dentry->d_parent->d_inode, dentry); - return dentry; + return NULL; }
-static void free_ei(struct eventfs_inode *ei) +static inline struct eventfs_inode *alloc_ei(const char *name) { - kfree_const(ei->name); - kfree(ei->d_children); - kfree(ei->entry_attrs); - kfree(ei); + struct eventfs_inode *ei = kzalloc(sizeof(*ei), GFP_KERNEL); + + if (!ei) + return NULL; + + ei->name = kstrdup_const(name, GFP_KERNEL); + if (!ei->name) { + kfree(ei); + return NULL; + } + kref_init(&ei->kref); + return ei; }
/** @@ -385,39 +425,13 @@ static void free_ei(struct eventfs_inode *ei) */ void eventfs_d_release(struct dentry *dentry) { - struct eventfs_inode *ei; - int i; - - mutex_lock(&eventfs_mutex); - - ei = dentry->d_fsdata; - if (!ei) - goto out; - - /* This could belong to one of the files of the ei */ - if (ei->dentry != dentry) { - for (i = 0; i < ei->nr_entries; i++) { - if (ei->d_children[i] == dentry) - break; - } - if (WARN_ON_ONCE(i == ei->nr_entries)) - goto out; - ei->d_children[i] = NULL; - } else if (ei->is_freed) { - free_ei(ei); - } else { - ei->dentry = NULL; - } - - dentry->d_fsdata = NULL; - out: - mutex_unlock(&eventfs_mutex); + put_ei(dentry->d_fsdata); }
/** * lookup_file_dentry - create a dentry for a file of an eventfs_inode * @ei: the eventfs_inode that the file will be created under - * @idx: the index into the d_children[] of the @ei + * @idx: the index into the entry_attrs[] of the @ei * @parent: The parent dentry of the created file. * @name: The name of the file to create * @mode: The mode of the file. @@ -434,17 +448,11 @@ lookup_file_dentry(struct dentry *dentry, const struct file_operations *fops) { struct eventfs_attr *attr = NULL; - struct dentry **e_dentry = &ei->d_children[idx];
if (ei->entry_attrs) attr = &ei->entry_attrs[idx];
- dentry->d_fsdata = ei; // NOTE: ei of _parent_ - lookup_file(dentry, mode, attr, data, fops); - - *e_dentry = dentry; // Remove me - - return dentry; + return lookup_file(ei, dentry, mode, attr, data, fops); }
/** @@ -465,6 +473,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, struct tracefs_inode *ti; struct eventfs_inode *ei; const char *name = dentry->d_name.name; + struct dentry *result = NULL;
ti = get_tracefs(dir); if (!(ti->flags & TRACEFS_EVENT_INODE)) @@ -481,7 +490,7 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, continue; if (ei_child->is_freed) goto out; - lookup_dir_entry(dentry, ei, ei_child); + result = lookup_dir_entry(dentry, ei, ei_child); goto out; }
@@ -498,12 +507,12 @@ static struct dentry *eventfs_root_lookup(struct inode *dir, if (entry->callback(name, &mode, &data, &fops) <= 0) goto out;
- lookup_file_dentry(dentry, ei, i, mode, data, fops); + result = lookup_file_dentry(dentry, ei, i, mode, data, fops); goto out; } out: mutex_unlock(&eventfs_mutex); - return NULL; + return result; }
/* @@ -653,25 +662,10 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode if (!parent) return ERR_PTR(-EINVAL);
- ei = kzalloc(sizeof(*ei), GFP_KERNEL); + ei = alloc_ei(name); if (!ei) return ERR_PTR(-ENOMEM);
- ei->name = kstrdup_const(name, GFP_KERNEL); - if (!ei->name) { - kfree(ei); - return ERR_PTR(-ENOMEM); - } - - if (size) { - ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL); - if (!ei->d_children) { - kfree_const(ei->name); - kfree(ei); - return ERR_PTR(-ENOMEM); - } - } - ei->entries = entries; ei->nr_entries = size; ei->data = data; @@ -685,7 +679,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
/* Was the parent freed? */ if (list_empty(&ei->list)) { - free_ei(ei); + put_ei(ei); ei = NULL; } return ei; @@ -720,28 +714,20 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry if (IS_ERR(dentry)) return ERR_CAST(dentry);
- ei = kzalloc(sizeof(*ei), GFP_KERNEL); + ei = alloc_ei(name); if (!ei) - goto fail_ei; + goto fail;
inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) goto fail;
- if (size) { - ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL); - if (!ei->d_children) - goto fail; - } - - ei->dentry = dentry; + // Note: we have a ref to the dentry from tracefs_start_creating() + ei->events_dir = dentry; ei->entries = entries; ei->nr_entries = size; ei->is_events = 1; ei->data = data; - ei->name = kstrdup_const(name, GFP_KERNEL); - if (!ei->name) - goto fail;
/* Save the ownership of this directory */ uid = d_inode(dentry->d_parent)->i_uid; @@ -772,7 +758,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry inode->i_op = &eventfs_root_dir_inode_operations; inode->i_fop = &eventfs_file_operations;
- dentry->d_fsdata = ei; + dentry->d_fsdata = get_ei(ei);
/* directory inodes start off with i_nlink == 2 (for "." entry) */ inc_nlink(inode); @@ -784,72 +770,11 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry return ei;
fail: - kfree(ei->d_children); - kfree(ei); - fail_ei: + put_ei(ei); tracefs_failed_creating(dentry); return ERR_PTR(-ENOMEM); }
-static LLIST_HEAD(free_list); - -static void eventfs_workfn(struct work_struct *work) -{ - struct eventfs_inode *ei, *tmp; - struct llist_node *llnode; - - llnode = llist_del_all(&free_list); - llist_for_each_entry_safe(ei, tmp, llnode, llist) { - /* This dput() matches the dget() from unhook_dentry() */ - for (int i = 0; i < ei->nr_entries; i++) { - if (ei->d_children[i]) - dput(ei->d_children[i]); - } - /* This should only get here if it had a dentry */ - if (!WARN_ON_ONCE(!ei->dentry)) - dput(ei->dentry); - } -} - -static DECLARE_WORK(eventfs_work, eventfs_workfn); - -static void free_rcu_ei(struct rcu_head *head) -{ - struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu); - - if (ei->dentry) { - /* Do not free the ei until all references of dentry are gone */ - if (llist_add(&ei->llist, &free_list)) - queue_work(system_unbound_wq, &eventfs_work); - return; - } - - /* If the ei doesn't have a dentry, neither should its children */ - for (int i = 0; i < ei->nr_entries; i++) { - WARN_ON_ONCE(ei->d_children[i]); - } - - free_ei(ei); -} - -static void unhook_dentry(struct dentry *dentry) -{ - if (!dentry) - return; - /* - * Need to add a reference to the dentry that is expected by - * simple_recursive_removal(), which will include a dput(). - */ - dget(dentry); - - /* - * Also add a reference for the dput() in eventfs_workfn(). - * That is required as that dput() will free the ei after - * the SRCU grace period is over. - */ - dget(dentry); -} - /** * eventfs_remove_rec - remove eventfs dir or file from list * @ei: eventfs_inode to be removed. @@ -862,8 +787,6 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level) { struct eventfs_inode *ei_child;
- if (!ei) - return; /* * Check recursion depth. It should never be greater than 3: * 0 - events/ @@ -875,28 +798,12 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level) return;
/* search for nested folders or files */ - list_for_each_entry_srcu(ei_child, &ei->children, list, - lockdep_is_held(&eventfs_mutex)) { - /* Children only have dentry if parent does */ - WARN_ON_ONCE(ei_child->dentry && !ei->dentry); + list_for_each_entry(ei_child, &ei->children, list) eventfs_remove_rec(ei_child, level + 1); - } -
ei->is_freed = 1; - - for (int i = 0; i < ei->nr_entries; i++) { - if (ei->d_children[i]) { - /* Children only have dentry if parent does */ - WARN_ON_ONCE(!ei->dentry); - unhook_dentry(ei->d_children[i]); - } - } - - unhook_dentry(ei->dentry); - - list_del_rcu(&ei->list); - call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei); + list_del(&ei->list); + put_ei(ei); }
/** @@ -907,22 +814,12 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level) */ void eventfs_remove_dir(struct eventfs_inode *ei) { - struct dentry *dentry; - if (!ei) return;
mutex_lock(&eventfs_mutex); - dentry = ei->dentry; eventfs_remove_rec(ei, 0); mutex_unlock(&eventfs_mutex); - - /* - * If any of the ei children has a dentry, then the ei itself - * must have a dentry. - */ - if (dentry) - simple_recursive_removal(dentry, NULL); }
/** @@ -935,7 +832,11 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei) { struct dentry *dentry;
- dentry = ei->dentry; + dentry = ei->events_dir; + if (!dentry) + return; + + ei->events_dir = NULL; eventfs_remove_dir(ei);
/* @@ -945,5 +846,6 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei) * sticks around while the other ei->dentry are created * and destroyed dynamically. */ + d_invalidate(dentry); dput(dentry); } diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 4b50a0668055..1886f1826cd8 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -35,8 +35,7 @@ struct eventfs_attr { * @entries: the array of entries representing the files in the directory * @name: the name of the directory to create * @children: link list into the child eventfs_inode - * @dentry: the dentry of the directory - * @d_children: The array of dentries to represent the files when created + * @events_dir: the dentry of the events directory * @entry_attrs: Saved mode and ownership of the @d_children * @attr: Saved mode and ownership of eventfs_inode itself * @data: The private data to pass to the callbacks @@ -45,12 +44,12 @@ struct eventfs_attr { * @nr_entries: The number of items in @entries */ struct eventfs_inode { + struct kref kref; struct list_head list; const struct eventfs_entry *entries; const char *name; struct list_head children; - struct dentry *dentry; /* Check is_freed to access */ - struct dentry **d_children; + struct dentry *events_dir; struct eventfs_attr *entry_attrs; struct eventfs_attr attr; void *data;
From: "Steven Rostedt (Google)" rostedt@goodmis.org
There should never be a case where an evenfs_inode is being freed without is_freed being set. Add a WARN_ON_ONCE() if it ever happens. That would mean there was one too many put_ei()s.
Link: https://lore.kernel.org/linux-trace-kernel/20240201161616.843551963@goodmis....
Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 515fdace1eea..ca7daee7c811 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -73,6 +73,9 @@ enum { static void release_ei(struct kref *ref) { struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref); + + WARN_ON_ONCE(!ei->is_freed); + kfree(ei->entry_attrs); kfree_const(ei->name); kfree_rcu(ei, rcu); @@ -84,6 +87,14 @@ static inline void put_ei(struct eventfs_inode *ei) kref_put(&ei->kref, release_ei); }
+static inline void free_ei(struct eventfs_inode *ei) +{ + if (ei) { + ei->is_freed = 1; + put_ei(ei); + } +} + static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei) { if (ei) @@ -679,7 +690,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
/* Was the parent freed? */ if (list_empty(&ei->list)) { - put_ei(ei); + free_ei(ei); ei = NULL; } return ei; @@ -770,7 +781,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry return ei;
fail: - put_ei(ei); + free_ei(ei); tracefs_failed_creating(dentry); return ERR_PTR(-ENOMEM); } @@ -801,9 +812,8 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level) list_for_each_entry(ei_child, &ei->children, list) eventfs_remove_rec(ei_child, level + 1);
- ei->is_freed = 1; list_del(&ei->list); - put_ei(ei); + free_ei(ei); }
/**
From: "Steven Rostedt (Google)" rostedt@goodmis.org
Some of the eventfs_inode structure has holes in it. Rework the structure to be a bit more condensed, and also remove the no longer used llist field.
Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.002321438@goodmis....
Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/internal.h | 27 ++++++++++++--------------- 1 file changed, 12 insertions(+), 15 deletions(-)
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 1886f1826cd8..beb3dcd0e434 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -32,40 +32,37 @@ struct eventfs_attr { /* * struct eventfs_inode - hold the properties of the eventfs directories. * @list: link list into the parent directory + * @rcu: Union with @list for freeing + * @children: link list into the child eventfs_inode * @entries: the array of entries representing the files in the directory * @name: the name of the directory to create - * @children: link list into the child eventfs_inode * @events_dir: the dentry of the events directory * @entry_attrs: Saved mode and ownership of the @d_children - * @attr: Saved mode and ownership of eventfs_inode itself * @data: The private data to pass to the callbacks + * @attr: Saved mode and ownership of eventfs_inode itself * @is_freed: Flag set if the eventfs is on its way to be freed * Note if is_freed is set, then dentry is corrupted. + * @is_events: Flag set for only the top level "events" directory * @nr_entries: The number of items in @entries + * @ino: The saved inode number */ struct eventfs_inode { - struct kref kref; - struct list_head list; + union { + struct list_head list; + struct rcu_head rcu; + }; + struct list_head children; const struct eventfs_entry *entries; const char *name; - struct list_head children; struct dentry *events_dir; struct eventfs_attr *entry_attrs; - struct eventfs_attr attr; void *data; + struct eventfs_attr attr; + struct kref kref; unsigned int is_freed:1; unsigned int is_events:1; unsigned int nr_entries:30; unsigned int ino; - /* - * Union - used for deletion - * @llist: for calling dput() if needed after RCU - * @rcu: eventfs_inode to delete in RCU - */ - union { - struct llist_node llist; - struct rcu_head rcu; - }; };
static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The dentries and inodes are created when referenced in the lookup code. There's no reason to call fsnotify_*() functions when they are created by a reference. It doesn't make any sense.
Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/ Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.166973329@goodmis....
Cc: stable@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Fixes: a376007917776 ("eventfs: Implement functions to create files and dirs when accessed"); Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index ca7daee7c811..9e031e5a2713 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -366,7 +366,6 @@ static struct dentry *lookup_file(struct eventfs_inode *parent_ei, dentry->d_fsdata = get_ei(parent_ei);
d_add(dentry, inode); - fsnotify_create(dentry->d_parent->d_inode, dentry); return NULL; };
@@ -408,7 +407,6 @@ static struct dentry *lookup_dir_entry(struct dentry *dentry, inc_nlink(inode); d_add(dentry, inode); inc_nlink(dentry->d_parent->d_inode); - fsnotify_mkdir(dentry->d_parent->d_inode, dentry); return NULL; }
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The directory link count in eventfs was somewhat bogus. It was only being updated when a directory child was being looked up and not on creation.
One solution would be to update in get_attr() the link count by iterating the ei->children list and then adding 2. But that could slow down simple stat() calls, especially if it's done on all directories in eventfs.
Another solution would be to add a parent pointer to the eventfs_inode and keep track of the number of sub directories it has on creation. But this adds overhead for something not really worthwhile.
The solution decided upon is to keep all directory links in eventfs as 1. This tells user space not to rely on the hard links of directories. Which in this case it shouldn't.
Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/ Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.339968298@goodmis....
Cc: stable@vger.kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Christian Brauner brauner@kernel.org Cc: Al Viro viro@ZenIV.linux.org.uk Cc: Ajay Kaher ajay.kaher@broadcom.com Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions") Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- fs/tracefs/event_inode.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 9e031e5a2713..110e8a272189 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -404,9 +404,7 @@ static struct dentry *lookup_dir_entry(struct dentry *dentry,
dentry->d_fsdata = get_ei(ei);
- inc_nlink(inode); d_add(dentry, inode); - inc_nlink(dentry->d_parent->d_inode); return NULL; }
@@ -769,9 +767,17 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
dentry->d_fsdata = get_ei(ei);
- /* directory inodes start off with i_nlink == 2 (for "." entry) */ - inc_nlink(inode); + /* + * Keep all eventfs directories with i_nlink == 1. + * Due to the dynamic nature of the dentry creations and not + * wanting to add a pointer to the parent eventfs_inode in the + * eventfs_inode structure, keeping the i_nlink in sync with the + * number of directories would cause too much complexity for + * something not worth much. Keeping directory links at 1 + * tells userspace not to trust the link number. + */ d_instantiate(dentry, inode); + /* The dentry of the "events" parent does keep track though */ inc_nlink(dentry->d_parent->d_inode); fsnotify_mkdir(dentry->d_parent->d_inode, dentry); tracefs_end_creating(dentry);
On Sat, 03 Feb 2024 20:16:15 -0500 Steven Rostedt rostedt@goodmis.org wrote:
This is the update for 6.7. It includes Linus's updates as well as all the patches leading up to them.
Hmm, I just realized I didn't include the sha of the upstream commit. I simply ran the following command against v6.7.3:
git log --reverse --no-merges --pretty=oneline v6.7..origin/master fs/tracefs/ | cut -d' ' -f1 | while read a; do if ! git cherry-pick $a; then break; fi ; done
And I believe I removed one unrelated patch.
I know that cherry-pick has a '-x' option that adds a: "(cherry picked from commit ...)" I can re-run it and make sure it still matches this branch if it makes it easier for you.
-- Steve
On Sat, Feb 03, 2024 at 08:25:03PM -0500, Steven Rostedt wrote:
On Sat, 03 Feb 2024 20:16:15 -0500 Steven Rostedt rostedt@goodmis.org wrote:
This is the update for 6.7. It includes Linus's updates as well as all the patches leading up to them.
Hmm, I just realized I didn't include the sha of the upstream commit. I simply ran the following command against v6.7.3:
git log --reverse --no-merges --pretty=oneline v6.7..origin/master fs/tracefs/ | cut -d' ' -f1 | while read a; do if ! git cherry-pick $a; then break; fi ; done
And I believe I removed one unrelated patch.
I know that cherry-pick has a '-x' option that adds a: "(cherry picked from commit ...)" I can re-run it and make sure it still matches this branch if it makes it easier for you.
Please do, otherwise I have to dig out the sha1 "by hand" from the original commits, which is a huge pain.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org