The event tracing infrastructure contains a lot of files and directories (internally, inodes and dentries), which end up consuming megabytes of memory. There can be multiple instances of event tracing, each of which requires even more memory.
Instead of creating the inodes/dentries up front, eventfs keeps only the meta-data and skips their creation. eventfs creates the inodes/dentries only when required, and only for the files/directories that are needed. It also deletes the inodes/dentries once they are no longer required, but preserves the meta-data.
Previously the tracing events took ~9MB; with this approach they take ~4.5MB for ~10K files/directories.
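The lazy-creation idea described above can be illustrated with a small userspace sketch. It is not the eventfs code: struct heavy_node, struct meta_entry, meta_get() and meta_put() are hypothetical names, and a plain heap object stands in for the inode/dentry pair. It only shows the shape of the approach: the small meta-data always exists, the expensive object is created on first access and freed when the last reference goes away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an inode/dentry pair: the expensive object. */
struct heavy_node {
	char payload[512];
	int refcount;
};

/* Hypothetical stand-in for eventfs meta-data: small and always present. */
struct meta_entry {
	const char *name;
	struct heavy_node *node;	/* NULL until first access */
};

/* Create the heavy object on first access, then take a reference. */
static struct heavy_node *meta_get(struct meta_entry *me)
{
	if (!me->node) {
		me->node = calloc(1, sizeof(*me->node));
		if (!me->node)
			return NULL;
	}
	me->node->refcount++;
	return me->node;
}

/* Drop a reference; free the heavy object at zero but keep the meta-data. */
static void meta_put(struct meta_entry *me)
{
	assert(me->node && me->node->refcount > 0);
	if (--me->node->refcount == 0) {
		free(me->node);
		me->node = NULL;
	}
}
```

After the last meta_put() the entry still carries its name, so a later meta_get() can rebuild the heavy object from the meta-data alone.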
Diff from v5:
  Patch 02: removed TRACEFS_EVENT_INODE enum.
  Patch 04: added TRACEFS_EVENT_INODE enum.
  Patch 06: removed WARN_ON_ONCE in eventfs_set_ef_status_free()
  Patch 07: added WARN_ON_ONCE in create_dentry()
            moved declaration of following to internal.h:
              eventfs_start_creating()
              eventfs_failed_creating()
              eventfs_end_creating()
  Patch 08: added WARN_ON_ONCE in eventfs_set_ef_status_free()
Diff from v4:
  Patch 02: moved from v4 08/10
            added fs/tracefs/internal.h
  Patch 03: moved from v4 02/10
            removed fs/tracefs/internal.h
  Patch 04: moved from v4 03/10
            moved out changes of fs/tracefs/internal.h
  Patch 05: moved from v4 04/10
            renamed eventfs_add_top_file() -> eventfs_add_events_file()
  Patch 06: moved from v4 07/10
            implemented create_dentry() helper function
            added create_file(), create_dir() stub function
  Patch 07: moved from v4 06/10
  Patch 08: moved from v4 05/10
            improved eventfs remove functionality
  Patch 09: removed unwanted if conditions
  Patch 10: added available_filter_functions check
Diff from v3:
  Patch 3,4,5,7,9: removed all the eventfs_rwsem code and replaced it
            with an srcu lock for the readers, and a mutex to
            synchronize the writers of the list.
  Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
  Patch 3: moved the struct eventfs_file and eventfs_inode into
            event_inode.c as it really should not be exposed to all
            users.
  Patch 5: added a recursion check to eventfs_remove_rec() as it is
            really dangerous to have unchecked recursion in the kernel
            (we do have a fixed size stack).
            have the free use srcu callbacks. After the srcu grace
            periods are done, it adds the eventfs_file onto a llist
            (lockless link list) and wakes up a work queue. Then the
            work queue does the freeing (this needs to be done in
            task/workqueue context, as srcu callbacks are done in
            softirq context).
  Patch 6: renamed:
            eventfs_create_file() -> create_file()
            eventfs_create_dir() -> create_dir()
Diff from v2:
  Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
  Patch 02: moved from v1 1/9
  Patch 03: moved from v1 2/9
            As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
            helper function to add files or directories to eventfs
            fix WARNING reported by kernel test robot in v1 8/9
  Patch 04: moved from v1 3/9
            used eventfs_prepare_ef() to add files
            fix WARNING reported by kernel test robot in v1 8/9
  Patch 05: moved from v1 4/9
            fix compiling warning reported by kernel test robot in v1 4/9
  Patch 06: moved from v1 5/9
  Patch 07: moved from v1 6/9
  Patch 08: moved from v1 7/9
  Patch 09: moved from v1 8/9
            rebased because of v3 01/10
  Patch 10: moved from v1 9/9
Diff from v1:
  Patch 1: add header file
  Patch 2: resolved kernel test robot issues
           protecting eventfs lists using nested eventfs_rwsem
  Patch 3: protecting eventfs lists using nested eventfs_rwsem
  Patch 4: improve events cleanup code to fix crashes
  Patch 5: resolved kernel test robot issues
           removed d_instantiate_anon() calls
  Patch 6: resolved kernel test robot issues
           fix kprobe test in eventfs_root_lookup()
           protecting eventfs lists using nested eventfs_rwsem
  Patch 7: remove header file
  Patch 8: pass eventfs_rwsem as argument to eventfs functions
           called eventfs_remove_events_dir() instead of tracefs_remove()
           from event_trace_del_tracer()
  Patch 9: new patch to fix kprobe test case
 fs/tracefs/Makefile                           |   1 +
 fs/tracefs/event_inode.c                      | 801 ++++++++++++++++++
 fs/tracefs/inode.c                            | 151 +++-
 fs/tracefs/internal.h                         |  29 +
 include/linux/trace_events.h                  |   1 +
 include/linux/tracefs.h                       |  23 +
 kernel/trace/trace.h                          |   2 +-
 kernel/trace/trace_events.c                   |  76 +-
 .../ftrace/test.d/kprobe/kprobe_args_char.tc  |   9 +-
 .../test.d/kprobe/kprobe_args_string.tc       |   9 +-
 10 files changed, 1050 insertions(+), 52 deletions(-)
 create mode 100644 fs/tracefs/event_inode.c
 create mode 100644 fs/tracefs/internal.h
From: "Steven Rostedt (Google)" rostedt@goodmis.org
The creation of the trace event directory requires that a TRACE_SYSTEM is defined, so that the trace event directory is added within the system it was defined in.
The code handled the case where a TRACE_SYSTEM was not added, and would then add the event at the events directory. But nothing should be doing this. This code also prevents the implementation of creating dynamic dentries for the eventfs system.
As this path has never been hit on correct code, remove it. If it does get hit, issue a WARN_ON_ONCE() and return -ENODEV.
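The warn-once-and-fail behaviour described above can be sketched in userspace as follows. This is only an illustration: warn_on_once(), check_event_system(), warn_count and TRACE_SYSTEM_UNSET are hypothetical names, and unlike the kernel's WARN_ON_ONCE(), which keeps one "already warned" flag per call site, this helper keeps a single flag for all callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Placeholder name seen when a header never defined its own TRACE_SYSTEM. */
#define TRACE_SYSTEM_UNSET "TRACE_SYSTEM"

static int warn_count;	/* number of warnings actually printed */

/* Userspace stand-in for WARN_ON_ONCE(): warn the first time, return cond. */
static bool warn_on_once(bool cond, const char *what)
{
	static bool warned;	/* one shared flag, an assumption of this sketch */

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical check mirroring the new logic in event_create_dir(). */
static int check_event_system(const char *system)
{
	if (warn_on_once(strcmp(system, TRACE_SYSTEM_UNSET) == 0,
			 "trace event without TRACE_SYSTEM"))
		return -ENODEV;
	return 0;
}
```

The condition is still evaluated on every call, so the error is returned each time; only the warning is limited to the first hit.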
Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org
Signed-off-by: Ajay Kaher akaher@vmware.com
---
 kernel/trace/trace_events.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 5d6ae4eae510..a284171d5c74 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -2426,14 +2426,15 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file)
/* * If the trace point header did not define TRACE_SYSTEM - * then the system would be called "TRACE_SYSTEM". + * then the system would be called "TRACE_SYSTEM". This should + * never happen. */ - if (strcmp(call->class->system, TRACE_SYSTEM) != 0) { - d_events = event_subsystem_dir(tr, call->class->system, file, parent); - if (!d_events) - return -ENOMEM; - } else - d_events = parent; + if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0)) + return -ENODEV; + + d_events = event_subsystem_dir(tr, call->class->system, file, parent); + if (!d_events) + return -ENOMEM;
name = trace_event_name(call); file->dir = tracefs_create_dir(name, d_events);
Create a kmem cache of tracefs_inodes. As there are lots of tracefs inodes, giving them their own cache is more efficient. This also makes it possible to see how many tracefs inodes have been created.
Add helper functions:
  tracefs_alloc_inode()
  tracefs_free_inode()
  get_tracefs()
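The relationship between these helpers is the usual "embed the VFS inode in a larger structure" pattern. A minimal userspace sketch, assuming a trimmed-down struct inode and using offsetof() in place of the kernel's container_of(); demo_alloc_inode() and demo_free_inode() are hypothetical stand-ins for the alloc/free callbacks, with calloc()/free() replacing the kmem cache:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the VFS struct inode. */
struct inode {
	unsigned long i_ino;
};

/* Same layout idea as struct tracefs_inode in this patch. */
struct tracefs_inode {
	unsigned long flags;
	void *private;
	struct inode vfs_inode;
};

/* Userspace equivalent of container_of() for this one member. */
static struct tracefs_inode *get_tracefs(struct inode *inode)
{
	return (struct tracefs_inode *)((char *)inode -
			offsetof(struct tracefs_inode, vfs_inode));
}

/* Allocate the wrapper but hand out only the embedded inode. */
static struct inode *demo_alloc_inode(void)
{
	struct tracefs_inode *ti = calloc(1, sizeof(*ti));

	return ti ? &ti->vfs_inode : NULL;
}

/* Free through the wrapper recovered from the embedded inode. */
static void demo_free_inode(struct inode *inode)
{
	free(get_tracefs(inode));
}
```

Because the generic code only ever sees the embedded inode, the extra fields (flags, private) travel with it at no additional allocation.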
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
---
 fs/tracefs/inode.c    | 39 +++++++++++++++++++++++++++++++++++++++
 fs/tracefs/internal.h | 15 +++++++++++++++
 2 files changed, 54 insertions(+)
 create mode 100644 fs/tracefs/internal.h
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 57ac8aa4a724..2508944cc4d8 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -21,13 +21,33 @@ #include <linux/parser.h> #include <linux/magic.h> #include <linux/slab.h> +#include "internal.h"
#define TRACEFS_DEFAULT_MODE 0700 +static struct kmem_cache *tracefs_inode_cachep __ro_after_init;
static struct vfsmount *tracefs_mount; static int tracefs_mount_count; static bool tracefs_registered;
+static struct inode *tracefs_alloc_inode(struct super_block *sb) +{ + struct tracefs_inode *ti; + + ti = kmem_cache_alloc(tracefs_inode_cachep, GFP_KERNEL); + if (!ti) + return NULL; + + ti->flags = 0; + + return &ti->vfs_inode; +} + +static void tracefs_free_inode(struct inode *inode) +{ + kmem_cache_free(tracefs_inode_cachep, get_tracefs(inode)); +} + static ssize_t default_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos) { @@ -346,6 +366,9 @@ static int tracefs_show_options(struct seq_file *m, struct dentry *root) }
static const struct super_operations tracefs_super_operations = { + .alloc_inode = tracefs_alloc_inode, + .free_inode = tracefs_free_inode, + .drop_inode = generic_delete_inode, .statfs = simple_statfs, .remount_fs = tracefs_remount, .show_options = tracefs_show_options, @@ -628,10 +651,26 @@ bool tracefs_initialized(void) return tracefs_registered; }
+static void init_once(void *foo) +{ + struct tracefs_inode *ti = (struct tracefs_inode *) foo; + + inode_init_once(&ti->vfs_inode); +} + static int __init tracefs_init(void) { int retval;
+ tracefs_inode_cachep = kmem_cache_create("tracefs_inode_cache", + sizeof(struct tracefs_inode), + 0, (SLAB_RECLAIM_ACCOUNT| + SLAB_MEM_SPREAD| + SLAB_ACCOUNT), + init_once); + if (!tracefs_inode_cachep) + return -ENOMEM; + retval = sysfs_create_mount_point(kernel_kobj, "tracing"); if (retval) return -EINVAL; diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h new file mode 100644 index 000000000000..954ea005632b --- /dev/null +++ b/fs/tracefs/internal.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _TRACEFS_INTERNAL_H +#define _TRACEFS_INTERNAL_H + +struct tracefs_inode { + unsigned long flags; + void *private; + struct inode vfs_inode; +}; + +static inline struct tracefs_inode *get_tracefs(const struct inode *inode) +{ + return container_of(inode, struct tracefs_inode, vfs_inode); +} +#endif /* _TRACEFS_INTERNAL_H */
Export a few tracefs functions that will be needed by the eventfs dynamic file system. Rename them to start with "tracefs_" to keep with the name space.
  start_creating -> tracefs_start_creating
  failed_creating -> tracefs_failed_creating
  end_creating -> tracefs_end_creating
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
---
 fs/tracefs/inode.c    | 20 ++++++++++----------
 fs/tracefs/internal.h |  5 +++++
 2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 2508944cc4d8..4acc4b4dfd22 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -147,7 +147,7 @@ static const struct inode_operations tracefs_dir_inode_operations = { .rmdir = tracefs_syscall_rmdir, };
-static struct inode *tracefs_get_inode(struct super_block *sb) +struct inode *tracefs_get_inode(struct super_block *sb) { struct inode *inode = new_inode(sb); if (inode) { @@ -422,7 +422,7 @@ static struct file_system_type trace_fs_type = { }; MODULE_ALIAS_FS("tracefs");
-static struct dentry *start_creating(const char *name, struct dentry *parent) +struct dentry *tracefs_start_creating(const char *name, struct dentry *parent) { struct dentry *dentry; int error; @@ -460,7 +460,7 @@ static struct dentry *start_creating(const char *name, struct dentry *parent) return dentry; }
-static struct dentry *failed_creating(struct dentry *dentry) +struct dentry *tracefs_failed_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); dput(dentry); @@ -468,7 +468,7 @@ static struct dentry *failed_creating(struct dentry *dentry) return NULL; }
-static struct dentry *end_creating(struct dentry *dentry) +struct dentry *tracefs_end_creating(struct dentry *dentry) { inode_unlock(d_inode(dentry->d_parent)); return dentry; @@ -513,14 +513,14 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, if (!(mode & S_IFMT)) mode |= S_IFREG; BUG_ON(!S_ISREG(mode)); - dentry = start_creating(name, parent); + dentry = tracefs_start_creating(name, parent);
if (IS_ERR(dentry)) return NULL;
inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) - return failed_creating(dentry); + return tracefs_failed_creating(dentry);
inode->i_mode = mode; inode->i_fop = fops ? fops : &tracefs_file_operations; @@ -529,13 +529,13 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode, inode->i_gid = d_inode(dentry->d_parent)->i_gid; d_instantiate(dentry, inode); fsnotify_create(d_inode(dentry->d_parent), dentry); - return end_creating(dentry); + return tracefs_end_creating(dentry); }
static struct dentry *__create_dir(const char *name, struct dentry *parent, const struct inode_operations *ops) { - struct dentry *dentry = start_creating(name, parent); + struct dentry *dentry = tracefs_start_creating(name, parent); struct inode *inode;
if (IS_ERR(dentry)) @@ -543,7 +543,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent,
inode = tracefs_get_inode(dentry->d_sb); if (unlikely(!inode)) - return failed_creating(dentry); + return tracefs_failed_creating(dentry);
/* Do not set bits for OTH */ inode->i_mode = S_IFDIR | S_IRWXU | S_IRUSR| S_IRGRP | S_IXUSR | S_IXGRP; @@ -557,7 +557,7 @@ static struct dentry *__create_dir(const char *name, struct dentry *parent, d_instantiate(dentry, inode); inc_nlink(d_inode(dentry->d_parent)); fsnotify_mkdir(d_inode(dentry->d_parent), dentry); - return end_creating(dentry); + return tracefs_end_creating(dentry); }
/** diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 954ea005632b..7dfb7ebc1c3f 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -12,4 +12,9 @@ static inline struct tracefs_inode *get_tracefs(const struct inode *inode) { return container_of(inode, struct tracefs_inode, vfs_inode); } + +struct dentry *tracefs_start_creating(const char *name, struct dentry *parent); +struct dentry *tracefs_end_creating(struct dentry *dentry); +struct dentry *tracefs_failed_creating(struct dentry *dentry); +struct inode *tracefs_get_inode(struct super_block *sb); #endif /* _TRACEFS_INTERNAL_H */
Add the eventfs_file structure, which holds the properties of the eventfs files and directories.
Add the following functions to create the directories in eventfs:
eventfs_create_events_dir() will create the top level "events" directory within the tracefs file system.
eventfs_add_subsystem_dir() creates an eventfs_file descriptor with the given name of the subsystem.
eventfs_add_dir() creates an eventfs_file descriptor with the given name of the directory and attaches it to an eventfs_file of a subsystem.
Add tracefs_inode structure to hold the inodes, flags and pointers to private data used by eventfs.
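The hierarchy of descriptors described above (events directory, subsystem directories, event directories) can be pictured with a small userspace sketch. It is not the patch code: struct ef_node, dup_name(), ef_add() and ef_find() are hypothetical, a singly linked sibling list replaces the kernel's list_head, and the eventfs_mutex locking is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified counterpart of struct eventfs_file. */
struct ef_node {
	char *name;
	int is_dir;
	struct ef_node *children;	/* first child, directories only */
	struct ef_node *last_child;	/* tail, so adds keep insertion order */
	struct ef_node *next;		/* next sibling */
};

/* Portable replacement for kstrdup()/strdup(). */
static char *dup_name(const char *name)
{
	size_t len = strlen(name) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, name, len);
	return copy;
}

/* Allocate a descriptor and append it to @parent (NULL for the top level). */
static struct ef_node *ef_add(struct ef_node *parent, const char *name,
			      int is_dir)
{
	struct ef_node *ef;

	if (parent && !parent->is_dir)
		return NULL;

	ef = calloc(1, sizeof(*ef));
	if (!ef)
		return NULL;
	ef->name = dup_name(name);
	if (!ef->name) {
		free(ef);
		return NULL;
	}
	ef->is_dir = is_dir;

	if (parent) {
		if (parent->last_child)
			parent->last_child->next = ef;
		else
			parent->children = ef;
		parent->last_child = ef;
	}
	return ef;
}

/* Find a direct child of @dir by name, or NULL if there is none. */
static struct ef_node *ef_find(const struct ef_node *dir, const char *name)
{
	struct ef_node *ef;

	for (ef = dir->children; ef; ef = ef->next)
		if (strcmp(ef->name, name) == 0)
			return ef;
	return NULL;
}
```

Only these small descriptors exist up front; nothing in the sketch corresponds to a dentry or inode, which is the point of keeping the meta-data separate.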
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
Reported-by: kernel test robot lkp@intel.com
Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
---
 fs/tracefs/Makefile      |   1 +
 fs/tracefs/event_inode.c | 211 +++++++++++++++++++++++++++++++++++++++
 fs/tracefs/internal.h    |   4 +
 include/linux/tracefs.h  |  11 ++
 4 files changed, 227 insertions(+)
 create mode 100644 fs/tracefs/event_inode.c
diff --git a/fs/tracefs/Makefile b/fs/tracefs/Makefile index 7c35a282b484..73c56da8e284 100644 --- a/fs/tracefs/Makefile +++ b/fs/tracefs/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only tracefs-objs := inode.o +tracefs-objs += event_inode.o
obj-$(CONFIG_TRACING) += tracefs.o
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c new file mode 100644 index 000000000000..8f334b122e46 --- /dev/null +++ b/fs/tracefs/event_inode.c @@ -0,0 +1,211 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * event_inode.c - part of tracefs, a pseudo file system for activating tracing + * + * Copyright (C) 2020-23 VMware Inc, author: Steven Rostedt (VMware) rostedt@goodmis.org + * Copyright (C) 2020-23 VMware Inc, author: Ajay Kaher akaher@vmware.com + * + * eventfs is used to dynamically create inodes and dentries based on the + * meta data provided by the tracing system. + * + * eventfs stores the meta-data of files/dirs and holds off on creating + * inodes/dentries of the files. When accessed, the eventfs will create the + * inodes/dentries in a just-in-time (JIT) manner. The eventfs will clean up + * and delete the inodes/dentries when they are no longer referenced. + */ +#include <linux/fsnotify.h> +#include <linux/fs.h> +#include <linux/namei.h> +#include <linux/workqueue.h> +#include <linux/security.h> +#include <linux/tracefs.h> +#include <linux/kref.h> +#include <linux/delay.h> +#include "internal.h" + +struct eventfs_inode { + struct list_head e_top_files; +}; + +/** + * struct eventfs_file - hold the properties of the eventfs files and + * directories. 
+ * @name: the name of the file or directory to create + * @list: file or directory to be added to parent directory + * @ei: list of files and directories within directory + * @fop: file_operations for file or directory + * @iop: inode_operations for file or directory + * @data: something that the caller will want to get to later on + * @mode: the permission that the file or directory should have + */ +struct eventfs_file { + const char *name; + struct list_head list; + struct eventfs_inode *ei; + const struct file_operations *fop; + const struct inode_operations *iop; + void *data; + umode_t mode; +}; + +static DEFINE_MUTEX(eventfs_mutex); + +static const struct inode_operations eventfs_root_dir_inode_operations = { +}; + +static const struct file_operations eventfs_file_operations = { +}; + +/** + * eventfs_prepare_ef - helper function to prepare eventfs_file + * @name: the name of the file/directory to create. + * @mode: the permission that the file should have. + * @fop: struct file_operations that should be used for this file/directory. + * @iop: struct inode_operations that should be used for this file/directory. + * @data: something that the caller will want to get to later on. The + * inode.i_private pointer will point to this value on the open() call. + * + * This function allocates and fills the eventfs_file structure. 
+ */ +static struct eventfs_file *eventfs_prepare_ef(const char *name, umode_t mode, + const struct file_operations *fop, + const struct inode_operations *iop, + void *data) +{ + struct eventfs_file *ef; + + ef = kzalloc(sizeof(*ef), GFP_KERNEL); + if (!ef) + return ERR_PTR(-ENOMEM); + + ef->name = kstrdup(name, GFP_KERNEL); + if (!ef->name) { + kfree(ef); + return ERR_PTR(-ENOMEM); + } + + if (S_ISDIR(mode)) { + ef->ei = kzalloc(sizeof(*ef->ei), GFP_KERNEL); + if (!ef->ei) { + kfree(ef->name); + kfree(ef); + return ERR_PTR(-ENOMEM); + } + INIT_LIST_HEAD(&ef->ei->e_top_files); + } else { + ef->ei = NULL; + } + + ef->iop = iop; + ef->fop = fop; + ef->mode = mode; + ef->data = data; + return ef; +} + +/** + * eventfs_create_events_dir - create the trace event structure + * @name: the name of the directory to create. + * @parent: parent dentry for this file. This should be a directory dentry + * if set. If this parameter is NULL, then the directory will be + * created in the root of the tracefs filesystem. + * + * This function creates the top of the trace event directory. + */ +struct dentry *eventfs_create_events_dir(const char *name, + struct dentry *parent) +{ + struct dentry *dentry = tracefs_start_creating(name, parent); + struct eventfs_inode *ei; + struct tracefs_inode *ti; + struct inode *inode; + + if (IS_ERR(dentry)) + return dentry; + + ei = kzalloc(sizeof(*ei), GFP_KERNEL); + if (!ei) + return ERR_PTR(-ENOMEM); + inode = tracefs_get_inode(dentry->d_sb); + if (unlikely(!inode)) { + kfree(ei); + tracefs_failed_creating(dentry); + return ERR_PTR(-ENOMEM); + } + + INIT_LIST_HEAD(&ei->e_top_files); + + ti = get_tracefs(inode); + ti->flags |= TRACEFS_EVENT_INODE; + ti->private = ei; + + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; + inode->i_op = &eventfs_root_dir_inode_operations; + inode->i_fop = &eventfs_file_operations; + + /* directory inodes start off with i_nlink == 2 (for "." 
entry) */ + inc_nlink(inode); + d_instantiate(dentry, inode); + inc_nlink(dentry->d_parent->d_inode); + fsnotify_mkdir(dentry->d_parent->d_inode, dentry); + return tracefs_end_creating(dentry); +} + +/** + * eventfs_add_subsystem_dir - add eventfs subsystem_dir to list to create later + * @name: the name of the file to create. + * @parent: parent dentry for this dir. + * + * This function adds eventfs subsystem dir to list. + * And all these dirs are created on the fly when they are looked up, + * and the dentry and inodes will be removed when they are done. + */ +struct eventfs_file *eventfs_add_subsystem_dir(const char *name, + struct dentry *parent) +{ + struct tracefs_inode *ti_parent; + struct eventfs_inode *ei_parent; + struct eventfs_file *ef; + + if (!parent) + return ERR_PTR(-EINVAL); + + ti_parent = get_tracefs(parent->d_inode); + ei_parent = ti_parent->private; + + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); + if (IS_ERR(ef)) + return ef; + + mutex_lock(&eventfs_mutex); + list_add_tail(&ef->list, &ei_parent->e_top_files); + mutex_unlock(&eventfs_mutex); + return ef; +} + +/** + * eventfs_add_dir - add eventfs dir to list to create later + * @name: the name of the file to create. + * @ef_parent: parent eventfs_file for this dir. + * + * This function adds eventfs dir to list. + * And all these dirs are created on the fly when they are looked up, + * and the dentry and inodes will be removed when they are done. 
+ */ +struct eventfs_file *eventfs_add_dir(const char *name, + struct eventfs_file *ef_parent) +{ + struct eventfs_file *ef; + + if (!ef_parent) + return ERR_PTR(-EINVAL); + + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL); + if (IS_ERR(ef)) + return ef; + + mutex_lock(&eventfs_mutex); + list_add_tail(&ef->list, &ef_parent->ei->e_top_files); + mutex_unlock(&eventfs_mutex); + return ef; +} diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 7dfb7ebc1c3f..f0fd565d59ec 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -2,6 +2,10 @@ #ifndef _TRACEFS_INTERNAL_H #define _TRACEFS_INTERNAL_H
+enum { + TRACEFS_EVENT_INODE = BIT(1), +}; + struct tracefs_inode { unsigned long flags; void *private; diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 99912445974c..432e5e6f7901 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -21,6 +21,17 @@ struct file_operations;
#ifdef CONFIG_TRACING
+struct eventfs_file; + +struct dentry *eventfs_create_events_dir(const char *name, + struct dentry *parent); + +struct eventfs_file *eventfs_add_subsystem_dir(const char *name, + struct dentry *parent); + +struct eventfs_file *eventfs_add_dir(const char *name, + struct eventfs_file *ef_parent); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
Add the following functions to add files to eventfs:
eventfs_add_events_file() to add the data needed to create a specific file located at the top level events directory. The dentry/inode will be created when the events directory is scanned.
eventfs_add_file() to add the data needed for files within the directories below the top level events directory. The dentry/inode of the file will be created when the directory that the file is in is scanned.
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
Reported-by: kernel test robot lkp@intel.com
Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com
---
 fs/tracefs/event_inode.c | 86 ++++++++++++++++++++++++++++++++++++++++
 include/linux/tracefs.h  |  8 ++++
 2 files changed, 94 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 8f334b122e46..9e4843be9dc9 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -209,3 +209,89 @@ struct eventfs_file *eventfs_add_dir(const char *name, mutex_unlock(&eventfs_mutex); return ef; } + +/** + * eventfs_add_events_file - add the data needed to create a file for later reference + * @name: the name of the file to create. + * @mode: the permission that the file should have. + * @parent: parent dentry for this file. + * @data: something that the caller will want to get to later on. + * @fop: struct file_operations that should be used for this file. + * + * This function is used to add the information needed to create a + * dentry/inode within the top level events directory. The file created + * will have the @mode permissions. The @data will be used to fill the + * inode.i_private when the open() call is done. The dentry and inodes are + * all created when they are referenced, and removed when they are no + * longer referenced. + */ +int eventfs_add_events_file(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fop) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + + if (!parent) + return -EINVAL; + + if (!(mode & S_IFMT)) + mode |= S_IFREG; + + if (!parent->d_inode) + return -EINVAL; + + ti = get_tracefs(parent->d_inode); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return -EINVAL; + + ei = ti->private; + ef = eventfs_prepare_ef(name, mode, fop, NULL, data); + + if (IS_ERR(ef)) + return -ENOMEM; + + mutex_lock(&eventfs_mutex); + list_add_tail(&ef->list, &ei->e_top_files); + mutex_unlock(&eventfs_mutex); + return 0; +} + +/** + * eventfs_add_file - add eventfs file to list to create later + * @name: the name of the file to create. + * @mode: the permission that the file should have. + * @ef_parent: parent eventfs_file for this file. 
+ * @data: something that the caller will want to get to later on. + * @fop: struct file_operations that should be used for this file. + * + * This function is used to add the information needed to create a + * file within a subdirectory of the events directory. The file created + * will have the @mode permissions. The @data will be used to fill the + * inode.i_private when the open() call is done. The dentry and inodes are + * all created when they are referenced, and removed when they are no + * longer referenced. + */ +int eventfs_add_file(const char *name, umode_t mode, + struct eventfs_file *ef_parent, + void *data, + const struct file_operations *fop) +{ + struct eventfs_file *ef; + + if (!ef_parent) + return -EINVAL; + + if (!(mode & S_IFMT)) + mode |= S_IFREG; + + ef = eventfs_prepare_ef(name, mode, fop, NULL, data); + if (IS_ERR(ef)) + return -ENOMEM; + + mutex_lock(&eventfs_mutex); + list_add_tail(&ef->list, &ef_parent->ei->e_top_files); + mutex_unlock(&eventfs_mutex); + return 0; +} diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 432e5e6f7901..54c9cbd0389b 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -32,6 +32,14 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name, struct eventfs_file *eventfs_add_dir(const char *name, struct eventfs_file *ef_parent);
+int eventfs_add_file(const char *name, umode_t mode, + struct eventfs_file *ef_parent, void *data, + const struct file_operations *fops); + +int eventfs_add_events_file(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fops); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
Add the inode_operations, file_operations, and helper functions to eventfs:
  dcache_dir_open_wrapper()
  eventfs_root_lookup()
  eventfs_release()
  eventfs_set_ef_status_free()
  eventfs_post_create_dir()
The inode_operations and file_operations functions will be called from the VFS layer.
create_file() and create_dir() are added as stub functions and will be filled in later.
Signed-off-by: Ajay Kaher akaher@vmware.com
Co-developed-by: Steven Rostedt (VMware) rostedt@goodmis.org
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org
Tested-by: Ching-lin Yu chinglinyu@google.com
---
 fs/tracefs/event_inode.c | 304 +++++++++++++++++++++++++++++++++++++++
 fs/tracefs/internal.h    |   2 +
 2 files changed, 306 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 9e4843be9dc9..24d645c61029 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -31,6 +31,8 @@ struct eventfs_inode { * struct eventfs_file - hold the properties of the eventfs files and * directories. * @name: the name of the file or directory to create + * @d_parent: holds parent's dentry + * @dentry: once accessed holds dentry * @list: file or directory to be added to parent directory * @ei: list of files and directories within directory * @fop: file_operations for file or directory @@ -40,22 +42,320 @@ struct eventfs_inode { */ struct eventfs_file { const char *name; + struct dentry *d_parent; + struct dentry *dentry; struct list_head list; struct eventfs_inode *ei; const struct file_operations *fop; const struct inode_operations *iop; + union { + struct list_head del_list; /* list of eventfs_file to delete */ + struct rcu_head rcu; /* eventfs_file to delete */ + unsigned long is_freed; /* Freed if one of the above is set */ + }; void *data; umode_t mode; };
static DEFINE_MUTEX(eventfs_mutex); +DEFINE_STATIC_SRCU(eventfs_srcu); + +static struct dentry *eventfs_root_lookup(struct inode *dir, + struct dentry *dentry, + unsigned int flags); +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file); +static int eventfs_release(struct inode *inode, struct file *file);
static const struct inode_operations eventfs_root_dir_inode_operations = { + .lookup = eventfs_root_lookup, };
static const struct file_operations eventfs_file_operations = { + .open = dcache_dir_open_wrapper, + .read = generic_read_dir, + .iterate_shared = dcache_readdir, + .llseek = generic_file_llseek, + .release = eventfs_release, +}; + +/** + * create_file - create a file in the tracefs filesystem + * @name: the name of the file to create. + * @mode: the permission that the file should have. + * @parent: parent dentry for this file. + * @data: something that the caller will want to get to later on. + * @fop: struct file_operations that should be used for this file. + * + * This is the basic "create a file" function for tracefs. It allows for a + * wide range of flexibility in creating a file. + * + * This function will return a pointer to a dentry if it succeeds. This + * pointer must be passed to the tracefs_remove() function when the file is + * to be removed (no automatic cleanup happens if your module is unloaded, + * you are responsible here.) If an error occurs, %NULL will be returned. + * + * If tracefs is not enabled in the kernel, the value -%ENODEV will be + * returned. + */ +static struct dentry *create_file(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fop) +{ + return NULL; };
+/** + * create_dir - create a dir in the tracefs filesystem + * @name: the name of the file to create. + * @parent: parent dentry for this file. + * @data: something that the caller will want to get to later on. + * + * This is the basic "create a dir" function for eventfs. It allows for a + * wide range of flexibility in creating a dir. + * + * This function will return a pointer to a dentry if it succeeds. This + * pointer must be passed to the tracefs_remove() function when the file is + * to be removed (no automatic cleanup happens if your module is unloaded, + * you are responsible here.) If an error occurs, %NULL will be returned. + * + * If tracefs is not enabled in the kernel, the value -%ENODEV will be + * returned. + */ +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data) +{ + return NULL; +} + +/** + * eventfs_set_ef_status_free - set the ef->status to free + * @dentry: dentry who's status to be freed + * + * eventfs_set_ef_status_free will be called if no more + * references remain + */ +void eventfs_set_ef_status_free(struct dentry *dentry) +{ + struct tracefs_inode *ti_parent; + struct eventfs_file *ef; + + mutex_lock(&eventfs_mutex); + ti_parent = get_tracefs(dentry->d_parent->d_inode); + if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE)) + goto out; + + ef = dentry->d_fsdata; + if (!ef) + goto out; + + dentry->d_fsdata = NULL; + ef->dentry = NULL; +out: + mutex_unlock(&eventfs_mutex); +} + +/** + * eventfs_post_create_dir - post create dir routine + * @ef: eventfs_file of recently created dir + * + * Map the meta-data of files within an eventfs dir to their parent dentry + */ +static void eventfs_post_create_dir(struct eventfs_file *ef) +{ + struct eventfs_file *ef_child; + struct tracefs_inode *ti; + + /* srcu lock already held */ + /* fill parent-child relation */ + list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, + srcu_read_lock_held(&eventfs_srcu)) { + ef_child->d_parent = ef->dentry; + 
} + + ti = get_tracefs(ef->dentry->d_inode); + ti->private = ef->ei; +} + +/** + * create_dentry - helper function to create dentry + * @ef: eventfs_file of file or directory to create + * @parent: parent dentry + * @lookup: true if called from lookup routine + * + * Used to create a dentry for file/dir, executes post dentry creation routine + */ +static struct dentry * +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) +{ + bool invalidate = false; + struct dentry *dentry; + + mutex_lock(&eventfs_mutex); + if (ef->is_freed) { + mutex_unlock(&eventfs_mutex); + return NULL; + } + if (ef->dentry) { + dentry = ef->dentry; + /* On dir open, up the ref count */ + if (!lookup) + dget(dentry); + mutex_unlock(&eventfs_mutex); + return dentry; + } + mutex_unlock(&eventfs_mutex); + + if (!lookup) + inode_lock(parent->d_inode); + + if (ef->ei) + dentry = create_dir(ef->name, parent, ef->data); + else + dentry = create_file(ef->name, ef->mode, parent, + ef->data, ef->fop); + + if (!lookup) + inode_unlock(parent->d_inode); + + mutex_lock(&eventfs_mutex); + if (IS_ERR_OR_NULL(dentry)) { + /* If the ef was already updated get it */ + dentry = ef->dentry; + if (dentry && !lookup) + dget(dentry); + mutex_unlock(&eventfs_mutex); + return dentry; + } + + if (!ef->dentry && !ef->is_freed) { + ef->dentry = dentry; + if (ef->ei) + eventfs_post_create_dir(ef); + dentry->d_fsdata = ef; + } else { + /* A race here, should try again (unless freed) */ + invalidate = true; + } + mutex_unlock(&eventfs_mutex); + if (invalidate) + d_invalidate(dentry); + + if (lookup || invalidate) + dput(dentry); + + return invalidate ? 
NULL : dentry; +} + +static bool match_event_file(struct eventfs_file *ef, const char *name) +{ + bool ret; + + mutex_lock(&eventfs_mutex); + ret = !ef->is_freed && strcmp(ef->name, name) == 0; + mutex_unlock(&eventfs_mutex); + + return ret; +} + +/** + * eventfs_root_lookup - lookup routine to create file/dir + * @dir: in which a lookup is being done + * @dentry: file/dir dentry + * @flags: to pass as flags parameter to simple lookup + * + * Used to create a dynamic file/dir within @dir. Use the eventfs_inode + * list of meta data to find the information needed to create the file/dir. + */ +static struct dentry *eventfs_root_lookup(struct inode *dir, + struct dentry *dentry, + unsigned int flags) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + struct dentry *ret = NULL; + int idx; + + ti = get_tracefs(dir); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return NULL; + + ei = ti->private; + idx = srcu_read_lock(&eventfs_srcu); + list_for_each_entry_srcu(ef, &ei->e_top_files, list, + srcu_read_lock_held(&eventfs_srcu)) { + if (!match_event_file(ef, dentry->d_name.name)) + continue; + ret = simple_lookup(dir, dentry, flags); + create_dentry(ef, ef->d_parent, true); + break; + } + srcu_read_unlock(&eventfs_srcu, idx); + return ret; +} + +/** + * eventfs_release - called to release eventfs file/dir + * @inode: inode to be released + * @file: file to be released (not used) + */ +static int eventfs_release(struct inode *inode, struct file *file) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + struct dentry *dentry; + int idx; + + ti = get_tracefs(inode); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return -EINVAL; + + ei = ti->private; + idx = srcu_read_lock(&eventfs_srcu); + list_for_each_entry_srcu(ef, &ei->e_top_files, list, + srcu_read_lock_held(&eventfs_srcu)) { + mutex_lock(&eventfs_mutex); + dentry = ef->dentry; + mutex_unlock(&eventfs_mutex); + if (dentry) + dput(dentry); + } + 
srcu_read_unlock(&eventfs_srcu, idx); + return dcache_dir_close(inode, file); +} + +/** + * dcache_dir_open_wrapper - eventfs open wrapper + * @inode: not used + * @file: dir to be opened (to create its child) + * + * Used to dynamically create the file/dir within @file. @file is really a + * directory and all the files/dirs of the children within @file will be + * created. If any of the files/dirs have already been created, their + * reference count will be incremented. + */ +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + struct eventfs_file *ef; + struct dentry *dentry = file_dentry(file); + struct inode *f_inode = file_inode(file); + int idx; + + ti = get_tracefs(f_inode); + if (!(ti->flags & TRACEFS_EVENT_INODE)) + return -EINVAL; + + ei = ti->private; + idx = srcu_read_lock(&eventfs_srcu); + list_for_each_entry_rcu(ef, &ei->e_top_files, list) { + create_dentry(ef, dentry, false); + } + srcu_read_unlock(&eventfs_srcu, idx); + return dcache_dir_open(inode, file); +} + /** * eventfs_prepare_ef - helper function to prepare eventfs_file * @name: the name of the file/directory to create. @@ -179,6 +479,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
mutex_lock(&eventfs_mutex); list_add_tail(&ef->list, &ei_parent->e_top_files); + ef->d_parent = parent; mutex_unlock(&eventfs_mutex); return ef; } @@ -206,6 +507,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
mutex_lock(&eventfs_mutex); list_add_tail(&ef->list, &ef_parent->ei->e_top_files); + ef->d_parent = ef_parent->dentry; mutex_unlock(&eventfs_mutex); return ef; } @@ -254,6 +556,7 @@ int eventfs_add_events_file(const char *name, umode_t mode,
mutex_lock(&eventfs_mutex); list_add_tail(&ef->list, &ei->e_top_files); + ef->d_parent = parent; mutex_unlock(&eventfs_mutex); return 0; } @@ -292,6 +595,7 @@ int eventfs_add_file(const char *name, umode_t mode,
mutex_lock(&eventfs_mutex); list_add_tail(&ef->list, &ef_parent->ei->e_top_files); + ef->d_parent = ef_parent->dentry; mutex_unlock(&eventfs_mutex); return 0; } diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index f0fd565d59ec..9bfad9d95a4a 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -21,4 +21,6 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent); struct dentry *tracefs_end_creating(struct dentry *dentry); struct dentry *tracefs_failed_creating(struct dentry *dentry); struct inode *tracefs_get_inode(struct super_block *sb); +void eventfs_set_ef_status_free(struct dentry *dentry); + #endif /* _TRACEFS_INTERNAL_H */
Add create_file() and create_dir() functions to create the files and directories respectively when they are accessed. The functions will be called from the lookup operation of the inode_operations or from the open function of file_operations.
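The create-on-access idea can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct lazy_ef, struct lazy_dentry, lazy_created and lazy_lookup() are hypothetical names, not the kernel's struct eventfs_file, struct dentry or lookup code, and no locking is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dentry; not the kernel structure. */
struct lazy_dentry {
	const char *name;
};

/* Hypothetical stand-in for the per-file meta data. */
struct lazy_ef {
	const char *name;           /* meta data that always exists */
	struct lazy_dentry *dentry; /* NULL until the entry is first accessed */
};

static int lazy_created; /* counts how many dentries were really allocated */

/* Return the cached dentry for @name, creating it only on first access. */
static struct lazy_dentry *lazy_lookup(struct lazy_ef *files, size_t n,
				       const char *name)
{
	for (size_t i = 0; i < n; i++) {
		struct lazy_ef *ef = &files[i];

		if (strcmp(ef->name, name) != 0)
			continue;
		if (!ef->dentry) {
			ef->dentry = malloc(sizeof(*ef->dentry));
			if (!ef->dentry)
				return NULL;
			ef->dentry->name = ef->name;
			lazy_created++;
		}
		return ef->dentry;
	}
	return NULL; /* no meta data for this name */
}
```

In this sketch a second lookup of the same name returns the cached object instead of allocating again, which is the property that create_dentry() provides through ef->dentry.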
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
---
 fs/tracefs/event_inode.c | 61 +++++++++++++++++++++++++++++++--
 fs/tracefs/inode.c       | 74 ++++++++++++++++++++++++++++++++++++++++
 fs/tracefs/internal.h    |  3 ++
 3 files changed, 136 insertions(+), 2 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 24d645c61029..5240bd2c81e7 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -101,7 +101,34 @@ static struct dentry *create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fop) { - return NULL; + struct tracefs_inode *ti; + struct dentry *dentry; + struct inode *inode; + + if (!(mode & S_IFMT)) + mode |= S_IFREG; + + if (WARN_ON_ONCE(!S_ISREG(mode))) + return NULL; + + dentry = eventfs_start_creating(name, parent); + + if (IS_ERR(dentry)) + return dentry; + + inode = tracefs_get_inode(dentry->d_sb); + if (unlikely(!inode)) + return eventfs_failed_creating(dentry); + + inode->i_mode = mode; + inode->i_fop = fop; + inode->i_private = data; + + ti = get_tracefs(inode); + ti->flags |= TRACEFS_EVENT_INODE; + d_instantiate(dentry, inode); + fsnotify_create(dentry->d_parent->d_inode, dentry); + return eventfs_end_creating(dentry); };
/** @@ -123,7 +150,31 @@ static struct dentry *create_file(const char *name, umode_t mode, */ static struct dentry *create_dir(const char *name, struct dentry *parent, void *data) { - return NULL; + struct tracefs_inode *ti; + struct dentry *dentry; + struct inode *inode; + + dentry = eventfs_start_creating(name, parent); + if (IS_ERR(dentry)) + return dentry; + + inode = tracefs_get_inode(dentry->d_sb); + if (unlikely(!inode)) + return eventfs_failed_creating(dentry); + + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; + inode->i_op = &eventfs_root_dir_inode_operations; + inode->i_fop = &eventfs_file_operations; + inode->i_private = data; + + ti = get_tracefs(inode); + ti->flags |= TRACEFS_EVENT_INODE; + + inc_nlink(inode); + d_instantiate(dentry, inode); + inc_nlink(dentry->d_parent->d_inode); + fsnotify_mkdir(dentry->d_parent->d_inode, dentry); + return eventfs_end_creating(dentry); }
/** @@ -234,6 +285,12 @@ create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup) } else { /* A race here, should try again (unless freed) */ invalidate = true; + + /* + * Should never happen unless we get here due to being freed. + * Otherwise it means two dentries exist with the same name. + */ + WARN_ON_ONCE(!ef->is_freed); } mutex_unlock(&eventfs_mutex); if (invalidate) diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 4acc4b4dfd22..d9273066f25f 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -474,6 +474,80 @@ struct dentry *tracefs_end_creating(struct dentry *dentry) return dentry; }
+/** + * eventfs_start_creating - start the process of creating a dentry + * @name: Name of the file created for the dentry + * @parent: The parent dentry where this dentry will be created + * + * This is a simple helper function for the dynamically created eventfs + * files. When the directory of the eventfs files are accessed, their + * dentries are created on the fly. This function is used to start that + * process. + */ +struct dentry *eventfs_start_creating(const char *name, struct dentry *parent) +{ + struct dentry *dentry; + int error; + + error = simple_pin_fs(&trace_fs_type, &tracefs_mount, + &tracefs_mount_count); + if (error) + return ERR_PTR(error); + + /* + * If the parent is not specified, we create it in the root. + * We need the root dentry to do this, which is in the super + * block. A pointer to that is in the struct vfsmount that we + * have around. + */ + if (!parent) + parent = tracefs_mount->mnt_root; + + if (unlikely(IS_DEADDIR(parent->d_inode))) + dentry = ERR_PTR(-ENOENT); + else + dentry = lookup_one_len(name, parent, strlen(name)); + + if (!IS_ERR(dentry) && dentry->d_inode) { + dput(dentry); + dentry = ERR_PTR(-EEXIST); + } + + if (IS_ERR(dentry)) + simple_release_fs(&tracefs_mount, &tracefs_mount_count); + + return dentry; +} + +/** + * eventfs_failed_creating - clean up a failed eventfs dentry creation + * @dentry: The dentry to clean up + * + * If after calling eventfs_start_creating(), a failure is detected, the + * resources created by eventfs_start_creating() needs to be cleaned up. In + * that case, this function should be called to perform that clean up. + */ +struct dentry *eventfs_failed_creating(struct dentry *dentry) +{ + dput(dentry); + simple_release_fs(&tracefs_mount, &tracefs_mount_count); + return NULL; +} + +/** + * eventfs_end_creating - Finish the process of creating a eventfs dentry + * @dentry: The dentry that has successfully been created. 
+ * + * This function is currently just a place holder to match + * eventfs_start_creating(). In case any synchronization needs to be added, + * this function will be used to implement that without having to modify + * the callers of eventfs_start_creating(). + */ +struct dentry *eventfs_end_creating(struct dentry *dentry) +{ + return dentry; +} + /** * tracefs_create_file - create a file in the tracefs filesystem * @name: a pointer to a string containing the name of the file to create. diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h index 9bfad9d95a4a..69c2b1d87c46 100644 --- a/fs/tracefs/internal.h +++ b/fs/tracefs/internal.h @@ -21,6 +21,9 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent); struct dentry *tracefs_end_creating(struct dentry *dentry); struct dentry *tracefs_failed_creating(struct dentry *dentry); struct inode *tracefs_get_inode(struct super_block *sb); +struct dentry *eventfs_start_creating(const char *name, struct dentry *parent); +struct dentry *eventfs_failed_creating(struct dentry *dentry); +struct dentry *eventfs_end_creating(struct dentry *dentry); void eventfs_set_ef_status_free(struct dentry *dentry);
#endif /* _TRACEFS_INTERNAL_H */
When events are removed from tracefs, eventfs must be made aware of this. eventfs_remove() removes the meta data from eventfs so that it will no longer create the files associated with that event.
When an instance is removed from tracefs, eventfs_remove_events_dir() will remove and clean up the entire "events" directory.
The helper function eventfs_remove_rec() is used to clean up and free the associated data from eventfs for both of the added functions. SRCU is used to protect the lists of meta data stored in the eventfs. The eventfs_mutex is used to protect the content of the items in the list.
As lookups may be in progress while events are being deleted, the freeing of the dentries/inodes and related information is done after the SRCU grace period has passed.
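The two-phase removal described above (collect everything first, free it later) can be sketched in userspace C. This is a simplified illustration, not the kernel code: struct rm_node and the rm_*() helpers are hypothetical, and the SRCU grace period is only modeled by the caller choosing when to call rm_free_pending().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RM_MAX_LEVEL 3 /* events/ -> group/ -> event/ -> file */

/* Hypothetical node: a simplified stand-in for the eventfs meta data. */
struct rm_node {
	struct rm_node *first_child;
	struct rm_node *next_sibling;
	struct rm_node *next_del; /* link on the pending-deletion list */
};

/* Allocate a node and, if @parent is given, link it in as a child. */
static struct rm_node *rm_add_child(struct rm_node *parent)
{
	struct rm_node *n = calloc(1, sizeof(*n));

	if (n && parent) {
		n->next_sibling = parent->first_child;
		parent->first_child = n;
	}
	return n;
}

/*
 * Phase 1: walk the subtree and queue every node on *del_list.
 * Returns 0 on success, -1 if the tree is deeper than expected.
 */
static int rm_remove_rec(struct rm_node *node, struct rm_node **del_list,
			 int level)
{
	if (!node)
		return 0;
	if (level > RM_MAX_LEVEL)
		return -1;

	for (struct rm_node *c = node->first_child; c; c = c->next_sibling) {
		if (rm_remove_rec(c, del_list, level + 1))
			return -1;
	}
	node->next_del = *del_list;
	*del_list = node;
	return 0;
}

/* Phase 2: free everything queued; call only once no reader can see it. */
static size_t rm_free_pending(struct rm_node **del_list)
{
	size_t freed = 0;

	while (*del_list) {
		struct rm_node *node = *del_list;

		*del_list = node->next_del;
		free(node);
		freed++;
	}
	return freed;
}
```

Keeping the unlink step separate from the free step is what lets a reader that is still walking the list finish safely; in the patch that separation is provided by call_srcu() rather than by an explicit second call.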
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305030611.Kas747Ev-lkp@intel.com/
---
 fs/tracefs/event_inode.c | 143 +++++++++++++++++++++++++++++++++++++++
 include/linux/tracefs.h  |   4 ++
 2 files changed, 147 insertions(+)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 5240bd2c81e7..2db7f1028f75 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -198,6 +198,14 @@ void eventfs_set_ef_status_free(struct dentry *dentry) if (!ef) goto out;
+ /* + * If ef was freed, then the LSB bit is set for d_fsdata. + * But this should not happen, as it should still have a + * ref count that prevents it. Warn in case it does. + */ + if (WARN_ON_ONCE((unsigned long)ef & 1)) + goto out; + dentry->d_fsdata = NULL; ef->dentry = NULL; out: @@ -656,3 +664,138 @@ int eventfs_add_file(const char *name, umode_t mode, mutex_unlock(&eventfs_mutex); return 0; } + +static void free_ef(struct rcu_head *head) +{ + struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu); + + kfree(ef->name); + kfree(ef->ei); + kfree(ef); +} + +/** + * eventfs_remove_rec - remove eventfs dir or file from list + * @ef: eventfs_file to be removed. + * @head: to create list of eventfs_file to be deleted + * @level: to check recursion depth + * + * The helper function eventfs_remove_rec() is used to clean up and free the + * associated data from eventfs for both of the added functions. + */ +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level) +{ + struct eventfs_file *ef_child; + + if (!ef) + return; + /* + * Check recursion depth. It should never be greater than 3: + * 0 - events/ + * 1 - events/group/ + * 2 - events/group/event/ + * 3 - events/group/event/file + */ + if (WARN_ON_ONCE(level > 3)) + return; + + if (ef->ei) { + /* search for nested folders or files */ + list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list, + lockdep_is_held(&eventfs_mutex)) { + eventfs_remove_rec(ef_child, head, level + 1); + } + } + + list_del_rcu(&ef->list); + list_add_tail(&ef->del_list, head); +} + +/** + * eventfs_remove - remove eventfs dir or file from list + * @ef: eventfs_file to be removed. 
+ * + * This function acquire the eventfs_mutex lock and call eventfs_remove_rec() + */ +void eventfs_remove(struct eventfs_file *ef) +{ + struct eventfs_file *tmp; + LIST_HEAD(ef_del_list); + struct dentry *dentry_list = NULL; + struct dentry *dentry; + + if (!ef) + return; + + mutex_lock(&eventfs_mutex); + eventfs_remove_rec(ef, &ef_del_list, 0); + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) { + if (ef->dentry) { + unsigned long ptr = (unsigned long)dentry_list; + + /* Keep the dentry from being freed yet */ + dget(ef->dentry); + + /* + * Paranoid: The dget() above should prevent the dentry + * from being freed and calling eventfs_set_ef_status_free(). + * But just in case, set the link list LSB pointer to 1 + * and have eventfs_set_ef_status_free() check that to + * make sure that if it does happen, it will not think + * the d_fsdata is an event_file. + * + * For this to work, no event_file should be allocated + * on a odd space, as the ef should always be allocated + * to be at least word aligned. Check for that too. + */ + WARN_ON_ONCE(ptr & 1); + + ef->dentry->d_fsdata = (void *)(ptr | 1); + dentry_list = ef->dentry; + ef->dentry = NULL; + } + call_srcu(&eventfs_srcu, &ef->rcu, free_ef); + } + mutex_unlock(&eventfs_mutex); + + while (dentry_list) { + unsigned long ptr; + + dentry = dentry_list; + ptr = (unsigned long)dentry->d_fsdata & ~1UL; + dentry_list = (struct dentry *)ptr; + dentry->d_fsdata = NULL; + d_invalidate(dentry); + mutex_lock(&eventfs_mutex); + /* dentry should now have at least a single reference */ + WARN_ONCE((int)d_count(dentry) < 1, + "dentry %p less than one reference (%d) after invalidate\n", + dentry, d_count(dentry)); + mutex_unlock(&eventfs_mutex); + dput(dentry); + } +} + +/** + * eventfs_remove_events_dir - remove eventfs dir or file from list + * @dentry: events's dentry to be removed. 
+ * + * This function remove events main directory + */ +void eventfs_remove_events_dir(struct dentry *dentry) +{ + struct tracefs_inode *ti; + struct eventfs_inode *ei; + + if (!dentry || !dentry->d_inode) + return; + + ti = get_tracefs(dentry->d_inode); + if (!ti || !(ti->flags & TRACEFS_EVENT_INODE)) + return; + + ei = ti->private; + d_invalidate(dentry); + dput(dentry); + kfree(ei); +} diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h index 54c9cbd0389b..009072792fa3 100644 --- a/include/linux/tracefs.h +++ b/include/linux/tracefs.h @@ -40,6 +40,10 @@ int eventfs_add_events_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
+void eventfs_remove(struct eventfs_file *ef); + +void eventfs_remove_events_dir(struct dentry *dentry); + struct dentry *tracefs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops);
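eventfs_remove() in the patch above chains the dentries to be invalidated through d_fsdata and sets the least significant bit so that the value cannot be mistaken for an eventfs_file pointer. A minimal userspace sketch of that pointer-tagging trick follows; struct tag_dentry and the tag_*() helpers are hypothetical names used only for illustration, and the sketch assumes objects are at least 2-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dentry; only the private-data slot matters. */
struct tag_dentry {
	void *fsdata;
};

/* Push @d on the list, storing the old head in d->fsdata with bit 0 set. */
static struct tag_dentry *tag_push(struct tag_dentry *head,
				   struct tag_dentry *d)
{
	uintptr_t ptr = (uintptr_t)head;

	assert((ptr & 1) == 0); /* relies on at least 2-byte alignment */
	d->fsdata = (void *)(ptr | 1);
	return d;
}

/* True if fsdata holds a tagged list link rather than real private data. */
static int tag_is_link(const struct tag_dentry *d)
{
	return ((uintptr_t)d->fsdata & 1) != 0;
}

/* Pop the head, clear its fsdata, and return the next element. */
static struct tag_dentry *tag_pop(struct tag_dentry *head)
{
	uintptr_t ptr = (uintptr_t)head->fsdata & ~(uintptr_t)1;

	head->fsdata = NULL;
	return (struct tag_dentry *)ptr;
}
```

Because even the end of the list is stored as a tagged NULL (the value 1), every queued entry is distinguishable from one that still carries real private data, which is the check eventfs_set_ef_status_free() performs with WARN_ON_ONCE().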
Up until now, /sys/kernel/tracing/events was no different than any other part of tracefs. The files and directories within the events directory were created when tracefs was mounted, and were also created for the instances in /sys/kernel/tracing/instances/<instance>/events. Most of these files and directories will never be referenced. Since there are thousands of these files and directories, they spend their time wasting precious memory resources.
Move the "events" directory to the new eventfs. The eventfs will take the meta data of the events that they represent and store that. When the files in the events directory are referenced, the dentry and inodes to represent them are then created. When the files are no longer referenced, they are freed. This saves the precious memory resources that were wasted on these seldom referenced dentries and inodes.
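The lifetime rule described above (the object exists only while referenced, the meta data stays) can be sketched with a simple reference count in userspace C. The names struct ref_ef, struct ref_dentry, ref_get() and ref_put() are hypothetical and only loosely mirror dget()/dput() and the d_iput callback; no locking is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ref_ef;

/* Hypothetical stand-in for a dentry with a reference count. */
struct ref_dentry {
	int refcount;
	struct ref_ef *owner;
};

/* Hypothetical stand-in for the persistent meta data. */
struct ref_ef {
	const char *name;
	struct ref_dentry *dentry; /* NULL whenever nothing references it */
};

/* Take a reference, creating the dentry from the meta data if needed. */
static struct ref_dentry *ref_get(struct ref_ef *ef)
{
	if (!ef->dentry) {
		ef->dentry = calloc(1, sizeof(*ef->dentry));
		if (!ef->dentry)
			return NULL;
		ef->dentry->owner = ef;
	}
	ef->dentry->refcount++;
	return ef->dentry;
}

/* Drop a reference; on the last one free the dentry but keep the meta data. */
static void ref_put(struct ref_dentry *d)
{
	if (--d->refcount > 0)
		return;
	d->owner->dentry = NULL; /* comparable to clearing ef->dentry */
	free(d);
}
```

After the last ref_put() the meta data is still usable, so a later ref_get() simply recreates the object, which is how the events directory can free dentries and inodes without losing the ability to serve the next access.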
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
---
 fs/tracefs/inode.c           | 18 ++++++++++
 include/linux/trace_events.h |  1 +
 kernel/trace/trace.h         |  2 +-
 kernel/trace/trace_events.c  | 65 ++++++++++++++++++------------------
 4 files changed, 53 insertions(+), 33 deletions(-)
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index d9273066f25f..bb6de89eb446 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -374,6 +374,23 @@ static const struct super_operations tracefs_super_operations = {
 	.show_options = tracefs_show_options,
 };
 
+static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
+{
+	struct tracefs_inode *ti;
+
+	if (!dentry || !inode)
+		return;
+
+	ti = get_tracefs(inode);
+	if (ti && ti->flags & TRACEFS_EVENT_INODE)
+		eventfs_set_ef_status_free(dentry);
+	iput(inode);
+}
+
+static const struct dentry_operations tracefs_dentry_operations = {
+	.d_iput = tracefs_dentry_iput,
+};
+
 static int trace_fill_super(struct super_block *sb, void *data, int silent)
 {
 	static const struct tree_descr trace_files[] = {{""}};
@@ -396,6 +413,7 @@ static int trace_fill_super(struct super_block *sb, void *data, int silent)
 		goto fail;
 
 	sb->s_op = &tracefs_super_operations;
+	sb->s_d_op = &tracefs_dentry_operations;
 
 	tracefs_apply_options(sb, false);
 
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 3930e676436c..c17623c78029 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -638,6 +638,7 @@ struct trace_event_file { struct list_head list; struct trace_event_call *event_call; struct event_filter __rcu *filter; + struct eventfs_file *ef; struct dentry *dir; struct trace_array *tr; struct trace_subsystem_dir *system; diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index e1edc2197fc8..956938357774 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1324,7 +1324,7 @@ struct trace_subsystem_dir { struct list_head list; struct event_subsystem *subsystem; struct trace_array *tr; - struct dentry *entry; + struct eventfs_file *ef; int ref_count; int nr_events; }; diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index a284171d5c74..85f9a99bb506 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -990,7 +990,7 @@ static void remove_subsystem(struct trace_subsystem_dir *dir) return;
if (!--dir->nr_events) { - tracefs_remove(dir->entry); + eventfs_remove(dir->ef); list_del(&dir->list); __put_system_dir(dir); } @@ -1011,7 +1011,7 @@ static void remove_event_file_dir(struct trace_event_file *file)
tracefs_remove(dir); } - + eventfs_remove(file->ef); list_del(&file->list); remove_subsystem(file->system); free_event_filter(file->filter); @@ -2297,13 +2297,13 @@ create_new_subsystem(const char *name) return NULL; }
-static struct dentry * +static struct eventfs_file * event_subsystem_dir(struct trace_array *tr, const char *name, struct trace_event_file *file, struct dentry *parent) { struct event_subsystem *system, *iter; struct trace_subsystem_dir *dir; - struct dentry *entry; + int res;
/* First see if we did not already create this dir */ list_for_each_entry(dir, &tr->systems, list) { @@ -2311,7 +2311,7 @@ event_subsystem_dir(struct trace_array *tr, const char *name, if (strcmp(system->name, name) == 0) { dir->nr_events++; file->system = dir; - return dir->entry; + return dir->ef; } }
@@ -2335,8 +2335,8 @@ event_subsystem_dir(struct trace_array *tr, const char *name, } else __get_system(system);
- dir->entry = tracefs_create_dir(name, parent); - if (!dir->entry) { + dir->ef = eventfs_add_subsystem_dir(name, parent); + if (IS_ERR(dir->ef)) { pr_warn("Failed to create system directory %s\n", name); __put_system(system); goto out_free; @@ -2351,22 +2351,22 @@ event_subsystem_dir(struct trace_array *tr, const char *name, /* the ftrace system is special, do not create enable or filter files */ if (strcmp(name, "ftrace") != 0) {
- entry = tracefs_create_file("filter", TRACE_MODE_WRITE, - dir->entry, dir, + res = eventfs_add_file("filter", TRACE_MODE_WRITE, + dir->ef, dir, &ftrace_subsystem_filter_fops); - if (!entry) { + if (res) { kfree(system->filter); system->filter = NULL; pr_warn("Could not create tracefs '%s/filter' entry\n", name); }
- trace_create_file("enable", TRACE_MODE_WRITE, dir->entry, dir, + eventfs_add_file("enable", TRACE_MODE_WRITE, dir->ef, dir, &ftrace_system_enable_fops); }
list_add(&dir->list, &tr->systems);
- return dir->entry; + return dir->ef;
out_free: kfree(dir); @@ -2419,8 +2419,8 @@ static int event_create_dir(struct dentry *parent, struct trace_event_file *file) { struct trace_event_call *call = file->event_call; + struct eventfs_file *ef_subsystem = NULL; struct trace_array *tr = file->tr; - struct dentry *d_events; const char *name; int ret;
@@ -2432,24 +2432,24 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) if (WARN_ON_ONCE(strcmp(call->class->system, TRACE_SYSTEM) == 0)) return -ENODEV;
- d_events = event_subsystem_dir(tr, call->class->system, file, parent); - if (!d_events) + ef_subsystem = event_subsystem_dir(tr, call->class->system, file, parent); + if (!ef_subsystem) return -ENOMEM;
name = trace_event_name(call); - file->dir = tracefs_create_dir(name, d_events); - if (!file->dir) { + file->ef = eventfs_add_dir(name, ef_subsystem); + if (IS_ERR(file->ef)) { pr_warn("Could not create tracefs '%s' directory\n", name); return -1; }
if (call->class->reg && !(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) - trace_create_file("enable", TRACE_MODE_WRITE, file->dir, file, + eventfs_add_file("enable", TRACE_MODE_WRITE, file->ef, file, &ftrace_enable_fops);
#ifdef CONFIG_PERF_EVENTS if (call->event.type && call->class->reg) - trace_create_file("id", TRACE_MODE_READ, file->dir, + eventfs_add_file("id", TRACE_MODE_READ, file->ef, (void *)(long)call->event.type, &ftrace_event_id_fops); #endif @@ -2465,27 +2465,27 @@ event_create_dir(struct dentry *parent, struct trace_event_file *file) * triggers or filters. */ if (!(call->flags & TRACE_EVENT_FL_IGNORE_ENABLE)) { - trace_create_file("filter", TRACE_MODE_WRITE, file->dir, + eventfs_add_file("filter", TRACE_MODE_WRITE, file->ef, file, &ftrace_event_filter_fops);
- trace_create_file("trigger", TRACE_MODE_WRITE, file->dir, + eventfs_add_file("trigger", TRACE_MODE_WRITE, file->ef, file, &event_trigger_fops); }
#ifdef CONFIG_HIST_TRIGGERS - trace_create_file("hist", TRACE_MODE_READ, file->dir, file, + eventfs_add_file("hist", TRACE_MODE_READ, file->ef, file, &event_hist_fops); #endif #ifdef CONFIG_HIST_TRIGGERS_DEBUG - trace_create_file("hist_debug", TRACE_MODE_READ, file->dir, file, + eventfs_add_file("hist_debug", TRACE_MODE_READ, file->ef, file, &event_hist_debug_fops); #endif - trace_create_file("format", TRACE_MODE_READ, file->dir, call, + eventfs_add_file("format", TRACE_MODE_READ, file->ef, call, &ftrace_event_format_fops);
#ifdef CONFIG_TRACE_EVENT_INJECT if (call->event.type && call->class->reg) - trace_create_file("inject", 0200, file->dir, file, + eventfs_add_file("inject", 0200, file->ef, file, &event_inject_fops); #endif
@@ -3638,21 +3638,22 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) { struct dentry *d_events; struct dentry *entry; + int error = 0;
entry = trace_create_file("set_event", TRACE_MODE_WRITE, parent, tr, &ftrace_set_event_fops); if (!entry) return -ENOMEM;
- d_events = tracefs_create_dir("events", parent); - if (!d_events) { + d_events = eventfs_create_events_dir("events", parent); + if (IS_ERR(d_events)) { pr_warn("Could not create tracefs 'events' directory\n"); return -ENOMEM; }
- entry = trace_create_file("enable", TRACE_MODE_WRITE, d_events, + error = eventfs_add_events_file("enable", TRACE_MODE_WRITE, d_events, tr, &ftrace_tr_enable_fops); - if (!entry) + if (error) return -ENOMEM;
/* There are not as crucial, just warn if they are not created */ @@ -3665,11 +3666,11 @@ create_event_toplevel_files(struct dentry *parent, struct trace_array *tr) &ftrace_set_event_notrace_pid_fops);
/* ring buffer internal formats */ - trace_create_file("header_page", TRACE_MODE_READ, d_events, + eventfs_add_events_file("header_page", TRACE_MODE_READ, d_events, ring_buffer_print_page_header, &ftrace_show_header_fops);
- trace_create_file("header_event", TRACE_MODE_READ, d_events, + eventfs_add_events_file("header_event", TRACE_MODE_READ, d_events, ring_buffer_print_entry_header, &ftrace_show_header_fops);
@@ -3757,7 +3758,7 @@ int event_trace_del_tracer(struct trace_array *tr)
down_write(&trace_event_sem); __trace_remove_event_dirs(tr); - tracefs_remove(tr->event_dir); + eventfs_remove_events_dir(tr->event_dir); up_write(&trace_event_sem);
tr->event_dir = NULL;
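kprobe_args_char.tc and kprobe_args_string.tc have a validation check that probes tracefs_create_dir. With eventfs the probed function should be eventfs_add_dir, so use it when it is listed in available_filter_functions and fall back to tracefs_create_dir otherwise.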
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Ching-lin Yu <chinglinyu@google.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 .../selftests/ftrace/test.d/kprobe/kprobe_args_char.tc   | 9 +++++++--
 .../selftests/ftrace/test.d/kprobe/kprobe_args_string.tc | 9 +++++++--
 2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
index 285b4770efad..ff7499eb98d6 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_char.tc
@@ -34,14 +34,19 @@ mips*)
 esac
 
 : "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char" > kprobe_events
+if grep -q eventfs_add_dir available_filter_functions; then
+  DIR_NAME="eventfs_add_dir"
+else
+  DIR_NAME="tracefs_create_dir"
+fi
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1='t'" trace
 
 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):char arg2=+0(${ARG1}):char[4]" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1='t' arg2={'t','e','s','t'}" trace
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
index a4f8e7c53c1f..a202b2ea4baf 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_args_string.tc
@@ -37,14 +37,19 @@ loongarch*)
 esac
 
 : "Test get argument (1)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string" > kprobe_events
+if grep -q eventfs_add_dir available_filter_functions; then
+  DIR_NAME="eventfs_add_dir"
+else
+  DIR_NAME="tracefs_create_dir"
+fi
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1=\"test\"" trace
 
 echo 0 > events/kprobes/testprobe/enable
 : "Test get argument (2)"
-echo "p:testprobe tracefs_create_dir arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
+echo "p:testprobe ${DIR_NAME} arg1=+0(${ARG1}):string arg2=+0(${ARG1}):string" > kprobe_events
 echo 1 > events/kprobes/testprobe/enable
 echo "p:test $FUNCTION_FORK" >> kprobe_events
 grep -qe "testprobe.* arg1=\"test\" arg2=\"test\"" trace