[ Adding Masami and stable ]
On Tue, 31 Oct 2023 00:27:07 +0000 Beau Belgrave beaub@linux.microsoft.com wrote:
This is a separate issue from eventfs (good, because I think I have solved all the known bugs for that one - phew!).
Anyway, I checkout the code just before adding the eventfs, and did the following:
# echo 'p:sched schedule' > /sys/kernel/tracing/kprobe_events # exec 5>>/sys/kernel/tracing/events/kprobes/sched/enable # > /sys/kernel/tracing/kprobe_events # exec 5>&-
And it worked fine. The above creates a kprobe event, opens the enable file of that event with the bash file descriptor #5, removes the kprobe event, and then closes the file descriptor #5.
But the I applied:
f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files")
And do the above commands again and BOOM! it crashes with:
[ 217.879087] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ 217.881121] #PF: supervisor read access in kernel mode [ 217.882532] #PF: error_code(0x0000) - not-present page [ 217.883932] PGD 0 P4D 0 [ 217.884672] Oops: 0000 [#1] PREEMPT SMP PTI [ 217.885821] CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186 [ 217.888178] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 217.890684] RIP: 0010:tracing_release_file_tr+0xc/0x50 [ 217.892097] Code: 5d 41 5c c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 53 48 8b 87 80 04 00 00 <48> 8b 58 28 48 85 db 74 2d 31 f6 48 c7 c7 c0 b3 1e 93 e8 2d 48 ca [ 217.897102] RSP: 0018:ffffa5d400587eb0 EFLAGS: 00010282 [ 217.898531] RAX: 0000000000000000 RBX: ffff907d06aa6c00 RCX: 0000000000000000 [ 217.900471] RDX: 0000000000000000 RSI: ffff907d06aa6c00 RDI: ffff907d0bf21bd0 [ 217.902403] RBP: 00000000000d801e R08: 0000000000000001 R09: ffff907d0bf21bd0 [ 217.904350] R10: 0000000000000001 R11: 0000000000000001 R12: ffff907d0bf21bd0 [ 217.906282] R13: ffff907d103708e0 R14: ffff907d0a178c30 R15: 0000000000000000 [ 217.908215] FS: 00007ff49c150740(0000) GS:ffff907e77d00000(0000) knlGS:0000000000000000 [ 217.910405] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 217.911970] CR2: 0000000000000028 CR3: 00000001051ec005 CR4: 0000000000170ee0 [ 217.913924] Call Trace: [ 217.914624] <TASK> [ 217.915232] ? __die+0x23/0x70 [ 217.916105] ? page_fault_oops+0x17d/0x4d0 [ 217.917262] ? exc_page_fault+0x7f/0x200 [ 217.918350] ? asm_exc_page_fault+0x26/0x30 [ 217.919513] ? tracing_release_file_tr+0xc/0x50 [ 217.920780] __fput+0xfb/0x2a0 [ 217.921651] task_work_run+0x5d/0x90 [ 217.922652] exit_to_user_mode_prepare+0x231/0x240 [ 217.923981] syscall_exit_to_user_mode+0x1a/0x50 [ 217.925248] do_syscall_64+0x4b/0xc0 [ 217.926176] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Look familiar?
It's now midnight and I've been at this all day. I'm going to look more at this tomorrow. It's not going to be easy :-( I'm not sure what exactly to do. We may need to prevent dynamic events from being deleted if there's any of its files opened (enable, format, etc).
That is, if you try to delete the event, it will give you an -EBUSY, just like having them enabled would.
More good news, that commit is in 6.6 *and* marked for stable :-p
-- Steve
On Tue, 31 Oct 2023 00:00:31 -0400 Steven Rostedt rostedt@goodmis.org wrote:
The below patch appears to solve this. Care to test it?
-- Steve
Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org --- include/linux/trace_events.h | 4 ++++ kernel/trace/trace.c | 15 +++++++++++++++ kernel/trace/trace.h | 4 ++++ kernel/trace/trace_events.c | 29 +++++++++++++++++++++++++---- kernel/trace/trace_events_filter.c | 3 +++ 5 files changed, 51 insertions(+), 4 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 12207dc6722d..696f8dc4aa53 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -492,6 +492,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND_BIT, EVENT_FILE_FL_PID_FILTER_BIT, EVENT_FILE_FL_WAS_ENABLED_BIT, + EVENT_FILE_FL_FREED_BIT, };
extern struct trace_event_file *trace_get_event_file(const char *instance, @@ -630,6 +631,7 @@ extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...); * TRIGGER_COND - When set, one or more triggers has an associated filter * PID_FILTER - When set, the event is filtered based on pid * WAS_ENABLED - Set when enabled to know to clear trace on module removal + * FREED - File descriptor is freed, all fields should be considered invalid */ enum { EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT), @@ -643,6 +645,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT), EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT), EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT), + EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT), };
struct trace_event_file { @@ -671,6 +674,7 @@ struct trace_event_file { * caching and such. Which is mostly OK ;-) */ unsigned long flags; + atomic_t ref; /* ref count for opened files */ atomic_t sm_ref; /* soft-mode reference counter */ atomic_t tm_ref; /* trigger-mode reference counter */ }; diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 2539cfc20a97..9aebf904ff97 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4978,6 +4978,20 @@ int tracing_open_file_tr(struct inode *inode, struct file *filp) if (ret) return ret;
+ mutex_lock(&event_mutex); + + /* Fail if the file is marked for removal */ + if (file->flags & EVENT_FILE_FL_FREED) { + trace_array_put(file->tr); + ret = -ENODEV; + } else { + event_file_get(file); + } + + mutex_unlock(&event_mutex); + if (ret) + return ret; + filp->private_data = inode->i_private;
return 0; @@ -4988,6 +5002,7 @@ int tracing_release_file_tr(struct inode *inode, struct file *filp) struct trace_event_file *file = inode->i_private;
trace_array_put(file->tr); + event_file_put(file);
return 0; } diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 0e1405abf4f7..4fcd293ecbd4 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1669,6 +1669,10 @@ extern void event_trigger_unregister(struct event_command *cmd_ops, char *glob, struct event_trigger_data *trigger_data);
+extern void event_file_get(struct trace_event_file *file); +extern void event_file_put(struct trace_event_file *file); + + /** * struct event_trigger_ops - callbacks for trace event triggers * diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 70a029ae3c65..ec86f4894455 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -990,13 +990,33 @@ static void remove_subsystem(struct trace_subsystem_dir *dir) } }
+void event_file_get(struct trace_event_file *file) +{ + atomic_inc(&file->ref); +} + +void event_file_put(struct trace_event_file *file) +{ + if (WARN_ON_ONCE(!atomic_read(&file->ref))) { + if (file->flags & EVENT_FILE_FL_FREED) + kmem_cache_free(file_cachep, file); + return; + } + + if (atomic_dec_and_test(&file->ref) && + (file->flags & EVENT_FILE_FL_FREED)) { + kmem_cache_free(file_cachep, file); + } +} + static void remove_event_file_dir(struct trace_event_file *file) { eventfs_remove_dir(file->ei); list_del(&file->list); remove_subsystem(file->system); free_event_filter(file->filter); - kmem_cache_free(file_cachep, file); + file->flags |= EVENT_FILE_FL_FREED; + event_file_put(file); }
/* @@ -1369,7 +1389,7 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt, flags = file->flags; mutex_unlock(&event_mutex);
- if (!file) + if (!file || flags & EVENT_FILE_FL_FREED) return -ENODEV;
if (flags & EVENT_FILE_FL_ENABLED && @@ -1404,7 +1424,7 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt, ret = -ENODEV; mutex_lock(&event_mutex); file = event_file_data(filp); - if (likely(file)) { + if (likely(file && !(file->flags & EVENT_FILE_FL_FREED))) { printk("update file = %px\n", file); printk("update tr = %px\n", file->tr); ret = tracing_update_buffers(file->tr); @@ -1686,7 +1706,7 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
mutex_lock(&event_mutex); file = event_file_data(filp); - if (file) + if (file && !(file->flags & EVENT_FILE_FL_FREED)) print_event_filter(file, s); mutex_unlock(&event_mutex);
@@ -2905,6 +2925,7 @@ trace_create_new_event(struct trace_event_call *call, atomic_set(&file->tm_ref, 0); INIT_LIST_HEAD(&file->triggers); list_add(&file->list, &tr->events); + event_file_get(file);
return file; } diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 33264e510d16..0c611b281a5b 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -2349,6 +2349,9 @@ int apply_event_filter(struct trace_event_file *file, char *filter_string) struct event_filter *filter = NULL; int err;
+ if (file->flags & EVENT_FILE_FL_FREED) + return -ENODEV; + if (!strcmp(strstrip(filter_string), "0")) { filter_disable(file); filter = event_filter(file);
On Tue, 31 Oct 2023 10:45:51 -0400 Steven Rostedt rostedt@goodmis.org wrote:
Well, this won't apply because I still had debugging in it when I added these changes.
Here's a better version:
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 12207dc6722d..696f8dc4aa53 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -492,6 +492,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND_BIT, EVENT_FILE_FL_PID_FILTER_BIT, EVENT_FILE_FL_WAS_ENABLED_BIT, + EVENT_FILE_FL_FREED_BIT, };
extern struct trace_event_file *trace_get_event_file(const char *instance, @@ -630,6 +631,7 @@ extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...); * TRIGGER_COND - When set, one or more triggers has an associated filter * PID_FILTER - When set, the event is filtered based on pid * WAS_ENABLED - Set when enabled to know to clear trace on module removal + * FREED - File descriptor is freed, all fields should be considered invalid */ enum { EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT), @@ -643,6 +645,7 @@ enum { EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT), EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT), EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT), + EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT), };
struct trace_event_file { @@ -671,6 +674,7 @@ struct trace_event_file { * caching and such. Which is mostly OK ;-) */ unsigned long flags; + atomic_t ref; /* ref count for opened files */ atomic_t sm_ref; /* soft-mode reference counter */ atomic_t tm_ref; /* trigger-mode reference counter */ }; diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 2539cfc20a97..9aebf904ff97 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4978,6 +4978,20 @@ int tracing_open_file_tr(struct inode *inode, struct file *filp) if (ret) return ret;
+ mutex_lock(&event_mutex); + + /* Fail if the file is marked for removal */ + if (file->flags & EVENT_FILE_FL_FREED) { + trace_array_put(file->tr); + ret = -ENODEV; + } else { + event_file_get(file); + } + + mutex_unlock(&event_mutex); + if (ret) + return ret; + filp->private_data = inode->i_private;
return 0; @@ -4988,6 +5002,7 @@ int tracing_release_file_tr(struct inode *inode, struct file *filp) struct trace_event_file *file = inode->i_private;
trace_array_put(file->tr); + event_file_put(file);
return 0; } diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 0e1405abf4f7..b7f4ea25a194 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1669,6 +1669,9 @@ extern void event_trigger_unregister(struct event_command *cmd_ops, char *glob, struct event_trigger_data *trigger_data);
+extern void event_file_get(struct trace_event_file *file); +extern void event_file_put(struct trace_event_file *file); + /** * struct event_trigger_ops - callbacks for trace event triggers * diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index f9e3e24d8796..f29e815ca5b2 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -990,13 +990,35 @@ static void remove_subsystem(struct trace_subsystem_dir *dir) } }
+void event_file_get(struct trace_event_file *file) +{ + atomic_inc(&file->ref); +} + +void event_file_put(struct trace_event_file *file) +{ + if (WARN_ON_ONCE(!atomic_read(&file->ref))) { + if (file->flags & EVENT_FILE_FL_FREED) + kmem_cache_free(file_cachep, file); + return; + } + + if (atomic_dec_and_test(&file->ref)) { + /* Count should only go to zero when it is freed */ + if (WARN_ON_ONCE(!(file->flags & EVENT_FILE_FL_FREED))) + return; + kmem_cache_free(file_cachep, file); + } +} + static void remove_event_file_dir(struct trace_event_file *file) { eventfs_remove_dir(file->ei); list_del(&file->list); remove_subsystem(file->system); free_event_filter(file->filter); - kmem_cache_free(file_cachep, file); + file->flags |= EVENT_FILE_FL_FREED; + event_file_put(file); }
/* @@ -1369,7 +1391,7 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt, flags = file->flags; mutex_unlock(&event_mutex);
- if (!file) + if (!file || flags & EVENT_FILE_FL_FREED) return -ENODEV;
if (flags & EVENT_FILE_FL_ENABLED && @@ -1403,7 +1425,7 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt, ret = -ENODEV; mutex_lock(&event_mutex); file = event_file_data(filp); - if (likely(file)) { + if (likely(file && !(file->flags & EVENT_FILE_FL_FREED))) { ret = tracing_update_buffers(file->tr); if (ret < 0) { mutex_unlock(&event_mutex); @@ -1683,7 +1705,7 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
mutex_lock(&event_mutex); file = event_file_data(filp); - if (file) + if (file && !(file->flags & EVENT_FILE_FL_FREED)) print_event_filter(file, s); mutex_unlock(&event_mutex);
@@ -2902,6 +2924,7 @@ trace_create_new_event(struct trace_event_call *call, atomic_set(&file->tm_ref, 0); INIT_LIST_HEAD(&file->triggers); list_add(&file->list, &tr->events); + event_file_get(file);
return file; } diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 33264e510d16..0c611b281a5b 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -2349,6 +2349,9 @@ int apply_event_filter(struct trace_event_file *file, char *filter_string) struct event_filter *filter = NULL; int err;
+ if (file->flags & EVENT_FILE_FL_FREED) + return -ENODEV; + if (!strcmp(strstrip(filter_string), "0")) { filter_disable(file); filter = event_filter(file);
On Tue, Oct 31, 2023 at 12:07:02PM -0400, Steven Rostedt wrote:
Thanks!
I ended up testing the official patch [1] instead of this one along with patches [2][3] in the next branch.
My printk's are now good, everything looks back to normal for my repro.
I added my Tested-by onto the official patch thread.
Thanks, -Beau
1. https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.l... 2. https://lore.kernel.org/linux-trace-kernel/20231030114047.759c7bdf@gandalf.l... 3. https://lore.kernel.org/linux-trace-kernel/20231028164650.4f5ea18a@rorschach...
linux-stable-mirror@lists.linaro.org