The current implementation of netconsole sends all log messages in parallel, which can lead to an intermixed and interleaved output on the receiving side. This makes it challenging to demultiplex the messages and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------ ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0 refcount_t: addition on 0; use-after-free. WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0 Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole subsystem that allows the automatic population of the CPU number in the userdata field for each log message. This enhancement provides several benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are sending messages concurrently, the added CPU number in the userdata makes it easier to differentiate and attribute the messages to their originating CPUs.
* Better visibility into message sources: The CPU number information gives users and developers more insight into which specific CPU a particular log message came from, which can be valuable for debugging and analysis.
The changes in this series are as follows Patches::
Patch "consolidate send buffers into netconsole_target struct" =================================================
Move the static buffers to netconsole target, from static declaration in send_msg_no_fragmentation() and send_msg_fragmented().
Patch "netconsole: Rename userdata to extradata" ================================================= Create the a concept of extradata, which encompasses the concept of userdata and the upcoming sysdatao
Sysdata is a new concept being added, which is basically fields that are populated by the kernel. At this time only the CPU#, but, there is a desire to add current task name, kernel release version, etc.
Patch "netconsole: Helper to count number of used entries" =========================================================== Create a simple helper to count number of entries in extradata. I am separating this in a function since it will need to count userdata and sysdata. For instance, when the user adds an extra userdata, we need to check if there is space, counting the previous data entries (from userdata and cpu data)
Patch "Introduce configfs helpers for sysdata features" ====================================================== Create the concept of sysdata feature in the netconsole target, and create the configfs helpers to enable the bit in nt->sysdata
Patch "Include sysdata in extradata entry count" ================================================ Add the concept of sysdata when counting for available space in the buffer. This will protect users from creating new userdata/sysdata if there is no more space
Patch "netconsole: add support for sysdata and CPU population" =============================================================== This is the core patch. Basically add a new option to enable automatic CPU number population in the netconsole userdata Provides a new "cpu_nr" sysfs attribute to control this feature
Patch "netconsole: selftest: test CPU number auto-population" ============================================================= Expands the existing netconsole selftest to verify the CPU number auto-population functionality Ensures the received netconsole messages contain the expected "cpu=<CPU>" entry in the message. Test different permutation with userdata
Patch "netconsole: docs: Add documentation for CPU number auto-population" ============================================================================= Updates the netconsole documentation to explain the new CPU number auto-population feature Provides instructions on how to enable and use the feature
I believe these changes will be a valuable addition to the netconsole subsystem, enhancing its usefulness for kernel developers and users.
Signed-off-by: Breno Leitao leitao@debian.org --- Changes in v3: - Moved the buffer into netconsole_target, avoiding static functions in the send path (Jakub). - Fix a documentation error (Randy Dunlap) - Created a function that handle all the extradata, consolidating it in a single place (Jakub) - Split the patch even more, trying to simplify the review. - Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org
Changes in v2: - Create the concept of extradata and sysdata. This will make the design easier to understand, and the code easier to read. * Basically extradata encompasses userdata and the new sysdata. Userdata originates from user, and sysdata originates in kernel. - Improved the test to send from a very specific CPU, which can be checked to be correct on the other side, as suggested by Jakub. - Fixed a bug where CPU # was populated at the wrong place - Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
--- Breno Leitao (8): netconsole: consolidate send buffers into netconsole_target struct netconsole: Rename userdata to extradata netconsole: Helper to count number of used entries netconsole: Introduce configfs helpers for sysdata features netconsole: Include sysdata in extradata entry count netconsole: add support for sysdata and CPU population netconsole: selftest: test for sysdata CPU netconsole: docs: Add documentation for CPU number auto-population
Documentation/networking/netconsole.rst | 45 ++++ drivers/net/netconsole.c | 260 ++++++++++++++++----- tools/testing/selftests/drivers/net/Makefile | 1 + .../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 ++ .../selftests/drivers/net/netcons_sysdata.sh | 167 +++++++++++++ 5 files changed, 426 insertions(+), 64 deletions(-) --- base-commit: fa10c9f5aa705deb43fd65623508074bee942764 change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
Move the static buffers from send_msg_no_fragmentation() and send_msg_fragmented() into the netconsole_target structure. This simplifies the code by: - Eliminating redundant static buffers - Centralizing buffer management in the target structure - Reducing memory usage by 1KB (one buffer instead of two)
The buffer in netconsole_target is protected by target_list_lock, maintaining the same synchronization semantics as the original code.
Suggested-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Breno Leitao leitao@debian.org --- drivers/net/netconsole.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 86ab4a42769a49eebe5dd6f01dafafc6c86ec54f..1a78704681184673f5c1ba8ae665e46751384293 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -137,6 +137,8 @@ struct netconsole_target { bool extended; bool release; struct netpoll np; + /* protected by target_list_lock */ + char buf[MAX_PRINT_CHUNK]; };
#ifdef CONFIG_NETCONSOLE_DYNAMIC @@ -1117,7 +1119,6 @@ static void send_msg_no_fragmentation(struct netconsole_target *nt, int msg_len, int release_len) { - static char buf[MAX_PRINT_CHUNK]; /* protected by target_list_lock */ const char *userdata = NULL; const char *release;
@@ -1128,18 +1129,18 @@ static void send_msg_no_fragmentation(struct netconsole_target *nt, if (release_len) { release = init_utsname()->release;
- scnprintf(buf, MAX_PRINT_CHUNK, "%s,%s", release, msg); + scnprintf(nt->buf, MAX_PRINT_CHUNK, "%s,%s", release, msg); msg_len += release_len; } else { - memcpy(buf, msg, msg_len); + memcpy(nt->buf, msg, msg_len); }
if (userdata) - msg_len += scnprintf(&buf[msg_len], + msg_len += scnprintf(&nt->buf[msg_len], MAX_PRINT_CHUNK - msg_len, "%s", userdata);
- send_udp(nt, buf, msg_len); + send_udp(nt, nt->buf, msg_len); }
static void append_release(char *buf) @@ -1150,7 +1151,7 @@ static void append_release(char *buf) scnprintf(buf, MAX_PRINT_CHUNK, "%s,", release); }
-static void send_fragmented_body(struct netconsole_target *nt, char *buf, +static void send_fragmented_body(struct netconsole_target *nt, const char *msgbody, int header_len, int msgbody_len) { @@ -1181,7 +1182,7 @@ static void send_fragmented_body(struct netconsole_target *nt, char *buf, int this_offset = 0; int this_chunk = 0;
- this_header += scnprintf(buf + this_header, + this_header += scnprintf(nt->buf + this_header, MAX_PRINT_CHUNK - this_header, ",ncfrag=%d/%d;", offset, body_len); @@ -1192,7 +1193,8 @@ static void send_fragmented_body(struct netconsole_target *nt, char *buf, MAX_PRINT_CHUNK - this_header); if (WARN_ON_ONCE(this_chunk <= 0)) return; - memcpy(buf + this_header, msgbody + offset, this_chunk); + memcpy(nt->buf + this_header, msgbody + offset, + this_chunk); this_offset += this_chunk; }
@@ -1226,13 +1228,13 @@ static void send_fragmented_body(struct netconsole_target *nt, char *buf, */ return;
- memcpy(buf + this_header + this_offset, + memcpy(nt->buf + this_header + this_offset, userdata + sent_userdata, this_chunk); this_offset += this_chunk; }
- send_udp(nt, buf, this_header + this_offset); + send_udp(nt, nt->buf, this_header + this_offset); offset += this_offset; } } @@ -1242,7 +1244,6 @@ static void send_msg_fragmented(struct netconsole_target *nt, int msg_len, int release_len) { - static char buf[MAX_PRINT_CHUNK]; /* protected by target_list_lock */ int header_len, msgbody_len; const char *msgbody;
@@ -1260,16 +1261,16 @@ static void send_msg_fragmented(struct netconsole_target *nt, * "ncfrag=<byte-offset>/<total-bytes>" */ if (release_len) - append_release(buf); + append_release(nt->buf);
/* Copy the header into the buffer */ - memcpy(buf + release_len, msg, header_len); + memcpy(nt->buf + release_len, msg, header_len); header_len += release_len;
/* for now on, the header will be persisted, and the msgbody * will be replaced */ - send_fragmented_body(nt, buf, msgbody, header_len, msgbody_len); + send_fragmented_body(nt, msgbody, header_len, msgbody_len); }
/**
Rename "userdata" to "extradata" since this structure will hold both user and system data in future patches. Keep "userdata" term only for data that comes from userspace (configfs), while "extradata" encompasses both userdata and future kerneldata.
These are the rules of the design
1. extradata_complete will hold userdata and sysdata (coming) 2. sysdata will come after userdata_length 3. extradata_complete[userdata_length] string will be replaced at every message 5. userdata is replaced when configfs changes (update_userdata()) 6. sysdata is replaced at every message
Example: extradata_complete = "userkey=uservalue cpu=42" userdata_length = 17 sysdata_length = 7 (space (" ") is part of sysdata)
Since sysdata is still not available, you will see the following in the send functions:
extradata_len = nt->userdata_length;
The upcoming patches will, which will add support for sysdata, will change it to:
extradata_len = nt->userdata_length + sysdata_len;
Signed-off-by: Breno Leitao leitao@debian.org --- drivers/net/netconsole.c | 87 ++++++++++++++++++++++++------------------------ 1 file changed, 44 insertions(+), 43 deletions(-)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 1a78704681184673f5c1ba8ae665e46751384293..6c67840e583a295e4f357bd19be0dd0ae3b76627 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -45,12 +45,12 @@ MODULE_DESCRIPTION("Console driver for network interfaces"); MODULE_LICENSE("GPL");
#define MAX_PARAM_LENGTH 256 -#define MAX_USERDATA_ENTRY_LENGTH 256 -#define MAX_USERDATA_VALUE_LENGTH 200 +#define MAX_EXTRADATA_ENTRY_LEN 256 +#define MAX_EXTRADATA_VALUE_LEN 200 /* The number 3 comes from userdata entry format characters (' ', '=', '\n') */ -#define MAX_USERDATA_NAME_LENGTH (MAX_USERDATA_ENTRY_LENGTH - \ - MAX_USERDATA_VALUE_LENGTH - 3) -#define MAX_USERDATA_ITEMS 16 +#define MAX_EXTRADATA_NAME_LEN (MAX_EXTRADATA_ENTRY_LEN - \ + MAX_EXTRADATA_VALUE_LEN - 3) +#define MAX_EXTRADATA_ITEMS 16 #define MAX_PRINT_CHUNK 1000
static char config[MAX_PARAM_LENGTH]; @@ -102,8 +102,8 @@ struct netconsole_target_stats { * @list: Links this target into the target_list. * @group: Links us into the configfs subsystem hierarchy. * @userdata_group: Links to the userdata configfs hierarchy - * @userdata_complete: Cached, formatted string of append - * @userdata_length: String length of userdata_complete + * @extradata_complete: Cached, formatted string of append + * @userdata_length: String length of usedata in extradata_complete. * @stats: Packet send stats for the target. Used for debugging. * @enabled: On / off knob to enable / disable target. * Visible from userspace (read-write). @@ -129,7 +129,7 @@ struct netconsole_target { #ifdef CONFIG_NETCONSOLE_DYNAMIC struct config_group group; struct config_group userdata_group; - char userdata_complete[MAX_USERDATA_ENTRY_LENGTH * MAX_USERDATA_ITEMS]; + char extradata_complete[MAX_EXTRADATA_ENTRY_LEN * MAX_EXTRADATA_ITEMS]; size_t userdata_length; #endif struct netconsole_target_stats stats; @@ -689,7 +689,7 @@ static ssize_t remote_mac_store(struct config_item *item, const char *buf,
struct userdatum { struct config_item item; - char value[MAX_USERDATA_VALUE_LENGTH]; + char value[MAX_EXTRADATA_VALUE_LEN]; };
static struct userdatum *to_userdatum(struct config_item *item) @@ -726,13 +726,13 @@ static void update_userdata(struct netconsole_target *nt)
/* Clear the current string in case the last userdatum was deleted */ nt->userdata_length = 0; - nt->userdata_complete[0] = 0; + nt->extradata_complete[0] = 0;
list_for_each(entry, &nt->userdata_group.cg_children) { struct userdatum *udm_item; struct config_item *item;
- if (WARN_ON_ONCE(child_count >= MAX_USERDATA_ITEMS)) + if (WARN_ON_ONCE(child_count >= MAX_EXTRADATA_ITEMS)) break; child_count++;
@@ -740,19 +740,19 @@ static void update_userdata(struct netconsole_target *nt) udm_item = to_userdatum(item);
/* Skip userdata with no value set */ - if (strnlen(udm_item->value, MAX_USERDATA_VALUE_LENGTH) == 0) + if (strnlen(udm_item->value, MAX_EXTRADATA_VALUE_LEN) == 0) continue;
- /* This doesn't overflow userdata_complete since it will write - * one entry length (1/MAX_USERDATA_ITEMS long), entry count is + /* This doesn't overflow extradata_complete since it will write + * one entry length (1/MAX_EXTRADATA_ITEMS long), entry count is * checked to not exceed MAX items with child_count above */ - complete_idx += scnprintf(&nt->userdata_complete[complete_idx], - MAX_USERDATA_ENTRY_LENGTH, " %s=%s\n", + complete_idx += scnprintf(&nt->extradata_complete[complete_idx], + MAX_EXTRADATA_ENTRY_LEN, " %s=%s\n", item->ci_name, udm_item->value); } - nt->userdata_length = strnlen(nt->userdata_complete, - sizeof(nt->userdata_complete)); + nt->userdata_length = strnlen(nt->extradata_complete, + sizeof(nt->extradata_complete)); }
static ssize_t userdatum_value_store(struct config_item *item, const char *buf, @@ -763,7 +763,7 @@ static ssize_t userdatum_value_store(struct config_item *item, const char *buf, struct userdata *ud; ssize_t ret;
- if (count > MAX_USERDATA_VALUE_LENGTH) + if (count > MAX_EXTRADATA_VALUE_LEN) return -EMSGSIZE;
mutex_lock(&dynamic_netconsole_mutex); @@ -812,13 +812,13 @@ static struct config_item *userdatum_make_item(struct config_group *group, struct userdata *ud; size_t child_count;
- if (strlen(name) > MAX_USERDATA_NAME_LENGTH) + if (strlen(name) > MAX_EXTRADATA_NAME_LEN) return ERR_PTR(-ENAMETOOLONG);
ud = to_userdata(&group->cg_item); nt = userdata_to_target(ud); child_count = list_count_nodes(&nt->userdata_group.cg_children); - if (child_count >= MAX_USERDATA_ITEMS) + if (child_count >= MAX_EXTRADATA_ITEMS) return ERR_PTR(-ENOSPC);
udm = kzalloc(sizeof(*udm), GFP_KERNEL); @@ -1119,11 +1119,11 @@ static void send_msg_no_fragmentation(struct netconsole_target *nt, int msg_len, int release_len) { - const char *userdata = NULL; + const char *extradata = NULL; const char *release;
#ifdef CONFIG_NETCONSOLE_DYNAMIC - userdata = nt->userdata_complete; + extradata = nt->extradata_complete; #endif
if (release_len) { @@ -1135,10 +1135,10 @@ static void send_msg_no_fragmentation(struct netconsole_target *nt, memcpy(nt->buf, msg, msg_len); }
- if (userdata) + if (extradata) msg_len += scnprintf(&nt->buf[msg_len], MAX_PRINT_CHUNK - msg_len, - "%s", userdata); + "%s", extradata);
send_udp(nt, nt->buf, msg_len); } @@ -1155,24 +1155,25 @@ static void send_fragmented_body(struct netconsole_target *nt, const char *msgbody, int header_len, int msgbody_len) { - const char *userdata = NULL; + int sent_extradata, preceding_bytes; + const char *extradata = NULL; int body_len, offset = 0; - int userdata_len = 0; + int extradata_len = 0;
#ifdef CONFIG_NETCONSOLE_DYNAMIC - userdata = nt->userdata_complete; - userdata_len = nt->userdata_length; + extradata = nt->extradata_complete; + extradata_len = nt->userdata_length; #endif
/* body_len represents the number of bytes that will be sent. This is * bigger than MAX_PRINT_CHUNK, thus, it will be split in multiple * packets */ - body_len = msgbody_len + userdata_len; + body_len = msgbody_len + extradata_len;
/* In each iteration of the while loop below, we send a packet * containing the header and a portion of the body. The body is - * composed of two parts: msgbody and userdata. We keep track of how + * composed of two parts: msgbody and extradata. We keep track of how * many bytes have been sent so far using the offset variable, which * ranges from 0 to the total length of the body. */ @@ -1200,36 +1201,36 @@ static void send_fragmented_body(struct netconsole_target *nt,
/* msgbody was finally written, either in the previous * messages and/or in the current buf. Time to write - * the userdata. + * the extradata. */ msgbody_written |= offset + this_offset >= msgbody_len;
- /* Msg body is fully written and there is pending userdata to - * write, append userdata in this chunk + /* Msg body is fully written and there is pending extradata to + * write, append extradata in this chunk */ if (msgbody_written && offset + this_offset < body_len) { /* Track how much user data was already sent. First * time here, sent_userdata is zero */ - int sent_userdata = (offset + this_offset) - msgbody_len; + sent_extradata = (offset + this_offset) - msgbody_len; /* offset of bytes used in current buf */ - int preceding_bytes = this_chunk + this_header; + preceding_bytes = this_chunk + this_header;
- if (WARN_ON_ONCE(sent_userdata < 0)) + if (WARN_ON_ONCE(sent_extradata < 0)) return;
- this_chunk = min(userdata_len - sent_userdata, + this_chunk = min(extradata_len - sent_extradata, MAX_PRINT_CHUNK - preceding_bytes); if (WARN_ON_ONCE(this_chunk < 0)) /* this_chunk could be zero if all the previous * message used all the buffer. This is not a - * problem, userdata will be sent in the next + * problem, extradata will be sent in the next * iteration */ return;
memcpy(nt->buf + this_header + this_offset, - userdata + sent_userdata, + extradata + sent_extradata, this_chunk); this_offset += this_chunk; } @@ -1286,17 +1287,17 @@ static void send_msg_fragmented(struct netconsole_target *nt, static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, int msg_len) { - int userdata_len = 0; + int extradata_len = 0; int release_len = 0;
#ifdef CONFIG_NETCONSOLE_DYNAMIC - userdata_len = nt->userdata_length; + extradata_len = nt->userdata_length; #endif
if (nt->release) release_len = strlen(init_utsname()->release) + 1;
- if (msg_len + release_len + userdata_len <= MAX_PRINT_CHUNK) + if (msg_len + release_len + extradata_len <= MAX_PRINT_CHUNK) return send_msg_no_fragmentation(nt, msg, msg_len, release_len);
return send_msg_fragmented(nt, msg, msg_len, release_len);
Add a helper function nr_extradata_entries() to count the number of used extradata entries in a netconsole target. This refactors the duplicate code for counting entries into a single function, which will be reused by upcoming CPU sysdata changes.
The helper uses list_count_nodes() to count the number of children in the userdata group configfs hierarchy.
Signed-off-by: Breno Leitao leitao@debian.org --- drivers/net/netconsole.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 6c67840e583a295e4f357bd19be0dd0ae3b76627..1c197da806962a751a3da8794def6a97cff5f701 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -661,6 +661,16 @@ static ssize_t remote_ip_store(struct config_item *item, const char *buf, return ret; }
+/* Count number of entries we have in extradata. + * This is important because the extradata_complete only supports + * MAX_EXTRADATA_ITEMS entries. Before enabling any new {user,sys}data + * feature, number of entries needs to checked for available space. + */ +static size_t count_extradata_entries(struct netconsole_target *nt) +{ + return list_count_nodes(&nt->userdata_group.cg_children); +} + static ssize_t remote_mac_store(struct config_item *item, const char *buf, size_t count) { @@ -810,15 +820,13 @@ static struct config_item *userdatum_make_item(struct config_group *group, struct netconsole_target *nt; struct userdatum *udm; struct userdata *ud; - size_t child_count;
if (strlen(name) > MAX_EXTRADATA_NAME_LEN) return ERR_PTR(-ENAMETOOLONG);
ud = to_userdata(&group->cg_item); nt = userdata_to_target(ud); - child_count = list_count_nodes(&nt->userdata_group.cg_children); - if (child_count >= MAX_EXTRADATA_ITEMS) + if (count_extradata_entries(nt) >= MAX_EXTRADATA_ITEMS) return ERR_PTR(-ENOSPC);
udm = kzalloc(sizeof(*udm), GFP_KERNEL);
This patch introduces a bitfield to store sysdata features in the netconsole_target struct. It also adds configfs helpers to enable or disable the CPU_NR feature, which populates the CPU number in sysdata.
The patch provides the necessary infrastructure to set or unset the CPU_NR feature, but does not modify the message itself.
Signed-off-by: Breno Leitao leitao@debian.org --- drivers/net/netconsole.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 1c197da806962a751a3da8794def6a97cff5f701..4cefa43555aada25769b705dd8c8c89964f51a52 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -97,6 +97,15 @@ struct netconsole_target_stats { struct u64_stats_sync syncp; };
+/* Features enabled in sysdata. Contrary to userdata, this data is populated by + * the kernel. The fields are designed as bitwise flags, allowing multiple + * features to be set in sysdata_fields. + */ +enum sysdata_feature { + /* Populate the CPU that sends the message */ + CPU_NR = BIT(0), +}; + /** * struct netconsole_target - Represents a configured netconsole target. * @list: Links this target into the target_list. @@ -104,6 +113,7 @@ struct netconsole_target_stats { * @userdata_group: Links to the userdata configfs hierarchy * @extradata_complete: Cached, formatted string of append * @userdata_length: String length of usedata in extradata_complete. + * @sysdata_fields: Sysdata features enabled. * @stats: Packet send stats for the target. Used for debugging. * @enabled: On / off knob to enable / disable target. * Visible from userspace (read-write). @@ -131,6 +141,8 @@ struct netconsole_target { struct config_group userdata_group; char extradata_complete[MAX_EXTRADATA_ENTRY_LEN * MAX_EXTRADATA_ITEMS]; size_t userdata_length; + /* bit-wise with sysdata_feature bits */ + u32 sysdata_fields; #endif struct netconsole_target_stats stats; bool enabled; @@ -398,6 +410,19 @@ static ssize_t transmit_errors_show(struct config_item *item, char *buf) return sysfs_emit(buf, "%llu\n", xmit_drop_count + enomem_count); }
+/* configfs helper to display if cpu_nr sysdata feature is enabled */ +static ssize_t sysdata_cpu_nr_show(struct config_item *item, char *buf) +{ + struct netconsole_target *nt = to_target(item->ci_parent); + bool cpu_nr_enabled; + + mutex_lock(&dynamic_netconsole_mutex); + cpu_nr_enabled = nt->sysdata_fields & CPU_NR; + mutex_unlock(&dynamic_netconsole_mutex); + + return sysfs_emit(buf, "%d\n", cpu_nr_enabled); +} + /* * This one is special -- targets created through the configfs interface * are not enabled (and the corresponding netpoll activated) by default. @@ -792,7 +817,62 @@ static ssize_t userdatum_value_store(struct config_item *item, const char *buf, return ret; }
+/* disable_sysdata_feature - Disable sysdata feature and clean sysdata + * @nt: target that is diabling the feature + * @feature: feature being disabled + */ +static void disable_sysdata_feature(struct netconsole_target *nt, + enum sysdata_feature feature) +{ + nt->sysdata_fields &= ~feature; + nt->extradata_complete[nt->userdata_length] = 0; +} + +/* configfs helper to sysdata cpu_nr feature */ +static ssize_t sysdata_cpu_nr_store(struct config_item *item, const char *buf, + size_t count) +{ + struct netconsole_target *nt = to_target(item->ci_parent); + bool cpu_nr_enabled, curr; + ssize_t ret; + + ret = kstrtobool(buf, &cpu_nr_enabled); + if (ret) + return ret; + + mutex_lock(&dynamic_netconsole_mutex); + curr = nt->sysdata_fields & CPU_NR; + if (cpu_nr_enabled == curr) + /* no change requested */ + goto unlock_ok; + + if (cpu_nr_enabled && + count_extradata_entries(nt) >= MAX_EXTRADATA_ITEMS) { + /* user wants the new feature, but there is no space in the + * buffer. + */ + ret = -ENOSPC; + goto unlock; + } + + if (cpu_nr_enabled) + nt->sysdata_fields |= CPU_NR; + else + /* This is special because extradata_complete might have + * remaining data from previous sysdata, and it needs to be + * cleaned. + */ + disable_sysdata_feature(nt, CPU_NR); + +unlock_ok: + ret = strnlen(buf, count); +unlock: + mutex_unlock(&dynamic_netconsole_mutex); + return ret; +} + CONFIGFS_ATTR(userdatum_, value); +CONFIGFS_ATTR(sysdata_, cpu_nr);
static struct configfs_attribute *userdatum_attrs[] = { &userdatum_attr_value, @@ -852,6 +932,7 @@ static void userdatum_drop(struct config_group *group, struct config_item *item) }
static struct configfs_attribute *userdata_attrs[] = { + &sysdata_attr_cpu_nr, NULL, };
Modify count_extradata_entries() to include sysdata fields when calculating the total number of extradata entries. This change ensures that the sysdata feature, specifically the CPU number field, is correctly counted against the MAX_EXTRADATA_ITEMS limit.
The modification adds a simple check for the CPU_NR flag in the sysdata_fields, incrementing the entry count accordingly.
Signed-off-by: Breno Leitao leitao@debian.org --- drivers/net/netconsole.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 4cefa43555aada25769b705dd8c8c89964f51a52..2f1aecdf2a47f246e75061d09b9ca524a82ec994 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -693,7 +693,15 @@ static ssize_t remote_ip_store(struct config_item *item, const char *buf, */ static size_t count_extradata_entries(struct netconsole_target *nt) { - return list_count_nodes(&nt->userdata_group.cg_children); + size_t entries; + + /* Userdata entries */ + entries = list_count_nodes(&nt->userdata_group.cg_children); + /* Plus sysdata entries */ + if (nt->sysdata_fields & CPU_NR) + entries += 1; + + return entries; }
static ssize_t remote_mac_store(struct config_item *item, const char *buf,
Add infrastructure to automatically append kernel-generated data (sysdata) to netconsole messages. As the first use case, implement CPU number population, which adds the CPU that sent the message.
This change introduces three distinct data types: - extradata: The complete set of appended data (sysdata + userdata) - userdata: User-provided key-value pairs from userspace - sysdata: Kernel-populated data (e.g. cpu=XX)
The implementation adds a new configfs attribute 'cpu_nr' to control CPU number population per target. When enabled, each message is tagged with its originating CPU. The sysdata is dynamically updated at message time and appended after any existing userdata.
The CPU number is formatted as "cpu=XX" and is added to the extradata buffer, respecting the existing size limits.
Signed-off-by: Breno Leitao leitao@debian.org --- drivers/net/netconsole.c | 53 +++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 43 insertions(+), 10 deletions(-)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c index 2f1aecdf2a47f246e75061d09b9ca524a82ec994..d3df66de9a352678bff011024922c63ef6f1b0ef 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -1116,6 +1116,40 @@ static void populate_configfs_item(struct netconsole_target *nt, init_target_config_group(nt, target_name); }
+/* + * prepare_extradata - append sysdata at extradata_complete in runtime + * @nt: target to send message to + */ +static int prepare_extradata(struct netconsole_target *nt) +{ + int sysdata_len, extradata_len; + + /* userdata was appended when configfs write helper was called + * by update_userdata(). + */ + extradata_len = nt->userdata_length; + + if (!(nt->sysdata_fields & CPU_NR)) + goto out; + + /* Append cpu=%d at extradata_complete after userdata str */ + sysdata_len = scnprintf(&nt->extradata_complete[nt->userdata_length], + MAX_EXTRADATA_ENTRY_LEN, " cpu=%u\n", + raw_smp_processor_id()); + + extradata_len += sysdata_len; + + WARN_ON_ONCE(extradata_len > + MAX_EXTRADATA_ENTRY_LEN * MAX_EXTRADATA_ITEMS); + +out: + return extradata_len; +} +#else /* CONFIG_NETCONSOLE_DYNAMIC not set */ +static int prepare_extradata(struct netconsole_target *nt) +{ + return 0; +} #endif /* CONFIG_NETCONSOLE_DYNAMIC */
/* Handle network interface device notifications */ @@ -1250,16 +1284,14 @@ static void append_release(char *buf)
static void send_fragmented_body(struct netconsole_target *nt, const char *msgbody, int header_len, - int msgbody_len) + int msgbody_len, int extradata_len) { int sent_extradata, preceding_bytes; const char *extradata = NULL; int body_len, offset = 0; - int extradata_len = 0;
#ifdef CONFIG_NETCONSOLE_DYNAMIC extradata = nt->extradata_complete; - extradata_len = nt->userdata_length; #endif
/* body_len represents the number of bytes that will be sent. This is @@ -1340,7 +1372,8 @@ static void send_fragmented_body(struct netconsole_target *nt, static void send_msg_fragmented(struct netconsole_target *nt, const char *msg, int msg_len, - int release_len) + int release_len, + int extradata_len) { int header_len, msgbody_len; const char *msgbody; @@ -1368,7 +1401,8 @@ static void send_msg_fragmented(struct netconsole_target *nt, /* for now on, the header will be persisted, and the msgbody * will be replaced */ - send_fragmented_body(nt, msgbody, header_len, msgbody_len); + send_fragmented_body(nt, msgbody, header_len, msgbody_len, + extradata_len); }
/** @@ -1384,12 +1418,10 @@ static void send_msg_fragmented(struct netconsole_target *nt, static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, int msg_len) { - int extradata_len = 0; int release_len = 0; + int extradata_len;
-#ifdef CONFIG_NETCONSOLE_DYNAMIC - extradata_len = nt->userdata_length; -#endif + extradata_len = prepare_extradata(nt);
if (nt->release) release_len = strlen(init_utsname()->release) + 1; @@ -1397,7 +1429,8 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg, if (msg_len + release_len + extradata_len <= MAX_PRINT_CHUNK) return send_msg_no_fragmentation(nt, msg, msg_len, release_len);
- return send_msg_fragmented(nt, msg, msg_len, release_len); + return send_msg_fragmented(nt, msg, msg_len, release_len, + extradata_len); }
static void write_ext_msg(struct console *con, const char *msg,
Add a new selftest to verify that the netconsole module correctly handles CPU runtime data in sysdata. The test validates three scenarios:
1. Basic CPU sysdata functionality - verifies that cpu=X is appended to messages 2. CPU sysdata with userdata - ensures CPU data works alongside userdata 3. Disabled CPU sysdata - confirms no CPU data is included when disabled
The test uses taskset to control which CPU sends messages and verifies the reported CPU matches the one used. This helps ensure that netconsole accurately tracks and reports the originating CPU of messages.
Signed-off-by: Breno Leitao leitao@debian.org --- tools/testing/selftests/drivers/net/Makefile | 1 + .../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 +++ .../selftests/drivers/net/netcons_sysdata.sh | 167 +++++++++++++++++++++ 3 files changed, 185 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile index 137470bdee0c7fd2517bd1baafc12d575de4b4ac..6c5e6159d8913e7bfd6bd835f0a9bd9e9d83518b 100644 --- a/tools/testing/selftests/drivers/net/Makefile +++ b/tools/testing/selftests/drivers/net/Makefile @@ -8,6 +8,7 @@ TEST_INCLUDES := $(wildcard lib/py/*.py) \ TEST_PROGS := \ netcons_basic.sh \ netcons_overflow.sh \ + netcons_sysdata.sh \ ping.py \ queues.py \ stats.py \ diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh index 3acaba41ac7b21aa2fd8457ed640a5ac8a41bc12..d319d177ce5ed7a1b196e68bffe549d57011fb15 100644 --- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh +++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh @@ -223,3 +223,20 @@ function check_for_dependencies() { exit "${ksft_skip}" fi } + +function check_for_taskset() { + if ! which taskset > /dev/null ; then + echo "SKIP: taskset(1) is not available" >&2 + exit "${ksft_skip}" + fi +} + +# This is necessary if running multiple tests in a row +function pkill_socat() { + PROCESS_NAME="socat UDP-LISTEN:6666,fork ${OUTPUT_FILE}" + # socat runs under timeout(1), kill it if it is still alive + # do not fail if socat doesn't exist anymore + set +e + pkill -f "${PROCESS_NAME}" + set -e +} diff --git a/tools/testing/selftests/drivers/net/netcons_sysdata.sh b/tools/testing/selftests/drivers/net/netcons_sysdata.sh new file mode 100755 index 0000000000000000000000000000000000000000..f50ccae6c0c090d6574defe5e02d7eda5f3d7188 --- /dev/null +++ b/tools/testing/selftests/drivers/net/netcons_sysdata.sh @@ -0,0 +1,167 @@ +#!/usr/bin/env bash +# SPDX-License-Identifier: GPL-2.0 + +# A test that makes sure that sysdata runtime CPU data is properly set +# when a message is sent. +# +# There are 3 different tests, every time sent using a random CPU. +# - Test #1 +# * Only enable cpu_nr sysdata feature. +# - Test #2 +# * Keep cpu_nr sysdata feature enable and enable userdata. +# - Test #3 +# * keep userdata enabled, and disable sysdata cpu_nr feature. +# +# Author: Breno Leitao leitao@debian.org + +set -euo pipefail + +SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")") + +source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh + +# Enable the sysdata cpu_nr feature +function set_cpu_nr() { + if [[ ! -f "${NETCONS_PATH}/userdata/cpu_nr" ]] + then + echo "Populate CPU configfs path not available in ${NETCONS_PATH}/userdata/cpu_nr" >&2 + exit "${ksft_skip}" + fi + + echo 1 > "${NETCONS_PATH}/userdata/cpu_nr" +} + +# Disable the sysdata cpu_nr feature +function unset_cpu_nr() { + echo 0 > "${NETCONS_PATH}/userdata/cpu_nr" +} + +# Test if MSG content and `cpu=${CPU}` exists in OUTPUT_FILE +function validate_sysdata_cpu_exists() { + # OUTPUT_FILE will contain something like: + # 6.11.1-0_fbk0_rc13_509_g30d75cea12f7,13,1822,115075213798,-;netconsole selftest: netcons_gtJHM + # userdatakey=userdatavalue + # cpu=X + + if [ ! -f "$OUTPUT_FILE" ]; then + echo "FAIL: File was not generated." >&2 + exit "${ksft_fail}" + fi + + if ! grep -q "${MSG}" "${OUTPUT_FILE}"; then + echo "FAIL: ${MSG} not found in ${OUTPUT_FILE}" >&2 + cat "${OUTPUT_FILE}" >&2 + exit "${ksft_fail}" + fi + + # Check if cpu=XX exists in the file and matches the one used + # in taskset(1) + if ! grep -q "cpu=${CPU}+" "${OUTPUT_FILE}"; then + echo "FAIL: 'cpu=${CPU}' not found in ${OUTPUT_FILE}" >&2 + cat "${OUTPUT_FILE}" >&2 + exit "${ksft_fail}" + fi + + rm "${OUTPUT_FILE}" + pkill_socat +} + +# Test if MSG content exists in OUTPUT_FILE but no `cpu=` string +function validate_sysdata_no_cpu() { + if [ ! -f "$OUTPUT_FILE" ]; then + echo "FAIL: File was not generated." >&2 + exit "${ksft_fail}" + fi + + if ! grep -q "${MSG}" "${OUTPUT_FILE}"; then + echo "FAIL: ${MSG} not found in ${OUTPUT_FILE}" >&2 + cat "${OUTPUT_FILE}" >&2 + exit "${ksft_fail}" + fi + + if grep -q "cpu=" "${OUTPUT_FILE}"; then + echo "FAIL: 'cpu= found in ${OUTPUT_FILE}" >&2 + cat "${OUTPUT_FILE}" >&2 + exit "${ksft_fail}" + fi + + rm "${OUTPUT_FILE}" +} + +# Start socat, send the message and wait for the file to show up in the file +# system +function runtest { + # Listen for netconsole port inside the namespace and destination + # interface + listen_port_and_save_to "${OUTPUT_FILE}" & + # Wait for socat to start and listen to the port. + wait_local_port_listen "${NAMESPACE}" "${PORT}" udp + # Send the message + taskset -c "${CPU}" echo "${MSG}: ${TARGET}" > /dev/kmsg + # Wait until socat saves the file to disk + busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}" +} + +# ========== # +# Start here # +# ========== # + +modprobe netdevsim 2> /dev/null || true +modprobe netconsole 2> /dev/null || true + +# Check for basic system dependency and exit if not found +check_for_dependencies +# This test also depends on taskset(1). Check for it before starting the test +check_for_taskset + +# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5) +echo "6 5" > /proc/sys/kernel/printk +# Remove the namespace, interfaces and netconsole target on exit +trap cleanup EXIT +# Create one namespace and two interfaces +set_network +# Create a dynamic target for netconsole +create_dynamic_target + +#==================================================== +# TEST #1 +# Send message from a random CPU +#==================================================== +# Random CPU in the system +CPU=$((RANDOM % $(nproc))) +OUTPUT_FILE="/tmp/${TARGET}_1" +MSG="Test #1 from CPU${CPU}" +# Enable the auto population of cpu_nr +set_cpu_nr +runtest +# Make sure the message was received in the dst part +# and exit +validate_sysdata_cpu_exists + +#==================================================== +# TEST #2 +# This test now adds userdata together with sysdata +# =================================================== +# Get a new random CPU +CPU=$((RANDOM % $(nproc))) +OUTPUT_FILE="/tmp/${TARGET}_2" +MSG="Test #2 from CPU${CPU}" +set_user_data +runtest +validate_sysdata_cpu_exists + +# =================================================== +# TEST #3 +# Unset cpu_nr, so, no CPU should be appended. +# userdata is still set +# =================================================== +CPU=$((RANDOM % $(nproc))) +OUTPUT_FILE="/tmp/${TARGET}_3" +MSG="Test #3 from CPU${CPU}" +# Enable the auto population of cpu_nr +unset_cpu_nr +runtest +# At this time, cpu= shouldn't be present in the msg +validate_sysdata_no_cpu + +exit "${ksft_pass}"
Update the netconsole documentation to explain the new feature that allows automatic population of the CPU number.
The key changes include introducing a new section titled "CPU number auto population in userdata", explaining how to enable the CPU number auto-population feature by writing to the "populate_cpu_nr" file in the netconsole configfs hierarchy.
This documentation update ensures users are aware of the new CPU number auto-population functionality and how to leverage it for better demultiplexing and visibility of parallel netconsole output.
Signed-off-by: Breno Leitao leitao@debian.org --- Documentation/networking/netconsole.rst | 45 +++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+)
diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst index 94c4680fdf3e7e1a0020d11b44547acfd68072a5..84803c59968a3237012fab821f432eb531aba45c 100644 --- a/Documentation/networking/netconsole.rst +++ b/Documentation/networking/netconsole.rst @@ -17,6 +17,8 @@ Release prepend support by Breno Leitao leitao@debian.org, Jul 7 2023
Userdata append support by Matthew Wood thepacketgeek@gmail.com, Jan 22 2024
+Sysdata append support by Breno Leitao leitao@debian.org, Jan 15 2025 + Please send bug reports to Matt Mackall mpm@selenic.com Satyam Sharma satyam.sharma@gmail.com, and Cong Wang xiyou.wangcong@gmail.com
@@ -238,6 +240,49 @@ Delete `userdata` entries with `rmdir`::
It is recommended to not write user data values with newlines.
+CPU number auto population in userdata +-------------------------------------- + +Inside the netconsole configfs hierarchy, there is a file called +`cpu_nr` under the `userdata` directory. This file is used to enable or disable +the automatic CPU number population feature. This feature automatically +populates the CPU number that is sending the message. + +To enable the CPU number auto-population:: + + echo 1 > /sys/kernel/config/netconsole/target1/userdata/cpu_nr + +When this option is enabled, the netconsole messages will include an additional +line in the userdata field with the format `cpu=<cpu_number>`. This allows the +receiver of the netconsole messages to easily differentiate and demultiplex +messages originating from different CPUs, which is particularly useful when +dealing with parallel log output. + +Example:: + + echo "This is a message" > /dev/kmsg + 12,607,22085407756,-;This is a message + cpu=42 + +In this example, the message was sent by CPU 42. + +.. note:: + + If the user has set a conflicting `cpu` key in the userdata dictionary, + both keys will be reported, with the kernel-populated entry appearing after + the user one. For example:: + + # User-defined CPU entry + mkdir -p /sys/kernel/config/netconsole/target1/userdata/cpu + echo "1" > /sys/kernel/config/netconsole/target1/userdata/cpu/value + + Output might look like:: + + 12,607,22085407756,-;This is a message + cpu=1 + cpu=42 # kernel-populated value + + Extended console: =================
+CPU number auto population in userdata +--------------------------------------
+Inside the netconsole configfs hierarchy, there is a file called +`cpu_nr` under the `userdata` directory. This file is used to enable or disable +the automatic CPU number population feature. This feature automatically +populates the CPU number that is sending the message.
Biking shedding a bit, but to me `cpu_nr` is the number of a CPU. However, you want this to be an enable/disable feature. Would `cpu_nr_enable`, or `cpu_nr_auto_populate` be clearer?
Andrew
On Fri, Jan 24, 2025 at 07:16:39AM -0800, Breno Leitao wrote:
The current implementation of netconsole sends all log messages in parallel, which can lead to an intermixed and interleaved output on the receiving side. This makes it challenging to demultiplex the messages and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze and debug the parallel log output received through netconsole.
I know very little about consoles and netconsle, so this is probably a silly question:
Why is this a netconsole problem, and not a generic console problem?
Can other console types also send in parallel? Do they have the same issue of intermixing?
Andrew
linux-kselftest-mirror@lists.linaro.org