This change introduces a way to check if an fd points to a memfd's
original open fd (the one created by memfd_create).
We encountered an issue with migrating memfds in CRIU (Checkpoint/Restore
In Userspace, which migrates running processes between machines).
Imagine a scenario:
1. Create a memfd. By default it is opened with O_RDWR, and yet one can
exec() it (unlike regular files, where one would get ETXTBSY).
2. Reopen that memfd with O_RDWR via /proc/self/fd/<fd>.
Now those 2 fds are indistinguishable from userspace. You can't exec()
either of them (since the reopen incremented inode->i_writecount),
and their /proc/self/fdinfo/ entries are exactly the same. Unfortunately,
the two fds are not equivalent. If you close the second one, the first one
becomes exec()able again. If you close the first one, the other doesn't
become exec()able. Therefore, during migration it matters which fd is
recreated first and which is reopened, but there is no way for CRIU to
tell which was first.
---
Changes since v1 at [1]:
- Rewrote it from fcntl to ioctl. This was requested by the filesystems
maintainer.
Links:
[1] https://lore.kernel.org/all/20230831203647.558079-1-mclapinski@google.com/
Michal Clapinski (2):
mm/memfd: add ioctl(MEMFD_CHECK_IF_ORIGINAL)
selftests: test ioctl(MEMFD_CHECK_IF_ORIGINAL)
.../userspace-api/ioctl/ioctl-number.rst | 1 +
fs/hugetlbfs/inode.c | 9 ++++++
include/linux/memfd.h | 12 +++++++
mm/memfd.c | 9 ++++++
mm/shmem.c | 9 ++++++
tools/testing/selftests/memfd/memfd_test.c | 32 +++++++++++++++++++
6 files changed, 72 insertions(+)
--
2.42.0.283.g2d96d420d3-goog
From: Zhangjin Wu <falcon(a)tinylab.org>
[ Upstream commit c388c9920da2679f62bec48d00ca9e80e9d0a364 ]
Kernel parameters can carry two types of strings: one type is like
'noapic', the other is like 'panic=5'. The first type is passed to the
init program as arguments, the second type is passed to it as
environment variables.
When users pass kernel parameters like this:
    noapic NOLIBC_TEST=syscall
our nolibc-test program uses the test setting from argv[1] and ignores
the one from the NOLIBC_TEST environment variable. In the end it prints
the following line and ignores the whole test setting:
    Ignoring unknown test name 'noapic'
Reversing the parsing order does solve the above issue:
    test = getenv("NOLIBC_TEST");
    if (!test)
        test = argv[1];
but it still doesn't work with kernel parameters like the following
(without the NOLIBC_TEST environment variable):
    noapic FOO=bar
To support all of the potential kernel parameters, let's validate the
test setting from both argv[1] and the NOLIBC_TEST environment variable.
Reviewed-by: Thomas Weißschuh <linux(a)weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
Signed-off-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/nolibc/nolibc-test.c | 33 ++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index 78bced95ac630..f8e8e8d2a5e18 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -630,6 +630,35 @@ static struct test test_names[] = {
 	{ 0 }
 };
 
+int is_setting_valid(char *test)
+{
+	int idx, len, test_len, valid = 0;
+	char delimiter;
+
+	if (!test)
+		return valid;
+
+	test_len = strlen(test);
+
+	for (idx = 0; test_names[idx].name; idx++) {
+		len = strlen(test_names[idx].name);
+		if (test_len < len)
+			continue;
+
+		if (strncmp(test, test_names[idx].name, len) != 0)
+			continue;
+
+		delimiter = test[len];
+		if (delimiter != ':' && delimiter != ',' && delimiter != '\0')
+			continue;
+
+		valid = 1;
+		break;
+	}
+
+	return valid;
+}
+
 int main(int argc, char **argv, char **envp)
 {
 	int min = 0;
@@ -655,10 +684,10 @@ int main(int argc, char **argv, char **envp)
 	 *    syscall:5-15[:.*],stdlib:8-10
 	 */
 	test = argv[1];
-	if (!test)
+	if (!is_setting_valid(test))
 		test = getenv("NOLIBC_TEST");
 
-	if (test) {
+	if (is_setting_valid(test)) {
 		char *comma, *colon, *dash, *value;
 
 		do {
--
2.40.1
From: Zhangjin Wu <falcon(a)tinylab.org>
[ Upstream commit c388c9920da2679f62bec48d00ca9e80e9d0a364 ]
Kernel parameters can carry two types of strings: one type is like
'noapic', the other is like 'panic=5'. The first type is passed to the
init program as arguments, the second type is passed to it as
environment variables.
When users pass kernel parameters like this:
    noapic NOLIBC_TEST=syscall
our nolibc-test program uses the test setting from argv[1] and ignores
the one from the NOLIBC_TEST environment variable. In the end it prints
the following line and ignores the whole test setting:
    Ignoring unknown test name 'noapic'
Reversing the parsing order does solve the above issue:
    test = getenv("NOLIBC_TEST");
    if (!test)
        test = argv[1];
but it still doesn't work with kernel parameters like the following
(without the NOLIBC_TEST environment variable):
    noapic FOO=bar
To support all of the potential kernel parameters, let's validate the
test setting from both argv[1] and the NOLIBC_TEST environment variable.
Reviewed-by: Thomas Weißschuh <linux(a)weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
Signed-off-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/nolibc/nolibc-test.c | 33 ++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index d37d036876ea9..041f5d16a9d87 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -782,6 +782,35 @@ static const struct test test_names[] = {
 	{ 0 }
 };
 
+int is_setting_valid(char *test)
+{
+	int idx, len, test_len, valid = 0;
+	char delimiter;
+
+	if (!test)
+		return valid;
+
+	test_len = strlen(test);
+
+	for (idx = 0; test_names[idx].name; idx++) {
+		len = strlen(test_names[idx].name);
+		if (test_len < len)
+			continue;
+
+		if (strncmp(test, test_names[idx].name, len) != 0)
+			continue;
+
+		delimiter = test[len];
+		if (delimiter != ':' && delimiter != ',' && delimiter != '\0')
+			continue;
+
+		valid = 1;
+		break;
+	}
+
+	return valid;
+}
+
 int main(int argc, char **argv, char **envp)
 {
 	int min = 0;
@@ -807,10 +836,10 @@ int main(int argc, char **argv, char **envp)
 	 *    syscall:5-15[:.*],stdlib:8-10
 	 */
 	test = argv[1];
-	if (!test)
+	if (!is_setting_valid(test))
 		test = getenv("NOLIBC_TEST");
 
-	if (test) {
+	if (is_setting_valid(test)) {
 		char *comma, *colon, *dash, *value;
 
 		do {
--
2.40.1
From: Zhangjin Wu <falcon(a)tinylab.org>
[ Upstream commit c388c9920da2679f62bec48d00ca9e80e9d0a364 ]
Kernel parameters can carry two types of strings: one type is like
'noapic', the other is like 'panic=5'. The first type is passed to the
init program as arguments, the second type is passed to it as
environment variables.
When users pass kernel parameters like this:
    noapic NOLIBC_TEST=syscall
our nolibc-test program uses the test setting from argv[1] and ignores
the one from the NOLIBC_TEST environment variable. In the end it prints
the following line and ignores the whole test setting:
    Ignoring unknown test name 'noapic'
Reversing the parsing order does solve the above issue:
    test = getenv("NOLIBC_TEST");
    if (!test)
        test = argv[1];
but it still doesn't work with kernel parameters like the following
(without the NOLIBC_TEST environment variable):
    noapic FOO=bar
To support all of the potential kernel parameters, let's validate the
test setting from both argv[1] and the NOLIBC_TEST environment variable.
Reviewed-by: Thomas Weißschuh <linux(a)weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
Signed-off-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/nolibc/nolibc-test.c | 33 ++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index 486334981e601..55628a25df0a3 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -939,6 +939,35 @@ static const struct test test_names[] = {
 	{ 0 }
 };
 
+int is_setting_valid(char *test)
+{
+	int idx, len, test_len, valid = 0;
+	char delimiter;
+
+	if (!test)
+		return valid;
+
+	test_len = strlen(test);
+
+	for (idx = 0; test_names[idx].name; idx++) {
+		len = strlen(test_names[idx].name);
+		if (test_len < len)
+			continue;
+
+		if (strncmp(test, test_names[idx].name, len) != 0)
+			continue;
+
+		delimiter = test[len];
+		if (delimiter != ':' && delimiter != ',' && delimiter != '\0')
+			continue;
+
+		valid = 1;
+		break;
+	}
+
+	return valid;
+}
+
 int main(int argc, char **argv, char **envp)
 {
 	int min = 0;
@@ -964,10 +993,10 @@ int main(int argc, char **argv, char **envp)
 	 *    syscall:5-15[:.*],stdlib:8-10
 	 */
 	test = argv[1];
-	if (!test)
+	if (!is_setting_valid(test))
 		test = getenv("NOLIBC_TEST");
 
-	if (test) {
+	if (is_setting_valid(test)) {
 		char *comma, *colon, *dash, *value;
 
 		do {
--
2.40.1
Currently console.log.diags contains output like the following:
[ 2457.293734] WARNING: CPU: 2 PID: 13 at kernel/rcu/tasks.h:1061 rcu_tasks_trace_pregp_step+0x4a/0x50
[ 2457.542385] Call Trace:
This is not very useful, and easier access to the call trace is desired.
Improve the script by extracting more lines after each grep match.
Provide a summary at the beginning like before, but also include details
below it. Limit the total number of issues to a maximum of 10, and limit
the lines included after each issue to a maximum of 20.
With these changes the output becomes:
Issues found:
Line 6228: [ 2457.293734] WARNING: CPU: 2 PID: 13 at kernel/rcu/tasks.h:1061 rcu_tasks_trace_pregp_step+0x4a/0x50
Line 6245: [ 2457.542385] Call Trace:
Details of each issue:
Issue 1 (line 6228):
[ 2457.293734] WARNING: CPU: 2 PID: 13 at kernel/rcu/tasks.h:1061 rcu_tasks_trace_pregp_step+0x4a/0x50
[ 2457.326661] Modules linked in:
[ 2457.334818] CPU: 2 PID: 13 Comm: rcu_tasks_trace Not tainted 5.15.128+ #381
[ 2457.349782] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 2457.373309] RIP: 0010:rcu_tasks_trace_pregp_step+0x4a/0x50
[...]
[ 2457.421803] RSP: 0018:ffffa80fc0073e40 EFLAGS: 00010202
[ 2457.431940] RAX: ffff8db91f580000 RBX: 000000000001b900 RCX: 0000000000000003
[ 2457.443206] RDX: 0000000000000008 RSI: ffffffffac6bebd8 RDI: 0000000000000003
[ 2457.454428] RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000001
[ 2457.465668] R10: 0000000000000000 R11: 00000000ffffffff R12: ffff8db902d87f40
[ 2457.476971] R13: ffffffffac556620 R14: ffffffffac556630 R15: ffff8db9011a3200
[ 2457.488251] FS: 0000000000000000(0000) GS:ffff8db91f500000(0000) knlGS:0000000000000000
[ 2457.500834] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2457.509602] CR2: 0000000000000000 CR3: 0000000002cbc000 CR4: 00000000000006e0
[ 2457.520378] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2457.531440] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2457.542385] Call Trace:
[ 2457.546756] <TASK>
[ 2457.550349] ? __warn+0x7b/0x100
[ 2457.567214] ? rcu_tasks_trace_pregp_step+0x4a/0x50
-------------------------------------
Issue 2 (line 6245):
[ 2457.542385] Call Trace:
[ 2457.546756] <TASK>
[ 2457.550349] ? __warn+0x7b/0x100
[ 2457.567214] ? rcu_tasks_trace_pregp_step+0x4a/0x50
[ 2457.574948] ? report_bug+0x99/0xc0
[ 2457.593824] ? handle_bug+0x3c/0x70
[ 2457.599534] ? exc_invalid_op+0x13/0x60
[ 2457.625729] ? asm_exc_invalid_op+0x16/0x20
[ 2457.632249] ? rcu_tasks_trace_pregp_step+0x4a/0x50
[ 2457.660010] rcu_tasks_wait_gp+0x54/0x360
[ 2457.677761] ? _raw_spin_unlock_irqrestore+0x2b/0x60
[ 2457.705658] rcu_tasks_kthread+0x114/0x200
[ 2457.712450] ? wait_woken+0x70/0x70
[ 2457.727283] ? synchronize_rcu_tasks_rude+0x10/0x10
[ 2457.746221] kthread+0x130/0x160
[ 2457.751487] ? set_kthread_struct+0x40/0x40
[ 2457.758178] ret_from_fork+0x22/0x30
[ 2457.763909] </TASK>
[ 2457.767546] irq event stamp: 29544441
[ 2457.773344] hardirqs last enabled at (29544451): [<ffffffffaace6cbd>] __up_console_sem+0x4d/0x60
[ 2457.786967] hardirqs last disabled at (29544460): [<ffffffffaace6ca2>] __up_console_sem+0x32/0x60
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1->v2: Limit number of issues reported and include summary on the top.
.../rcutorture/bin/console-badness.sh | 42 ++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rcutorture/bin/console-badness.sh b/tools/testing/selftests/rcutorture/bin/console-badness.sh
index aad51e7c0183..2612a4931723 100755
--- a/tools/testing/selftests/rcutorture/bin/console-badness.sh
+++ b/tools/testing/selftests/rcutorture/bin/console-badness.sh
@@ -9,10 +9,50 @@
# Copyright (C) 2020 Facebook, Inc.
#
# Authors: Paul E. McKenney <paulmck(a)kernel.org>
+INPUT_DATA=$(< /dev/stdin)
+MAX_NR_ISSUES=10
-grep -E 'Badness|WARNING:|Warn|BUG|===========|BUG: KCSAN:|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
+# Get the line numbers for all the grep matches
+GREP_LINES="$(echo "$INPUT_DATA" |
+grep -n -E 'Badness|WARNING:|Warn|BUG|===========|BUG: KCSAN:|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
grep -v 'ODEBUG: ' |
grep -v 'This means that this is a DEBUG kernel and it is' |
grep -v 'Warning: unable to open an initial console' |
grep -v 'Warning: Failed to add ttynull console. No stdin, stdout, and stderr.*the init process!' |
grep -v 'NOHZ tick-stop error: Non-RCU local softirq work is pending, handler'
+)"
+
+# Exit if no grep matches
+if [ ! -n "$GREP_LINES" ]; then exit 0; fi
+
+# Print first MAX_NR_ISSUES grepped lines
+echo "Issues found:"
+issue_num=1
+while IFS= read -r line; do
+	# Extract the line number from the line
+	num=$(echo "$line" | awk -F: '{print $1}')
+	# Extract the rest of the line
+	line_rest=$(echo "$line" | cut -d: -f2-)
+	echo "Line $num: $line_rest"
+	if [ "$issue_num" -eq "$MAX_NR_ISSUES" ]; then break; fi
+	issue_num="$(($issue_num + 1))"
+done <<< "$GREP_LINES"
+echo ""
+
+# Print details of each issue
+#
+# Go through each line of GREP_LINES, extract the line number and then
+# print from that line and 20 lines after that line. Do that for each
+# grep match, up to MAX_NR_ISSUES of them.
+echo "Details of each issue:"
+issue_num=1
+while IFS= read -r line; do
+	# Extract the line number from the line
+	num=$(echo "$line" | awk -F: '{print $1}')
+	# Print 20 lines after the matched line
+	echo "Issue $issue_num (line $num):"
+	echo "$INPUT_DATA" | sed -n "${num},$(($num + 20))p"
+	echo "-------------------------------------"
+	if [ "$issue_num" -eq "$MAX_NR_ISSUES" ]; then break; fi
+	issue_num="$(($issue_num + 1))"
+done <<< "$GREP_LINES"
--
2.42.0.283.g2d96d420d3-goog