This makes sure that wchan contains a sensible symbol when a process is blocked. Specifically this calls the sleep() syscall, and expects the architecture to have called schedule() from a function that has "sleep" somewhere in its name. For example, on the architectures I tested (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
$ tools/testing/selftests/proc/proc-pid-wchan ok: found 'sleep' in wchan 'hrtimer_nanosleep'
Cc: Peter Zijlstra peterz@infradead.org Cc: Mark Rutland mark.rutland@arm.com Cc: Shuah Khan shuah@kernel.org Cc: Alexey Dobriyan adobriyan@gmail.com Cc: linux-kselftest@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org --- Hi Peter,
Can you add this to the wchan series, please? This should help wchan from regressing in the future, and allow us to notice if the depth accidentally changes, like Mark saw. --- tools/testing/selftests/proc/Makefile | 1 + tools/testing/selftests/proc/proc-pid-wchan.c | 69 +++++++++++++++++++ 2 files changed, 70 insertions(+) create mode 100644 tools/testing/selftests/proc/proc-pid-wchan.c
diff --git a/tools/testing/selftests/proc/Makefile b/tools/testing/selftests/proc/Makefile index 1054e40a499a..45cf35703ece 100644 --- a/tools/testing/selftests/proc/Makefile +++ b/tools/testing/selftests/proc/Makefile @@ -8,6 +8,7 @@ TEST_GEN_PROGS += fd-002-posix-eq TEST_GEN_PROGS += fd-003-kthread TEST_GEN_PROGS += proc-loadavg-001 TEST_GEN_PROGS += proc-pid-vm +TEST_GEN_PROGS += proc-pid-wchan TEST_GEN_PROGS += proc-self-map-files-001 TEST_GEN_PROGS += proc-self-map-files-002 TEST_GEN_PROGS += proc-self-syscall diff --git a/tools/testing/selftests/proc/proc-pid-wchan.c b/tools/testing/selftests/proc/proc-pid-wchan.c new file mode 100644 index 000000000000..7d7870c31cef --- /dev/null +++ b/tools/testing/selftests/proc/proc-pid-wchan.c @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Make sure that wchan returns a reasonable symbol when blocked. + */ +#include <sys/types.h> +#include <sys/stat.h> +#include <errno.h> +#include <fcntl.h> +#include <signal.h> +#include <stdio.h> +#include <string.h> +#include <unistd.h> +#include <sys/wait.h> + +#define perror_exit(str) do { perror(str); _exit(1); } while (0) + +int main(void) +{ + char buf[64]; + pid_t child; + int sync[2], fd; + + if (pipe(sync) < 0) + perror_exit("pipe"); + + child = fork(); + if (child < 0) + perror_exit("fork"); + if (child == 0) { + /* Child */ + if (close(sync[0]) < 0) + perror_exit("child close sync[0]"); + if (close(sync[1]) < 0) + perror_exit("child close sync[1]"); + sleep(10); + _exit(0); + } + /* Parent */ + if (close(sync[1]) < 0) + perror_exit("parent close sync[1]"); + if (read(sync[0], buf, 1) != 0) + perror_exit("parent read sync[0]"); + + snprintf(buf, sizeof(buf), "/proc/%d/wchan", child); + fd = open(buf, O_RDONLY); + if (fd < 0) { + if (errno == ENOENT) + return 4; + perror_exit(buf); + } + + memset(buf, 0, sizeof(buf)); + if (read(fd, buf, sizeof(buf) - 1) < 1) + perror_exit(buf); + if (strstr(buf, "sleep") == NULL) { + fprintf(stderr, "FAIL: did not find 'sleep' in wchan '%s'\n", buf); + return 1; + } + printf("ok: found 'sleep' in wchan '%s'\n", buf); + + if (kill(child, SIGKILL) < 0) + perror_exit("kill"); + if (waitpid(child, NULL, 0) != child) { + fprintf(stderr, "waitpid: got the wrong child!?\n"); + return 1; + } + + return 0; +}
On Fri, Oct 08, 2021 at 04:55:04PM -0700, Kees Cook wrote:
This makes sure that wchan contains a sensible symbol when a process is blocked.
Specifically this calls the sleep() syscall, and expects the architecture to have called schedule() from a function that has "sleep" somewhere in its name.
This exposes internal kernel symbol to userspace. Why would want to test that?
Doing s/sleep/SLEEP/g doesn't change kernel but now the test is broken.
For example, on the architectures I tested (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
+/*
- Make sure that wchan returns a reasonable symbol when blocked.
- */
Test should be "contains C identifier" then?
+int main(void) +{
- char buf[64];
- pid_t child;
- int sync[2], fd;
- if (pipe(sync) < 0)
perror_exit("pipe");
- child = fork();
- if (child < 0)
perror_exit("fork");
- if (child == 0) {
/* Child */
if (close(sync[0]) < 0)
perror_exit("child close sync[0]");
if (close(sync[1]) < 0)
perror_exit("child close sync[1]");
Redundant close().
sleep(10);
_exit(0);
- }
- /* Parent */
- if (close(sync[1]) < 0)
perror_exit("parent close sync[1]");
Redundant close().
- if (read(sync[0], buf, 1) != 0)
perror_exit("parent read sync[0]");
Racy if child is scheduled out after first close in the child.
- snprintf(buf, sizeof(buf), "/proc/%d/wchan", child);
- fd = open(buf, O_RDONLY);
- if (fd < 0) {
if (errno == ENOENT)
return 4;
perror_exit(buf);
- }
- memset(buf, 0, sizeof(buf));
- if (read(fd, buf, sizeof(buf) - 1) < 1)
perror_exit(buf);
- if (strstr(buf, "sleep") == NULL) {
fprintf(stderr, "FAIL: did not find 'sleep' in wchan '%s'\n", buf);
return 1;
- }
- printf("ok: found 'sleep' in wchan '%s'\n", buf);
- if (kill(child, SIGKILL) < 0)
perror_exit("kill");
- if (waitpid(child, NULL, 0) != child) {
fprintf(stderr, "waitpid: got the wrong child!?\n");
return 1;
- }
- return 0;
+}
On Sat, Oct 09, 2021 at 03:52:02PM +0300, Alexey Dobriyan wrote:
On Fri, Oct 08, 2021 at 04:55:04PM -0700, Kees Cook wrote:
This makes sure that wchan contains a sensible symbol when a process is blocked.
Specifically this calls the sleep() syscall, and expects the architecture to have called schedule() from a function that has "sleep" somewhere in its name.
This exposes internal kernel symbol to userspace.
Correct; we're verifying the results of the wchan output, which produces a kernel symbol for blocked processes.
Why would want to test that?
This is part of a larger series refactoring/fixing wchan[1], and we've now tripped over several different failure conditions, so I want to make sure this doesn't regress in the future.
Doing s/sleep/SLEEP/g doesn't change kernel but now the test is broken.
Yes; the test would be doing it's job, as that would mean there was a userspace visible change to wchan, so we'd want to catch it and either fix the kernel or update the test to reflect the new reality.
For example, on the architectures I tested (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
+/*
- Make sure that wchan returns a reasonable symbol when blocked.
- */
Test should be "contains C identifier" then?
Nope, this was intentional. Expanding to a C identifier won't catch the "we unwound the stack to the wrong depth and now all wchan shows is '__switch_to'" bug[2]. We're specifically checking that wchan is doing at least the right thing for the most common blocking state.
+int main(void) +{
- char buf[64];
- pid_t child;
- int sync[2], fd;
- if (pipe(sync) < 0)
perror_exit("pipe");
- child = fork();
- if (child < 0)
perror_exit("fork");
- if (child == 0) {
/* Child */
if (close(sync[0]) < 0)
perror_exit("child close sync[0]");
if (close(sync[1]) < 0)
perror_exit("child close sync[1]");
Redundant close().
Hmm, did you maybe miss the differing array indexes? This closes the reading end followed by the writing end of the child's pipe.
sleep(10);
_exit(0);
- }
- /* Parent */
- if (close(sync[1]) < 0)
perror_exit("parent close sync[1]");
Redundant close().
It's not, though. This closes the write side of the parent's pipe.
- if (read(sync[0], buf, 1) != 0)
perror_exit("parent read sync[0]");
Racy if child is scheduled out after first close in the child.
No, the first close will close the child's read-side of the pipe, which isn't being used. For example, see[3].
The parent's read of /proc/$child/wchan could technically race if the child is scheduled out after the second close() and before the sleep(), but the parent is doing at least 2 syscalls before then. I'm open to a more exact synchronization method, but this should be sufficient. (e.g. Using ptrace to catch sleep syscall entry seemed like overkill.)
-Kees
[1] https://lore.kernel.org/lkml/20211008111527.438276127@infradead.org/ [2] https://lore.kernel.org/lkml/20211008124052.GA976@C02TD0UTHF1T.local/ [3] https://man7.org/tlpi/code/online/diff/pipes/pipe_sync.c.html
- snprintf(buf, sizeof(buf), "/proc/%d/wchan", child);
- fd = open(buf, O_RDONLY);
- if (fd < 0) {
if (errno == ENOENT)
return 4;
perror_exit(buf);
- }
- memset(buf, 0, sizeof(buf));
- if (read(fd, buf, sizeof(buf) - 1) < 1)
perror_exit(buf);
- if (strstr(buf, "sleep") == NULL) {
fprintf(stderr, "FAIL: did not find 'sleep' in wchan '%s'\n", buf);
return 1;
- }
- printf("ok: found 'sleep' in wchan '%s'\n", buf);
- if (kill(child, SIGKILL) < 0)
perror_exit("kill");
- if (waitpid(child, NULL, 0) != child) {
fprintf(stderr, "waitpid: got the wrong child!?\n");
return 1;
- }
- return 0;
+}
On Fri, Oct 08, 2021 at 04:55:04PM -0700, Kees Cook wrote:
This makes sure that wchan contains a sensible symbol when a process is blocked. Specifically this calls the sleep() syscall, and expects the architecture to have called schedule() from a function that has "sleep" somewhere in its name. For example, on the architectures I tested (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
$ tools/testing/selftests/proc/proc-pid-wchan ok: found 'sleep' in wchan 'hrtimer_nanosleep'
Cc: Peter Zijlstra peterz@infradead.org Cc: Mark Rutland mark.rutland@arm.com Cc: Shuah Khan shuah@kernel.org Cc: Alexey Dobriyan adobriyan@gmail.com Cc: linux-kselftest@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org
Friendly ping.
Hi Peter,
Can you add this to the wchan series, please? This should help wchan from regressing in the future, and allow us to notice if the depth accidentally changes, like Mark saw.
I'd like to make sure we have a regression test for this. Will you add this to the wchan series please?
-Kees
On Thu, Oct 21, 2021 at 10:09:33AM -0700, Kees Cook wrote:
Hi Peter,
Can you add this to the wchan series, please? This should help wchan from regressing in the future, and allow us to notice if the depth accidentally changes, like Mark saw.
I'd like to make sure we have a regression test for this. Will you add this to the wchan series please?
I have it there, but it's in the part that didn't make it in yet. I'm currently in the progress of reworking that.
Do you want it it ahead of all that?
On Thu, Oct 21, 2021 at 09:30:33PM +0200, Peter Zijlstra wrote:
On Thu, Oct 21, 2021 at 10:09:33AM -0700, Kees Cook wrote:
Hi Peter,
Can you add this to the wchan series, please? This should help wchan from regressing in the future, and allow us to notice if the depth accidentally changes, like Mark saw.
I'd like to make sure we have a regression test for this. Will you add this to the wchan series please?
I have it there, but it's in the part that didn't make it in yet. I'm currently in the progress of reworking that.
Do you want it it ahead of all that?
Nah, that's fine. I'm not in any rush; I just wanted to make sure it didn't get lost. I didn't realize it was already there (I had only seen the -tip emails). Sorry for the noise; thanks!
-Kees
On Fri, Oct 8, 2021, at 4:55 PM, Kees Cook wrote:
This makes sure that wchan contains a sensible symbol when a process is blocked. Specifically this calls the sleep() syscall, and expects the architecture to have called schedule() from a function that has "sleep" somewhere in its name. For example, on the architectures I tested (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
Is this really better than admitting that the whole mechanism is nonsense and disabling it?
We could have a fixed string for each task state and call it a day.
On Thu, Oct 21, 2021 at 01:03:33PM -0700, Andy Lutomirski wrote:
On Fri, Oct 8, 2021, at 4:55 PM, Kees Cook wrote:
This makes sure that wchan contains a sensible symbol when a process is blocked. Specifically this calls the sleep() syscall, and expects the architecture to have called schedule() from a function that has "sleep" somewhere in its name. For example, on the architectures I tested (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
Is this really better than admitting that the whole mechanism is nonsense and disabling it?
We could have a fixed string for each task state and call it a day.
I consider this to be "future work". In earlier discussions it came up, but there wasn't an obvious clean cost-free way to do this, so instead we're just fixing the broken corner and keeping the mostly working rest of it while cleaning up the weird edges. :)
On Thu, Oct 21, 2021, at 1:12 PM, Kees Cook wrote:
On Thu, Oct 21, 2021 at 01:03:33PM -0700, Andy Lutomirski wrote:
On Fri, Oct 8, 2021, at 4:55 PM, Kees Cook wrote:
This makes sure that wchan contains a sensible symbol when a process is blocked. Specifically this calls the sleep() syscall, and expects the architecture to have called schedule() from a function that has "sleep" somewhere in its name. For example, on the architectures I tested (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
Is this really better than admitting that the whole mechanism is nonsense and disabling it?
We could have a fixed string for each task state and call it a day.
I consider this to be "future work". In earlier discussions it came up, but there wasn't an obvious clean cost-free way to do this, so instead we're just fixing the broken corner and keeping the mostly working rest of it while cleaning up the weird edges. :)
True, but we have the caveat that wchan is currently broken, so in some sense we have an easier time killing it now as compared to later. But if we don't have a fully-fleshed-out idea for how to kill it, then I'm fine with waiting.
-- Kees Cook
On Thu, Oct 21, 2021 at 05:39:38PM -0700, Andy Lutomirski wrote:
On Thu, Oct 21, 2021, at 1:12 PM, Kees Cook wrote:
On Thu, Oct 21, 2021 at 01:03:33PM -0700, Andy Lutomirski wrote:
On Fri, Oct 8, 2021, at 4:55 PM, Kees Cook wrote:
This makes sure that wchan contains a sensible symbol when a process is blocked. Specifically this calls the sleep() syscall, and expects the architecture to have called schedule() from a function that has "sleep" somewhere in its name. For example, on the architectures I tested (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
Is this really better than admitting that the whole mechanism is nonsense and disabling it?
We could have a fixed string for each task state and call it a day.
I consider this to be "future work". In earlier discussions it came up, but there wasn't an obvious clean cost-free way to do this, so instead we're just fixing the broken corner and keeping the mostly working rest of it while cleaning up the weird edges. :)
True, but we have the caveat that wchan is currently broken, so in some sense we have an easier time killing it now as compared to later. But if we don't have a fully-fleshed-out idea for how to kill it, then I'm fine with waiting.
It's not actually that broken. Only ORC was fully broken, so all the other architectures (and non-ORC x86) have been fine. But given the method of fixing ORC vs wchan, it turns out we could further clean up the other architectures. But yes, no real plan to remove it, but the current series fixes things pretty well. :)
-Kees
-- Kees Cook
linux-kselftest-mirror@lists.linaro.org