On Sat, Oct 09, 2021 at 03:52:02PM +0300, Alexey Dobriyan wrote:
On Fri, Oct 08, 2021 at 04:55:04PM -0700, Kees Cook wrote:
This makes sure that wchan contains a sensible symbol when a process is blocked.
Specifically this calls the sleep() syscall, and expects the architecture to have called schedule() from a function that has "sleep" somewhere in its name.
This exposes an internal kernel symbol to userspace.
Correct; we're verifying the results of the wchan output, which produces a kernel symbol for blocked processes.
Why would you want to test that?
This is part of a larger series refactoring/fixing wchan[1], and we've now tripped over several different failure conditions, so I want to make sure this doesn't regress in the future.
Doing s/sleep/SLEEP/g doesn't change kernel but now the test is broken.
Yes; the test would be doing its job, as that would mean there was a userspace-visible change to wchan, so we'd want to catch it and either fix the kernel or update the test to reflect the new reality.
For example, on the architectures I tested (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
+/*
+ * Make sure that wchan returns a reasonable symbol when blocked.
+ */
 Test should be "contains C identifier" then?
Nope, this was intentional. Expanding to a C identifier won't catch the "we unwound the stack to the wrong depth and now all wchan shows is '__switch_to'" bug[2]. We're specifically checking that wchan is doing at least the right thing for the most common blocking state.
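To make the difference between the two checks concrete, here is a minimal sketch. Both helper names are hypothetical and are not part of the patch; the second one simply mirrors the strstr() test in the selftest. The point is that "__switch_to" is a perfectly valid C identifier, so an identifier-only check would accept the wrong-unwind-depth result, while the "sleep" substring check rejects it.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if s is a non-empty C identifier. */
static bool looks_like_c_identifier(const char *s)
{
	if (!(isalpha((unsigned char)*s) || *s == '_'))
		return false;
	for (s++; *s; s++)
		if (!(isalnum((unsigned char)*s) || *s == '_'))
			return false;
	return true;
}

/* Mirrors the check in the selftest: wchan must mention "sleep". */
static bool wchan_names_sleep(const char *wchan)
{
	return strstr(wchan, "sleep") != NULL;
}
```

With these, "hrtimer_nanosleep" passes both checks, whereas "__switch_to" passes only the identifier check, which is exactly the case the test is meant to flag.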
+int main(void)
+{
+	char buf[64];
+	pid_t child;
+	int sync[2], fd;
+	if (pipe(sync) < 0)
+		perror_exit("pipe");
+	child = fork();
+	if (child < 0)
+		perror_exit("fork");
+	if (child == 0) {
+		/* Child */
+		if (close(sync[0]) < 0)
+			perror_exit("child close sync[0]");
+		if (close(sync[1]) < 0)
+			perror_exit("child close sync[1]");
Redundant close().
Hmm, did you maybe miss the differing array indexes? This closes the reading end followed by the writing end of the child's pipe.
+		sleep(10);
+		_exit(0);
+	}
+	/* Parent */
+	if (close(sync[1]) < 0)
+		perror_exit("parent close sync[1]");
Redundant close().
It's not, though. This closes the write side of the parent's pipe.
+	if (read(sync[0], buf, 1) != 0)
+		perror_exit("parent read sync[0]");
Racy if child is scheduled out after first close in the child.
No, the first close will close the child's read-side of the pipe, which isn't being used. For example, see[3].
The parent's read of /proc/$child/wchan could technically race if the child is scheduled out after the second close() and before the sleep(), but the parent is doing at least 2 syscalls before then. I'm open to a more exact synchronization method, but this should be sufficient. (Using ptrace to catch sleep syscall entry, for example, seemed like overkill.)
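For reference, the synchronization pattern boils down to the sketch below. This is an illustration only, not the selftest itself: the function name pipe_eof_sync_demo() is made up, error handling is abbreviated, and the child uses pause() instead of sleep() so the demo returns quickly. The parent's read() returns 0 (EOF) only once every write end is closed, which is why both the child's and the parent's close(sync[1]) are needed.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 0 if the parent saw EOF on the pipe after
 * the child closed its write end, -1 on any failure.
 */
static int pipe_eof_sync_demo(void)
{
	int sync[2];
	char c;
	pid_t child;
	ssize_t n;

	if (pipe(sync) < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close(sync[0]);
		close(sync[1]);
		return -1;
	}
	if (child == 0) {
		close(sync[0]);	/* unused read end; has no effect on the parent */
		close(sync[1]);	/* child's write end: this is the actual signal */
		/* Residual window: the child may be preempted here, before it blocks. */
		pause();
		_exit(0);
	}

	/* The parent must drop its own write end, or read() never sees EOF. */
	close(sync[1]);
	n = read(sync[0], &c, 1);	/* blocks until all write ends are closed */
	close(sync[0]);

	if (kill(child, SIGKILL) < 0)
		return -1;
	if (waitpid(child, NULL, 0) != child)
		return -1;

	return n == 0 ? 0 : -1;
}
```

The comment in the child marks the only place where the race described above can occur: after the write end is closed but before the blocking call is entered.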
-Kees
[1] https://lore.kernel.org/lkml/20211008111527.438276127@infradead.org/
[2] https://lore.kernel.org/lkml/20211008124052.GA976@C02TD0UTHF1T.local/
[3] https://man7.org/tlpi/code/online/diff/pipes/pipe_sync.c.html
+	snprintf(buf, sizeof(buf), "/proc/%d/wchan", child);
+	fd = open(buf, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT)
+			return 4;
+		perror_exit(buf);
+	}
+	memset(buf, 0, sizeof(buf));
+	if (read(fd, buf, sizeof(buf) - 1) < 1)
+		perror_exit(buf);
+	if (strstr(buf, "sleep") == NULL) {
+		fprintf(stderr, "FAIL: did not find 'sleep' in wchan '%s'\n", buf);
+		return 1;
+	}
+	printf("ok: found 'sleep' in wchan '%s'\n", buf);
+	if (kill(child, SIGKILL) < 0)
+		perror_exit("kill");
+	if (waitpid(child, NULL, 0) != child) {
+		fprintf(stderr, "waitpid: got the wrong child!?\n");
+		return 1;
+	}
+	return 0;
+}