Right now, only way for task->thread_info->syscall to be updated is if if _TIF_SYSCALL_WORK is set in current's task thread_info->flags (similar to what has_syscall_work() checks for arm64).
This means that "->syscall" will only be updated if we are tracing the syscalls through ptrace, for example. This is NOT the same behavior as arm64, when pt_regs->syscallno is updated in the beginning of svc0 handler for *every* syscall entry.
This patch fixes the issue since this behavior is needed for /proc/<pid>/syscall 1st argument to be correctly updated.
Link: https://bugs.linaro.org/show_bug.cgi?id=3783 Cc: stable@vger.kernel.org # v4.4 v4.9 v4.14 v4.19 Signed-off-by: Rafael David Tinoco rafael.tinoco@linaro.org --- arch/arm/kernel/asm-offsets.c | 1 + arch/arm/kernel/entry-common.S | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c index 3968d6c22455..bfe68a98e1c6 100644 --- a/arch/arm/kernel/asm-offsets.c +++ b/arch/arm/kernel/asm-offsets.c @@ -64,6 +64,7 @@ int main(void) DEFINE(TI_USED_CP, offsetof(struct thread_info, used_cp)); DEFINE(TI_TP_VALUE, offsetof(struct thread_info, tp_value)); DEFINE(TI_FPSTATE, offsetof(struct thread_info, fpstate)); + DEFINE(TI_SYSCALL, offsetof(struct thread_info, syscall)); #ifdef CONFIG_VFP DEFINE(TI_VFPSTATE, offsetof(struct thread_info, vfpstate)); #ifdef CONFIG_SMP diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index 0465d65d23de..557e2add4e83 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -257,6 +257,8 @@ local_restart: tst r10, #_TIF_SYSCALL_WORK @ are we tracing syscalls? bne __sys_trace
+ str r7, [tsk, #TI_SYSCALL] @ update thread_info->syscall + invoke_syscall tbl, scno, r10, __ret_fast_syscall
add r1, sp, #S_OFF
On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
Right now, only way for task->thread_info->syscall to be updated is if if _TIF_SYSCALL_WORK is set in current's task thread_info->flags (similar to what has_syscall_work() checks for arm64).
This means that "->syscall" will only be updated if we are tracing the syscalls through ptrace, for example. This is NOT the same behavior as arm64, when pt_regs->syscallno is updated in the beginning of svc0 handler for *every* syscall entry.
So when was it decided that the syscall number will always be required (we need it to know how far back this has to be backported).
This patch fixes the issue since this behavior is needed for /proc/<pid>/syscall 1st argument to be correctly updated.
Link: https://bugs.linaro.org/show_bug.cgi?id=3783 Cc: stable@vger.kernel.org # v4.4 v4.9 v4.14 v4.19 Signed-off-by: Rafael David Tinoco rafael.tinoco@linaro.org
arch/arm/kernel/asm-offsets.c | 1 + arch/arm/kernel/entry-common.S | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c index 3968d6c22455..bfe68a98e1c6 100644 --- a/arch/arm/kernel/asm-offsets.c +++ b/arch/arm/kernel/asm-offsets.c @@ -64,6 +64,7 @@ int main(void) DEFINE(TI_USED_CP, offsetof(struct thread_info, used_cp)); DEFINE(TI_TP_VALUE, offsetof(struct thread_info, tp_value)); DEFINE(TI_FPSTATE, offsetof(struct thread_info, fpstate));
- DEFINE(TI_SYSCALL, offsetof(struct thread_info, syscall));
#ifdef CONFIG_VFP DEFINE(TI_VFPSTATE, offsetof(struct thread_info, vfpstate)); #ifdef CONFIG_SMP diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index 0465d65d23de..557e2add4e83 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -257,6 +257,8 @@ local_restart: tst r10, #_TIF_SYSCALL_WORK @ are we tracing syscalls? bne __sys_trace
- str r7, [tsk, #TI_SYSCALL] @ update thread_info->syscall
"scno" is the systemcall number, not "r7".
- invoke_syscall tbl, scno, r10, __ret_fast_syscall
add r1, sp, #S_OFF -- 2.20.0.rc1
linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Mon, Nov 26, 2018 at 11:33:03PM +0000, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
Right now, only way for task->thread_info->syscall to be updated is if if _TIF_SYSCALL_WORK is set in current's task thread_info->flags (similar to what has_syscall_work() checks for arm64).
This means that "->syscall" will only be updated if we are tracing the syscalls through ptrace, for example. This is NOT the same behavior as arm64, when pt_regs->syscallno is updated in the beginning of svc0 handler for *every* syscall entry.
So when was it decided that the syscall number will always be required (we need it to know how far back this has to be backported).
PS, I rather object to the fact that the required behaviour seems to change, arch maintainers aren't told about it until... some test is created at some random point in the future which then fails.
Surely there's a better way to communicate changes in requirements than discovery-by-random-bug-report ?
On Mon, Nov 26, 2018 at 11:41:11PM +0000, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 11:33:03PM +0000, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
Right now, only way for task->thread_info->syscall to be updated is if if _TIF_SYSCALL_WORK is set in current's task thread_info->flags (similar to what has_syscall_work() checks for arm64).
This means that "->syscall" will only be updated if we are tracing the syscalls through ptrace, for example. This is NOT the same behavior as arm64, when pt_regs->syscallno is updated in the beginning of svc0 handler for *every* syscall entry.
So when was it decided that the syscall number will always be required (we need it to know how far back this has to be backported).
PS, I rather object to the fact that the required behaviour seems to change, arch maintainers aren't told about it until... some test is created at some random point in the future which then fails.
Surely there's a better way to communicate changes in requirements than discovery-by-random-bug-report ?
Final comment for tonight - the commit introducing /proc/*/syscall says:
This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files. These use task_current_syscall() to show the task's current system call number and argument registers, stack pointer and PC. For a task blocked but not in a syscall, the file shows "-1" in place of the syscall number, followed by only the SP and PC. For a task that's not blocked, it shows "running".
Please validate that a blocked task does indeed show -1 with your patch applied.
On 11/26/18 9:44 PM, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 11:41:11PM +0000, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 11:33:03PM +0000, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
Right now, only way for task->thread_info->syscall to be updated is if if _TIF_SYSCALL_WORK is set in current's task thread_info->flags (similar to what has_syscall_work() checks for arm64).
This means that "->syscall" will only be updated if we are tracing the syscalls through ptrace, for example. This is NOT the same behavior as arm64, when pt_regs->syscallno is updated in the beginning of svc0 handler for *every* syscall entry.
So when was it decided that the syscall number will always be required (we need it to know how far back this has to be backported).
PS, I rather object to the fact that the required behaviour seems to change, arch maintainers aren't told about it until... some test is created at some random point in the future which then fails.
Surely there's a better way to communicate changes in requirements than discovery-by-random-bug-report ?
Final comment for tonight - the commit introducing /proc/*/syscall says:
This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files. These use task_current_syscall() to show the task's current system call number and argument registers, stack pointer and PC. For a task blocked but not in a syscall, the file shows "-1" in place of the syscall number, followed by only the SP and PC. For a task that's not blocked, it shows "running".
Please validate that a blocked task does indeed show -1 with your patch applied.
Will do. This is done in an upper level (collect_syscall <- task_current_syscall <- proc_pid_syscall):
if (!try_get_task_stack(target)) { /* Task has no stack, so the task isn't in a syscall. */ *sp = *pc = 0; *callno = -1; return 0; }
I think only missing part for arm was that one, but will confirm, after fixing usage of "r7" for obtaining "scno". Will send a v2 in this thread.
Thanks!
On Tue, Nov 27, 2018 at 08:30:32AM -0200, Rafael David Tinoco wrote:
On 11/26/18 9:44 PM, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 11:41:11PM +0000, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 11:33:03PM +0000, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
Right now, only way for task->thread_info->syscall to be updated is if if _TIF_SYSCALL_WORK is set in current's task thread_info->flags (similar to what has_syscall_work() checks for arm64).
This means that "->syscall" will only be updated if we are tracing the syscalls through ptrace, for example. This is NOT the same behavior as arm64, when pt_regs->syscallno is updated in the beginning of svc0 handler for *every* syscall entry.
So when was it decided that the syscall number will always be required (we need it to know how far back this has to be backported).
PS, I rather object to the fact that the required behaviour seems to change, arch maintainers aren't told about it until... some test is created at some random point in the future which then fails.
Surely there's a better way to communicate changes in requirements than discovery-by-random-bug-report ?
Final comment for tonight - the commit introducing /proc/*/syscall says:
This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files. These use task_current_syscall() to show the task's current system call number and argument registers, stack pointer and PC. For a task blocked but not in a syscall, the file shows "-1" in place of the syscall number, followed by only the SP and PC. For a task that's not blocked, it shows "running".
Please validate that a blocked task does indeed show -1 with your patch applied.
Will do. This is done in an upper level (collect_syscall <- task_current_syscall <- proc_pid_syscall):
if (!try_get_task_stack(target)) { /* Task has no stack, so the task isn't in a syscall. */ *sp = *pc = 0; *callno = -1; return 0; }
I think only missing part for arm was that one, but will confirm, after fixing usage of "r7" for obtaining "scno". Will send a v2 in this thread.
There's another question - what's the expected behaviour when we restart a syscall using the restartblock mechanism? Is the syscall number expected to be __NR_restart_syscall or the original syscall number?
I can't find anywhere that this detail is specified (damn the lack of API documentation - I'm tempted to say that we won't implement this until it gets documented properly, and that test can continue failing until such time that happens.)
On Tue, Nov 27, 2018 at 10:56:20AM +0000, Russell King - ARM Linux wrote:
On Tue, Nov 27, 2018 at 08:30:32AM -0200, Rafael David Tinoco wrote:
On 11/26/18 9:44 PM, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 11:41:11PM +0000, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 11:33:03PM +0000, Russell King - ARM Linux wrote:
On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
Right now, only way for task->thread_info->syscall to be updated is if if _TIF_SYSCALL_WORK is set in current's task thread_info->flags (similar to what has_syscall_work() checks for arm64).
This means that "->syscall" will only be updated if we are tracing the syscalls through ptrace, for example. This is NOT the same behavior as arm64, when pt_regs->syscallno is updated in the beginning of svc0 handler for *every* syscall entry.
So when was it decided that the syscall number will always be required (we need it to know how far back this has to be backported).
PS, I rather object to the fact that the required behaviour seems to change, arch maintainers aren't told about it until... some test is created at some random point in the future which then fails.
Surely there's a better way to communicate changes in requirements than discovery-by-random-bug-report ?
Final comment for tonight - the commit introducing /proc/*/syscall says:
This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files. These use task_current_syscall() to show the task's current system call number and argument registers, stack pointer and PC. For a task blocked but not in a syscall, the file shows "-1" in place of the syscall number, followed by only the SP and PC. For a task that's not blocked, it shows "running".
Please validate that a blocked task does indeed show -1 with your patch applied.
Will do. This is done in an upper level (collect_syscall <- task_current_syscall <- proc_pid_syscall):
if (!try_get_task_stack(target)) { /* Task has no stack, so the task isn't in a syscall. */ *sp = *pc = 0; *callno = -1; return 0; }
I think only missing part for arm was that one, but will confirm, after fixing usage of "r7" for obtaining "scno". Will send a v2 in this thread.
There's another question - what's the expected behaviour when we restart a syscall using the restartblock mechanism? Is the syscall number expected to be __NR_restart_syscall or the original syscall number?
I can't find anywhere that this detail is specified (damn the lack of API documentation - I'm tempted to say that we won't implement this until it gets documented properly, and that test can continue failing until such time that happens.)
Having looked around, it seems that the /proc/<PID>/syscall interface was sneaked into the kernel. The patch series which added it was sent in 2008 with a covering message that made no mention of this new interface, instead stating:
http://lkml.iu.edu/hypermail/linux/kernel/0807.2/0551.html
Most of these changes move code around with little or no change, and they should not break anything or change any behavior.
While that statement is absolutely correct, it doesn't highlight the fact that the set of patches _also_ include a brand new userspace interface exposing things like syscall numbers and arguments in /proc.
There appears to be no documentation at all of this interface, so there is no definition of how it is supposed to work or what it is supposed to expose beyond what little information is in the original patch:
http://lkml.iu.edu/hypermail/linux/kernel/0807.2/0577.html
This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files. These use task_current_syscall() to show the task's current system call number and argument registers, stack pointer and PC. For a task blocked but not in a syscall, the file shows "-1" in place of the syscall number, followed by only the SP and PC. For a task that's not blocked, it shows "running".
This really isn't a good place to be - this is why commit messages should _not_ just describe what the changes are doing, also _why_ they are being made. Also, any new user interface needs to be fully and properly documented, because years later, people will move away, knowledge will be lost, and that leaves us with a maintainability problem, exactly like we have right now with this.
With the lack of interface documentation, how do we even know whether the /proc/*/syscall is supposed to show the syscall number of non-traced threads? How do we know that the test that found this is actually correct in reporting a failure? How do we know whether it's supposed to expose __NR_restart_syscall?
So, I thought I'd write a test program:
#include <stdio.h> #include <stdlib.h> #include <signal.h> #include <sys/fcntl.h> #include <unistd.h>
static int read_file(const char *fn, char *buf, size_t size) { int fd, ret, nr;
fd = open(fn, O_RDONLY); if (fd == -1) return -1;
for (nr = 0; nr < size; nr += ret) { ret = read(fd, buf + nr, size - nr); if (ret <= 0) break; }
close(fd);
return nr ? nr : ret; }
int main() { char fn[64], buf[256]; int pid, ret;
pid = fork(); if (pid == 0) { /* child */ sleep(5); exit(0); }
/* parent */ sleep(1); snprintf(fn, sizeof(fn), "/proc/%d/syscall", pid); ret = read_file(fn, buf, sizeof(buf));
printf("%.*s", ret, buf);
kill(pid, SIGCONT); sleep(1);
ret = read_file(fn, buf, sizeof(buf));
printf("%.*s", ret, buf);
return 0; }
On x86 (32-bit app on 64-bit kernel), it has this behaviour:
$ ./syscall-test 162 0xffcc5a6c 0xffcc5a6c 0x48d09000 0x0 0xffcc5af4 0xffcc5a74 0xffcc5a2c 0xf77dfa59 162 0xffcc5a6c 0xffcc5a6c 0x48d09000 0x0 0xffcc5af4 0xffcc5a74 0xffcc5a2c 0xf77dfa59
which looks good, except:
$ strace -o /dev/null -f ./syscall-test 162 0xffc0070c 0xffc0070c 0x48d09000 0x0 0xffc00794 0xffc00714 0xffc006cc 0xf77f3a59 0 0xffc0070c 0xffc0070c 0x48d09000 0x0 0xffc00794 0xffc00714 0xffc006cc 0xf77f3a59
So, if we're syscall ptracing a program, __NR_restart_syscall gets exposed through this interface, but if we aren't, it isn't exposed. Which version is correct? *shrug*, no documentation...
From: Russell King - ARM Linux
Sent: 27 November 2018 15:36
...
There appears to be no documentation at all of this interface, so there is no definition of how it is supposed to work or what it is supposed to expose beyond what little information is in the original patch:
http://lkml.iu.edu/hypermail/linux/kernel/0807.2/0577.html
This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files. These use task_current_syscall() to show the task's current system call number and argument registers, stack pointer and PC. For a task blocked but not in a syscall, the file shows "-1" in place of the syscall number, followed by only the SP and PC. For a task that's not blocked, it shows "running".
I 'like' the way the columns for sp and pc jump about ...
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 11/27/18 1:35 PM, Russell King - ARM Linux wrote:
On x86 (32-bit app on 64-bit kernel), it has this behaviour:
$ ./syscall-test 162 0xffcc5a6c 0xffcc5a6c 0x48d09000 0x0 0xffcc5af4 0xffcc5a74 0xffcc5a2c 0xf77dfa59 162 0xffcc5a6c 0xffcc5a6c 0x48d09000 0x0 0xffcc5af4 0xffcc5a74 0xffcc5a2c 0xf77dfa59
which looks good, except:
$ strace -o /dev/null -f ./syscall-test 162 0xffc0070c 0xffc0070c 0x48d09000 0x0 0xffc00794 0xffc00714 0xffc006cc 0xf77f3a59 0 0xffc0070c 0xffc0070c 0x48d09000 0x0 0xffc00794 0xffc00714 0xffc006cc 0xf77f3a59
So, if we're syscall ptracing a program, __NR_restart_syscall gets exposed through this interface, but if we aren't, it isn't exposed. Which version is correct? *shrug*, no documentation...
I looked around and could only find people using this interface as an alternate mechanism - than ptracing - for discovering what a task was doing a certain moment (Mostly during hangs. I haven't found a good schematic way - or tool - that uses it, but I might be wrong there). For that purpose, it would be better *not* to update when restart_syscall happens, unless the hang is right there =o).
If you check: https://bugs.linaro.org/show_bug.cgi?id=3783#c11
The only syscalls being updated now - that I could get, without any workload - were 252 (epoll_wait) or 252 (sched_getaffinity):
252 0x7 0xbee80578 0x1b 0xffffffff 0xbee80578 0x7 0xbee80550 0xb6e9d286 252 0xa 0xbef3d930 0x9 0xffffffff 0x0 0x6c 0xbef3d908 0xb6e6f286 142 0x4 0x0 0xbea3eb2c 0x0 0x0 0x6c 0xbea3ea88 0xb6e40286 252 0x4 0xbe8d59e8 0x9 0xffffffff 0xbe8d59e8 0x4 0xbe8d59c0 0xb6e11286
All others I got zeroed, and that's where everything started.
Please, let me know if you still want a v2, or something else... like fixing it and making it to ignore the restart_syscall for *at least* having the same behavior across diff archs.
If it is not worth, I'll just blacklist those tests in our functional tests environment and move on =).
Thanks a lot, very best rgds,
linux-stable-mirror@lists.linaro.org