Adjust the calls to `user_regset_copyout' and `user_regset_copyin' in `riscv_fpr_get' and `riscv_fpr_set' respectively so as to use @start_pos and @end_pos according to API documentation in <linux/regset.h>, that is to point at the beginning and the end respectively of the data chunk to be copied. Update @data accordingly, also for the first call, to make it clear which structure member is accessed.
We currently have @start_pos fixed at 0 across all calls, which works as a result of the implementation, in particular because we have no padding between the FP general registers and the FP control and status register, but appears not to have been the intent of the API and is not what other ports do, requiring one to study the copy handlers to understand what is going on here.
Signed-off-by: Maciej W. Rozycki macro@wdc.com
Fixes: b8c8a9590e4f ("RISC-V: Add FP register ptrace support for gdb.")
Cc: stable@vger.kernel.org # 4.20+
---
 arch/riscv/kernel/ptrace.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

linux-riscv-ptrace-fcsr.diff
Index: linux-hv/arch/riscv/kernel/ptrace.c
===================================================================
--- linux-hv.orig/arch/riscv/kernel/ptrace.c
+++ linux-hv/arch/riscv/kernel/ptrace.c
@@ -61,10 +61,13 @@ static int riscv_fpr_get(struct task_str
 	int ret;
 	struct __riscv_d_ext_state *fstate = &target->thread.fstate;
 
-	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, fstate, 0,
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &fstate->f, 0,
 				  offsetof(struct __riscv_d_ext_state, fcsr));
 	if (!ret) {
-		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, fstate, 0,
+		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+					  &fstate->fcsr,
+					  offsetof(struct __riscv_d_ext_state,
+						   fcsr),
 					  offsetof(struct __riscv_d_ext_state, fcsr) +
 					  sizeof(fstate->fcsr));
 	}
@@ -80,10 +83,13 @@ static int riscv_fpr_set(struct task_str
 	int ret;
 	struct __riscv_d_ext_state *fstate = &target->thread.fstate;
 
-	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, fstate, 0,
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &fstate->f, 0,
 				 offsetof(struct __riscv_d_ext_state, fcsr));
 	if (!ret) {
-		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, fstate, 0,
+		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+					 &fstate->fcsr,
+					 offsetof(struct __riscv_d_ext_state,
+						  fcsr),
 					 offsetof(struct __riscv_d_ext_state, fcsr) +
 					 sizeof(fstate->fcsr));
 	}
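The claim that the old and the patched calls behave identically can be checked with a minimal user-space sketch. Everything here is an assumption made for illustration: `struct d_ext_state` stands in for `struct __riscv_d_ext_state`, and `model_copyout` is a simplified model of the kernel-buffer path of `user_regset_copyout` as described in the commit message (no user-buffer path, no negative @end_pos); it is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space stand-in for the kernel's struct __riscv_d_ext_state. */
struct d_ext_state {
	uint64_t f[32];
	uint32_t fcsr;
};

#define FCSR_OFF ((int)offsetof(struct d_ext_state, fcsr))
#define FCSR_END (FCSR_OFF + (int)sizeof(uint32_t))

/*
 * Simplified model of the kernel-buffer path of user_regset_copyout():
 * copy the part of [start_pos, end_pos) that lies at or after *pos,
 * with @data pointing at the byte that corresponds to start_pos.
 */
static int model_copyout(unsigned int *pos, unsigned int *count,
			 unsigned char **kbuf, const void *data,
			 int start_pos, int end_pos)
{
	if (*count == 0)
		return 0;
	assert(*pos >= (unsigned int)start_pos);
	if (*pos < (unsigned int)end_pos) {
		const unsigned char *src = data;
		unsigned int copy = (unsigned int)end_pos - *pos;

		if (copy > *count)
			copy = *count;
		memcpy(*kbuf, src + (*pos - (unsigned int)start_pos), copy);
		*kbuf += copy;
		*pos += copy;
		*count -= copy;
	}
	return 0;
}

/* Old calling style: @data is the whole structure, start_pos stays 0. */
static void get_old(const struct d_ext_state *s, unsigned char *out,
		    unsigned int pos, unsigned int count)
{
	model_copyout(&pos, &count, &out, s, 0, FCSR_OFF);
	model_copyout(&pos, &count, &out, s, 0, FCSR_END);
}

/* Patched style: @data and start_pos both name the member being copied. */
static void get_new(const struct d_ext_state *s, unsigned char *out,
		    unsigned int pos, unsigned int count)
{
	model_copyout(&pos, &count, &out, s->f, 0, FCSR_OFF);
	model_copyout(&pos, &count, &out, &s->fcsr, FCSR_OFF, FCSR_END);
}
```

In this model both styles produce the same bytes for a full read and for a read of fcsr alone, because the old style's second call relies on fcsr sitting directly after f[] with no padding.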
On Thu, 23 Jul 2020 16:22:15 PDT (-0700), macro@wdc.com wrote:
Adjust the calls to `user_regset_copyout' and `user_regset_copyin' in `riscv_fpr_get' and `riscv_fpr_set' respectively so as to use @start_pos and @end_pos according to API documentation in <linux/regset.h>, that is to point at the beginning and the end respectively of the data chunk to be copied. Update @data accordingly, also for the first call, to make it clear which structure member is accessed.
We currently have @start_pos fixed at 0 across all calls, which works as a result of the implementation, in particular because we have no padding between the FP general registers and the FP control and status register, but appears not to have been the intent of the API and is not what other ports do, requiring one to study the copy handlers to understand what is going on here.
Signed-off-by: Maciej W. Rozycki macro@wdc.com
Fixes: b8c8a9590e4f ("RISC-V: Add FP register ptrace support for gdb.")
Cc: stable@vger.kernel.org # 4.20+
---
 arch/riscv/kernel/ptrace.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

linux-riscv-ptrace-fcsr.diff
Index: linux-hv/arch/riscv/kernel/ptrace.c
===================================================================
--- linux-hv.orig/arch/riscv/kernel/ptrace.c
+++ linux-hv/arch/riscv/kernel/ptrace.c
@@ -61,10 +61,13 @@ static int riscv_fpr_get(struct task_str
 	int ret;
 	struct __riscv_d_ext_state *fstate = &target->thread.fstate;
 
-	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, fstate, 0,
+	ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, &fstate->f, 0,
 				  offsetof(struct __riscv_d_ext_state, fcsr));
As far as I can tell the current code works correctly, it just requires knowledge of the layout of __riscv_d_ext_state to determine that it functions correctly. This new code still requires that knowledge: the first blob copies the F registers, but only works if the CSR is after the registers. If we fix both of those the code seems easier to read, but I don't think splitting the difference helps any.
So I guess what I'm saying is: maybe that second line should be changed to something like "ARRAY_SIZE(fstate->f)"?
 	if (!ret) {
-		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, fstate, 0,
+		ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+					  &fstate->fcsr,
+					  offsetof(struct __riscv_d_ext_state,
+						   fcsr),
 					  offsetof(struct __riscv_d_ext_state, fcsr) +
 					  sizeof(fstate->fcsr));
 	}
@@ -80,10 +83,13 @@ static int riscv_fpr_set(struct task_str
 	int ret;
 	struct __riscv_d_ext_state *fstate = &target->thread.fstate;
 
-	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, fstate, 0,
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &fstate->f, 0,
 				 offsetof(struct __riscv_d_ext_state, fcsr));
 	if (!ret) {
-		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, fstate, 0,
+		ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
+					 &fstate->fcsr,
+					 offsetof(struct __riscv_d_ext_state,
+						  fcsr),
 					 offsetof(struct __riscv_d_ext_state, fcsr) +
 					 sizeof(fstate->fcsr));
 	}
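The suggestion above, bounding the first copy by the register array itself instead of by the offset of fcsr, can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct d_ext_state` is a stand-in for `struct __riscv_d_ext_state`, `MODEL_ARRAY_SIZE` is a local stand-in for the kernel's `ARRAY_SIZE`, and the helper names are hypothetical. Note that an `ARRAY_SIZE`-style macro yields an element count, so a byte bound would more likely be `sizeof(fstate->f)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's struct __riscv_d_ext_state. */
struct d_ext_state {
	uint64_t f[32];
	uint32_t fcsr;
};

/* Local stand-in for the kernel's ARRAY_SIZE(): number of elements. */
#define MODEL_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Byte length of the FP register array, taken from the member itself. */
static size_t fpr_bytes(const struct d_ext_state *s)
{
	return sizeof(s->f);
}

/* Element count of the FP register array. */
static size_t fpr_count(const struct d_ext_state *s)
{
	return MODEL_ARRAY_SIZE(s->f);
}

/* Where fcsr actually starts; equals fpr_bytes() only without padding. */
static size_t fcsr_offset(void)
{
	return offsetof(struct d_ext_state, fcsr);
}

/* Make the layout assumption explicit instead of leaving it implied. */
_Static_assert(offsetof(struct d_ext_state, fcsr) ==
	       sizeof(((struct d_ext_state *)0)->f),
	       "fcsr is expected to follow f[] without padding");
```

With the first copy bounded by `fpr_bytes()` and the second starting at `fcsr_offset()`, neither call would depend on the two members being adjacent; the static assertion documents the case where they are.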
On Tue, Aug 04, 2020 at 07:01:01PM -0700, Palmer Dabbelt wrote:
We currently have @start_pos fixed at 0 across all calls, which works as a result of the implementation, in particular because we have no padding between the FP general registers and the FP control and status register, but appears not to have been the intent of the API and is not what other ports do, requiring one to study the copy handlers to understand what is going on here.
start_pos *is* fixed at 0 and it's going to go away, along with the sodding user_regset_copyout() very shortly. ->get() is simply a bad API. See vfs.git#work.regset for replacement. And ->put() is also going to be taken out and shot (next cycle, most likely).
On Tue, 04 Aug 2020 19:07:45 PDT (-0700), viro@zeniv.linux.org.uk wrote:
On Tue, Aug 04, 2020 at 07:01:01PM -0700, Palmer Dabbelt wrote:
We currently have @start_pos fixed at 0 across all calls, which works as a result of the implementation, in particular because we have no padding between the FP general registers and the FP control and status register, but appears not to have been the intent of the API and is not what other ports do, requiring one to study the copy handlers to understand what is going on here.
start_pos *is* fixed at 0 and it's going to go away, along with the sodding user_regset_copyout() very shortly. ->get() is simply a bad API. See vfs.git#work.regset for replacement. And ->put() is also going to be taken out and shot (next cycle, most likely).
I'm not sure I understand what you're saying, but given that branch replaces all of this I guess it's best to just do nothing on our end here?
On Tue, Aug 04, 2020 at 07:20:05PM -0700, Palmer Dabbelt wrote:
On Tue, 04 Aug 2020 19:07:45 PDT (-0700), viro@zeniv.linux.org.uk wrote:
On Tue, Aug 04, 2020 at 07:01:01PM -0700, Palmer Dabbelt wrote:
We currently have @start_pos fixed at 0 across all calls, which works as a result of the implementation, in particular because we have no padding between the FP general registers and the FP control and status register, but appears not to have been the intent of the API and is not what other ports do, requiring one to study the copy handlers to understand what is going on here.
start_pos *is* fixed at 0 and it's going to go away, along with the sodding user_regset_copyout() very shortly. ->get() is simply a bad API. See vfs.git#work.regset for replacement. And ->put() is also going to be taken out and shot (next cycle, most likely).
I'm not sure I understand what you're saying, but given that branch replaces all of this I guess it's best to just do nothing on our end here?
It doesn't replace ->put() (for now); it _does_ replace ->get() and AFAICS the replacement is much saner:
static int riscv_fpr_get(struct task_struct *target,
			 const struct user_regset *regset,
			 struct membuf to)
{
	struct __riscv_d_ext_state *fstate = &target->thread.fstate;

	membuf_write(&to, fstate, offsetof(struct __riscv_d_ext_state, fcsr));
	membuf_store(&to, fstate->fcsr);
	return membuf_zero(&to, 4);	// explicitly pad
}
user_regset_copyout() calling conventions are atrocious and so are those of regset ->get(). The best thing to do with both is to take them out of their misery and be done with that. Do you see any problems with riscv gdbserver on current linux-next? If not, I'd rather see that "API" simply go away... If there are problems, I would very much prefer fixes on top of what's done in that branch.
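To see why the membuf form is easier to follow, here is a minimal user-space sketch of the three steps in the function quoted above. All names are assumptions for illustration: `struct model_membuf`, `model_write` and `model_zero` only imitate the behaviour the snippet relies on (write as much as still fits, advance, report what is left), and `struct d_ext_state` stands in for `struct __riscv_d_ext_state`. The real helpers live in that branch and may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-space stand-in for the kernel's struct __riscv_d_ext_state. */
struct d_ext_state {
	uint64_t f[32];
	uint32_t fcsr;
};

/* Model of a membuf: a write cursor plus the bytes still wanted. */
struct model_membuf {
	unsigned char *p;
	size_t left;
};

/* Copy up to @size bytes, clamped to the space left; return bytes left. */
static size_t model_write(struct model_membuf *s, const void *v, size_t size)
{
	if (size > s->left)
		size = s->left;
	memcpy(s->p, v, size);
	s->p += size;
	s->left -= size;
	return s->left;
}

/* Zero-fill up to @size bytes, clamped likewise; return bytes left. */
static size_t model_zero(struct model_membuf *s, size_t size)
{
	if (size > s->left)
		size = s->left;
	memset(s->p, 0, size);
	s->p += size;
	s->left -= size;
	return s->left;
}

/* Mirrors the three steps of the riscv_fpr_get() quoted above. */
static size_t model_fpr_get(const struct d_ext_state *fstate,
			    struct model_membuf to)
{
	model_write(&to, fstate, offsetof(struct d_ext_state, fcsr));
	model_write(&to, &fstate->fcsr, sizeof(fstate->fcsr));
	return model_zero(&to, 4);	/* explicitly pad */
}
```

There is no position bookkeeping for the caller to get wrong: each step appends to the buffer, and a short buffer simply truncates the output.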
On Tue, 04 Aug 2020 19:48:07 PDT (-0700), viro@zeniv.linux.org.uk wrote:
On Tue, Aug 04, 2020 at 07:20:05PM -0700, Palmer Dabbelt wrote:
On Tue, 04 Aug 2020 19:07:45 PDT (-0700), viro@zeniv.linux.org.uk wrote:
On Tue, Aug 04, 2020 at 07:01:01PM -0700, Palmer Dabbelt wrote:
We currently have @start_pos fixed at 0 across all calls, which works as a result of the implementation, in particular because we have no padding between the FP general registers and the FP control and status register, but appears not to have been the intent of the API and is not what other ports do, requiring one to study the copy handlers to understand what is going on here.
start_pos *is* fixed at 0 and it's going to go away, along with the sodding user_regset_copyout() very shortly. ->get() is simply a bad API. See vfs.git#work.regset for replacement. And ->put() is also going to be taken out and shot (next cycle, most likely).
I'm not sure I understand what you're saying, but given that branch replaces all of this I guess it's best to just do nothing on our end here?
It doesn't replace ->put() (for now); it _does_ replace ->get() and AFAICS the replacement is much saner:
static int riscv_fpr_get(struct task_struct *target,
			 const struct user_regset *regset,
			 struct membuf to)
{
	struct __riscv_d_ext_state *fstate = &target->thread.fstate;

	membuf_write(&to, fstate, offsetof(struct __riscv_d_ext_state, fcsr));
	membuf_store(&to, fstate->fcsr);
	return membuf_zero(&to, 4);	// explicitly pad
}
user_regset_copyout() calling conventions are atrocious and so are those of regset ->get(). The best thing to do with both is to take them out of their misery and be done with that. Do you see any problems with riscv gdbserver on current linux-next? If not, I'd rather see that "API" simply go away... If there are problems, I would very much prefer fixes on top of what's done in that branch.
I guess my confusion was about "start_pos *is* fixed at 0": it certainly is zero in the code right now, but when poking around while reviewing the patch I didn't see any reason that must be so. Admittedly all I did was read the prototype and function, so maybe I'm just missing something. That said, if it's all going away anyway then I don't really care either way.
As far as I can tell the patch set in question (the RISC-V one) doesn't change any functionality. I don't actually use GDB, but I haven't seen any issues reported in a few years so if there is one I've missed it.
I did this ptrace stuff many years ago (IIRC it was actually my first RISC-V Linux patch), and all I really remember is that it seemed way more complicated than it needed to be. I'm happy to just drop our patch set, as yours looks way cleaner to me and if you're already planning on fixing put() then it doesn't seem worth the churn.
On Wed, 5 Aug 2020, Al Viro wrote:
I'm not sure I understand what you're saying, but given that branch replaces all of this I guess it's best to just do nothing on our end here?
It doesn't replace ->put() (for now); it _does_ replace ->get() and AFAICS the replacement is much saner:
static int riscv_fpr_get(struct task_struct *target,
			 const struct user_regset *regset,
			 struct membuf to)
{
	struct __riscv_d_ext_state *fstate = &target->thread.fstate;

	membuf_write(&to, fstate, offsetof(struct __riscv_d_ext_state, fcsr));
	membuf_store(&to, fstate->fcsr);
	return membuf_zero(&to, 4);	// explicitly pad
}
I'm glad to see the old interface go, it was cumbersome.
user_regset_copyout() calling conventions are atrocious and so are those of regset ->get(). The best thing to do with both is to take them out of their misery and be done with that. Do you see any problems with riscv gdbserver on current linux-next? If not, I'd rather see that "API" simply go away... If there are problems, I would very much prefer fixes on top of what's done in that branch.
I can push linux-next through regression-testing with RISC-V gdbserver and/or native GDB if that would help. This is also used with core dumps, but honestly I don't know what state RISC-V support is in in the BFD/GDB's core dump interpreter, as people tend to forget about the core dump feature nowadays.
Maciej
On Wed, 05 Aug 2020 03:25:11 PDT (-0700), macro@wdc.com wrote:
On Wed, 5 Aug 2020, Al Viro wrote:
I'm not sure I understand what you're saying, but given that branch replaces all of this I guess it's best to just do nothing on our end here?
It doesn't replace ->put() (for now); it _does_ replace ->get() and AFAICS the replacement is much saner:
static int riscv_fpr_get(struct task_struct *target,
			 const struct user_regset *regset,
			 struct membuf to)
{
	struct __riscv_d_ext_state *fstate = &target->thread.fstate;

	membuf_write(&to, fstate, offsetof(struct __riscv_d_ext_state, fcsr));
	membuf_store(&to, fstate->fcsr);
	return membuf_zero(&to, 4);	// explicitly pad
}
I'm glad to see the old interface go, it was cumbersome.
user_regset_copyout() calling conventions are atrocious and so are those of regset ->get(). The best thing to do with both is to take them out of their misery and be done with that. Do you see any problems with riscv gdbserver on current linux-next? If not, I'd rather see that "API" simply go away... If there are problems, I would very much prefer fixes on top of what's done in that branch.
I can push linux-next through regression-testing with RISC-V gdbserver and/or native GDB if that would help. This is also used with core dumps, but honestly I don't know what state RISC-V support is in in the BFD/GDB's core dump interpreter, as people tend to forget about the core dump feature nowadays.
IIRC Andrew does GDB test suite runs sometimes natively on Linux as part of general GDB maintenance and we don't see major issues, but I'm pretty checked out of GDB development these days so he would know better than I do. It's always great to have someone test stuff, though -- and I doubt he's testing linux-next. It's been on my TODO list for a long time now to put together tip-of-tree testing for the various projects but I've never gotten around to doing it.
Oddly enough, despite not really using GDB I have used it for core dumps -- I was writing a tool to convert commit logs to coredumps with the GDB reverse debugging annotations, but I never got around to finishing it.
On Wed, 5 Aug 2020, Palmer Dabbelt wrote:
I can push linux-next through regression-testing with RISC-V gdbserver and/or native GDB if that would help. This is also used with core dumps, but honestly I don't know what state RISC-V support is in in the BFD/GDB's core dump interpreter, as people tend to forget about the core dump feature nowadays.
IIRC Andrew does GDB test suite runs sometimes natively on Linux as part of general GDB maintenance and we don't see major issues, but I'm pretty checked out of GDB development these days so he would know better than I do. It's always great to have someone test stuff, though -- and I doubt he's testing linux-next. It's been on my TODO list for a long time now to put together tip-of-tree testing for the various projects but I've never gotten around to doing it.
I have now run GDB regression testing with remote `gdbserver' on a HiFive Unleashed, lp64d ABI only, comparing 5.8.0-next-20200814 against 5.8.0-rc5 with no issues observed.
Oddly enough, despite not really using GDB I have used it for core dumps -- I was writing a tool to convert commit logs to coredumps with the GDB reverse debugging annotations, but I never got around to finishing it.
I fiddled with core dump handling verification for GDB back in my MIPS days expanding an existing test case to interpret an OS-generated core dump in addition to one produced by GDB's `gcore' command, although in the case of local testing only (i.e. either native or running `gdbserver' on the same test machine GDB runs); this restriction is due to the need to isolate the core file produced, as it may or may not have a .$pid suffix attached (or may have yet another name variation with non-Linux targets), which is somewhat complicated with commands run remotely (though I imagine the restriction could be lifted by someone sufficiently inclined).
The relevant test results are as follows (on a successful run):

PASS: gdb.threads/tls-core.exp: native: load core file
PASS: gdb.threads/tls-core.exp: native: print thread-local storage variable
PASS: gdb.threads/tls-core.exp: gcore: load core file
PASS: gdb.threads/tls-core.exp: gcore: print thread-local storage variable
and the binutils-gdb change is commit d9f6d7f8b636 ("testsuite: Extend TLS core file testing with an OS-generated dump"). So that part should be covered at least to some extent by automated testing.
However something is not exactly right and I recall having an issue recorded for later investigation (which may not happen given the recent turn of events) that RISC-V/Linux does not actually dump cores even in the circumstances it is supposed to (i.e. the combination of the specific signal delivered and RLIMIT_CORE set to infinity imply it).
Indeed I have run the test natively now and I got:
PASS: gdb.threads/tls-core.exp: successfully compiled posix threads test case
WARNING: can't generate a core file - core tests suppressed - check ulimit -c
PASS: gdb.threads/tls-core.exp: gcore
UNSUPPORTED: gdb.threads/tls-core.exp: native: load core file
UNSUPPORTED: gdb.threads/tls-core.exp: native: print thread-local storage variable
PASS: gdb.threads/tls-core.exp: gcore: load core file
PASS: gdb.threads/tls-core.exp: gcore: print thread-local storage variable
which means things are not actually sound. Likewise if I run the test program manually:
$ ulimit -c unlimited
$ ./tls-core
Aborted (core dumped)
$ ls -la core*
ls: cannot access 'core*': No such file or directory
$
-- oops!
[As it turned out MIPS core dump handling was completely messed up both on the Linux and the GDB side. See binutils-gdb commit d8dab6c3bbe6 ("MIPS/Linux: Correct o32 core file FGR interpretation") if interested; there are further Linux commit references there.]
Maciej
linux-stable-mirror@lists.linaro.org