This series collects the various SME-related fixes that were previously posted separately. These should address all the issues I am aware of, so a patch which re-enables the SME configuration option is also included.
Signed-off-by: Mark Brown <broonie@kernel.org>
---
Mark Brown (6):
      arm64/sme: Flush foreign register state in do_sme_acc()
      arm64/fp: Don't corrupt FPMR when streaming mode changes
      arm64/ptrace: Zero FPMR on streaming mode entry/exit
      arm64/signal: Consistently invalidate the in register FP state in restore
      arm64/signal: Avoid corruption of SME state when entering signal handler
      arm64/sme: Reenable SME

 arch/arm64/Kconfig              |  1 -
 arch/arm64/include/asm/fpsimd.h |  1 +
 arch/arm64/kernel/fpsimd.c      | 49 +++++++++++++++++++++--
 arch/arm64/kernel/ptrace.c      | 12 +++++-
 arch/arm64/kernel/signal.c      | 89 +++++++++++------------------------------
 5 files changed, 79 insertions(+), 73 deletions(-)
---
base-commit: 40384c840ea1944d7c5a392e8975ed088ecf0b37
change-id: 20241202-arm64-sme-reenable-98e64c161a8e
Best regards,
When do_sme_acc() runs with foreign FP state it does not do any updates of the task structure, relying on the next return to userspace to reload the register state appropriately, but leaves the task's last loaded CPU untouched. This means that if the task returns to userspace on the last CPU it ran on then the checks in fpsimd_bind_task_to_cpu() will incorrectly determine that the register state on the CPU is current and suppress reload of the floating point register state before returning to userspace. This will result in spurious warnings due to SME access traps occurring for the task after TIF_SME is set.
Call fpsimd_flush_task_state() to invalidate the last loaded CPU recorded in the task, forcing detection of the task as foreign.
Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/kernel/fpsimd.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 8c4c1a2186cc510a7826d15ec36225857c07ed71..eca0b6a2fc6fa25d8c850a5b9e109b4d58809f54 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1460,6 +1460,8 @@ void do_sme_acc(unsigned long esr, struct pt_regs *regs)
 		sme_set_vq(vq_minus_one);
 
 		fpsimd_bind_task_to_cpu();
+	} else {
+		fpsimd_flush_task_state(current);
 	}
 
 	put_cpu_fpsimd_context();
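The stale-binding problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `model_*` name and `MODEL_NR_CPUS` is hypothetical, and the two-part check (per-CPU record plus per-task last CPU) is a simplification of the kernel's tracking, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for this model */

/* Hypothetical stand-in for the FP tracking fields of a task. */
struct model_task {
	int fpsimd_cpu;		/* CPU that last loaded this task's state */
	bool foreign_fpstate;	/* models TIF_FOREIGN_FPSTATE */
};

/* Per-CPU record of whose state is currently in the registers. */
static struct model_task *model_last_state[MODEL_NR_CPUS];

/* Models fpsimd_flush_task_state(): no CPU holds valid state for t. */
static void model_flush_task_state(struct model_task *t)
{
	t->fpsimd_cpu = MODEL_NR_CPUS;	/* can never equal a real CPU id */
	t->foreign_fpstate = true;
}

/* Models loading t's state on cpu and recording the binding. */
static void model_bind_task_to_cpu(struct model_task *t, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_last_state[cpu] = t;
	t->fpsimd_cpu = cpu;
	t->foreign_fpstate = false;
}

/*
 * Models the check made when t is scheduled in on cpu. Returns true
 * when the registers must be reloaded from memory before userspace.
 */
static bool model_switch_in(struct model_task *t, int cpu)
{
	bool wrong_task, wrong_cpu;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	wrong_task = model_last_state[cpu] != t;
	wrong_cpu = t->fpsimd_cpu != cpu;

	t->foreign_fpstate = wrong_task || wrong_cpu;
	return t->foreign_fpstate;
}
```

In this model a task bound on CPU 1 that takes the trap while foreign on CPU 2 and then returns to CPU 1 is treated as current unless `model_flush_task_state()` was called in between, which mirrors the reload that the patch forces.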
On Tue, Dec 03, 2024 at 12:45:53PM +0000, Mark Brown wrote:
> When do_sme_acc() runs with foreign FP state it does not do any updates of the task structure, relying on the next return to userspace to reload the register state appropriately, but leaves the task's last loaded CPU untouched. This means that if the task returns to userspace on the last CPU it ran on then the checks in fpsimd_bind_task_to_cpu() will incorrectly determine that the register state on the CPU is current and suppress reload of the floating point register state before returning to userspace. This will result in spurious warnings due to SME access traps occurring for the task after TIF_SME is set.
> Call fpsimd_flush_task_state() to invalidate the last loaded CPU recorded in the task, forcing detection of the task as foreign.
> Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/kernel/fpsimd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 8c4c1a2186cc510a7826d15ec36225857c07ed71..eca0b6a2fc6fa25d8c850a5b9e109b4d58809f54 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -1460,6 +1460,8 @@ void do_sme_acc(unsigned long esr, struct pt_regs *regs)
>  		sme_set_vq(vq_minus_one);
>
>  		fpsimd_bind_task_to_cpu();
> +	} else {
> +		fpsimd_flush_task_state(current);
TIF_FOREIGN_FPSTATE is (or was) a cache of the task<->CPU binding that you're clobbering here.
So, this fpsimd_flush_task_state() should have no effect unless TIF_FOREIGN_FPSTATE is already wrong? I'm wondering if the apparent need for this means that there is an undiagnosed bug elsewhere.
(My understanding is based on FPSIMD/SVE; I'm less familiar with the SME changes, so I may be missing something important here.)
[...]
Cheers ---Dave
On Tue, Dec 03, 2024 at 03:32:22PM +0000, Dave Martin wrote:
> On Tue, Dec 03, 2024 at 12:45:53PM +0000, Mark Brown wrote:
> > @@ -1460,6 +1460,8 @@ void do_sme_acc(unsigned long esr, struct pt_regs *regs)
> >  		sme_set_vq(vq_minus_one);
> >
> >  		fpsimd_bind_task_to_cpu();
> > +	} else {
> > +		fpsimd_flush_task_state(current);
> TIF_FOREIGN_FPSTATE is (or was) a cache of the task<->CPU binding that you're clobbering here.
> So, this fpsimd_flush_task_state() should have no effect unless TIF_FOREIGN_FPSTATE is already wrong? I'm wondering if the apparent need for this means that there is an undiagnosed bug elsewhere.
> (My understanding is based on FPSIMD/SVE; I'm less familiar with the SME changes, so I may be missing something important here.)
It's to ensure that the last recorded CPU for the current task is invalid so that if the state was loaded on another CPU and we switch back to that CPU we reload the state from memory, we need to at least trigger configuration of the SME VL.
On Tue, Dec 03, 2024 at 04:00:45PM +0000, Mark Brown wrote:
> On Tue, Dec 03, 2024 at 03:32:22PM +0000, Dave Martin wrote:
> > On Tue, Dec 03, 2024 at 12:45:53PM +0000, Mark Brown wrote:
> > > @@ -1460,6 +1460,8 @@ void do_sme_acc(unsigned long esr, struct pt_regs *regs)
> > >  		sme_set_vq(vq_minus_one);
> > >
> > >  		fpsimd_bind_task_to_cpu();
> > > +	} else {
> > > +		fpsimd_flush_task_state(current);
> > TIF_FOREIGN_FPSTATE is (or was) a cache of the task<->CPU binding that you're clobbering here.
> > So, this fpsimd_flush_task_state() should have no effect unless TIF_FOREIGN_FPSTATE is already wrong? I'm wondering if the apparent need for this means that there is an undiagnosed bug elsewhere.
> > (My understanding is based on FPSIMD/SVE; I'm less familiar with the SME changes, so I may be missing something important here.)
> It's to ensure that the last recorded CPU for the current task is invalid so that if the state was loaded on another CPU and we switch back to that CPU we reload the state from memory, we need to at least trigger configuration of the SME VL.
OK, so the logic here is something like:
Disregarding SME, the FPSIMD/SVE regs are up to date, which is fine because SME is trapped.
When we take the SME trap, we suddenly have some work to do in order to make sure that the SME-specific parts of the register state are up to date, so we need to mark the state as stale before setting TIF_SME and returning.
fpsimd_flush_task_state() means that we do the necessary work when re-entering userspace, but is there a problem with simply marking all the FPSIMD/vector state as stale? If FPSR or FPCR is dirty for example, it now looks like they won't get written back to thread struct if there is a context switch before current re-enters userspace?
Maybe the other flags distinguish these cases -- I haven't fully got my head around it.
(Actually, the ARM ARM says (IMHTLZ) that toggling PSTATE.SM by any means causes FPSR to become 0x800009f. I'm not sure where that fits in -- do we handle that anywhere? I guess the "soft" SM toggling via ptrace, signal delivery or maybe exec, ought to set this? Not sure how that interacts with the expected behaviour of the fenv(3) API... Hmm. I see no corresponding statement about FPCR.)
Cheers ---Dave
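The FPSR value mentioned in the message above can be decoded with a short sketch. The bit positions are the architectural FPSR cumulative flags; the `MODEL_*` names, the `model_fp_status` struct and `model_set_streaming_mode()` are hypothetical and only model the behaviour as it is described in the thread (FPSR changes on a PSTATE.SM toggle, with no corresponding statement about FPCR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FPSR cumulative status flags and their bit positions. */
#define MODEL_FPSR_IOC	(UINT32_C(1) << 0)	/* invalid operation */
#define MODEL_FPSR_DZC	(UINT32_C(1) << 1)	/* divide by zero */
#define MODEL_FPSR_OFC	(UINT32_C(1) << 2)	/* overflow */
#define MODEL_FPSR_UFC	(UINT32_C(1) << 3)	/* underflow */
#define MODEL_FPSR_IXC	(UINT32_C(1) << 4)	/* inexact */
#define MODEL_FPSR_IDC	(UINT32_C(1) << 7)	/* input denormal */
#define MODEL_FPSR_QC	(UINT32_C(1) << 27)	/* saturation */

/* Value quoted in the thread for FPSR after a PSTATE.SM change. */
#define MODEL_FPSR_AFTER_SM_CHANGE	UINT32_C(0x0800009f)

/* The quoted value is exactly "every cumulative flag set". */
_Static_assert(MODEL_FPSR_AFTER_SM_CHANGE ==
	       (MODEL_FPSR_QC | MODEL_FPSR_IDC | MODEL_FPSR_IXC |
		MODEL_FPSR_UFC | MODEL_FPSR_OFC | MODEL_FPSR_DZC |
		MODEL_FPSR_IOC),
	       "unexpected FPSR decode");

/* Hypothetical container for the state this model cares about. */
struct model_fp_status {
	bool sm;	/* models PSTATE.SM */
	uint32_t fpsr;
	uint32_t fpcr;
};

/* Models a PSTATE.SM write: only an actual toggle touches FPSR. */
static void model_set_streaming_mode(struct model_fp_status *s, bool sm)
{
	assert(s);
	if (s->sm == sm)
		return;

	s->sm = sm;
	s->fpsr = MODEL_FPSR_AFTER_SM_CHANGE;	/* FPCR is left alone */
}
```

Whether the "soft" toggles done by ptrace or signal delivery should apply the same FPSR update is the open question raised above; the sketch takes no position on that.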
On Tue, Dec 03, 2024 at 05:00:08PM +0000, Dave Martin wrote:
> On Tue, Dec 03, 2024 at 04:00:45PM +0000, Mark Brown wrote:
> > It's to ensure that the last recorded CPU for the current task is invalid so that if the state was loaded on another CPU and we switch back to that CPU we reload the state from memory, we need to at least trigger configuration of the SME VL.
> OK, so the logic here is something like:
> Disregarding SME, the FPSIMD/SVE regs are up to date, which is fine because SME is trapped.
> When we take the SME trap, we suddenly have some work to do in order to make sure that the SME-specific parts of the register state are up to date, so we need to mark the state as stale before setting TIF_SME and returning.
We know that the only bit of register state which is not up to date at this point is the SME vector length, we don't configure that for tasks that do not have SME. SVCR is always configured since we have to exit streaming mode for FPSIMD and SVE to work properly so we know it's already 0, all the other SME specific state is gated by controls in SVCR.
> fpsimd_flush_task_state() means that we do the necessary work when re-entering userspace, but is there a problem with simply marking all the FPSIMD/vector state as stale? If FPSR or FPCR is dirty for example, it now looks like they won't get written back to thread struct if there is a context switch before current re-enters userspace?
> Maybe the other flags distinguish these cases -- I haven't fully got my head around it.
We are doing fpsimd_flush_task_state() in the TIF_FOREIGN_FPSTATE case so we know there is no dirty state in the registers.
> (Actually, the ARM ARM says (IMHTLZ) that toggling PSTATE.SM by any means causes FPSR to become 0x800009f. I'm not sure where that fits in -- do we handle that anywhere? I guess the "soft" SM toggling via
Urgh, not seen that one - that needs handling in the signal entry path and ptrace. That will have been defined while the feature was being implemented. It's not relevant here though since we are in the SME access trap, we might be trapping due to a SMSTART or equivalent operation but that SMSTART has not yet run at the point where we return to userspace.
> ptrace, signal delivery or maybe exec, ought to set this? Not sure how that interacts with the expected behaviour of the fenv(3) API... Hmm. I see no corresponding statement about FPCR.)
Fun. I'm not sure how the ABI is defined there by libc.
On Tue, Dec 03, 2024 at 05:24:39PM +0000, Mark Brown wrote:
> On Tue, Dec 03, 2024 at 05:00:08PM +0000, Dave Martin wrote:
> > On Tue, Dec 03, 2024 at 04:00:45PM +0000, Mark Brown wrote:
[...]
> We know that the only bit of register state which is not up to date at this point is the SME vector length, we don't configure that for tasks that do not have SME. SVCR is always configured since we have to exit streaming mode for FPSIMD and SVE to work properly so we know it's already 0, all the other SME specific state is gated by controls in SVCR.
> > fpsimd_flush_task_state() means that we do the necessary work when re-entering userspace, but is there a problem with simply marking all the FPSIMD/vector state as stale? If FPSR or FPCR is dirty for example, it now looks like they won't get written back to thread struct if there is a context switch before current re-enters userspace?
> > Maybe the other flags distinguish these cases -- I haven't fully got my head around it.
> We are doing fpsimd_flush_task_state() in the TIF_FOREIGN_FPSTATE case so we know there is no dirty state in the registers.
Ah, that wasn't obvious from the diff context, but you're right.
I was confused by the fpsimd_bind_task_to_cpu() call; I forgot that there are reasons to call this even when TIF_FOREIGN_FPSTATE is clear. Perhaps it would be worth splitting some of those uses up, but it would need some thinking about. Doesn't really belong in this series anyway.
> > (Actually, the ARM ARM says (IMHTLZ) that toggling PSTATE.SM by any means causes FPSR to become 0x800009f. I'm not sure where that fits in -- do we handle that anywhere? I guess the "soft" SM toggling via
> Urgh, not seen that one - that needs handling in the signal entry path and ptrace. That will have been defined while the feature was being implemented. It's not relevant here though since we are in the SME access trap, we might be trapping due to a SMSTART or equivalent operation but that SMSTART has not yet run at the point where we return to userspace.
> > ptrace, signal delivery or maybe exec, ought to set this? Not sure how that interacts with the expected behaviour of the fenv(3) API... Hmm. I see no corresponding statement about FPCR.)
> Fun. I'm not sure how the ABI is defined there by libc.
I guess this should be left as-is, for now. There's an argument for sanitising FPCR/FPSR on signal delivery, but neither signal(7) nor fenv(3) give any clue about the expected behaviour...
For ptrace, the user has the opportunity to specify exactly what they want to happen to all the registers, so I suppose it's best to stick to the current model and require the tracer to specify all changes explicitly rather than add new magic ptrace behaviour.
Not relevant for this series, in any case.
Cheers ---Dave
When FPMR and SME are both present then entering and exiting streaming mode clears FPMR in the same manner as it clears the V/Z and P registers. Since entering and exiting streaming mode via ptrace is expected to have the same effect as doing so via SMSTART/SMSTOP it should clear FPMR too but this was missed when FPMR support was added. Add the required reset of FPMR.
Since changing the vector length resets SVCR, an SME vector length change implemented via a write to ZA can trigger an exit of streaming mode, so we need to check when writing to ZA as well.
Fixes: 4035c22ef7d4 ("arm64/ptrace: Expose FPMR via ptrace")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/kernel/ptrace.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index e4437f62a2cda93734052c44b48886db83d75b3e..43a9397d5903ff87b608befdcaed3f9a7e48f976 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -877,6 +877,7 @@ static int sve_set_common(struct task_struct *target,
 			  const void *kbuf, const void __user *ubuf,
 			  enum vec_type type)
 {
+	u64 old_svcr = target->thread.svcr;
 	int ret;
 	struct user_sve_header header;
 	unsigned int vq;
@@ -908,8 +909,6 @@ static int sve_set_common(struct task_struct *target,
 
 	/* Enter/exit streaming mode */
 	if (system_supports_sme()) {
-		u64 old_svcr = target->thread.svcr;
-
 		switch (type) {
 		case ARM64_VEC_SVE:
 			target->thread.svcr &= ~SVCR_SM_MASK;
@@ -1008,6 +1007,10 @@ static int sve_set_common(struct task_struct *target,
 				 start, end);
 
 out:
+	/* If we entered or exited streaming mode then reset FPMR */
+	if ((target->thread.svcr & SVCR_SM) != (old_svcr & SVCR_SM))
+		target->thread.uw.fpmr = 0;
+
 	fpsimd_flush_task_state(target);
 	return ret;
 }
@@ -1104,6 +1107,7 @@ static int za_set(struct task_struct *target,
 		  unsigned int pos, unsigned int count,
 		  const void *kbuf, const void __user *ubuf)
 {
+	u64 old_svcr = target->thread.svcr;
 	int ret;
 	struct user_za_header header;
 	unsigned int vq;
@@ -1184,6 +1188,10 @@ static int za_set(struct task_struct *target,
 		target->thread.svcr |= SVCR_ZA_MASK;
 
 out:
+	/* If we entered or exited streaming mode then reset FPMR */
+	if ((target->thread.svcr & SVCR_SM) != (old_svcr & SVCR_SM))
+		target->thread.uw.fpmr = 0;
+
 	fpsimd_flush_task_state(target);
 	return ret;
 }
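The snapshot-and-compare pattern used in both hunks can be exercised in isolation. The following is a minimal sketch, assuming SVCR.SM is bit 0 and SVCR.ZA is bit 1; `model_thread`, `model_regset_write()` and the `MODEL_*` macros are hypothetical names that stand in for the kernel's thread state and regset write paths.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SVCR_SM	(UINT64_C(1) << 0)	/* streaming mode enable */
#define MODEL_SVCR_ZA	(UINT64_C(1) << 1)	/* ZA storage enable */

/* Hypothetical subset of the saved per-thread state. */
struct model_thread {
	uint64_t svcr;
	uint64_t fpmr;
};

/* Clear FPMR only if the SM bit differs from the snapshot taken on entry. */
static void model_reset_fpmr_on_sm_change(struct model_thread *t,
					  uint64_t old_svcr)
{
	assert(t);
	if ((t->svcr & MODEL_SVCR_SM) != (old_svcr & MODEL_SVCR_SM))
		t->fpmr = 0;
}

/* Models a regset write that leaves the thread with new_svcr. */
static void model_regset_write(struct model_thread *t, uint64_t new_svcr)
{
	uint64_t old_svcr = t->svcr;	/* snapshot before any changes */

	t->svcr = new_svcr;
	model_reset_fpmr_on_sm_change(t, old_svcr);
}
```

Taking the snapshot at the top of the function, as the patch does by hoisting `old_svcr`, means every exit path through `out:` sees the same baseline, including the vector length change case that implicitly resets SVCR.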
We intend that signal handlers are entered with PSTATE.{SM,ZA}={0,0}. The logic for this in setup_return() manipulates the saved state and live CPU state in an unsafe manner, and consequently, when a task enters a signal handler:
* The task entering the signal handler might not have its PSTATE.{SM,ZA} bits cleared, and other register state that is affected by changes to PSTATE.{SM,ZA} might not be zeroed as expected.
* An unrelated task might have its PSTATE.{SM,ZA} bits cleared unexpectedly, potentially zeroing other register state that is affected by changes to PSTATE.{SM,ZA}.
Tasks which do not set PSTATE.{SM,ZA} (i.e. those only using plain FPSIMD or non-streaming SVE) are not affected, as there is no resulting change to PSTATE.{SM,ZA}.
Consider for example two tasks on one CPU:
 A: Begins signal entry in kernel mode, is preempted prior to SMSTOP.
 B: Using SM and/or ZA in userspace with register state current on the CPU, is preempted.
 A: Scheduled in, no register state changes made as in kernel mode.
 A: Executes SMSTOP, modifying live register state.
 A: Scheduled out.
 B: Scheduled in, fpsimd_thread_switch() sees the register state on the CPU is tracked as being that for task B so the state is not reloaded prior to returning to userspace.
Task B is now running with SM and ZA incorrectly cleared.
Fix this by:
* Checking TIF_FOREIGN_FPSTATE, and only updating the saved or live state as appropriate.
* Using {get,put}_cpu_fpsimd_context() to ensure mutual exclusion against other code which manipulates this state. To allow their use, the logic is moved into a new fpsimd_enter_sighandler() helper in fpsimd.c.
This race has been observed intermittently with fp-stress, especially with preempt disabled, commonly but not exclusively reporting "Bad SVCR: 0".
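The A/B interleaving above and the effect of checking TIF_FOREIGN_FPSTATE can be modelled in a few lines. This is a single-threaded sketch, so it does not model the mutual exclusion that {get,put}_cpu_fpsimd_context() provides; all `model_*` names are hypothetical, and SVCR.SM and SVCR.ZA are assumed to be bits 0 and 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SVCR_SM	(UINT64_C(1) << 0)
#define MODEL_SVCR_ZA	(UINT64_C(1) << 1)
#define MODEL_SVCR_BOTH	(MODEL_SVCR_SM | MODEL_SVCR_ZA)

struct model_task {
	uint64_t saved_svcr;	/* in-memory copy of SVCR */
	bool foreign_fpstate;	/* models TIF_FOREIGN_FPSTATE */
};

struct model_cpu {
	uint64_t live_svcr;		/* SVCR currently in the registers */
	struct model_task *owner;	/* task whose state is live */
};

/* Old behaviour: always touch both the saved and the live state. */
static void model_enter_sighandler_unsafe(struct model_cpu *cpu,
					  struct model_task *cur)
{
	cur->saved_svcr &= ~MODEL_SVCR_BOTH;
	cpu->live_svcr &= ~MODEL_SVCR_BOTH;	/* SMSTOP hits whatever is live */
}

/* Fixed behaviour: update only the copy that belongs to cur. */
static void model_enter_sighandler(struct model_cpu *cpu,
				   struct model_task *cur)
{
	if (cur->foreign_fpstate) {
		/* Registers hold someone else's state: edit memory only. */
		cur->saved_svcr &= ~MODEL_SVCR_BOTH;
	} else {
		/* Registers are ours: edit them directly. */
		assert(cpu->owner == cur);
		cpu->live_svcr &= ~MODEL_SVCR_BOTH;
	}
}
```

With task B's state live on the CPU and task A entering a signal handler while foreign, the unsafe variant clears B's live SM and ZA bits, which corresponds to the "Bad SVCR: 0" reports, while the fixed variant leaves them intact.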
While we're at it, also fix a discrepancy between the in-register and in-memory handling of signal entry. When operating on the register state we issue an SMSTOP, exiting streaming mode if we were in it. This clears the V/Z and P registers and FPMR but nothing else. The in-memory version clears all the user FPSIMD state including FPCR and FPSR but does not clear FPMR. Add the clear of FPMR and limit the existing memset() to only cover the vregs, preserving the state of FPCR and FPSR like SMSTOP does.
Fixes: 40a8e87bb3285 ("arm64/sme: Disable ZA and streaming mode when handling signals")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/include/asm/fpsimd.h |  1 +
 arch/arm64/kernel/fpsimd.c      | 39 +++++++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/signal.c      | 19 +------------------
 3 files changed, 41 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index f2a84efc361858d4deda99faf1967cc7cac386c1..09af7cfd9f6c2cec26332caa4c254976e117b1bf 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -76,6 +76,7 @@ extern void fpsimd_load_state(struct user_fpsimd_state *state);
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
 
+extern void fpsimd_enter_sighandler(void);
 extern void fpsimd_signal_preserve_current_state(void);
 extern void fpsimd_preserve_current_state(void);
 extern void fpsimd_restore_current_state(void);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index f02762762dbcf954e9add6dfd3575ae7055b6b0e..c5465c8ec467cb1ab8bd211dc5370f91aa2bcf35 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1696,6 +1696,45 @@ void fpsimd_signal_preserve_current_state(void)
 		sve_to_fpsimd(current);
 }
 
+/*
+ * Called by the signal handling code when preparing current to enter
+ * a signal handler. Currently this only needs to take care of exiting
+ * streaming mode and clearing ZA on SME systems.
+ */
+void fpsimd_enter_sighandler(void)
+{
+	if (!system_supports_sme())
+		return;
+
+	get_cpu_fpsimd_context();
+
+	if (test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		/*
+		 * Exiting streaming mode zeros the V/Z and P
+		 * registers and FPMR. Zero FPMR and the V registers,
+		 * marking the state as FPSIMD only to force a clear
+		 * of the remaining bits during reload if needed.
+		 */
+		if (current->thread.svcr & SVCR_SM_MASK) {
+			memset(&current->thread.uw.fpsimd_state.vregs, 0,
+			       sizeof(current->thread.uw.fpsimd_state.vregs));
+			current->thread.uw.fpmr = 0;
+			current->thread.fp_type = FP_STATE_FPSIMD;
+		}
+
+		current->thread.svcr &= ~(SVCR_ZA_MASK |
+					  SVCR_SM_MASK);
+
+		/* Ensure any copies on other CPUs aren't reused */
+		fpsimd_flush_task_state(current);
+	} else {
+		/* The register state is current, just update it. */
+		sme_smstop();
+	}
+
+	put_cpu_fpsimd_context();
+}
+
 /*
  * Called by KVM when entering the guest.
  */
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index abd0907061fe664bf22d1995319f9559c4bbed91..335c2327baf74eac9634cf594855dbf26a7d6b01 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -1461,24 +1461,7 @@ static int setup_return(struct pt_regs *regs, struct ksignal *ksig,
 	/* TCO (Tag Check Override) always cleared for signal handlers */
 	regs->pstate &= ~PSR_TCO_BIT;
 
-	/* Signal handlers are invoked with ZA and streaming mode disabled */
-	if (system_supports_sme()) {
-		/*
-		 * If we were in streaming mode the saved register
-		 * state was SVE but we will exit SM and use the
-		 * FPSIMD register state - flush the saved FPSIMD
-		 * register state in case it gets loaded.
-		 */
-		if (current->thread.svcr & SVCR_SM_MASK) {
-			memset(&current->thread.uw.fpsimd_state, 0,
-			       sizeof(current->thread.uw.fpsimd_state));
-			current->thread.fp_type = FP_STATE_FPSIMD;
-		}
-
-		current->thread.svcr &= ~(SVCR_ZA_MASK |
-					  SVCR_SM_MASK);
-		sme_smstop();
-	}
+	fpsimd_enter_sighandler();
 
 	if (ksig->ka.sa.sa_flags & SA_RESTORER)
 		sigtramp = ksig->ka.sa.sa_restorer;
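The change in memset() scope made by the patch above (vregs only, rather than the whole FPSIMD state) can be checked with a small standalone sketch. The struct layout here is an assumption made for illustration: `model_fpsimd_state` loosely mirrors the vregs/fpsr/fpcr split, and every `model_*` and `MODEL_*` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SVCR_SM	(UINT64_C(1) << 0)
#define MODEL_SVCR_ZA	(UINT64_C(1) << 1)

/* 128-bit vector register modelled as two 64-bit halves. */
struct model_vreg {
	uint64_t lo, hi;
};

/* Hypothetical layout: vector registers followed by FPSR and FPCR. */
struct model_fpsimd_state {
	struct model_vreg vregs[32];
	uint32_t fpsr;
	uint32_t fpcr;
};

struct model_thread {
	struct model_fpsimd_state fpsimd_state;
	uint64_t fpmr;
	uint64_t svcr;
};

/* Models the in-memory equivalent of SMSTOP for a foreign task. */
static void model_exit_sm_in_memory(struct model_thread *t)
{
	assert(t);
	if (t->svcr & MODEL_SVCR_SM) {
		/* Zero only the vector registers, not FPSR/FPCR. */
		memset(&t->fpsimd_state.vregs, 0,
		       sizeof(t->fpsimd_state.vregs));
		t->fpmr = 0;
	}

	t->svcr &= ~(MODEL_SVCR_SM | MODEL_SVCR_ZA);
}
```

Passing `sizeof()` of the `vregs` member rather than of the whole struct is what keeps FPSR and FPCR intact, matching what the commit message says SMSTOP does to the live registers.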
On Tue, Dec 03, 2024 at 12:45:57PM +0000, Mark Brown wrote:
> We intend that signal handlers are entered with PSTATE.{SM,ZA}={0,0}. The logic for this in setup_return() manipulates the saved state and live CPU state in an unsafe manner, and consequently, when a task enters a signal handler:
>  * The task entering the signal handler might not have its PSTATE.{SM,ZA} bits cleared, and other register state that is affected by changes to PSTATE.{SM,ZA} might not be zeroed as expected.
>  * An unrelated task might have its PSTATE.{SM,ZA} bits cleared unexpectedly, potentially zeroing other register state that is affected by changes to PSTATE.{SM,ZA}.
> Tasks which do not set PSTATE.{SM,ZA} (i.e. those only using plain FPSIMD or non-streaming SVE) are not affected, as there is no resulting change to PSTATE.{SM,ZA}.
> Consider for example two tasks on one CPU:
>  A: Begins signal entry in kernel mode, is preempted prior to SMSTOP.
>  B: Using SM and/or ZA in userspace with register state current on the CPU, is preempted.
>  A: Scheduled in, no register state changes made as in kernel mode.
>  A: Executes SMSTOP, modifying live register state.
>  A: Scheduled out.
>  B: Scheduled in, fpsimd_thread_switch() sees the register state on the CPU is tracked as being that for task B so the state is not reloaded prior to returning to userspace.
> Task B is now running with SM and ZA incorrectly cleared.
> Fix this by:
>  * Checking TIF_FOREIGN_FPSTATE, and only updating the saved or live state as appropriate.
>  * Using {get,put}_cpu_fpsimd_context() to ensure mutual exclusion against other code which manipulates this state. To allow their use, the logic is moved into a new fpsimd_enter_sighandler() helper in fpsimd.c.
> This race has been observed intermittently with fp-stress, especially with preempt disabled, commonly but not exclusively reporting "Bad SVCR: 0".
> While we're at it also fix a discrepancy between in register and in memory entries. When operating on the register state we issue a SMSTOP, exiting streaming mode if we were in it. This clears the V/Z and P register and FPMR but nothing else. The in memory version clears all the user FPSIMD state including FPCR and FPSR but does not clear FPMR. Add the clear of FPMR and limit the existing memset() to only cover the vregs, preserving the state of FPCR and FPSR like SMSTOP does.
> Fixes: 40a8e87bb3285 ("arm64/sme: Disable ZA and streaming mode when handling signals")
> Signed-off-by: Mark Brown <broonie@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/include/asm/fpsimd.h |  1 +
>  arch/arm64/kernel/fpsimd.c      | 39 +++++++++++++++++++++++++++++++++++++++
>  arch/arm64/kernel/signal.c      | 19 +------------------
>  3 files changed, 41 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index f2a84efc361858d4deda99faf1967cc7cac386c1..09af7cfd9f6c2cec26332caa4c254976e117b1bf 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -76,6 +76,7 @@ extern void fpsimd_load_state(struct user_fpsimd_state *state);
>  extern void fpsimd_thread_switch(struct task_struct *next);
>  extern void fpsimd_flush_thread(void);
>
> +extern void fpsimd_enter_sighandler(void);
>  extern void fpsimd_signal_preserve_current_state(void);
>  extern void fpsimd_preserve_current_state(void);
>  extern void fpsimd_restore_current_state(void);
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index f02762762dbcf954e9add6dfd3575ae7055b6b0e..c5465c8ec467cb1ab8bd211dc5370f91aa2bcf35 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -1696,6 +1696,45 @@ void fpsimd_signal_preserve_current_state(void)
>  		sve_to_fpsimd(current);
>  }
>
> +/*
> + * Called by the signal handling code when preparing current to enter
> + * a signal handler. Currently this only needs to take care of exiting
> + * streaming mode and clearing ZA on SME systems.
> + */
> +void fpsimd_enter_sighandler(void)
> +{
> +	if (!system_supports_sme())
> +		return;
> +
> +	get_cpu_fpsimd_context();
> +
> +	if (test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> +		/*
> +		 * Exiting streaming mode zeros the V/Z and P
> +		 * registers and FPMR. Zero FPMR and the V registers,
> +		 * marking the state as FPSIMD only to force a clear
> +		 * of the remaining bits during reload if needed.
> +		 */
> +		if (current->thread.svcr & SVCR_SM_MASK) {
> +			memset(&current->thread.uw.fpsimd_state.vregs, 0,
> +			       sizeof(current->thread.uw.fpsimd_state.vregs));
Do we need to hold the CPU fpsimd context across this memset?
IIRC, TIF_FOREIGN_FPSTATE can be spontaneously cleared along with dumping of the regs into thread_struct (from current's PoV), but never spontaneously set again. So ... -> [*]
> +			current->thread.uw.fpmr = 0;
> +			current->thread.fp_type = FP_STATE_FPSIMD;
> +		}
> +
> +		current->thread.svcr &= ~(SVCR_ZA_MASK |
> +					  SVCR_SM_MASK);
> +
> +		/* Ensure any copies on other CPUs aren't reused */
> +		fpsimd_flush_task_state(current);
(This is very similar to fpsimd_flush_thread(); can they be unified?)
> +	} else {
> +		/* The register state is current, just update it. */
> +		sme_smstop();
... [*] the critical thing seems to be that the CPU fpsimd context is held from the test on TIF_FOREIGN_FPSTATE, across this else clause.
(Whether or not this is a worthwhile optimisation is another matter. But if the behaviour of TIF_FOREIGN_FPSTATE is still the same, then it may be a good idea to avoid sending mixed messages about this in the code.)
(A similar argument applies in fpsimd_flush_thread().)
[...]
Cheers ---Dave
On Tue, Dec 03, 2024 at 03:33:18PM +0000, Dave Martin wrote:
> On Tue, Dec 03, 2024 at 12:45:57PM +0000, Mark Brown wrote:
> > +	get_cpu_fpsimd_context();
> > +		if (current->thread.svcr & SVCR_SM_MASK) {
> > +			memset(&current->thread.uw.fpsimd_state.vregs, 0,
> > +			       sizeof(current->thread.uw.fpsimd_state.vregs));
> Do we need to hold the CPU fpsimd context across this memset?
> IIRC, TIF_FOREIGN_FPSTATE can be spontaneously cleared along with dumping of the regs into thread_struct (from current's PoV), but never spontaneously set again. So ... -> [*]
Yes, we could drop the lock here. OTOH this is very simple and easy to understand.
> > +		/* Ensure any copies on other CPUs aren't reused */
> > +		fpsimd_flush_task_state(current);
> (This is very similar to fpsimd_flush_thread(); can they be unified?)
I have a half finished series to replace the whole setup around accessing the state with get/put operations for working on the state which should remove all these functions. The pile of similarly and confusingly named operations we have for working on the state is one of the major sources of issues with this code, even when actively working on the code it's hard to remember exactly which operation does what never mind the rules for which is needed.
Hi,
On Tue, Dec 03, 2024 at 04:12:33PM +0000, Mark Brown wrote:
> On Tue, Dec 03, 2024 at 03:33:18PM +0000, Dave Martin wrote:
> > On Tue, Dec 03, 2024 at 12:45:57PM +0000, Mark Brown wrote:
> > > +	get_cpu_fpsimd_context();
> > > +		if (current->thread.svcr & SVCR_SM_MASK) {
> > > +			memset(&current->thread.uw.fpsimd_state.vregs, 0,
> > > +			       sizeof(current->thread.uw.fpsimd_state.vregs));
> > Do we need to hold the CPU fpsimd context across this memset?
> > IIRC, TIF_FOREIGN_FPSTATE can be spontaneously cleared along with dumping of the regs into thread_struct (from current's PoV), but never spontaneously set again. So ... -> [*]
> Yes, we could drop the lock here. OTOH this is very simple and easy to understand.
Ack; it works either way.
Since this is a Fixes: patch, it may be better to keep it simple.
> > > +		/* Ensure any copies on other CPUs aren't reused */
> > > +		fpsimd_flush_task_state(current);
> > (This is very similar to fpsimd_flush_thread(); can they be unified?)
> I have a half finished series to replace the whole setup around accessing the state with get/put operations for working on the state which should remove all these functions. The pile of similarly and confusingly named operations we have for working on the state is one of the major sources of issues with this code, even when actively working on the code it's hard to remember exactly which operation does what never mind the rules for which is needed.
Sure, something like that would definitely help.
Cheers ---Dave
linux-stable-mirror@lists.linaro.org