On Tue, Dec 03, 2024 at 04:00:45PM +0000, Mark Brown wrote:
> On Tue, Dec 03, 2024 at 03:32:22PM +0000, Dave Martin wrote:
>> On Tue, Dec 03, 2024 at 12:45:53PM +0000, Mark Brown wrote:
>>> @@ -1460,6 +1460,8 @@ void do_sme_acc(unsigned long esr, struct pt_regs *regs)
>>>  		sme_set_vq(vq_minus_one);
>>>  		fpsimd_bind_task_to_cpu();
>>> +	} else {
>>> +		fpsimd_flush_task_state(current);
>> TIF_FOREIGN_FPSTATE is (or was) a cache of the task<->CPU binding that you're clobbering here.
>> So, this fpsimd_flush_task_state() should have no effect unless TIF_FOREIGN_FPSTATE is already wrong? I'm wondering if the apparent need for this means that there is an undiagnosed bug elsewhere.
>> (My understanding is based on FPSIMD/SVE; I'm less familiar with the SME changes, so I may be missing something important here.)
> It's to ensure that the last recorded CPU for the current task is invalid, so that if the state was loaded on another CPU and we switch back to that CPU, we reload the state from memory; we need to at least trigger configuration of the SME VL.
OK, so the logic here is something like:
Disregarding SME, the FPSIMD/SVE regs are up to date, which is fine because SME is trapped.
When we take the SME trap, we suddenly have some work to do in order to make sure that the SME-specific parts of the register state are up to date, so we need to mark the state as stale before setting TIF_SME and returning.
fpsimd_flush_task_state() means that we do the necessary work when re-entering userspace, but is there a problem with simply marking all the FPSIMD/vector state as stale? If FPSR or FPCR is dirty, for example, it now looks like they won't get written back to the thread struct if there is a context switch before current re-enters userspace?
Maybe the other flags distinguish these cases -- I haven't fully got my head around it.
(Actually, the ARM ARM says (IMHTLZ) that toggling PSTATE.SM by any means causes FPSR to become 0x800009f. I'm not sure where that fits in -- do we handle that anywhere? I guess "soft" SM toggling via ptrace, signal delivery, or maybe exec ought to set this too? Not sure how that interacts with the expected behaviour of the fenv(3) API... Hmm. I see no corresponding statement about FPCR.)
Cheers
---Dave