On Tue, Dec 03, 2024 at 05:00:08PM +0000, Dave Martin wrote:
On Tue, Dec 03, 2024 at 04:00:45PM +0000, Mark Brown wrote:
It's to ensure that the last recorded CPU for the current task is invalid, so that if the state was loaded on another CPU and we switch back to that CPU we reload the state from memory. We need to at least trigger configuration of the SME VL.
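To make the "invalid last CPU" point concrete, here is a toy userspace model of the bookkeeping. It is only a sketch: every toy_* name is made up for illustration, the logic is a simplification of what fpsimd_flush_task_state() and the context switch path do, and it is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
#define TOY_NR_CPUS 2	/* also doubles as the "no valid CPU" marker */

struct toy_task {
	int  last_cpu;		/* CPU whose registers last held our state */
	bool foreign_fpstate;	/* stands in for TIF_FOREIGN_FPSTATE */
	int  reloads;		/* how often state was reloaded from memory */
};

/* Per-CPU record of whose state is currently in the registers. */
static struct toy_task *toy_last_state[TOY_NR_CPUS];

/* Invalidate the last CPU so no CPU's registers are trusted any more. */
static void toy_flush_task_state(struct toy_task *t)
{
	t->last_cpu = TOY_NR_CPUS;
	t->foreign_fpstate = true;
}

/* On switch-in, registers are only trusted if both task and CPU match. */
static void toy_thread_switch_in(struct toy_task *next, int cpu)
{
	bool wrong_task = toy_last_state[cpu] != next;
	bool wrong_cpu = next->last_cpu != cpu;

	next->foreign_fpstate = wrong_task || wrong_cpu;
}

/* On return to userspace, reload from memory if the registers are stale. */
static void toy_restore_current_state(struct toy_task *t, int cpu)
{
	if (!t->foreign_fpstate)
		return;

	t->reloads++;			/* the real code would load registers here */
	toy_last_state[cpu] = t;	/* bind the task to this CPU */
	t->last_cpu = cpu;
	t->foreign_fpstate = false;
}
```

In this model, a task that runs on CPU 0, is flushed, and is then switched back in on CPU 0 still gets a reload, even though the per-CPU pointer for CPU 0 continues to point at it.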
OK, so the logic here is something like:
Disregarding SME, the FPSIMD/SVE regs are up to date, which is fine because SME is trapped.
When we take the SME trap, we suddenly have some work to do in order to make sure that the SME-specific parts of the register state are up to date, so we need to mark the state as stale before setting TIF_SME and returning.
We know that the only bit of register state which is not up to date at this point is the SME vector length; we don't configure that for tasks that do not have SME. SVCR is always configured, since we have to exit streaming mode for FPSIMD and SVE to work properly, so we know it's already 0. All the other SME-specific state is gated by controls in SVCR.
fpsimd_flush_task_state() means that we do the necessary work when re-entering userspace, but is there a problem with simply marking all the FPSIMD/vector state as stale? If FPSR or FPCR is dirty for example, it now looks like they won't get written back to thread struct if there is a context switch before current re-enters userspace?
Maybe the other flags distinguish these cases -- I haven't fully got my head around it.
We are doing fpsimd_flush_task_state() in the TIF_FOREIGN_FPSTATE case so we know there is no dirty state in the registers.
(Actually, the ARM ARM says (IMHTLZ) that toggling PSTATE.SM by any means causes FPSR to become 0x800009f. I'm not sure where that fits in -- do we handle that anywhere? I guess the "soft" SM toggling via
Urgh, not seen that one - that needs handling in the signal entry path and ptrace. That will have been defined while the feature was being implemented. It's not relevant here though, since we are in the SME access trap: we might be trapping due to an SMSTART or equivalent operation, but that SMSTART has not yet run at the point where we return to userspace.
ptrace, signal delivery or maybe exec, ought to set this? Not sure how that interacts with the expected behaviour of the fenv(3) API... Hmm. I see no corresponding statement about FPCR.)
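For reference, the FPSR value quoted above decodes as every cumulative exception flag plus QC being set. The sketch below assumes the usual AArch64 FPSR bit positions (IOC, DZC, OFC, UFC, IXC in bits 0-4, IDC in bit 7, QC in bit 27); the TOY_* macro names are made up for illustration and are not kernel or libc definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions assumed from the AArch64 FPSR layout. */
#define TOY_FPSR_IOC	(UINT32_C(1) << 0)	/* invalid operation */
#define TOY_FPSR_DZC	(UINT32_C(1) << 1)	/* divide by zero */
#define TOY_FPSR_OFC	(UINT32_C(1) << 2)	/* overflow */
#define TOY_FPSR_UFC	(UINT32_C(1) << 3)	/* underflow */
#define TOY_FPSR_IXC	(UINT32_C(1) << 4)	/* inexact */
#define TOY_FPSR_IDC	(UINT32_C(1) << 7)	/* input denormal */
#define TOY_FPSR_QC	(UINT32_C(1) << 27)	/* cumulative saturation */

/* The value quoted in the thread for FPSR after a PSTATE.SM toggle. */
#define TOY_FPSR_AFTER_SM_TOGGLE	UINT32_C(0x0800009f)

/* OR together all of the sticky status flags listed above. */
static uint32_t toy_fpsr_all_sticky_flags(void)
{
	return TOY_FPSR_IOC | TOY_FPSR_DZC | TOY_FPSR_OFC | TOY_FPSR_UFC |
	       TOY_FPSR_IXC | TOY_FPSR_IDC | TOY_FPSR_QC;
}
```

If that decoding is right, anything that reads the exception flags after a toggle would see every exception as raised, which is why the interaction with fenv(3) is a fair question.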
Fun. I'm not sure how the ABI is defined there by libc.