On Wed, Oct 30, 2024 at 05:34:16PM +0000, Mark Rutland wrote:
I originally just had a few comments on the commit message, but I believe I've found a logic issue in this patch, and more general issue throughout our FPSIMD/SVE/SME manipulation -- more details below.
I'm fairly sure there's at least one other issue lurking somewhere with TIF_SVE tearing, yes. I've not been able to get that to reproduce, and I've probably stared at this code too much to see it by pure inspection however it looks like you might've spotted the issue here.
On Wed, Oct 23, 2024 at 10:31:24PM +0100, Mark Brown wrote:
It would be nice to have the signature of the failure as well, e.g.
| This is intermittently detected by the fp-stress test, which | intermittently reports "ZA-VL-*-*: Bad SVCR: 0".
That's a common one for timing reasons, but it does also manifest with other outputs (eg, if we turn off ZA while trying to execute instructions that access ZA).
I don't think this is correct in the TIF_FOREIGN_FPSTATE case. We don't unbind the saved state from another CPU it might still be resident on, and so IIUC there's a race whereby the updates to the saved state can end up discarded:
...
... and either:
- A subsequent return to userspace will see TIF_FOREIGN_FPSTATE is clear and not restore the in-memory state.
- A subsequent context-switch will see TIF_FOREIGN_FPSTATE is clear an save the (stale) HW state again.
It looks like we have a similar pattern all over the place, e.g. in do_sve_acc():
Yes, indeed - I think that's a separate bug caused by the recalcuation of TIF_FOREIGN_FPSTATE.
This is going to need a careful audit and a proper series of fixes that can be backported to stable.
It feels like a separate thing at any rate. We can do a simple and robust but performance impacting fix by having fpsimd_thread_switch() only ever set TIF_FOREIGN_FPSTATE, never clear it. That'd cause extra reloads in the case where we switch to a thread but stay in kernel mode which probably happens often enough to be palatable.
Otherwise I'm not sure it's *too* hard, TIF_FOREIGN_FPSTATE is a bit of a giveaway for places that could have issues.