On Mon, Apr 20, 2020 at 2:32 AM David Laight <David.Laight@aculab.com> wrote:
Maybe kernel_fpu_begin() should be passed the address of a location where a pointer to an fpu save area buffer can be written. Then the pre-emption code can allocate the buffer and save the state into it.
However, that doesn't solve the problem for non-preemptive kernels. They may need a cond_resched() in the loop if it might take 1ms (or so).
kernel_fpu_begin() ought also to be passed a parameter saying which fpu features are required, and return which are allocated. On x86 this could be used to check for AVX512 (etc.), which may be available in an ISR unless it interrupted a kernel_fpu_begin() section. It would also allow optimisations if only 1 or 2 fpu registers are needed (e.g. for some of the crypto functions) rather than the whole fpu register set.
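For illustration only, the feature-negotiation idea above might look roughly like the following user-space mock. Everything in it is hypothetical: the FPU_FEAT_* bits, struct fpu_ctx, fpu_begin_features() and fpu_end_features() are made-up names, not the kernel's existing API, and the rule that an ISR which interrupted an FPU section is granted nothing is a simplifying assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; NOT the kernel's real XFEATURE_* masks. */
#define FPU_FEAT_SSE    (1u << 0)
#define FPU_FEAT_AVX    (1u << 1)
#define FPU_FEAT_AVX512 (1u << 2)

/* Hypothetical model of the state a begin() call would have to consult. */
struct fpu_ctx {
	uint32_t cpu_features;  /* features this CPU supports */
	bool in_isr;            /* caller is an interrupt handler */
	bool in_kernel_fpu;     /* an FPU section is already active */
};

/*
 * Ask for a set of features and return the subset actually granted.
 * A return value of 0 means the caller must use a scalar fallback.
 */
static uint32_t fpu_begin_features(struct fpu_ctx *ctx, uint32_t wanted)
{
	/* Assumption: an ISR that interrupted an FPU section gets nothing. */
	if (ctx->in_isr && ctx->in_kernel_fpu)
		return 0;

	uint32_t granted = wanted & ctx->cpu_features;

	/* Only open a section if something was actually granted. */
	if (granted)
		ctx->in_kernel_fpu = true;
	return granted;
}

/* Close a section opened by a successful fpu_begin_features() call. */
static void fpu_end_features(struct fpu_ctx *ctx)
{
	ctx->in_kernel_fpu = false;
}
```

A caller would compare the returned mask against what it asked for, pick the widest code path it was granted, and fall back to scalar code when the result is 0. In that last case it must not call fpu_end_features(), since no section was opened.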
There might be ways to improve lots of FPU things, indeed. This patch here is just a patch to Herbert's branch in order to make uniform usage of our existing solution for this, fixing the existing bug. I wouldn't mind seeing more involved and better solutions in a patchset for crypto-next.
Will follow up on your suggestion in a different thread, so as not to block this one.