On Thu, Dec 07, 2023 at 02:06:34PM +0000, Marc Zyngier wrote:
Mark Brown broonie@kernel.org wrote:
@@ -517,7 +519,6 @@ struct kvm_vcpu_arch { enum fp_type fp_type; unsigned int sve_max_vl; u64 svcr;
- u64 fpmr;
Why do this change here? Why isn't done like that the first place?
It didn't seem right to add the register to struct vcpu_sysreg before it was handled by KVM. As referenced in the cover letter normally this wouldn't come up because KVM doesn't rely on the host kernel for managing register state so we add KVM support then enable the host kernel but for FPSIMD we're reusing fpsimd_save() so we need the host kernel support to be in place when we enable KVM.
That doesn't explain why you can't be upfront with it and populate the FPMR entry. In either case, you are wasting a u64.
So you'd rather just have the register listed in there as part of the host support rather than initially excluding it? Note that the current series has the same approach as is currently used for SVCR which is in a similar situation.
I agree that it's not great, the main issue was that fpsimd_state is both already embedded in uw for hardened usercopy and very widely referenced by exactly which struct it's in so I was taking a guess as to what would get the least objections. The obvious thing would be to add FPMR to uw and share the whole thing with the hypervisor, if people don't mind adding another field to uw I could do that?
Either that, or you create a KVM-specific structure that contains these fields. If that results in KVM changes, so be it. But I won't take this sort of pointer arithmetic that assumes some pre-defined layout.
Moving fpsimd_state would have a big textual impact on the host code and consequent issues with creating conflicts too so I'd rather avoid that.
We really need to stop piling the save/restore of stuff that isn't advertised to the guest.
I'm not clear what you're referencing here? The feature is advertised to the guest via the ID registers and in the past you've pushed back on making things where the state is just a single register like this optional. Do you mean that we should be making this conditional on the guest ID registers? If that is the case is there a plan for how that's supposed to work, set flags when kvm_vcpu_run_pid_change() happens for example?
See the beginning of this email. It is high time that we stop enabling everything by default, because this totally breaks VM migration. We already have a huge backlog of these things, and I don't want to add more of it.
Which means that at the very least, enabling *any* feature also comes with sanitising the state one way or another when this feature is disabled by userspace.
How this is being done is still a work in progress: my current plan is based on a set of trap bits that are computed on a per-VM basis, and some side state that indicates whether the trap handling is for emulation or feature disabling purpose. This will probably reuse the NV infrastructure which has an exhaustive list of the sysregs that can be trapped from EL0/EL1.
At the very least, userspace shouldn't be able to observe the state that a guest isn't supposed to generate, and we should be mindful of not creating covert channels.
OK, that does seem much more like what I'd have expected the enablement of these extra features to look like (and may well simplify a bunch of the existing trap management) though it is a big change in approach. It does greatly expand the scope of what's needed to virtualise the feature though, and there's a bunch of other in flight serieses that'd be impacted (eg, POR and GCS) and I'm expecting a bunch of overlap until the general mechanism gets landed.
Based on that I think it might make sense to push the KVM guest support for these features out and deal with them separately as part of the new approach to trap/feature management. Probably through pulling the KVM bits to the end of the serieses so they're there for people to see. Does that sound reasonable to you?