On Wed, Oct 09, 2024 at 11:27:52PM +0000, Oliver Upton wrote:
On Wed, Oct 09, 2024 at 12:36:32PM -0700, Sean Christopherson wrote:
On Wed, Oct 09, 2024, Oliver Upton wrote:
On Wed, Oct 09, 2024 at 07:36:03PM +0100, Marc Zyngier wrote:
As there is very little ordering in the KVM API, userspace can instantiate a half-baked GIC (missing its memory map, for example) at almost any time.
This means that, with the right timing, a thread running vcpu-0 can enter the kernel without a GIC configured and get a GIC created behind its back by another thread. Amusingly, it will pick up that GIC and start messing with the data structures without the GIC having been fully initialised.
Huh, I'm definitely missing something. Could you remind me where we open up this race between KVM_RUN && kvm_vgic_create()?
Ah, duh, I see it now. kvm_arch_vcpu_run_pid_change() doesn't serialize on a VM lock, and kvm_vgic_map_resources() has an early return for vgic_ready() letting it blow straight past the config_lock.
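Roughly the shape I'm describing, for reference (a simplified sketch, not the actual kernel sources; the details are illustrative):

/* Simplified sketch of the pattern described above, not verbatim kernel code. */
int kvm_vgic_map_resources(struct kvm *kvm)
{
	int ret = 0;

	/*
	 * Unlocked fast path: if the vgic already looks "ready", return
	 * immediately without taking any lock. A vCPU racing against
	 * another thread's vgic creation can observe a half-initialized
	 * distributor here and carry on into KVM_RUN.
	 */
	if (vgic_ready(kvm))
		return 0;

	mutex_lock(&kvm->arch.config_lock);

	/* Re-check under the lock before doing the real work. */
	if (vgic_ready(kvm))
		goto out;

	/* ... register the distributor MMIO region, etc. ... */

out:
	mutex_unlock(&kvm->arch.config_lock);
	return ret;
}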
Then if we can't register the MMIO region for the distributor everything comes crashing down and a vCPU has made it into the KVM_RUN loop w/ the VGIC-shaped rug pulled out from under it. There's definitely another functional bug here where a vCPU's attempts to poke the distributor wind up reaching userspace as MMIO exits. But we can worry about that another day.
(A theoretical bug, that is. In practice the window to race against likely isn't big enough to get the in-guest vCPU to the point of poking the halfway-initialized distributor.)
If memory serves, kvm_vgic_map_resources() used to do all of this behind the config_lock to cure the race, but that wound up inverting lock ordering on srcu.
Note to self: Impose strict ordering on GIC initialization v. vCPU creation if/when we get a new flavor of irqchip.
I'd thought the fact that the latter (kvm_vgic_create()) takes all the vCPU mutexes and checks whether any vCPU in the VM has run would be enough to guard against such a race, but clearly not...
Any chance that fixing the bugs where vCPU0 can be accessed (and run!) before it's fully online would help?
That's an equally gross bug, but kvm_vgic_create() should still be safe w.r.t. vCPU creation since both hold the kvm->lock in the right spot. That is, since kvm_vgic_create() is called under the lock, any vCPUs visible to userspace should already exist in the vCPU xarray.
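In code form, the ordering argument is roughly this (a simplified sketch from memory, not the actual sources; the helpers and return conventions here are stand-ins):

/* Simplified sketch of the ordering argument, not verbatim kernel code. */
int kvm_vgic_create(struct kvm *kvm, u32 type)
{
	int ret = 0;

	/* The device ioctl path calls this with kvm->lock held ... */
	lockdep_assert_held(&kvm->lock);

	/*
	 * ... and vCPU creation also takes kvm->lock, so every vCPU that
	 * userspace can see is already in the vCPU xarray by this point.
	 */

	/* Pin every vCPU so none of them can enter the run loop under us. */
	if (!lock_all_vcpus(kvm))
		return -EBUSY;

	/* Refuse to create the vgic if any vCPU has already run. */
	if (any_vcpu_has_run_once(kvm)) {	/* placeholder for the real check */
		ret = -EBUSY;
		goto out_unlock;
	}

	/* ... actual vgic allocation / initialisation ... */

out_unlock:
	unlock_all_vcpus(kvm);
	return ret;
}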
The crappy assumption here is that kvm_arch_vcpu_run_pid_change() and its callees are allowed to destroy VM-scoped structures in their error handling.
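I.e. something along these lines (again a simplified sketch, not the actual code; vcpu_has_run_once() is just a stand-in for the real first-run check):

/* Simplified sketch of the problematic shape, not verbatim kernel code. */
int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;
	int ret;

	/* Only interesting on a vCPU's first trip into KVM_RUN. */
	if (vcpu_has_run_once(vcpu))	/* stand-in for the real check */
		return 0;

	/* Finish wiring up VM-scoped state, e.g. map the vgic resources. */
	ret = kvm_vgic_map_resources(kvm);
	if (ret) {
		/*
		 * If the error path tears down the vgic here, it destroys a
		 * VM-scoped structure while another vCPU may already be in
		 * the KVM_RUN loop relying on it.
		 */
		return ret;
	}

	return 0;
}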
E.g. if that closes the vCPU0 hole, maybe the vCPU1 case can be handled a bit more gracefully?
I think this is about as graceful as we can be. The sorts of screw-ups that precipitate this error handling may involve stupidity across several KVM ioctls, meaning it is highly unlikely to be attributable / recoverable.
--
Thanks,
Oliver