Re: [PATCH] KVM: arm64: Don't eagerly teardown the vgic on init error

10 Oct 2024


On Thu, 10 Oct 2024 09:47:04 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
...
On Thu, Oct 10, 2024 at 08:54:43AM +0100, Marc Zyngier wrote:
...
On Thu, 10 Oct 2024 00:27:46 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
...
Then if we can't register the MMIO region for the distributor
everything comes crashing down and a vCPU has made it into the KVM_RUN
loop w/ the VGIC-shaped rug pulled out from under it. There's definitely
another functional bug here where a vCPU's attempts to poke the
distributor wind up reaching userspace as MMIO exits. But we can worry
about that another day.
I don't think that one is that bad. Userspace got us here, and they
now see an MMIO exit for something that it is not prepared to handle.
Suck it up and die (on a black size M t-shirt, please).
LOL, I'll remember that.
The situation I have in mind is a bit harder to blame on userspace,
though. Supposing that the whole VM was set up correctly, multiple vCPUs
entering KVM_RUN concurrently could cause this race and have 'unexpected'
MMIO exits go out to userspace.
   vcpu-0                          vcpu-1
   ======                          ======
   kvm_vgic_map_resources()
     dist->ready = true
     mutex_unlock(config_lock)
                                   kvm_vgic_map_resources()
                                     if (vgic_ready())
                                       return 0

                                   < enter guest >
                                   typer = readl(GICD_TYPER)

                                   < data abort >
                                   kvm_io_bus_read(...)   <= No GICD, out to userspace

   vgic_register_dist_iodev()


A small but stupid window to race with.
Ah, gotcha. I guess getting rid of the early-out in
kvm_vgic_map_resources() would plug that one. Want to post a fix for
that?
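A minimal userspace model of the ordering fix, offered only as a sketch: `struct dist_model`, `map_resources()` and `vcpu_enter()` are hypothetical stand-ins rather than KVM code, and a pthread mutex plays the role of config_lock. The readiness check and the registration happen under the same lock, and `ready` is published only after the MMIO region exists. A second vCPU therefore cannot return early while the first is still between `dist->ready = true` and `vgic_register_dist_iodev()`. The model deliberately ignores the srcu lock-ordering problem raised later in the thread and omits the registration error path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the distributor state. */
struct dist_model {
	pthread_mutex_t config_lock;
	bool ready;             /* stands in for dist->ready */
	bool iodev_registered;  /* stands in for the GICD MMIO region */
	int register_calls;     /* how often registration actually ran */
};

/*
 * Race-free variant: no lock-free early-out. The readiness check is made
 * under config_lock, and 'ready' is published only after registration.
 */
static int map_resources(struct dist_model *d)
{
	pthread_mutex_lock(&d->config_lock);
	if (!d->ready) {
		d->iodev_registered = true;  /* vgic_register_dist_iodev() analogue */
		d->register_calls++;
		d->ready = true;             /* publish readiness last */
	}
	pthread_mutex_unlock(&d->config_lock);
	return 0;
}

/* What each vCPU thread does before entering the guest (model only). */
static void *vcpu_enter(void *arg)
{
	struct dist_model *d = arg;

	map_resources(d);
	pthread_mutex_lock(&d->config_lock);
	/* Any vCPU allowed into the guest always finds the GICD registered. */
	assert(d->ready && d->iodev_registered);
	pthread_mutex_unlock(&d->config_lock);
	return NULL;
}
```

Running `vcpu_enter()` from several threads against one `dist_model` should register the region exactly once, and no thread should observe `ready` without the region.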
...
...
...
If memory serves, kvm_vgic_map_resources() used to do all of this behind
the config_lock to cure the race, but that wound up inverting lock
ordering on srcu.
Probably something like that. We also used to hold the kvm lock, which
made everything much simpler, but awfully wrong.
...
Note to self: Impose strict ordering on GIC initialization v. vCPU
creation if/when we get a new flavor of irqchip.
One of the things we should have done when introducing GICv3 is to
impose that at KVM_DEV_ARM_VGIC_CTRL_INIT, the GIC memory map is
final. I remember some push-back on the QEMU side of things, as they
like to decouple things, but this has proved to be a nightmare.
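One way to read "the GIC memory map is final at KVM_DEV_ARM_VGIC_CTRL_INIT", as a hypothetical model rather than how KVM behaves today: the init step checks that every base address has been set, then freezes them so later attempts to move a region fail. `struct gic_cfg`, `gic_set_addr()`, `gic_ctrl_init()` and the choice of -ENXIO / -EBUSY are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_UNDEF ((uint64_t)-1)  /* stand-in for an unset base address */

enum gic_region { GIC_DIST, GIC_REDIST };

/* Hypothetical VM-wide GIC configuration (not KVM's structures). */
struct gic_cfg {
	uint64_t dist_base;
	uint64_t redist_base;
	bool initialized;  /* set by the CTRL_INIT analogue */
};

static void gic_cfg_reset(struct gic_cfg *c)
{
	c->dist_base = ADDR_UNDEF;
	c->redist_base = ADDR_UNDEF;
	c->initialized = false;
}

/* Address setter: refuses any change once the map has been frozen. */
static int gic_set_addr(struct gic_cfg *c, enum gic_region r, uint64_t base)
{
	if (c->initialized)
		return -EBUSY;
	if (r == GIC_DIST)
		c->dist_base = base;
	else
		c->redist_base = base;
	return 0;
}

/* CTRL_INIT analogue: validate the whole map up front, then freeze it. */
static int gic_ctrl_init(struct gic_cfg *c)
{
	if (c->initialized)
		return -EBUSY;
	if (c->dist_base == ADDR_UNDEF || c->redist_base == ADDR_UNDEF)
		return -ENXIO;
	c->initialized = true;
	return 0;
}
```

With this ordering, a missing region is reported at init time instead of on the first KVM_RUN, and no vCPU ever runs against a half-described address map.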
Pushing more of the initialization complexity into userspace feels like
the right thing. Since we clearly have no idea what we're doing :)
KVM APIv2?
...
...
...
The crappy assumption here is that kvm_arch_vcpu_run_pid_change() and its
callees are allowed to destroy VM-scoped structures in error handling.
I think this is symptomatic of a more general issue: we perform VM-wide
configuration in the context of a vcpu. We have tons of this stuff to
paper over the lack of a "this VM is fully configured" barrier.
I wonder whether we could sidestep things by punting the finalisation
of the VM to a different context (workqueue?) and simply return
-EAGAIN or -EINTR to userspace while we're processing it. That doesn't
solve the "I'm missing parts of the address map and I'm going to die"
part though.
Throwing it back at userspace would be nice, but unfortunately, for ABI reasons, I
think we need to block/spin vCPUs in the kernel until the VM is in fully
working condition. A fragile userspace could explode for a 'spurious'
EAGAIN/EINTR where there wasn't one before.
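A sketch of the in-kernel alternative, modelled in userspace with pthreads: the first vCPU to reach the hypothetical `vm_wait_configured()` runs the VM-wide setup exactly once, the others block until it finishes, and a failure is recorded as VM state rather than torn down from one vCPU's error path. Every name here, including `vm_finalise()` and its `inject_err` test knob, is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical VM lifecycle states (not KVM's). */
enum vm_state { VM_UNCONFIGURED, VM_FINALISING, VM_READY, VM_FAILED };

struct vm_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	enum vm_state state;
	int err;             /* outcome of finalisation: 0 or -errno */
	int finalise_calls;  /* how many times the VM-wide setup ran */
	int inject_err;      /* test knob: make vm_finalise() fail */
};

/* Hypothetical VM-wide setup (e.g. mapping GIC resources). */
static int vm_finalise(struct vm_model *vm)
{
	return vm->inject_err;
}

/* Called by every vCPU on its first run (model only). */
static int vm_wait_configured(struct vm_model *vm)
{
	int ret;

	pthread_mutex_lock(&vm->lock);
	if (vm->state == VM_UNCONFIGURED) {
		/* This vCPU wins: run the setup once, without holding the lock. */
		vm->state = VM_FINALISING;
		vm->finalise_calls++;
		pthread_mutex_unlock(&vm->lock);
		ret = vm_finalise(vm);
		pthread_mutex_lock(&vm->lock);
		vm->err = ret;
		/* Record failure as VM state; nothing VM-scoped is destroyed. */
		vm->state = ret ? VM_FAILED : VM_READY;
		pthread_cond_broadcast(&vm->cond);
	}
	/* Other vCPUs block here until the outcome is known. */
	while (vm->state == VM_FINALISING)
		pthread_cond_wait(&vm->cond, &vm->lock);
	ret = vm->err;
	pthread_mutex_unlock(&vm->lock);
	return ret;
}

static void *vcpu_thread(void *arg)
{
	return (void *)(long)vm_wait_configured(arg);
}
```

The design choice being illustrated is that no new error code escapes to userspace while setup is in flight; vCPUs simply wait, which is the ABI-preserving option.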
EINTR needs to be handled already, as this is how you report
preemption by a signal. But yeah, overall, I'm not enthralled with
much so far...
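From the VMM side, the loop below sketches why EINTR is already part of the contract: KVM_RUN fails with errno set to EINTR when a signal interrupts it, so a correct VMM must simply re-enter. Treating EAGAIN the same way is the assumption under debate; a VMM that retries only on EINTR is exactly the "fragile userspace" concern. `run_once` is a hypothetical stand-in for `ioctl(vcpu_fd, KVM_RUN, 0)` so the example runs without /dev/kvm, and `fake_run_once()` is a test stub.

```c
#include <assert.h>
#include <errno.h>

/*
 * Model of a VMM's vCPU loop. run_once() stands in for
 * ioctl(vcpu_fd, KVM_RUN, 0): it returns -1 and sets errno on failure.
 */
static int run_vcpu(int (*run_once)(void *ctx), void *ctx, int *retries)
{
	for (;;) {
		if (run_once(ctx) == 0)
			return 0;        /* a real VMM would handle exit_reason here */
		if (errno == EINTR || errno == EAGAIN) {
			(*retries)++;    /* signal, or "VM not ready yet": re-enter */
			continue;
		}
		return -errno;           /* genuine failure */
	}
}

/* Test stub: fail with EINTR a fixed number of times, then succeed. */
struct fake_vcpu {
	int interrupts_left;
};

static int fake_run_once(void *ctx)
{
	struct fake_vcpu *v = ctx;

	if (v->interrupts_left > 0) {
		v->interrupts_left--;
		errno = EINTR;
		return -1;
	}
	return 0;
}
```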
M.
-- 
Without deviation from the norm, progress is not possible.
