Hi,
On Thu, Oct 17, 2024 at 02:13:34PM +0800, WangYuli wrote:
There is a probability that the host machine will also restart when the virtual machine is restarting.
This is a start, but can you please describe in detail what the flow is you're seeing that leads to the refcount bug / UAF?
Commit ad362fe07fec ("KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache") released the reference count of an IRQ when it shouldn't have. This led to a situation where, when the system finally released the IRQ, it found that the structure had already been freed, triggering a 'refcount_t: underflow; use-after-free' error.
In fact, the function "vgic_put_irq" should be called by "vgic_its_inject_cached_translation" instead of "vgic_its_trigger_msi".
This line doesn't match your patch, and instead aligns with the upstream behavior.
The put in vgic_its_inject_cached_translation() pairs with the get in vgic_its_check_cache(). We need to do this because the LPI injection fast path happens outside of the its_lock.
The slow path for LPI injection takes the its_lock and is safe because the ITE holds a reference on the descriptor as well. Because of that, vgic_its_trigger_msi() doesn't touch the refcount on the resolved IRQ.
So I'm not following how any of this leads to the underflow you observe. Even if the put in vgic_its_inject_cached_translation() causes the refcount to drop to 0, it is likely that an unbalanced put happened somewhere else in the ITS emulation.