On Tue, Mar 08, 2022, David Matlack wrote:
On Tue, Mar 8, 2022 at 1:40 PM Sean Christopherson <seanjc@google.com> wrote:
On Thu, Mar 03, 2022, David Matlack wrote:
Tie the lifetime of the KVM module to the lifetime of each VM via kvm.users_count. This way anything that grabs a reference to the VM via kvm_get_kvm() cannot accidentally outlive the KVM module.
Prior to this commit, the lifetime of the KVM module was tied to the lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU file descriptors by their respective file_operations "owner" field. This approach is insufficient because references grabbed via kvm_get_kvm() do not prevent closing any of the aforementioned file descriptors.
This fixes a long-standing theoretical bug in KVM that at least affects async page faults. kvm_setup_async_pf() grabs a reference via kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing prevents the VM file descriptor from being closed and the KVM module from being unloaded before this callback runs.
Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
And (or)
Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
because the above is x86-centric, at a glance PPC and maybe s390 have issues beyond async #PF.
Cc: stable@vger.kernel.org
Suggested-by: Ben Gardon <bgardon@google.com>
[ Based on a patch from Ben implemented for Google's kernel. ]
Signed-off-by: David Matlack <dmatlack@google.com>
virt/kvm/kvm_main.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 35ae6d32dae5..b59f0a29dbd5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -117,6 +117,8 @@ EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
static const struct file_operations stat_fops_per_vm;
+static struct file_operations kvm_chardev_ops;
static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl, unsigned long arg);
#ifdef CONFIG_KVM_COMPAT
@@ -1131,6 +1133,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
	preempt_notifier_inc();
	kvm_init_pm_notifier(kvm);
+	if (!try_module_get(kvm_chardev_ops.owner)) {
The "try" aspect is unnecessary. Stealing from Paolo's version,
	/* KVM is pinned via open("/dev/kvm"), the fd passed to this ioctl(). */
	__module_get(kvm_chardev_ops.owner);
Right, I did see that and agree we're guaranteed the KVM module has a reference at this point. But the KVM module might be in state MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait"), which try_module_get() checks.
Ah, can you throw that in as a comment? Doesn't have to be much, just enough of a breadcrumb to connect the dots and to prevent us from "optimizing" this to __module_get() in the future.
/* Use the "try" variant to play nice with e.g. "rmmod --wait". */
With a comment,
Reviewed-by: Sean Christopherson <seanjc@google.com>
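
For anyone following the try_module_get() vs. __module_get() point above, here is a minimal user-space sketch of the distinction. It is only a model under stated assumptions: struct model_module and every model_* name are hypothetical, invented for illustration, and none of this is the kernel's implementation. It assumes only what the thread states, namely that the "try" variant checks for a module that is already going away while the unconditional variant does not.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the module states relevant to this thread. */
enum model_state { MODEL_LIVE, MODEL_GOING };

/* Hypothetical model of a module: a state plus a reference count. */
struct model_module {
	enum model_state state;
	int refcnt;
};

/*
 * Models the "try" variant: refuse to take a new reference once unload
 * has begun, even if other references are still outstanding.
 */
static bool model_try_get(struct model_module *m)
{
	if (m->state != MODEL_LIVE)
		return false;
	m->refcnt++;
	return true;
}

/*
 * Models the unconditional variant: always increments. It relies on the
 * caller already holding a reference and does not look at the state.
 */
static void model_get(struct model_module *m)
{
	m->refcnt++;
}

/* Drop a reference taken by either helper above. */
static void model_put(struct model_module *m)
{
	m->refcnt--;
}

/* Models the start of an unload, e.g. someone running "rmmod --wait". */
static void model_begin_unload(struct model_module *m)
{
	m->state = MODEL_GOING;
}
```

In this model, a module with one outstanding reference that has been marked as going rejects model_try_get() but still accepts model_get(), which is the behavioral difference the requested code comment is meant to record.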