From: Frederic Weisbecker <frederic@kernel.org>
[ Upstream commit e7f2be115f0746b969c0df14c0d182f65f005ca5 ]
getrusage(RUSAGE_THREAD) with nohz_full may return shorter utime/stime than the actual time.
task_cputime_adjusted() snapshots utime and stime and then adjusts their sum to match the scheduler-maintained cputime.sum_exec_runtime. Unfortunately, under nohz_full, sum_exec_runtime is only updated once per second in the worst case, causing a discrepancy against utime and stime, which can be updated at any time by the reader using vtime.
To fix this, update cputime.sum_exec_runtime when the cputime snapshot reports the task as actually running while the tick is disabled. The related overhead is then confined to the situations that actually need it.
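For illustration only (not part of the patch): a minimal userspace sketch of the symptom, assuming the thread is pinned to a nohz_full CPU. It accumulates thread CPU time and compares CLOCK_THREAD_CPUTIME_ID, which is based on the scheduler clock, against getrusage(RUSAGE_THREAD), which goes through task_cputime_adjusted(); before this fix the rusage figures could lag by up to a second. The 2-second threshold is arbitrary.

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <sys/resource.h>

int main(void)
{
	struct timespec ts;
	struct rusage ru;

	/* Spin until 2s of thread CPU time have accumulated. */
	do {
		clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
	} while (ts.tv_sec < 2);

	getrusage(RUSAGE_THREAD, &ru);
	/* Before the fix, utime + stime may fall visibly short of 2s. */
	printf("cpu clock: %ld.%09ld, utime: %ld.%06ld, stime: %ld.%06ld\n",
	       (long)ts.tv_sec, ts.tv_nsec,
	       (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
	       (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
	return 0;
}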
Reported-by: Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20211026141055.57358-3-frederic@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/sched/cputime.h |  5 +++--
 kernel/sched/cputime.c        | 12 +++++++++---
 2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/include/linux/sched/cputime.h b/include/linux/sched/cputime.h
index 6c9f19a33865a..ce3c58286062c 100644
--- a/include/linux/sched/cputime.h
+++ b/include/linux/sched/cputime.h
@@ -18,15 +18,16 @@
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-extern void task_cputime(struct task_struct *t,
+extern bool task_cputime(struct task_struct *t,
 			 u64 *utime, u64 *stime);
 extern u64 task_gtime(struct task_struct *t);
 #else
-static inline void task_cputime(struct task_struct *t,
+static inline bool task_cputime(struct task_struct *t,
 				u64 *utime, u64 *stime)
 {
 	*utime = t->utime;
 	*stime = t->stime;
+	return false;
 }
 
 static inline u64 task_gtime(struct task_struct *t)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 872e481d5098c..9392aea1804e5 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -615,7 +615,8 @@ void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 		.sum_exec_runtime = p->se.sum_exec_runtime,
 	};
 
-	task_cputime(p, &cputime.utime, &cputime.stime);
+	if (task_cputime(p, &cputime.utime, &cputime.stime))
+		cputime.sum_exec_runtime = task_sched_runtime(p);
 	cputime_adjust(&cputime, &p->prev_cputime, ut, st);
 }
 EXPORT_SYMBOL_GPL(task_cputime_adjusted);
@@ -828,19 +829,21 @@ u64 task_gtime(struct task_struct *t)
  * add up the pending nohz execution time since the last
  * cputime snapshot.
  */
-void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
+bool task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 {
 	struct vtime *vtime = &t->vtime;
 	unsigned int seq;
 	u64 delta;
+	int ret;
 
 	if (!vtime_accounting_enabled()) {
 		*utime = t->utime;
 		*stime = t->stime;
-		return;
+		return false;
 	}
 
 	do {
+		ret = false;
 		seq = read_seqcount_begin(&vtime->seqcount);
 
 		*utime = t->utime;
@@ -850,6 +853,7 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 		if (vtime->state < VTIME_SYS)
 			continue;
 
+		ret = true;
 		delta = vtime_delta(vtime);
 
 		/*
@@ -861,6 +865,8 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 		else
 			*utime += vtime->utime + delta;
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
+
+	return ret;
 }
 
 static int vtime_state_fetch(struct vtime *vtime, int cpu)
From: Frederic Weisbecker <frederic@kernel.org>
[ Upstream commit 53e87e3cdc155f20c3417b689df8d2ac88d79576 ]
When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU is guaranteed to stay online and to never stop its tick.
Meanwhile, in some rare cases, the dedicated timekeeper may be running with interrupts disabled for a while, such as in stop_machine.
If jiffies stops being updated, a nohz_full CPU may end up endlessly programming the next tick in the past, taking the last jiffies update monotonic timestamp as a stale base, resulting in a tick storm.
Here is a scenario where it matters:
0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU.
1) A stop machine callback is queued to execute somewhere.
2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1 can still take IRQs.
3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward.
4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking last_jiffies_update as a base. But last_jiffies_update hasn't been updated for 2 jiffies since the timekeeper has interrupts disabled.
5) clockevents_program_event(), which relies on ktime_get(), observes that the expiration is in the past and therefore programs the min delta event on the clock.
6) The tick fires immediately, goto 3)
7) Tick storm: the nohz_full CPU is drowned in ticks and takes ages to reach MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation.
Solve this by unconditionally updating jiffies if the value is stale on nohz_full IRQ entry. IRQs and other disturbances are expected to be rare enough on nohz_full for the unconditional call to ktime_get() not to actually matter.
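To make steps 3)-6) concrete, here is a toy model of the arithmetic (plain userspace C with illustrative constants, not kernel code): with last_jiffies_update frozen, every expiry computed "one jiffy forward" from it is already behind ktime_get(), so clockevents falls back to the minimum delta and the tick keeps firing.

#include <stdio.h>

#define TICK_NSEC 1000000L	/* one jiffy at HZ=1000, illustrative */

int main(void)
{
	long now = 5 * TICK_NSEC;		  /* ktime_get() */
	long last_jiffies_update = 3 * TICK_NSEC; /* stale: timekeeper has IRQs off */
	int immediate_fires = 0;

	/* Each IRQ exit reprograms the tick one jiffy past the stale base. */
	for (int irq = 0; irq < 10; irq++) {
		long expires = last_jiffies_update + TICK_NSEC;

		if (expires <= now)	/* already in the past: min-delta fire */
			immediate_fires++;
	}

	printf("%d of 10 reprograms fired immediately -> tick storm\n",
	       immediate_fires);
	return 0;
}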
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20211026141055.57358-2-frederic@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/softirq.c         | 3 ++-
 kernel/time/tick-sched.c | 7 +++++++
 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 322b65d456767..41f470929e991 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -595,7 +595,8 @@ void irq_enter_rcu(void)
 {
 	__irq_enter_raw();
 
-	if (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET))
+	if (tick_nohz_full_cpu(smp_processor_id()) ||
+	    (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET)))
 		tick_irq_enter();
 
 	account_hardirq_enter(current);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6bffe5af8cb11..17a283ce2b20f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1375,6 +1375,13 @@ static inline void tick_nohz_irq_enter(void)
 	now = ktime_get();
 	if (ts->idle_active)
 		tick_nohz_stop_idle(ts, now);
+	/*
+	 * If all CPUs are idle. We may need to update a stale jiffies value.
+	 * Note nohz_full is a special case: a timekeeper is guaranteed to stay
+	 * alive but it might be busy looping with interrupts disabled in some
+	 * rare case (typically stop machine). So we must make sure we have a
+	 * last resort.
+	 */
 	if (ts->tick_stopped)
 		tick_nohz_update_jiffies(now);
 }
From: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit e90e51d5f01d2baae5dcce280866bbb96816e978 ]
There is nothing to synchronize if APICv is disabled, since neither other vCPUs nor assigned devices can set PIR.ON.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/vmx/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index dacdf2395f01a..4e212f04268bb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7776,10 +7776,10 @@ static __init int hardware_setup(void)
 		ple_window_shrink = 0;
 	}
 
-	if (!cpu_has_vmx_apicv()) {
+	if (!cpu_has_vmx_apicv())
 		enable_apicv = 0;
+	if (!enable_apicv)
 		vmx_x86_ops.sync_pir_to_irr = NULL;
-	}
 
 	if (cpu_has_vmx_tsc_scaling()) {
 		kvm_has_tsc_control = true;
On 12/13/21 15:19, Sasha Levin wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> [ Upstream commit e90e51d5f01d2baae5dcce280866bbb96816e978 ]
>
> There is nothing to synchronize if APICv is disabled, since neither
> other vCPUs nor assigned devices can set PIR.ON.
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
From: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 10a37929efeb4c51a0069afdd537c4fa3831f6e5 ]
Taking the lock is useless since there are no other references, and there are already accesses (e.g. to sev->enc_context_owner) that do not take it. So get rid of it.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-12-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/svm/sev.c | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 134c4ea5e6ad8..ae8092f0d401e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1842,8 +1842,6 @@ void sev_vm_destroy(struct kvm *kvm)
 		return;
 	}
 
-	mutex_lock(&kvm->lock);
-
 	/*
 	 * Ensure that all guest tagged cache entries are flushed before
 	 * releasing the pages back to the system for use. CLFLUSH will
@@ -1863,8 +1861,6 @@ void sev_vm_destroy(struct kvm *kvm)
 		}
 	}
 
-	mutex_unlock(&kvm->lock);
-
 	sev_unbind_asid(kvm, sev->handle);
 	sev_asid_free(sev);
 }
On 12/13/21 15:19, Sasha Levin wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> [ Upstream commit 10a37929efeb4c51a0069afdd537c4fa3831f6e5 ]
>
> Taking the lock is useless since there are no other references, and
> there are already accesses (e.g. to sev->enc_context_owner) that do
> not take it. So get rid of it.
> [...]
This is cosmetic only, so
NACK
Paolo
From: Vitaly Kuznetsov <vkuznets@redhat.com>
[ Upstream commit 908fa88e420f30dde6d80f092795a18ec72ca6d3 ]
With the elevated 'KVM_CAP_MAX_VCPUS' value, the kvm_create_max_vcpus test may hit RLIMIT_NOFILE limits:
 # ./kvm_create_max_vcpus
 KVM_CAP_MAX_VCPU_ID: 4096
 KVM_CAP_MAX_VCPUS: 1024
 Testing creating 1024 vCPUs, with IDs 0...1023.
 /dev/kvm not available (errno: 24), skipping test
Adjust RLIMIT_NOFILE limits to make sure KVM_CAP_MAX_VCPUS fds can be opened. Note, raising the hard limit ('rlim_max') requires the CAP_SYS_RESOURCE capability, which is generally not needed to run kvm selftests (but without raising the limit the test is doomed to fail anyway).
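A standalone demonstration of that note (assuming a finite hard limit; not part of the patch): raising rlim_cur up to rlim_max needs no privilege, while raising rlim_max itself fails with EPERM for a process lacking CAP_SYS_RESOURCE.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

int main(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl))
		return 1;

	/* Raising the soft limit up to the hard limit is unprivileged. */
	rl.rlim_cur = rl.rlim_max;
	printf("soft -> hard: %s\n",
	       setrlimit(RLIMIT_NOFILE, &rl) ? strerror(errno) : "ok");

	/* Raising the hard limit requires CAP_SYS_RESOURCE. */
	rl.rlim_max++;
	rl.rlim_cur = rl.rlim_max;
	printf("hard + 1:     %s\n",
	       setrlimit(RLIMIT_NOFILE, &rl) ? strerror(errno)
					     : "ok (CAP_SYS_RESOURCE)");
	return 0;
}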
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211123135953.667434-1-vkuznets@redhat.com>
[Skip the test if the hard limit can be raised. - Paolo]
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../selftests/kvm/kvm_create_max_vcpus.c | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)
diff --git a/tools/testing/selftests/kvm/kvm_create_max_vcpus.c b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
index 0299cd81b8ba2..aa3795cd7bd3d 100644
--- a/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
+++ b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
@@ -12,6 +12,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <sys/resource.h>
 
 #include "test_util.h"
 
@@ -40,10 +41,39 @@ int main(int argc, char *argv[])
 {
 	int kvm_max_vcpu_id = kvm_check_cap(KVM_CAP_MAX_VCPU_ID);
 	int kvm_max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
+	/*
+	 * Number of file descriptors reqired, KVM_CAP_MAX_VCPUS for vCPU fds +
+	 * an arbitrary number for everything else.
+	 */
+	int nr_fds_wanted = kvm_max_vcpus + 100;
+	struct rlimit rl;
 
 	pr_info("KVM_CAP_MAX_VCPU_ID: %d\n", kvm_max_vcpu_id);
 	pr_info("KVM_CAP_MAX_VCPUS: %d\n", kvm_max_vcpus);
 
+	/*
+	 * Check that we're allowed to open nr_fds_wanted file descriptors and
+	 * try raising the limits if needed.
+	 */
+	TEST_ASSERT(!getrlimit(RLIMIT_NOFILE, &rl), "getrlimit() failed!");
+
+	if (rl.rlim_cur < nr_fds_wanted) {
+		rl.rlim_cur = nr_fds_wanted;
+		if (rl.rlim_max < nr_fds_wanted) {
+			int old_rlim_max = rl.rlim_max;
+			rl.rlim_max = nr_fds_wanted;
+
+			int r = setrlimit(RLIMIT_NOFILE, &rl);
+			if (r < 0) {
+				printf("RLIMIT_NOFILE hard limit is too low (%d, wanted %d)\n",
+				       old_rlim_max, nr_fds_wanted);
+				exit(KSFT_SKIP);
+			}
+		} else {
+			TEST_ASSERT(!setrlimit(RLIMIT_NOFILE, &rl), "setrlimit() failed!");
+		}
+	}
+
 	/*
 	 * Upstream KVM prior to 4.8 does not support KVM_CAP_MAX_VCPU_ID.
 	 * Userspace is supposed to use KVM_CAP_MAX_VCPUS as the maximum ID
On 12/13/21 15:19, Sasha Levin wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> [ Upstream commit 908fa88e420f30dde6d80f092795a18ec72ca6d3 ]
>
> With the elevated 'KVM_CAP_MAX_VCPUS' value, the kvm_create_max_vcpus
> test may hit RLIMIT_NOFILE limits:
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
From: Vitaly Kuznetsov <vkuznets@redhat.com>
[ Upstream commit feb627e8d6f69c9a319fe279710959efb3eba873 ]
Commit 63f5a1909f9e ("KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken") officially deprecated the KVM_SET_CPUID{,2} ioctls after the first successful KVM_RUN and promised to make this sequence forbidden in 5.16. It's time to fulfil the promise.
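A hedged sketch of the now-rejected sequence (error handling and real vCPU setup are omitted, so the first KVM_RUN enters the guest with bogus state and exits immediately; that single entry is what arms the check):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);
	int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
	struct kvm_cpuid2 *cpuid = calloc(1, sizeof(*cpuid)); /* nent = 0 */

	if (kvm < 0 || vm < 0 || vcpu < 0 || !cpuid)
		return 1;

	ioctl(vcpu, KVM_RUN, 0);	/* first (failing) guest entry */

	/* From 5.16 on this is expected to fail with EINVAL. */
	if (ioctl(vcpu, KVM_SET_CPUID2, cpuid) < 0)
		perror("KVM_SET_CPUID2 after KVM_RUN");
	return 0;
}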
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211122175818.608220-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/mmu/mmu.c | 28 +++++++++++-----------------
 arch/x86/kvm/x86.c     | 19 +++++++++++++++++++
 2 files changed, 30 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0a88cb4f731f4..31392e492ce5e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5022,6 +5022,14 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	/*
 	 * Invalidate all MMU roles to force them to reinitialize as CPUID
 	 * information is factored into reserved bit calculations.
+	 *
+	 * Correctly handling multiple vCPU models with respect to paging and
+	 * physical address properties) in a single VM would require tracking
+	 * all relevant CPUID information in kvm_mmu_page_role. That is very
+	 * undesirable as it would increase the memory requirements for
+	 * gfn_track (see struct kvm_mmu_page_role comments). For now that
+	 * problem is swept under the rug; KVM's CPUID API is horrific and
+	 * it's all but impossible to solve it without introducing a new API.
 	 */
 	vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
 	vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
@@ -5029,24 +5037,10 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	kvm_mmu_reset_context(vcpu);
 
 	/*
-	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
-	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
-	 * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
-	 * faults due to reusing SPs/SPTEs. Alert userspace, but otherwise
-	 * sweep the problem under the rug.
-	 *
-	 * KVM's horrific CPUID ABI makes the problem all but impossible to
-	 * solve, as correctly handling multiple vCPU models (with respect to
-	 * paging and physical address properties) in a single VM would require
-	 * tracking all relevant CPUID information in kvm_mmu_page_role. That
-	 * is very undesirable as it would double the memory requirements for
-	 * gfn_track (see struct kvm_mmu_page_role comments), and in practice
-	 * no sane VMM mucks with the core vCPU model on the fly.
+	 * Changing guest CPUID after KVM_RUN is forbidden, see the comment in
+	 * kvm_arch_vcpu_ioctl().
 	 */
-	if (vcpu->arch.last_vmentry_cpu != -1) {
-		pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} after KVM_RUN may cause guest instability\n");
-		pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} will fail after KVM_RUN starting with Linux 5.16\n");
-	}
+	KVM_BUG_ON(vcpu->arch.last_vmentry_cpu != -1, vcpu->kvm);
 }
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b7aa845f7beee..2a584070833f9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5088,6 +5088,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		struct kvm_cpuid __user *cpuid_arg = argp;
 		struct kvm_cpuid cpuid;
 
+		/*
+		 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
+		 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
+		 * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
+		 * faults due to reusing SPs/SPTEs. In practice no sane VMM mucks with
+		 * the core vCPU model on the fly, so fail.
+		 */
+		r = -EINVAL;
+		if (vcpu->arch.last_vmentry_cpu != -1)
+			goto out;
+
 		r = -EFAULT;
 		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
 			goto out;
@@ -5098,6 +5109,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		struct kvm_cpuid2 __user *cpuid_arg = argp;
 		struct kvm_cpuid2 cpuid;
 
+		/*
+		 * KVM_SET_CPUID{,2} after KVM_RUN is forbidded, see the comment in
+		 * KVM_SET_CPUID case above.
+		 */
+		r = -EINVAL;
+		if (vcpu->arch.last_vmentry_cpu != -1)
+			goto out;
+
 		r = -EFAULT;
 		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
 			goto out;
On 12/13/21 15:19, Sasha Levin wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> [ Upstream commit feb627e8d6f69c9a319fe279710959efb3eba873 ]
>
> Commit 63f5a1909f9e ("KVM: x86: Alert userspace that KVM_SET_CPUID{,2}
> after KVM_RUN is broken") officially deprecated the KVM_SET_CPUID{,2}
> ioctls after the first successful KVM_RUN and promised to make this
> sequence forbidden in 5.16. It's time to fulfil the promise.
> [...]
NACK
the error says "starting with Linux 5.16" for a reason... :)
Paolo
From: Vitaly Kuznetsov <vkuznets@redhat.com>
[ Upstream commit 6c1186430a808f97e2052bd5d9eff12c5d5defb0 ]
hyperv_features's sole purpose is to test access to various Hyper-V MSRs and hypercalls with different CPUID data. As KVM_SET_CPUID2 after KVM_RUN is deprecated and soon-to-be forbidden, avoid it by re-creating the test VM for each sub-test.
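A toy model of the resulting structure (stand-in types, not the real selftest harness): each iteration gets a fresh VM, so CPUID is always configured before the first run and the deprecated sequence can never occur.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for a VM; models only the has-run state. */
struct toy_vm { bool has_run; };

static int toy_set_cpuid(struct toy_vm *vm)
{
	return vm->has_run ? -EINVAL : 0;	/* mirrors the 5.16 rule */
}

int main(void)
{
	for (int stage = 0; stage < 3; stage++) {
		struct toy_vm vm = { .has_run = false };	/* fresh VM */

		if (toy_set_cpuid(&vm))		/* always before the first run */
			return 1;
		vm.has_run = true;		/* KVM_RUN for this sub-test */
		printf("stage %d: CPUID set before first run, ok\n", stage);
	}
	return 0;
}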
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211122175818.608220-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../selftests/kvm/x86_64/hyperv_features.c | 140 +++++++++---------
 1 file changed, 71 insertions(+), 69 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_features.c b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
index 91d88aaa98992..672915ce73d8f 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_features.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
@@ -165,10 +165,10 @@ static void hv_set_cpuid(struct kvm_vm *vm, struct kvm_cpuid2 *cpuid,
 	vcpu_set_cpuid(vm, VCPU_ID, cpuid);
 }
 
-static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
-				   struct kvm_cpuid2 *best)
+static void guest_test_msrs_access(void)
 {
 	struct kvm_run *run;
+	struct kvm_vm *vm;
 	struct ucall uc;
 	int stage = 0, r;
 	struct kvm_cpuid_entry2 feat = {
@@ -180,11 +180,34 @@ static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
 	struct kvm_cpuid_entry2 dbg = {
 		.function = HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES
 	};
-	struct kvm_enable_cap cap = {0};
-
-	run = vcpu_state(vm, VCPU_ID);
+	struct kvm_cpuid2 *best;
+	vm_vaddr_t msr_gva;
+	struct kvm_enable_cap cap = {
+		.cap = KVM_CAP_HYPERV_ENFORCE_CPUID,
+		.args = {1}
+	};
+	struct msr_data *msr;
 
 	while (true) {
+		vm = vm_create_default(VCPU_ID, 0, guest_msr);
+
+		msr_gva = vm_vaddr_alloc_page(vm);
+		memset(addr_gva2hva(vm, msr_gva), 0x0, getpagesize());
+		msr = addr_gva2hva(vm, msr_gva);
+
+		vcpu_args_set(vm, VCPU_ID, 1, msr_gva);
+		vcpu_enable_cap(vm, VCPU_ID, &cap);
+
+		vcpu_set_hv_cpuid(vm, VCPU_ID);
+
+		best = kvm_get_supported_hv_cpuid();
+
+		vm_init_descriptor_tables(vm);
+		vcpu_init_descriptor_tables(vm, VCPU_ID);
+		vm_install_exception_handler(vm, GP_VECTOR, guest_gp_handler);
+
+		run = vcpu_state(vm, VCPU_ID);
+
 		switch (stage) {
 		case 0:
 			/*
@@ -315,6 +338,7 @@ static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
 			 * capability enabled and guest visible CPUID bit unset.
 			 */
 			cap.cap = KVM_CAP_HYPERV_SYNIC2;
+			cap.args[0] = 0;
 			vcpu_enable_cap(vm, VCPU_ID, &cap);
 			break;
 		case 22:
@@ -461,9 +485,9 @@ static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
 
 		switch (get_ucall(vm, VCPU_ID, &uc)) {
 		case UCALL_SYNC:
-			TEST_ASSERT(uc.args[1] == stage,
-				    "Unexpected stage: %ld (%d expected)\n",
-				    uc.args[1], stage);
+			TEST_ASSERT(uc.args[1] == 0,
+				    "Unexpected stage: %ld (0 expected)\n",
+				    uc.args[1]);
 			break;
 		case UCALL_ABORT:
 			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
@@ -474,13 +498,14 @@ static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
 		}
 
 		stage++;
+		kvm_vm_free(vm);
 	}
 }
 
-static void guest_test_hcalls_access(struct kvm_vm *vm, struct hcall_data *hcall,
-				     void *input, void *output, struct kvm_cpuid2 *best)
+static void guest_test_hcalls_access(void)
 {
 	struct kvm_run *run;
+	struct kvm_vm *vm;
 	struct ucall uc;
 	int stage = 0, r;
 	struct kvm_cpuid_entry2 feat = {
@@ -493,10 +518,38 @@ static void guest_test_hcalls_access(struct kvm_vm *vm, struct hcall_data *hcall,
 	struct kvm_cpuid_entry2 dbg = {
 		.function = HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES
 	};
-
-	run = vcpu_state(vm, VCPU_ID);
+	struct kvm_enable_cap cap = {
+		.cap = KVM_CAP_HYPERV_ENFORCE_CPUID,
+		.args = {1}
+	};
+	vm_vaddr_t hcall_page, hcall_params;
+	struct hcall_data *hcall;
+	struct kvm_cpuid2 *best;
 
 	while (true) {
+		vm = vm_create_default(VCPU_ID, 0, guest_hcall);
+
+		vm_init_descriptor_tables(vm);
+		vcpu_init_descriptor_tables(vm, VCPU_ID);
+		vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
+
+		/* Hypercall input/output */
+		hcall_page = vm_vaddr_alloc_pages(vm, 2);
+		hcall = addr_gva2hva(vm, hcall_page);
+		memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
+
+		hcall_params = vm_vaddr_alloc_page(vm);
+		memset(addr_gva2hva(vm, hcall_params), 0x0, getpagesize());
+
+		vcpu_args_set(vm, VCPU_ID, 2, addr_gva2gpa(vm, hcall_page), hcall_params);
+		vcpu_enable_cap(vm, VCPU_ID, &cap);
+
+		vcpu_set_hv_cpuid(vm, VCPU_ID);
+
+		best = kvm_get_supported_hv_cpuid();
+
+		run = vcpu_state(vm, VCPU_ID);
+
 		switch (stage) {
 		case 0:
 			hcall->control = 0xdeadbeef;
@@ -606,9 +659,9 @@ static void guest_test_hcalls_access(struct kvm_vm *vm, struct hcall_data *hcall,
 
 		switch (get_ucall(vm, VCPU_ID, &uc)) {
 		case UCALL_SYNC:
-			TEST_ASSERT(uc.args[1] == stage,
-				    "Unexpected stage: %ld (%d expected)\n",
-				    uc.args[1], stage);
+			TEST_ASSERT(uc.args[1] == 0,
+				    "Unexpected stage: %ld (0 expected)\n",
+				    uc.args[1]);
 			break;
 		case UCALL_ABORT:
 			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
@@ -619,66 +672,15 @@ static void guest_test_hcalls_access(struct kvm_vm *vm, struct hcall_data *hcall,
 		}
 
 		stage++;
+		kvm_vm_free(vm);
 	}
 }
 
 int main(void)
 {
-	struct kvm_cpuid2 *best;
-	struct kvm_vm *vm;
-	vm_vaddr_t msr_gva, hcall_page, hcall_params;
-	struct kvm_enable_cap cap = {
-		.cap = KVM_CAP_HYPERV_ENFORCE_CPUID,
-		.args = {1}
-	};
-
-	/* Test MSRs */
-	vm = vm_create_default(VCPU_ID, 0, guest_msr);
-
-	msr_gva = vm_vaddr_alloc_page(vm);
-	memset(addr_gva2hva(vm, msr_gva), 0x0, getpagesize());
-	vcpu_args_set(vm, VCPU_ID, 1, msr_gva);
-	vcpu_enable_cap(vm, VCPU_ID, &cap);
-
-	vcpu_set_hv_cpuid(vm, VCPU_ID);
-
-	best = kvm_get_supported_hv_cpuid();
-
-	vm_init_descriptor_tables(vm);
-	vcpu_init_descriptor_tables(vm, VCPU_ID);
-	vm_install_exception_handler(vm, GP_VECTOR, guest_gp_handler);
-
 	pr_info("Testing access to Hyper-V specific MSRs\n");
-	guest_test_msrs_access(vm, addr_gva2hva(vm, msr_gva),
-			       best);
-	kvm_vm_free(vm);
-
-	/* Test hypercalls */
-	vm = vm_create_default(VCPU_ID, 0, guest_hcall);
-
-	vm_init_descriptor_tables(vm);
-	vcpu_init_descriptor_tables(vm, VCPU_ID);
-	vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
-
-	/* Hypercall input/output */
-	hcall_page = vm_vaddr_alloc_pages(vm, 2);
-	memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
-
-	hcall_params = vm_vaddr_alloc_page(vm);
-	memset(addr_gva2hva(vm, hcall_params), 0x0, getpagesize());
-
-	vcpu_args_set(vm, VCPU_ID, 2, addr_gva2gpa(vm, hcall_page), hcall_params);
-	vcpu_enable_cap(vm, VCPU_ID, &cap);
-
-	vcpu_set_hv_cpuid(vm, VCPU_ID);
-
-	best = kvm_get_supported_hv_cpuid();
+	guest_test_msrs_access();
 
 	pr_info("Testing access to Hyper-V hypercalls\n");
-	guest_test_hcalls_access(vm, addr_gva2hva(vm, hcall_params),
-				 addr_gva2hva(vm, hcall_page),
-				 addr_gva2hva(vm, hcall_page) + getpagesize(),
-				 best);
-
-	kvm_vm_free(vm);
+	guest_test_hcalls_access();
 }
On 12/13/21 15:19, Sasha Levin wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> [ Upstream commit 6c1186430a808f97e2052bd5d9eff12c5d5defb0 ]
>
> hyperv_features's sole purpose is to test access to various Hyper-V
> MSRs and hypercalls with different CPUID data. As KVM_SET_CPUID2 after
> KVM_RUN is deprecated and soon-to-be forbidden, avoid it by re-creating
> the test VM for each sub-test.
> [...]
NACK
From: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 5f25e71e311478f9bb0a8ef49e7d8b95316491d7 ]
This is not an unrecoverable situation. Users of kvm_read_guest_offset_cached and kvm_write_guest_offset_cached must expect the read/write to fail, and therefore it is possible to just return early with an error value.
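A sketch of the caller contract this relies on (kernel context; the wrapper below is hypothetical, while the two kvm_*_guest_offset_cached() helpers are the real ones): both helpers already return an error code, so an out-of-range access can simply propagate -EINVAL instead of killing the host.

#include <linux/kvm_host.h>

/* Hypothetical caller: read-modify-write a u64 through a gfn_to_hva_cache. */
static int demo_bump_guest_field(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
				 unsigned int offset)
{
	u64 val;
	int r;

	r = kvm_read_guest_offset_cached(kvm, ghc, &val, offset, sizeof(val));
	if (r)
		return r;	/* now -EINVAL on a bad length, not a BUG() */

	val++;
	return kvm_write_guest_offset_cached(kvm, ghc, &val, offset, sizeof(val));
}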
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 virt/kvm/kvm_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ce1847bc898b2..c6bfd4e15d28a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3001,7 +3001,8 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 	int r;
 	gpa_t gpa = ghc->gpa + offset;
 
-	BUG_ON(len + offset > ghc->len);
+	if (WARN_ON_ONCE(len + offset > ghc->len))
+		return -EINVAL;
 
 	if (slots->generation != ghc->generation) {
 		if (__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len))
@@ -3038,7 +3039,8 @@ int kvm_read_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 	int r;
 	gpa_t gpa = ghc->gpa + offset;
 
-	BUG_ON(len + offset > ghc->len);
+	if (WARN_ON_ONCE(len + offset > ghc->len))
+		return -EINVAL;
 
 	if (slots->generation != ghc->generation) {
 		if (__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len))
On 12/13/21 15:19, Sasha Levin wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> [ Upstream commit 5f25e71e311478f9bb0a8ef49e7d8b95316491d7 ]
>
> This is not an unrecoverable situation. Users of
> kvm_read_guest_offset_cached and kvm_write_guest_offset_cached must
> expect the read/write to fail, and therefore it is possible to just
> return early with an error value.
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
From: Juergen Gross <jgross@suse.com>
[ Upstream commit 9dba4d24cbb5524dd39ab1e08886373b17f07ff2 ]
Commit f52447261bc8c2 ("KVM: irq ack notification") introduced ack_notifier() callbacks in struct kvm_pic and struct kvm_ioapic without using them anywhere. Remove those callbacks again.
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-Id: <20211117071617.19504-1-jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/ioapic.h | 1 -
 arch/x86/kvm/irq.h    | 1 -
 2 files changed, 2 deletions(-)
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 27e61ff3ac3e8..f1b2b2a6ff4db 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -81,7 +81,6 @@ struct kvm_ioapic {
 	unsigned long irq_states[IOAPIC_NUM_PINS];
 	struct kvm_io_device dev;
 	struct kvm *kvm;
-	void (*ack_notifier)(void *opaque, int irq);
 	spinlock_t lock;
 	struct rtc_status rtc_status;
 	struct delayed_work eoi_inject;
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 650642b18d151..c2d7cfe82d004 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -56,7 +56,6 @@ struct kvm_pic {
 	struct kvm_io_device dev_master;
 	struct kvm_io_device dev_slave;
 	struct kvm_io_device dev_elcr;
-	void (*ack_notifier)(void *opaque, int irq);
 	unsigned long irq_states[PIC_NUM_PINS];
 };
On 12/13/21 15:19, Sasha Levin wrote:
> From: Juergen Gross <jgross@suse.com>
>
> [ Upstream commit 9dba4d24cbb5524dd39ab1e08886373b17f07ff2 ]
>
> Commit f52447261bc8c2 ("KVM: irq ack notification") introduced
> ack_notifier() callbacks in struct kvm_pic and struct kvm_ioapic
> without using them anywhere. Remove those callbacks again.
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Let's save 24 bytes per VM. :)
Paolo