From: Frederic Weisbecker <frederic@kernel.org>
[ Upstream commit e7f2be115f0746b969c0df14c0d182f65f005ca5 ]
getrusage(RUSAGE_THREAD) with nohz_full may return shorter utime/stime than the actual time.
task_cputime_adjusted() snapshots utime and stime and then adjusts their sum to match the scheduler-maintained cputime.sum_exec_runtime. Unfortunately, under nohz_full, sum_exec_runtime is only updated once per second in the worst case, causing a discrepancy against utime and stime, which can be updated at any time by the reader using vtime.
To fix this, update cputime.sum_exec_runtime when the cputime snapshot reports the task as actually running while the tick is disabled. The related overhead is then confined to the situations that actually need it.
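For illustration only (not part of the patch): a minimal userspace sketch of the symptom, assuming the thread is pinned to a nohz_full CPU. It accumulates thread CPU time and compares CLOCK_THREAD_CPUTIME_ID, which is based on the scheduler clock, against getrusage(RUSAGE_THREAD), which goes through task_cputime_adjusted(); before this fix the rusage figures could lag by up to a second. The 2-second threshold is arbitrary.

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <sys/resource.h>

int main(void)
{
	struct timespec ts;
	struct rusage ru;

	/* Spin until 2s of thread CPU time have accumulated. */
	do {
		clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
	} while (ts.tv_sec < 2);

	getrusage(RUSAGE_THREAD, &ru);
	/* Before the fix, utime + stime may fall visibly short of 2s. */
	printf("cpu clock: %ld.%09ld, utime: %ld.%06ld, stime: %ld.%06ld\n",
	       (long)ts.tv_sec, ts.tv_nsec,
	       (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
	       (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
	return 0;
}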
Reported-by: Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20211026141055.57358-3-frederic@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/sched/cputime.h |  5 +++--
 kernel/sched/cputime.c        | 12 +++++++++---
 2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/include/linux/sched/cputime.h b/include/linux/sched/cputime.h
index 6c9f19a33865a..ce3c58286062c 100644
--- a/include/linux/sched/cputime.h
+++ b/include/linux/sched/cputime.h
@@ -18,15 +18,16 @@
 #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-extern void task_cputime(struct task_struct *t,
+extern bool task_cputime(struct task_struct *t,
 			 u64 *utime, u64 *stime);
 extern u64 task_gtime(struct task_struct *t);
 #else
-static inline void task_cputime(struct task_struct *t,
+static inline bool task_cputime(struct task_struct *t,
 				u64 *utime, u64 *stime)
 {
 	*utime = t->utime;
 	*stime = t->stime;
+	return false;
 }
 
 static inline u64 task_gtime(struct task_struct *t)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 872e481d5098c..9392aea1804e5 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -615,7 +615,8 @@ void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 		.sum_exec_runtime = p->se.sum_exec_runtime,
 	};
 
-	task_cputime(p, &cputime.utime, &cputime.stime);
+	if (task_cputime(p, &cputime.utime, &cputime.stime))
+		cputime.sum_exec_runtime = task_sched_runtime(p);
 	cputime_adjust(&cputime, &p->prev_cputime, ut, st);
 }
 EXPORT_SYMBOL_GPL(task_cputime_adjusted);
@@ -828,19 +829,21 @@ u64 task_gtime(struct task_struct *t)
  * add up the pending nohz execution time since the last
  * cputime snapshot.
  */
-void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
+bool task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 {
 	struct vtime *vtime = &t->vtime;
 	unsigned int seq;
 	u64 delta;
+	int ret;
 
 	if (!vtime_accounting_enabled()) {
 		*utime = t->utime;
 		*stime = t->stime;
-		return;
+		return false;
 	}
 
 	do {
+		ret = false;
 		seq = read_seqcount_begin(&vtime->seqcount);
 
 		*utime = t->utime;
@@ -850,6 +853,7 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 		if (vtime->state < VTIME_SYS)
 			continue;
 
+		ret = true;
 		delta = vtime_delta(vtime);
 
 		/*
@@ -861,6 +865,8 @@ void task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 		else
 			*utime += vtime->utime + delta;
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
+
+	return ret;
 }
 
 static int vtime_state_fetch(struct vtime *vtime, int cpu)
From: Frederic Weisbecker <frederic@kernel.org>
[ Upstream commit 53e87e3cdc155f20c3417b689df8d2ac88d79576 ]
When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU is guaranteed to stay online and to never stop its tick.
Meanwhile, in some rare cases, the dedicated timekeeper may be running with interrupts disabled for a while, such as in stop_machine.
If jiffies stops being updated, a nohz_full CPU may end up endlessly programming the next tick in the past, taking the last jiffies update monotonic timestamp as a stale base, resulting in a tick storm.
Here is a scenario where it matters:
0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU.
1) A stop machine callback is queued to execute somewhere.
2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1 can still take IRQs.
3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward.
4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking last_jiffies_update as a base. But last_jiffies_update hasn't been updated for 2 jiffies since the timekeeper has interrupts disabled.
5) clockevents_program_event(), which relies on ktime_get(), observes that the expiration is in the past and therefore programs the min delta event on the clock.
6) The tick fires immediately, goto 3)
7) Tick storm: the nohz_full CPU is drowned in ticks and takes ages to reach MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation.
Solve this by unconditionally updating jiffies if the value is stale on nohz_full IRQ entry. IRQs and other disturbances are expected to be rare enough on nohz_full for the unconditional call to ktime_get() not to actually matter.
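To make steps 3)-6) concrete, here is a toy model of the arithmetic (plain userspace C with illustrative constants, not kernel code): with last_jiffies_update frozen, every expiry computed "one jiffy forward" from it is already behind ktime_get(), so clockevents falls back to the minimum delta and the tick keeps firing.

#include <stdio.h>

#define TICK_NSEC 1000000L	/* one jiffy at HZ=1000, illustrative */

int main(void)
{
	long now = 5 * TICK_NSEC;		  /* ktime_get() */
	long last_jiffies_update = 3 * TICK_NSEC; /* stale: timekeeper has IRQs off */
	int immediate_fires = 0;

	/* Each IRQ exit reprograms the tick one jiffy past the stale base. */
	for (int irq = 0; irq < 10; irq++) {
		long expires = last_jiffies_update + TICK_NSEC;

		if (expires <= now)	/* already in the past: min-delta fire */
			immediate_fires++;
	}

	printf("%d of 10 reprograms fired immediately -> tick storm\n",
	       immediate_fires);
	return 0;
}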
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20211026141055.57358-2-frederic@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/softirq.c         | 3 ++-
 kernel/time/tick-sched.c | 7 +++++++
 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 322b65d456767..41f470929e991 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -595,7 +595,8 @@ void irq_enter_rcu(void)
 {
 	__irq_enter_raw();
 
-	if (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET))
+	if (tick_nohz_full_cpu(smp_processor_id()) ||
+	    (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET)))
 		tick_irq_enter();
 
 	account_hardirq_enter(current);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6bffe5af8cb11..17a283ce2b20f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1375,6 +1375,13 @@ static inline void tick_nohz_irq_enter(void)
 	now = ktime_get();
 	if (ts->idle_active)
 		tick_nohz_stop_idle(ts, now);
+	/*
+	 * If all CPUs are idle. We may need to update a stale jiffies value.
+	 * Note nohz_full is a special case: a timekeeper is guaranteed to stay
+	 * alive but it might be busy looping with interrupts disabled in some
+	 * rare case (typically stop machine). So we must make sure we have a
+	 * last resort.
+	 */
 	if (ts->tick_stopped)
 		tick_nohz_update_jiffies(now);
 }
From: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit e90e51d5f01d2baae5dcce280866bbb96816e978 ]
There is nothing to synchronize if APICv is disabled, since neither other vCPUs nor assigned devices can set PIR.ON.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/vmx/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index dacdf2395f01a..4e212f04268bb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7776,10 +7776,10 @@ static __init int hardware_setup(void)
 		ple_window_shrink = 0;
 	}
 
-	if (!cpu_has_vmx_apicv()) {
+	if (!cpu_has_vmx_apicv())
 		enable_apicv = 0;
+	if (!enable_apicv)
 		vmx_x86_ops.sync_pir_to_irr = NULL;
-	}
 
 	if (cpu_has_vmx_tsc_scaling()) {
 		kvm_has_tsc_control = true;
On 12/13/21 15:19, Sasha Levin wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> [ Upstream commit e90e51d5f01d2baae5dcce280866bbb96816e978 ]
>
> There is nothing to synchronize if APICv is disabled, since neither
> other vCPUs nor assigned devices can set PIR.ON.
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
From: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 10a37929efeb4c51a0069afdd537c4fa3831f6e5 ]
Taking the lock is useless since there are no other references, and there are already accesses (e.g. to sev->enc_context_owner) that do not take it. So get rid of it.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-12-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/svm/sev.c | 4 ----
 1 file changed, 4 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 134c4ea5e6ad8..ae8092f0d401e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1842,8 +1842,6 @@ void sev_vm_destroy(struct kvm *kvm)
 		return;
 	}
 
-	mutex_lock(&kvm->lock);
-
 	/*
 	 * Ensure that all guest tagged cache entries are flushed before
 	 * releasing the pages back to the system for use. CLFLUSH will
@@ -1863,8 +1861,6 @@ void sev_vm_destroy(struct kvm *kvm)
 		}
 	}
 
-	mutex_unlock(&kvm->lock);
-
 	sev_unbind_asid(kvm, sev->handle);
 	sev_asid_free(sev);
 }
On 12/13/21 15:19, Sasha Levin wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> [ Upstream commit 10a37929efeb4c51a0069afdd537c4fa3831f6e5 ]
>
> Taking the lock is useless since there are no other references, and
> there are already accesses (e.g. to sev->enc_context_owner) that do
> not take it. So get rid of it.
> [...]
This is cosmetic only, so
NACK
Paolo
From: Vitaly Kuznetsov <vkuznets@redhat.com>
[ Upstream commit 908fa88e420f30dde6d80f092795a18ec72ca6d3 ]
With the elevated 'KVM_CAP_MAX_VCPUS' value, the kvm_create_max_vcpus test may hit RLIMIT_NOFILE limits:
 # ./kvm_create_max_vcpus
 KVM_CAP_MAX_VCPU_ID: 4096
 KVM_CAP_MAX_VCPUS: 1024
 Testing creating 1024 vCPUs, with IDs 0...1023.
 /dev/kvm not available (errno: 24), skipping test
Adjust RLIMIT_NOFILE limits to make sure KVM_CAP_MAX_VCPUS fds can be opened. Note, raising the hard limit ('rlim_max') requires the CAP_SYS_RESOURCE capability, which is generally not needed to run kvm selftests (but without raising the limit the test is doomed to fail anyway).
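A standalone demonstration of that note (assuming a finite hard limit; not part of the patch): raising rlim_cur up to rlim_max needs no privilege, while raising rlim_max itself fails with EPERM for a process lacking CAP_SYS_RESOURCE.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

int main(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl))
		return 1;

	/* Raising the soft limit up to the hard limit is unprivileged. */
	rl.rlim_cur = rl.rlim_max;
	printf("soft -> hard: %s\n",
	       setrlimit(RLIMIT_NOFILE, &rl) ? strerror(errno) : "ok");

	/* Raising the hard limit requires CAP_SYS_RESOURCE. */
	rl.rlim_max++;
	rl.rlim_cur = rl.rlim_max;
	printf("hard + 1:     %s\n",
	       setrlimit(RLIMIT_NOFILE, &rl) ? strerror(errno)
					     : "ok (CAP_SYS_RESOURCE)");
	return 0;
}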
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211123135953.667434-1-vkuznets@redhat.com>
[Skip the test if the hard limit can be raised. - Paolo]
Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../selftests/kvm/kvm_create_max_vcpus.c | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)
diff --git a/tools/testing/selftests/kvm/kvm_create_max_vcpus.c b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
index 0299cd81b8ba2..aa3795cd7bd3d 100644
--- a/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
+++ b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
@@ -12,6 +12,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
+#include <sys/resource.h>
 
 #include "test_util.h"
 
@@ -40,10 +41,39 @@ int main(int argc, char *argv[])
 {
 	int kvm_max_vcpu_id = kvm_check_cap(KVM_CAP_MAX_VCPU_ID);
 	int kvm_max_vcpus = kvm_check_cap(KVM_CAP_MAX_VCPUS);
+	/*
+	 * Number of file descriptors reqired, KVM_CAP_MAX_VCPUS for vCPU fds +
+	 * an arbitrary number for everything else.
+	 */
+	int nr_fds_wanted = kvm_max_vcpus + 100;
+	struct rlimit rl;
 
 	pr_info("KVM_CAP_MAX_VCPU_ID: %d\n", kvm_max_vcpu_id);
 	pr_info("KVM_CAP_MAX_VCPUS: %d\n", kvm_max_vcpus);
 
+	/*
+	 * Check that we're allowed to open nr_fds_wanted file descriptors and
+	 * try raising the limits if needed.
+	 */
+	TEST_ASSERT(!getrlimit(RLIMIT_NOFILE, &rl), "getrlimit() failed!");
+
+	if (rl.rlim_cur < nr_fds_wanted) {
+		rl.rlim_cur = nr_fds_wanted;
+		if (rl.rlim_max < nr_fds_wanted) {
+			int old_rlim_max = rl.rlim_max;
+			rl.rlim_max = nr_fds_wanted;
+
+			int r = setrlimit(RLIMIT_NOFILE, &rl);
+			if (r < 0) {
+				printf("RLIMIT_NOFILE hard limit is too low (%d, wanted %d)\n",
+				       old_rlim_max, nr_fds_wanted);
+				exit(KSFT_SKIP);
+			}
+		} else {
+			TEST_ASSERT(!setrlimit(RLIMIT_NOFILE, &rl), "setrlimit() failed!");
+		}
+	}
+
 	/*
 	 * Upstream KVM prior to 4.8 does not support KVM_CAP_MAX_VCPU_ID.
 	 * Userspace is supposed to use KVM_CAP_MAX_VCPUS as the maximum ID
On 12/13/21 15:19, Sasha Levin wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> [ Upstream commit 908fa88e420f30dde6d80f092795a18ec72ca6d3 ]
>
> With the elevated 'KVM_CAP_MAX_VCPUS' value, the kvm_create_max_vcpus
> test may hit RLIMIT_NOFILE limits:
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
From: Vitaly Kuznetsov <vkuznets@redhat.com>
[ Upstream commit feb627e8d6f69c9a319fe279710959efb3eba873 ]
Commit 63f5a1909f9e ("KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken") officially deprecated the KVM_SET_CPUID{,2} ioctls after the first successful KVM_RUN and promised to make this sequence forbidden in 5.16. It's time to fulfil the promise.
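A hedged sketch of the now-rejected sequence (error handling and real vCPU setup are omitted, so the first KVM_RUN enters the guest with bogus state and exits immediately; that single entry is what arms the check):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);
	int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
	struct kvm_cpuid2 *cpuid = calloc(1, sizeof(*cpuid)); /* nent = 0 */

	if (kvm < 0 || vm < 0 || vcpu < 0 || !cpuid)
		return 1;

	ioctl(vcpu, KVM_RUN, 0);	/* first (failing) guest entry */

	/* From 5.16 on this is expected to fail with EINVAL. */
	if (ioctl(vcpu, KVM_SET_CPUID2, cpuid) < 0)
		perror("KVM_SET_CPUID2 after KVM_RUN");
	return 0;
}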
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211122175818.608220-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/mmu/mmu.c | 28 +++++++++++-----------------
 arch/x86/kvm/x86.c     | 19 +++++++++++++++++++
 2 files changed, 30 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0a88cb4f731f4..31392e492ce5e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5022,6 +5022,14 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	/*
 	 * Invalidate all MMU roles to force them to reinitialize as CPUID
 	 * information is factored into reserved bit calculations.
+	 *
+	 * Correctly handling multiple vCPU models with respect to paging and
+	 * physical address properties) in a single VM would require tracking
+	 * all relevant CPUID information in kvm_mmu_page_role. That is very
+	 * undesirable as it would increase the memory requirements for
+	 * gfn_track (see struct kvm_mmu_page_role comments). For now that
+	 * problem is swept under the rug; KVM's CPUID API is horrific and
+	 * it's all but impossible to solve it without introducing a new API.
 	 */
 	vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
 	vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
@@ -5029,24 +5037,10 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	kvm_mmu_reset_context(vcpu);
 
 	/*
-	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
-	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
-	 * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
-	 * faults due to reusing SPs/SPTEs. Alert userspace, but otherwise
-	 * sweep the problem under the rug.
-	 *
-	 * KVM's horrific CPUID ABI makes the problem all but impossible to
-	 * solve, as correctly handling multiple vCPU models (with respect to
-	 * paging and physical address properties) in a single VM would require
-	 * tracking all relevant CPUID information in kvm_mmu_page_role. That
-	 * is very undesirable as it would double the memory requirements for
-	 * gfn_track (see struct kvm_mmu_page_role comments), and in practice
-	 * no sane VMM mucks with the core vCPU model on the fly.
+	 * Changing guest CPUID after KVM_RUN is forbidden, see the comment in
+	 * kvm_arch_vcpu_ioctl().
 	 */
-	if (vcpu->arch.last_vmentry_cpu != -1) {
-		pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} after KVM_RUN may cause guest instability\n");
-		pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} will fail after KVM_RUN starting with Linux 5.16\n");
-	}
+	KVM_BUG_ON(vcpu->arch.last_vmentry_cpu != -1, vcpu->kvm);
 }
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b7aa845f7beee..2a584070833f9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5088,6 +5088,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		struct kvm_cpuid __user *cpuid_arg = argp;
 		struct kvm_cpuid cpuid;
 
+		/*
+		 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
+		 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
+		 * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
+		 * faults due to reusing SPs/SPTEs. In practice no sane VMM mucks with
+		 * the core vCPU model on the fly, so fail.
+		 */
+		r = -EINVAL;
+		if (vcpu->arch.last_vmentry_cpu != -1)
+			goto out;
+
 		r = -EFAULT;
 		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
 			goto out;
@@ -5098,6 +5109,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		struct kvm_cpuid2 __user *cpuid_arg = argp;
 		struct kvm_cpuid2 cpuid;
 
+		/*
+		 * KVM_SET_CPUID{,2} after KVM_RUN is forbidded, see the comment in
+		 * KVM_SET_CPUID case above.
+		 */
+		r = -EINVAL;
+		if (vcpu->arch.last_vmentry_cpu != -1)
+			goto out;
+
 		r = -EFAULT;
 		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
 			goto out;
On 12/13/21 15:19, Sasha Levin wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> [ Upstream commit feb627e8d6f69c9a319fe279710959efb3eba873 ]
>
> Commit 63f5a1909f9e ("KVM: x86: Alert userspace that KVM_SET_CPUID{,2}
> after KVM_RUN is broken") officially deprecated the KVM_SET_CPUID{,2}
> ioctls after the first successful KVM_RUN and promised to make this
> sequence forbidden in 5.16. It's time to fulfil the promise.
> [...]
NACK
the error says "starting with Linux 5.16" for a reason... :)
Paolo
From: Vitaly Kuznetsov <vkuznets@redhat.com>
[ Upstream commit 6c1186430a808f97e2052bd5d9eff12c5d5defb0 ]
hyperv_features's sole purpose is to test access to various Hyper-V MSRs and hypercalls with different CPUID data. As KVM_SET_CPUID2 after KVM_RUN is deprecated and soon-to-be forbidden, avoid it by re-creating the test VM for each sub-test.
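A toy model of the resulting structure (stand-in types, not the real selftest harness): each iteration gets a fresh VM, so CPUID is always configured before the first run and the deprecated sequence can never occur.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for a VM; models only the has-run state. */
struct toy_vm { bool has_run; };

static int toy_set_cpuid(struct toy_vm *vm)
{
	return vm->has_run ? -EINVAL : 0;	/* mirrors the 5.16 rule */
}

int main(void)
{
	for (int stage = 0; stage < 3; stage++) {
		struct toy_vm vm = { .has_run = false };	/* fresh VM */

		if (toy_set_cpuid(&vm))		/* always before the first run */
			return 1;
		vm.has_run = true;		/* KVM_RUN for this sub-test */
		printf("stage %d: CPUID set before first run, ok\n", stage);
	}
	return 0;
}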
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211122175818.608220-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../selftests/kvm/x86_64/hyperv_features.c | 140 +++++++++---------
 1 file changed, 71 insertions(+), 69 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_features.c b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
index 91d88aaa98992..672915ce73d8f 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_features.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
@@ -165,10 +165,10 @@ static void hv_set_cpuid(struct kvm_vm *vm, struct kvm_cpuid2 *cpuid,
 	vcpu_set_cpuid(vm, VCPU_ID, cpuid);
 }
 
-static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
-				   struct kvm_cpuid2 *best)
+static void guest_test_msrs_access(void)
 {
 	struct kvm_run *run;
+	struct kvm_vm *vm;
 	struct ucall uc;
 	int stage = 0, r;
 	struct kvm_cpuid_entry2 feat = {
@@ -180,11 +180,34 @@ static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
 	struct kvm_cpuid_entry2 dbg = {
 		.function = HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES
 	};
-	struct kvm_enable_cap cap = {0};
-
-	run = vcpu_state(vm, VCPU_ID);
+	struct kvm_cpuid2 *best;
+	vm_vaddr_t msr_gva;
+	struct kvm_enable_cap cap = {
+		.cap = KVM_CAP_HYPERV_ENFORCE_CPUID,
+		.args = {1}
+	};
+	struct msr_data *msr;
 
 	while (true) {
+		vm = vm_create_default(VCPU_ID, 0, guest_msr);
+
+		msr_gva = vm_vaddr_alloc_page(vm);
+		memset(addr_gva2hva(vm, msr_gva), 0x0, getpagesize());
+		msr = addr_gva2hva(vm, msr_gva);
+
+		vcpu_args_set(vm, VCPU_ID, 1, msr_gva);
+		vcpu_enable_cap(vm, VCPU_ID, &cap);
+
+		vcpu_set_hv_cpuid(vm, VCPU_ID);
+
+		best = kvm_get_supported_hv_cpuid();
+
+		vm_init_descriptor_tables(vm);
+		vcpu_init_descriptor_tables(vm, VCPU_ID);
+		vm_install_exception_handler(vm, GP_VECTOR, guest_gp_handler);
+
+		run = vcpu_state(vm, VCPU_ID);
+
 		switch (stage) {
 		case 0:
 			/*
@@ -315,6 +338,7 @@ static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
 			 * capability enabled and guest visible CPUID bit unset.
 			 */
 			cap.cap = KVM_CAP_HYPERV_SYNIC2;
+			cap.args[0] = 0;
 			vcpu_enable_cap(vm, VCPU_ID, &cap);
 			break;
 		case 22:
@@ -461,9 +485,9 @@ static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
 
 		switch (get_ucall(vm, VCPU_ID, &uc)) {
 		case UCALL_SYNC:
-			TEST_ASSERT(uc.args[1] == stage,
-				    "Unexpected stage: %ld (%d expected)\n",
-				    uc.args[1], stage);
+			TEST_ASSERT(uc.args[1] == 0,
+				    "Unexpected stage: %ld (0 expected)\n",
+				    uc.args[1]);
 			break;
 		case UCALL_ABORT:
 			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
@@ -474,13 +498,14 @@ static void guest_test_msrs_access(struct kvm_vm *vm, struct msr_data *msr,
 		}
 
 		stage++;
+		kvm_vm_free(vm);
 	}
 }
 
-static void guest_test_hcalls_access(struct kvm_vm *vm, struct hcall_data *hcall,
-				     void *input, void *output, struct kvm_cpuid2 *best)
+static void guest_test_hcalls_access(void)
 {
 	struct kvm_run *run;
+	struct kvm_vm *vm;
 	struct ucall uc;
 	int stage = 0, r;
 	struct kvm_cpuid_entry2 feat = {
@@ -493,10 +518,38 @@ static void guest_test_hcalls_access(struct kvm_vm *vm, struct hcall_data *hcall,
 	struct kvm_cpuid_entry2 dbg = {
 		.function = HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES
 	};
-
-	run = vcpu_state(vm, VCPU_ID);
+	struct kvm_enable_cap cap = {
+		.cap = KVM_CAP_HYPERV_ENFORCE_CPUID,
+		.args = {1}
+	};
+	vm_vaddr_t hcall_page, hcall_params;
+	struct hcall_data *hcall;
+	struct kvm_cpuid2 *best;
 
 	while (true) {
+		vm = vm_create_default(VCPU_ID, 0, guest_hcall);
+
+		vm_init_descriptor_tables(vm);
+		vcpu_init_descriptor_tables(vm, VCPU_ID);
+		vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
+
+		/* Hypercall input/output */
+		hcall_page = vm_vaddr_alloc_pages(vm, 2);
+		hcall = addr_gva2hva(vm, hcall_page);
+		memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
+
+		hcall_params = vm_vaddr_alloc_page(vm);
+		memset(addr_gva2hva(vm, hcall_params), 0x0, getpagesize());
+
+		vcpu_args_set(vm, VCPU_ID, 2, addr_gva2gpa(vm, hcall_page), hcall_params);
+		vcpu_enable_cap(vm, VCPU_ID, &cap);
+
+		vcpu_set_hv_cpuid(vm, VCPU_ID);
+
+		best = kvm_get_supported_hv_cpuid();
+
+		run = vcpu_state(vm, VCPU_ID);
+
 		switch (stage) {
 		case 0:
 			hcall->control = 0xdeadbeef;
@@ -606,9 +659,9 @@ static void guest_test_hcalls_access(struct kvm_vm *vm, struct hcall_data *hcall,
 
 		switch (get_ucall(vm, VCPU_ID, &uc)) {
 		case UCALL_SYNC:
-			TEST_ASSERT(uc.args[1] == stage,
-				    "Unexpected stage: %ld (%d expected)\n",
-				    uc.args[1], stage);
+			TEST_ASSERT(uc.args[1] == 0,
+				    "Unexpected stage: %ld (0 expected)\n",
+				    uc.args[1]);
 			break;
 		case UCALL_ABORT:
 			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
@@ -619,66 +672,15 @@ static void guest_test_hcalls_access(struct kvm_vm *vm, struct hcall_data *hcall,
 		}
 
 		stage++;
+		kvm_vm_free(vm);
 	}
 }
 
 int main(void)
 {
-	struct kvm_cpuid2 *best;
-	struct kvm_vm *vm;
-	vm_vaddr_t msr_gva, hcall_page, hcall_params;
-	struct kvm_enable_cap cap = {
-		.cap = KVM_CAP_HYPERV_ENFORCE_CPUID,
-		.args = {1}
-	};
-
-	/* Test MSRs */
-	vm = vm_create_default(VCPU_ID, 0, guest_msr);
-
-	msr_gva = vm_vaddr_alloc_page(vm);
-	memset(addr_gva2hva(vm, msr_gva), 0x0, getpagesize());
-	vcpu_args_set(vm, VCPU_ID, 1, msr_gva);
-	vcpu_enable_cap(vm, VCPU_ID, &cap);
-
-	vcpu_set_hv_cpuid(vm, VCPU_ID);
-
-	best = kvm_get_supported_hv_cpuid();
-
-	vm_init_descriptor_tables(vm);
-	vcpu_init_descriptor_tables(vm, VCPU_ID);
-	vm_install_exception_handler(vm, GP_VECTOR, guest_gp_handler);
-
 	pr_info("Testing access to Hyper-V specific MSRs\n");
-	guest_test_msrs_access(vm, addr_gva2hva(vm, msr_gva),
-			       best);
-	kvm_vm_free(vm);
-
-	/* Test hypercalls */
-	vm = vm_create_default(VCPU_ID, 0, guest_hcall);
-
-	vm_init_descriptor_tables(vm);
-	vcpu_init_descriptor_tables(vm, VCPU_ID);
-	vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
-
-	/* Hypercall input/output */
-	hcall_page = vm_vaddr_alloc_pages(vm, 2);
-	memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
-
-	hcall_params = vm_vaddr_alloc_page(vm);
-	memset(addr_gva2hva(vm, hcall_params), 0x0, getpagesize());
-
-	vcpu_args_set(vm, VCPU_ID, 2, addr_gva2gpa(vm, hcall_page), hcall_params);
-	vcpu_enable_cap(vm, VCPU_ID, &cap);
-
-	vcpu_set_hv_cpuid(vm, VCPU_ID);
-
-	best = kvm_get_supported_hv_cpuid();
+	guest_test_msrs_access();
 
 	pr_info("Testing access to Hyper-V hypercalls\n");
-	guest_test_hcalls_access(vm, addr_gva2hva(vm, hcall_params),
-				 addr_gva2hva(vm, hcall_page),
-				 addr_gva2hva(vm, hcall_page) + getpagesize(),
-				 best);
-
-	kvm_vm_free(vm);
+	guest_test_hcalls_access();
 }
On 12/13/21 15:19, Sasha Levin wrote:
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> [ Upstream commit 6c1186430a808f97e2052bd5d9eff12c5d5defb0 ]
>
> hyperv_features's sole purpose is to test access to various Hyper-V
> MSRs and hypercalls with different CPUID data. As KVM_SET_CPUID2 after
> KVM_RUN is deprecated and soon-to-be forbidden, avoid it by re-creating
> the test VM for each sub-test.
> [...]
NACK
From: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 5f25e71e311478f9bb0a8ef49e7d8b95316491d7 ]
This is not an unrecoverable situation. Users of kvm_read_guest_offset_cached and kvm_write_guest_offset_cached must expect the read/write to fail, and therefore it is possible to just return early with an error value.
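A sketch of the caller contract this relies on (kernel context; the wrapper below is hypothetical, while the two kvm_*_guest_offset_cached() helpers are the real ones): both helpers already return an error code, so an out-of-range access can simply propagate -EINVAL instead of killing the host.

#include <linux/kvm_host.h>

/* Hypothetical caller: read-modify-write a u64 through a gfn_to_hva_cache. */
static int demo_bump_guest_field(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
				 unsigned int offset)
{
	u64 val;
	int r;

	r = kvm_read_guest_offset_cached(kvm, ghc, &val, offset, sizeof(val));
	if (r)
		return r;	/* now -EINVAL on a bad length, not a BUG() */

	val++;
	return kvm_write_guest_offset_cached(kvm, ghc, &val, offset, sizeof(val));
}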
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 virt/kvm/kvm_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ce1847bc898b2..c6bfd4e15d28a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3001,7 +3001,8 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 	int r;
 	gpa_t gpa = ghc->gpa + offset;
 
-	BUG_ON(len + offset > ghc->len);
+	if (WARN_ON_ONCE(len + offset > ghc->len))
+		return -EINVAL;
 
 	if (slots->generation != ghc->generation) {
 		if (__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len))
@@ -3038,7 +3039,8 @@ int kvm_read_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 	int r;
 	gpa_t gpa = ghc->gpa + offset;
 
-	BUG_ON(len + offset > ghc->len);
+	if (WARN_ON_ONCE(len + offset > ghc->len))
+		return -EINVAL;
 
 	if (slots->generation != ghc->generation) {
 		if (__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len))
On 12/13/21 15:19, Sasha Levin wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> [ Upstream commit 5f25e71e311478f9bb0a8ef49e7d8b95316491d7 ]
>
> This is not an unrecoverable situation. Users of
> kvm_read_guest_offset_cached and kvm_write_guest_offset_cached must
> expect the read/write to fail, and therefore it is possible to just
> return early with an error value.
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
From: Juergen Gross <jgross@suse.com>
[ Upstream commit 9dba4d24cbb5524dd39ab1e08886373b17f07ff2 ]
Commit f52447261bc8c2 ("KVM: irq ack notification") introduced ack_notifier() callbacks in struct kvm_pic and struct kvm_ioapic without using them anywhere. Remove those callbacks again.
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-Id: <20211117071617.19504-1-jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/ioapic.h | 1 -
 arch/x86/kvm/irq.h    | 1 -
 2 files changed, 2 deletions(-)
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 27e61ff3ac3e8..f1b2b2a6ff4db 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -81,7 +81,6 @@ struct kvm_ioapic {
 	unsigned long irq_states[IOAPIC_NUM_PINS];
 	struct kvm_io_device dev;
 	struct kvm *kvm;
-	void (*ack_notifier)(void *opaque, int irq);
 	spinlock_t lock;
 	struct rtc_status rtc_status;
 	struct delayed_work eoi_inject;
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 650642b18d151..c2d7cfe82d004 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -56,7 +56,6 @@ struct kvm_pic {
 	struct kvm_io_device dev_master;
 	struct kvm_io_device dev_slave;
 	struct kvm_io_device dev_elcr;
-	void (*ack_notifier)(void *opaque, int irq);
 	unsigned long irq_states[PIC_NUM_PINS];
 };
On 12/13/21 15:19, Sasha Levin wrote:
> From: Juergen Gross <jgross@suse.com>
>
> [ Upstream commit 9dba4d24cbb5524dd39ab1e08886373b17f07ff2 ]
>
> Commit f52447261bc8c2 ("KVM: irq ack notification") introduced
> ack_notifier() callbacks in struct kvm_pic and struct kvm_ioapic
> without using them anywhere. Remove those callbacks again.
> [...]

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Let's save 24 bytes per VM. :)
Paolo