The set value of `fast_switch_enabled` indicates that fast_switch
callback is set. For some drivers such as amd_pstate and intel_pstate,
the adjust_perf callback is used but it still sets
`fast_switch_possible` flag. This is because this flag also decides
whether schedutil governor selects adjust_perf function for frequency
update. This condition in the schedutil governor forces the scaling
driver to set the `fast_switch_possible` flag.
Remove `fast_switch_enabled` check when schedutil decides to select
adjust_perf function for frequency update. Thus removing this drivers
are now free to remove `fast_switch_possible` flag if they don't use
fast_switch callback.
This issue becomes apparent when aperf/mperf overflow occurs. When this
happens, kernel disables frequency invariance calculation which causes
schedutil to fallback to sugov_update_single_freq which currently relies
on the fast_switch callback.
Normal flow:
sugov_update_single_perf
cpufreq_driver_adjust_perf
cpufreq_driver->adjust_perf
Error case flow:
sugov_update_single_perf
sugov_update_single_freq <-- This is chosen because the freq invariant is disabled due to aperf/mperf overflow
cpufreq_driver_fast_switch
cpufreq_driver->fast_switch <-- Here NULL pointer dereference is happening, because fast_switch is not set
This change fixes this NULL pointer dereference issue.
Fixes: a61dec744745 ("cpufreq: schedutil: Avoid missing updates for one-CPU policies")
Signed-off-by: Wyes Karny <wyes.karny(a)amd.com>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/cpufreq/amd-pstate.c | 10 +++++++---
drivers/cpufreq/cpufreq.c | 20 +++++++++++++++++++-
drivers/cpufreq/intel_pstate.c | 3 +--
include/linux/cpufreq.h | 1 +
kernel/sched/cpufreq_schedutil.c | 2 +-
5 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 5a3d4aa0f45a..007bfe724a6a 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -671,8 +671,14 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
/* It will be updated by governor */
policy->cur = policy->cpuinfo.min_freq;
+ /**
+ * For shared memory system frequency update takes time that's why
+ * do this in deferred kthread context.
+ */
if (boot_cpu_has(X86_FEATURE_CPPC))
- policy->fast_switch_possible = true;
+ current_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
+ else
+ current_pstate_driver->adjust_perf = NULL;
ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
FREQ_QOS_MIN, policy->cpuinfo.min_freq);
@@ -697,8 +703,6 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
policy->driver_data = cpudata;
amd_pstate_boost_init(cpudata);
- if (!current_pstate_driver->adjust_perf)
- current_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
return 0;
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6b52ebe5a890..366747012104 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -501,6 +501,13 @@ void cpufreq_enable_fast_switch(struct cpufreq_policy *policy)
if (!policy->fast_switch_possible)
return;
+ /**
+ * It's not expected driver's fast_switch callback is not set
+ * even fast_switch_possible is true.
+ */
+ if (WARN_ON(!cpufreq_driver_has_fast_switch()))
+ return;
+
mutex_lock(&cpufreq_fast_switch_lock);
if (cpufreq_fast_switch_count >= 0) {
cpufreq_fast_switch_count++;
@@ -2143,6 +2150,17 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
}
EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
+/**
+ * cpufreq_driver_has_fast_switch - Check "fast switch" callback.
+ *
+ * Return 'true' if the ->fast_switch callback is present for the
+ * current driver or 'false' otherwise.
+ */
+bool cpufreq_driver_has_fast_switch(void)
+{
+ return !!cpufreq_driver->fast_switch;
+}
+
/**
* cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
* @cpu: Target CPU.
@@ -2157,7 +2175,7 @@ EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
* and it is expected to select a suitable performance level equal to or above
* @min_perf and preferably equal to or below @target_perf.
*
- * This function must not be called if policy->fast_switch_enabled is unset.
+ * By default this function takes the fast frequency update path.
*
* Governors calling this function must guarantee that it will never be invoked
* twice in parallel for the same CPU and that it will never be called in
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 2548ec92faa2..007893514c87 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2698,8 +2698,6 @@ static int __intel_pstate_cpu_init(struct cpufreq_policy *policy)
intel_pstate_init_acpi_perf_limits(policy);
- policy->fast_switch_possible = true;
-
return 0;
}
@@ -2955,6 +2953,7 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
if (ret)
return ret;
+ policy->fast_switch_possible = true;
policy->cpuinfo.transition_latency = INTEL_CPUFREQ_TRANSITION_LATENCY;
/* This reflects the intel_pstate_get_cpu_pstates() setting. */
policy->cur = policy->cpuinfo.min_freq;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 26e2eb399484..7a32cfca26c9 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -604,6 +604,7 @@ struct cpufreq_governor {
/* Pass a target to the cpufreq driver */
unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq);
+bool cpufreq_driver_has_fast_switch(void);
void cpufreq_driver_adjust_perf(unsigned int cpu,
unsigned long min_perf,
unsigned long target_perf,
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e3211455b203..f993ecf731a9 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -776,7 +776,7 @@ static int sugov_start(struct cpufreq_policy *policy)
if (policy_is_shared(policy))
uu = sugov_update_shared;
- else if (policy->fast_switch_enabled && cpufreq_driver_has_adjust_perf())
+ else if (cpufreq_driver_has_adjust_perf())
uu = sugov_update_single_perf;
else
uu = sugov_update_single_freq;
--
2.34.1
On Fri, May 12, 2023 at 08:28:26AM +0000, zhangqiumiao wrote:
> Hello,
>
> We found the following issue using syzkaller on Linux v5.10.0.
5.10.0 is very old and obsolete and over 20 thousand patches old.
Please, if you are testing LTS kernels, use the latest one.
> A similar issue was found in function `paste_selection` before and
> I believe they are the same.
> (https://lore.kernel.org/all/000000000000fe769905d315a1b7@google.com/)
>
> Unfortunately, no one seems to be paying attention to this issue.
Do you have a proposed patch for this fix now that you have a way to
reproduce this? Do you see this in real situations or only in
fault-injection systems running syzbot?
And can you reproduce this on 6.4-rc1? Do you have a reproducer?
thanks,
greg k-h
This is a backport of the CR0.WP KVM series[1] to Linux v6.2. All
commits applied either clean or with only minor changes needed to
account for missing prerequisite patches, e.g. the lack of a
kvm_is_cr0_bit_set() helper for patch 5 or the slightly different
surrounding context in patch 4 (__always_inline vs. plain inline for
to_kvm_vmx()).
I used 'ssdd 10 50000' from rt-tests[2] as a micro-benchmark, running on
a grsecurity L1 VM. Below table shows the results (runtime in seconds,
lower is better):
legacy TDP shadow
Linux v6.2.10 7.61s 7.98s 68.6s
+ patches 3.37s 3.41s 70.2s
The KVM unit test suite showed no regressions.
Please consider applying.
Thanks,
Mathias
[1] https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/
[2] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
Mathias Krause (3):
KVM: x86: Do not unload MMU roots when only toggling CR0.WP with TDP
enabled
KVM: x86: Make use of kvm_read_cr*_bits() when testing bits
KVM: VMX: Make CR0.WP a guest owned bit
Paolo Bonzini (1):
KVM: x86/mmu: Avoid indirect call for get_cr3
Sean Christopherson (1):
KVM: x86/mmu: Refresh CR0.WP prior to checking for emulated permission
faults
arch/x86/kvm/kvm_cache_regs.h | 2 +-
arch/x86/kvm/mmu.h | 26 ++++++++++++++++++-
arch/x86/kvm/mmu/mmu.c | 46 ++++++++++++++++++++++++++--------
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/pmu.c | 4 +--
arch/x86/kvm/vmx/nested.c | 4 +--
arch/x86/kvm/vmx/vmx.c | 6 ++---
arch/x86/kvm/vmx/vmx.h | 18 +++++++++++++
arch/x86/kvm/x86.c | 12 +++++++++
9 files changed, 99 insertions(+), 21 deletions(-)
--
2.39.2