From: Len Brown len.brown@intel.com
[ Upstream commit 2734fdbc9bb8a3aeb309ba0d62212d7f53f30bc7 ]
When we are successful in using cpufreq min/max limits, skip setting the raw MSR limits entirely.
This is necessary to avoid undoing any modification that the cpufreq driver makes to our sysfs request.
eg. intel_pstate may take our request for a limit that is valid according to HWP.CAP.MIN/MAX and clip it to be within the range available in PLATFORM_INFO.
Signed-off-by: Len Brown len.brown@intel.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
YES
- What it fixes - Prevents x86_energy_perf_policy from undoing cpufreq/intel_pstate clipping of HWP min/max requests. Previously the tool wrote cpufreq sysfs limits and then also wrote the raw HWP MSR limits, potentially overriding the driver’s adjusted values (e.g., clipping to PLATFORM_INFO). The commit makes the tool prefer the driver’s interpretation when sysfs is used.
- Key changes - Adds a global flag to track sysfs-based limit application: `unsigned char hwp_limits_done_via_sysfs;` (tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c). - Marks sysfs path as authoritative when used: in `update_sysfs(...)`, after writing `scaling_min_freq` and/or `scaling_max_freq`, sets `hwp_limits_done_via_sysfs = 1;` so the driver’s chosen limits take precedence. - Skips raw MSR overwrites if sysfs handled limits: in `update_hwp_request_msr(...)`, assignments to `req.hwp_min` and `req.hwp_max` now check `&& !hwp_limits_done_via_sysfs` before updating. This avoids reprogramming the MSR min/max after the driver has possibly clipped the request via sysfs. - Clarifies intent in comments: explains that intel_pstate may clip values outside PLATFORM_INFO and that driver-chosen values take precedence.
- Why it’s a good stable backport - Real user-visible bug: Without this change, using the tool on HWP systems with intel_pstate can lead to the driver clipping sysfs limits, only for the tool to immediately overwrite min/max in the raw MSR, defeating the driver’s safety/correctness decisions. - Small, contained, low-risk change: Only one userspace file is modified, with ~13 lines touched. No kernel ABI/API changes, no architecture or subsystem redesign. - Behavior is strictly safer: - If cpufreq sysfs is not present or HWP updates aren’t requested, behavior is unchanged. - If cpufreq sysfs is present, the driver’s interpretation (including clipping) is preserved by avoiding the subsequent raw MSR write. - No functional expansion or new features: This is corrective logic to avoid regressions caused by conflicting writes. - Tools-only change: Lives under `tools/power/`, not core kernel code; regression risk to the running kernel is negligible.
- Dependencies and backport notes - Upstream this change lands after a refactor that renamed functions to `read_hwp_request_msr(...)` and `update_hwp_request_msr(...)`. Older stable trees (like your current tree) may still use `read_hwp_request(...)` and `update_hwp_request(...)`. The backport is trivial: add the `hwp_limits_done_via_sysfs` global, set it in `update_sysfs(...)`, and gate the MSR assignments in the (pre- refactor) `update_hwp_request(...)` using the same `&& !hwp_limits_done_via_sysfs` condition. - No additional fixes or reverts appear related to this change in the file’s history, and no “Fixes:”/stable Cc tags are present, but the change clearly matches stable criteria (bugfix with minimal risk).
- Security and regression considerations - Security: None apparent; this reduces the chance of programming unsafe or out-of-range HWP limits by honoring the driver’s clipping. - Regression risk: Low. If sysfs is available, defers to the driver; if not, previous MSR behavior remains. The global flag defaults to zero and is set only when the sysfs path is detected and used.
Conclusion: This is an important, minimal tool fix that prevents incorrect behavior on HWP/intel_pstate systems and should be backported to stable trees, with trivial adaptation for function names in pre- refactor branches.
.../x86_energy_perf_policy/x86_energy_perf_policy.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c index 0bda8e3ae7f77..891738116c8b2 100644 --- a/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c +++ b/tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c @@ -62,6 +62,7 @@ unsigned char turbo_update_value; unsigned char update_hwp_epp; unsigned char update_hwp_min; unsigned char update_hwp_max; +unsigned char hwp_limits_done_via_sysfs; unsigned char update_hwp_desired; unsigned char update_hwp_window; unsigned char update_hwp_use_pkg; @@ -951,8 +952,10 @@ int ratio_2_sysfs_khz(int ratio) } /* * If HWP is enabled and cpufreq sysfs attribtes are present, - * then update sysfs, so that it will not become - * stale when we write to MSRs. + * then update via sysfs. The intel_pstate driver may modify (clip) + * this request, say, when HWP_CAP is outside of PLATFORM_INFO limits, + * and the driver-chosen value takes precidence. + * * (intel_pstate's max_perf_pct and min_perf_pct will follow cpufreq, * so we don't have to touch that.) */ @@ -1007,6 +1010,8 @@ int update_sysfs(int cpu) if (update_hwp_max) update_cpufreq_scaling_freq(1, cpu, req_update.hwp_max);
+ hwp_limits_done_via_sysfs = 1; + return 0; }
@@ -1085,10 +1090,10 @@ int update_hwp_request(int cpu) if (debug) print_hwp_request(cpu, &req, "old: ");
- if (update_hwp_min) + if (update_hwp_min && !hwp_limits_done_via_sysfs) req.hwp_min = req_update.hwp_min;
- if (update_hwp_max) + if (update_hwp_max && !hwp_limits_done_via_sysfs) req.hwp_max = req_update.hwp_max;
if (update_hwp_desired)