Hi all:
The core frequency is subjected to the process variation in semiconductors. Not all cores are able to reach the maximum frequency respecting the infrastructure limits. Consequently, AMD has redefined the concept of maximum frequency of a part. This means that a fraction of cores can reach maximum frequency. To find the best process scheduling policy for a given scenario, OS needs to know the core ordering informed by the platform through highest performance capability register of the CPPC interface.
Earlier implementations of amd-pstate preferred core only support a static core ranking and targeted performance. Now it has the ability to dynamically change the preferred core based on the workload and platform conditions and accounting for thermals and aging.
Amd-pstate driver utilizes the functions and data structures provided by the ITMT architecture to enable the scheduler to favor scheduling on cores which can be get a higher frequency with lower voltage. We call it amd-pstate preferred core.
Here sched_set_itmt_core_prio() is called to set priorities and sched_set_itmt_support() is called to enable ITMT feature. Amd-pstate driver uses the highest performance value to indicate the priority of CPU. The higher value has a higher priority.
Amd-pstate driver will provide an initial core ordering at boot time. It relies on the CPPC interface to communicate the core ranking to the operating system and scheduler to make sure that OS is choosing the cores with highest performance firstly for scheduling the process. When amd-pstate driver receives a message with the highest performance change, it will update the core ranking.
Changes from V11->V12: - all: - - pick up Reviewed-By flag added by Perry. - cpufreq: amd-pstate: - - rebase the latest linux-next and fixed conflicts. - - fixed the issue about cpudata without init in amd_pstate_update_highest_perf().
Changes from V10->V11: - cpufreq: amd-pstate: - - according Perry's commnts, I replace the string with str_enabled_disable().
Changes from V9->V10: - cpufreq: amd-pstate: - - add judgement for highest_perf. When it is less than 255, the preferred core feature is enabled. And it will set the priority. - - deleset "static u32 max_highest_perf" etc, because amd p-state perferred coe does not require specail process for hotpulg.
Changes form V8->V9: - all: - - pick up Tested-By flag added by Oleksandr. - cpufreq: amd-pstate: - - pick up Review-By flag added by Wyes. - - ignore modification of bug. - - add a attribute of prefcore_ranking. - - modify data type conversion from u32 to int. - Documentation: amd-pstate: - - pick up Review-By flag added by Wyes.
Changes form V7->V8: - all: - - pick up Review-By flag added by Mario and Ray. - cpufreq: amd-pstate: - - use hw_prefcore embeds into cpudata structure. - - delete preferred core init from cpu online/off.
Changes form V6->V7: - x86: - - Modify kconfig about X86_AMD_PSTATE. - cpufreq: amd-pstate: - - modify incorrect comments about scheduler_work(). - - convert highest_perf data type. - - modify preferred core init when cpu init and online. - acpi: cppc: - - modify link of CPPC highest performance. - cpufreq: - - modify link of CPPC highest performance changed.
Changes form V5->V6: - cpufreq: amd-pstate: - - modify the wrong tag order. - - modify warning about hw_prefcore sysfs attribute. - - delete duplicate comments. - - modify the variable name cppc_highest_perf to prefcore_ranking. - - modify judgment conditions for setting highest_perf. - - modify sysfs attribute for CPPC highest perf to pr_debug message. - Documentation: amd-pstate: - - modify warning: title underline too short.
Changes form V4->V5: - cpufreq: amd-pstate: - - modify sysfs attribute for CPPC highest perf. - - modify warning about comments - - rebase linux-next - cpufreq: - - Moidfy warning about function declarations. - Documentation: amd-pstate: - - align with ``amd-pstat``
Changes form V3->V4: - Documentation: amd-pstate: - - Modify inappropriate descriptions.
Changes form V2->V3: - x86: - - Modify kconfig and description. - cpufreq: amd-pstate: - - Add Co-developed-by tag in commit message. - cpufreq: - - Modify commit message. - Documentation: amd-pstate: - - Modify inappropriate descriptions.
Changes form V1->V2: - acpi: cppc: - - Add reference link. - cpufreq: - - Moidfy link error. - cpufreq: amd-pstate: - - Init the priorities of all online CPUs - - Use a single variable to represent the status of preferred core. - Documentation: - - Default enabled preferred core. - Documentation: amd-pstate: - - Modify inappropriate descriptions. - - Default enabled preferred core. - - Use a single variable to represent the status of preferred core.
Meng Li (7): x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion. acpi: cppc: Add get the highest performance cppc control cpufreq: amd-pstate: Enable amd-pstate preferred core supporting. cpufreq: Add a notification message that the highest perf has changed cpufreq: amd-pstate: Update amd-pstate preferred core ranking dynamically Documentation: amd-pstate: introduce amd-pstate preferred core Documentation: introduce amd-pstate preferrd core mode kernel command line options
.../admin-guide/kernel-parameters.txt | 5 + Documentation/admin-guide/pm/amd-pstate.rst | 59 +++++- arch/x86/Kconfig | 5 +- drivers/acpi/cppc_acpi.c | 13 ++ drivers/acpi/processor_driver.c | 6 + drivers/cpufreq/amd-pstate.c | 175 +++++++++++++++++- drivers/cpufreq/cpufreq.c | 13 ++ include/acpi/cppc_acpi.h | 5 + include/linux/amd-pstate.h | 10 + include/linux/cpufreq.h | 5 + 10 files changed, 284 insertions(+), 12 deletions(-)
amd-pstate driver also uses SCHED_MC_PRIO, so decouple the requirement of CPU_SUP_INTEL from the dependencies to allow compilation in kernels without Intel CPU support.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com --- arch/x86/Kconfig | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 3762f41bb092..3e57773f946a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1054,8 +1054,9 @@ config SCHED_MC
config SCHED_MC_PRIO bool "CPU core priorities scheduler support" - depends on SCHED_MC && CPU_SUP_INTEL - select X86_INTEL_PSTATE + depends on SCHED_MC + select X86_INTEL_PSTATE if CPU_SUP_INTEL + select X86_AMD_PSTATE if CPU_SUP_AMD && ACPI select CPU_FREQ default y help
On Tue, Dec 5, 2023 at 7:38 AM Meng Li li.meng@amd.com wrote:
amd-pstate driver also uses SCHED_MC_PRIO, so decouple the requirement of CPU_SUP_INTEL from the dependencies to allow compilation in kernels without Intel CPU support.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com
arch/x86/Kconfig | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 3762f41bb092..3e57773f946a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1054,8 +1054,9 @@ config SCHED_MC
config SCHED_MC_PRIO bool "CPU core priorities scheduler support"
depends on SCHED_MC && CPU_SUP_INTEL
select X86_INTEL_PSTATE
depends on SCHED_MC
select X86_INTEL_PSTATE if CPU_SUP_INTEL
select X86_AMD_PSTATE if CPU_SUP_AMD && ACPI select CPU_FREQ default y help
--
This needs an ACK from the x86 maintainers.
On Tue, Dec 05, 2023 at 02:35:31PM +0800, Meng Li wrote:
amd-pstate driver also uses SCHED_MC_PRIO, so decouple the requirement of CPU_SUP_INTEL from the dependencies to allow compilation in kernels without Intel CPU support.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com
arch/x86/Kconfig | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 3762f41bb092..3e57773f946a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1054,8 +1054,9 @@ config SCHED_MC config SCHED_MC_PRIO bool "CPU core priorities scheduler support"
- depends on SCHED_MC && CPU_SUP_INTEL
- select X86_INTEL_PSTATE
- depends on SCHED_MC
- select X86_INTEL_PSTATE if CPU_SUP_INTEL
- select X86_AMD_PSTATE if CPU_SUP_AMD && ACPI select CPU_FREQ default y help
--
I was gonna ask why the selects but apparently mingo wants SCHED_MC_PRIO to be selectable easier:
0a21fc1214a2 ("sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable")
So,
Acked-by: Borislav Petkov (AMD) bp@alien8.de
Thx.
[AMD Official Use Only - General]
Hi Petkov:
-----Original Message----- From: Borislav Petkov bp@alien8.de Sent: Tuesday, January 9, 2024 6:45 PM To: Meng, Li (Jassmine) Li.Meng@amd.com Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux- kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak Deepak.Sharma@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer Shimmer.Huang@amd.com; Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian Xiaojian.Du@amd.com; Viresh Kumar viresh.kumar@linaro.org; Oleksandr Natalenko oleksandr@natalenko.name Subject: Re: [PATCH V12 1/7] x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Tue, Dec 05, 2023 at 02:35:31PM +0800, Meng Li wrote:
amd-pstate driver also uses SCHED_MC_PRIO, so decouple the
requirement
of CPU_SUP_INTEL from the dependencies to allow compilation in kernels without Intel CPU support.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com
arch/x86/Kconfig | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 3762f41bb092..3e57773f946a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1054,8 +1054,9 @@ config SCHED_MC
config SCHED_MC_PRIO bool "CPU core priorities scheduler support"
depends on SCHED_MC && CPU_SUP_INTEL
select X86_INTEL_PSTATE
depends on SCHED_MC
select X86_INTEL_PSTATE if CPU_SUP_INTEL
select X86_AMD_PSTATE if CPU_SUP_AMD && ACPI select CPU_FREQ default y help
--
I was gonna ask why the selects but apparently mingo wants SCHED_MC_PRIO to be selectable easier:
0a21fc1214a2 ("sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable")
[Meng, Li (Jassmine)] Thank you for your feedback. The reason why I added the selects is just to distinguish different pstate drivers. These two drivers cannot be supported simultaneously in the same project.
So,
Acked-by: Borislav Petkov (AMD) bp@alien8.de
Thx.
-- Regards/Gruss, Boris.
On Wed, Jan 10, 2024 at 06:59:25AM +0000, Meng, Li (Jassmine) wrote:
The reason why I added the selects is just to distinguish different pstate drivers. These two drivers cannot be supported simultaneously in the same project.
No, that's not what I meant. Read here:
"- reverse dependencies: "select" <symbol> ["if" <expr>]
While normal dependencies reduce the upper limit of a symbol (see below), reverse dependencies can be used to force a lower limit of another symbol. The value of the current menu symbol is used as the minimal value <symbol> can be set to. If <symbol> is selected multiple times, the limit is set to the largest selection. Reverse dependencies can only be used with boolean or tristate symbols.
Note: select should be used with care. select will force a symbol to a value without visiting the dependencies. By abusing select you are able to select a symbol FOO even if FOO depends on BAR that is not set. In general use select only for non-visible symbols (no prompts anywhere) and for symbols with no dependencies. That will limit the usefulness but on the other hand avoid the illegal configurations all over."
From Documentation/kbuild/kconfig-language.rst
[AMD Official Use Only - General]
Hi Petkov:
-----Original Message----- From: Borislav Petkov bp@alien8.de Sent: Wednesday, January 10, 2024 6:04 PM To: Meng, Li (Jassmine) Li.Meng@amd.com Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux- kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak Deepak.Sharma@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer Shimmer.Huang@amd.com; Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian Xiaojian.Du@amd.com; Viresh Kumar viresh.kumar@linaro.org; Oleksandr Natalenko oleksandr@natalenko.name Subject: Re: [PATCH V12 1/7] x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Wed, Jan 10, 2024 at 06:59:25AM +0000, Meng, Li (Jassmine) wrote:
The reason why I added the selects is just to distinguish different pstate drivers. These two drivers cannot be supported simultaneously in the same project.
No, that's not what I meant. Read here:
"- reverse dependencies: "select" <symbol> ["if" <expr>]
While normal dependencies reduce the upper limit of a symbol (see below), reverse dependencies can be used to force a lower limit of another symbol. The value of the current menu symbol is used as the minimal value <symbol> can be set to. If <symbol> is selected multiple times, the limit is set to the largest selection. Reverse dependencies can only be used with boolean or tristate symbols.
Note: select should be used with care. select will force a symbol to a value without visiting the dependencies. By abusing select you are able to select a symbol FOO even if FOO depends on BAR that is not set. In general use select only for non-visible symbols (no prompts anywhere) and for symbols with no dependencies. That will limit the usefulness but on the other hand avoid the illegal configurations all over."
From Documentation/kbuild/kconfig-language.rst
[Meng, Li (Jassmine)] Thanks a lot. I will modify it soon.
-- Regards/Gruss, Boris.
On Thu, Jan 11, 2024 at 08:10:48AM +0000, Meng, Li (Jassmine) wrote:
I will modify it soon.
No, don't modify it, don't do anything. Please read the whole thread again.
Add support for getting the highest performance to the generic CPPC driver. This enables downstream drivers such as amd-pstate to discover and use these values.
Please refer to the ACPI_Spec for details on continuous performance control of CPPC.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Wyes Karny wyes.karny@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Acked-by: Huang Rui ray.huang@amd.com Signed-off-by: Meng Li li.meng@amd.com Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?... --- drivers/acpi/cppc_acpi.c | 13 +++++++++++++ include/acpi/cppc_acpi.h | 5 +++++ 2 files changed, 18 insertions(+)
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c index 7ff269a78c20..ad388a0e8484 100644 --- a/drivers/acpi/cppc_acpi.c +++ b/drivers/acpi/cppc_acpi.c @@ -1154,6 +1154,19 @@ int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf) return cppc_get_perf(cpunum, NOMINAL_PERF, nominal_perf); }
+/** + * cppc_get_highest_perf - Get the highest performance register value. + * @cpunum: CPU from which to get highest performance. + * @highest_perf: Return address. + * + * Return: 0 for success, -EIO otherwise. + */ +int cppc_get_highest_perf(int cpunum, u64 *highest_perf) +{ + return cppc_get_perf(cpunum, HIGHEST_PERF, highest_perf); +} +EXPORT_SYMBOL_GPL(cppc_get_highest_perf); + /** * cppc_get_epp_perf - Get the epp register value. * @cpunum: CPU from which to get epp preference value. diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h index 6126c977ece0..c0b69ffe7bdb 100644 --- a/include/acpi/cppc_acpi.h +++ b/include/acpi/cppc_acpi.h @@ -139,6 +139,7 @@ struct cppc_cpudata { #ifdef CONFIG_ACPI_CPPC_LIB extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf); extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf); +extern int cppc_get_highest_perf(int cpunum, u64 *highest_perf); extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs); extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls); extern int cppc_set_enable(int cpu, bool enable); @@ -165,6 +166,10 @@ static inline int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf) { return -ENOTSUPP; } +static inline int cppc_get_highest_perf(int cpunum, u64 *highest_perf) +{ + return -ENOTSUPP; +} static inline int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs) { return -ENOTSUPP;
Please spell ACPI and CPPC in capitals in the subject.
On Tue, Dec 5, 2023 at 7:38 AM Meng Li li.meng@amd.com wrote:
Add support for getting the highest performance to the generic CPPC driver. This enables downstream drivers such as amd-pstate to discover and use these values.
Please refer to the ACPI_Spec for details on continuous performance control of CPPC.
So which section of the spec is the reader supposed to refer to?
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Wyes Karny wyes.karny@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Acked-by: Huang Rui ray.huang@amd.com Signed-off-by: Meng Li li.meng@amd.com Link: https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?...
drivers/acpi/cppc_acpi.c | 13 +++++++++++++ include/acpi/cppc_acpi.h | 5 +++++ 2 files changed, 18 insertions(+)
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c index 7ff269a78c20..ad388a0e8484 100644 --- a/drivers/acpi/cppc_acpi.c +++ b/drivers/acpi/cppc_acpi.c @@ -1154,6 +1154,19 @@ int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf) return cppc_get_perf(cpunum, NOMINAL_PERF, nominal_perf); }
+/**
- cppc_get_highest_perf - Get the highest performance register value.
- @cpunum: CPU from which to get highest performance.
- @highest_perf: Return address.
- Return: 0 for success, -EIO otherwise.
- */
+int cppc_get_highest_perf(int cpunum, u64 *highest_perf) +{
return cppc_get_perf(cpunum, HIGHEST_PERF, highest_perf);
+} +EXPORT_SYMBOL_GPL(cppc_get_highest_perf);
/**
- cppc_get_epp_perf - Get the epp register value.
- @cpunum: CPU from which to get epp preference value.
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h index 6126c977ece0..c0b69ffe7bdb 100644 --- a/include/acpi/cppc_acpi.h +++ b/include/acpi/cppc_acpi.h @@ -139,6 +139,7 @@ struct cppc_cpudata { #ifdef CONFIG_ACPI_CPPC_LIB extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf); extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf); +extern int cppc_get_highest_perf(int cpunum, u64 *highest_perf); extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs); extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls); extern int cppc_set_enable(int cpu, bool enable); @@ -165,6 +166,10 @@ static inline int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf) { return -ENOTSUPP; } +static inline int cppc_get_highest_perf(int cpunum, u64 *highest_perf) +{
return -ENOTSUPP;
+} static inline int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs) { return -ENOTSUPP; --
amd-pstate driver utilizes the functions and data structures provided by the ITMT architecture to enable the scheduler to favor scheduling on cores which can be get a higher frequency with lower voltage. We call it amd-pstate preferrred core.
Here sched_set_itmt_core_prio() is called to set priorities and sched_set_itmt_support() is called to enable ITMT feature. amd-pstate driver uses the highest performance value to indicate the priority of CPU. The higher value has a higher priority.
The initial core rankings are set up by amd-pstate when the system boots.
Add a variable hw_prefcore in cpudata structure. It will check if the processor and power firmware support preferred core feature.
Add one new early parameter `disable` to allow user to disable the preferred core.
Only when hardware supports preferred core and user set `enabled` in early parameter, amd pstate driver supports preferred core featue.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Wyes Karny wyes.karny@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Co-developed-by: Perry Yuan Perry.Yuan@amd.com Signed-off-by: Perry Yuan Perry.Yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com --- drivers/cpufreq/amd-pstate.c | 131 ++++++++++++++++++++++++++++++++--- include/linux/amd-pstate.h | 4 ++ 2 files changed, 127 insertions(+), 8 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 1f6186475715..9c2790753f99 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -37,6 +37,7 @@ #include <linux/uaccess.h> #include <linux/static_call.h> #include <linux/amd-pstate.h> +#include <linux/topology.h>
#include <acpi/processor.h> #include <acpi/cppc_acpi.h> @@ -49,6 +50,7 @@
#define AMD_PSTATE_TRANSITION_LATENCY 20000 #define AMD_PSTATE_TRANSITION_DELAY 1000 +#define AMD_PSTATE_PREFCORE_THRESHOLD 166
/* * TODO: We need more time to fine tune processors with shared memory solution @@ -64,6 +66,7 @@ static struct cpufreq_driver amd_pstate_driver; static struct cpufreq_driver amd_pstate_epp_driver; static int cppc_state = AMD_PSTATE_UNDEFINED; static bool cppc_enabled; +static bool amd_pstate_prefcore = true;
/* * AMD Energy Preference Performance (EPP) @@ -297,13 +300,14 @@ static int pstate_init_perf(struct amd_cpudata *cpudata) if (ret) return ret;
- /* - * TODO: Introduce AMD specific power feature. - * - * CPPC entry doesn't indicate the highest performance in some ASICs. + /* For platforms that do not support the preferred core feature, the + * highest_pef may be configured with 166 or 255, to avoid max frequency + * calculated wrongly. we take the AMD_CPPC_HIGHEST_PERF(cap1) value as + * the default max perf. */ - highest_perf = amd_get_highest_perf(); - if (highest_perf > AMD_CPPC_HIGHEST_PERF(cap1)) + if (cpudata->hw_prefcore) + highest_perf = AMD_PSTATE_PREFCORE_THRESHOLD; + else highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
WRITE_ONCE(cpudata->highest_perf, highest_perf); @@ -324,8 +328,9 @@ static int cppc_init_perf(struct amd_cpudata *cpudata) if (ret) return ret;
- highest_perf = amd_get_highest_perf(); - if (highest_perf > cppc_perf.highest_perf) + if (cpudata->hw_prefcore) + highest_perf = AMD_PSTATE_PREFCORE_THRESHOLD; + else highest_perf = cppc_perf.highest_perf;
WRITE_ONCE(cpudata->highest_perf, highest_perf); @@ -706,6 +711,80 @@ static void amd_perf_ctl_reset(unsigned int cpu) wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0); }
+/* + * Set amd-pstate preferred core enable can't be done directly from cpufreq callbacks + * due to locking, so queue the work for later. + */ +static void amd_pstste_sched_prefcore_workfn(struct work_struct *work) +{ + sched_set_itmt_support(); +} +static DECLARE_WORK(sched_prefcore_work, amd_pstste_sched_prefcore_workfn); + +/* + * Get the highest performance register value. + * @cpu: CPU from which to get highest performance. + * @highest_perf: Return address. + * + * Return: 0 for success, -EIO otherwise. + */ +static int amd_pstate_get_highest_perf(int cpu, u32 *highest_perf) +{ + int ret; + + if (boot_cpu_has(X86_FEATURE_CPPC)) { + u64 cap1; + + ret = rdmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_CAP1, &cap1); + if (ret) + return ret; + WRITE_ONCE(*highest_perf, AMD_CPPC_HIGHEST_PERF(cap1)); + } else { + u64 cppc_highest_perf; + + ret = cppc_get_highest_perf(cpu, &cppc_highest_perf); + if (ret) + return ret; + WRITE_ONCE(*highest_perf, cppc_highest_perf); + } + + return (ret); +} + +#define CPPC_MAX_PERF U8_MAX + +static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata) +{ + int ret, prio; + u32 highest_perf; + + ret = amd_pstate_get_highest_perf(cpudata->cpu, &highest_perf); + if (ret) + return; + + cpudata->hw_prefcore = true; + /* check if CPPC preferred core feature is enabled*/ + if (highest_perf < CPPC_MAX_PERF) + prio = (int)highest_perf; + else { + pr_debug("AMD CPPC preferred core is unsupported!\n"); + cpudata->hw_prefcore = false; + return; + } + + if (!amd_pstate_prefcore) + return; + + /* + * The priorities can be set regardless of whether or not + * sched_set_itmt_support(true) has been called and it is valid to + * update them at any time after it has been called. + */ + sched_set_itmt_core_prio(prio, cpudata->cpu); + + schedule_work(&sched_prefcore_work); +} + static int amd_pstate_cpu_init(struct cpufreq_policy *policy) { int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret; @@ -727,6 +806,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
cpudata->cpu = policy->cpu;
+ amd_pstate_init_prefcore(cpudata); + ret = amd_pstate_init_perf(cpudata); if (ret) goto free_cpudata1; @@ -877,6 +958,17 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy, return sysfs_emit(buf, "%u\n", perf); }
+static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy, + char *buf) +{ + bool hw_prefcore; + struct amd_cpudata *cpudata = policy->driver_data; + + hw_prefcore = READ_ONCE(cpudata->hw_prefcore); + + return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore)); +} + static ssize_t show_energy_performance_available_preferences( struct cpufreq_policy *policy, char *buf) { @@ -1074,18 +1166,27 @@ static ssize_t status_store(struct device *a, struct device_attribute *b, return ret < 0 ? ret : count; }
+static ssize_t prefcore_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore)); +} + cpufreq_freq_attr_ro(amd_pstate_max_freq); cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
cpufreq_freq_attr_ro(amd_pstate_highest_perf); +cpufreq_freq_attr_ro(amd_pstate_hw_prefcore); cpufreq_freq_attr_rw(energy_performance_preference); cpufreq_freq_attr_ro(energy_performance_available_preferences); static DEVICE_ATTR_RW(status); +static DEVICE_ATTR_RO(prefcore);
static struct freq_attr *amd_pstate_attr[] = { &amd_pstate_max_freq, &amd_pstate_lowest_nonlinear_freq, &amd_pstate_highest_perf, + &amd_pstate_hw_prefcore, NULL, };
@@ -1093,6 +1194,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = { &amd_pstate_max_freq, &amd_pstate_lowest_nonlinear_freq, &amd_pstate_highest_perf, + &amd_pstate_hw_prefcore, &energy_performance_preference, &energy_performance_available_preferences, NULL, @@ -1100,6 +1202,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
static struct attribute *pstate_global_attributes[] = { &dev_attr_status.attr, + &dev_attr_prefcore.attr, NULL };
@@ -1151,6 +1254,8 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) cpudata->cpu = policy->cpu; cpudata->epp_policy = 0;
+ amd_pstate_init_prefcore(cpudata); + ret = amd_pstate_init_perf(cpudata); if (ret) goto free_cpudata1; @@ -1568,7 +1673,17 @@ static int __init amd_pstate_param(char *str)
return amd_pstate_set_driver(mode_idx); } + +static int __init amd_prefcore_param(char *str) +{ + if (!strcmp(str, "disable")) + amd_pstate_prefcore = false; + + return 0; +} + early_param("amd_pstate", amd_pstate_param); +early_param("amd_prefcore", amd_prefcore_param);
MODULE_AUTHOR("Huang Rui ray.huang@amd.com"); MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver"); diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h index 6ad02ad9c7b4..68fc1bd8d851 100644 --- a/include/linux/amd-pstate.h +++ b/include/linux/amd-pstate.h @@ -52,6 +52,9 @@ struct amd_aperf_mperf { * @prev: Last Aperf/Mperf/tsc count value read from register * @freq: current cpu frequency value * @boost_supported: check whether the Processor or SBIOS supports boost mode + * @hw_prefcore: check whether HW supports preferred core featue. + * Only when hw_prefcore and early prefcore param are true, + * AMD P-State driver supports preferred core featue. * @epp_policy: Last saved policy used to set energy-performance preference * @epp_cached: Cached CPPC energy-performance preference value * @policy: Cpufreq policy value @@ -85,6 +88,7 @@ struct amd_cpudata {
u64 freq; bool boost_supported; + bool hw_prefcore;
/* EPP feature related attributes*/ s16 epp_policy;
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emmitted to cause the the OSPM to re-evaluate the highest performance register. Add support for this event.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#proc... --- drivers/acpi/processor_driver.c | 6 ++++++ drivers/cpufreq/cpufreq.c | 13 +++++++++++++ include/linux/cpufreq.h | 5 +++++ 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c index 4bd16b3f0781..29b2fb68a35d 100644 --- a/drivers/acpi/processor_driver.c +++ b/drivers/acpi/processor_driver.c @@ -27,6 +27,7 @@ #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80 #define ACPI_PROCESSOR_NOTIFY_POWER 0x81 #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82 +#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED 0x85
MODULE_AUTHOR("Paul Diefenbaugh"); MODULE_DESCRIPTION("ACPI Processor Driver"); @@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data) acpi_bus_generate_netlink_event(device->pnp.device_class, dev_name(&device->dev), event, 0); break; + case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED: + cpufreq_update_highest_perf(pr->id); + acpi_bus_generate_netlink_event(device->pnp.device_class, + dev_name(&device->dev), event, 0); + break; default: acpi_handle_debug(handle, "Unsupported event [0x%x]\n", event); break; diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 934d35f570b7..14a4cbc6dd05 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2717,6 +2717,19 @@ void cpufreq_update_limits(unsigned int cpu) } EXPORT_SYMBOL_GPL(cpufreq_update_limits);
+/** + * cpufreq_update_highest_perf - Update highest performance for a given CPU. + * @cpu: CPU to update the highest performance for. + * + * Invoke the driver's ->update_highest_perf callback if present + */ +void cpufreq_update_highest_perf(unsigned int cpu) +{ + if (cpufreq_driver->update_highest_perf) + cpufreq_driver->update_highest_perf(cpu); +} +EXPORT_SYMBOL_GPL(cpufreq_update_highest_perf); + /********************************************************************* * BOOST * *********************************************************************/ diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 1c5ca92a0555..f62257b2a42f 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -235,6 +235,7 @@ int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu); void refresh_frequency_limits(struct cpufreq_policy *policy); void cpufreq_update_policy(unsigned int cpu); void cpufreq_update_limits(unsigned int cpu); +void cpufreq_update_highest_perf(unsigned int cpu); bool have_governor_per_policy(void); bool cpufreq_supports_freq_invariance(void); struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy); @@ -263,6 +264,7 @@ static inline bool cpufreq_supports_freq_invariance(void) return false; } static inline void disable_cpufreq(void) { } +static inline void cpufreq_update_highest_perf(unsigned int cpu) { } #endif
#ifdef CONFIG_CPU_FREQ_STAT @@ -380,6 +382,9 @@ struct cpufreq_driver { /* Called to update policy limits on firmware notifications. */ void (*update_limits)(unsigned int cpu);
+ /* Called to update highest performance on firmware notifications. */ + void (*update_highest_perf)(unsigned int cpu); + /* optional */ int (*bios_limit)(int cpu, unsigned int *limit);
On Tue, Dec 5, 2023 at 7:38 AM Meng Li li.meng@amd.com wrote:
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emmitted to cause the the OSPM to re-evaluate the highest performance
Typos above. Given the number of iterations of this patch, this is kind of disappointing.
register. Add support for this event.
Also it would be nice to describe how this is supposed to work at least roughly, so it is not necessary to reverse-engineer the patch to find out that.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#proc...
drivers/acpi/processor_driver.c | 6 ++++++ drivers/cpufreq/cpufreq.c | 13 +++++++++++++ include/linux/cpufreq.h | 5 +++++ 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c index 4bd16b3f0781..29b2fb68a35d 100644 --- a/drivers/acpi/processor_driver.c +++ b/drivers/acpi/processor_driver.c @@ -27,6 +27,7 @@ #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80 #define ACPI_PROCESSOR_NOTIFY_POWER 0x81 #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82 +#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED 0x85
MODULE_AUTHOR("Paul Diefenbaugh"); MODULE_DESCRIPTION("ACPI Processor Driver"); @@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data) acpi_bus_generate_netlink_event(device->pnp.device_class, dev_name(&device->dev), event, 0); break;
case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
cpufreq_update_highest_perf(pr->id);
And the design appears to be a bit ad-hoc here.
Because why does it have anything to do with cpufreq?
acpi_bus_generate_netlink_event(device->pnp.device_class,
dev_name(&device->dev), event, 0);
break; default: acpi_handle_debug(handle, "Unsupported event [0x%x]\n", event); break;
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 934d35f570b7..14a4cbc6dd05 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2717,6 +2717,19 @@ void cpufreq_update_limits(unsigned int cpu) } EXPORT_SYMBOL_GPL(cpufreq_update_limits);
+/**
- cpufreq_update_highest_perf - Update highest performance for a given CPU.
- @cpu: CPU to update the highest performance for.
- Invoke the driver's ->update_highest_perf callback if present
- */
+void cpufreq_update_highest_perf(unsigned int cpu) +{
if (cpufreq_driver->update_highest_perf)
cpufreq_driver->update_highest_perf(cpu);
+} +EXPORT_SYMBOL_GPL(cpufreq_update_highest_perf);
/*********************************************************************
BOOST *
*********************************************************************/ diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 1c5ca92a0555..f62257b2a42f 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -235,6 +235,7 @@ int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu); void refresh_frequency_limits(struct cpufreq_policy *policy); void cpufreq_update_policy(unsigned int cpu); void cpufreq_update_limits(unsigned int cpu); +void cpufreq_update_highest_perf(unsigned int cpu); bool have_governor_per_policy(void); bool cpufreq_supports_freq_invariance(void); struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy); @@ -263,6 +264,7 @@ static inline bool cpufreq_supports_freq_invariance(void) return false; } static inline void disable_cpufreq(void) { } +static inline void cpufreq_update_highest_perf(unsigned int cpu) { } #endif
#ifdef CONFIG_CPU_FREQ_STAT @@ -380,6 +382,9 @@ struct cpufreq_driver { /* Called to update policy limits on firmware notifications. */ void (*update_limits)(unsigned int cpu);
/* Called to update highest performance on firmware notifications. */
void (*update_highest_perf)(unsigned int cpu);
/* optional */ int (*bios_limit)(int cpu, unsigned int *limit);
-- 2.34.1
On Wed, Dec 6, 2023 at 9:58 PM Rafael J. Wysocki rafael@kernel.org wrote:
On Tue, Dec 5, 2023 at 7:38 AM Meng Li li.meng@amd.com wrote:
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emmitted to cause the the OSPM to re-evaluate the highest performance
Typos above. Given the number of iterations of this patch, this is kind of disappointing.
register. Add support for this event.
Also it would be nice to describe how this is supposed to work at least roughly, so it is not necessary to reverse-engineer the patch to find out that.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#proc...
drivers/acpi/processor_driver.c | 6 ++++++ drivers/cpufreq/cpufreq.c | 13 +++++++++++++ include/linux/cpufreq.h | 5 +++++ 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c index 4bd16b3f0781..29b2fb68a35d 100644 --- a/drivers/acpi/processor_driver.c +++ b/drivers/acpi/processor_driver.c @@ -27,6 +27,7 @@ #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80 #define ACPI_PROCESSOR_NOTIFY_POWER 0x81 #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82 +#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED 0x85
MODULE_AUTHOR("Paul Diefenbaugh"); MODULE_DESCRIPTION("ACPI Processor Driver"); @@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data) acpi_bus_generate_netlink_event(device->pnp.device_class, dev_name(&device->dev), event, 0); break;
case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
cpufreq_update_highest_perf(pr->id);
And the design appears to be a bit ad-hoc here.
Because why does it have anything to do with cpufreq?
Well, clearly, cpufreq can be affected by this, but why would it be not affected the same way as in the ACPI_PROCESSOR_NOTIFY_PERFORMANCE case?
That is, why isn't cpufreq_update_limits() the right thing to do?
On Wed, Dec 6, 2023 at 10:13 PM Rafael J. Wysocki rafael@kernel.org wrote:
On Wed, Dec 6, 2023 at 9:58 PM Rafael J. Wysocki rafael@kernel.org wrote:
On Tue, Dec 5, 2023 at 7:38 AM Meng Li li.meng@amd.com wrote:
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emmitted to cause the the OSPM to re-evaluate the highest performance
Typos above. Given the number of iterations of this patch, this is kind of disappointing.
register. Add support for this event.
Also it would be nice to describe how this is supposed to work at least roughly, so it is not necessary to reverse-engineer the patch to find out that.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#proc...
drivers/acpi/processor_driver.c | 6 ++++++ drivers/cpufreq/cpufreq.c | 13 +++++++++++++ include/linux/cpufreq.h | 5 +++++ 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c index 4bd16b3f0781..29b2fb68a35d 100644 --- a/drivers/acpi/processor_driver.c +++ b/drivers/acpi/processor_driver.c @@ -27,6 +27,7 @@ #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80 #define ACPI_PROCESSOR_NOTIFY_POWER 0x81 #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82 +#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED 0x85
MODULE_AUTHOR("Paul Diefenbaugh"); MODULE_DESCRIPTION("ACPI Processor Driver"); @@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data) acpi_bus_generate_netlink_event(device->pnp.device_class, dev_name(&device->dev), event, 0); break;
case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
cpufreq_update_highest_perf(pr->id);
And the design appears to be a bit ad-hoc here.
Because why does it have anything to do with cpufreq?
Well, clearly, cpufreq can be affected by this, but why would it be not affected the same way as in the ACPI_PROCESSOR_NOTIFY_PERFORMANCE case?
That is, why isn't cpufreq_update_limits() the right thing to do?
Seriously, I'm not going to apply this patch so long as my comments above are not addressed.
[AMD Official Use Only - General]
Hi Rafael:
-----Original Message----- From: Rafael J. Wysocki rafael@kernel.org Sent: Tuesday, December 12, 2023 9:44 PM To: Meng, Li (Jassmine) Li.Meng@amd.com Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux- kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak Deepak.Sharma@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer Shimmer.Huang@amd.com; Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian Xiaojian.Du@amd.com; Viresh Kumar viresh.kumar@linaro.org; Borislav Petkov bp@alien8.de; Oleksandr Natalenko oleksandr@natalenko.name Subject: Re: [PATCH V12 4/7] cpufreq: Add a notification message that the highest perf has changed
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Wed, Dec 6, 2023 at 10:13 PM Rafael J. Wysocki rafael@kernel.org wrote:
On Wed, Dec 6, 2023 at 9:58 PM Rafael J. Wysocki rafael@kernel.org
wrote:
On Tue, Dec 5, 2023 at 7:38 AM Meng Li li.meng@amd.com wrote:
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emmitted to cause the the OSPM to re-evaluate the highest performance
Typos above. Given the number of iterations of this patch, this is kind of disappointing.
register. Add support for this event.
Also it would be nice to describe how this is supposed to work at least roughly, so it is not necessary to reverse-engineer the patch to find out that.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com Link:
https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model
.html#processor-device-notification-values
drivers/acpi/processor_driver.c | 6 ++++++ drivers/cpufreq/cpufreq.c | 13 +++++++++++++ include/linux/cpufreq.h | 5 +++++ 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c index 4bd16b3f0781..29b2fb68a35d 100644 --- a/drivers/acpi/processor_driver.c +++ b/drivers/acpi/processor_driver.c @@ -27,6 +27,7 @@ #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80 #define ACPI_PROCESSOR_NOTIFY_POWER 0x81 #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82 +#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED 0x85
MODULE_AUTHOR("Paul Diefenbaugh");
MODULE_DESCRIPTION("ACPI
Processor Driver"); @@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data) acpi_bus_generate_netlink_event(device->pnp.device_class, dev_name(&device->dev), event, 0); break;
case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
cpufreq_update_highest_perf(pr->id);
And the design appears to be a bit ad-hoc here.
Because why does it have anything to do with cpufreq?
Well, clearly, cpufreq can be affected by this, but why would it be not affected the same way as in the
ACPI_PROCESSOR_NOTIFY_PERFORMANCE
case?
That is, why isn't cpufreq_update_limits() the right thing to do?
Seriously, I'm not going to apply this patch so long as my comments above are not addressed.
[Meng, Li (Jassmine)] Sorry for the delayed reply to the email. BIOS/AGESA is responsible to issue the Notify 0x85 to OS that the preferred core has changed. It will only affect the ranking of the preferred core, not the impact policy limits. AMD P-state driver will set the priority of the cores based on the preferred core ranking, and prioritize selecting higher priority core to run the task.
[AMD Official Use Only - General]
Hi Rafael:
-----Original Message----- From: Meng, Li (Jassmine) Sent: Tuesday, December 26, 2023 4:27 PM To: Rafael J. Wysocki rafael@kernel.org Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux- kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak Deepak.Sharma@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer Shimmer.Huang@amd.com; Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian Xiaojian.Du@amd.com; Viresh Kumar viresh.kumar@linaro.org; Borislav Petkov bp@alien8.de; Oleksandr Natalenko oleksandr@natalenko.name Subject: RE: [PATCH V12 4/7] cpufreq: Add a notification message that the highest perf has changed
Hi Rafael:
-----Original Message----- From: Rafael J. Wysocki rafael@kernel.org Sent: Tuesday, December 12, 2023 9:44 PM To: Meng, Li (Jassmine) Li.Meng@amd.com Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux- kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak
Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer
Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian
Viresh Kumar viresh.kumar@linaro.org; Borislav Petkov bp@alien8.de; Oleksandr Natalenko oleksandr@natalenko.name Subject: Re: [PATCH V12 4/7] cpufreq: Add a notification message that the highest perf has changed
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Wed, Dec 6, 2023 at 10:13 PM Rafael J. Wysocki rafael@kernel.org wrote:
On Wed, Dec 6, 2023 at 9:58 PM Rafael J. Wysocki rafael@kernel.org
wrote:
On Tue, Dec 5, 2023 at 7:38 AM Meng Li li.meng@amd.com wrote:
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emmitted to cause the the OSPM to re-evaluate the highest performance
Typos above. Given the number of iterations of this patch, this is kind of disappointing.
register. Add support for this event.
Also it would be nice to describe how this is supposed to work at least roughly, so it is not necessary to reverse-engineer the patch to find out that.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com Link:
https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model
.html#processor-device-notification-values
drivers/acpi/processor_driver.c | 6 ++++++ drivers/cpufreq/cpufreq.c | 13 +++++++++++++ include/linux/cpufreq.h | 5 +++++ 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c index 4bd16b3f0781..29b2fb68a35d 100644 --- a/drivers/acpi/processor_driver.c +++ b/drivers/acpi/processor_driver.c @@ -27,6 +27,7 @@ #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80 #define ACPI_PROCESSOR_NOTIFY_POWER 0x81 #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82 +#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED
0x85
MODULE_AUTHOR("Paul Diefenbaugh");
MODULE_DESCRIPTION("ACPI
Processor Driver"); @@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data) acpi_bus_generate_netlink_event(device->pnp.device_class, dev_name(&device->dev), event, 0); break;
case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
cpufreq_update_highest_perf(pr->id);
And the design appears to be a bit ad-hoc here.
Because why does it have anything to do with cpufreq?
Well, clearly, cpufreq can be affected by this, but why would it be not affected the same way as in the
ACPI_PROCESSOR_NOTIFY_PERFORMANCE
case?
That is, why isn't cpufreq_update_limits() the right thing to do?
Seriously, I'm not going to apply this patch so long as my comments above are not addressed.
[Meng, Li (Jassmine)] Sorry for the delayed reply to the email. BIOS/AGESA is responsible to issue the Notify 0x85 to OS that the preferred core has changed. It will only affect the ranking of the preferred core, not the impact policy limits. AMD P-state driver will set the priority of the cores based on the preferred core ranking, and prioritize selecting higher priority core to run the task.
[Meng, Li (Jassmine)] From ACPI v6.5, Table 5.197 Processor Device Notification Values: Hex value Description 0x80 Performance Present Capabilities Changed. Used to notify OSPM that the number of supported processor performance states has changed. This notification causes OSPM to re-evaluate the _PPC object. See Section 8.4.5.3 for more information.
0x85 Highest Performance Changed. Used to notify OSPM that the value of the CPPC Highest Performance Register has changed.
I think they are different notify events, so they need different functions to handle these events.
Also see: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#proc...
On Wed, Dec 27, 2023 at 2:40 AM Meng, Li (Jassmine) Li.Meng@amd.com wrote:
[AMD Official Use Only - General]
Hi Rafael:
-----Original Message----- From: Meng, Li (Jassmine) Sent: Tuesday, December 26, 2023 4:27 PM To: Rafael J. Wysocki rafael@kernel.org Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux- kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak Deepak.Sharma@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer Shimmer.Huang@amd.com; Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian Xiaojian.Du@amd.com; Viresh Kumar viresh.kumar@linaro.org; Borislav Petkov bp@alien8.de; Oleksandr Natalenko oleksandr@natalenko.name Subject: RE: [PATCH V12 4/7] cpufreq: Add a notification message that the highest perf has changed
Hi Rafael:
-----Original Message----- From: Rafael J. Wysocki rafael@kernel.org Sent: Tuesday, December 12, 2023 9:44 PM To: Meng, Li (Jassmine) Li.Meng@amd.com Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux- kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak
Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer
Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian
Viresh Kumar viresh.kumar@linaro.org; Borislav Petkov bp@alien8.de; Oleksandr Natalenko oleksandr@natalenko.name Subject: Re: [PATCH V12 4/7] cpufreq: Add a notification message that the highest perf has changed
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Wed, Dec 6, 2023 at 10:13 PM Rafael J. Wysocki rafael@kernel.org wrote:
On Wed, Dec 6, 2023 at 9:58 PM Rafael J. Wysocki rafael@kernel.org
wrote:
On Tue, Dec 5, 2023 at 7:38 AM Meng Li li.meng@amd.com wrote:
ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event 0x85 can be emmitted to cause the the OSPM to re-evaluate the highest performance
Typos above. Given the number of iterations of this patch, this is kind of disappointing.
register. Add support for this event.
Also it would be nice to describe how this is supposed to work at least roughly, so it is not necessary to reverse-engineer the patch to find out that.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com Link:
https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model
.html#processor-device-notification-values
drivers/acpi/processor_driver.c | 6 ++++++ drivers/cpufreq/cpufreq.c | 13 +++++++++++++ include/linux/cpufreq.h | 5 +++++ 3 files changed, 24 insertions(+)
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c index 4bd16b3f0781..29b2fb68a35d 100644 --- a/drivers/acpi/processor_driver.c +++ b/drivers/acpi/processor_driver.c @@ -27,6 +27,7 @@ #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80 #define ACPI_PROCESSOR_NOTIFY_POWER 0x81 #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82 +#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED
0x85
MODULE_AUTHOR("Paul Diefenbaugh");
MODULE_DESCRIPTION("ACPI
Processor Driver"); @@ -83,6 +84,11 @@ static void acpi_processor_notify(acpi_handle handle, u32 event, void *data) acpi_bus_generate_netlink_event(device->pnp.device_class, dev_name(&device->dev), event, 0); break;
case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED:
cpufreq_update_highest_perf(pr->id);
And the design appears to be a bit ad-hoc here.
Because why does it have anything to do with cpufreq?
Well, clearly, cpufreq can be affected by this, but why would it be not affected the same way as in the
ACPI_PROCESSOR_NOTIFY_PERFORMANCE
case?
That is, why isn't cpufreq_update_limits() the right thing to do?
Seriously, I'm not going to apply this patch so long as my comments above are not addressed.
[Meng, Li (Jassmine)] Sorry for the delayed reply to the email. BIOS/AGESA is responsible to issue the Notify 0x85 to OS that the preferred core has changed. It will only affect the ranking of the preferred core, not the impact policy limits. AMD P-state driver will set the priority of the cores based on the preferred core ranking, and prioritize selecting higher priority core to run the task.
[Meng, Li (Jassmine)] From ACPI v6.5, Table 5.197 Processor Device Notification Values: Hex value Description 0x80 Performance Present Capabilities Changed. Used to notify OSPM that the number of supported processor performance states has changed. This notification causes OSPM to re-evaluate the _PPC object. See Section 8.4.5.3 for more information.
0x85 Highest Performance Changed. Used to notify OSPM that the value of the CPPC Highest Performance Register has changed.
I think they are different notify events, so they need different functions to handle these events.
But they effectively mean pretty much the same thing: the highest available performance state of the CPU has changed.
Why would the response need to be different?
[AMD Official Use Only - General]
Hi Raphael:
-----Original Message----- From: Rafael J. Wysocki rafael@kernel.org Sent: Thursday, December 28, 2023 1:04 AM To: Meng, Li (Jassmine) Li.Meng@amd.com Cc: Rafael J. Wysocki rafael@kernel.org; Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux- pm@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org; linux- acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux- kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak Deepak.Sharma@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer Shimmer.Huang@amd.com; Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian Xiaojian.Du@amd.com; Viresh Kumar viresh.kumar@linaro.org; Borislav Petkov bp@alien8.de; Oleksandr Natalenko oleksandr@natalenko.name Subject: Re: [PATCH V12 4/7] cpufreq: Add a notification message that the highest perf has changed
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Wed, Dec 27, 2023 at 2:40 AM Meng, Li (Jassmine) Li.Meng@amd.com wrote:
[AMD Official Use Only - General]
Hi Rafael:
-----Original Message----- From: Meng, Li (Jassmine) Sent: Tuesday, December 26, 2023 4:27 PM To: Rafael J. Wysocki rafael@kernel.org Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux- kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak
Deucher, Alexander Alexander.Deucher@amd.com; Limonciello, Mario Mario.Limonciello@amd.com; Huang, Shimmer
Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian Xiaojian.Du@amd.com; Viresh Kumar viresh.kumar@linaro.org; Borislav Petkov bp@alien8.de; Oleksandr Natalenko oleksandr@natalenko.name Subject: RE: [PATCH V12 4/7] cpufreq: Add a notification message that the highest perf has changed
Hi Rafael:
-----Original Message----- From: Rafael J. Wysocki rafael@kernel.org Sent: Tuesday, December 12, 2023 9:44 PM To: Meng, Li (Jassmine) Li.Meng@amd.com Cc: Rafael J . Wysocki rafael.j.wysocki@intel.com; Huang, Ray Ray.Huang@amd.com; linux-pm@vger.kernel.org; linux- kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan skhan@linuxfoundation.org; linux-kselftest@vger.kernel.org; Fontenot, Nathan Nathan.Fontenot@amd.com; Sharma, Deepak
Deucher, Alexander Alexander.Deucher@amd.com; Limonciello,
Mario
Mario.Limonciello@amd.com; Huang, Shimmer
Yuan, Perry Perry.Yuan@amd.com; Du, Xiaojian
Viresh Kumar viresh.kumar@linaro.org; Borislav Petkov bp@alien8.de; Oleksandr Natalenko oleksandr@natalenko.name Subject: Re: [PATCH V12 4/7] cpufreq: Add a notification message that the highest perf has changed
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Wed, Dec 6, 2023 at 10:13 PM Rafael J. Wysocki rafael@kernel.org wrote:
On Wed, Dec 6, 2023 at 9:58 PM Rafael J. Wysocki rafael@kernel.org
wrote:
On Tue, Dec 5, 2023 at 7:38 AM Meng Li li.meng@amd.com wrote: > > ACPI 6.5 section 8.4.6.1.1.1 specifies that Notify event > 0x85 can be emmitted to cause the the OSPM to re-evaluate > the highest performance
Typos above. Given the number of iterations of this patch, this is kind of disappointing.
> register. Add support for this event.
Also it would be nice to describe how this is supposed to work at least roughly, so it is not necessary to reverse-engineer the patch to find out that.
> Tested-by: Oleksandr Natalenko oleksandr@natalenko.name > Reviewed-by: Mario Limonciello mario.limonciello@amd.com > Reviewed-by: Huang Rui ray.huang@amd.com > Reviewed-by: Perry Yuan perry.yuan@amd.com > Signed-off-by: Meng Li li.meng@amd.com > Link: >
https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model
> .html#processor-device-notification-values > --- > drivers/acpi/processor_driver.c | 6 ++++++ > drivers/cpufreq/cpufreq.c | 13 +++++++++++++ > include/linux/cpufreq.h | 5 +++++ > 3 files changed, 24 insertions(+) > > diff --git a/drivers/acpi/processor_driver.c > b/drivers/acpi/processor_driver.c index > 4bd16b3f0781..29b2fb68a35d > 100644 > --- a/drivers/acpi/processor_driver.c > +++ b/drivers/acpi/processor_driver.c > @@ -27,6 +27,7 @@ > #define ACPI_PROCESSOR_NOTIFY_PERFORMANCE 0x80 > #define ACPI_PROCESSOR_NOTIFY_POWER 0x81 > #define ACPI_PROCESSOR_NOTIFY_THROTTLING 0x82 > +#define ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED
0x85
> > MODULE_AUTHOR("Paul Diefenbaugh");
MODULE_DESCRIPTION("ACPI
> Processor Driver"); @@ -83,6 +84,11 @@ static void > acpi_processor_notify(acpi_handle handle, u32 event, void *data) > acpi_bus_generate_netlink_event(device-
pnp.device_class,
> dev_name(&device->dev), event, 0); > break; > + case ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED: > + cpufreq_update_highest_perf(pr->id);
And the design appears to be a bit ad-hoc here.
Because why does it have anything to do with cpufreq?
Well, clearly, cpufreq can be affected by this, but why would it be not affected the same way as in the
ACPI_PROCESSOR_NOTIFY_PERFORMANCE
case?
That is, why isn't cpufreq_update_limits() the right thing to do?
Seriously, I'm not going to apply this patch so long as my comments above are not addressed.
[Meng, Li (Jassmine)] Sorry for the delayed reply to the email. BIOS/AGESA is responsible to issue the Notify 0x85 to OS that the preferred core has changed. It will only affect the ranking of the preferred core, not the impact policy limits. AMD P-state driver will set the priority of the cores based on the preferred core ranking, and prioritize selecting higher priority core to run
the task.
[Meng, Li (Jassmine)] From ACPI v6.5, Table 5.197 Processor Device Notification Values: Hex value Description 0x80 Performance Present Capabilities Changed. Used to notify
OSPM that the number of supported processor performance states has changed. This notification causes OSPM to re-evaluate the _PPC object. See Section 8.4.5.3 for more information.
0x85 Highest Performance Changed. Used to notify OSPM that
the value of the CPPC Highest Performance Register has changed.
I think they are different notify events, so they need different functions to
handle these events.
But they effectively mean pretty much the same thing: the highest available performance state of the CPU has changed.
Why would the response need to be different?
[Meng, Li (Jassmine)] Thanks, I will modify this issue.
Preferred core rankings can be changed dynamically by the platform based on the workload and platform conditions and accounting for thermals and aging. When this occurs, cpu priority need to be set.
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Wyes Karny wyes.karny@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com --- drivers/cpufreq/amd-pstate.c | 44 ++++++++++++++++++++++++++++++++++++ include/linux/amd-pstate.h | 6 +++++ 2 files changed, 50 insertions(+)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 9c2790753f99..25f0fb53d320 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -315,6 +315,7 @@ static int pstate_init_perf(struct amd_cpudata *cpudata) WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1)); WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1)); WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1)); + WRITE_ONCE(cpudata->prefcore_ranking, AMD_CPPC_HIGHEST_PERF(cap1)); WRITE_ONCE(cpudata->min_limit_perf, AMD_CPPC_LOWEST_PERF(cap1)); return 0; } @@ -339,6 +340,7 @@ static int cppc_init_perf(struct amd_cpudata *cpudata) WRITE_ONCE(cpudata->lowest_nonlinear_perf, cppc_perf.lowest_nonlinear_perf); WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf); + WRITE_ONCE(cpudata->prefcore_ranking, cppc_perf.highest_perf); WRITE_ONCE(cpudata->min_limit_perf, cppc_perf.lowest_perf);
if (cppc_state == AMD_PSTATE_ACTIVE) @@ -785,6 +787,32 @@ static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata) schedule_work(&sched_prefcore_work); }
+static void amd_pstate_update_highest_perf(unsigned int cpu) +{ + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); + struct amd_cpudata *cpudata = policy->driver_data; + u32 prev_high = 0, cur_high = 0; + int ret; + + if ((!amd_pstate_prefcore) || (!cpudata->hw_prefcore)) + goto free_cpufreq_put; + + ret = amd_pstate_get_highest_perf(cpu, &cur_high); + if (ret) + goto free_cpufreq_put; + + prev_high = READ_ONCE(cpudata->prefcore_ranking); + if (prev_high != cur_high) { + WRITE_ONCE(cpudata->prefcore_ranking, cur_high); + + if (cur_high < CPPC_MAX_PERF) + sched_set_itmt_core_prio((int)cur_high, cpu); + } + +free_cpufreq_put: + cpufreq_cpu_put(policy); +} + static int amd_pstate_cpu_init(struct cpufreq_policy *policy) { int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret; @@ -958,6 +986,17 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy, return sysfs_emit(buf, "%u\n", perf); }
+static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy, + char *buf) +{ + u32 perf; + struct amd_cpudata *cpudata = policy->driver_data; + + perf = READ_ONCE(cpudata->prefcore_ranking); + + return sysfs_emit(buf, "%u\n", perf); +} + static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy, char *buf) { @@ -1176,6 +1215,7 @@ cpufreq_freq_attr_ro(amd_pstate_max_freq); cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
cpufreq_freq_attr_ro(amd_pstate_highest_perf); +cpufreq_freq_attr_ro(amd_pstate_prefcore_ranking); cpufreq_freq_attr_ro(amd_pstate_hw_prefcore); cpufreq_freq_attr_rw(energy_performance_preference); cpufreq_freq_attr_ro(energy_performance_available_preferences); @@ -1186,6 +1226,7 @@ static struct freq_attr *amd_pstate_attr[] = { &amd_pstate_max_freq, &amd_pstate_lowest_nonlinear_freq, &amd_pstate_highest_perf, + &amd_pstate_prefcore_ranking, &amd_pstate_hw_prefcore, NULL, }; @@ -1194,6 +1235,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = { &amd_pstate_max_freq, &amd_pstate_lowest_nonlinear_freq, &amd_pstate_highest_perf, + &amd_pstate_prefcore_ranking, &amd_pstate_hw_prefcore, &energy_performance_preference, &energy_performance_available_preferences, @@ -1538,6 +1580,7 @@ static struct cpufreq_driver amd_pstate_driver = { .suspend = amd_pstate_cpu_suspend, .resume = amd_pstate_cpu_resume, .set_boost = amd_pstate_set_boost, + .update_highest_perf = amd_pstate_update_highest_perf, .name = "amd-pstate", .attr = amd_pstate_attr, }; @@ -1552,6 +1595,7 @@ static struct cpufreq_driver amd_pstate_epp_driver = { .online = amd_pstate_epp_cpu_online, .suspend = amd_pstate_epp_suspend, .resume = amd_pstate_epp_resume, + .update_highest_perf = amd_pstate_update_highest_perf, .name = "amd-pstate-epp", .attr = amd_pstate_epp_attr, }; diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h index 68fc1bd8d851..d21838835abd 100644 --- a/include/linux/amd-pstate.h +++ b/include/linux/amd-pstate.h @@ -39,11 +39,16 @@ struct amd_aperf_mperf { * @cppc_req_cached: cached performance request hints * @highest_perf: the maximum performance an individual processor may reach, * assuming ideal conditions + * For platforms that do not support the preferred core feature, the + * highest_pef may be configured with 166 or 255, to avoid max frequency + * calculated wrongly. we take the fixed value as the highest_perf. * @nominal_perf: the maximum sustained performance level of the processor, * assuming ideal operating conditions * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power * savings are achieved * @lowest_perf: the absolute lowest performance level of the processor + * @prefcore_ranking: the preferred core ranking, the higher value indicates a higher + * priority. * @max_freq: the frequency that mapped to highest_perf * @min_freq: the frequency that mapped to lowest_perf * @nominal_freq: the frequency that mapped to nominal_perf @@ -73,6 +78,7 @@ struct amd_cpudata { u32 nominal_perf; u32 lowest_nonlinear_perf; u32 lowest_perf; + u32 prefcore_ranking; u32 min_limit_perf; u32 max_limit_perf; u32 min_limit_freq;
Introduce amd-pstate preferred core.
check preferred core state set by the kernel parameter: $ cat /sys/devices/system/cpu/amd-pstate/prefcore
Tested-by: Oleksandr Natalenko oleksandr@natalenko.name Reviewed-by: Wyes Karny wyes.karny@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Meng Li li.meng@amd.com --- Documentation/admin-guide/pm/amd-pstate.rst | 59 ++++++++++++++++++++- 1 file changed, 57 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst index 1cf40f69278c..0b832ff529db 100644 --- a/Documentation/admin-guide/pm/amd-pstate.rst +++ b/Documentation/admin-guide/pm/amd-pstate.rst @@ -300,8 +300,8 @@ platforms. The AMD P-States mechanism is the more performance and energy efficiency frequency management method on AMD processors.
-AMD Pstate Driver Operation Modes -================================= +``amd-pstate`` Driver Operation Modes +======================================
``amd_pstate`` CPPC has 3 operation modes: autonomous (active) mode, non-autonomous (passive) mode and guided autonomous (guided) mode. @@ -353,6 +353,48 @@ is activated. In this mode, driver requests minimum and maximum performance level and the platform autonomously selects a performance level in this range and appropriate to the current workload.
+``amd-pstate`` Preferred Core +================================= + +The core frequency is subjected to the process variation in semiconductors. +Not all cores are able to reach the maximum frequency respecting the +infrastructure limits. Consequently, AMD has redefined the concept of +maximum frequency of a part. This means that a fraction of cores can reach +maximum frequency. To find the best process scheduling policy for a given +scenario, OS needs to know the core ordering informed by the platform through +highest performance capability register of the CPPC interface. + +``amd-pstate`` preferred core enables the scheduler to prefer scheduling on +cores that can achieve a higher frequency with lower voltage. The preferred +core rankings can dynamically change based on the workload, platform conditions, +thermals and ageing. + +The priority metric will be initialized by the ``amd-pstate`` driver. The ``amd-pstate`` +driver will also determine whether or not ``amd-pstate`` preferred core is +supported by the platform. + +``amd-pstate`` driver will provide an initial core ordering when the system boots. +The platform uses the CPPC interfaces to communicate the core ranking to the +operating system and scheduler to make sure that OS is choosing the cores +with highest performance firstly for scheduling the process. When ``amd-pstate`` +driver receives a message with the highest performance change, it will +update the core ranking and set the cpu's priority. + +``amd-pstate`` Preferred Core Switch +================================= +Kernel Parameters +----------------- + +``amd-pstate`` peferred core`` has two states: enable and disable. +Enable/disable states can be chosen by different kernel parameters. +Default enable ``amd-pstate`` preferred core. + +``amd_prefcore=disable`` + +For systems that support ``amd-pstate`` preferred core, the core rankings will +always be advertised by the platform. But OS can choose to ignore that via the +kernel parameter ``amd_prefcore=disable``. + User Space Interface in ``sysfs`` - General ===========================================
@@ -385,6 +427,19 @@ control its functionality at the system level. They are located in the to the operation mode represented by that string - or to be unregistered in the "disable" case.
+``prefcore`` + Preferred core state of the driver: "enabled" or "disabled". + + "enabled" + Enable the ``amd-pstate`` preferred core. + + "disabled" + Disable the ``amd-pstate`` preferred core + + + This attribute is read-only to check the state of preferred core set + by the kernel parameter. + ``cpupower`` tool support for ``amd-pstate`` ===============================================
amd-pstate driver support enable/disable preferred core. Default enabled on platforms supporting amd-pstate preferred core. Disable amd-pstate preferred core with "amd_prefcore=disable" added to the kernel command line.
Signed-off-by: Meng Li li.meng@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Wyes Karny wyes.karny@amd.com Reviewed-by: Huang Rui ray.huang@amd.com Reviewed-by: Perry Yuan perry.yuan@amd.com Tested-by: Oleksandr Natalenko oleksandr@natalenko.name --- Documentation/admin-guide/kernel-parameters.txt | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 65731b060e3f..cbfa63a87e4a 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -363,6 +363,11 @@ selects a performance level in this range and appropriate to the current workload.
+ amd_prefcore= + [X86] + disable + Disable amd-pstate preferred core. + amijoy.map= [HW,JOY] Amiga joystick support Map of devices attached to JOY0DAT and JOY1DAT Format: <a>,<b>
linux-kselftest-mirror@lists.linaro.org