Hi, Srinivas,
Thanks for the patch.
On Tue, 2023-09-05 at 17:06 -0700, Srinivas Pandruvada wrote:
System runs at minimum performance, once powercap RAPL package domain "enabled" flag is toggled.
Setting RAPL package domain enabled flag to 0, results in setting of power limit 4 (PL4) MSR 0x601 to 0.
This is the bug. Setting enabled flag should only affect the Enable bit but not the other bits.
This implies disabling PL4 limit. The PL4 limit controls the peak power. This can significantly change the performance. Even worse, when the enabled flag is set to 1 again. This will set PL4 MSR value to 0x01, which means reduce peak power to 0.125W. This will force the system to run at the lowest possible performance.
This is caused by a change which assumes that there is an enable bit in the PL4 MSR like other power limits.
MSR RAPL doesn't have PL4 enable bit, but TPMI RAPL does have per power limit enable bit.
In functions set_floor_freq_default() and rapl_remove_package(), call rapl_write_pl_data with PL_ENABLE and PL_CLAMP for only power limit 1 and 2. Similarly don't read PL_ENABLE for PL4 to check the presence of power limit 4.
IMO, the rootcause is that we expose an non-exits PL4_ENABLE primitive for MSR Interface, and even worse we expose it wrongly that the primitive uses the power limit bits.
Commit 9050a9cd5e4c ("powercap: intel_rapl: Cleanup Power Limits support") pokes the MSR interface PL4_ENABLE primitive and exposes this bug.
Sumeet is testing the below fix right now, and I suppose he will give some update soon.
thanks, rui
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 8fac57b28f8a..d407f918876f 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -658,8 +658,6 @@ static struct rapl_primitive_info rpi_msr[NR_RAPL_PRIMITIVES] = { RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), [PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), - [PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE, POWER_LIMIT4_MASK, 0, - RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0), [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17, RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49,