On Thu, Oct 13, 2022 at 5:06 AM srinivas pandruvada srinivas.pandruvada@linux.intel.com wrote:
On Wed, 2022-10-12 at 18:58 +0200, Rafael J. Wysocki wrote:
On Tue, Oct 11, 2022 at 3:23 PM Chen Yu yu.c.chen@intel.com wrote:
Hi Pavel, On 2022-10-11 at 13:36:46 +0200, Pavel Machek wrote:
Hi!
From: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com
[ Upstream commit 68b99e94a4a2db6ba9b31fe0485e057b9354a640 ]
When CPU 0 is offline and intel_powerclamp is used to inject idle, it generates kernel BUG:
BUG: using smp_processor_id() in preemptible [00000000] code: bash/15687 caller is debug_smp_processor_id+0x17/0x20 CPU: 4 PID: 15687 Comm: bash Not tainted 5.19.0-rc7+ #57 Call Trace:
<TASK> dump_stack_lvl+0x49/0x63 dump_stack+0x10/0x16 check_preemption_disabled+0xdd/0xe0 debug_smp_processor_id+0x17/0x20 powerclamp_set_cur_state+0x7f/0xf9 [intel_powerclamp] ... ...
Here CPU 0 is the control CPU by default and changed to the current CPU, if CPU 0 offlined. This check has to be performed under cpus_read_lock(), hence the above warning.
Use get_cpu() instead of smp_processor_id() to avoid this BUG.
This has exactly the same problem as smp_processor_id(), you just worked around the warning. If it is okay that control_cpu contains stale value, could we have a comment explaining why?
May I know why does control_cpu have stale value? The control_cpu is a random picked online CPU which will be used later to collect statistics. As long as the control_cpu is online, it is valid IMO.
So this is confusing, because the code makes the impression that getting the number of the CPU running the code matters in some way, which isn't the case.
Something like cpumask_first(cpu_online_mask) should work as well if I'm not mistaken and it would be less confusing to use this instead IMO.
That should work as we are under hotplug lock anyway here.
Well, that's my point.
I guess I'll send a patch with this change.