On 8/30/2025 4:00 AM, Song Liu wrote:
Hi,
Thanks for the patchset.
Some logistics:
- Please prefix future patches properly with "bpf" or "bpf-next", for example,
[PATCH v2 bpf-next 1/2].
- Please be specific with the patch title, i.e. "selftests/bpf: Add selftests"
should be something like "selftests/bpf: Add selftests for cpu-idle ext".
Yes, I'll update both.
On Fri, Aug 29, 2025 at 3:11 AM Lin Yikai <yikai.lin@vivo.com> wrote:
Summary
Hi everyone, this patch set introduces an extensible cpuidle governor framework using BPF struct_ops, enabling dynamic implementation of idle-state selection policies via BPF programs.
Motivation
As is well known, CPUs support multiple idle states (e.g., C0, C1, C2, ...), where deeper states reduce power consumption but result in longer wakeup latency, potentially affecting performance. Existing generic cpuidle governors work well in common scenarios but behave suboptimally in some Android phone use cases.
Our testing reveals that during low-utilization scenarios (e.g., screen-off background tasks like music playback with CPU utilization <10%), the C0 state occupies ~50% of idle time, causing significant energy inefficiency. Reducing C0 to ≤20% could yield ≥5% power savings on mobile phones.
To address this, we expect:
1. Dynamic governor switching to power-saving policies for low CPU utilization scenarios (e.g., screen-off mode)
2. Dynamic switching to alternate governors for high-performance scenarios (e.g., gaming)
Overview
The BPF cpuidle ext governor registers at postcore_initcall() but remains disabled by default because of its low "rating" of 1. To activate it, the BPF program must raise its "rating" above that of the other governors.
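To make the rating rule concrete, here is a minimal userspace sketch of "highest rating wins". It is a model for discussion only, not the kernel's registration code: struct gov_model and pick_governor() are hypothetical names, and the tie-breaking rule (first entry wins) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: hypothetical names, not kernel structures. */
struct gov_model {
    const char *name;
    unsigned int rating;
};

/* Return the entry with the highest rating; the first entry wins a tie.
 * Returns NULL when the list is empty. */
static const struct gov_model *pick_governor(const struct gov_model *govs,
                                             size_t n)
{
    const struct gov_model *best = NULL;

    assert(govs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || govs[i].rating > best->rating)
            best = &govs[i];
    }
    return best;
}
```

With the ext entry left at rating 1 it is never picked; once its rating is raised above every other entry, it becomes the selected governor.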
Core Components:
1. **struct cpuidle_gov_ext_ops** – BPF-overridable operations:
- ops.enable()/ops.disable(): enable or disable callbacks
- ops.select(): CPU idle-state selection logic
- ops.set_stop_tick(): Scheduler tick management after state selection
- ops.reflect(): feedback info about previous idle state.
- ops.init()/ops.deinit(): Initialization or cleanup.
2. **Critical kfuncs for kernel state access**:
- bpf_cpuidle_ext_gov_update_rating(): activates the ext governor by raising its rating; must be called from "ops.init()"
- bpf_cpuidle_ext_gov_latency_req(): get idle-state latency constraints
- bpf_tick_nohz_get_sleep_length(): get CPU sleep duration in tickless mode
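As an illustration of how a select callback could combine the two inputs that the kfuncs expose (a latency constraint and an expected sleep length), here is a userspace sketch. Everything in it is an assumption made for the example: the struct names, the callback signature, the microsecond units, and the "deepest state that fits" policy. The real struct cpuidle_gov_ext_ops signatures are defined by the patches and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-state parameters, in microseconds. */
struct idle_state_model {
    uint64_t exit_latency_us;
    uint64_t target_residency_us;
};

/* Hypothetical callback table standing in for the BPF struct_ops. */
struct gov_ext_ops_model {
    int (*select)(const struct idle_state_model *states, size_t n,
                  uint64_t latency_req_us, uint64_t sleep_len_us);
};

/* Pick the deepest state that satisfies both constraints.
 * States are assumed to be ordered from shallowest (index 0) to deepest,
 * with non-decreasing exit latency and target residency, so the scan
 * stops at the first state that violates either limit. */
static int select_deepest_fit(const struct idle_state_model *states, size_t n,
                              uint64_t latency_req_us, uint64_t sleep_len_us)
{
    int chosen = 0;

    assert(states != NULL && n > 0);
    for (size_t i = 1; i < n; i++) {
        if (states[i].exit_latency_us > latency_req_us)
            break;
        if (states[i].target_residency_us > sleep_len_us)
            break;
        chosen = (int)i;
    }
    return chosen;
}

/* A power-saving policy instance wired into the model table. */
static const struct gov_ext_ops_model power_save_ops = {
    .select = select_deepest_fit,
};
```

State 0 is always eligible in this sketch, so a tight latency constraint or a short expected sleep simply falls back to the shallowest state.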
Future work
- Scenario detection: Identifying low-utilization states (e.g., screen-off + background music)
- Policy optimization: Optimizing state-selection algorithms for specific scenarios
I am not an expert on cpuidle, so pardon me if the following are rookie questions. But I guess some more detail will help other folks too.
Thanks very much for your comments.
The cpuidle framework is as follows. (I'll add it to the next version, v2.)

    ----------------------------------------------------------
                          Scheduler Core
    ----------------------------------------------------------
                                |
                                v
    ----------------------------------------------------------
    |  FAIR Class  |  EXT Class  |         IDLE Class        |
    ----------------------------------------------------------
    |              |             |                   |
    |              |             |                   v
    |              |             |       ------------------------
    |              |             |           enter_cpu_idle()
    |              |             |       ------------------------
    |              |             |                   |
    |              |             |                   v
    |              |             |  -----------------------------------
    |              |             |  |        CPUIDLE Governor         |
    |              |             |  -----------------------------------
    |              |             |     |             |             |
    |              |             |     v             v             v
    |              |             |---------------------------------------
    |              |             | default  |  | other    |  | BPF ext  |
    |              |             | Governor |  | Governor |  | Governor | <=== Here is the feature we add.
    |              |             |---------------------------------------
    |              |             |     |             |             |
    |              |             |     v             v             v
    |              |             |---------------------------------------
    |              |             |          select idle state           |
    |              |             |---------------------------------------

1. It is not clear to me why a BPF based solution is needed here. Can we achieve similar benefits with a knob and some userspace daemon?

Each time the system switches to the idle class, it requires a governor policy to select the correct idle state.
Currently, we can only switch governor policies through sysfs nodes, as shown below:

    / # ls /sys/devices/system/cpu/cpuidle/
    available_governors  current_driver  current_governor  current_governor_ro
    / # cat /sys/devices/system/cpu/cpuidle/available_governors
    menu teo qcom-cpu-lpm
    / # cat /sys/devices/system/cpu/cpuidle/current_governor
    qcom-cpu-lpm    <=== Here we can echo a governor name to this node to switch it.

However, it is not possible to change the implementation of this policy through user interfaces.
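For reference, this is roughly what a userspace daemon can do today: write a governor name into the current_governor node. The helper below is a minimal sketch with a hypothetical name, set_cpuidle_governor(). It only selects among governors the kernel already provides and cannot alter how any of them picks an idle state.

```c
#include <assert.h>
#include <stdio.h>

/* Write a governor name to the given sysfs-style path.
 * Returns 0 on success, -1 on failure. */
static int set_cpuidle_governor(const char *path, const char *name)
{
    FILE *f;
    int ok;

    assert(path != NULL && name != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(name, f) >= 0;
    /* fclose() flushes the buffer, so a failed write can surface here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as set_cpuidle_governor("/sys/devices/system/cpu/cpuidle/current_governor", "teo") would need sufficient privileges and a kernel that has the named governor available.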
- Is it possible to extend sched_ext to cover cpuidle logic?
The cpuidle governor decides which idle state to enter each time the CPU switches to the idle class. sched_ext determines the scheduling order of tasks, whereas cpuidle is invoked only after the CPU has switched to the idle class because no tasks are runnable. The two are not closely related, so it is not feasible to implement this through kfuncs or other extensions of sched_ext.

Thanks,
Song
Thanks for your comments.