On Tue, Oct 03, 2023 at 09:08:44PM -0700, Namhyung Kim wrote:
> But after the change, it ended up iterating over all pmus/events in the cpu context whenever there is a cgroup event anywhere in that cpu context. Unfortunately, that includes uncore pmus, which have much longer latency to control.
Can you describe the problem in more detail please?
We have cgrp as part of the tree key, {cpu, pmu, cgroup, idx}, so it should be possible to find a specific cgroup on a given cpu and/or skip to the next cgroup on that cpu in O(log n) time.
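To illustrate the O(log n) argument, here is a minimal sketch. The kernel keeps these events in an rbtree. This sketch substitutes a sorted array and binary search, which use the same lexicographic ordering and give the same O(log n) lookups. struct ev_key, key_cmp, lower_bound, upper_bound, first_in_cgroup, next_cgroup and check_sorted are hypothetical names. PMUs and cgroups are reduced to plain integer ids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tree key {cpu, pmu, cgroup, idx}. */
struct ev_key {
	int cpu;
	uint32_t pmu;    /* hypothetical PMU id */
	uint64_t cgroup; /* hypothetical cgroup id */
	uint64_t idx;    /* insertion order within the same (cpu, pmu, cgroup) */
};

/* Lexicographic compare: cpu, then pmu, then cgroup, then idx. */
static int key_cmp(const struct ev_key *a, const struct ev_key *b)
{
	if (a->cpu != b->cpu)
		return a->cpu < b->cpu ? -1 : 1;
	if (a->pmu != b->pmu)
		return a->pmu < b->pmu ? -1 : 1;
	if (a->cgroup != b->cgroup)
		return a->cgroup < b->cgroup ? -1 : 1;
	if (a->idx != b->idx)
		return a->idx < b->idx ? -1 : 1;
	return 0;
}

/* Binary search requires sorted input; check that precondition. */
static void check_sorted(const struct ev_key *evs, size_t n)
{
	for (size_t i = 1; i < n; i++)
		assert(key_cmp(&evs[i - 1], &evs[i]) < 0);
}

/* Index of the first element >= key, or n if none; O(log n). */
static size_t lower_bound(const struct ev_key *evs, size_t n,
			  const struct ev_key *key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (key_cmp(&evs[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Index of the first element > key, or n if none; O(log n). */
static size_t upper_bound(const struct ev_key *evs, size_t n,
			  const struct ev_key *key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (key_cmp(&evs[mid], key) <= 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* First event of (cpu, pmu, cgrp): search with the smallest idx.
 * If that cgroup has no events, this lands on whatever sorts next. */
static size_t first_in_cgroup(const struct ev_key *evs, size_t n,
			      int cpu, uint32_t pmu, uint64_t cgrp)
{
	struct ev_key k = { cpu, pmu, cgrp, 0 };

	return lower_bound(evs, n, &k);
}

/* Skip every event of (cpu, pmu, cgrp) and land on the next cgroup
 * (or the next pmu/cpu) by searching past the largest possible idx. */
static size_t next_cgroup(const struct ev_key *evs, size_t n,
			  int cpu, uint32_t pmu, uint64_t cgrp)
{
	struct ev_key k = { cpu, pmu, cgrp, UINT64_MAX };

	return upper_bound(evs, n, &k);
}
```

Both lookups are single binary searches, so neither has to walk the events belonging to other pmus or cgroups on that cpu.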
> To fix the issue, I restored a linked list equivalent to cgrp_cpuctx_list in the perf_cpu_context and linked only the perf_cpu_pmu_contexts that have cgroup events. I also added new helpers to enable/disable and do ctx sched in/out for cgroups.
Adding a list and duplicating the whole scheduling infrastructure seems 'unfortunate' at best.