On Oct 2, 2019, at 11:43 PM, Song Liu <songliubraving@fb.com> wrote:
This is a rare corner case, but it does happen:
In perf_rotate_context(), when the first cpu flexible event fails to schedule, cpu_rotate is 1 while cpu_event is NULL. Since cpu_event is NULL, perf_rotate_context() will _NOT_ call cpu_ctx_sched_out(), so cpuctx->ctx.is_active keeps EVENT_FLEXIBLE set. The next perf_event_sched_in() then skips all cpu flexible events because of the EVENT_FLEXIBLE bit.
In the next call of perf_rotate_context(), cpu_rotate stays 1 and cpu_event stays NULL, so this process repeats. The end result is that flexible events on this cpu will not be scheduled (until another event is added to the cpuctx).
A similar issue may happen with the task_ctx, but it is usually not a problem because the task_ctx moves between different CPUs.
Fix this corner case by using cpu_rotate and task_rotate to gate the calls to (cpu_)ctx_sched_out() and rotate_ctx(). Also enable rotate_ctx() to handle the event == NULL case.
Here is an easy repro of this issue. On Intel CPUs, where ref-cycles can use only one counter, run one pinned event for ref-cycles, one flexible event for ref-cycles, and one flexible event for cycles. The flexible ref-cycles event is never scheduled, which is expected. However, because of this issue, the cycles event is never scheduled either.
  perf stat -e ref-cycles:D,ref-cycles,cycles -C 5 -I 1000

  #           time             counts unit events
       1.000152973         15,412,480      ref-cycles:D
       1.000152973      <not counted>      ref-cycles     (0.00%)
       1.000152973      <not counted>      cycles         (0.00%)
       2.000486957         18,263,120      ref-cycles:D
       2.000486957      <not counted>      ref-cycles     (0.00%)
       2.000486957      <not counted>      cycles         (0.00%)

Thanks,
Song