This is a rare corner case, but it does happen:
In perf_rotate_context(), when the first cpu flexible event fails to schedule, cpu_rotate is 1, while cpu_event is NULL. Since cpu_event is NULL, perf_rotate_context() will _NOT_ call cpu_ctx_sched_out(), thus cpuctx->ctx.is_active will have EVENT_FLEXIBLE set. Then, the next perf_event_sched_in() will skip all cpu flexible events because of the EVENT_FLEXIBLE bit.
In the next call of perf_rotate_context(), cpu_rotate stays 1, and cpu_event stays NULL, so this process repeats. The end result is that flexible events on this cpu will not be scheduled (until another event is added to the cpuctx).
A similar issue may happen with the task_ctx, but it is usually not a problem because the task_ctx moves around different CPUs.
Fix this corner case by using cpu_rotate and task_rotate to gate the calls to (cpu_)ctx_sched_out() and rotate_ctx(). Also enable rotate_ctx() to handle the event == NULL case.
Fixes: 8d5bce0c37fa ("perf/core: Optimize perf_rotate_context() event scheduling")
Cc: stable@vger.kernel.org # v4.17+
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/events/core.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4655adbbae10..50021735f367 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3775,6 +3775,13 @@ static void rotate_ctx(struct perf_event_context *ctx, struct perf_event *event)
 	if (ctx->rotate_disable)
 		return;
 
+	/* if no event specified, try to rotate the first event */
+	if (!event)
+		event = rb_entry_safe(rb_first(&ctx->flexible_groups.tree),
+				      typeof(*event), group_node);
+	if (!event)
+		return;
+
 	perf_event_groups_delete(&ctx->flexible_groups, event);
 	perf_event_groups_insert(&ctx->flexible_groups, event);
 }
@@ -3816,14 +3823,14 @@ static bool perf_rotate_context(struct perf_cpu_context *cpuctx)
 	 * As per the order given at ctx_resched() first 'pop' task flexible
 	 * and then, if needed CPU flexible.
 	 */
-	if (task_event || (task_ctx && cpu_event))
+	if (task_rotate || (task_ctx && cpu_rotate))
 		ctx_sched_out(task_ctx, cpuctx, EVENT_FLEXIBLE);
-	if (cpu_event)
+	if (cpu_rotate)
 		cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 
-	if (task_event)
+	if (task_rotate)
 		rotate_ctx(task_ctx, task_event);
-	if (cpu_event)
+	if (cpu_rotate)
 		rotate_ctx(&cpuctx->ctx, cpu_event);
 
 	perf_event_sched_in(cpuctx, task_ctx, current);
On Oct 2, 2019, at 11:43 PM, Song Liu <songliubraving@fb.com> wrote:
This is a rare corner case, but it does happen:
In perf_rotate_context(), when the first cpu flexible event fails to schedule, cpu_rotate is 1, while cpu_event is NULL. Since cpu_event is NULL, perf_rotate_context() will _NOT_ call cpu_ctx_sched_out(), thus cpuctx->ctx.is_active will have EVENT_FLEXIBLE set. Then, the next perf_event_sched_in() will skip all cpu flexible events because of the EVENT_FLEXIBLE bit.

In the next call of perf_rotate_context(), cpu_rotate stays 1, and cpu_event stays NULL, so this process repeats. The end result is that flexible events on this cpu will not be scheduled (until another event is added to the cpuctx).

A similar issue may happen with the task_ctx, but it is usually not a problem because the task_ctx moves around different CPUs.

Fix this corner case by using cpu_rotate and task_rotate to gate the calls to (cpu_)ctx_sched_out() and rotate_ctx(). Also enable rotate_ctx() to handle the event == NULL case.
Here is an easy repro of this issue. On Intel CPUs, where ref-cycles can only use one counter, run one pinned event for ref-cycles, one flexible event for ref-cycles, and one flexible event for cycles. The flexible ref-cycles event is never scheduled, which is expected. However, because of this issue, the cycles event is never scheduled either.
perf stat -e ref-cycles:D,ref-cycles,cycles -C 5 -I 1000
#           time             counts unit events
     1.000152973         15,412,480      ref-cycles:D
     1.000152973      <not counted>      ref-cycles        (0.00%)
     1.000152973      <not counted>      cycles            (0.00%)
     2.000486957         18,263,120      ref-cycles:D
     2.000486957      <not counted>      ref-cycles        (0.00%)
     2.000486957      <not counted>      cycles            (0.00%)
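The output shows the loop described above: once the flexible ref-cycles event fails to schedule, EVENT_FLEXIBLE is never cleared, so cycles stays <not counted> on every interval. For illustration only, here is a minimal user-space sketch of that pre-patch gating behavior (this is not kernel code; the struct and function names below are made up):

/*
 * Minimal user-space model of the pre-patch logic: sched-out is gated on
 * cpu_event rather than cpu_rotate, so if the first flexible event failed
 * to schedule (cpu_event == NULL, EVENT_FLEXIBLE already set), flexible
 * events never get scheduled in again on later ticks.
 */
#include <stdbool.h>
#include <stdio.h>

#define EVENT_FLEXIBLE 0x1

struct model_cpuctx {
	int is_active;		/* mirrors cpuctx->ctx.is_active */
	int flexible_on_pmu;	/* flexible events currently running */
};

static void model_rotate(struct model_cpuctx *ctx, bool cpu_rotate, bool have_cpu_event)
{
	if (!cpu_rotate)
		return;				/* nothing to rotate */

	/* pre-patch: sched-out only happens when an event was picked */
	if (have_cpu_event)
		ctx->is_active &= ~EVENT_FLEXIBLE;	/* cpu_ctx_sched_out() */

	/* sched-in skips flexible groups while EVENT_FLEXIBLE is still set */
	if (!(ctx->is_active & EVENT_FLEXIBLE)) {
		ctx->is_active |= EVENT_FLEXIBLE;
		ctx->flexible_on_pmu = 1;
	}
}

int main(void)
{
	/* state after the first flexible event failed to schedule */
	struct model_cpuctx ctx = { .is_active = EVENT_FLEXIBLE, .flexible_on_pmu = 0 };

	for (int tick = 0; tick < 3; tick++) {
		model_rotate(&ctx, true /* cpu_rotate */, false /* cpu_event == NULL */);
		printf("tick %d: flexible events on PMU = %d\n", tick, ctx.flexible_on_pmu);
	}
	/* prints 0 on every tick: flexible events stay blocked until
	 * something else (e.g. adding a new event) clears EVENT_FLEXIBLE */
	return 0;
}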
Thanks, Song
Greetings,
FYI, we noticed a -6.8% regression of vm-scalability.median due to commit:
commit: a449d926203c733fecddc1fd077cf3d6ecef7f15 ("[PATCH] perf/core: fix corner case in perf_rotate_context()")
url: https://github.com/0day-ci/linux/commits/Song-Liu/perf-core-fix-corner-case-...
in testcase: vm-scalability
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 8G memory
with following parameters:
	runtime: 300s
	size: 8T
	test: anon-w-seq
	cpufreq_governor: performance
	ucode: 0x21
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <rong.a.chen@intel.com>
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/debian-x86_64-2019-09-23.cgz/300s/8T/lkp-ivb-d02/anon-w-seq/vm-scalability/0x21
commit:
  e336b40277 ("kprobes: Prohibit probing on BUG() and WARN() address")
  a449d92620 ("perf/core: fix corner case in perf_rotate_context()")
e336b4027775cb45  a449d926203c733fecddc1fd077
----------------  ---------------------------
  %stddev  %change  %stddev
      \        |        \
  0.00 ± 2%  +3.8%  0.00  vm-scalability.free_time
  1235827  -6.8%  1151821  vm-scalability.median
  0.02 ± 8%  +51.1%  0.03 ± 5%  vm-scalability.median_stddev
  4931119  -6.6%  4606513  vm-scalability.throughput
  1624883  -6.4%  1521193  vm-scalability.time.minor_page_faults
  228.43 ± 3%  +20.6%  275.45  vm-scalability.time.system_time
  919.50  -5.6%  867.64  vm-scalability.time.user_time
  11336  -3.9%  10895  vm-scalability.time.voluntary_context_switches
  1.432e+09  -7.1%  1.33e+09  vm-scalability.workload
  2.07 ± 4%  +0.7  2.81 ± 14%  mpstat.cpu.all.idle%
  20.04 ± 3%  +3.8  23.89  mpstat.cpu.all.sys%
  77.00  -5.8%  72.50  vmstat.cpu.us
  7171  -3.0%  6958  vmstat.system.cs
  12591513 ± 16%  +60.0%  20142845 ± 27%  cpuidle.C6.time
  21824 ± 11%  +44.6%  31557 ± 19%  cpuidle.C6.usage
  1123 ± 17%  +86.2%  2092 ± 24%  cpuidle.POLL.usage
  8568  +11.3%  9539  interrupts.CAL:Function_call_interrupts
  7614 ± 33%  +87.5%  14280 ± 25%  interrupts.CPU3.RES:Rescheduling_interrupts
  15566 ± 20%  +120.5%  34321 ± 28%  interrupts.NMI:Non-maskable_interrupts
  15566 ± 20%  +120.5%  34321 ± 28%  interrupts.PMI:Performance_monitoring_interrupts
  37492 ± 9%  +33.8%  50168 ± 2%  interrupts.RES:Rescheduling_interrupts
  31894 ± 6%  -19.7%  25618 ± 5%  softirqs.CPU0.RCU
  32409 ± 6%  -21.5%  25453 ± 8%  softirqs.CPU1.RCU
  35578 ± 3%  -26.0%  26329 ± 11%  softirqs.CPU2.RCU
  33765 ± 9%  -28.8%  24030 ± 3%  softirqs.CPU3.RCU
  133648 ± 4%  -24.1%  101433 ± 5%  softirqs.RCU
  21819 ± 11%  +44.6%  31556 ± 19%  turbostat.C6
  1.04 ± 16%  +0.6  1.66 ± 26%  turbostat.C6%
  0.27 ± 29%  +247.2%  0.94 ± 41%  turbostat.CPU%c6
  18.86  -2.1%  18.46  turbostat.CorWatt
  0.02 ± 44%  +520.0%  0.15 ± 40%  turbostat.Pkg%pc2
  0.02 ±101%  +1466.7%  0.35 ± 63%  turbostat.Pkg%pc6
  37.19  -1.5%  36.64  turbostat.PkgWatt
  2167377  +51.8%  3289728 ± 2%  meminfo.Active
  2165895  +51.8%  3288244 ± 2%  meminfo.Active(anon)
  2006718  +53.7%  3085007 ± 2%  meminfo.AnonHugePages
  2151900  +49.3%  3213568 ± 2%  meminfo.AnonPages
  141552  -60.7%  55678 ± 8%  meminfo.CmaFree
  4584114  -24.5%  3458742  meminfo.MemAvailable
  4694537  -24.0%  3569088  meminfo.MemFree
  3337073  +33.7%  4462514  meminfo.Memused
  7677  +27.5%  9791  meminfo.PageTables
  9532 ± 12%  -38.9%  5820 ± 20%  sched_debug.cfs_rq:/.min_vruntime.stddev
  29.67 ± 4%  -17.8%  24.38 ± 3%  sched_debug.cfs_rq:/.nr_spread_over.max
  11.21 ± 6%  -16.5%  9.36 ± 2%  sched_debug.cfs_rq:/.nr_spread_over.stddev
  103675 ± 28%  -22.0%  80844 ± 6%  sched_debug.cfs_rq:/.runnable_weight.stddev
  20991 ± 40%  -72.0%  5881 ± 52%  sched_debug.cfs_rq:/.spread0.max
  9528 ± 12%  -38.9%  5820 ± 20%  sched_debug.cfs_rq:/.spread0.stddev
  371922 ± 10%  +22.9%  456951 ± 8%  sched_debug.cpu.avg_idle.min
  191541 ± 12%  -19.1%  154992 ± 8%  sched_debug.cpu.avg_idle.stddev
  318.92 ± 68%  -59.2%  130.17 ±124%  sched_debug.cpu.curr->pid.stddev
  540668  +35.4%  732070 ± 3%  proc-vmstat.nr_active_anon
  536027  +30.6%  700171 ± 4%  proc-vmstat.nr_anon_pages
  977.25  +34.4%  1313 ± 4%  proc-vmstat.nr_anon_transparent_hugepages
  114025  -16.8%  94860 ± 2%  proc-vmstat.nr_dirty_background_threshold
  228332  -16.8%  189953 ± 2%  proc-vmstat.nr_dirty_threshold
  35358  -49.5%  17856 ± 8%  proc-vmstat.nr_free_cma
  1174409  -16.3%  982468 ± 2%  proc-vmstat.nr_free_pages
  1920  +17.3%  2252 ± 2%  proc-vmstat.nr_page_table_pages
  6783  +3.2%  7000  proc-vmstat.nr_shmem
  540670  +35.4%  732059 ± 3%  proc-vmstat.nr_zone_active_anon
  2505028  -5.5%  2366487  proc-vmstat.numa_hit
  2505028  -5.5%  2366487  proc-vmstat.numa_local
  1609 ± 5%  +30.7%  2103 ± 10%  proc-vmstat.pgactivate
  72313983  -7.6%  66816607 ± 2%  proc-vmstat.pgalloc_dma32
  2.472e+08  -6.9%  2.301e+08  proc-vmstat.pgalloc_normal
  2007636  -5.0%  1907338  proc-vmstat.pgfault
  3.193e+08  -7.2%  2.963e+08  proc-vmstat.pgfree
  619835  -7.2%  575294  proc-vmstat.thp_deferred_split_page
  620290  -7.1%  576268  proc-vmstat.thp_fault_alloc
  4.948e+09  -8.6%  4.525e+09  perf-stat.i.branch-instructions
  71787571  -6.5%  67122869  perf-stat.i.cache-misses
  79483511  -7.0%  73906455  perf-stat.i.cache-references
  7224  -2.9%  7017  perf-stat.i.context-switches
  0.82  +7.9%  0.88  perf-stat.i.cpi
  1.272e+10  -1.1%  1.258e+10  perf-stat.i.cpu-cycles
  96.49 ± 10%  -22.6%  74.70 ± 2%  perf-stat.i.cpu-migrations
  185.50  +3.7%  192.42  perf-stat.i.cycles-between-cache-misses
  3.354e+09  -6.7%  3.129e+09  perf-stat.i.dTLB-loads
  0.01 ± 5%  +0.0  0.01 ± 4%  perf-stat.i.dTLB-store-miss-rate%
  1.419e+09  -6.9%  1.321e+09  perf-stat.i.dTLB-stores
  1.558e+10  -7.8%  1.437e+10  perf-stat.i.instructions
  1.22  -6.8%  1.14  perf-stat.i.ipc
  6581  -5.4%  6226  perf-stat.i.minor-faults
  6581  -5.4%  6226  perf-stat.i.page-faults
  0.82  +7.2%  0.88  perf-stat.overall.cpi
  177.25  +5.8%  187.47  perf-stat.overall.cycles-between-cache-misses
  1.22  -6.7%  1.14  perf-stat.overall.ipc
  4.936e+09  -8.6%  4.513e+09  perf-stat.ps.branch-instructions
  71615773  -6.5%  66940324  perf-stat.ps.cache-misses
  79293323  -7.0%  73705360  perf-stat.ps.cache-references
  7207  -2.9%  6998  perf-stat.ps.context-switches
  1.269e+10  -1.1%  1.255e+10  perf-stat.ps.cpu-cycles
  96.25 ± 10%  -22.6%  74.50 ± 2%  perf-stat.ps.cpu-migrations
  3.346e+09  -6.7%  3.12e+09  perf-stat.ps.dTLB-loads
  1.416e+09  -6.9%  1.318e+09  perf-stat.ps.dTLB-stores
  1.554e+10  -7.8%  1.433e+10  perf-stat.ps.instructions
  6565  -5.4%  6209  perf-stat.ps.minor-faults
  6565  -5.4%  6209  perf-stat.ps.page-faults
  4.691e+12  -7.6%  4.333e+12  perf-stat.total.instructions
  67.81  -3.9  63.88  perf-profile.calltrace.cycles-pp.do_rw_once
  0.89 ± 8%  -0.3  0.54 ± 57%  perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.do_rw_once
  0.92 ± 8%  -0.2  0.69 ± 15%  perf-profile.calltrace.cycles-pp.apic_timer_interrupt.do_rw_once
  45.88  +3.6  49.44  perf-profile.calltrace.cycles-pp.do_access
  14.92 ± 4%  +6.1  20.99 ± 4%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_subpage.clear_huge_page.do_huge_pmd_anonymous_page.__handle_mm_fault
  15.42 ± 3%  +6.3  21.76 ± 4%  perf-profile.calltrace.cycles-pp.clear_subpage.clear_huge_page.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
  15.92 ± 3%  +6.6  22.52 ± 4%  perf-profile.calltrace.cycles-pp.clear_huge_page.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.__do_page_fault
  17.18 ± 3%  +6.6  23.79 ± 4%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
  17.00 ± 3%  +6.7  23.68 ± 4%  perf-profile.calltrace.cycles-pp.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
  16.45  +7.0  23.43  perf-profile.calltrace.cycles-pp.page_fault.do_access
  16.29  +7.0  23.27  perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault.do_access
  16.39  +7.0  23.38  perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault.do_access
  16.43  +7.0  23.42  perf-profile.calltrace.cycles-pp.do_page_fault.page_fault.do_access
  68.56  -4.7  63.88  perf-profile.children.cycles-pp.do_rw_once
  1.62 ± 4%  -0.3  1.31 ± 13%  perf-profile.children.cycles-pp.smp_apic_timer_interrupt
  1.73 ± 4%  -0.3  1.42 ± 12%  perf-profile.children.cycles-pp.apic_timer_interrupt
  1.15 ± 6%  -0.2  0.90 ± 13%  perf-profile.children.cycles-pp.__hrtimer_run_queues
  0.83 ± 3%  -0.2  0.63 ± 17%  perf-profile.children.cycles-pp.tick_sched_timer
  0.66 ± 12%  -0.2  0.51 ± 18%  perf-profile.children.cycles-pp.update_process_times
  0.24 ± 32%  -0.1  0.11 ± 15%  perf-profile.children.cycles-pp.irq_work_run_list
  0.19 ± 21%  -0.1  0.09 ± 21%  perf-profile.children.cycles-pp.irq_work_interrupt
  0.19 ± 21%  -0.1  0.09 ± 21%  perf-profile.children.cycles-pp.smp_irq_work_interrupt
  0.18 ± 22%  -0.1  0.09 ± 21%  perf-profile.children.cycles-pp.irq_work_run
  0.18 ± 22%  -0.1  0.09 ± 21%  perf-profile.children.cycles-pp.printk
  0.10 ± 14%  -0.0  0.07 ± 25%  perf-profile.children.cycles-pp.try_to_wake_up
  0.26 ± 13%  +0.0  0.31 ± 4%  perf-profile.children.cycles-pp.__free_pages_ok
  0.34 ± 8%  +0.1  0.41 ± 2%  perf-profile.children.cycles-pp.__x64_sys_munmap
  0.34 ± 8%  +0.1  0.41 ± 3%  perf-profile.children.cycles-pp.__vm_munmap
  0.32 ± 11%  +0.1  0.39 ± 2%  perf-profile.children.cycles-pp.tlb_finish_mmu
  0.32 ± 11%  +0.1  0.39 ± 2%  perf-profile.children.cycles-pp.tlb_flush_mmu
  0.33 ± 12%  +0.1  0.40 ± 3%  perf-profile.children.cycles-pp.release_pages
  0.34 ± 8%  +0.1  0.42 ± 4%  perf-profile.children.cycles-pp.__do_munmap
  0.34 ± 8%  +0.1  0.41 ± 3%  perf-profile.children.cycles-pp.unmap_region
  0.28 ± 9%  +0.1  0.37 ± 5%  perf-profile.children.cycles-pp.munmap
  0.18 ± 8%  +0.1  0.29 ± 9%  perf-profile.children.cycles-pp._cond_resched
  0.08 ± 26%  +0.1  0.19 ± 12%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
  0.12 ± 15%  +0.1  0.24 ± 13%  perf-profile.children.cycles-pp.rcu_all_qs
  0.30 ± 12%  +0.1  0.43 ± 5%  perf-profile.children.cycles-pp.___might_sleep
  0.00  +0.1  0.15 ± 7%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
  45.22  +4.3  49.49  perf-profile.children.cycles-pp.do_access
  15.19 ± 3%  +6.4  21.58 ± 3%  perf-profile.children.cycles-pp.clear_page_erms
  15.47 ± 3%  +6.6  22.03 ± 3%  perf-profile.children.cycles-pp.clear_subpage
  17.56 ± 3%  +6.8  24.35 ± 3%  perf-profile.children.cycles-pp.page_fault
  15.98 ± 3%  +6.8  22.78 ± 3%  perf-profile.children.cycles-pp.clear_huge_page
  17.35 ± 3%  +6.8  24.15 ± 3%  perf-profile.children.cycles-pp.handle_mm_fault
  17.30 ± 3%  +6.8  24.10 ± 3%  perf-profile.children.cycles-pp.__handle_mm_fault
  17.51 ± 3%  +6.8  24.32 ± 3%  perf-profile.children.cycles-pp.do_page_fault
  17.47 ± 3%  +6.8  24.29 ± 3%  perf-profile.children.cycles-pp.__do_page_fault
  17.02 ± 2%  +6.9  23.91 ± 3%  perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
  59.31  -4.4  54.88  perf-profile.self.cycles-pp.do_rw_once
  19.43 ± 2%  -1.9  17.50  perf-profile.self.cycles-pp.do_access
  0.23 ± 12%  +0.1  0.29 ± 7%  perf-profile.self.cycles-pp.__free_pages_ok
  0.29 ± 10%  +0.1  0.40 ± 7%  perf-profile.self.cycles-pp.___might_sleep
  0.07 ± 7%  +0.1  0.18 ± 11%  perf-profile.self.cycles-pp.rcu_all_qs
  0.00  +0.1  0.15 ± 7%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
  0.31 ± 19%  +0.2  0.51 ± 13%  perf-profile.self.cycles-pp.clear_subpage
  14.82 ± 3%  +6.4  21.21 ± 3%  perf-profile.self.cycles-pp.clear_page_erms
[Time-series plots omitted here; the original report includes ASCII plots of vm-scalability.time.user_time, vm-scalability.time.system_time, vm-scalability.throughput, vm-scalability.median, and vm-scalability.workload, comparing bisect-good ([*]) and bisect-bad ([O]) samples.]
Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
Thanks, Rong Chen
On Wed, Oct 02, 2019 at 11:43:17PM -0700, Song Liu wrote:
This is a rare corner case, but it does happen:
In perf_rotate_context(), when the first cpu flexible event fails to schedule, cpu_rotate is 1, while cpu_event is NULL.
You're failing to explain how we get here in the first place.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4655adbbae10..50021735f367 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3775,6 +3775,13 @@ static void rotate_ctx(struct perf_event_context *ctx, struct perf_event *event)
 	if (ctx->rotate_disable)
 		return;
 
+	/* if no event specified, try to rotate the first event */
+	if (!event)
+		event = rb_entry_safe(rb_first(&ctx->flexible_groups.tree),
+				      typeof(*event), group_node);
+	if (!event)
+		return;
+
 	perf_event_groups_delete(&ctx->flexible_groups, event);
 	perf_event_groups_insert(&ctx->flexible_groups, event);
 }
I don't think that is a very nice solution; would not something simpler like this be sufficient? (although possibly we should rename that function now).
And it needs more comments.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4655adbbae10..772a9854eb3a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3782,8 +3797,17 @@ static void rotate_ctx(struct perf_event_context *ctx, struct perf_event *event)
 static inline struct perf_event *
 ctx_first_active(struct perf_event_context *ctx)
 {
-	return list_first_entry_or_null(&ctx->flexible_active,
-					struct perf_event, active_list);
+	struct perf_event *event;
+
+	event = list_first_entry_or_null(&ctx->flexible_active,
+					 struct perf_event, active_list);
+
+	if (!event) {
+		event = rb_entry_safe(rb_first(&ctx->flexible_groups.tree),
+				      typeof(*event), group_node);
+	}
+
+	return event;
 }
 
 static bool perf_rotate_context(struct perf_cpu_context *cpuctx)
On Oct 8, 2019, at 2:35 AM, Peter Zijlstra <peterz@infradead.org> wrote:
On Wed, Oct 02, 2019 at 11:43:17PM -0700, Song Liu wrote:
This is a rare corner case, but it does happen:
In perf_rotate_context(), when the first cpu flexible event fails to schedule, cpu_rotate is 1, while cpu_event is NULL.
You're failing to explain how we get here in the first place.
On Intel CPUs, it can be triggered by something like:
perf stat -e ref-cycles:D,ref-cycles,cycles -C 5 -I 1000
I will add this to v2.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4655adbbae10..50021735f367 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3775,6 +3775,13 @@ static void rotate_ctx(struct perf_event_context *ctx, struct perf_event *event)
 	if (ctx->rotate_disable)
 		return;
 
+	/* if no event specified, try to rotate the first event */
+	if (!event)
+		event = rb_entry_safe(rb_first(&ctx->flexible_groups.tree),
+				      typeof(*event), group_node);
+	if (!event)
+		return;
+
 	perf_event_groups_delete(&ctx->flexible_groups, event);
 	perf_event_groups_insert(&ctx->flexible_groups, event);
 }
I don't think that is a very nice solution; would not something simpler like this be sufficient? (although possibly we should rename that function now).
Yes, this is much cleaner. Thanks!
Let me submit v2 based on this.
Thanks, Song
And it needs more comments.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4655adbbae10..772a9854eb3a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3782,8 +3797,17 @@ static void rotate_ctx(struct perf_event_context *ctx, struct perf_event *event)
 static inline struct perf_event *
 ctx_first_active(struct perf_event_context *ctx)
 {
-	return list_first_entry_or_null(&ctx->flexible_active,
-					struct perf_event, active_list);
+	struct perf_event *event;
+
+	event = list_first_entry_or_null(&ctx->flexible_active,
+					 struct perf_event, active_list);
+
+	if (!event) {
+		event = rb_entry_safe(rb_first(&ctx->flexible_groups.tree),
+				      typeof(*event), group_node);
+	}
+
+	return event;
 }
 
 static bool perf_rotate_context(struct perf_cpu_context *cpuctx)