On 2025-05-15 3:25 p.m., Sean Christopherson wrote:
On Thu, May 15, 2025, Kan Liang wrote:
On 2025-05-14 7:19 p.m., Sean Christopherson wrote:
This naming is confusing on purpose? Pick either guest/host and stick with it.
+1. I also think the inner perf_host_{enter,exit}() helpers are superflous. These flows
After a bit of hacking, and with a few spoilers, this is what I ended up with (not anywhere near fully tested). I like following KVM's kvm_xxx_{load,put}() nomenclature to tie everything together, so I went with "guest" instead of "host" even though the majority of work being down is to shedule out/in host context.
/* When loading a guest's mediated PMU, schedule out all exclude_guest events. */ void perf_load_guest_context(unsigned long data) { struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
lockdep_assert_irqs_disabled();
perf_ctx_lock(cpuctx, cpuctx->task_ctx);
if (WARN_ON_ONCE(__this_cpu_read(guest_ctx_loaded))) goto unlock;
perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST); ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST); perf_ctx_enable(&cpuctx->ctx, EVENT_GUEST); if (cpuctx->task_ctx) { perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST); task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST); perf_ctx_enable(cpuctx->task_ctx, EVENT_GUEST); }
arch_perf_load_guest_context(data);
__this_cpu_write(guest_ctx_loaded, true);
unlock: perf_ctx_unlock(cpuctx, cpuctx->task_ctx); } EXPORT_SYMBOL_GPL(perf_load_guest_context);
void perf_put_guest_context(void) { struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);
lockdep_assert_irqs_disabled();
perf_ctx_lock(cpuctx, cpuctx->task_ctx);
if (WARN_ON_ONCE(!__this_cpu_read(guest_ctx_loaded))) goto unlock;
arch_perf_put_guest_context();
It will set the guest_ctx_loaded to false. The update_context_time() invoked in the perf_event_sched_in() will not get a chance to update the guest time.
The guest_ctx_loaded in arch/x86/events/core.c is a different variable, it just happens to have the same name.
Ah, I thought it was the same variable. Then there should be no issue with the guest time.
But the same variable name may bring confusions. Maybe add a x86_pmu/x86 prefix to the variable in x86 code.
It's completely gross, but exposing guest_ctx_loaded outside of kernel/events/core.c didn't seem like a great alternative. If we wanted to use a single variable, then the writes in arch_perf_{load,put}_guest_context() can simply go away.
Either two variables or a single variable is fine with me, as long as they can be easily distinguished.
But I think we should put arch_perf_{load,put}_guest_context() and guest_ctx_loaded between the perf_ctx_disable/enable() pair. Perf should only update the state when PMU is completely disabled. It matches the way the rest of the perf code does. It could also avoid some potential issues, e.g., the state is not updated completely, but the counter has already been fired.
Thanks, Kan