On Wed, Oct 30, 2024 at 1:22 AM Marc Zyngier <maz@kernel.org> wrote:
On Wed, 30 Oct 2024 00:16:48 +0000, Raghavendra Rao Ananta <rananta@google.com> wrote:
On Tue, Oct 29, 2024 at 11:47 AM Marc Zyngier <maz@kernel.org> wrote:
On Tue, 29 Oct 2024 17:06:09 +0000, Raghavendra Rao Ananta <rananta@google.com> wrote:
On Tue, Oct 29, 2024 at 9:27 AM Marc Zyngier <maz@kernel.org> wrote:
On Mon, 28 Oct 2024 23:45:33 +0000, Raghavendra Rao Ananta <rananta@google.com> wrote:
Did you have a chance to check whether this had any negative impact on actual workloads? Since the entry/exit code is a bit of a hot spot, I'd like to make sure we're not penalising the common case (I only wrote this patch while waiting in an airport, and didn't test it at all).
I ran the kvm selftests, kvm-unit-tests and booted a linux guest to test the change and noticed no failures. Any specific test you want to try out?
My question is not about failures (I didn't expect any), but specifically about *performance*, and whether checking the flag without a static key can lead to any performance drop on the hot path.
Can you please run an exit-heavy workload (such as hackbench, for example), and report any significant delta you could measure?
Oh, I see. I ran hackbench and micro-bench from kvm-unit-tests (which also causes a lot of entry/exits), on Ampere Altra with kernel at v6.12-rc1, and see no significant difference in perf.
Thanks for running this stuff.
timer_10ms      231040.0    902.0
timer_10ms      234120.0    914.0
This seems to be the only case where we are adversely affected by this change.
Hmm, I'm not sure how much we want to trust this comparison. For instance, I just ran micro-bench again a few more times and here are the outcomes of timer_10ms for each try with the patch:
Tries            total ns    avg ns
-----------------------------------
1_timer_10ms     231840.0    905.0
2_timer_10ms     234560.0    916.0
3_timer_10ms     227440.0    888.0
4_timer_10ms     236640.0    924.0
5_timer_10ms     231200.0    903.0
Here are a few on the baseline:
Tries            total ns    avg ns
-----------------------------------
1_timer_10ms     231080.0    902.0
2_timer_10ms     238040.0    929.0
3_timer_10ms     231680.0    905.0
4_timer_10ms     229280.0    895.0
5_timer_10ms     228520.0    892.0
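For anyone wanting to sanity-check the two sets of runs, here is a rough sketch (not part of the original exchange) that compares the mean of the "avg ns" column against its run-to-run spread. The variable names are made up for illustration, and five samples per side is far too few for a rigorous statistical test; this only shows the order of magnitude.

```python
from statistics import mean, stdev

# "avg ns" values for timer_10ms, copied from the five tries listed
# for each kernel in this thread.
patched = [905.0, 916.0, 888.0, 924.0, 903.0]
baseline = [902.0, 929.0, 905.0, 895.0, 892.0]

# Mean and sample standard deviation of each set of runs.
patched_mean, patched_sd = mean(patched), stdev(patched)
baseline_mean, baseline_sd = mean(baseline), stdev(baseline)

# Difference between the two means, to compare against the spread.
delta = patched_mean - baseline_mean

print(f"patched : mean {patched_mean:.1f} ns, stdev {patched_sd:.1f} ns")
print(f"baseline: mean {baseline_mean:.1f} ns, stdev {baseline_sd:.1f} ns")
print(f"delta   : {delta:.1f} ns")
```

With these numbers the gap between the two means is a few nanoseconds, while individual runs on the same kernel vary by more than ten, so the difference sits well inside the run-to-run variation.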
In the grand scheme of things, that's noise. But this gives us a clear line of sight for the removal of the in-kernel interrupts back to userspace.
Sorry, I didn't follow you completely on this part.
Thank you.
Raghavendra