On Mon, Nov 2, 2020 at 5:12 PM Peter Xu <peterx@redhat.com> wrote:
On Mon, Nov 02, 2020 at 03:56:05PM -0800, Ben Gardon wrote:
On Mon, Nov 2, 2020 at 2:21 PM Peter Xu <peterx@redhat.com> wrote:
On Tue, Oct 27, 2020 at 04:37:33PM -0700, Ben Gardon wrote:
The dirty log perf test will time various dirty logging operations (enabling dirty logging, dirtying memory, getting the dirty log, clearing the dirty log, and disabling dirty logging) in order to quantify dirty logging performance. This test can be used to inform future performance improvements to KVM's dirty logging infrastructure.
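As a rough illustration of timing one such operation, here is a minimal sketch built on clock_gettime(CLOCK_MONOTONIC), the same clock the patch uses. The helpers elapsed_ns() and time_phase(), the struct buffer type, and the 4 KiB page size are all hypothetical stand-ins, not code from the patch; touch_pages() only imitates a "dirtying memory" phase on a plain buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: nanoseconds between two CLOCK_MONOTONIC samples. */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	       (end.tv_nsec - start.tv_nsec);
}

/* Hypothetical helper: run one phase and return how long it took. */
static int64_t time_phase(void (*phase)(void *), void *arg)
{
	struct timespec start, end;

	assert(phase);
	clock_gettime(CLOCK_MONOTONIC, &start);
	phase(arg);
	clock_gettime(CLOCK_MONOTONIC, &end);
	return elapsed_ns(start, end);
}

/* Illustrative stand-in for guest memory (assumption, not from the patch). */
struct buffer {
	unsigned char *base;
	size_t size;
};

/* Stand-in for a "dirty memory" phase: write one byte per 4 KiB page. */
static void touch_pages(void *arg)
{
	struct buffer *buf = arg;
	size_t off;

	for (off = 0; off < buf->size; off += 4096)
		buf->base[off]++;
}
```

Each operation in the list above would be wrapped the same way, with its own callback passed to time_phase().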
One thing to mention is that there are a few patches in the kvm dirty ring series that reworked the dirty log test quite a bit (to add a similar test for dirty ring). For example:
https://lore.kernel.org/kvm/20201023183358.50607-11-peterx@redhat.com/
Just an FYI if we're going to use separate test programs. Merging these tests should benefit us in many ways, of course (e.g., dirty ring may be directly runnable with the perf tests too; so we can manually enable this "perf mode" as a new parameter in dirty_log_test, if possible?), however I don't know how hard that would be - maybe there's some good reason to keep them separate...
Absolutely, we definitely need a performance test for both modes. I'll take a look at the patch you linked and see what it would take to support dirty ring in this test.
That would be highly appreciated.
Do you think that should be done in this series, or would it make sense to add as a follow up?
I lean slightly toward working on top of those patches, since we could potentially share quite a bit of code there (e.g., the clear dirty log cleanup seems necessary, otherwise it's not easy to add the dirty ring tests anyway). But the current one is still ok to me, at least as an initial version - we should always be more tolerant with test cases, shouldn't we? :)
So maybe we can wait for a 3rd opinion before you change the direction.
I took a look at your patches for dirty ring and dirty logging modes and thought about this some more. I think your patch to merge the get and clear dirty log tests is great, and I can try to include it and build on it in my series as well if desired. I don't think it would be hard to use the same mode approach in the dirty log perf test. That said, I think it would be easier to keep the functional test (dirty_log_test, clear_dirty_log_test) separate from the performance test because the dirty log validation is extra time and complexity not needed in the dirty log perf test. I did try building them in the same test initially, but it was really ugly. Perhaps a future refactoring could merge them better.
[...]
+static void run_test(enum vm_guest_mode mode, unsigned long iterations,
+		     uint64_t phys_offset, int vcpus,
+		     uint64_t vcpu_memory_bytes, int wr_fract)
+{
[...]
+	/* Start the iterations */
+	iteration = 0;
+	host_quit = false;
+	clock_gettime(CLOCK_MONOTONIC, &start);
+	for (vcpu_id = 0; vcpu_id < vcpus; vcpu_id++) {
+		pthread_create(&vcpu_threads[vcpu_id], NULL, vcpu_worker,
+			       &perf_test_args.vcpu_args[vcpu_id]);
+	}
+	/* Allow the vCPU to populate memory */
+	pr_debug("Starting iteration %lu - Populating\n", iteration);
+	while (READ_ONCE(vcpu_last_completed_iteration[vcpu_id]) != iteration)
+		pr_debug("Waiting for vcpu_last_completed_iteration == %lu\n",
+			 iteration);
Isn't array vcpu_last_completed_iteration[] initialized to all zeros? If so, I feel like this "while" won't run as expected to wait for populating mem.
I think you are totally right. The array should be initialized to -1, which I realize isn't a valid value for a uint, and relying on unsigned integer overflow is bad, so the array should be converted to ints too. I suppose I didn't catch this because it would just make the populating pass 0 look really short and pass 1 really long. I remember seeing that behavior but not realizing that it was caused by a test bug. I will correct this, thank you for pointing that out.
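A minimal sketch of the sentinel idea, assuming a signed array and a -1 "nothing completed yet" value as described above. MAX_VCPUS, last_completed[], reset_completed() and vcpu_done() are hypothetical names for illustration; the real test reads the array with READ_ONCE() from another thread, which this single-threaded sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VCPUS 4	/* hypothetical bound, for illustration only */

/* Signed so that -1 can mean "no iteration completed yet". */
static int last_completed[MAX_VCPUS];

/* Mark every vCPU as not having completed any iteration. */
static void reset_completed(int nr_vcpus)
{
	int i;

	assert(nr_vcpus > 0 && nr_vcpus <= MAX_VCPUS);
	for (i = 0; i < nr_vcpus; i++)
		last_completed[i] = -1;
}

/* True once the given vCPU has reported finishing this iteration. */
static bool vcpu_done(int vcpu_id, int iteration)
{
	return last_completed[vcpu_id] == iteration;
}
```

With the zero-initialized static array, vcpu_done(0, 0) is already true before any vCPU has run, which is the behavior pointed out above; after reset_completed() it stays false until the vCPU stores 0.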
The flooding pr_debug() seems a bit scary too if the mem size is huge. How about a pr_debug() after the loop (so if we don't see it, that means it hung)?
I don't think the number of messages on pr_debug will be proportional to the size of memory, but rather the product of iterations and vCPUs. That said, that's still a lot of messages.
The guest code dirties all pages, and that process is proportional to the size of memory, no?
Btw since you mentioned vcpus - I also feel like the above chunk should be put into a for loop over the vcpus, like the one above...
Ooof I misread my code. You're totally right. I'll fix that by removing the print there.
My assumption was that if you've gone to the trouble to turn on debug logging, it's easier to comment log lines out than add them, but I'm also happy to just move this to a single message after the loop.
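One way the two suggestions could fit together is sketched below: the wait moves inside a per-vCPU loop, and a single message is printed once all waits finish. This is an assumption-laden stand-in, not the patch itself: NR_WORKERS, worker(), struct worker_args and run_iteration() are hypothetical, plain threads take the place of vCPUs, C11 atomics take the place of READ_ONCE(), and threads are created per iteration for brevity rather than kept alive across iterations.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

#define NR_WORKERS 4	/* hypothetical stand-in for the vCPU count */

/* Last iteration each worker finished; -1 means none yet. */
static atomic_int last_completed[NR_WORKERS];

struct worker_args {
	int id;
	int iteration;
};

static void *worker(void *arg)
{
	struct worker_args *wa = arg;

	/* Populating or dirtying memory would happen here. */
	atomic_store(&last_completed[wa->id], wa->iteration);
	return NULL;
}

/* Runs one iteration and returns the number of workers waited on. */
static int run_iteration(int iteration)
{
	pthread_t threads[NR_WORKERS];
	struct worker_args args[NR_WORKERS];
	int id, ret;

	for (id = 0; id < NR_WORKERS; id++)
		atomic_store(&last_completed[id], -1);

	for (id = 0; id < NR_WORKERS; id++) {
		args[id].id = id;
		args[id].iteration = iteration;
		ret = pthread_create(&threads[id], NULL, worker, &args[id]);
		assert(ret == 0);
		(void)ret;
	}

	/* The wait sits inside the per-worker loop, with no printing. */
	for (id = 0; id < NR_WORKERS; id++) {
		while (atomic_load(&last_completed[id]) != iteration)
			sched_yield();
	}

	/* One message after the loop: if it never appears, a worker hung. */
	fprintf(stderr, "Iteration %d: all %d workers done\n",
		iteration, NR_WORKERS);

	for (id = 0; id < NR_WORKERS; id++)
		pthread_join(threads[id], NULL);
	return NR_WORKERS;
}
```

The message count then scales with the number of iterations only, not with how long each worker takes.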
Yah that's subjective too - feel free to keep whatever you prefer. In all cases, hopefully I won't even need to enable pr_debug at all. :)
-- Peter Xu