Re: [PATCH v3] KVM: selftests: Fix the dirty_log_test semaphore imbalance

2 Feb 2024


      On Fri, Feb 02, 2024, Sean Christopherson wrote:
...
On Fri, Feb 02, 2024, Shaoqin Huang wrote:
...

v2->v3:

Rebase to v6.8-rc2.
Use TEST_ASSERT().

Patch says otherwise.
...
@@ -726,6 +728,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
   	return;
   }

sem_getvalue(&sem_vcpu_stop, &sem_val);
assert(sem_val == 0);
sem_getvalue(&sem_vcpu_cont, &sem_val);
assert(sem_val == 0);
/*
We reserve page table for 2 times of extra dirty mem which
will definitely cover the original (1G+) test range.  Here



@@ -825,6 +832,13 @@ static void run_test(enum vm_guest_mode mode, void *arg)
   	sync_global_to_guest(vm, iteration);
   }

/*
*


* Before we set the host_quit, let the vcpu has time to run, to make


* sure we consume the sem_vcpu_stop and the vcpu consume the


* sem_vcpu_cont, to keep the semaphore balance.


*/


usleep(p->interval * 1000);

Sorry, I wasn't as explicit as I should have been.  When I said I don't have a
better solution, I did not mean to imply that I am ok with busy waiting as a
hack-a-around.
Against my better judgment, I spent half an hour slogging through this test to
figure out what's going wrong.  IIUC, the problem is essentially that the test
instructs the vCPU worker to continue _after_ the last iteration, and _then_ sets
host_quit, which results in the vCPU running one extra (unvalidated) iteration.
For the other modes, which stop if and only if vcpu_sync_stop_requested is set,
the extra iteration is a non-issue.  But because the dirty ring variant stops
after every exit (to purge the ring), it hangs without an extra "continue".
So rather than blindly fire off an extra sem_vcpu_cont that may or may not be
consumed, I believe the test can simply set host_quit _before_ the final "continue"
so that the vCPU worker doesn't run an extra iteration.
I ran the below with 1000 loops of "for (i = 0; i < LOG_MODE_NUM; i++)" and so
no issues.  I didn't see the assert you hit, but without the fix, I did see this
fire within a few loops (less than 5 I think)l
assert(host_log_mode == LOG_MODE_DIRTY_RING ||
          atomic_read(&vcpu_sync_stop_requested) == false);
I'll post this as two patches: one to fix the bug, and a second to have the
LOG_MODE_DIRTY_RING variant clear vcpu_sync_stop_requested so that the above
assert() can be modified as below.
Scratch patch 2, I'm pretty sure the vCPU worker can race with the main task and
clear vcpu_sync_stop_requested just before the main task sets it, which would
result in a false positive.  I didn't see any failures, but I'm 99% certain the
race exists.  I suspect there are some other warts in the test that would
complicate attempts to clean things up, so for now I'l just post the fix for
the imbalance bug.

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v3] KVM: selftests: Fix the dirty_log_test semaphore imbalance