On 2/25/21 8:53 AM, Tony Krowiak wrote:
On 2/25/21 6:28 AM, Halil Pasic wrote:
On Wed, 24 Feb 2021 22:28:50 -0500 Tony Krowiak <akrowiak@linux.ibm.com> wrote:
 static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
 {
-	kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-	vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-	kvm_put_kvm(matrix_mdev->kvm);
-	matrix_mdev->kvm = NULL;
+	struct kvm *kvm;
+
+	if (matrix_mdev->kvm) {
+		kvm = matrix_mdev->kvm;
+		kvm_get_kvm(kvm);
+		matrix_mdev->kvm = NULL;
I think if there were two threads doing the unset in parallel, one of them could bail out and carry on before the cleanup is done. But since nothing much happens in release after that, I don't see an immediate problem.
Another thing to consider is, that setting ->kvm to NULL arms vfio_ap_mdev_remove()...
I'm not entirely sure what you mean by this, but my assumption is that you are talking about the check for matrix_mdev->kvm != NULL at the start of that function.
Yes I was talking about the check
static int vfio_ap_mdev_remove(struct mdev_device *mdev)
{
	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

	if (matrix_mdev->kvm)
		return -EBUSY;
	...
	kfree(matrix_mdev);
	...
}
As you see, we bail out if kvm is still set, otherwise we clean up the matrix_mdev which includes kfree-ing it. And vfio_ap_mdev_remove() is initiated via the sysfs, i.e. can be initiated at any time. If we were to free matrix_mdev in mdev_remove() and then carry on with kvm_unset() with mutex_lock(&matrix_dev->lock); that would be bad.
I agree.
The reason matrix_mdev->kvm is set to NULL before giving up the matrix_dev->lock is so that functions that check for the presence of the matrix_mdev->kvm pointer, such as assign_adapter_store(), will exit if they get control while the masks are being cleared.
I disagree!
static ssize_t assign_adapter_store(struct device *dev,
				    struct device_attribute *attr,
				    const char *buf, size_t count)
{
	int ret;
	unsigned long apid;
	struct mdev_device *mdev = mdev_from_dev(dev);
	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);

	/* If the guest is running, disallow assignment of adapter */
	if (matrix_mdev->kvm)
		return -EBUSY;
We bail out when kvm != NULL, so having it set to NULL while the masks are being cleared will make these not bail out.
You are correct, I am an idiot.
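The window discussed here can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct mdev_window_model, guarded_by_kvm_pointer() and the two unset_step_*() helpers are hypothetical names, not driver code, and the two steps simply mirror the order in the patched vfio_ap_mdev_unset_kvm() (pointer set to NULL first, masks cleared afterwards).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; only the field name kvm mirrors the thread. */
struct mdev_window_model {
	void *kvm;          /* models matrix_mdev->kvm */
	bool masks_cleared; /* true once the guest's masks have been cleared */
};

/*
 * Models the guard at the top of assign_adapter_store() and
 * vfio_ap_mdev_remove(): bail out only while kvm is non-NULL.
 */
static int guarded_by_kvm_pointer(const struct mdev_window_model *m)
{
	if (m->kvm)
		return -EBUSY;
	return 0;
}

/* Step 1 of the patched unset: NULL the pointer before the lock is dropped. */
static void unset_step_null_pointer(struct mdev_window_model *m)
{
	m->kvm = NULL;
}

/* Step 2, which the patch runs without holding the lock: clear the masks. */
static void unset_step_clear_masks(struct mdev_window_model *m)
{
	m->masks_cleared = true;
}
```

Between step 1 and step 2 the guard returns 0 although the cleanup has not finished, which is the case where a sysfs assignment or a remove could proceed.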
So what we have here is a catch-22; in other words, we have the case you pointed out above and the cases related to assigning/unassigning adapters, domains and control domains which should exit when a guest is running.
See above.
Ditto.
I may have an idea to resolve this. Suppose we add:
struct ap_matrix_mdev {
	...
	bool kvm_busy;
	...
};
This flag will be set to true at the start of both the vfio_ap_mdev_set_kvm() and vfio_ap_mdev_unset_kvm() and set to false at the end. The assignment/unassignment and remove callback functions can test this flag and return -EBUSY if the flag is true. That will preclude assigning or unassigning adapters, domains and control domains when the KVM pointer is being set/unset. Likewise, removal of the mediated device will also be prevented while the KVM pointer is being set/unset.
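To make the proposal concrete, here is a minimal userspace sketch of how such a flag might gate the sysfs callbacks. It is not driver code: struct mdev_flag_model, kvm_update_begin(), kvm_update_end() and config_change_allowed() are hypothetical names, a pthread mutex stands in for matrix_dev->lock, and the sketch assumes the callbacks keep their existing check of the kvm pointer in addition to the new flag.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct ap_matrix_mdev. */
struct mdev_flag_model {
	void *kvm;     /* models matrix_mdev->kvm */
	bool kvm_busy; /* proposed flag: true while kvm is being set/unset */
};

/* Models matrix_dev->lock. */
static pthread_mutex_t matrix_lock = PTHREAD_MUTEX_INITIALIZER;

/* Start of set/unset: mark the pointer as in flux, then drop the lock. */
static void kvm_update_begin(struct mdev_flag_model *m)
{
	pthread_mutex_lock(&matrix_lock);
	m->kvm_busy = true;
	pthread_mutex_unlock(&matrix_lock); /* masks are set/cleared unlocked */
}

/* End of set/unset: publish the new pointer and clear the flag together. */
static void kvm_update_end(struct mdev_flag_model *m, void *new_kvm)
{
	pthread_mutex_lock(&matrix_lock);
	m->kvm = new_kvm;
	m->kvm_busy = false;
	pthread_mutex_unlock(&matrix_lock);
}

/* Check an assign/unassign/remove callback could perform under the lock. */
static int config_change_allowed(struct mdev_flag_model *m)
{
	int ret = 0;

	pthread_mutex_lock(&matrix_lock);
	if (m->kvm_busy || m->kvm)
		ret = -EBUSY;
	pthread_mutex_unlock(&matrix_lock);
	return ret;
}
```

With this arrangement the callbacks see -EBUSY for the whole duration of a set or unset, regardless of when the kvm pointer itself changes inside that window.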
In the case of the PQAP handler function, it can wait for the set/unset of the KVM pointer as follows:
while (matrix_mdev->kvm_busy) {
	mutex_unlock(&matrix_dev->lock);
	msleep(100);
	mutex_lock(&matrix_dev->lock);
}

if (!matrix_mdev->kvm)
	goto out_unlock;
What say you?
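The wait loop can be exercised in userspace with a deterministic stand-in for the concurrent set/unset. This is a sketch, not the handler itself: struct mdev_wait_model, simulated_sleep() and pqap_may_proceed() are hypothetical, a pthread mutex models matrix_dev->lock, and simulated_sleep() replaces msleep(100) by pretending that another thread finishes its update after a fixed number of polls.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct ap_matrix_mdev. */
struct mdev_wait_model {
	void *kvm;       /* models matrix_mdev->kvm */
	bool kvm_busy;   /* proposed flag */
	int polls_left;  /* simulation only: polls until the update finishes */
	void *next_kvm;  /* simulation only: value the update stores in kvm */
};

/* Models matrix_dev->lock. */
static pthread_mutex_t matrix_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for msleep(100). It is called with the lock dropped and
 * simulates the thread doing the set/unset making progress meanwhile.
 */
static void simulated_sleep(struct mdev_wait_model *m)
{
	if (m->polls_left > 0)
		m->polls_left--;
	if (m->polls_left == 0) {
		m->kvm = m->next_kvm;
		m->kvm_busy = false;
	}
}

/* Returns true if the handler may go on; false models "goto out_unlock". */
static bool pqap_may_proceed(struct mdev_wait_model *m)
{
	bool ok;

	pthread_mutex_lock(&matrix_lock);
	while (m->kvm_busy) {
		/* Drop the lock so the set/unset can finish, then recheck. */
		pthread_mutex_unlock(&matrix_lock);
		simulated_sleep(m);
		pthread_mutex_lock(&matrix_lock);
	}
	ok = m->kvm != NULL;
	pthread_mutex_unlock(&matrix_lock);
	return ok;
}
```

The kvm pointer is only examined after the flag has been observed clear with the lock held, so the handler never acts on a pointer that is still in flux.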
I'm not sure. Since I disagree with your analysis above, it is difficult to deal with the conclusion. I'm not against decoupling the tracking of the state of the matrix_mdev device from the value of the kvm pointer. I think we should first get a common understanding of the problem, before we proceed to the solution.
Regardless of my brain fog regarding the testing of the matrix_mdev->kvm pointer, I stand by what I stated in the paragraphs just before the code snippet.
The problem is there are 10 functions that depend upon the value of the matrix_mdev->kvm pointer that can get control while the pointer is being set/unset and the matrix_dev->lock is given up to set/clear the masks:
* vfio_ap_irq_enable: called by handle_pqap() when AQIC is intercepted
* vfio_ap_irq_disable: called by handle_pqap() when AQIC is intercepted
* assign_adapter_store: sysfs
* unassign_adapter_store: sysfs
* assign_control_domain_store: sysfs
* unassign_control_domain_store: sysfs
* assign_domain_store: sysfs
* unassign_domain_store: sysfs
* vfio_ap_mdev_remove: sysfs
* vfio_ap_mdev_release: mdev fd closed by userspace (i.e., qemu)
If we add the proposed flag to indicate when the matrix_mdev->kvm pointer is in flux, then we can check that before allowing the functions in the list above to proceed.
Regards, Halil
On Thu, 25 Feb 2021 10:25:24 -0500 Tony Krowiak <akrowiak@linux.ibm.com> wrote:
Something is strange with this email. It is basically the same email as the previous one, just broken, or?
Regards, Halil
On 2/25/21 10:35 AM, Halil Pasic wrote:
> Something is strange with this email. It is basically the same email as the previous one, just broken, or?
The previous email was rejected by the kernel mailing list addresses because I used bulleted lists, which aren't accepted. The kernel lists accept plain text only, so I replaced the bulleted list with a plain-text one.
linux-stable-mirror@lists.linaro.org