This patch series was ultimately created to fix a bug reported via BZ203687. Halil did some problem determination and concluded that the problem was due to the fact that all queues removed from the guest's AP configuration are not reset. For example, when an adapter or domain is assigned to the mdev matrix, if a queue device associated with the adapter or domain is not bound to the vfio_ap device driver, the adapter to which the queue is attached will be removed from the guest's configuration. The problem is, removing the adapter also implicitly removes all queues attached to that adapter. Those queues also need to be reset to avert leaking crypto data should any of those queues get assigned to another guest or back to the host.
The original intent was to ensure that all queues removed from a guest's AP configuration get reset. The testing included permutations of various scenarios involving the binding/unbinding of queues either manually, or due to dynamic host AP configuration changes initiated via an SE or HMC attached to a DPM-enabled LPAR. This testing revealed several issues that are also addressed via this patch series.
Note that several of the patches has a 'Fixes:' tag as well as a Cc: stable@vger.kernel.org tag. I'm not sure whether this is necessary because the patches not related to the reset issue are probably rarely if ever encountered, so I'd like an opinion on that also.
Tony Krowiak (7): s390/vfio-ap: always filter entire AP matrix s390/vfio-ap: circumvent filtering for adapters/domains not in host config s390/vfio-ap: do not reset queue removed from host config s390/vfio-ap: let 'on_scan_complete' callback filter matrix and update guest's APCB s390/vfio-ap: allow reset of subset of queues assigned to matrix mdev s390/vfio-ap: reset queues filtered from the guest's AP config s390/vfio-ap: reset queues associated with adapter for queue unbound from driver
drivers/s390/crypto/vfio_ap_ops.c | 294 +++++++++++++++++++------- drivers/s390/crypto/vfio_ap_private.h | 25 ++- 2 files changed, 234 insertions(+), 85 deletions(-)
The vfio_ap_mdev_filter_matrix function is called whenever a new adapter or domain is assigned to the mdev. The purpose of the function is to update the guest's AP configuration by filtering the matrix of adapters and domains assigned to the mdev. When an adapter or domain is assigned, only the APQNs associated with the APID of the new adapter or APQI of the new domain are inspected. If an APQN does not reference a queue device bound to the vfio_ap device driver, then it's APID will be filtered from the mdev's matrix when updating the guest's AP configuration.
Inspecting only the APID of the new adapter or APQI of the new domain will result in passing AP queues through to a guest that are not bound to the vfio_ap device driver under certain circumstances. Consider the following:
guest's AP configuration (all also assigned to the mdev's matrix): 14.0004 14.0005 14.0006 16.0004 16.0005 16.0006
unassign domain 4 unbind queue 16.0005 assign domain 4
When domain 4 is re-assigned, since only domain 4 will be inspected, the APQNs that will be examined will be: 14.0004 16.0004
Since both of those APQNs reference queue devices that are bound to the vfio_ap device driver, nothing will get filtered from the mdev's matrix when updating the guest's AP configuration. Consequently, queue 16.0005 will get passed through despite not being bound to the driver. This violates the linux device model requirement that a guest shall only be given access to devices bound to the device driver facilitating their pass-through.
To resolve this problem, every adapter and domain assigned to the mdev will be inspected when filtering the mdev's matrix.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org --- drivers/s390/crypto/vfio_ap_ops.c | 59 ++++++++++--------------------- 1 file changed, 18 insertions(+), 41 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index 4db538a55192..e5490640e19c 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -670,8 +670,7 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev) * Return: a boolean value indicating whether the KVM guest's APCB was changed * by the filtering or not. */ -static bool vfio_ap_mdev_filter_matrix(unsigned long *apm, unsigned long *aqm, - struct ap_matrix_mdev *matrix_mdev) +static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev) { unsigned long apid, apqi, apqn; DECLARE_BITMAP(prev_shadow_apm, AP_DEVICES); @@ -692,8 +691,8 @@ static bool vfio_ap_mdev_filter_matrix(unsigned long *apm, unsigned long *aqm, bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm, (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
- for_each_set_bit_inv(apid, apm, AP_DEVICES) { - for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) { + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) { + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) { /* * If the APQN is not bound to the vfio_ap device * driver, then we can't assign it to the guest's @@ -958,7 +957,6 @@ static ssize_t assign_adapter_store(struct device *dev, { int ret; unsigned long apid; - DECLARE_BITMAP(apm_delta, AP_DEVICES); struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex); @@ -987,11 +985,8 @@ static ssize_t assign_adapter_store(struct device *dev, }
vfio_ap_mdev_link_adapter(matrix_mdev, apid); - memset(apm_delta, 0, sizeof(apm_delta)); - set_bit_inv(apid, apm_delta);
- if (vfio_ap_mdev_filter_matrix(apm_delta, - matrix_mdev->matrix.aqm, matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev)) vfio_ap_mdev_update_guest_apcb(matrix_mdev);
ret = count; @@ -1167,7 +1162,6 @@ static ssize_t assign_domain_store(struct device *dev, { int ret; unsigned long apqi; - DECLARE_BITMAP(aqm_delta, AP_DOMAINS); struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex); @@ -1196,11 +1190,8 @@ static ssize_t assign_domain_store(struct device *dev, }
vfio_ap_mdev_link_domain(matrix_mdev, apqi); - memset(aqm_delta, 0, sizeof(aqm_delta)); - set_bit_inv(apqi, aqm_delta);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev->matrix.apm, aqm_delta, - matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev)) vfio_ap_mdev_update_guest_apcb(matrix_mdev);
ret = count; @@ -2091,9 +2082,7 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev) if (matrix_mdev) { vfio_ap_mdev_link_queue(matrix_mdev, q);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev->matrix.apm, - matrix_mdev->matrix.aqm, - matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev)) vfio_ap_mdev_update_guest_apcb(matrix_mdev); } dev_set_drvdata(&apdev->device, q); @@ -2443,35 +2432,23 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *cur_cfg_info,
static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev) { - bool do_hotplug = false; - int filter_domains = 0; - int filter_adapters = 0; - DECLARE_BITMAP(apm, AP_DEVICES); - DECLARE_BITMAP(aqm, AP_DOMAINS); + bool filter_domains, filter_adapters, filter_cdoms, do_hotplug = false;
mutex_lock(&matrix_mdev->kvm->lock); mutex_lock(&matrix_dev->mdevs_lock);
- filter_adapters = bitmap_and(apm, matrix_mdev->matrix.apm, - matrix_mdev->apm_add, AP_DEVICES); - filter_domains = bitmap_and(aqm, matrix_mdev->matrix.aqm, - matrix_mdev->aqm_add, AP_DOMAINS); - - if (filter_adapters && filter_domains) - do_hotplug |= vfio_ap_mdev_filter_matrix(apm, aqm, matrix_mdev); - else if (filter_adapters) - do_hotplug |= - vfio_ap_mdev_filter_matrix(apm, - matrix_mdev->shadow_apcb.aqm, - matrix_mdev); - else - do_hotplug |= - vfio_ap_mdev_filter_matrix(matrix_mdev->shadow_apcb.apm, - aqm, matrix_mdev); + filter_adapters = bitmap_intersects(matrix_mdev->matrix.apm, + matrix_mdev->apm_add, AP_DEVICES); + filter_domains = bitmap_intersects(matrix_mdev->matrix.aqm, + matrix_mdev->aqm_add, AP_DOMAINS); + filter_cdoms = bitmap_intersects(matrix_mdev->matrix.adm, + matrix_mdev->adm_add, AP_DOMAINS); + + if (filter_adapters || filter_domains) + do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev);
- if (bitmap_intersects(matrix_mdev->matrix.adm, matrix_mdev->adm_add, - AP_DOMAINS)) - do_hotplug |= vfio_ap_mdev_filter_cdoms(matrix_mdev); + if (filter_cdoms) + do_hotplug = vfio_ap_mdev_filter_cdoms(matrix_mdev);
if (do_hotplug) vfio_ap_mdev_update_guest_apcb(matrix_mdev);
While filtering the mdev matrix, it doesn't make sense - and will have unexpected results - to filter an APID from the matrix if the APID or one of the associated APQIs is not in the host's AP configuration. There are two reasons for this:
1. An adapter or domain that is not in the host's AP configuration can be assigned to the matrix; this is known as over-provisioning. Queue devices, however, are only created for adapters and domains in the host's AP configuration, so there will be no queues associated with an over-provisioned adapter or domain to filter.
2. The adapter or domain may have been externally removed from the host's configuration via an SE or HMC attached to a DPM enabled LPAR. In this case, the vfio_ap device driver would have been notified by the AP bus via the on_config_changed callback and the adapter or domain would have already been filtered.
Let's bypass the filtering of an APID if an adapter or domain assigned to the mdev matrix is not in the host's AP configuration.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org --- drivers/s390/crypto/vfio_ap_ops.c | 32 +++++++++++++++++++++++++------ 1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index e5490640e19c..4e40e226ce62 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -692,17 +692,37 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev) (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) { + /* + * If the adapter is not in the host's AP configuration, it will + * be due to one of two reasons: + * 1. The adapter is over-provisioned. + * 2. The adapter was removed from the host's + * configuration in which case it will already have + * been processed by the on_config_changed callback. + * In either case, we should skip the filtering and + * continue examining APIDs. + */ + if (!test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm)) + continue; + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) { /* - * If the APQN is not bound to the vfio_ap device - * driver, then we can't assign it to the guest's - * AP configuration. The AP architecture won't - * allow filtering of a single APQN, so let's filter - * the APID since an adapter represents a physical - * hardware device. + * If the domain is not in the host's AP configuration, + * it will for one of two reasons: + * 1. The domain is over-provisioned. + * 2. The domain was removed from the host's + * configuration in which case it will already have + * been processed by the on_config_changed callback. + * In either case, we should skip the filtering and + * continue examining APQIs. */ + if (!test_bit_inv(apqi, + (unsigned long *)matrix_dev->info.aqm)) + continue; + apqn = AP_MKQID(apid, apqi); q = vfio_ap_mdev_get_queue(matrix_mdev, apqn); + if (!q || q->reset_status.response_code) { clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
On Tue, 17 Oct 2023 18:22:49 -0400 Tony Krowiak akrowiak@linux.ibm.com wrote:
While filtering the mdev matrix, it doesn't make sense - and will have unexpected results - to filter an APID from the matrix if the APID or one of the associated APQIs is not in the host's AP configuration. There are two reasons for this:
An adapter or domain that is not in the host's AP configuration can be assigned to the matrix; this is known as over-provisioning. Queue devices, however, are only created for adapters and domains in the host's AP configuration, so there will be no queues associated with an over-provisioned adapter or domain to filter.
The adapter or domain may have been externally removed from the host's configuration via an SE or HMC attached to a DPM enabled LPAR. In this case, the vfio_ap device driver would have been notified by the AP bus via the on_config_changed callback and the adapter or domain would have already been filtered.
Let's bypass the filtering of an APID if an adapter or domain assigned to the mdev matrix is not in the host's AP configuration.
I strongly agree.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org
drivers/s390/crypto/vfio_ap_ops.c | 32 +++++++++++++++++++++++++------ 1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index e5490640e19c..4e40e226ce62 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -692,17 +692,37 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev) (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
What speaks against doing the loop on matrix_mdev->shadow_apcb.a[pq]m?
Those are the and of matrix_mdev->matrix.a{p,q}m and matrix_dev->info.a{p,q}m so excactly those bits are 0 for which you are adding the ifs...
/*
* If the adapter is not in the host's AP configuration, it will
* be due to one of two reasons:
* 1. The adapter is over-provisioned.
* 2. The adapter was removed from the host's
* configuration in which case it will already have
* been processed by the on_config_changed callback.
* In either case, we should skip the filtering and
* continue examining APIDs.
*/
if (!test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
continue;
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) { /*
* If the APQN is not bound to the vfio_ap device
* driver, then we can't assign it to the guest's
* AP configuration. The AP architecture won't
* allow filtering of a single APQN, so let's filter
* the APID since an adapter represents a physical
* hardware device.
* If the domain is not in the host's AP configuration,
* it will for one of two reasons:
* 1. The domain is over-provisioned.
* 2. The domain was removed from the host's
* configuration in which case it will already have
* been processed by the on_config_changed callback.
* In either case, we should skip the filtering and
* continue examining APQIs. */
if (!test_bit_inv(apqi,
(unsigned long *)matrix_dev->info.aqm))
continue;
apqn = AP_MKQID(apid, apqi); q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q || q->reset_status.response_code) { clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
On 10/18/23 13:01, Halil Pasic wrote:
On Tue, 17 Oct 2023 18:22:49 -0400 Tony Krowiak akrowiak@linux.ibm.com wrote:
While filtering the mdev matrix, it doesn't make sense - and will have unexpected results - to filter an APID from the matrix if the APID or one of the associated APQIs is not in the host's AP configuration. There are two reasons for this:
An adapter or domain that is not in the host's AP configuration can be assigned to the matrix; this is known as over-provisioning. Queue devices, however, are only created for adapters and domains in the host's AP configuration, so there will be no queues associated with an over-provisioned adapter or domain to filter.
The adapter or domain may have been externally removed from the host's configuration via an SE or HMC attached to a DPM enabled LPAR. In this case, the vfio_ap device driver would have been notified by the AP bus via the on_config_changed callback and the adapter or domain would have already been filtered.
Let's bypass the filtering of an APID if an adapter or domain assigned to the mdev matrix is not in the host's AP configuration.
I strongly agree.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org
drivers/s390/crypto/vfio_ap_ops.c | 32 +++++++++++++++++++++++++------ 1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index e5490640e19c..4e40e226ce62 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -692,17 +692,37 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev) (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
What speaks against doing the loop on matrix_mdev->shadow_apcb.a[pq]m?
Those are the and of matrix_mdev->matrix.a{p,q}m and matrix_dev->info.a{p,q}m so excactly those bits are 0 for which you are adding the ifs...
You are correct, there is no good reason to avoid looping on the shadow_apcb. I'll change this patch to do just that.
/*
* If the adapter is not in the host's AP configuration, it will
* be due to one of two reasons:
* 1. The adapter is over-provisioned.
* 2. The adapter was removed from the host's
* configuration in which case it will already have
* been processed by the on_config_changed callback.
* In either case, we should skip the filtering and
* continue examining APIDs.
*/
if (!test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
continue;
- for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) { /*
* If the APQN is not bound to the vfio_ap device
* driver, then we can't assign it to the guest's
* AP configuration. The AP architecture won't
* allow filtering of a single APQN, so let's filter
* the APID since an adapter represents a physical
* hardware device.
* If the domain is not in the host's AP configuration,
* it will for one of two reasons:
* 1. The domain is over-provisioned.
* 2. The domain was removed from the host's
* configuration in which case it will already have
* been processed by the on_config_changed callback.
* In either case, we should skip the filtering and
* continue examining APQIs. */
if (!test_bit_inv(apqi,
(unsigned long *)matrix_dev->info.aqm))
continue;
apqn = AP_MKQID(apid, apqi); q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q || q->reset_status.response_code) { clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
When a queue is unbound from the vfio_ap device driver, it is reset to ensure its crypto data is not leaked when it is bound to another device driver. If the queue is unbound due to the fact that the adapter or domain was removed from the host's AP configuration, then attempting to reset it will fail with response code 01 (APID not valid) getting returned from the reset command. Let's ensure that the queue is assigned to the host's configuration before resetting it.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Fixes: eeb386aeb5b7 ("s390/vfio-ap: handle config changed and scan complete notification") Cc: stable@vger.kernel.org --- drivers/s390/crypto/vfio_ap_ops.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index 4e40e226ce62..08d612dfc506 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -2125,13 +2125,12 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev) q = dev_get_drvdata(&apdev->device); get_update_locks_for_queue(q); matrix_mdev = q->matrix_mdev; + apid = AP_QID_CARD(q->apqn); + apqi = AP_QID_QUEUE(q->apqn);
if (matrix_mdev) { vfio_ap_unlink_queue_fr_mdev(q);
- apid = AP_QID_CARD(q->apqn); - apqi = AP_QID_QUEUE(q->apqn); - /* * If the queue is assigned to the guest's APCB, then remove * the adapter's APID from the APCB and hot it into the guest. @@ -2143,8 +2142,17 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev) } }
- vfio_ap_mdev_reset_queue(q); - flush_work(&q->reset_work); + /* + * If the queue is not in the host's AP configuration, then resetting + * it will fail with response code 01, (APQN not valid); so, let's make + * sure it is in the host's config. + */ + if (test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm) && + test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm)) { + vfio_ap_mdev_reset_queue(q); + flush_work(&q->reset_work); + } + dev_set_drvdata(&apdev->device, NULL); kfree(q); release_update_locks_for_mdev(matrix_mdev);
When adapters and/or domains are added to the host's AP configuration, this may result in multiple queue devices getting created and probed by the vfio_ap device driver. For each queue device probed, the matrix of adapters and domains assigned to a matrix mdev will be filtered to update the guest's APCB. If any adapters or domains get added to or removed from the APCB, the guest's AP configuration will be dynamically updated (i.e., hot plug/unplug). To dynamically update the guest's configuration, its VCPUs must be taken out of SIE for the period of time it takes to make the update. This is disruptive to the guest's operation and if there are many queues probed due to a change in the host's AP configuration, this could be troublesome. The problem is exacerbated by the fact that the 'on_scan_complete' callback also filters the mdev's matrix and updates the guest's AP configuration.
In order to reduce the potential amount of disruption to the guest that may result from a change to the host's AP configuration, let's bypass the filtering of the matrix and updating of the guest's AP configuration in the probe callback - if due to a host config change - and defer it until the 'on_scan_complete' callback is invoked after the AP bus finishes its device scan operation. This way the filtering and updating will be performed only once regardless of the number of queues added.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Fixes: eeb386aeb5b7 ("s390/vfio-ap: handle config changed and scan complete notification") Cc: stable@vger.kernel.org --- drivers/s390/crypto/vfio_ap_ops.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index 08d612dfc506..7c2cd062ffe8 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -2102,9 +2102,22 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev) if (matrix_mdev) { vfio_ap_mdev_link_queue(matrix_mdev, q);
+ /* + * If we're in the process of handling the adding of adapters or + * domains to the host's AP configuration, then let the + * vfio_ap device driver's on_scan_complete callback filter the + * matrix and update the guest's AP configuration after all of + * the new queue devices are probed. + */ + if (!bitmap_empty(matrix_mdev->apm_add, AP_DEVICES) || + !bitmap_empty(matrix_mdev->aqm_add, AP_DOMAINS)) + goto done; + if (vfio_ap_mdev_filter_matrix(matrix_mdev)) vfio_ap_mdev_update_guest_apcb(matrix_mdev); } + +done: dev_set_drvdata(&apdev->device, q); release_update_locks_for_mdev(matrix_mdev);
When filtering the adapters and domains assigned to the matrix mdev to create or update a guest's AP configuration, if the APID of an adapter and the APQI of a domain identify a queue device that is not bound to the vfio_ap device driver, the APID of the adapter will be unassigned from the guest's AP configuration because an individual APQN can not be unassigned due to the fact the APQNs are assigned via a matrix of APIDs and APQIs. Consequently, a guest will lose access to all of the queues associated with that adapter. Since these queues could subsequently get assigned to another guest, they must be reset in order to avoid prevent the leaking of data from the queue being unassigned, so let's make sure all queues associated with an adapter unplugged from the guest.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org --- drivers/s390/crypto/vfio_ap_ops.c | 95 ++++++++++++++++++++++++++++--- 1 file changed, 86 insertions(+), 9 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index 362d3942e506..2a1e6979d613 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -661,16 +661,21 @@ static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev *matrix_mdev) * device driver. * * @matrix_mdev: the matrix mdev whose matrix is to be filtered. + * @apm_filtered: a 256-bit bitmap for storing the APIDs filtered from the + * guest's AP configuration. * * Note: If an APQN referencing a queue device that is not bound to the vfio_ap * driver, its APID will be filtered from the guest's APCB. The matrix * structure precludes filtering an individual APQN, so its APID will be - * filtered. + * filtered. Consequently, all queues associated with the adapter + * identified by the filtered APID will need to be reset lest they get + * plugged into another guest. * * Return: a boolean value indicating whether the KVM guest's APCB was changed * by the filtering or not. */ -static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev) +static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev, + unsigned long *apm_filtered) { unsigned long apid, apqi, apqn; DECLARE_BITMAP(prev_shadow_apm, AP_DEVICES); @@ -680,6 +685,7 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev) bitmap_copy(prev_shadow_apm, matrix_mdev->shadow_apcb.apm, AP_DEVICES); bitmap_copy(prev_shadow_aqm, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS); vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb); + bitmap_clear(apm_filtered, 0, AP_DEVICES);
/* * Copy the adapters, domains and control domains to the shadow_apcb @@ -724,8 +730,16 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev) q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
if (!q || q->reset_status.response_code) { - clear_bit_inv(apid, - matrix_mdev->shadow_apcb.apm); + clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm); + + /* + * If the adapter was previously plugged into + * the guest, let's let the caller know that + * the APID was filtered. + */ + if (test_bit_inv(apid, prev_shadow_apm)) + set_bit_inv(apid, apm_filtered); + break; } } @@ -937,6 +951,52 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev, AP_MKQID(apid, apqi)); }
+static int reset_queues_for_apids(struct ap_matrix_mdev *matrix_mdev, + unsigned long *apm_reset) +{ + struct vfio_ap_queue *q; + unsigned long apid, apqi; + struct hlist_node *tmp_qnode; + struct ap_queue_table *qtable; + int apqn, ret = 0, loop_cursor; + + qtable = kzalloc(sizeof(*qtable), GFP_KERNEL); + if (WARN(!qtable, "Failed to allocate qtable")) + return -ENOMEM; + + ap_queue_table_init(qtable, RESET_QTABLE); + + for_each_set_bit_inv(apid, apm_reset, AP_DEVICES) { + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, + AP_DOMAINS) { + /* + * If the domain is not in the host's AP configuration, + * then resetting it will fail with response code 01 + * (APQN not valid). + */ + if (!test_bit_inv(apqi, + (unsigned long *)matrix_dev->info.aqm)) + continue; + + apqn = AP_MKQID(apid, apqi); + q = vfio_ap_mdev_get_queue(matrix_mdev, apqn); + + if (q) + hash_add(qtable->queues, &q->reset_qnode, apqn); + } + } + + ret = vfio_ap_mdev_reset_queues(qtable); + + hash_for_each_safe(qtable->queues, loop_cursor, tmp_qnode, q, + reset_qnode) + hash_del(&q->reset_qnode); + + kfree(qtable); + + return ret; +} + /** * assign_adapter_store - parses the APID from @buf and sets the * corresponding bit in the mediated matrix device's APM @@ -977,6 +1037,7 @@ static ssize_t assign_adapter_store(struct device *dev, { int ret; unsigned long apid; + DECLARE_BITMAP(apm_filtered, AP_DEVICES); struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex); @@ -1006,8 +1067,10 @@ static ssize_t assign_adapter_store(struct device *dev,
vfio_ap_mdev_link_adapter(matrix_mdev, apid);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) { vfio_ap_mdev_update_guest_apcb(matrix_mdev); + reset_queues_for_apids(matrix_mdev, apm_filtered); + }
ret = count; done: @@ -1183,6 +1246,7 @@ static ssize_t assign_domain_store(struct device *dev, { int ret; unsigned long apqi; + DECLARE_BITMAP(aqm_filtered, AP_DOMAINS); struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex); @@ -1212,8 +1276,10 @@ static ssize_t assign_domain_store(struct device *dev,
vfio_ap_mdev_link_domain(matrix_mdev, apqi);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev, aqm_filtered)) { vfio_ap_mdev_update_guest_apcb(matrix_mdev); + reset_queues_for_apids(matrix_mdev, aqm_filtered); + }
ret = count; done: @@ -1762,6 +1828,9 @@ static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable) int ret = 0, loop_cursor; struct vfio_ap_queue *q;
+ if (hash_empty(qtable->queues)) + return 0; + switch (qtable->type) { case MDEV_QTABLE: hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) @@ -2098,6 +2167,7 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev) { int ret; struct vfio_ap_queue *q; + DECLARE_BITMAP(apm_filtered, AP_DEVICES); struct ap_matrix_mdev *matrix_mdev;
ret = sysfs_create_group(&apdev->device.kobj, &vfio_queue_attr_group); @@ -2130,15 +2200,17 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev) !bitmap_empty(matrix_mdev->aqm_add, AP_DOMAINS)) goto done;
- if (vfio_ap_mdev_filter_matrix(matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) { vfio_ap_mdev_update_guest_apcb(matrix_mdev); + reset_queues_for_apids(matrix_mdev, apm_filtered); + } }
done: dev_set_drvdata(&apdev->device, q); release_update_locks_for_mdev(matrix_mdev);
- return 0; + return ret;
err_remove_group: sysfs_remove_group(&apdev->device.kobj, &vfio_queue_attr_group); @@ -2490,6 +2562,7 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *cur_cfg_info,
static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev) { + DECLARE_BITMAP(apm_filtered, AP_DEVICES); bool filter_domains, filter_adapters, filter_cdoms, do_hotplug = false;
mutex_lock(&matrix_mdev->kvm->lock); @@ -2503,7 +2576,8 @@ static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev) matrix_mdev->adm_add, AP_DOMAINS);
if (filter_adapters || filter_domains) - do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev); + do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev, + apm_filtered);
if (filter_cdoms) do_hotplug = vfio_ap_mdev_filter_cdoms(matrix_mdev); @@ -2511,6 +2585,9 @@ static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev) if (do_hotplug) vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ if (!bitmap_empty(apm_filtered, AP_DEVICES)) + reset_queues_for_apids(matrix_mdev, apm_filtered); + mutex_unlock(&matrix_dev->mdevs_lock); mutex_unlock(&matrix_mdev->kvm->lock); }
When a queue is unbound from the vfio_ap device driver, if that queue is assigned to a guest's AP configuration, its associated adapter is removed because queues are defined to a guest via a matrix of adapters and domains; so, it is not possible to remove a single queue.
If an adapter is removed from the guest's AP configuration, all associated queues must be reset to prevent leaking crypto data should any of them be assigned to a different guest or device driver. The one caveat is that if the queue is being removed because the adapter or domain has been removed from the host's AP configuration, then an attempt to reset the queue will fail with response code 01, AP-queue number not valid; so resetting these queues should be skipped.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Fixes: 09d31ff78793 ("s390/vfio-ap: hot plug/unplug of AP devices when probed/removed") Cc: stable@vger.kernel.org --- drivers/s390/crypto/vfio_ap_ops.c | 48 ++++++++++++++++++++++--------- 1 file changed, 34 insertions(+), 14 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index 2a1e6979d613..e57202e92a0e 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -2217,6 +2217,23 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev) return ret; }
+static void reset_queues_for_apid(struct ap_matrix_mdev *matrix_mdev, + unsigned long apid) +{ + DECLARE_BITMAP(apm_reset, AP_DEVICES); + + /* + * If the adapter is not in the host's AP configuration, then resetting + * any queue for that adapter will fail with response code 01, (APQN not + * valid). + */ + if (test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm)) { + bitmap_clear(apm_reset, 0, AP_DEVICES); + set_bit_inv(apid, apm_reset); + reset_queues_for_apids(matrix_mdev, apm_reset); + } +} + void vfio_ap_mdev_remove_queue(struct ap_device *apdev) { unsigned long apid, apqi; @@ -2231,23 +2248,24 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev) apqi = AP_QID_QUEUE(q->apqn);
if (matrix_mdev) { - vfio_ap_unlink_queue_fr_mdev(q); - - /* - * If the queue is assigned to the guest's APCB, then remove - * the adapter's APID from the APCB and hot it into the guest. - */ + /* If the queue is assigned to the guest's AP configuration */ if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) && test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) { + /* + * Since the queues are defined via a matrix of adapters + * and domains, it is not possible to hot unplug a + * single queue; so, let's unplug the adapter. + */ clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm); vfio_ap_mdev_update_guest_apcb(matrix_mdev); + reset_queues_for_apid(matrix_mdev, apid); + goto done; } }
/* - * If the queue is not in the host's AP configuration, then resetting - * it will fail with response code 01, (APQN not valid); so, let's make - * sure it is in the host's config. + * Make sure the queue is in the host's AP configuration or attempting + * to reset it will fail with response code 01 (APQN not valid). */ if (test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm) && test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm)) { @@ -2255,6 +2273,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev) flush_work(&q->reset_work); }
+done: + if (matrix_mdev) + vfio_ap_unlink_queue_fr_mdev(q); + dev_set_drvdata(&apdev->device, NULL); kfree(q); release_update_locks_for_mdev(matrix_mdev); @@ -2305,17 +2327,15 @@ static void vfio_ap_mdev_hot_unplug_cfg(struct ap_matrix_mdev *matrix_mdev, { int do_hotplug = 0;
- if (!bitmap_empty(aprem, AP_DEVICES)) { + if (!bitmap_empty(aprem, AP_DEVICES)) do_hotplug |= bitmap_andnot(matrix_mdev->shadow_apcb.apm, matrix_mdev->shadow_apcb.apm, aprem, AP_DEVICES); - }
- if (!bitmap_empty(aqrem, AP_DOMAINS)) { + if (!bitmap_empty(aqrem, AP_DOMAINS)) do_hotplug |= bitmap_andnot(matrix_mdev->shadow_apcb.aqm, matrix_mdev->shadow_apcb.aqm, - aqrem, AP_DEVICES); - } + aqrem, AP_DOMAINS);
if (!bitmap_empty(cdrem, AP_DOMAINS)) do_hotplug |= bitmap_andnot(matrix_mdev->shadow_apcb.adm,
linux-stable-mirror@lists.linaro.org