The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 021ad274d7dc31611d4f47f7dd4ac7a224526f30 Mon Sep 17 00:00:00 2001
From: Dexuan Cui <decui@microsoft.com>
Date: Thu, 15 Mar 2018 14:20:53 +0000
Subject: [PATCH] PCI: hv: Serialize the present and eject work items
When we hot-remove the device, we first receive a PCI_EJECT message and then receive a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
The first message is offloaded to hv_eject_device_work(), and the second is offloaded to pci_devices_present_work(). Both the paths can be running list_del(&hpdev->list_entry), causing general protection fault, because system_wq can run them concurrently.
The patch eliminates the race condition.
Since access to present/eject work items is serialized, we do not need the hbus->enum_sem anymore, so remove it.
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Link: https://lkml.kernel.org/r/KL1P15301MB00064DA6B4D221123B5241CFBFD70@KL1P15301...
Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
Tested-by: Chris Valean <v-chvale@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: squashed semaphore removal patch]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: stable@vger.kernel.org # v4.6+
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index 2faf38eab785..b7fd5c157d73 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -447,7 +447,6 @@ struct hv_pcibus_device {
 	spinlock_t device_list_lock;	/* Protect lists below */
 	void __iomem *cfg_addr;
 
-	struct semaphore enum_sem;
 	struct list_head resources_for_children;
 
 	struct list_head children;
@@ -461,6 +460,8 @@ struct hv_pcibus_device {
 	struct retarget_msi_interrupt retarget_msi_interrupt_params;
 
 	spinlock_t retarget_msi_interrupt_lock;
+
+	struct workqueue_struct *wq;
 };
 
 /*
@@ -1590,12 +1591,8 @@ static struct hv_pci_dev *get_pcichild_wslot(struct hv_pcibus_device *hbus,
  * It must also treat the omission of a previously observed device as
  * notification that the device no longer exists.
  *
- * Note that this function is a work item, and it may not be
- * invoked in the order that it was queued.  Back to back
- * updates of the list of present devices may involve queuing
- * multiple work items, and this one may run before ones that
- * were sent later. As such, this function only does something
- * if is the last one in the queue.
+ * Note that this function is serialized with hv_eject_device_work(),
+ * because both are pushed to the ordered workqueue hbus->wq.
  */
 static void pci_devices_present_work(struct work_struct *work)
 {
@@ -1616,11 +1613,6 @@ static void pci_devices_present_work(struct work_struct *work)
 
 	INIT_LIST_HEAD(&removed);
 
-	if (down_interruptible(&hbus->enum_sem)) {
-		put_hvpcibus(hbus);
-		return;
-	}
-
 	/* Pull this off the queue and process it if it was the last one. */
 	spin_lock_irqsave(&hbus->device_list_lock, flags);
 	while (!list_empty(&hbus->dr_list)) {
@@ -1637,7 +1629,6 @@ static void pci_devices_present_work(struct work_struct *work)
 	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 
 	if (!dr) {
-		up(&hbus->enum_sem);
 		put_hvpcibus(hbus);
 		return;
 	}
@@ -1724,7 +1715,6 @@ static void pci_devices_present_work(struct work_struct *work)
 		break;
 	}
 
-	up(&hbus->enum_sem);
 	put_hvpcibus(hbus);
 	kfree(dr);
 }
@@ -1770,7 +1760,7 @@ static void hv_pci_devices_present(struct hv_pcibus_device *hbus,
 	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 
 	get_hvpcibus(hbus);
-	schedule_work(&dr_wrk->wrk);
+	queue_work(hbus->wq, &dr_wrk->wrk);
 }
 
 /**
@@ -1848,7 +1838,7 @@ static void hv_pci_eject_device(struct hv_pci_dev *hpdev)
 	get_pcichild(hpdev, hv_pcidev_ref_pnp);
 	INIT_WORK(&hpdev->wrk, hv_eject_device_work);
 	get_hvpcibus(hpdev->hbus);
-	schedule_work(&hpdev->wrk);
+	queue_work(hpdev->hbus->wq, &hpdev->wrk);
 }
 
 /**
@@ -2461,13 +2451,18 @@ static int hv_pci_probe(struct hv_device *hdev,
 	spin_lock_init(&hbus->config_lock);
 	spin_lock_init(&hbus->device_list_lock);
 	spin_lock_init(&hbus->retarget_msi_interrupt_lock);
-	sema_init(&hbus->enum_sem, 1);
 	init_completion(&hbus->remove_event);
+	hbus->wq = alloc_ordered_workqueue("hv_pci_%x", 0,
+					   hbus->sysdata.domain);
+	if (!hbus->wq) {
+		ret = -ENOMEM;
+		goto free_bus;
+	}
 
 	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
 			 hv_pci_onchannelcallback, hbus);
 	if (ret)
-		goto free_bus;
+		goto destroy_wq;
 
 	hv_set_drvdata(hdev, hbus);
 
@@ -2536,6 +2531,8 @@ static int hv_pci_probe(struct hv_device *hdev,
 	hv_free_config_window(hbus);
close:
 	vmbus_close(hdev->channel);
+destroy_wq:
+	destroy_workqueue(hbus->wq);
free_bus:
 	free_page((unsigned long)hbus);
 	return ret;
@@ -2615,6 +2612,7 @@ static int hv_pci_remove(struct hv_device *hdev)
 	irq_domain_free_fwnode(hbus->sysdata.fwnode);
 	put_hvpcibus(hbus);
 	wait_for_completion(&hbus->remove_event);
+	destroy_workqueue(hbus->wq);
 	free_page((unsigned long)hbus);
 	return 0;
 }
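[Editor's aside, not part of the thread's patch: the fix relies on a property of ordered workqueues. alloc_ordered_workqueue() returns a workqueue that executes at most one work item at a time, in queueing order, so two work handlers that modify the same list can no longer overlap the way they could on system_wq. The minimal, hypothetical kernel-module sketch below only illustrates that idea; it is not taken from the driver, and the demo_* names are invented for the example.]

/* Sketch: serializing two work items with an ordered workqueue.
 * Assumption: demo_* identifiers are invented for illustration only.
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct demo_entry {
	struct list_head list_entry;
	int id;
};

static LIST_HEAD(demo_list);
static struct workqueue_struct *demo_wq;
static struct work_struct add_work;
static struct work_struct del_work;

/* Work item #1: adds an entry to the shared list. */
static void demo_add_work(struct work_struct *work)
{
	struct demo_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);

	if (!entry)
		return;
	entry->id = 1;
	list_add_tail(&entry->list_entry, &demo_list);
	pr_info("demo: added entry %d\n", entry->id);
}

/*
 * Work item #2: removes every entry. On system_wq this could run
 * concurrently with demo_add_work(); on an ordered workqueue it cannot,
 * so the list needs no extra lock against the other work item.
 */
static void demo_del_work(struct work_struct *work)
{
	struct demo_entry *entry, *tmp;

	list_for_each_entry_safe(entry, tmp, &demo_list, list_entry) {
		list_del(&entry->list_entry);
		pr_info("demo: removed entry %d\n", entry->id);
		kfree(entry);
	}
}

static int __init demo_init(void)
{
	/* Ordered workqueue: items run one at a time, in queueing order. */
	demo_wq = alloc_ordered_workqueue("demo_wq", 0);
	if (!demo_wq)
		return -ENOMEM;

	INIT_WORK(&add_work, demo_add_work);
	INIT_WORK(&del_work, demo_del_work);

	/* Queued back to back, but del_work only starts after add_work ends. */
	queue_work(demo_wq, &add_work);
	queue_work(demo_wq, &del_work);
	return 0;
}

static void __exit demo_exit(void)
{
	/* Drains any pending work items, then frees the workqueue. */
	destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

Ordering only guarantees that the two handlers never overlap; they can still be queued from different contexts back to back, which is exactly the property the patch needs so that pci_devices_present_work() and hv_eject_device_work() cannot both run list_del() on the same entry at once.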
Dexuan,
On Mon, Apr 16, 2018 at 12:15:06PM +0200, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
Please do as Greg asks in order to complete the stable backport you have requested.
Thanks, Lorenzo
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Sent: Monday, April 16, 2018 06:23
Dexuan,
On Mon, Apr 16, 2018 at 12:15:06PM +0200, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
Please do as Greg asks in order to complete the stable backport you have requested.
Thanks, Lorenzo
greg k-h
Hi Greg, Lorenzo,

It turns out that the Hyper-V vPCI driver (drivers/pci/host/pci-hyperv.c) in v4.9.y is broken on the latest Hyper-V hosts, because it lacks more fixes; at a minimum we must cherry-pick these 2 extra fixes:

7dcf90e PCI: hv: Use vPCI protocol version 1.2
b1db7e7 PCI: hv: Add vPCI version protocol negotiation
Otherwise, we always get a "hv_pci ... Request for interrupt failed: 0xc0350005" error, as reported in https://github.com/Microsoft/azure-linux-kernel/issues/13.
The 2 extra fixes depend on a few more patches, and a manual resolution of conflicts is required...
Backporting all the required patches to v4.9 would be too many patches. IMO we may as well simply drop this patch ("PCI: hv: Serialize the present and eject work items") for v4.9, 4.10, 4.11 and 4.12, which are all broken for the same reason.
I can confirm this patch ("PCI: hv: Serialize the present and eject work items") can be applied cleanly to v4.13+.
PS: Hyper-V users that want PCIe pass-through and NIC SR-IOV on certain kernel versions can pick the patches here: https://github.com/Microsoft/azure-linux-kernel.
Thanks, -- Dexuan
On Tue, Apr 17, 2018 at 02:06:01AM +0000, Dexuan Cui wrote:
From: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Sent: Monday, April 16, 2018 06:23
Dexuan,
On Mon, Apr 16, 2018 at 12:15:06PM +0200, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
Please do as Greg asks in order to complete the stable backport you have requested.
Thanks, Lorenzo
greg k-h
Hi Greg, Lorenzo,

It turns out that the Hyper-V vPCI driver (drivers/pci/host/pci-hyperv.c) in v4.9.y is broken on the latest Hyper-V hosts, because it lacks more fixes; at a minimum we must cherry-pick these 2 extra fixes:

7dcf90e PCI: hv: Use vPCI protocol version 1.2
b1db7e7 PCI: hv: Add vPCI version protocol negotiation
Otherwise, we always get a "hv_pci ... Request for interrupt failed: 0xc0350005" error, as reported in https://github.com/Microsoft/azure-linux-kernel/issues/13.
The 2 extra fixes depend on a few more patches, and a manual resolution of conflicts is required...
Backporting all the required patches to v4.9 would be too many patches. IMO we may as well simply drop this patch ("PCI: hv: Serialize the present and eject work items") for v4.9, 4.10, 4.11 and 4.12, which are all broken for the same reason.
I can confirm this patch ("PCI: hv: Serialize the present and eject work items") can be applied cleanly to v4.13+.
PS: Hyper-V users that want PCIe pass-through and NIC SR-IOV on certain kernel versions can pick the patches here: https://github.com/Microsoft/azure-linux-kernel.
I'm sorry, I'm still totally confused.
What exact git commit ids do you wish to see applied to which stable tree?
And if backporting is needed, can you send the proper patches through email? I can't use a random github repo at all, for obvious reasons.
thanks,
greg k-h
From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>
Sent: Sunday, April 22, 2018 02:45

On Tue, Apr 17, 2018 at 02:06:01AM +0000, Dexuan Cui wrote:
PS: Hyper-V users that want PCIe pass-through and NIC SR-IOV on certain kernel versions can pick the patches here:
This link is just FYI only. I didn't mean to ask you to pick any patch from it. :-)
I'm sorry, I'm still totally confused.
What exact git commit ids do you wish to see applied to which stable tree?
I hope this mainline patch 021ad274d7dc ("PCI: hv: Serialize the present and eject work items") can be cherry-pick'd to v4.13.y.
I have verified this patch is already in v4.14.y, v4.15.y and v4.16.y, and it can be cleanly applied to v4.13.y.
And if backporting is needed, can you send the proper patches through email? I can't use a random github repo at all, for obvious reasons.
greg k-h
I'd like to backport 021ad274d7dc to v4.13.y, as I mentioned above. No other backport is needed here.
Thanks, -- Dexuan
On Sun, Apr 22, 2018 at 01:49:55PM +0000, Dexuan Cui wrote:
I'm sorry, I'm still totally confused.
What exact git commit ids do you wish to see applied to which stable tree?
I hope this mainline patch 021ad274d7dc ("PCI: hv: Serialize the present and eject work items") can be cherry-pick'd to v4.13.y.
4.13.y has been end-of-life for a very long time now. So there's nothing I can do there.
I have verified this patch is already in v4.14.y, v4.15.y and v4.16.y, and it can be cleanly applied to v4.13.y.
That's because 4.13.y is end-of-life and has not gotten an update for a very long time.
So there's nothing for me to do now? Great! :)
greg k-h
From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>
Sent: Sunday, April 22, 2018 07:03
I hope this mainline patch 021ad274d7dc ("PCI: hv: Serialize the present and eject work items") can be cherry-pick'd to v4.13.y.
4.13.y has been end-of-life for a very long time now. So there's nothing I can do there.
I have verified this patch is already in v4.14.y, v4.15.y and v4.16.y, and it can be cleanly applied to v4.13.y.
That's because 4.13.y is end-of-life and has not gotten an update for a very long time.
Got it.
So there's nothing for me to do now? Great! :)
greg k-h
Correct. Thanks a lot!
--Dexuan
From: gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>
Sent: Sunday, April 22, 2018 07:03
I hope this mainline patch 021ad274d7dc ("PCI: hv: Serialize the present and eject work items") can be cherry-pick'd to v4.13.y.
4.13.y has been end-of-life for a very long time now. So there's nothing I can do there.
I have verified this patch is already in v4.14.y, v4.15.y and v4.16.y, and it can be cleanly applied to v4.13.y.
That's because 4.13.y is end-of-life and has not gotten an update for a very long time.
So there's nothing for me to do now? Great! :)
greg k-h
Yes. :-)
-- Dexuan