On Wed, 2020-03-11 at 17:19 +0530, Sreekanth Reddy wrote:
On Wed, Mar 11, 2020 at 4:55 PM Sreekanth Reddy <sreekanth.reddy@broadcom.com> wrote:
On Wed, Mar 11, 2020 at 4:35 PM Amit Shah <amit@kernel.org> wrote:
On Wed, 2020-03-11 at 06:36 -0400, Sreekanth Reddy wrote:
A general protection fault kernel panic is observed when the user performs a soft (ordered) HBA unplug operation while IOs are running on drives connected to the HBA.
When the user performs an ordered HBA removal operation, the kernel calls the PCI device's .remove() callback function, in which the driver flushes out all the outstanding SCSI IO commands with the DID_NO_CONNECT host byte and also unmaps the sg buffers allocated for these IO commands. But in the ordered HBA removal case (unlike a real HBA hot unplug), the HBA device is still alive, so the HBA hardware keeps performing DMA operations to those buffers in system memory, even though they were already unmapped while the outstanding SCSI IO commands were being flushed out. This leads to the kernel panic.
Fix: Don't flush out the outstanding IOs from the .remove() path in the case of an ordered HBA removal, since the HBA is still alive in this case and can complete the outstanding IOs itself. Flush out the outstanding IOs only in the case of a physical HBA hot unplug, where there won't be any communication with the HBA.
Can you please point to the commit that introduces the bug?
Sure, I will add the ID of the commit that introduced this bug in the next patch.
Thanks.
Cc: stable@vger.kernel.org
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 778d5e6..04a40af 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
 	ioc->remove_host = 1;
-	mpt3sas_wait_for_commands_to_complete(ioc);
-	_scsih_flush_running_cmds(ioc);
+	if (!pci_device_is_present(pdev))
+		_scsih_flush_running_cmds(ioc);
 	_scsih_fw_event_cleanup_queue(ioc);
@@ -9992,8 +9992,8 @@ static void scsih_remove(struct pci_dev *pdev)
Just a note: this function is scsih_shutdown(). Doesn't block application of the patch, though. Just wondering how the patch was created.
I understand your query now. Yes, this hunk's change is in the scsih_shutdown() function. I am not sure why the scsih_remove name is displayed in this hunk header; I used 'git format-patch' to generate the patch.
Thanks. Does the commit description need an update as well? It only talks about the .remove() callback.
Sorry, I didn't follow. Could you please elaborate on your query?
 	ioc->remove_host = 1;
-	mpt3sas_wait_for_commands_to_complete(ioc);
-	_scsih_flush_running_cmds(ioc);
+	if (!pci_device_is_present(pdev))
+		_scsih_flush_running_cmds(ioc);
 	_scsih_fw_event_cleanup_queue(ioc);