This series of patches contains 3 separate changes that fix some bugs in
the qla2xxx driver.
---
v2:
- Change a spinlock wrap to a WRITE_ONCE() in patch 1
- Add Reviewed-by tags on patches 2 and 3
---
Anastasia Kovaleva (3):
  scsi: qla2xxx: Drop starvation counter on success
  scsi: qla2xxx: Make target send correct LOGO
  scsi: qla2xxx: Remove incorrect trap

 drivers/scsi/qla2xxx/qla_iocb.c   | 11 +++++++++++
 drivers/scsi/qla2xxx/qla_isr.c    |  4 ++++
 drivers/scsi/qla2xxx/qla_target.c | 16 +++++++---------
 3 files changed, 22 insertions(+), 9 deletions(-)
Long-lived sessions under high load can accumulate a starvation counter, and the current implementation does not allow this counter to be reset during an active session.
If the HBA sends a correct ATIO IOCB, it has enough resources to process commands, so ISP recovery should not be triggered.
Cc: stable@vger.kernel.org
Fixes: ead038556f64 ("qla2xxx: Add Dual mode support in the driver")
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_isr.c    | 4 ++++
 drivers/scsi/qla2xxx/qla_target.c | 6 ++++++
 2 files changed, 10 insertions(+)
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index fe98c76e9be3..5234ce0985e0 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -1959,6 +1959,10 @@ qla2x00_async_event(scsi_qla_host_t *vha, struct rsp_que *rsp, uint16_t *mb)
 		ql_dbg(ql_dbg_async, vha, 0x5091, "Transceiver Removal\n");
 		break;
 
+	case MBA_REJECTED_FCP_CMD:
+		ql_dbg(ql_dbg_async, vha, 0x5092, "LS_RJT was sent. No resources to process the ELS request.\n");
+		break;
+
 	default:
 		ql_dbg(ql_dbg_async, vha, 0x5057,
 		    "Unknown AEN:%04x %04x %04x %04x\n",
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index d7551b1443e4..bc6b014eb422 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -6826,6 +6826,12 @@ qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked)
 			qlt_send_term_exchange(ha->base_qpair, NULL, pkt,
 			    ha_locked, 0);
 		} else {
+			/*
+			 * If we get correct ATIO, then HBA had enough memory
+			 * to proceed without reset.
+			 */
+			WRITE_ONCE(&vha->hw->exch_starvation, 0);
+
 			qlt_24xx_atio_pkt_all_vps(vha,
 			    (struct atio_from_isp *)pkt, ha_locked);
 		}
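The idea behind this patch can be illustrated with a small userspace sketch. It is not driver code: `STARVATION_LIMIT`, `struct hw_state`, `recovery_needed` and both function names are hypothetical, and the threshold value is an assumption, since the thread does not show where or how the driver compares the counter. Only the field name `exch_starvation` is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold; the driver's real limit is not shown in this thread. */
#define STARVATION_LIMIT 5

struct hw_state {
	uint8_t exch_starvation;  /* "no resources" events seen since the last reset */
	bool    recovery_needed;  /* stands in for requesting ISP recovery */
};

/* Called when the firmware reports that it ran out of resources. */
static void on_starvation_event(struct hw_state *hw)
{
	if (++hw->exch_starvation > STARVATION_LIMIT)
		hw->recovery_needed = true;
}

/* Called for every well-formed ATIO: the HBA evidently has resources again. */
static void on_good_atio(struct hw_state *hw)
{
	hw->exch_starvation = 0;
}
```

Without the reset in `on_good_atio()`, isolated starvation events spread over a long-lived session would eventually add up to the limit and trigger a recovery that is not needed.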
Hi Anastasia,
kernel test robot noticed the following build errors:
[auto build test ERROR on jejb-scsi/for-next] [also build test ERROR on mkp-scsi/for-next linus/master v6.12-rc2 next-20241010] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information]
url:    https://github.com/intel-lab-lkp/linux/commits/Anastasia-Kovaleva/scsi-qla2x...
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
patch link:    https://lore.kernel.org/r/20241009111654.4697-2-a.kovaleva%40yadro.com
patch subject: [PATCH v2 1/3] scsi: qla2xxx: Drop starvation counter on success
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20241010/202410102244.4WCXxyGQ-lkp@i...)
compiler: alpha-linux-gcc (GCC) 13.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241010/202410102244.4WCXxyGQ-lkp@i...)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410102244.4WCXxyGQ-lkp@intel.com/
All errors (new ones prefixed by >>):
   In file included from arch/alpha/include/asm/rwonce.h:33,
                    from include/linux/compiler.h:317,
                    from include/linux/build_bug.h:5,
                    from include/linux/container_of.h:5,
                    from include/linux/list.h:5,
                    from include/linux/module.h:12,
                    from drivers/scsi/qla2xxx/qla_target.c:17:
   drivers/scsi/qla2xxx/qla_target.c: In function 'qlt_24xx_process_atio_queue':
>> include/asm-generic/rwonce.h:55:32: error: lvalue required as unary '&' operand
      55 |         *(volatile typeof(x) *)&(x) = (val);                            \
         |                                ^
   include/asm-generic/rwonce.h:61:9: note: in expansion of macro '__WRITE_ONCE'
      61 |         __WRITE_ONCE(x, val);                                           \
         |         ^~~~~~~~~~~~
   drivers/scsi/qla2xxx/qla_target.c:6833:25: note: in expansion of macro 'WRITE_ONCE'
    6833 |                         WRITE_ONCE(&vha->hw->exch_starvation, 0);
         |                         ^~~~~~~~~~
vim +55 include/asm-generic/rwonce.h
e506ea451254ab Will Deacon 2019-10-15  52
e506ea451254ab Will Deacon 2019-10-15  53  #define __WRITE_ONCE(x, val)						\
e506ea451254ab Will Deacon 2019-10-15  54  do {									\
e506ea451254ab Will Deacon 2019-10-15 @55  	*(volatile typeof(x) *)&(x) = (val);				\
e506ea451254ab Will Deacon 2019-10-15  56  } while (0)
e506ea451254ab Will Deacon 2019-10-15  57
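The build error follows from the macro quoted above: `__WRITE_ONCE(x, val)` applies `&` to its first argument, so that argument has to be the object itself (an lvalue), not a pointer to it. The sketch below reproduces this in userspace with a simplified copy of the macro. `WRITE_ONCE_SKETCH`, `struct hw_data` and `reset_starvation()` are illustrative names, `__typeof__` assumes GCC or Clang, and the corrected call is presumably what the patch intended rather than a confirmed fix.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace copy of the generic macro quoted in the report. */
#define WRITE_ONCE_SKETCH(x, val) \
	do { *(volatile __typeof__(x) *)&(x) = (val); } while (0)

struct hw_data {
	uint8_t exch_starvation;
};

static void reset_starvation(struct hw_data *hw)
{
	/*
	 * Pass the field itself. Writing
	 * WRITE_ONCE_SKETCH(&hw->exch_starvation, 0) would expand to
	 * &(&hw->exch_starvation), which takes the address of an rvalue
	 * and fails to compile, as in the report.
	 */
	WRITE_ONCE_SKETCH(hw->exch_starvation, 0);
}
```

Applied to the patch, this would correspond to `WRITE_ONCE(vha->hw->exch_starvation, 0);`.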
Hi Anastasia,
kernel test robot noticed the following build errors:
[auto build test ERROR on jejb-scsi/for-next] [also build test ERROR on mkp-scsi/for-next linus/master v6.12-rc2 next-20241010] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information]
url:    https://github.com/intel-lab-lkp/linux/commits/Anastasia-Kovaleva/scsi-qla2x...
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
patch link:    https://lore.kernel.org/r/20241009111654.4697-2-a.kovaleva%40yadro.com
patch subject: [PATCH v2 1/3] scsi: qla2xxx: Drop starvation counter on success
config: um-allmodconfig (https://download.01.org/0day-ci/archive/20241011/202410110059.pb1whtvg-lkp@i...)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 70e0a7e7e6a8541bcc46908c592eed561850e416)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241011/202410110059.pb1whtvg-lkp@i...)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410110059.pb1whtvg-lkp@intel.com/
All errors (new ones prefixed by >>):
   In file included from drivers/scsi/qla2xxx/qla_target.c:20:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:8:
   In file included from include/linux/cacheflush.h:5:
   In file included from arch/um/include/asm/cacheflush.h:4:
   In file included from arch/um/include/asm/tlbflush.h:9:
   In file included from include/linux/mm.h:2213:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from drivers/scsi/qla2xxx/qla_target.c:20:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     548 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     561 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
         |                                                   ^
   In file included from drivers/scsi/qla2xxx/qla_target.c:20:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     574 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
         |                                                   ^
   In file included from drivers/scsi/qla2xxx/qla_target.c:20:
   In file included from include/linux/blkdev.h:9:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     585 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     595 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     605 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:693:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     693 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:701:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     701 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:709:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     709 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:718:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     718 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:727:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     727 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:736:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     736 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
>> drivers/scsi/qla2xxx/qla_target.c:6833:4: error: cannot take the address of an rvalue of type 'uint8_t *' (aka 'unsigned char *')
    6833 |                         WRITE_ONCE(&vha->hw->exch_starvation, 0);
         |                         ^          ~~~~~~~~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:61:2: note: expanded from macro 'WRITE_ONCE'
      61 |         __WRITE_ONCE(x, val);                                           \
         |         ^            ~
   include/asm-generic/rwonce.h:55:25: note: expanded from macro '__WRITE_ONCE'
      55 |         *(volatile typeof(x) *)&(x) = (val);                            \
         |                                ^ ~
   13 warnings and 1 error generated.
Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for MODVERSIONS
   Depends on [n]: MODULES [=y] && !COMPILE_TEST [=y]
   Selected by [y]:
   - RANDSTRUCT_FULL [=y] && (CC_HAS_RANDSTRUCT [=y] || GCC_PLUGINS [=n]) && MODULES [=y]
   WARNING: unmet direct dependencies detected for GET_FREE_REGION
   Depends on [n]: SPARSEMEM [=n]
   Selected by [m]:
   - RESOURCE_KUNIT_TEST [=m] && RUNTIME_TESTING_MENU [=y] && KUNIT [=m]
vim +6833 drivers/scsi/qla2xxx/qla_target.c
  6793
  6794	/*
  6795	 * qlt_24xx_process_atio_queue() - Process ATIO queue entries.
  6796	 * @ha: SCSI driver HA context
  6797	 */
  6798	void
  6799	qlt_24xx_process_atio_queue(struct scsi_qla_host *vha, uint8_t ha_locked)
  6800	{
  6801		struct qla_hw_data *ha = vha->hw;
  6802		struct atio_from_isp *pkt;
  6803		int cnt, i;
  6804
  6805		if (!ha->flags.fw_started)
  6806			return;
  6807
  6808		while ((ha->tgt.atio_ring_ptr->signature != ATIO_PROCESSED) ||
  6809		    fcpcmd_is_corrupted(ha->tgt.atio_ring_ptr)) {
  6810			pkt = (struct atio_from_isp *)ha->tgt.atio_ring_ptr;
  6811			cnt = pkt->u.raw.entry_count;
  6812
  6813			if (unlikely(fcpcmd_is_corrupted(ha->tgt.atio_ring_ptr))) {
  6814				/*
  6815				 * This packet is corrupted. The header + payload
  6816				 * can not be trusted. There is no point in passing
  6817				 * it further up.
  6818				 */
  6819				ql_log(ql_log_warn, vha, 0xd03c,
  6820				    "corrupted fcp frame SID[%3phN] OXID[%04x] EXCG[%x] %64phN\n",
  6821				    &pkt->u.isp24.fcp_hdr.s_id,
  6822				    be16_to_cpu(pkt->u.isp24.fcp_hdr.ox_id),
  6823				    pkt->u.isp24.exchange_addr, pkt);
  6824
  6825				adjust_corrupted_atio(pkt);
  6826				qlt_send_term_exchange(ha->base_qpair, NULL, pkt,
  6827				    ha_locked, 0);
  6828			} else {
  6829				/*
  6830				 * If we get correct ATIO, then HBA had enough memory
  6831				 * to proceed without reset.
  6832				 */
> 6833				WRITE_ONCE(&vha->hw->exch_starvation, 0);
  6834
  6835				qlt_24xx_atio_pkt_all_vps(vha,
  6836				    (struct atio_from_isp *)pkt, ha_locked);
  6837			}
  6838
  6839			for (i = 0; i < cnt; i++) {
  6840				ha->tgt.atio_ring_index++;
  6841				if (ha->tgt.atio_ring_index == ha->tgt.atio_q_length) {
  6842					ha->tgt.atio_ring_index = 0;
  6843					ha->tgt.atio_ring_ptr = ha->tgt.atio_ring;
  6844				} else
  6845					ha->tgt.atio_ring_ptr++;
  6846
  6847				pkt->u.raw.signature = cpu_to_le32(ATIO_PROCESSED);
  6848				pkt = (struct atio_from_isp *)ha->tgt.atio_ring_ptr;
  6849			}
  6850			wmb();
  6851		}
  6852
  6853		/* Adjust ring index */
  6854		wrt_reg_dword(ISP_ATIO_Q_OUT(vha), ha->tgt.atio_ring_index);
  6855	}
  6856
Upon removing the ACL from the target, it sends a LOGO command to the initiator to break the connection. However, the HBA fills port_name and port_id of the LOGO command with all zeroes, which is not valid. The initiator sends a reject for this command, but the target does not process it, since it assumes LOGO can never fail. This leaves the system in a state where the initiator thinks it is still logged in to the target and can send commands to it, while the target ignores all incoming commands from this initiator.
If, in such a situation, the initiator sends some command (e.g. during a scan), after not receiving a response for a timeout duration, it sends ABORT for the command. After a timeout on receiving an ABORT response, the initiator sends LOGO to the target. Only after that, the initiator can successfully relogin to the target and restore the connection. In the end, this whole situation hangs the system for approximately a minute.
By default, the driver sends a LOGO command to the HBA filling only port_id, expecting the HBA to match port_id with the correct port_name from its internal table. The HBA doesn't do that, instead filling these fields with all zeroes.
This patch makes the driver send a LOGO command to the HBA with port_name and port_id in the I/O PARAMETER fields. The HBA then copies these values to the corresponding fields in the LOGO command frame.
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/qla2xxx/qla_iocb.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index 0b41e8a06602..90026fca14dc 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -2486,6 +2486,17 @@ qla24xx_logout_iocb(srb_t *sp, struct logio_entry_24xx *logio)
 	logio->port_id[1] = sp->fcport->d_id.b.area;
 	logio->port_id[2] = sp->fcport->d_id.b.domain;
 	logio->vp_index = sp->vha->vp_idx;
+	logio->io_parameter[0] = cpu_to_le32(sp->vha->d_id.b.al_pa |
+				 sp->vha->d_id.b.area << 8 |
+				 sp->vha->d_id.b.domain << 16);
+	logio->io_parameter[1] = cpu_to_le32(sp->vha->port_name[3] |
+				 sp->vha->port_name[2] << 8 |
+				 sp->vha->port_name[1] << 16 |
+				 sp->vha->port_name[0] << 24);
+	logio->io_parameter[2] = cpu_to_le32(sp->vha->port_name[7] |
+				 sp->vha->port_name[6] << 8 |
+				 sp->vha->port_name[5] << 16 |
+				 sp->vha->port_name[4] << 24);
 }
 
 static void
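The packing done in the hunk above can be tried out in userspace with the sketch below. The helper names are hypothetical, the values stay in host byte order (the patch additionally wraps each word in `cpu_to_le32()`), and the explicit `uint32_t` casts are a choice made for this sketch so that a byte of 0x80 or above is never shifted into the sign bit of an `int`.

```c
#include <assert.h>
#include <stdint.h>

/* 24-bit port ID: al_pa in bits 0-7, area in bits 8-15, domain in bits 16-23. */
static uint32_t pack_port_id(uint8_t domain, uint8_t area, uint8_t al_pa)
{
	return (uint32_t)al_pa | (uint32_t)area << 8 | (uint32_t)domain << 16;
}

/* Four consecutive WWPN bytes, first byte in the most significant position. */
static uint32_t pack_wwpn_word(const uint8_t *b)
{
	return (uint32_t)b[3] | (uint32_t)b[2] << 8 |
	       (uint32_t)b[1] << 16 | (uint32_t)b[0] << 24;
}

/* Fills the three words in the same order as the patch (host byte order). */
static void fill_io_parameters(uint32_t io_parameter[3],
			       uint8_t domain, uint8_t area, uint8_t al_pa,
			       const uint8_t port_name[8])
{
	io_parameter[0] = pack_port_id(domain, area, al_pa);
	io_parameter[1] = pack_wwpn_word(&port_name[0]);
	io_parameter[2] = pack_wwpn_word(&port_name[4]);
}
```

For a made-up port name 21:00:00:24:ff:12:34:56 and port ID 01:02:03, this yields the words 0x010203, 0x21000024 and 0xff123456.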
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH v2 2/3] scsi: qla2xxx: Make target send correct LOGO
Link: https://lore.kernel.org/stable/20241009111654.4697-3-a.kovaleva%40yadro.com
This BUG_ON() is triggered when there is no fc_port with a certain loop ID in the scsi host vp_fcports list, but there is one in lport_loopid_map. As these two data structures do not change simultaneously and atomically, such a trap is invalid.
Cc: stable@vger.kernel.org
Fixes: 726b85487067 ("qla2xxx: Add framework for async fabric discovery")
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/qla2xxx/qla_target.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index bc6b014eb422..492fc1627354 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -5190,15 +5190,7 @@ static int qlt_24xx_handle_els(struct scsi_qla_host *vha,
 		ql_dbg(ql_dbg_disc, vha, 0x20fc,
 		    "%s: logo %llx res %d sess %p ",
 		    __func__, wwn, res, sess);
-		if (res == 0) {
-			/*
-			 * cmd went upper layer, look for qlt_xmit_tm_rsp()
-			 * for LOGO_ACK & sess delete
-			 */
-			BUG_ON(!sess);
-			res = 0;
-		} else {
-			/* cmd did not go to upper layer. */
+		if (res) {
 			if (sess) {
 				qlt_schedule_sess_for_deletion(sess);
 				res = 0;
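The reasoning of this patch can be modelled with a toy example. Everything in it is hypothetical (`struct host`, `loopid_map`, `port_list`, `handle_logo()`); it only mirrors the situation described in the commit message, where a session is visible through one structure but not through the other, and the patched control flow, which treats a missing session as a normal case instead of asserting.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS 8  /* hypothetical table size */

struct session {
	int loop_id;
	int deletion_scheduled;
};

struct host {
	struct session *loopid_map[MAX_PORTS]; /* loop ID -> session */
	struct session *port_list[MAX_PORTS];  /* maintained separately */
	size_t nports;
};

/* The two structures are not updated atomically, so this may return NULL
 * even though loopid_map[] still has an entry for the same loop ID. */
static struct session *find_in_port_list(const struct host *h, int loop_id)
{
	for (size_t i = 0; i < h->nports; i++)
		if (h->port_list[i]->loop_id == loop_id)
			return h->port_list[i];
	return NULL;
}

/* Mirrors the patched flow: no assertion that a session must exist. */
static int handle_logo(struct host *h, int loop_id, int res)
{
	struct session *sess = find_in_port_list(h, loop_id);

	if (res) {
		if (sess) {
			sess->deletion_scheduled = 1;
			res = 0;
		}
	}
	return res;
}
```

With a session present only in `loopid_map`, `handle_logo(&h, id, 0)` returns 0 and carries on, where the removed `BUG_ON(!sess)` would have stopped the kernel.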
On 10/9/24 04:16, Anastasia Kovaleva wrote:
This series of patches contains 3 separate changes that fix some bugs in the qla2xxx driver.
v2:
- Change a spinlock wrap to a WRITE_ONCE() in patch 1
- Add Reviewed-by tags on patches 2 and 3
Anastasia Kovaleva (3):
  scsi: qla2xxx: Drop starvation counter on success
  scsi: qla2xxx: Make target send correct LOGO
  scsi: qla2xxx: Remove incorrect trap

 drivers/scsi/qla2xxx/qla_iocb.c   | 11 +++++++++++
 drivers/scsi/qla2xxx/qla_isr.c    |  4 ++++
 drivers/scsi/qla2xxx/qla_target.c | 16 +++++++---------
 3 files changed, 22 insertions(+), 9 deletions(-)
For the series,
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>