The driver performs SCR (State Change Registration) in all modes, including pure target mode.
For each RSCN, the scan_needed flag is set in qla2x00_handle_rscn() for the port mentioned in the RSCN, and a fabric rescan is scheduled. During the rescan, the GNN_FT handler, qla24xx_async_gnnft_done(), deletes the session of the port that caused the RSCN.
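The RSCN-to-deletion sequence can be modeled in a few lines of C. This is a minimal sketch, not driver code: struct sketch_fcport, sketch_handle_rscn() and sketch_gnnft_done_old() are hypothetical names that only mimic the roles of the driver's fc_port, qla2x00_handle_rscn() and the pre-fix check in qla24xx_async_gnnft_done().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the driver's fc_port. */
struct sketch_fcport {
	uint32_t d_id;         /* 24-bit N_Port_ID last seen for this port */
	bool scan_needed;      /* set when an RSCN names this port */
	bool session_active;   /* true while the target holds a session */
};

/* Models qla2x00_handle_rscn(): flag the port named in the RSCN payload. */
static void sketch_handle_rscn(struct sketch_fcport *fcport, uint32_t rscn_id)
{
	if (fcport->d_id == rscn_id)
		fcport->scan_needed = true;
}

/*
 * Models the pre-fix check in qla24xx_async_gnnft_done(): a port whose
 * N_Port_ID changed, or that was named in an RSCN, loses its session.
 * Clearing scan_needed here is a simplification of this sketch.
 */
static void sketch_gnnft_done_old(struct sketch_fcport *fcport,
				  uint32_t fabric_id)
{
	if (fcport->d_id != fabric_id || fcport->scan_needed)
		fcport->session_active = false; /* ~ qlt_schedule_sess_for_deletion() */
	fcport->d_id = fabric_id;
	fcport->scan_needed = false;
}
```

With a port at 0x010200, an RSCN naming that ID followed by a rescan that still reports 0x010200 ends with the session deleted, even though nothing about the port has changed.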
In target mode, the session deletion affects the ATIO handler, qlt_24xx_atio_pkt(): the target responds with SAM STATUS BUSY to I/O arriving from the deleted session. qlt_handle_cmd_for_atio() and qlt_handle_task_mgmt() return -EFAULT if they cannot find the session for a command/TMF, which results in a call to qlt_send_busy():
  qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0014
  qla_target(0): Unable to send command to target, sending BUSY status
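The ATIO path that produces this BUSY reply can be sketched as a session lookup whose failure is turned into a BUSY status. The names below (sketch_find_sess(), sketch_handle_atio(), sketch_atio_pkt()) are hypothetical stand-ins for the session lookup, qlt_handle_cmd_for_atio()/qlt_handle_task_mgmt() and qlt_24xx_atio_pkt(); only the -EFAULT path described above is modeled, and "GOOD" is a placeholder for "the command was accepted".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SAM_STAT_GOOD 0x00
#define SKETCH_SAM_STAT_BUSY 0x08	/* SAM BUSY status code */

/* Hypothetical session record keyed by the initiator's N_Port_ID. */
struct sketch_sess {
	uint32_t s_id;
};

static const struct sketch_sess *
sketch_find_sess(const struct sketch_sess *tbl, size_t n, uint32_t s_id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].s_id == s_id)
			return &tbl[i];
	return NULL;
}

/* Stands in for qlt_handle_cmd_for_atio()/qlt_handle_task_mgmt(). */
static int sketch_handle_atio(const struct sketch_sess *tbl, size_t n,
			      uint32_t s_id)
{
	if (!sketch_find_sess(tbl, n, s_id))
		return -EFAULT;	/* no session for this command/TMF */
	return 0;		/* command would be passed on for processing */
}

/* Stands in for qlt_24xx_atio_pkt(): -EFAULT leads to a BUSY reply. */
static int sketch_atio_pkt(const struct sketch_sess *tbl, size_t n,
			   uint32_t s_id)
{
	if (sketch_handle_atio(tbl, n, s_id) == -EFAULT)
		return SKETCH_SAM_STAT_BUSY;	/* ~ qlt_send_busy() */
	return SKETCH_SAM_STAT_GOOD;
}
```

Once the session for an initiator has been deleted, every command from that initiator takes the BUSY branch until a new session appears.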
Such a response causes a command timeout on the initiator. The error handler thread on the initiator is then spawned to abort the commands:
  scsi 23:0:0:0: tag#0 abort scheduled
  scsi 23:0:0:0: tag#0 aborting command
  qla2xxx [0000:af:00.0]-188c:23: Entered qla24xx_abort_command.
  qla2xxx [0000:af:00.0]-801c:23: Abort command issued nexus=23:0:0 -- 0 2003.
The command abort is rejected by the target and fails (2003). The error handler then tries to perform DEVICE RESET and TARGET RESET, but they are also doomed to fail because TMFs are ignored for deleted sessions.
Then the initiator issues a BUS RESET, which resets the link via qla2x00_full_login_lip(). The BUS RESET succeeds and brings the initiator port up; the SAN switch detects that and sends an RSCN to the target port, which fails again in the same way as described above. The initiator never gets out of the loop.
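The escalation loop can be simulated to show why it never terminates on its own. This is a hypothetical sketch: the enum, sketch_eh_try() and sketch_rscn_loop() are invented for illustration, and the model assumes, as described above, that abort and TMFs fail without a session, that BUS RESET always succeeds, and that each resulting RSCN either deletes the initiator session (old behavior) or keeps it (keep_initiator_sess).

```c
#include <stdbool.h>

/* Hypothetical rungs of the SCSI error handler escalation described above. */
enum sketch_eh_step {
	EH_ABORT,
	EH_DEVICE_RESET,
	EH_TARGET_RESET,
	EH_BUS_RESET,
};

/*
 * Abort and the reset TMFs only succeed if the target still has a session
 * for the initiator; BUS RESET resets the link and is assumed to succeed.
 */
static bool sketch_eh_try(enum sketch_eh_step step, bool target_has_sess)
{
	if (step == EH_BUS_RESET)
		return true;
	return target_has_sess;
}

/*
 * Runs up to max_rounds of "I/O, timeout, recovery". keep_initiator_sess
 * models whether an RSCN leaves the initiator session in place.
 * Returns how many BUS RESETs were issued.
 */
static int sketch_rscn_loop(bool keep_initiator_sess, int max_rounds)
{
	bool has_sess = keep_initiator_sess;	/* state after the first RSCN */
	int bus_resets = 0;

	for (int round = 0; round < max_rounds; round++) {
		if (has_sess)
			break;	/* I/O completes, nothing to recover */

		/* I/O got BUSY and timed out: escalate. */
		if (sketch_eh_try(EH_ABORT, has_sess) ||
		    sketch_eh_try(EH_DEVICE_RESET, has_sess) ||
		    sketch_eh_try(EH_TARGET_RESET, has_sess))
			break;

		if (!sketch_eh_try(EH_BUS_RESET, has_sess))
			break;
		bus_resets++;

		/* Link comes back, the switch sends an RSCN, the target rescans. */
		has_sess = keep_initiator_sess;
	}
	return bus_resets;
}
```

With the old behavior every round ends in a BUS RESET (sketch_rscn_loop(false, 5) returns 5); keeping the initiator session stops the loop before any reset (sketch_rscn_loop(true, 5) returns 0).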
This change breaks the RSCN loop by keeping the sessions of initiators mentioned in the RSCN payload in all modes, including dual and pure target mode.
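The deletion decision before and after the change can be written as two predicates. This is a sketch that mirrors the condition in the diff; the enum values and the sketch_delete_old()/sketch_delete_new() names are hypothetical, and the enum only borrows the FCT_* names from the driver with arbitrary values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical port types; names echo the diff, values are arbitrary. */
enum sketch_port_type {
	SK_FCT_UNKNOWN,
	SK_FCT_TARGET,
	SK_FCT_INITIATOR,
	SK_FCT_NVME_INITIATOR,
};

/* Pre-fix rule: any N_Port_ID change or RSCN deletes the session. */
static bool sketch_delete_old(uint32_t old_id, uint32_t new_id,
			      bool scan_needed)
{
	return old_id != new_id || scan_needed;
}

/* Post-fix rule: an RSCN alone no longer deletes initiator sessions. */
static bool sketch_delete_new(uint32_t old_id, uint32_t new_id,
			      bool scan_needed, enum sketch_port_type type)
{
	return old_id != new_id ||
	       (scan_needed &&
		type != SK_FCT_INITIATOR &&
		type != SK_FCT_NVME_INITIATOR);
}
```

An N_Port_ID change still deletes the session for every port type; only the "RSCN without an ID change" case is relaxed, and only for FCP and NVMe initiators.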
Fixes: 2037ce49d30a ("scsi: qla2xxx: Fix stale session")
Cc: Quinn Tran <qutran@marvell.com>
Cc: Arun Easi <aeasi@marvell.com>
Cc: Nilesh Javali <njavali@marvell.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Martin Wilck <mwilck@suse.com>
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
---
 drivers/scsi/qla2xxx/qla_gs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
Changes since v1:
  - Corrected an error where an N_Port_ID change wouldn't clean up a stale session (Martin W.).
The N_Port_ID may change in a switched fabric topology if the initiator cable is replugged into another physical port on the SAN switch (some fabrics map the physical port number into the domain/area portion of the N_Port_ID). Physical reconnection implies that the initiator is going to log in again anyway, so the previous session is no longer needed.
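Why a replug changes the N_Port_ID can be illustrated by splitting the 24-bit ID into its domain, area and port bytes. This is a hypothetical sketch: struct sketch_port_id, sketch_split_id() and sketch_replug() are invented names, and sketch_replug() assumes a fabric that uses the physical switch port number as the area byte, which is only one of the schemes the parenthetical above alludes to.

```c
#include <stdint.h>

/* Hypothetical split of a 24-bit N_Port_ID into its three bytes. */
struct sketch_port_id {
	uint8_t domain;	/* bits 23..16: switch domain */
	uint8_t area;	/* bits 15..8 */
	uint8_t port;	/* bits 7..0 */
};

static struct sketch_port_id sketch_split_id(uint32_t b24)
{
	struct sketch_port_id id = {
		.domain = (uint8_t)((b24 >> 16) & 0xff),
		.area = (uint8_t)((b24 >> 8) & 0xff),
		.port = (uint8_t)(b24 & 0xff),
	};
	return id;
}

/*
 * Assumption for illustration: the fabric uses the physical switch port
 * number as the area byte. Moving the cable to new_port on the same switch
 * then yields a different N_Port_ID, so the d_id comparison in
 * qla24xx_async_gnnft_done() still schedules the old session for deletion.
 */
static uint32_t sketch_replug(uint32_t old_id, uint8_t new_port)
{
	return (old_id & 0xff00ffu) | ((uint32_t)new_port << 8);
}
```

For example, moving an initiator at 0x010200 (domain 1, area 2) to switch port 5 gives 0x010500 under this assumption, so the ID comparison alone is enough to clean up the stale session.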
diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
index 42c3ad27f1cb..df670fba2ab8 100644
--- a/drivers/scsi/qla2xxx/qla_gs.c
+++ b/drivers/scsi/qla2xxx/qla_gs.c
@@ -3496,7 +3496,9 @@ void qla24xx_async_gnnft_done(scsi_qla_host_t *vha, srb_t *sp)
 				qla2x00_clear_loop_id(fcport);
 				fcport->flags |= FCF_FABRIC_DEVICE;
 			} else if (fcport->d_id.b24 != rp->id.b24 ||
-				fcport->scan_needed) {
+				   (fcport->scan_needed &&
+				    fcport->port_type != FCT_INITIATOR &&
+				    fcport->port_type != FCT_NVME_INITIATOR)) {
 				qlt_schedule_sess_for_deletion(fcport);
 			}
 			fcport->d_id.b24 = rp->id.b24;
On Jun 5, 2020, at 9:44 AM, Roman Bolshakov <r.bolshakov@yadro.com> wrote:
[...]
Looks fine.
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
--
Himanshu Madhani
Oracle Linux Engineering
On Fri, Jun 05, 2020 at 05:44:37PM +0300, Roman Bolshakov wrote:
[...]
I tried to follow the code paths as described in the commit message and also tried to match them with the detailed response to Martin's question about whether this would leak sessions. As far as I can tell, this looks good, but I am still a noob when it comes to FC :)
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Looks good.
Reviewed-by: Shyam Sundar <ssundar@marvell.com>
On Jun 5, 2020, at 7:44 AM, Roman Bolshakov <r.bolshakov@yadro.com> wrote:
[...]
On Fri, 5 Jun 2020 17:44:37 +0300, Roman Bolshakov wrote:
The driver performs SCR (state change registration) in all modes including pure target mode.
For each RSCN, scan_needed flag is set in qla2x00_handle_rscn() for the port mentioned in the RSCN and fabric rescan is scheduled. During the rescan, GNN_FT handler, qla24xx_async_gnnft_done() deletes session of the port that caused the RSCN.
[...]
Applied to 5.8/scsi-fixes, thanks!
[1/1] scsi: qla2xxx: Keep initiator ports after RSCN
      https://git.kernel.org/mkp/scsi/c/632f24f09d5b