The __nvmf_check_ready() routine used to bounce all filesystem io if the controller state isn't LIVE. However, a later patch changed the logic so that the rejection ends up being based on the Q live check. The fc transport has a slightly different sequence from rdma and tcp for shutting down queues/marking them non-live. FC marks its queue non-live only after aborting all ios and waiting for their termination, which leaves a rather large window for filesystem io to continue to hit the transport. Unfortunately, this resulted in filesystem io or applications seeing I/O errors.
Change the fc transport to mark the queues non-live at the first sign of teardown for the association (when i/o is initially terminated).
Fixes: 73a5379937ec ("nvme-fabrics: allow to queue requests for live queues")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
stable trees for 5.8 and 5.9 will require a slightly modified patch
---
 drivers/nvme/host/fc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index d9ab9e7871d0..256e87721a01 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2461,6 +2461,18 @@ nvme_fc_terminate_exchange(struct request *req, void *data, bool reserved)
 static void
 __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
 {
+	int q;
+
+	/*
+	 * if aborting io, the queues are no longer good, mark them
+	 * all as not live.
+	 */
+	if (ctrl->ctrl.queue_count > 1) {
+		for (q = 1; q < ctrl->ctrl.queue_count; q++)
+			clear_bit(NVME_FC_Q_LIVE, &ctrl->queues[q].flags);
+	}
+	clear_bit(NVME_FC_Q_LIVE, &ctrl->queues[0].flags);
+
 	/*
 	 * If io queues are present, stop them and terminate all outstanding
 	 * ios on them. As FC allocates FC exchange for each io, the
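[Editorial note: to make the window described in the commit message concrete, below is a simplified, self-contained userspace model of the gating decision. It is illustrative only; check_ready(), fc_queue, and the states here are stand-ins that loosely mirror the fabrics.c/fc.c logic, not the actual kernel code. Once the controller leaves LIVE, admission of ordinary io is decided by the queue-live bit, which is why clearing NVME_FC_Q_LIVE at the first sign of teardown closes the error window.]

/*
 * Simplified model of the readiness gating described above -- NOT the
 * actual fabrics.c/fc.c code. It only shows why clearing the queue's
 * live flag at the start of association teardown matters.
 */
#include <stdbool.h>
#include <stdio.h>

enum ctrl_state { CTRL_LIVE, CTRL_CONNECTING, CTRL_DELETING };

struct fc_queue {
	bool q_live;	/* models the NVME_FC_Q_LIVE bit */
};

/* models the __nvmf_check_ready() fallback for plain filesystem io */
static bool check_ready(enum ctrl_state state, const struct fc_queue *q)
{
	if (state == CTRL_LIVE)
		return true;	/* normal operation: io goes through */
	/*
	 * Non-LIVE: connect/admin commands are special-cased in the real
	 * code; ordinary io falls through to the queue-live check.
	 */
	return q->q_live;
}

int main(void)
{
	struct fc_queue q = { .q_live = true };

	/* association teardown begins; controller leaves LIVE ... */
	enum ctrl_state state = CTRL_CONNECTING;

	/* ... but q_live is still set, so io is still admitted (the bug) */
	printf("during teardown, before fix: io admitted = %d\n",
	       check_ready(state, &q));

	/* the fix: clear the live bit at the first sign of teardown */
	q.q_live = false;
	printf("during teardown, after fix:  io admitted = %d\n",
	       check_ready(state, &q));
	return 0;
}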
> The __nvmf_check_ready() routine used to bounce all filesystem io if the controller state isn't LIVE. However, a later patch changed the logic so that the rejection ends up being based on the Q live check. The fc transport has a slightly different sequence from rdma and tcp for shutting down queues/marking them non-live. FC marks its queue non-live only after aborting all ios and waiting for their termination, which leaves a rather large window for filesystem io to continue to hit the transport. Unfortunately, this resulted in filesystem io or applications seeing I/O errors.
>
> Change the fc transport to mark the queues non-live at the first sign of teardown for the association (when i/o is initially terminated).
Sounds like the correct behavior to me. What was the motivation for doing it only after all I/O was aborted?
And, Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
On 5/12/2021 4:03 PM, Sagi Grimberg wrote:
>> The __nvmf_check_ready() routine used to bounce all filesystem io if the controller state isn't LIVE. However, a later patch changed the logic so that the rejection ends up being based on the Q live check. The fc transport has a slightly different sequence from rdma and tcp for shutting down queues/marking them non-live. FC marks its queue non-live only after aborting all ios and waiting for their termination, which leaves a rather large window for filesystem io to continue to hit the transport. Unfortunately, this resulted in filesystem io or applications seeing I/O errors.
>>
>> Change the fc transport to mark the queues non-live at the first sign of teardown for the association (when i/o is initially terminated).
>
> Sounds like the correct behavior to me. What was the motivation for doing it only after all I/O was aborted?
>
> And, Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Source evolution over time (rdma/tcp changed how they worked), and the need didn't show up earlier based on the earlier checks.
-- james
>>> The __nvmf_check_ready() routine used to bounce all filesystem io if the controller state isn't LIVE. However, a later patch changed the logic so that the rejection ends up being based on the Q live check. The fc transport has a slightly different sequence from rdma and tcp for shutting down queues/marking them non-live. FC marks its queue non-live only after aborting all ios and waiting for their termination, which leaves a rather large window for filesystem io to continue to hit the transport. Unfortunately, this resulted in filesystem io or applications seeing I/O errors.
>>>
>>> Change the fc transport to mark the queues non-live at the first sign of teardown for the association (when i/o is initially terminated).
>>
>> Sounds like the correct behavior to me. What was the motivation for doing it only after all I/O was aborted?
>>
>> And, Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
>
> Source evolution over time (rdma/tcp changed how they worked), and the need didn't show up earlier based on the earlier checks.
Makes sense...
On 5/10/21 11:56 PM, James Smart wrote:
> The __nvmf_check_ready() routine used to bounce all filesystem io if the controller state isn't LIVE. However, a later patch changed the logic so that the rejection ends up being based on the Q live check. The fc transport has a slightly different sequence from rdma and tcp for shutting down queues/marking them non-live. FC marks its queue non-live only after aborting all ios and waiting for their termination, which leaves a rather large window for filesystem io to continue to hit the transport. Unfortunately, this resulted in filesystem io or applications seeing I/O errors.
>
> Change the fc transport to mark the queues non-live at the first sign of teardown for the association (when i/o is initially terminated).
>
> Fixes: 73a5379937ec ("nvme-fabrics: allow to queue requests for live queues")
> Cc: stable@vger.kernel.org # v5.8+
> Signed-off-by: James Smart <jsmart2021@gmail.com>
> ---
> stable trees for 5.8 and 5.9 will require a slightly modified patch
> ---
>  drivers/nvme/host/fc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index d9ab9e7871d0..256e87721a01 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2461,6 +2461,18 @@ nvme_fc_terminate_exchange(struct request *req, void *data, bool reserved)
>  static void
>  __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
>  {
> +	int q;
> +
> +	/*
> +	 * if aborting io, the queues are no longer good, mark them
> +	 * all as not live.
> +	 */
> +	if (ctrl->ctrl.queue_count > 1) {
> +		for (q = 1; q < ctrl->ctrl.queue_count; q++)
> +			clear_bit(NVME_FC_Q_LIVE, &ctrl->queues[q].flags);
> +	}
> +	clear_bit(NVME_FC_Q_LIVE, &ctrl->queues[0].flags);
> +
>  	/*
>  	 * If io queues are present, stop them and terminate all outstanding
>  	 * ios on them. As FC allocates FC exchange for each io, the
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
On 5/11/21 6:56 AM, James Smart wrote:
> The __nvmf_check_ready() routine used to bounce all filesystem io if the controller state isn't LIVE. However, a later patch changed the logic so that the rejection ends up being based on the Q live check. The fc transport has a slightly different sequence from rdma and tcp for shutting down queues/marking them non-live. FC marks its queue non-live only after aborting all ios and waiting for their termination, which leaves a rather large window for filesystem io to continue to hit the transport. Unfortunately, this resulted in filesystem io or applications seeing I/O errors.
>
> Change the fc transport to mark the queues non-live at the first sign of teardown for the association (when i/o is initially terminated).
>
> Fixes: 73a5379937ec ("nvme-fabrics: allow to queue requests for live queues")
> Cc: stable@vger.kernel.org # v5.8+
> Signed-off-by: James Smart <jsmart2021@gmail.com>
> ---
> stable trees for 5.8 and 5.9 will require a slightly modified patch
> ---
>  drivers/nvme/host/fc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index d9ab9e7871d0..256e87721a01 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2461,6 +2461,18 @@ nvme_fc_terminate_exchange(struct request *req, void *data, bool reserved)
>  static void
>  __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
>  {
> +	int q;
> +
> +	/*
> +	 * if aborting io, the queues are no longer good, mark them
> +	 * all as not live.
> +	 */
> +	if (ctrl->ctrl.queue_count > 1) {
> +		for (q = 1; q < ctrl->ctrl.queue_count; q++)
> +			clear_bit(NVME_FC_Q_LIVE, &ctrl->queues[q].flags);
> +	}
> +	clear_bit(NVME_FC_Q_LIVE, &ctrl->queues[0].flags);
> +
>  	/*
>  	 * If io queues are present, stop them and terminate all outstanding
>  	 * ios on them. As FC allocates FC exchange for each io, the
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes