When the controller is reconnecting, the host fails I/O and admin commands as the host cannot reach the controller. ns scanning may revalidate namespaces during that period and it is wrong to remove namespaces due to these failures as we may hang (see 205da2434301).
One command that may fail is nvme_identify_ns_descs. Since we return success due to having ns descriptor list optional, we continue to validate ns identifiers in nvme_revalidate_disk, obviously fail and return -ENODEV to nvme_validate_ns, which will remove the namespace.
Exactly what we don't want to happen.
Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional") Tested-by: Anton Eidelman anton@lightbitslabs.com Signed-off-by: Sagi Grimberg sagi@grimberg.me
Signed-off-by: Sagi Grimberg sagi@grimberg.me --- drivers/nvme/host/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 31b7dcd791c2..8ce9b4fbc821 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1746,7 +1746,7 @@ static int nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid, if (ret) dev_warn(ctrl->device, "Identify Descriptors failed (%d)\n", ret); - if (ret > 0) + if (ret > 0 && !(ret & NVME_SC_DNR)) ret = 0; } return ret;
On Wed, May 06, 2020 at 04:14:51PM -0700, Sagi Grimberg wrote:
When the controller is reconnecting, the host fails I/O and admin commands as the host cannot reach the controller. ns scanning may revalidate namespaces during that period and it is wrong to remove namespaces due to these failures as we may hang (see 205da2434301).
One command that may fail is nvme_identify_ns_descs. Since we return success due to having ns descriptor list optional, we continue to validate ns identifiers in nvme_revalidate_disk, obviously fail and return -ENODEV to nvme_validate_ns, which will remove the namespace.
Exactly what we don't want to happen.
Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional") Tested-by: Anton Eidelman anton@lightbitslabs.com Signed-off-by: Sagi Grimberg sagi@grimberg.me
Signed-off-by: Sagi Grimberg sagi@grimberg.me
drivers/nvme/host/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
What is the git commit id of this patch in Linus's tree?
And why sign-off on a patch twice with a blank line?
thanks,
greg k-h
When the controller is reconnecting, the host fails I/O and admin commands as the host cannot reach the controller. ns scanning may revalidate namespaces during that period and it is wrong to remove namespaces due to these failures as we may hang (see 205da2434301).
One command that may fail is nvme_identify_ns_descs. Since we return success due to having ns descriptor list optional, we continue to validate ns identifiers in nvme_revalidate_disk, obviously fail and return -ENODEV to nvme_validate_ns, which will remove the namespace.
Exactly what we don't want to happen.
Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional") Tested-by: Anton Eidelman anton@lightbitslabs.com Signed-off-by: Sagi Grimberg sagi@grimberg.me
Signed-off-by: Sagi Grimberg sagi@grimberg.me
drivers/nvme/host/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
What is the git commit id of this patch in Linus's tree?
And why sign-off on a patch twice with a blank line?
I'll resend greg, sorry for the inconvenience.
linux-stable-mirror@lists.linaro.org