From: Keith Busch kbusch@kernel.org
Some devices are reporting controller ready mode support, but return 0 for CRTO. These devices require a much higher time to ready than that, so they are failing to initialize after the driver starter preferring that value over CAP.TO.
The spec requires that CAP.TO match the appropritate CRTO value, or be set to 0xff if CRTO is larger than that. This means that CAP.TO can be used to validate if CRTO is reliable, and provides an appropriate fallback for setting the timeout value if not. Use whichever is larger.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863 Reported-by: Cláudio Sampaio patola@gmail.com Reported-by: Felix Yan felixonmars@archlinux.org Based-on-a-patch-by: Felix Yan felixonmars@archlinux.org Cc: stable@vger.kernel.org Signed-off-by: Keith Busch kbusch@kernel.org --- drivers/nvme/host/core.c | 48 ++++++++++++++++++++++++---------------- 1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 37b6fa7466620..4adc0b2f12f1e 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2245,25 +2245,8 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl) else ctrl->ctrl_config = NVME_CC_CSS_NVM;
- if (ctrl->cap & NVME_CAP_CRMS_CRWMS) { - u32 crto; - - ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto); - if (ret) { - dev_err(ctrl->device, "Reading CRTO failed (%d)\n", - ret); - return ret; - } - - if (ctrl->cap & NVME_CAP_CRMS_CRIMS) { - ctrl->ctrl_config |= NVME_CC_CRIME; - timeout = NVME_CRTO_CRIMT(crto); - } else { - timeout = NVME_CRTO_CRWMT(crto); - } - } else { - timeout = NVME_CAP_TIMEOUT(ctrl->cap); - } + if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS) + ctrl->ctrl_config |= NVME_CC_CRIME;
ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT; ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE; @@ -2277,6 +2260,33 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl) if (ret) return ret;
+ /* CAP value may change after initial CC write */ + ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap); + if (ret) + return ret; + + timeout = NVME_CAP_TIMEOUT(ctrl->cap); + if (ctrl->cap & NVME_CAP_CRMS_CRWMS) { + u32 crto; + + ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto); + if (ret) { + dev_err(ctrl->device, "Reading CRTO failed (%d)\n", + ret); + return ret; + } + + /* + * CRTO should always be greater or equal to CAP.TO, but some + * devices are known to get this wrong. Use the larger of the + * two values. + */ + if (ctrl->ctrl_config & NVME_CC_CRIME) + timeout = max(timeout, NVME_CRTO_CRIMT(crto)); + else + timeout = max(timeout, NVME_CRTO_CRWMT(crto)); + } + ctrl->ctrl_config |= NVME_CC_ENABLE; ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config); if (ret)
On Tue, Sep 12, 2023 at 02:47:33PM -0700, Keith Busch wrote:
From: Keith Busch kbusch@kernel.org
Some devices are reporting controller ready mode support, but return 0 for CRTO. These devices require a much higher time to ready than that, so they are failing to initialize after the driver starter preferring that value over CAP.TO.
The spec requires that CAP.TO match the appropritate CRTO value, or be set to 0xff if CRTO is larger than that. This means that CAP.TO can be used to validate if CRTO is reliable, and provides an appropriate fallback for setting the timeout value if not. Use whichever is larger.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863 Reported-by: Cláudio Sampaio patola@gmail.com Reported-by: Felix Yan felixonmars@archlinux.org Based-on-a-patch-by: Felix Yan felixonmars@archlinux.org Cc: stable@vger.kernel.org Signed-off-by: Keith Busch kbusch@kernel.org
drivers/nvme/host/core.c | 48 ++++++++++++++++++++++++---------------- 1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
- if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS)
I don't think the NVME_CAP_CRMS_CRWMS check here makes sense, this should only need the NVME_CAP_CRMS_CRIMS one.
- timeout = NVME_CAP_TIMEOUT(ctrl->cap);
- if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
u32 crto;
ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
if (ret) {
dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
ret);
return ret;
}
/*
* CRTO should always be greater or equal to CAP.TO, but some
* devices are known to get this wrong. Use the larger of the
* two values.
*/
if (ctrl->ctrl_config & NVME_CC_CRIME)
timeout = max(timeout, NVME_CRTO_CRIMT(crto));
else
timeout = max(timeout, NVME_CRTO_CRWMT(crto));
Should we at least log a harmless one-liner warning if the new timeouts are too small?
On Wed, Sep 13, 2023 at 12:50:15PM +0200, Christoph Hellwig wrote:
On Tue, Sep 12, 2023 at 02:47:33PM -0700, Keith Busch wrote:
1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
- if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS)
I don't think the NVME_CAP_CRMS_CRWMS check here makes sense, this should only need the NVME_CAP_CRMS_CRIMS one.
I think you're right, but we currently only enable CRIME if both are set, so that would be a functional change snuck into this sanity check.
- timeout = NVME_CAP_TIMEOUT(ctrl->cap);
- if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
u32 crto;
ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
if (ret) {
dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
ret);
return ret;
}
/*
* CRTO should always be greater or equal to CAP.TO, but some
* devices are known to get this wrong. Use the larger of the
* two values.
*/
if (ctrl->ctrl_config & NVME_CC_CRIME)
timeout = max(timeout, NVME_CRTO_CRIMT(crto));
else
timeout = max(timeout, NVME_CRTO_CRWMT(crto));
Should we at least log a harmless one-liner warning if the new timeouts are too small?
Felix had something like that in an earlier patch, but it would print on every reset. That could get a bit spammy on the kernel logs, but I can add a dev_warn_once() instead.
On Tue, Sep 12, 2023 at 02:47:33PM -0700, Keith Busch wrote:
From: Keith Busch kbusch@kernel.org
Some devices are reporting controller ready mode support, but return 0 for CRTO. These devices require a much higher time to ready than that, so they are failing to initialize after the driver starter preferring that value over CAP.TO.
The spec requires that CAP.TO match the appropritate CRTO value, or be set to 0xff if CRTO is larger than that. This means that CAP.TO can be used to validate if CRTO is reliable, and provides an appropriate fallback for setting the timeout value if not. Use whichever is larger.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863 Reported-by: Cláudio Sampaio patola@gmail.com Reported-by: Felix Yan felixonmars@archlinux.org Based-on-a-patch-by: Felix Yan felixonmars@archlinux.org Cc: stable@vger.kernel.org Signed-off-by: Keith Busch kbusch@kernel.org
drivers/nvme/host/core.c | 48 ++++++++++++++++++++++++---------------- 1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 37b6fa7466620..4adc0b2f12f1e 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2245,25 +2245,8 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl) else ctrl->ctrl_config = NVME_CC_CSS_NVM;
- if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
u32 crto;
ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
if (ret) {
dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
ret);
return ret;
}
if (ctrl->cap & NVME_CAP_CRMS_CRIMS) {
ctrl->ctrl_config |= NVME_CC_CRIME;
timeout = NVME_CRTO_CRIMT(crto);
} else {
timeout = NVME_CRTO_CRWMT(crto);
}
- } else {
timeout = NVME_CAP_TIMEOUT(ctrl->cap);
- }
- if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS)
ctrl->ctrl_config |= NVME_CC_CRIME;
ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT; ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE; @@ -2277,6 +2260,33 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl) if (ret) return ret;
- /* CAP value may change after initial CC write */
- ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
- if (ret)
return ret;
- timeout = NVME_CAP_TIMEOUT(ctrl->cap);
- if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
u32 crto;
ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
if (ret) {
dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
ret);
return ret;
}
/*
* CRTO should always be greater or equal to CAP.TO, but some
* devices are known to get this wrong. Use the larger of the
* two values.
*/
if (ctrl->ctrl_config & NVME_CC_CRIME)
timeout = max(timeout, NVME_CRTO_CRIMT(crto));
else
timeout = max(timeout, NVME_CRTO_CRWMT(crto));
I saw the original bug report. But wasn't the problem that these were compared before NVME_CC_CRIME had been written?
i.e. is this max() check still needed for the bug reporter's NVMe drive, after NVME_CC_CRIME was been written and CAP has been re-read? (If so, would a quirk be better?)
Kind regards, Niklas
On Wed, Sep 13, 2023 at 12:25:29PM +0000, Niklas Cassel wrote:
if (ctrl->ctrl_config & NVME_CC_CRIME)
timeout = max(timeout, NVME_CRTO_CRIMT(crto));
else
timeout = max(timeout, NVME_CRTO_CRWMT(crto));
I saw the original bug report. But wasn't the problem that these were compared before NVME_CC_CRIME had been written?
i.e. is this max() check still needed for the bug reporter's NVMe drive, after NVME_CC_CRIME was been written and CAP has been re-read? (If so, would a quirk be better?)
The values reported in CRTO don't change with the CC writes. It's only CAP.TO that can change, so we still can't rely on CRTO at any point in the sequence for these devices.
Hello Keith,
On Wed, Sep 13, 2023 at 08:20:09AM -0700, Keith Busch wrote:
On Wed, Sep 13, 2023 at 12:25:29PM +0000, Niklas Cassel wrote:
if (ctrl->ctrl_config & NVME_CC_CRIME)
timeout = max(timeout, NVME_CRTO_CRIMT(crto));
else
timeout = max(timeout, NVME_CRTO_CRWMT(crto));
I saw the original bug report. But wasn't the problem that these were compared before NVME_CC_CRIME had been written?
i.e. is this max() check still needed for the bug reporter's NVMe drive, after NVME_CC_CRIME was been written and CAP has been re-read? (If so, would a quirk be better?)
The values reported in CRTO don't change with the CC writes. It's only CAP.TO that can change, so we still can't rely on CRTO at any point in the sequence for these devices.
Yes, I know that CRTO (which is the new register), doesn't change. It is supposed to have to values: CRTO.CRIMT CRTO.CRWMT
The hack to modify CAP.TO to be in sync with either CRTO.CRIMT or CRTO.CRWMT is really ugly.
Considering that CRTO.CRIMT and CRTO.CRWMT are also 16-bit, so wider than CAP.TO, suggests that software should read these if available, i.e. no need to ever read CAP.TO if CAP.CRMS.CRWMS is 1 or if CAP.CRMS.CRIMS is 1.
CRTO.CRIMT is allowed to be 0, if CAP.CRMS.CRIMS is 0. CRTO.CRWMT is not allowed to be 0 if CAP.CRMS.CRWMS is 1. (and this bit should be set to 1 according to NVMe 2.0).
Additionally, NVMe 2.0b, Figure 35, clearly states register CRTO as Mandatory for I/O and Admin controllers.
I guess that I just can't understand how a controller can set bit CAP.CRMS.CRWMS to 1, without setting a value in CRTO.CRWMT. Is it such a silly spec violation, that they seem to have not bothered to read what this bit means at all.
Such a controller is so obviously broken that it deserves a quirk.
Although, I understand that using quirks is not always the best solution for end users...
Kind regards, Niklas
On Wed, Sep 13, 2023 at 04:14:44PM +0000, Niklas Cassel wrote:
Hello Keith,
On Wed, Sep 13, 2023 at 08:20:09AM -0700, Keith Busch wrote:
On Wed, Sep 13, 2023 at 12:25:29PM +0000, Niklas Cassel wrote:
if (ctrl->ctrl_config & NVME_CC_CRIME)
timeout = max(timeout, NVME_CRTO_CRIMT(crto));
else
timeout = max(timeout, NVME_CRTO_CRWMT(crto));
I saw the original bug report. But wasn't the problem that these were compared before NVME_CC_CRIME had been written?
i.e. is this max() check still needed for the bug reporter's NVMe drive, after NVME_CC_CRIME was been written and CAP has been re-read? (If so, would a quirk be better?)
The values reported in CRTO don't change with the CC writes. It's only CAP.TO that can change, so we still can't rely on CRTO at any point in the sequence for these devices.
Yes, I know that CRTO (which is the new register), doesn't change. It is supposed to have to values: CRTO.CRIMT CRTO.CRWMT
The hack to modify CAP.TO to be in sync with either CRTO.CRIMT or CRTO.CRWMT is really ugly.
This is just a sanity check to ensure the device complies with spec, specifically this part for CAP.TO:
this field shall be set to: a) the value in Controller Ready With Media Timeout (CRTO.CRWMT); or b) FFh if CRTO.CRWMT is greater than FFh.
If CRWMT is smaller than CAP.TO, then the device is not spec compliant. Yeah, the driver should prefer CRTO values, but if they're clearly unreliable, we ought to fallback to a longer wait since we can't trust the lower value. What's the worst that can happen? We wait longer than necessary for a device that never becomes ready? That's totally acceptable, IMO.
Considering that CRTO.CRIMT and CRTO.CRWMT are also 16-bit, so wider than CAP.TO, suggests that software should read these if available, i.e. no need to ever read CAP.TO if CAP.CRMS.CRWMS is 1 or if CAP.CRMS.CRIMS is 1.
CRTO.CRIMT is allowed to be 0, if CAP.CRMS.CRIMS is 0. CRTO.CRWMT is not allowed to be 0 if CAP.CRMS.CRWMS is 1. (and this bit should be set to 1 according to NVMe 2.0).
I don't see any reason why CRTO.CRWMT can't legitimately be 0. I can conceive of an implementation ready for commands immediately after CC.EN 0->1 without a delay. In this device case, though, it's just wrong.
Additionally, NVMe 2.0b, Figure 35, clearly states register CRTO as Mandatory for I/O and Admin controllers.
I guess that I just can't understand how a controller can set bit CAP.CRMS.CRWMS to 1, without setting a value in CRTO.CRWMT. Is it such a silly spec violation, that they seem to have not bothered to read what this bit means at all.
Yep, but broken controllers are broken. They used to work in Linux before the driver started preferring the recommended CRTO, so we gotta fix 'em.
Such a controller is so obviously broken that it deserves a quirk.
Although, I understand that using quirks is not always the best solution for end users...
If we have a way to sanity check for spec non-compliance, I would prefer doing that generically rather than quirk specific devices. Coincidently, I received an internal report just yesterday of a 2nd vendor with similar breakage, so this might be a common fuck up. Let's not put users and maintainers through the hassle if we can resolve it just once.
On Wed, Sep 13, 2023 at 09:46:08AM -0700, Keith Busch wrote:
I guess that I just can't understand how a controller can set bit CAP.CRMS.CRWMS to 1, without setting a value in CRTO.CRWMT. Is it such a silly spec violation, that they seem to have not bothered to read what this bit means at all.
Yep, but broken controllers are broken. They used to work in Linux before the driver started preferring the recommended CRTO, so we gotta fix 'em.
I'm not saying that these controllers shouldn't work with Linux. However, these controller used to work with CC.CRIME == 0, so perhaps we should continue to use them that way?
My reasoning is that if the controller did not even manage to get the most fundamental part of this new feature correctly (i.e. defining the actual timeout values for this feature), then perhaps we shouldn't assume that the rest of the feature, setting the NRDY bit, sending the AER, etc., is implemented correctly either.
When CC.CRIME == 1, both fields in CRTO actually have a defined meaning: CRTO.CRIMT is the maximum time that host software should wait for the controller to become ready. CRTO.CRWMT is the maximum time that host software should wait for all attached namespaces to become ready.
So having both fields defined to zero, or rather, to have both fields defined to a value smaller than CAP.TO, regardless of CC.CRIME value, is quite bad.
So perhaps it is better to keep CC.CRIME == 0 for such controllers.
If we have a way to sanity check for spec non-compliance, I would prefer doing that generically rather than quirk specific devices.
It's not going to be beautiful, but one way could be to: -check CAP.CRMS.CRIMS, if it is set to 1: -write CC.CRIME == 1, -re-read CAP register, since it can change depending on CC.CRIME (urgh) -check if CRTO.CRIMT is less than CAP.TO, if so: -write CC.CRIME == 0 (disable the feature since it is obviously broken) -re-read CAP register, since it can change depending on CC.CRIME (urgh)
For the record, I'm obviously happy with whatever decision you make, I'm just giving my two cents...
Kind regards, Niklas
On Fri, Sep 15, 2023 at 09:31:44PM +0000, Niklas Cassel wrote:
I'm not saying that these controllers shouldn't work with Linux. However, these controller used to work with CC.CRIME == 0, so perhaps we should continue to use them that way?
Perhaps I missed something, but I didn't get any indication that these controllers were reporting CRIMS capability. They're just reporting CRWMS as far as I know, so CRIME doesn't apply. I only included that case in the patch for completeness.
So having both fields defined to zero, or rather, to have both fields defined to a value smaller than CAP.TO, regardless of CC.CRIME value, is quite bad.
So perhaps it is better to keep CC.CRIME == 0 for such controllers.
If we have a way to sanity check for spec non-compliance, I would prefer doing that generically rather than quirk specific devices.
It's not going to be beautiful, but one way could be to: -check CAP.CRMS.CRIMS, if it is set to 1: -write CC.CRIME == 1, -re-read CAP register, since it can change depending on CC.CRIME (urgh) -check if CRTO.CRIMT is less than CAP.TO, if so: -write CC.CRIME == 0 (disable the feature since it is obviously broken) -re-read CAP register, since it can change depending on CC.CRIME (urgh)
There is a corner case that I am somewhat purposefully ignoring: CAP.TO is worst case for both CC.EN 0->1 and 1->0, whereas both CRTO values are only for the 0->1 transition. It's entirely possible some implementation needs a longer 1->0 transition, so CAP.TO *could* validly be greater than the either CRTO value. I just don't see that happening in practice, and even if we do encounter such a device, waiting a little longer on init for a broken controller doesn't make any situation worse.
On Fri, Sep 15, 2023 at 02:50:20PM -0700, Keith Busch wrote:
On Fri, Sep 15, 2023 at 09:31:44PM +0000, Niklas Cassel wrote:
I'm not saying that these controllers shouldn't work with Linux. However, these controller used to work with CC.CRIME == 0, so perhaps we should continue to use them that way?
Perhaps I missed something, but I didn't get any indication that these controllers were reporting CRIMS capability. They're just reporting CRWMS as far as I know, so CRIME doesn't apply. I only included that case in the patch for completeness.
Yes, you are right of course.
My brain started thinking about this issue when relaxing... but I guess that my brain got some details wrong when context switching to this problem again after too much idle...
Have a nice weekend!
Kind regards, Niklas
linux-stable-mirror@lists.linaro.org