From: Alistair Francis alistair.francis@wdc.com
[ Upstream commit 2e482655019ab6fcfe8865b62432c6d03f0b5f80 ]
The NVMe Base Specification 2.1 states that:
""" A host requests an explicit persistent connection ... by specifying a non-zero Keep Alive Timer value in the Connect command. """
As such if we are starting a persistent connection to a discovery controller and the KATO is currently 0 we need to update KATO to a non zero value to avoid continuous timeouts on the target.
Signed-off-by: Alistair Francis alistair.francis@wdc.com Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
YES
- What it fixes - Aligns Linux host behavior with NVMe Base Spec 2.1 requirement that a host must specify a non-zero Keep Alive Timer in the Connect command to request an explicit persistent discovery connection. The previous behavior left `KATO=0` for discovery controllers even when the connection became persistent, causing targets to time out and drop connections.
- Code change and behavior - In `nvme_start_ctrl()` (`drivers/nvme/host/core.c:4998`), on reconnect for discovery controllers (`test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags)` and `nvme_discovery_ctrl(ctrl)`), the patch: - Checks if `ctrl->kato` is zero. - If zero, calls `nvme_stop_keep_alive(ctrl)`, sets `ctrl->kato = NVME_DEFAULT_KATO`, then `nvme_start_keep_alive(ctrl)`. - Still sends the rediscover uevent: `nvme_change_uevent(ctrl, "NVME_EVENT=rediscover")`. - This immediately starts keep-alive commands after a persistent discovery reconnect and ensures subsequent Connect commands advertise non-zero KATO.
- Why this is correct and effective - Immediate effect: Even if the just-completed Connect used `kato=0`, forcing a non-zero `kato` here starts the host keep-alive work right away, avoiding target keep-alive timeouts after a persistent reconnect. - Future connections: `nvmf_connect_cmd_prep()` sets Connect’s KATO from `ctrl->kato` (`drivers/nvme/host/fabrics.c:426`). With this change, the next reconnection will send a non-zero KATO in the Connect command as the spec requires. - Safe sequence: `nvme_stop_keep_alive()` is a no-op when `kato==0` (`drivers/nvme/host/core.c:1412`), then `ctrl->kato` is set to `NVME_DEFAULT_KATO` (`drivers/nvme/host/nvme.h:31`), and `nvme_start_keep_alive()` only schedules work when `kato!=0` (`drivers/nvme/host/core.c:1404`).
- Scope and risk - Scope-limited: Only affects discovery controllers on reconnect (persistent discovery) and only when `kato==0`. No effect on: - Non-discovery (I/O) controllers (they already default to non-zero KATO). - Discovery controllers where userspace explicitly set a non-zero KATO. - No architectural changes; uses existing helpers and flags; no ABI change. - Regression risk is low. Prior history already introduced persistent discovery semantics and a sysfs `kato` attribute, and transports already honor `ctrl->kato` for Connect. This change simply fills a corner case where `kato` remained zero in a persistent discovery reconnect.
- Historical context and consistency - 2018: We explicitly avoided KA to discovery controllers per early spec constraints. - 2021: The code was adjusted so discovery controllers default to `kato=0`, while I/O controllers default to `NVME_DEFAULT_KATO` (commit 32feb6de). Persistent discovery connections were intended to have a positive KATO (via options), but implicit persistent reconnects could still have `kato=0`. - 2022: Added rediscover uevent for persistent discovery reconnects (f46ef9e87) and `NVME_CTRL_STARTED_ONCE` usage. - This patch completes the intent by ensuring persistent discovery reconnects run with non-zero KATO automatically, preventing target timeouts and complying with spec 2.1.
- Stable backport suitability - Fixes a user-visible bug (target timeouts and unstable discovery connectivity on persistent reconnects). - Small, self-contained change confined to `nvme_start_ctrl()` in `drivers/nvme/host/core.c`. - No new features or interfaces; minimal risk of regression; behavior matches spec and existing design. - Dependencies exist in stable trees that already have persistent discovery support and the `NVME_CTRL_STARTED_ONCE` mechanism. For older branches that still use `test_and_set_bit` in the rediscover path, the logic remains valid within that conditional block.
- Side notes for backporters - Ensure the tree has `NVME_CTRL_STARTED_ONCE`, `nvme_discovery_ctrl()`, and the rediscover uevent path in `nvme_start_ctrl()`. If an older stable branch uses `test_and_set_bit` instead of `test_bit`, place the new KATO block inside that existing conditional. - `nvmf_connect_cmd_prep()` must already populate Connect’s `kato` from `ctrl->kato` (`drivers/nvme/host/fabrics.c:426`) so that future reconnects benefit from the updated `kato`.
drivers/nvme/host/core.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 6b7493934535a..5714d49932822 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4990,8 +4990,14 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl) * checking that they started once before, hence are reconnecting back. */ if (test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags) && - nvme_discovery_ctrl(ctrl)) + nvme_discovery_ctrl(ctrl)) { + if (!ctrl->kato) { + nvme_stop_keep_alive(ctrl); + ctrl->kato = NVME_DEFAULT_KATO; + nvme_start_keep_alive(ctrl); + } nvme_change_uevent(ctrl, "NVME_EVENT=rediscover"); + }
if (ctrl->queue_count > 1) { nvme_queue_scan(ctrl);