From: Michael Strauss <michael.strauss@amd.com>
[ Upstream commit e3419e1e44b87d4176fb98679a77301b1ca40f63 ]
[WHY] In the worst case, AUX intra-hop done can take hundreds of milliseconds as each retimer in a link might have to wait a full AUX_RD_INTERVAL to send LT abort downstream.
[HOW] Wait 300ms for each retimer in a link to allow time to propagate a LT abort without infinitely waiting on intra-hop done. For no-retimer case, keep the max duration at 10ms.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What changed
  - In `drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c:1011`, `dpcd_exit_training_mode()` previously polled the sink for “intra‑hop AUX reply indication” clearing with a fixed 10 ms window using `for (i = 0; i < 10; i++) ... fsleep(1000);` (see `drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c:1024` and `:1027`).
  - The patch computes a per‑topology maximum wait based on the number of LTTPR retimers and changes the loop bound accordingly:
    - Introduces `lttpr_count = dp_parse_lttpr_repeater_count(link->dpcd_caps.lttpr_caps.phy_repeater_cnt)` and `intra_hop_disable_time_ms = (lttpr_count > 0 ? lttpr_count * 300 : 10)` so the poll waits up to 300 ms per retimer, defaulting to 10 ms if none are present.
    - Changes the loop counter type from `uint8_t` to `uint32_t` to safely support multi‑second waits without overflow.
  - The poll still checks `DP_SINK_STATUS` for `DP_INTRA_HOP_AUX_REPLY_INDICATION` to go low and sleeps 1 ms per iteration via `fsleep(1000)`.
- Why it matters (bug being fixed)
  - For DP 2.0 (128b/132b), when exiting link training the source must wait for intra‑hop AUX reply indication to clear. With retimers, each hop may wait up to a full AUX_RD_INTERVAL to propagate the link‑training abort downstream; worst case can be “hundreds of milliseconds” per hop.
  - The prior fixed 10 ms total window can be too short, so the source may stop polling while retimers are still propagating the abort. That can lead to spurious failures or retries after training, affecting users with LTTPR chains.
  - The new logic scales the wait to the actual retimer count, which avoids premature timeouts while keeping the wait bounded.
- Context and correctness
  - The helper `dp_parse_lttpr_repeater_count()` already exists and is used elsewhere in DC to scale timeouts (e.g., `link_dp_training_128b_132b.c:248` sets `cds_wait_time_limit` from the same count), so this change aligns with existing design patterns.
  - `lttpr_caps.phy_repeater_cnt` is populated during capability discovery (`link_dp_capability.c:1500+`), and invalid counts are handled (including forcing 1 in certain fixed‑VS cases), so the new wait computation is robust.
  - The change affects only the DP 2.0 path (`if (encoding == DP_128b_132b_ENCODING)` in `dpcd_exit_training_mode()`), leaving DP 1.x behavior untouched.
  - The loop counter upgrade to `uint32_t` is necessary once the bound can exceed 255: a `uint8_t` counter wraps at 255, so it could never reach a bound of 300 or more and the loop would not terminate on timeout.
- Risk assessment
  - Behavioral changes are confined to a small, well‑scoped polling loop in AMD DC’s DP training teardown. No architectural changes, no ABI changes, no new features.
  - Regression risk is low: non‑retimer systems keep the 10 ms max; retimer topologies get longer but finite waits (worst case ~2.4 s for 8 retimers).
  - The i915 driver also waits for the same intra‑hop indication to clear (up to 500 ms total; see `drivers/gpu/drm/i915/display/intel_dp_link_training.c:1119`), so waiting here is consistent with cross‑driver practice.
- Stable backport criteria
  - Fixes a real user‑visible reliability issue (training teardown races on DP 2.0 with retimers).
  - Small, contained change with clear rationale and no dependency on new infrastructure.
  - No feature enablement; minimal regression surface; targeted to a single function in AMD DC.
- Recommendation
  - Backport to stable trees that include AMD DC DP 2.0 (128b/132b) support. This improves link‑training robustness for LTTPR topologies with negligible risk for others.
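
For illustration only, the wait budget and the bounded poll described above can be sketched as a standalone C program fragment. This is not the DC code: `struct fake_sink`, `fake_read_sink_status()`, `FAKE_INTRA_HOP_BIT`, `intra_hop_budget_ms()` and `poll_intra_hop_clear()` are hypothetical names made up for this sketch, the sink is simulated, and the 1 ms sleep is only noted in a comment. The only part taken from the patch is the budget rule (300 ms per retimer, 10 ms with none) and the 32-bit loop counter.

```c
#include <stdint.h>

/* Illustrative bit only; a stand-in for DP_INTRA_HOP_AUX_REPLY_INDICATION. */
#define FAKE_INTRA_HOP_BIT 0x08u

/* Hypothetical simulated sink: reports the intra-hop bit as set for the
 * first `clear_after_ms` status reads, then reports it as cleared. */
struct fake_sink {
	uint32_t polls;
	uint32_t clear_after_ms;
};

static uint8_t fake_read_sink_status(struct fake_sink *s)
{
	return (s->polls++ < s->clear_after_ms) ? FAKE_INTRA_HOP_BIT : 0;
}

/* Budget rule from the patch: 300 ms per retimer, 10 ms when none. */
static uint32_t intra_hop_budget_ms(uint8_t lttpr_count)
{
	return lttpr_count > 0 ? (uint32_t)lttpr_count * 300u : 10u;
}

/* Poll once per simulated millisecond.
 * Returns 1 if the bit cleared within the budget, 0 on timeout. */
static int poll_intra_hop_clear(struct fake_sink *s, uint8_t lttpr_count)
{
	uint32_t budget = intra_hop_budget_ms(lttpr_count);
	uint32_t i; /* 32-bit: an 8-bit counter could never reach 300+ */

	for (i = 0; i < budget; i++) {
		if ((fake_read_sink_status(s) & FAKE_INTRA_HOP_BIT) == 0)
			return 1;
		/* the real loop sleeps about 1 ms here via fsleep(1000) */
	}
	return 0;
}
```

With a simulated sink that needs 450 polls to clear, `poll_intra_hop_clear()` times out for `lttpr_count == 0` (10 ms budget) and succeeds for `lttpr_count == 2` (600 ms budget), which mirrors the failure mode the patch addresses.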
 .../drm/amd/display/dc/link/protocols/link_dp_training.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c
index 2dc1a660e5045..134093ce5a8e8 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c
@@ -1018,7 +1018,12 @@ static enum link_training_result dpcd_exit_training_mode(struct dc_link *link, e
 {
 	enum dc_status status;
 	uint8_t sink_status = 0;
-	uint8_t i;
+	uint32_t i;
+	uint8_t lttpr_count = dp_parse_lttpr_repeater_count(link->dpcd_caps.lttpr_caps.phy_repeater_cnt);
+	uint32_t intra_hop_disable_time_ms = (lttpr_count > 0 ? lttpr_count * 300 : 10);
+
+	// Each hop could theoretically take over 256ms (max 128b/132b AUX RD INTERVAL)
+	// To be safe, allow 300ms per LTTPR and 10ms for no LTTPR case
 
 	/* clear training pattern set */
 	status = dpcd_set_training_pattern(link, DP_TRAINING_PATTERN_VIDEOIDLE);
@@ -1028,7 +1033,7 @@ static enum link_training_result dpcd_exit_training_mode(struct dc_link *link, e
 
 	if (encoding == DP_128b_132b_ENCODING) {
 		/* poll for intra-hop disable */
-		for (i = 0; i < 10; i++) {
+		for (i = 0; i < intra_hop_disable_time_ms; i++) {
 			if ((core_link_read_dpcd(link, DP_SINK_STATUS, &sink_status, 1) == DC_OK) &&
 					(sink_status & DP_INTRA_HOP_AUX_REPLY_INDICATION) == 0)
 				break;