From: Xin Wang <x.wang@intel.com>
[ Upstream commit 95d0883ac8105717f59c2dcdc0d8b9150f13aa12 ]
This patch ensures the gt will be awake for the entire duration of the resume sequences until GuCRC takes over and GT-C6 gets re-enabled.
Before suspending, GT-C6 is kept enabled, but upon resume GuCRC is not yet alive to properly control the exits, and some cases of instability and corruption related to GT-C6 can be observed.
Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4037
Link: https://lore.kernel.org/r/20250827000633.1369890-3-x.wang@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- Fixes real user-visible bugs: The commit addresses resume-time instability and corruption linked to GT-C6 being enabled before GuC (power control) is up, as described in the message and tracked by the referenced issue. Keeping the GT awake through resume avoids those failures.
- Small, contained change: It only adds an early RC6 disable to the resume paths, plus the header that declares the helper:
  - Replaces `#include "xe_guc.h"` with `#include "xe_gt_idle.h"` so the helper is visible (drivers/gpu/drm/xe/xe_pm.c:21).
  - System resume: disables GT-C6 for all GTs right at resume entry, before any other resume work (drivers/gpu/drm/xe/xe_pm.c:184-186).
  - Runtime resume: same early disable for all GTs (drivers/gpu/drm/xe/xe_pm.c:570-572).
  - No architectural changes, no interface changes, no behavior changes outside resume paths.
- Correct technical fix: The helper `xe_gt_idle_disable_c6()` safely forces GT out of RC6 before resume proceeds:
  - Implementation clears RC6 and RC state under forcewake and is a no-op on VFs (drivers/gpu/drm/xe/xe_gt_idle.c:389-407).
  - If forcewake is not available yet, it returns `-ETIMEDOUT`; the resume continues without regressing behavior (callers ignore return, which is acceptable to prevent blocking resume).
- Proper handoff to re-enable C-states: RC6 is re-enabled by GuC Power Conservation once firmware is up, or explicitly when GuC PC is skipped:
  - `xe_uc_load_hw()` starts GuC PC during GT bringup (drivers/gpu/drm/xe/xe_uc.c:215).
  - If GuC PC is skipped, RC6 is explicitly re-enabled via `xe_gt_idle_enable_c6(gt)` (drivers/gpu/drm/xe/xe_guc_pc.c:1257).
  - Thus the “keep GT awake only until GuC takes over” intent is fulfilled, avoiding prolonged power impact.
- Low regression risk:
  - Scope limited to early resume time; worst-case effect is slightly higher power during resume window.
  - No changes to suspend sequencing, only resume entry.
  - SR-IOV VFs unaffected (helper is no-op there).
  - Resume sequences already transition to GuC-controlled power states, so this change aligns with existing design.
- Stable backport suitability:
  - Bug fix with user impact (instability/corruption) and a minimal, targeted change.
  - No new features or ABI changes.
  - Touches the `drm/xe` driver only, not core subsystems.
  - If a target stable branch predates `xe_gt_idle_disable_c6()` or `xe_gt_idle.h`, the backport must include or adapt to the equivalent RC6 control helper; otherwise this applies cleanly.
Overall, this is a classic stable-worthy fix: minimal, isolated, and prevents real-world resume failures without architectural churn.
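
The ordering argued for above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_gt`, `toy_disable_c6()`, `toy_gt_bringup()` and `toy_resume()` are hypothetical names invented for illustration and are not the xe driver's code. The model mirrors only what the explanation claims: the disable helper is a no-op on VFs, returns `-ETIMEDOUT` when forcewake is unavailable, callers ignore that return value, and C6 comes back either through GuC PC or through an explicit re-enable when GuC PC is skipped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one GT. Hypothetical, not the xe driver's data structure. */
struct toy_gt {
	bool is_vf;          /* SR-IOV virtual function */
	bool forcewake_ok;   /* forcewake can be acquired right now */
	bool c6_enabled;     /* GT-C6 (RC6) is currently allowed */
	bool guc_pc_running; /* GuC power conservation has taken over */
};

/* Models the described helper: no-op on VFs, -ETIMEDOUT without forcewake. */
static int toy_disable_c6(struct toy_gt *gt)
{
	assert(gt != NULL);
	if (gt->is_vf)
		return 0;
	if (!gt->forcewake_ok)
		return -ETIMEDOUT;
	gt->c6_enabled = false;
	return 0;
}

/*
 * Models the handoff: GuC PC starts during bring-up, or C6 is re-enabled
 * explicitly when GuC PC is skipped. Either way C6 is allowed again.
 */
static void toy_gt_bringup(struct toy_gt *gt, bool skip_guc_pc)
{
	assert(gt != NULL);
	if (!skip_guc_pc)
		gt->guc_pc_running = true;
	gt->c6_enabled = true;
}

/*
 * Resume order from the patch: disable C6 on every GT first, then do the
 * rest of the bring-up. Returns how many GTs had C6 off during that window.
 */
static size_t toy_resume(struct toy_gt *gts, size_t n, bool skip_guc_pc)
{
	size_t awake = 0;

	for (size_t i = 0; i < n; i++)
		(void)toy_disable_c6(&gts[i]); /* result deliberately ignored */

	for (size_t i = 0; i < n; i++) {
		if (!gts[i].c6_enabled)
			awake++;
		toy_gt_bringup(&gts[i], skip_guc_pc);
	}
	return awake;
}
```

In this model a forcewake timeout leaves a GT exactly as it was before the patch (C6 still enabled during bring-up), which is the sense in which ignoring the return value does not regress behavior.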
 drivers/gpu/drm/xe/xe_pm.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_pm.c b/drivers/gpu/drm/xe/xe_pm.c
index 3e301e42b2f19..9fccc7a855f30 100644
--- a/drivers/gpu/drm/xe/xe_pm.c
+++ b/drivers/gpu/drm/xe/xe_pm.c
@@ -18,7 +18,7 @@
 #include "xe_device.h"
 #include "xe_ggtt.h"
 #include "xe_gt.h"
-#include "xe_guc.h"
+#include "xe_gt_idle.h"
 #include "xe_i2c.h"
 #include "xe_irq.h"
 #include "xe_pcode.h"
@@ -177,6 +177,9 @@ int xe_pm_resume(struct xe_device *xe)
 	drm_dbg(&xe->drm, "Resuming device\n");
 	trace_xe_pm_resume(xe, __builtin_return_address(0));
 
+	for_each_gt(gt, xe, id)
+		xe_gt_idle_disable_c6(gt);
+
 	for_each_tile(tile, xe, id)
 		xe_wa_apply_tile_workarounds(tile);
 
@@ -547,6 +550,9 @@ int xe_pm_runtime_resume(struct xe_device *xe)
 
 	xe_rpm_lockmap_acquire(xe);
 
+	for_each_gt(gt, xe, id)
+		xe_gt_idle_disable_c6(gt);
+
 	if (xe->d3cold.allowed) {
 		err = xe_pcode_ready(xe, true);
 		if (err)