From: Christoph Rudorff chris@rudorff.com
[ Upstream commit 4c4d9b7b6c6e676eca22585139aba5f03de74b90 ]
Hibernate bricks the machine if a discrete GPU was disabled via
echo IGD > /sys/kernel/debug/vgaswitcheroo/switch
The freeze and thaw handler lacks checking the GPU power state, as suspend and resume do.
This patch add the checks and fix this issue.
Signed-off-by: Christoph Rudorff chris@rudorff.com Signed-off-by: Lyude Paul lyude@redhat.com Link: https://lore.kernel.org/r/20250325-nouveau-fix-hibernate-v2-1-2bd5c13fb953@r... Signed-off-by: Sasha Levin sashal@kernel.org ---
Based on my comprehensive analysis of both the commit changes and the broader Linux kernel context, I can now provide my assessment: **YES** This commit should be backported to stable kernel trees. Here's my extensive analysis: ## **Critical Bug Fix Analysis** **1. User-Facing System-Breaking Issue** - **Severity**: The commit fixes a machine- bricking bug during hibernation - **Trigger**: Common user action (`echo IGD > /sys/kernel/debug/vgaswitcheroo/switch`) - **Impact**: Complete system failure requiring hard reset - **Affected Systems**: All laptops/systems with discrete NVIDIA GPUs using nouveau driver **2. Minimal, Surgical Code Changes** The fix is extremely contained and safe: ```c + if (drm->dev->switch_power_state == DRM_SWITCH_POWER_OFF || + drm->dev->switch_power_state == DRM_SWITCH_POWER_DYNAMIC_OFF) + return 0; ``` - **8 lines added** across 2 functions (`nouveau_pmops_freeze` and `nouveau_pmops_thaw`) - **No complex logic** - simple early return on disabled GPU - **No side effects** - only prevents hardware access when GPU is already powered off - **Zero risk** - early return with success code when hardware is inaccessible **3. Follows Established Kernel Patterns** This change brings hibernation handlers in line with existing patterns: - **AMD drivers**: Already implement identical checks in suspend/resume - **Intel i915**: Uses same power state checking methodology - **Nouveau suspend/resume**: Already had similar protections - **Standard practice**: This pattern is used throughout the DRM subsystem **4. Addresses Critical Gap in Power Management** - **Missing parity**: Hibernation handlers lacked protection present in suspend/resume - **Hardware protection**: Prevents accessing powered-off GPU hardware - **VGA switcheroo integration**: Proper handling of multi- GPU power states - **System stability**: Eliminates PCI bus hangs during hibernation **5. Stable Tree Suitability Criteria Met** ✅ **Fixes important user-facing bug**: Machine bricking during hibernation ✅ **Small and contained**: Only 8 lines, no architectural changes ✅ **Minimal regression risk**: Early returns are inherently safe ✅ **No new features**: Pure bug fix for existing functionality ✅ **Clear cause and effect**: Well-understood hardware access protection ✅ **Subsystem confined**: Changes only affect nouveau hibernation path **6. Historical Precedent** Similar commits have been backported (from provided examples): - **drm/radeon hibernation fix**: "Cc: stable@vger.kernel.org" for APU hibernation - **drm/amdgpu hibernation fix**: "Cc: stable@vger.kernel.org" for s0ix hibernation **7. Risk Assessment: Extremely Low** - **Cannot break working systems**: Only affects disabled GPU hibernation path - **Early return pattern**: Universally safe approach in kernel drivers - **Hardware protection**: Prevents undefined behavior from hardware access - **Tested path**: Same logic already proven in suspend/resume handlers ## **Conclusion** This commit represents exactly the type of fix that stable trees are designed for: a critical bug fix that prevents system failures, with minimal code changes and virtually zero regression risk. The fix follows established kernel patterns and addresses a serious gap in power management that could leave users with completely unusable systems after hibernation attempts. The commit should definitely be backported to prevent users from experiencing machine-bricking hibernation failures on systems with disabled discrete NVIDIA GPUs.
drivers/gpu/drm/nouveau/nouveau_drm.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index e154d08857c55..c69139701056d 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -1079,6 +1079,10 @@ nouveau_pmops_freeze(struct device *dev) { struct nouveau_drm *drm = dev_get_drvdata(dev);
+ if (drm->dev->switch_power_state == DRM_SWITCH_POWER_OFF || + drm->dev->switch_power_state == DRM_SWITCH_POWER_DYNAMIC_OFF) + return 0; + return nouveau_do_suspend(drm, false); }
@@ -1087,6 +1091,10 @@ nouveau_pmops_thaw(struct device *dev) { struct nouveau_drm *drm = dev_get_drvdata(dev);
+ if (drm->dev->switch_power_state == DRM_SWITCH_POWER_OFF || + drm->dev->switch_power_state == DRM_SWITCH_POWER_DYNAMIC_OFF) + return 0; + return nouveau_do_resume(drm, false); }