Wow!
Sasha, I think an impersonator has gotten into your account and is just making up nonsense.
This reads like an impassioned plea to backport this change, from someone who has actually dealt with it.
However, reading the justification in detail is an exercise in reading falsehoods.
If this does not come from an impersonator but from a human being, I recommend you have a talk with them.
If this comes from a machine, I recommend you take it out of commission and rework it.
If I see this kind of baloney again, I expect I will just auto-nack it instead of reading it, as reading it appears to be a waste of time. It is a complete waste to read fiction in what little time I have for kernel development.
Eric
Sasha Levin <sashal@kernel.org> writes:
**YES**
This commit should be backported to stable kernel trees for the following reasons:
## Critical Bug Fix for Real User Issues
- **Fixes Actual Suspend Failures**: The commit addresses real-world suspend failures under memory pressure on systems with AMD discrete GPUs. The linked issues (ROCm/ROCK-Kernel-Driver#174 and freedesktop.org/drm/amd#2362) indicate this affects actual users.
The links in the first paragraph are very distorted. The links from the actual change are:
https://github.com/ROCm/ROCK-Kernel-Driver/issues/174
https://gitlab.freedesktop.org/drm/amd/-/issues/2362
Those completely distorted links make understanding this justification much harder than necessary.
- **Regression Fix**: This is effectively a regression fix. The PM subsystem's early swap restriction prevents AMD GPU drivers from properly evicting VRAM during their prepare() callbacks, which is a requirement that has become more critical as GPU VRAM sizes have increased.
That is a justification. There is no evidence that a kernel change made this worse. Thus there is no evidence this is a regression fix.
## Small, Contained Change
- **Minimal Code Changes**: The fix is remarkably simple - it just moves the `pm_restrict_gfp_mask()` call from early in the suspend sequence to after `dpm_prepare()` completes. The changes are:
- Move `pm_restrict_gfp_mask()` from multiple early locations to inside `dpm_suspend_start()` after `dpm_prepare()` succeeds
- Add corresponding `pm_restore_gfp_mask()` calls in error paths and resume paths
- Remove the now-redundant calls from hibernate.c and suspend.c
Completely wrong.
- **Low Risk of Regression**: The change maintains the original intent of preventing I/O during the critical suspend phase while allowing it during device preparation. The swap restriction still happens before `dpm_suspend()`, just after `dpm_prepare()`.
This is a fundamental change to a subsystem that the subsystem maintainer does not say is low risk.
## Follows Stable Rules
- **Meets Stable Criteria**:
- Fixes a real bug that bothers people (suspend failures)
Addresses a real bug.
- Small change (moves function calls, doesn't introduce new logic)
The change is a large change in the logic.
- Obviously correct (allows drivers to use swap during their designated preparation phase)
It obviously changes the behavior. It is not at all obvious the change in behavior is desirable for all callbacks, and in all other scenarios.
- Already tested by users (Tested-by tags from affected users)
Yes it has Tested-by tags.
## Similar to Other Backported Commits
- **Pattern Matches**: Looking at the similar commits provided, this follows the same pattern as the AMD GPU eviction commits that were backported. Those commits also addressed the same fundamental issue - ensuring GPU VRAM can be properly evicted during suspend/hibernation.
Which commits were backported?
## Critical Timing
Timing??? There is no race condition.
- **Error Path Handling**: The commit properly handles error paths by adding `pm_restore_gfp_mask()` calls in:
- `dpm_resume_end()` for normal resume
- `platform_recover()` error path in suspend.c
- `pm_restore_gfp_mask()` in kexec_core.c for kexec flows
The commit is well-tested, addresses a real problem affecting users, and makes a minimal, obviously correct change to fix suspend failures on systems with discrete GPUs under memory pressure.
What evidence is there that this commit has been tested, let alone well-tested?
The entire line of reasoning is completely suspect.
Eric