From: Yunxiang Li <Yunxiang.Li@amd.com>
[ Upstream commit ba5e322b2617157edb757055252a33587b6729e0 ]
On a multi-VF setup, if the VM has two VFs assigned, perhaps from two different GPUs, mgpu fan boost will fail.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
Explanation
- What changed and where:
  - The condition in `amdgpu_device_enable_mgpu_fan_boost()` now skips
    enabling MGPU fan boost when running in SR-IOV multi-VF mode by
    adding `amdgpu_sriov_multi_vf_mode(adev)` to the exclusion check:
    - Old: only skips APUs
    - New: also skips multi-VF VFs
    - File: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3373
  - This function is invoked during late init:
    - File: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3455
- Why the change is needed (bug context and behavior today):
  - In SR-IOV multi-VF mode, the SMU power management is intentionally
    disabled:
    - File: drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c:1868
    - In `smu_hw_init()`: `if (amdgpu_sriov_multi_vf_mode(adev)) {
      smu->pm_enabled = false; return 0; }`
  - Consequently, attempting to enable MGPU fan boost from a VF returns
    `-EOPNOTSUPP`:
    - File: drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c:3668
    - In `smu_enable_mgpu_fan_boost()`: returns `-EOPNOTSUPP` when
      `!smu->pm_enabled || !smu->adev->pm.dpm_enabled`
  - Today, `amdgpu_device_enable_mgpu_fan_boost()` breaks out of its
    loop on the first failure (`if (ret) break;`), which:
    - Spams the logs with “enable mgpu fan boost failed” messages.
    - Can prevent enabling MGPU fan boost for other eligible GPUs in
      mixed setups because it stops at the first error.
    - File: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3373
- Why this is safe and suitable for stable:
  - Small and contained: one conditional update, no API or structural
    changes; uses existing and widely used macro
    `amdgpu_sriov_multi_vf_mode()`:
    - Macro: drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h:27
  - Matches established pattern: AMDGPU already disables many PM
    features in multi-VF mode and guards calls on `pm_enabled`.
  - Prevents a known failure path and avoids breaking out early in the
    loop, improving behavior without changing functionality for
    supported cases.
  - No behavioral change for PFs or single-VF (“PP_ONE_VF”)
    environments; only avoids unsupported operations for multi-VF VFs.
- Stable tree criteria assessment:
  - Fixes a user-visible bug (failed MGPU fan boost attempts and log
    noise; prevents premature loop exit from blocking other devices).
  - Minimal risk and scope; no architectural changes; confined to
    AMDGPU.
  - No new features; purely defensive fix to avoid unsupported
    operations.
  - While there’s no explicit “Fixes:” or “Cc: stable” tag, it is a
    low-risk, clear bug-avoidance change acked by AMD maintainers.
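
The loop behavior described above, where one VF with power management
disabled makes the loop bail out before later GPUs are handled, can be
modeled in a few lines of userspace C. This is only a sketch under stated
assumptions: `struct fake_gpu`, `fake_enable_fan_boost()` and
`fake_enable_all()` are hypothetical stand-ins rather than driver code, and
marking a GPU as enabled after a successful call is assumed, not taken from
the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-GPU state the driver consults. */
struct fake_gpu {
	bool is_apu;      /* models adev->flags & AMD_IS_APU */
	bool multi_vf;    /* models amdgpu_sriov_multi_vf_mode(adev) */
	bool fan_enabled; /* models gpu_ins->mgpu_fan_enabled */
};

/* Models the SMU path: PM is disabled on multi-VF VFs, so the call fails. */
static int fake_enable_fan_boost(const struct fake_gpu *gpu)
{
	return gpu->multi_vf ? -EOPNOTSUPP : 0;
}

/*
 * Models the loop in amdgpu_device_enable_mgpu_fan_boost().
 * skip_multi_vf = true uses the patched exclusion check, false the old one.
 * The first error ends the loop, so later GPUs are left untouched.
 */
static int fake_enable_all(struct fake_gpu *gpus, size_t n, bool skip_multi_vf)
{
	int ret = 0;

	assert(gpus != NULL);
	for (size_t i = 0; i < n; i++) {
		struct fake_gpu *gpu = &gpus[i];
		bool excluded = gpu->is_apu || (skip_multi_vf && gpu->multi_vf);

		if (!excluded && !gpu->fan_enabled) {
			ret = fake_enable_fan_boost(gpu);
			if (ret)
				break;
			gpu->fan_enabled = true; /* assumed bookkeeping */
		}
	}
	return ret;
}
```

With a two-entry array whose first entry is a multi-VF VF,
`fake_enable_all(gpus, 2, false)` returns `-EOPNOTSUPP` and never reaches the
second entry, while `fake_enable_all(gpus, 2, true)` skips the first entry,
enables the second, and returns 0.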
Conclusion: This commit is a good candidate for backport to stable kernels that have the MGPU fan boost path and `amdgpu_sriov_multi_vf_mode()` available.
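
A note for readers checking the patched condition in the diff: it has no
inner parentheses around the flag test. In C, `&` binds tighter than `||`, so
the expression still groups as "is an APU, or is a multi-VF VF". A minimal
sketch, where `FAKE_IS_APU` is a hypothetical flag bit chosen for
illustration and both helper names are invented:

```c
#include <stdbool.h>

#define FAKE_IS_APU (1UL << 17) /* hypothetical flag bit, illustration only */

/* Pre-patch test: only APUs are excluded. */
static bool eligible_old(unsigned long flags, bool multi_vf, bool fan_enabled)
{
	(void)multi_vf; /* the old condition never looked at this */
	return !(flags & FAKE_IS_APU) && !fan_enabled;
}

/*
 * Post-patch test, shaped like the added line in the diff. Because '&'
 * binds tighter than '||', the inner expression is parsed as
 * ((flags & FAKE_IS_APU) || multi_vf).
 */
static bool eligible_new(unsigned long flags, bool multi_vf, bool fan_enabled)
{
	return !(flags & FAKE_IS_APU || multi_vf) && !fan_enabled;
}
```

The two helpers disagree only for a non-APU device in multi-VF mode whose fan
boost is not yet enabled: `eligible_old()` accepts it and `eligible_new()`
rejects it.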
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index dfa68cb411966..097ceee79ece6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3389,7 +3389,7 @@ static int amdgpu_device_enable_mgpu_fan_boost(void)
 	for (i = 0; i < mgpu_info.num_dgpu; i++) {
 		gpu_ins = &(mgpu_info.gpu_ins[i]);
 		adev = gpu_ins->adev;
-		if (!(adev->flags & AMD_IS_APU) &&
+		if (!(adev->flags & AMD_IS_APU || amdgpu_sriov_multi_vf_mode(adev)) &&
 		    !gpu_ins->mgpu_fan_enabled) {
 			ret = amdgpu_dpm_enable_mgpu_fan_boost(adev);
 			if (ret)