From: Mangesh Gadre <Mangesh.Gadre@amd.com>
[ Upstream commit 37551277dfed796b6749e4fa52bdb62403cfdb42 ]
The SR-IOV guest side doesn't init the RAS feature, hence the poison IRQ shouldn't be put during hw fini.
Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What it fixes: Prevents a WARN_ON call trace during VCN v5.0.1 hardware fini on SR-IOV guests by avoiding an unmatched amdgpu_irq_put() for the VCN poison IRQ that was never enabled on the guest. The WARN arises because amdgpu_irq_put() checks that the IRQ was enabled and emits a warning if not.
- Precise change: Adds an SR-IOV VF guard to the RAS poison IRQ “put” in the VCN v5.0.1 fini path:
  - drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:352
  - Before: if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__VCN)) amdgpu_irq_put(...)
  - After: if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__VCN) && !amdgpu_sriov_vf(adev)) amdgpu_irq_put(...)
- Why the call trace happens: amdgpu_irq_put() warns if the interrupt type wasn’t previously enabled (no prior amdgpu_irq_get()):
  - drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639
  - if (WARN_ON(!amdgpu_irq_enabled(adev, src, type))) return -EINVAL;
- Why SR-IOV VF should skip put: The SR-IOV guest doesn’t initialize VCN RAS poison IRQ (no amdgpu_irq_get()), so calling amdgpu_irq_put() on fini is an unmatched “put” that triggers the WARN_ON. The RAS “get” for VCN v5.0.1 is only attempted when RAS is supported and the handler is present:
  - drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:1702
  - if (amdgpu_ras_is_supported(adev, ras_block->block) && adev->vcn.inst->ras_poison_irq.funcs) amdgpu_irq_get(...)
- Consistency with adjacent code: Other blocks already avoid the “put” on VF, demonstrating a known-good pattern:
  - VCN v4.0.3 fini: drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:391
    - if (amdgpu_ras_is_supported(...) && !amdgpu_sriov_vf(adev)) amdgpu_irq_put(...)
  - JPEG v5.0.1 fini: drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c:318
    - if (amdgpu_ras_is_supported(...) && !amdgpu_sriov_vf(adev)) amdgpu_irq_put(...)
- Scope and risk:
  - Small, contained, and localized to VCN v5.0.1 fini.
  - No functional change for bare metal or PF; only suppresses an invalid “put” on VF where the IRQ was never enabled.
  - No architectural changes; pure bug fix in a driver subsystem.
- Stable criteria:
  - Fixes a user-visible bug (call trace on SR-IOV guests) during suspend/shutdown or module teardown paths.
  - Minimal risk; follows existing patterns in related IP blocks.
  - No new features; clear, targeted fix suitable for stable backport.
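
The get/put imbalance described above can be illustrated with a small userspace sketch. This is not the amdgpu code: every toy_* name is hypothetical, the refcount is a plain int, and the VF is modelled as simply skipping the "get" at late init, which is an assumption made for illustration. Unlike the real hw_fini, which returns 0 regardless, the toy fini functions return the result of the "put" so the imbalance is observable.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an IRQ source with an enable refcount. */
struct toy_irq_src {
	int enabled_refs;
};

/* Hypothetical device state: only the two flags the guard depends on. */
struct toy_dev {
	bool ras_supported;
	bool is_sriov_vf;
	struct toy_irq_src poison_irq;
};

static void toy_irq_get(struct toy_irq_src *src)
{
	src->enabled_refs++;
}

/* Same idea as the WARN_ON check: a put with no prior get is rejected. */
static int toy_irq_put(struct toy_irq_src *src)
{
	if (src->enabled_refs == 0) {
		fprintf(stderr, "WARN: put on an IRQ that was never enabled\n");
		return -EINVAL;
	}
	src->enabled_refs--;
	return 0;
}

/* Assumption: the guest (VF) never takes the reference at init time. */
static void toy_ras_late_init(struct toy_dev *dev)
{
	if (dev->ras_supported && !dev->is_sriov_vf)
		toy_irq_get(&dev->poison_irq);
}

/* Pre-patch shape: the put is gated on RAS support only. */
static int toy_hw_fini_unguarded(struct toy_dev *dev)
{
	if (dev->ras_supported)
		return toy_irq_put(&dev->poison_irq);
	return 0;
}

/* Post-patch shape: the put is also skipped on a VF. */
static int toy_hw_fini_guarded(struct toy_dev *dev)
{
	if (dev->ras_supported && !dev->is_sriov_vf)
		return toy_irq_put(&dev->poison_irq);
	return 0;
}
```

In this model, a device with ras_supported and is_sriov_vf both set gets -EINVAL (and the warning) from toy_hw_fini_unguarded(), while toy_hw_fini_guarded() returns 0 without touching the refcount. A non-VF device behaves the same under either variant.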
 drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
index cb560d64da08c..8ef4a8b2fae99 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c
@@ -284,7 +284,7 @@ static int vcn_v5_0_1_hw_fini(struct amdgpu_ip_block *ip_block)
 			vinst->set_pg_state(vinst, AMD_PG_STATE_GATE);
 	}

-	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__VCN))
+	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__VCN) && !amdgpu_sriov_vf(adev))
 		amdgpu_irq_put(adev, &adev->vcn.inst->ras_poison_irq, 0);

 	return 0;