Reviewed-by: Tao Zhou tao.zhou1@amd.com
-----Original Message-----
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
Sent: Sunday, July 9, 2023 7:14 PM
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org; patches@lists.linux.dev; Zhou1, Tao Tao.Zhou1@amd.com; Zhang, Hawking Hawking.Zhang@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Tuikov, Luben Luben.Tuikov@amd.com; Deucher, Alexander Alexander.Deucher@amd.com; Sasha Levin sashal@kernel.org
Subject: [PATCH 6.3 317/431] drm/amdgpu: Fix usage of UMC fill record in RAS
From: Luben Tuikov luben.tuikov@amd.com
[ Upstream commit 71344a718a9fda8c551cdc4381d354f9a9907f6f ]
The commit listed in the Fixes tag below introduced a bug in amdgpu_ras.c::amdgpu_reserve_page_direct(). That commit added amdgpu_umc_fill_error_record(), which internally right-shifts the physical address it is given (the argument "uint64_t retired_page", a misleading name) by AMDGPU_GPU_PAGE_SHIFT. Thus, when amdgpu_reserve_page_direct() passes "address" to the new function, it should NOT right-shift it first: the double shift erroneously yields a page address of 0 for the first 2^(2*AMDGPU_GPU_PAGE_SHIFT) memory addresses.
This commit fixes this bug.
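To illustrate the double shift described above, here is a minimal standalone sketch. It is not kernel code: fill_record_page() is a hypothetical stand-in for the shift performed inside amdgpu_umc_fill_error_record(), and PAGE_SHIFT_ASSUMED = 12 (4 KiB GPU pages) is an assumed value for AMDGPU_GPU_PAGE_SHIFT.

```c
#include <stdint.h>

/* Assumption: 4 KiB GPU pages, i.e. a page shift of 12. */
#define PAGE_SHIFT_ASSUMED 12

/*
 * Hypothetical stand-in for the shift done inside
 * amdgpu_umc_fill_error_record(): the callee takes a physical
 * address and converts it to a page number itself.
 */
static uint64_t fill_record_page(uint64_t addr)
{
	return addr >> PAGE_SHIFT_ASSUMED;
}

/* Caller before the fix: shifts first, so the address is shifted twice. */
static uint64_t page_before_fix(uint64_t address)
{
	return fill_record_page(address >> PAGE_SHIFT_ASSUMED);
}

/* Caller after the fix: passes the physical address through unshifted. */
static uint64_t page_after_fix(uint64_t address)
{
	return fill_record_page(address);
}
```

Under these assumptions, an address such as 0x00ABC000 maps to page 0 through page_before_fix() but to page 0xABC through page_after_fix(). Every address below 2^24 collapses to page 0 in the pre-fix path.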
Cc: Tao Zhou tao.zhou1@amd.com
Cc: Hawking Zhang Hawking.Zhang@amd.com
Cc: Alex Deucher Alexander.Deucher@amd.com
Fixes: 400013b268cb ("drm/amdgpu: add umc_fill_error_record to make code more simple")
Signed-off-by: Luben Tuikov luben.tuikov@amd.com
Link: https://lore.kernel.org/r/20230610113536.10621-1-luben.tuikov@amd.com
Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Sasha Levin sashal@kernel.org
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 63dfcc98152d5..b3daca6372a90 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -170,8 +170,7 @@ static int amdgpu_reserve_page_direct(struct amdgpu_device *adev, uint64_t addre
 
 	memset(&err_rec, 0x0, sizeof(struct eeprom_table_record));
 	err_data.err_addr = &err_rec;
-	amdgpu_umc_fill_error_record(&err_data, address,
-			(address >> AMDGPU_GPU_PAGE_SHIFT), 0, 0);
+	amdgpu_umc_fill_error_record(&err_data, address, address, 0, 0);
 
 	if (amdgpu_bad_page_threshold != 0) {
 		amdgpu_ras_add_bad_pages(adev, err_data.err_addr,
--
2.39.2