These patches are back port for 5.10 and 5.12 stable. They are cherry-picked from 5.14 stable and has some conflict fix.
BugFix: https://bugzilla.kernel.org/show_bug.cgi?id=211277
James Zhu (3): drm/amdkfd: separate kfd_iommu_resume from kfd_resume drm/amdgpu: add amdgpu_amdkfd_resume_iommu drm/amdgpu: move iommu_resume before ip init/resume
Yifan Zhang (2): drm/amdgpu: init iommu after amdkfd device init drm/amdkfd: fix boot failure when iommu is disabled in Picasso.
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 10 ++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++ drivers/gpu/drm/amd/amdkfd/kfd_device.c | 15 +++++++++++---- 4 files changed, 31 insertions(+), 4 deletions(-)
commit fefc01f042f44ede373ee66773b8238dd8fdcb55 upstream.
Separate kfd_iommu_resume from kfd_resume for fine-tuning of amdgpu device init/resume/reset/recovery sequence.
v2: squash in fix for !CONFIG_HSA_AMD
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu James.Zhu@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 ++++++++---- 2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index ea391ca..7f78bcf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -262,6 +262,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, const struct kgd2kfd_shared_resources *gpu_resources); void kgd2kfd_device_exit(struct kfd_dev *kfd); void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm); +int kgd2kfd_resume_iommu(struct kfd_dev *kfd); int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm); int kgd2kfd_pre_reset(struct kfd_dev *kfd); int kgd2kfd_post_reset(struct kfd_dev *kfd); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 5751bdd..1204dae 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -896,17 +896,21 @@ int kgd2kfd_resume(struct kfd_dev *kfd, bool run_pm) return ret; }
-static int kfd_resume(struct kfd_dev *kfd) +int kgd2kfd_resume_iommu(struct kfd_dev *kfd) { int err = 0;
err = kfd_iommu_resume(kfd); - if (err) { + if (err) dev_err(kfd_device, "Failed to resume IOMMU for device %x:%x\n", kfd->pdev->vendor, kfd->pdev->device); - return err; - } + return err; +} + +static int kfd_resume(struct kfd_dev *kfd) +{ + int err = 0;
err = kfd->dqm->ops.start(kfd->dqm); if (err) {
commit 8066008482e533e91934bee49765bf8b4a7c40db upstream.
Add amdgpu_amdkfd_resume_iommu for amdgpu.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu James.Zhu@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 10 ++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 1 + 2 files changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 0544460..ef83047 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -194,6 +194,16 @@ void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool run_pm) kgd2kfd_suspend(adev->kfd.dev, run_pm); }
+int amdgpu_amdkfd_resume_iommu(struct amdgpu_device *adev) +{ + int r = 0; + + if (adev->kfd.dev) + r = kgd2kfd_resume_iommu(adev->kfd.dev); + + return r; +} + int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm) { int r = 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 7f78bcf..896ae32 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -126,6 +126,7 @@ int amdgpu_amdkfd_init(void); void amdgpu_amdkfd_fini(void);
void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool run_pm); +int amdgpu_amdkfd_resume_iommu(struct amdgpu_device *adev); int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm); void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev, const void *ih_ring_entry);
commit f02abeb0779700c308e661a412451b38962b8a0b upstream.
Separate iommu_resume from kfd_resume, and move it before other amdgpu ip init/resume.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu James.Zhu@amd.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 97723f2..2947bde 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2220,6 +2220,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) if (r) goto init_failed;
+ r = amdgpu_amdkfd_resume_iommu(adev); + if (r) + goto init_failed; + r = amdgpu_device_ip_hw_init_phase1(adev); if (r) goto init_failed; @@ -2913,6 +2917,10 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev) { int r;
+ r = amdgpu_amdkfd_resume_iommu(adev); + if (r) + return r; + r = amdgpu_device_ip_resume_phase1(adev); if (r) return r; @@ -4296,6 +4304,10 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,
if (!r) { dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n"); + r = amdgpu_amdkfd_resume_iommu(tmp_adev); + if (r) + goto out; + r = amdgpu_device_ip_resume_phase1(tmp_adev); if (r) goto out;
From: Yifan Zhang yifan1.zhang@amd.com
[ Upstream commit 714d9e4574d54596973ee3b0624ee4a16264d700 ]
This patch is to fix clinfo failure in Raven/Picasso:
Number of platforms: 1 Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.2 AMD-APP (3364.0) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
Signed-off-by: Yifan Zhang yifan1.zhang@amd.com Reviewed-by: James Zhu James.Zhu@amd.com Tested-by: James Zhu James.Zhu@amd.com Acked-by: Felix Kuehling Felix.Kuehling@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 2947bde..488e574 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2220,10 +2220,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) if (r) goto init_failed;
- r = amdgpu_amdkfd_resume_iommu(adev); - if (r) - goto init_failed; - r = amdgpu_device_ip_hw_init_phase1(adev); if (r) goto init_failed; @@ -2259,6 +2255,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) amdgpu_xgmi_add_device(adev); amdgpu_amdkfd_device_init(adev);
+ r = amdgpu_amdkfd_resume_iommu(adev); + if (r) + goto init_failed; + amdgpu_fru_get_product_info(adev);
init_failed:
From: Yifan Zhang yifan1.zhang@amd.com
commit afd18180c07026f94a80ff024acef5f4159084a4 upstream.
When IOMMU disabled in sbios and kfd in iommuv2 path, iommuv2 init will fail. But this failure should not block amdgpu driver init.
Reported-by: youling youling257@gmail.com Tested-by: youling youling257@gmail.com Signed-off-by: Yifan Zhang yifan1.zhang@amd.com Reviewed-by: James Zhu James.Zhu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ---- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 +++ 2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 488e574..f262c4e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2255,10 +2255,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) amdgpu_xgmi_add_device(adev); amdgpu_amdkfd_device_init(adev);
- r = amdgpu_amdkfd_resume_iommu(adev); - if (r) - goto init_failed; - amdgpu_fru_get_product_info(adev);
init_failed: diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 1204dae..b35f0af 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -751,6 +751,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
kfd_cwsr_init(kfd);
+ if (kgd2kfd_resume_iommu(kfd)) + goto device_iommu_error; + if (kfd_resume(kfd)) goto kfd_resume_error;
On Fri, Dec 03, 2021 at 12:27:32PM -0500, James Zhu wrote:
From: Yifan Zhang yifan1.zhang@amd.com
commit afd18180c07026f94a80ff024acef5f4159084a4 upstream.
When IOMMU disabled in sbios and kfd in iommuv2 path, iommuv2 init will fail. But this failure should not block amdgpu driver init.
Reported-by: youling youling257@gmail.com Tested-by: youling youling257@gmail.com Signed-off-by: Yifan Zhang yifan1.zhang@amd.com Reviewed-by: James Zhu James.Zhu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ---- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 +++ 2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 488e574..f262c4e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2255,10 +2255,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev) amdgpu_xgmi_add_device(adev); amdgpu_amdkfd_device_init(adev);
- r = amdgpu_amdkfd_resume_iommu(adev);
- if (r)
goto init_failed;
- amdgpu_fru_get_product_info(adev);
init_failed: diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 1204dae..b35f0af 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -751,6 +751,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, kfd_cwsr_init(kfd);
- if (kgd2kfd_resume_iommu(kfd))
goto device_iommu_error;
No need to "fix up" the coding style here from the original, please always stick to what is upstream whenever possible :(
thanks,
greg k-h
On Fri, Dec 03, 2021 at 12:27:27PM -0500, James Zhu wrote:
These patches are back port for 5.10 and 5.12 stable. They are cherry-picked from 5.14 stable and has some conflict fix.
BugFix: https://bugzilla.kernel.org/show_bug.cgi?id=211277
James Zhu (3): drm/amdkfd: separate kfd_iommu_resume from kfd_resume drm/amdgpu: add amdgpu_amdkfd_resume_iommu drm/amdgpu: move iommu_resume before ip init/resume
Yifan Zhang (2): drm/amdgpu: init iommu after amdkfd device init drm/amdkfd: fix boot failure when iommu is disabled in Picasso.
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 10 ++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++ drivers/gpu/drm/amd/amdkfd/kfd_device.c | 15 +++++++++++---- 4 files changed, 31 insertions(+), 4 deletions(-)
-- 2.7.4
This patch series is causing build errors as reported in numerous places: https://lore.kernel.org/r/4a881261-ba90-2095-58df-13dcffe6bcf2@roeck-us.net https://lore.kernel.org/r/bd0bddb3-8fa7-dc8e-d205-5c266e98d8b9@huawei.com https://lore.kernel.org/r/CA+G9fYtYmZY-m1ZCaF3qJeGtn=8CJR_4K8EB_T4W+wuh31CNj... so I am going to drop them all from the tree now.
Please fix up and resend if you want them in the 5.10.y kernel tree.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org