The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 90a9266269eb9f71af1f323c33e1dca53527bd22 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= mdaenzer@redhat.com Date: Tue, 17 Aug 2021 10:23:25 +0200 Subject: [PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
schedule_delayed_work does not push back the work if it was already scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms after the first time GFXOFF was disabled and re-enabled, even if GFXOFF was disabled and re-enabled again during those 100 ms.
This resulted in frame drops / stutter with the upcoming mutter 41 release on Navi 14, due to constantly enabling GFXOFF in the HW and disabling it again (for getting the GPU clock counter).
To fix this, call cancel_delayed_work_sync when the disable count transitions from 0 to 1, and only schedule the delayed work on the reverse transition, not if the disable count was already 0. This makes sure the delayed work doesn't run at unexpected times, and allows it to be lock-free.
v2: * Use cancel_delayed_work_sync & mutex_trylock instead of mod_delayed_work. v3: * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König) v4: * Fix race condition between amdgpu_gfx_off_ctrl incrementing adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off checking for it to be 0 (Evan Quan)
Cc: stable@vger.kernel.org Reviewed-by: Evan Quan evan.quan@amd.com Reviewed-by: Lijo Lazar lijo.lazar@amd.com # v3 Acked-by: Christian König christian.koenig@amd.com # v3 Signed-off-by: Michel Dänzer mdaenzer@redhat.com Signed-off-by: Alex Deucher alexander.deucher@amd.com
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 41cc00e489ac..41c6b3aacd37 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2829,12 +2829,11 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work) struct amdgpu_device *adev = container_of(work, struct amdgpu_device, gfx.gfx_off_delay_work.work);
- mutex_lock(&adev->gfx.gfx_off_mutex); - if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) { - if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true)) - adev->gfx.gfx_off_state = true; - } - mutex_unlock(&adev->gfx.gfx_off_mutex); + WARN_ON_ONCE(adev->gfx.gfx_off_state); + WARN_ON_ONCE(adev->gfx.gfx_off_req_count); + + if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true)) + adev->gfx.gfx_off_state = true; }
/** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index e7e9655c5623..e7f06bd0f0cd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -563,24 +563,38 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
mutex_lock(&adev->gfx.gfx_off_mutex);
- if (!enable) - adev->gfx.gfx_off_req_count++; - else if (adev->gfx.gfx_off_req_count > 0) + if (enable) { + /* If the count is already 0, it means there's an imbalance bug somewhere. + * Note that the bug may be in a different caller than the one which triggers the + * WARN_ON_ONCE. + */ + if (WARN_ON_ONCE(adev->gfx.gfx_off_req_count == 0)) + goto unlock; + adev->gfx.gfx_off_req_count--;
- if (enable && !adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) { - schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE); - } else if (!enable && adev->gfx.gfx_off_state) { - if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) { - adev->gfx.gfx_off_state = false; + if (adev->gfx.gfx_off_req_count == 0 && !adev->gfx.gfx_off_state) + schedule_delayed_work(&adev->gfx.gfx_off_delay_work, GFX_OFF_DELAY_ENABLE); + } else { + if (adev->gfx.gfx_off_req_count == 0) { + cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work); + + if (adev->gfx.gfx_off_state && + !amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) { + adev->gfx.gfx_off_state = false;
- if (adev->gfx.funcs->init_spm_golden) { - dev_dbg(adev->dev, "GFXOFF is disabled, re-init SPM golden settings\n"); - amdgpu_gfx_init_spm_golden(adev); + if (adev->gfx.funcs->init_spm_golden) { + dev_dbg(adev->dev, + "GFXOFF is disabled, re-init SPM golden settings\n"); + amdgpu_gfx_init_spm_golden(adev); + } } } + + adev->gfx.gfx_off_req_count++; }
+unlock: mutex_unlock(&adev->gfx.gfx_off_mutex); }
On 2021-09-16 15:05, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
It's already in 5.14, commit 32bc8f8373d2d6a681c96e4b25dca60d4d1c6016.
Please do backport it to older stable trees.
On Thu, Sep 16, 2021 at 03:39:16PM +0200, Michel Dänzer wrote:
On 2021-09-16 15:05, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
It's already in 5.14, commit 32bc8f8373d2d6a681c96e4b25dca60d4d1c6016.
Odd, how were we supposed to know that?
Please do backport it to older stable trees.
The above commit is already in 5.10.62 and 5.13.14.
thanks,
greg k-h
On 2021-09-16 15:48, Greg KH wrote:
On Thu, Sep 16, 2021 at 03:39:16PM +0200, Michel Dänzer wrote:
On 2021-09-16 15:05, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
It's already in 5.14, commit 32bc8f8373d2d6a681c96e4b25dca60d4d1c6016.
Odd, how were we supposed to know that?
Looks like the fix was merged separately for 5.14 and 5.15. I don't know how/why that happened. Alex / Christian?
[Public]
-----Original Message----- From: Michel Dänzer mdaenzer@redhat.com Sent: Thursday, September 16, 2021 9:55 AM To: Greg KH gregkh@linuxfoundation.org Cc: Deucher, Alexander Alexander.Deucher@amd.com; Koenig, Christian Christian.Koenig@amd.com; Quan, Evan Evan.Quan@amd.com; Lazar, Lijo Lijo.Lazar@amd.com; stable@vger.kernel.org Subject: Re: FAILED: patch "[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled" failed to apply to 5.14-stable tree
On 2021-09-16 15:48, Greg KH wrote:
On Thu, Sep 16, 2021 at 03:39:16PM +0200, Michel Dänzer wrote:
On 2021-09-16 15:05, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
It's already in 5.14, commit 32bc8f8373d2d6a681c96e4b25dca60d4d1c6016.
Odd, how were we supposed to know that?
Looks like the fix was merged separately for 5.14 and 5.15. I don't know how/why that happened. Alex / Christian?
The fix already landed in drm-next, but since this was before the merge window, we wanted to make sure the fix also landed in stable, so I cherry-picked it to 5.14. I'm not sure of a better way to handle these sort of cases.
Alex
-- Earthling Michel Dänzer | https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredh at.com%2F&data=04%7C01%7Calexander.deucher%40amd.com%7Ceaf ba6f5f87c49c5fc0b08d9791987a2%7C3dd8961fe4884e608e11a82d994e183d% 7C0%7C0%7C637673972820037953%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C100 0&sdata=VQ5VPN9N3%2B0oRSWQDSgwU9D7V6ONdwaE34huiwRYoW8 %3D&reserved=0 Libre software enthusiast | Mesa and X developer
On Thu, Sep 16, 2021 at 02:42:42PM +0000, Deucher, Alexander wrote:
[Public]
-----Original Message----- From: Michel Dänzer mdaenzer@redhat.com Sent: Thursday, September 16, 2021 9:55 AM To: Greg KH gregkh@linuxfoundation.org Cc: Deucher, Alexander Alexander.Deucher@amd.com; Koenig, Christian Christian.Koenig@amd.com; Quan, Evan Evan.Quan@amd.com; Lazar, Lijo Lijo.Lazar@amd.com; stable@vger.kernel.org Subject: Re: FAILED: patch "[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled" failed to apply to 5.14-stable tree
On 2021-09-16 15:48, Greg KH wrote:
On Thu, Sep 16, 2021 at 03:39:16PM +0200, Michel Dänzer wrote:
On 2021-09-16 15:05, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
It's already in 5.14, commit 32bc8f8373d2d6a681c96e4b25dca60d4d1c6016.
Odd, how were we supposed to know that?
Looks like the fix was merged separately for 5.14 and 5.15. I don't know how/why that happened. Alex / Christian?
The fix already landed in drm-next, but since this was before the merge window, we wanted to make sure the fix also landed in stable, so I cherry-picked it to 5.14. I'm not sure of a better way to handle these sort of cases.
You gotta give me a chance to know what happened. As the drm developers do this a lot, they have been putting the "cherry picked from" line in the patch because they can not merge between branches and keep the git id the same.
So try using the '-x' option to 'git cherry-pick' next time?
thanks,
greg k-h
[Public]
-----Original Message----- From: Greg KH gregkh@linuxfoundation.org Sent: Thursday, September 16, 2021 11:36 AM To: Deucher, Alexander Alexander.Deucher@amd.com Cc: Michel Dänzer mdaenzer@redhat.com; Koenig, Christian Christian.Koenig@amd.com; Quan, Evan Evan.Quan@amd.com; Lazar, Lijo Lijo.Lazar@amd.com; stable@vger.kernel.org Subject: Re: FAILED: patch "[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled" failed to apply to 5.14-stable tree
On Thu, Sep 16, 2021 at 02:42:42PM +0000, Deucher, Alexander wrote:
[Public]
-----Original Message----- From: Michel Dänzer mdaenzer@redhat.com Sent: Thursday, September 16, 2021 9:55 AM To: Greg KH gregkh@linuxfoundation.org Cc: Deucher, Alexander Alexander.Deucher@amd.com; Koenig, Christian Christian.Koenig@amd.com; Quan, Evan Evan.Quan@amd.com; Lazar, Lijo Lijo.Lazar@amd.com; stable@vger.kernel.org Subject: Re: FAILED: patch "[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled" failed to apply to 5.14-stable tree
On 2021-09-16 15:48, Greg KH wrote:
On Thu, Sep 16, 2021 at 03:39:16PM +0200, Michel Dänzer wrote:
On 2021-09-16 15:05, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
It's already in 5.14, commit
32bc8f8373d2d6a681c96e4b25dca60d4d1c6016.
Odd, how were we supposed to know that?
Looks like the fix was merged separately for 5.14 and 5.15. I don't know how/why that happened. Alex / Christian?
The fix already landed in drm-next, but since this was before the merge window, we wanted to make sure the fix also landed in stable, so I cherry-picked it to 5.14. I'm not sure of a better way to handle these sort of cases.
You gotta give me a chance to know what happened. As the drm developers do this a lot, they have been putting the "cherry picked from" line in the patch because they can not merge between branches and keep the git id the same.
So try using the '-x' option to 'git cherry-pick' next time?
I can do that, but historically, I've gotten a lot of flack for that because the commit doesn't exist in Linus' tree yet at that point as the merge window hasn't opened yet.
Alex
thanks,
greg k-h
On Thu, Sep 16, 2021 at 04:05:05PM +0000, Deucher, Alexander wrote:
[Public]
-----Original Message----- From: Greg KH gregkh@linuxfoundation.org Sent: Thursday, September 16, 2021 11:36 AM To: Deucher, Alexander Alexander.Deucher@amd.com Cc: Michel Dänzer mdaenzer@redhat.com; Koenig, Christian Christian.Koenig@amd.com; Quan, Evan Evan.Quan@amd.com; Lazar, Lijo Lijo.Lazar@amd.com; stable@vger.kernel.org Subject: Re: FAILED: patch "[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled" failed to apply to 5.14-stable tree
On Thu, Sep 16, 2021 at 02:42:42PM +0000, Deucher, Alexander wrote:
[Public]
-----Original Message----- From: Michel Dänzer mdaenzer@redhat.com Sent: Thursday, September 16, 2021 9:55 AM To: Greg KH gregkh@linuxfoundation.org Cc: Deucher, Alexander Alexander.Deucher@amd.com; Koenig, Christian Christian.Koenig@amd.com; Quan, Evan Evan.Quan@amd.com; Lazar, Lijo Lijo.Lazar@amd.com; stable@vger.kernel.org Subject: Re: FAILED: patch "[PATCH] drm/amdgpu: Cancel delayed work when GFXOFF is disabled" failed to apply to 5.14-stable tree
On 2021-09-16 15:48, Greg KH wrote:
On Thu, Sep 16, 2021 at 03:39:16PM +0200, Michel Dänzer wrote:
On 2021-09-16 15:05, gregkh@linuxfoundation.org wrote: > > The patch below does not apply to the 5.14-stable tree. > If someone wants it applied there, or to any other stable or > longterm tree, then please email the backport, including the > original git commit id to stable@vger.kernel.org.
It's already in 5.14, commit
32bc8f8373d2d6a681c96e4b25dca60d4d1c6016.
Odd, how were we supposed to know that?
Looks like the fix was merged separately for 5.14 and 5.15. I don't know how/why that happened. Alex / Christian?
The fix already landed in drm-next, but since this was before the merge window, we wanted to make sure the fix also landed in stable, so I cherry-picked it to 5.14. I'm not sure of a better way to handle these sort of cases.
You gotta give me a chance to know what happened. As the drm developers do this a lot, they have been putting the "cherry picked from" line in the patch because they can not merge between branches and keep the git id the same.
So try using the '-x' option to 'git cherry-pick' next time?
I can do that, but historically, I've gotten a lot of flack for that because the commit doesn't exist in Linus' tree yet at that point as the merge window hasn't opened yet.
I totally agree, and would argue that this is NOT the way to do it either. But it seems like this is what the i915 developers have been doing for a few years now and honestly, it's better than nothing.
But I would like a real solution here, this gets very annoying every -rc1 cycle for the DRM subsystem. You all are the only ones with this problem for some strange reason :(
greg k-h
linux-stable-mirror@lists.linaro.org