This patch is wrong as noted by MIchel a while ago - quote from his review of the patch.
"Actually, pflip_status should really only be set to AMDGPU_FLIP_SUBMITTED after the flip has been programmed to the hardware, at least as far as the lock holder is concerned. That's why the code was previously holding the lock until after the dc_commit_updates_for_stream call. Otherwise, it's at least theoretically possible that either:
* dm_pflip_high_irq is called before dc_commit_updates_for_stream, but sees flip_status == AMDGPU_FLIP_SUBMITTED and sends the event to userspace prematurely
* dm_pflip_high_irq is called after dc_commit_updates_for_stream, but sees flip_status != AMDGPU_FLIP_SUBMITTED, so it incorrectly doesn't send the event to userspace "
It shouldn't go in.
Andrey
On 07/05/2018 11:59 AM, gregkh@linuxfoundation.org wrote:
This is a note to let you know that I've just added the patch titled
drm/amd/display: release spinlock before committing updates to stream
to the 4.17-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: drm-amd-display-release-spinlock-before-committing-updates-to-stream.patch and it can be found in the queue-4.17 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From 4de9f38bb2cce3a4821ffb8a83d6b08f6e37d905 Mon Sep 17 00:00:00 2001 From: Shirish S shirish.s@amd.com Date: Tue, 26 Jun 2018 09:32:39 +0530 Subject: drm/amd/display: release spinlock before committing updates to stream MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
From: Shirish S shirish.s@amd.com
commit 4de9f38bb2cce3a4821ffb8a83d6b08f6e37d905 upstream.
Currently, amdgpu_do_flip() spinlocks crtc->dev->event_lock and releases it only after committing updates to the stream.
dc_commit_updates_for_stream() should be moved out of spinlock for the below reasons:
- event_lock is supposed to protect access to acrct->pflip_status _only_
- dc_commit_updates_for_stream() has potential sleep's and also its not appropriate to be in an atomic state for such long sequences of code.
Signed-off-by: Shirish S shirish.s@amd.com Suggested-by: Andrey Grodzovsky andrey.grodzovsky@amd.com Reviewed-by: Michel Dänzer michel.daenzer@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -3967,10 +3967,11 @@ static void amdgpu_dm_do_flip(struct drm if (acrtc->base.state->event) prepare_flip_isr(acrtc);
- spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
- surface_updates->surface = dc_stream_get_status(acrtc_state->stream)->plane_states[0]; surface_updates->flip_addr = &addr;
- dc_commit_updates_for_stream(adev->dm.dc, surface_updates, 1,
@@ -3983,9 +3984,6 @@ static void amdgpu_dm_do_flip(struct drm __func__, addr.address.grph.addr.high_part, addr.address.grph.addr.low_part);
- spin_unlock_irqrestore(&crtc->dev->event_lock, flags); }
static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
Patches currently in stable-queue which might be from shirish.s@amd.com are
queue-4.17/drm-amdgpu-add-apu-support-in-vi_set_vce_clocks.patch queue-4.17/drm-amdgpu-add-apu-support-in-vi_set_uvd_clocks.patch queue-4.17/drm-amd-display-release-spinlock-before-committing-updates-to-stream.patch
On Thu, Jul 05, 2018 at 12:05:09PM -0400, Andrey Grodzovsky wrote:
This patch is wrong as noted by MIchel a while ago - quote from his review of the patch.
"Actually, pflip_status should really only be set to AMDGPU_FLIP_SUBMITTED after the flip has been programmed to the hardware, at least as far as the lock holder is concerned. That's why the code was previously holding the lock until after the dc_commit_updates_for_stream call. Otherwise, it's at least theoretically possible that either:
- dm_pflip_high_irq is called before dc_commit_updates_for_stream, but sees
flip_status == AMDGPU_FLIP_SUBMITTED and sends the event to userspace prematurely
- dm_pflip_high_irq is called after dc_commit_updates_for_stream, but sees
flip_status != AMDGPU_FLIP_SUBMITTED, so it incorrectly doesn't send the event to userspace "
It shouldn't go in.
Is there a fix for this in Linus's tree for these problems? If not, why not? If so, what is that git commit id?
thanks,
greg k-h
On 07/05/2018 12:06 PM, Greg KH wrote:
On Thu, Jul 05, 2018 at 12:05:09PM -0400, Andrey Grodzovsky wrote:
This patch is wrong as noted by MIchel a while ago - quote from his review of the patch.
"Actually, pflip_status should really only be set to AMDGPU_FLIP_SUBMITTED after the flip has been programmed to the hardware, at least as far as the lock holder is concerned. That's why the code was previously holding the lock until after the dc_commit_updates_for_stream call. Otherwise, it's at least theoretically possible that either:
- dm_pflip_high_irq is called before dc_commit_updates_for_stream, but sees
flip_status == AMDGPU_FLIP_SUBMITTED and sends the event to userspace prematurely
- dm_pflip_high_irq is called after dc_commit_updates_for_stream, but sees
flip_status != AMDGPU_FLIP_SUBMITTED, so it incorrectly doesn't send the event to userspace "
It shouldn't go in.
Is there a fix for this in Linus's tree for these problems? If not, why not? If so, what is that git commit id?
AFAIK there is still no fix for the original issue which this patch was intended to fix.
Harry, can you please comment on this ?
Andrey
thanks,
greg k-h
On 2018-07-05 12:09 PM, Andrey Grodzovsky wrote:
On 07/05/2018 12:06 PM, Greg KH wrote:
On Thu, Jul 05, 2018 at 12:05:09PM -0400, Andrey Grodzovsky wrote:
This patch is wrong as noted by MIchel a while ago - quote from his review of the patch.
"Actually, pflip_status should really only be set to AMDGPU_FLIP_SUBMITTED after the flip has been programmed to the hardware, at least as far as the lock holder is concerned. That's why the code was previously holding the lock until after the dc_commit_updates_for_stream call. Otherwise, it's at least theoretically possible that either:
- dm_pflip_high_irq is called before dc_commit_updates_for_stream, but sees
flip_status == AMDGPU_FLIP_SUBMITTED and sends the event to userspace prematurely
- dm_pflip_high_irq is called after dc_commit_updates_for_stream, but sees
flip_status != AMDGPU_FLIP_SUBMITTED, so it incorrectly doesn't send the event to userspace "
It shouldn't go in.
Is there a fix for this in Linus's tree for these problems? If not, why not? If so, what is that git commit id?
AFAIK there is still no fix for the original issue which this patch was intended to fix.
Harry, can you please comment on this ?
That can be better answered by Shirish.
In my opinion the issue described by Michel is a lesser evil than attempting to allocate memory inside a spinlocked area.
Shirish, we need to understand why dc_commit_updates_for_stream() is trying to do a UPDATE_TYPE_FULL in your scenario. This should never really happen through the amdgpu_dm_do_flip code path and we need to understand why. If we never do UPDATE_TYPE_FULL here we should not need to allocate memory and should be find with leaving the spinlock around dc_commit_updates_for_stream().
Harry
Andrey
thanks,
greg k-h
On 7/5/2018 11:11 PM, Harry Wentland wrote:
On 2018-07-05 12:09 PM, Andrey Grodzovsky wrote:
On 07/05/2018 12:06 PM, Greg KH wrote:
On Thu, Jul 05, 2018 at 12:05:09PM -0400, Andrey Grodzovsky wrote:
This patch is wrong as noted by MIchel a while ago - quote from his review of the patch.
"Actually, pflip_status should really only be set to AMDGPU_FLIP_SUBMITTED after the flip has been programmed to the hardware, at least as far as the lock holder is concerned. That's why the code was previously holding the lock until after the dc_commit_updates_for_stream call. Otherwise, it's at least theoretically possible that either:
- dm_pflip_high_irq is called before dc_commit_updates_for_stream, but sees
flip_status == AMDGPU_FLIP_SUBMITTED and sends the event to userspace prematurely
- dm_pflip_high_irq is called after dc_commit_updates_for_stream, but sees
flip_status != AMDGPU_FLIP_SUBMITTED, so it incorrectly doesn't send the event to userspace "
It shouldn't go in.
Is there a fix for this in Linus's tree for these problems? If not, why not? If so, what is that git commit id?
AFAIK there is still no fix for the original issue which this patch was intended to fix.
Andrey, Ideally sleep's should not be coded within spinlock block, in this case its coded in dc_commit_updates_for_stream() at the below occurrences: 1. In drivers/gpu/drm/amd/amdgpu/atom.c => amdgpu_atom_execute_table_locked() => kzalloc 2. mutex callbacks in drivers/gpu/drm/amd/powerplay/amd_powerplay.c & drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c. 3. In drivers/gpu/drm/amd/display/dc/core/dc.c => dc_create_state() => kzalloc
In my case, when rendering on multiple planes, it led to system instabilities and with this patch applied we no more see them. (analysis further below in this mail).
Also when posted on amd-gfx with the corresponding RB's there was no objection to this patch not sure if nobody saw this patch getting merged.
Harry, can you please comment on this ?
That can be better answered by Shirish.
In my opinion the issue described by Michel is a lesser evil than attempting to allocate memory inside a spinlocked area.
Yes Harry. I agree, hence i pushed this patch as we have a code that attempts to allocate memory and furthermore calls mutex's.
Shirish, we need to understand why dc_commit_updates_for_stream() is trying to do a UPDATE_TYPE_FULL in your scenario. This should never really happen through the amdgpu_dm_do_flip code path and we need to understand why.
This happens in amdgpu_dm_do_flip() because of the hard-coded number of surfaces it call dc_commit_updates_for_stream() with, dc_commit_updates_for_stream(adev->dm.dc, surface_updates, 1, <<<<<<<<< acrtc_state->stream, NULL, &surface_updates->surface, state);
which is 1, where as in case of rendering on multiple planes the stream has 2 surfaces attached and hence the below condition fails in very first check of check_update_surfaces_for_stream() "if ( ... || stream_status->plane_count != surface_count)"
If we never do UPDATE_TYPE_FULL here we should not need to allocate memory and should be find with leaving the spinlock around dc_commit_updates_for_stream().
As i understand we may end-up trying to do a UPDATE_TYPE_FULL due to failure in any of the several checks in dc_check_update_surfaces_for_stream(). We cannot rule out the possibility of a code path being executed as long as it exists, ideally we should code another callback in amdgpu_dm_flip() that presumes the update type is always UPDATE_TYPE_FAST.
Till we fix this hard-coding i recommend that we should have this patch, am open to avoiding it if it breaks existing working systems or offends any builds.
Thanks. Regards, Shirish S
Harry
Andrey
thanks,
greg k-h
On 07/06/2018 02:26 AM, S, Shirish wrote:
On 7/5/2018 11:11 PM, Harry Wentland wrote:
On 2018-07-05 12:09 PM, Andrey Grodzovsky wrote:
On 07/05/2018 12:06 PM, Greg KH wrote:
On Thu, Jul 05, 2018 at 12:05:09PM -0400, Andrey Grodzovsky wrote:
This patch is wrong as noted by MIchel a while ago - quote from his review of the patch.
"Actually, pflip_status should really only be set to AMDGPU_FLIP_SUBMITTED after the flip has been programmed to the hardware, at least as far as the lock holder is concerned. That's why the code was previously holding the lock until after the dc_commit_updates_for_stream call. Otherwise, it's at least theoretically possible that either:
- dm_pflip_high_irq is called before dc_commit_updates_for_stream,
but sees flip_status == AMDGPU_FLIP_SUBMITTED and sends the event to userspace prematurely
- dm_pflip_high_irq is called after dc_commit_updates_for_stream,
but sees flip_status != AMDGPU_FLIP_SUBMITTED, so it incorrectly doesn't send the event to userspace "
It shouldn't go in.
Is there a fix for this in Linus's tree for these problems? If not, why not? If so, what is that git commit id?
AFAIK there is still no fix for the original issue which this patch was intended to fix.
Andrey, Ideally sleep's should not be coded within spinlock block, in this case its coded in dc_commit_updates_for_stream() at the below occurrences:
- In drivers/gpu/drm/amd/amdgpu/atom.c =>
amdgpu_atom_execute_table_locked() => kzalloc 2. mutex callbacks in drivers/gpu/drm/amd/powerplay/amd_powerplay.c & drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c. 3. In drivers/gpu/drm/amd/display/dc/core/dc.c => dc_create_state() => kzalloc
In my case, when rendering on multiple planes, it led to system instabilities and with this patch applied we no more see them. (analysis further below in this mail).
Also when posted on amd-gfx with the corresponding RB's there was no objection to this patch not sure if nobody saw this patch getting merged.
Harry, can you please comment on this ?
That can be better answered by Shirish.
In my opinion the issue described by Michel is a lesser evil than attempting to allocate memory inside a spinlocked area.
Yes Harry. I agree, hence i pushed this patch as we have a code that attempts to allocate memory and furthermore calls mutex's.
Shirish, we need to understand why dc_commit_updates_for_stream() is trying to do a UPDATE_TYPE_FULL in your scenario. This should never really happen through the amdgpu_dm_do_flip code path and we need to understand why.
This happens in amdgpu_dm_do_flip() because of the hard-coded number of surfaces it call dc_commit_updates_for_stream() with, dc_commit_updates_for_stream(adev->dm.dc, surface_updates, 1, <<<<<<<<< acrtc_state->stream, NULL, &surface_updates->surface, state);
which is 1, where as in case of rendering on multiple planes the stream has 2 surfaces attached and hence the below condition fails in very first check of check_update_surfaces_for_stream() "if ( ... || stream_status->plane_count != surface_count)"
If we never do UPDATE_TYPE_FULL here we should not need to allocate memory and should be find with leaving the spinlock around dc_commit_updates_for_stream().
As i understand we may end-up trying to do a UPDATE_TYPE_FULL due to failure in any of the several checks in dc_check_update_surfaces_for_stream(). We cannot rule out the possibility of a code path being executed as long as it exists, ideally we should code another callback in amdgpu_dm_flip() that presumes the update type is always UPDATE_TYPE_FAST.
Till we fix this hard-coding i recommend that we should have this patch, am open to avoiding it if it breaks existing working systems or offends any builds.
Thanks. Regards, Shirish S
I believe that at least then we should put a BIG comment in the code explaining this change both to avoid future people looking into it trying to revert it believing it's just a plain bug and for possible proper revert when the underlying issue with sleeping in atomic context is resolved.
Andrey
Harry
Andrey
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org