On Thu, Nov 04, 2021 at 08:39:18AM +0100, Christian König wrote:
On 03.11.21 at 22:25, Karol Herbst wrote:
On Wed, Nov 3, 2021 at 9:47 PM Sven Joachim svenjoac@gmx.de wrote:
On 2021-11-03 21:32 +0100, Karol Herbst wrote:
On Wed, Nov 3, 2021 at 9:29 PM Karol Herbst kherbst@redhat.com wrote:
On Wed, Nov 3, 2021 at 8:52 PM Sven Joachim svenjoac@gmx.de wrote:
On 2021-11-01 10:17 +0100, Greg Kroah-Hartman wrote:
> From: Christian König christian.koenig@amd.com
>
> commit 0db55f9a1bafbe3dac750ea669de9134922389b5 upstream.
>
> We need to cleanup the fences for ghost objects as well.
>
> Signed-off-by: Christian König christian.koenig@amd.com
> Reported-by: Erhard F. erhard_f@mailbox.org
> Tested-by: Erhard F. erhard_f@mailbox.org
> Reviewed-by: Huang Rui ray.huang@amd.com
> Bug: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.k...
> Bug: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.k...
> CC: stable@vger.kernel.org
> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork....
> Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
> ---
>  drivers/gpu/drm/ttm/ttm_bo_util.c | 1 +
>  1 file changed, 1 insertion(+)
>
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -322,6 +322,7 @@ static void ttm_transfered_destroy(struc
>  	struct ttm_transfer_obj *fbo;
>
>  	fbo = container_of(bo, struct ttm_transfer_obj, base);
> +	dma_resv_fini(&fbo->base.base._resv);
>  	ttm_bo_put(fbo->bo);
>  	kfree(fbo);
>  }

Alas, this innocuous-looking commit causes one of my systems to lock up as soon as I run startx. This happens with the nouveau driver; two other systems with radeon and intel graphics are not affected. Also, I only noticed it in 5.10.77. Kernels 5.15 and 5.14.16 are not affected, and I do not use 5.4 anymore.
I am not familiar with nouveau's ttm management and what has changed there between 5.10 and 5.14, but maybe one of their developers can shed some light on this.
Cheers, Sven
could be related to 265ec0dd1a0d18f4114f62c0d4a794bb4e729bc1
maybe not... but I do remember there being a few ttm-related patches which only hurt nouveau :/ I guess one could do a git bisect to figure out which change "fixes" it.
Maybe, but since the memory leaks reported by Erhard only started to show up in 5.14 (if I read the bugzilla reports correctly), perhaps the patch should simply be reverted on earlier kernels?
Yeah, I think this is probably the right approach.
I agree. The problem is that this memory leak could potentially happen with 5.10 as well, it is just much, much less likely.
But my guess is that 5.10 is so buggy that when the leak does NOT happen we double free, which obviously causes a crash.
So for the sake of stability please don't apply this patch to 5.10. I'm going to comment on the original bug report as well.
Now reverted from 5.10 and 5.4 kernels, thanks,
greg k-h