This change breaks tons of systems.
This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287 Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Cc: Christoph Hellwig hch@lst.de Cc: christian.koenig@amd.com --- drivers/gpu/drm/radeon/radeon.h | 1 + drivers/gpu/drm/radeon/radeon_device.c | 13 ++++++++----- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- 3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index b7c3fb2bfb54..eed23dffccf4 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -2391,6 +2391,7 @@ struct radeon_device { struct radeon_wb wb; struct radeon_dummy_page dummy_page; bool shutdown; + bool need_dma32; bool need_swiotlb; bool accel_working; bool fastfb_working; /* IGP feature*/ diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 266e3cbbd09b..f74c74ad8b5d 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1363,25 +1363,28 @@ int radeon_device_init(struct radeon_device *rdev, else rdev->mc.mc_mask = 0xffffffffULL; /* 32 bit MC */
- /* set DMA mask. + /* set DMA mask + need_dma32 flags. * PCIE - can handle 40-bits. * IGP - can handle 40-bits * AGP - generally dma32 is safest * PCI - dma32 for legacy pci gart, 40 bits on newer asics */ - dma_bits = 40; + rdev->need_dma32 = false; if (rdev->flags & RADEON_IS_AGP) - dma_bits = 32; + rdev->need_dma32 = true; if ((rdev->flags & RADEON_IS_PCI) && (rdev->family <= CHIP_RS740)) - dma_bits = 32; + rdev->need_dma32 = true; #ifdef CONFIG_PPC64 if (rdev->family == CHIP_CEDAR) - dma_bits = 32; + rdev->need_dma32 = true; #endif
+ dma_bits = rdev->need_dma32 ? 32 : 40; r = dma_set_mask_and_coherent(&rdev->pdev->dev, DMA_BIT_MASK(dma_bits)); if (r) { + rdev->need_dma32 = true; + dma_bits = 32; pr_warn("radeon: No suitable DMA available\n"); return r; } diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 357e8e98cca9..d2550862313e 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -787,7 +787,7 @@ int radeon_ttm_init(struct radeon_device *rdev) &radeon_bo_driver, rdev->ddev->anon_inode->i_mapping, rdev->ddev->vma_offset_manager, - dma_addressing_limited(&rdev->pdev->dev)); + rdev->need_dma32); if (r) { DRM_ERROR("failed initializing buffer object driver(%d).\n", r); return r;
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
This change breaks tons of systems.
Very vague :(
This commit has also been merged for over a year, why the sudden problem now?
This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.
You mean "33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations")"?
That's the proper way to reference commits in changelogs please. It's even documented that way...
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287 Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Cc: Christoph Hellwig hch@lst.de Cc: christian.koenig@amd.com
Fixes: 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations")
as well?
thanks,
greg k-h
On Wed, Sep 16, 2020 at 08:33:00AM +0200, Greg KH wrote:
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
This change breaks tons of systems.
Very vague :(
This commit has also been merged for over a year, why the sudden problem now?
Unrelated rant, but one year is generally what it takes for most users to upgrade to new kernels, through their distro updates. Especially for older hw like the radeon drivers (since 5 years or so amd gpus switched over to amdgpu.ko).
So surprise that bugs only show up after 1+ year shouldn't be a surprise :-) My personal rule is that I put a 1 year spacer between a risky change and any cleanup that enables. Too many regrets in the past.
Cheers, Daniel
This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.
You mean "33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations")"?
That's the proper way to reference commits in changelogs please. It's even documented that way...
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287 Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Cc: Christoph Hellwig hch@lst.de Cc: christian.koenig@amd.com
Fixes: 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations")
as well?
thanks,
greg k-h _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Wed, Sep 16, 2020 at 2:32 AM Greg KH gregkh@linuxfoundation.org wrote:
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
This change breaks tons of systems.
Very vague :(
Screen corruption making the system unusable.
This commit has also been merged for over a year, why the sudden problem now?
It was noticed by several people closer to when the change went in as well. If you notice, most of the bugs date back quite a while. We looked into it a bit at the time but couldn't determine the problem. It only seems to affect really old chips (like 15-20 years old) which makes it hard to reproduce if you don't have an old system. There were a couple of threads at the time, but nothing was resolved. I was able to find one of them: https://lkml.org/lkml/2019/12/14/263
There were several new bugs filed which brought the issue back to my attention recently.
This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.
You mean "33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations")"?
That's the proper way to reference commits in changelogs please. It's even documented that way...
When you revert a patch with git, that is what it does. Maybe we should fix git to change the formatting.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287 Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Cc: Christoph Hellwig hch@lst.de Cc: christian.koenig@amd.com
Fixes: 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations")
Sure, I can add that, but it doesn't really fix it, it reverts it. But point taken, it does fix the commit by removing it.
Alex
as well?
thanks,
greg k-h
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
This change breaks tons of systems.
Did you do at least some basic root causing on why? Do GPUs get fed address they can't deal with? Any examples?
Bug 1 doesn't seem to contain any analysis and was reported against a very old kernel that had all kind of fixes since.
Bug 2 seems to imply a drm kthread is accessing some structure it shouldn't, which would imply a mismatch between pools used by radeon now and those actually provided by the core. Something that should be pretty to trivial to fix for someone understanding the whole ttm pool maze.
Bug 3: same as 1, but an even older kernel.
Bug 4: looks like 1 and 3, and actually verified to work properly in 5.9-rc. Did you try to get the other reporters test this as well?
All over not a very useful changelog.
On Wed, Sep 16, 2020 at 3:04 AM Christoph Hellwig hch@lst.de wrote:
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
This change breaks tons of systems.
Did you do at least some basic root causing on why? Do GPUs get fed address they can't deal with? Any examples?
Bug 1 doesn't seem to contain any analysis and was reported against a very old kernel that had all kind of fixes since.
Bug 2 seems to imply a drm kthread is accessing some structure it shouldn't, which would imply a mismatch between pools used by radeon now and those actually provided by the core. Something that should be pretty to trivial to fix for someone understanding the whole ttm pool maze.
Bug 3: same as 1, but an even older kernel.
Bug 4: looks like 1 and 3, and actually verified to work properly in 5.9-rc. Did you try to get the other reporters test this as well?
It would appear that the change in 5.9 to disable AGP on radeon fixed the issue. I'm following up on the other tickets to see if I can get confirmation. On another thread[1], the user was able to avoid the issue by disabling HIMEM. Looks like some issue with HIMEM and/or AGP.
Alex
On Wed, Sep 16, 2020 at 06:16:25PM -0400, Alex Deucher wrote:
On Wed, Sep 16, 2020 at 3:04 AM Christoph Hellwig hch@lst.de wrote:
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
This change breaks tons of systems.
Did you do at least some basic root causing on why? Do GPUs get fed address they can't deal with? Any examples?
Bug 1 doesn't seem to contain any analysis and was reported against a very old kernel that had all kind of fixes since.
Bug 2 seems to imply a drm kthread is accessing some structure it shouldn't, which would imply a mismatch between pools used by radeon now and those actually provided by the core. Something that should be pretty to trivial to fix for someone understanding the whole ttm pool maze.
Bug 3: same as 1, but an even older kernel.
Bug 4: looks like 1 and 3, and actually verified to work properly in 5.9-rc. Did you try to get the other reporters test this as well?
It would appear that the change in 5.9 to disable AGP on radeon fixed the issue. I'm following up on the other tickets to see if I can get confirmation. On another thread[1], the user was able to avoid the issue by disabling HIMEM. Looks like some issue with HIMEM and/or AGP.
Thanks. I'll try to spend some time to figure out what could be highmem related. I'd much rather get this fixed properly.
linux-stable-mirror@lists.linaro.org