commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
The VCN firmware loading path enables the indirect SRAM mode if it's advertised as supported. We might have some cases of FW issues that prevents this mode to working properly though, ending-up in a failed probe. An example below, observed in the Steam Deck:
[...] [drm] failed to load ucode VCN0_RAM(0x3A) [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000) amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110) [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110 amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init [...]
Disabling the VCN block circumvents this, but it's a very invasive workaround that turns off the entire feature. So, let's add a quirk on VCN loading that checks for known problematic BIOSes on Vangogh, so we can proactively disable the indirect SRAM mode and allow the HW proper probe and VCN IP block to work fine.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385 Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode") Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode") Cc: stable@vger.kernel.org Cc: James Zhu James.Zhu@amd.com Cc: Leo Liu leo.liu@amd.com Signed-off-by: Guilherme G. Piccoli gpiccoli@igalia.com Signed-off-by: Alex Deucher alexander.deucher@amd.com ---
Hi folks, this was build/boot tested on Deck. I've also adjusted the context, function was reworked on 6.2.
But what a surprise was for me not see this fix already in 6.1.y, since I've CCed stable, and the reason for that is really peculiar:
$ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432 Leo Liu leo.liu@amd.com: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 82132ecc5432 v6.2-rc1~124^2~1^2~13
$ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e Leo Liu leo.liu@amd.com: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 9a8cc8cabc1e v6.1-rc8~16^2^2
This is quite strange for me, we have 2 commit hashes pointing to the *same* commit, and each one is present..in a different release !!?! Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
Alex, do you have an idea why sometimes commits from the AMD tree appear duplicate in mainline? Specially in different releases, this could cause some confusion I guess.
Thanks in advance,
Guilherme
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index ce64ca1c6e66..5c1193dd7d88 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -26,6 +26,7 @@
#include <linux/firmware.h> #include <linux/module.h> +#include <linux/dmi.h> #include <linux/pci.h> #include <linux/debugfs.h> #include <drm/drm_drv.h> @@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev) { unsigned long bo_size; const char *fw_name; + const char *bios_ver; const struct common_firmware_header *hdr; unsigned char fw_check; unsigned int fw_shared_size, log_offset; @@ -159,6 +161,21 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev) if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) adev->vcn.indirect_sram = true; + /* + * Some Steam Deck's BIOS versions are incompatible with the + * indirect SRAM mode, leading to amdgpu being unable to get + * properly probed (and even potentially crashing the kernel). + * Hence, check for these versions here - notice this is + * restricted to Vangogh (Deck's APU). + */ + bios_ver = dmi_get_system_info(DMI_BIOS_VERSION); + + if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) || + !strncmp("F7A0114", bios_ver, 7))) { + adev->vcn.indirect_sram = false; + dev_info(adev->dev, + "Steam Deck quirk: indirect SRAM disabled on BIOS %s\n", bios_ver); + } break; case IP_VERSION(3, 0, 16): fw_name = FIRMWARE_DIMGREY_CAVEFISH;
[Public]
-----Original Message----- From: Guilherme G. Piccoli gpiccoli@igalia.com Sent: Tuesday, April 18, 2023 6:15 PM To: stable@vger.kernel.org Cc: gregkh@linuxfoundation.org; sashal@kernel.org; amd- gfx@lists.freedesktop.org; Deucher, Alexander Alexander.Deucher@amd.com; Zhu, James James.Zhu@amd.com; Liu, Leo Leo.Liu@amd.com; kernel@gpiccoli.net; kernel-dev@igalia.com Subject: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
The VCN firmware loading path enables the indirect SRAM mode if it's advertised as supported. We might have some cases of FW issues that prevents this mode to working properly though, ending-up in a failed probe. An example below, observed in the Steam Deck:
[...] [drm] failed to load ucode VCN0_RAM(0x3A) [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000) amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110) [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110 amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init [...]
Disabling the VCN block circumvents this, but it's a very invasive workaround that turns off the entire feature. So, let's add a quirk on VCN loading that checks for known problematic BIOSes on Vangogh, so we can proactively disable the indirect SRAM mode and allow the HW proper probe and VCN IP block to work fine.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385 Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode") Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode") Cc: stable@vger.kernel.org Cc: James Zhu James.Zhu@amd.com Cc: Leo Liu leo.liu@amd.com Signed-off-by: Guilherme G. Piccoli gpiccoli@igalia.com Signed-off-by: Alex Deucher alexander.deucher@amd.com
Hi folks, this was build/boot tested on Deck. I've also adjusted the context, function was reworked on 6.2.
But what a surprise was for me not see this fix already in 6.1.y, since I've CCed stable, and the reason for that is really peculiar:
$ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432 Leo Liu leo.liu@amd.com: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 82132ecc5432 v6.2-rc1~124^2~1^2~13
$ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e Leo Liu leo.liu@amd.com: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 9a8cc8cabc1e v6.1-rc8~16^2^2
This is quite strange for me, we have 2 commit hashes pointing to the *same* commit, and each one is present..in a different release !!?! Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
Alex, do you have an idea why sometimes commits from the AMD tree appear duplicate in mainline? Specially in different releases, this could cause some confusion I guess.
This is pretty common in the drm. The problem is there is not a good way to get bug fixes into both -next and -fixes without constant back merges. So the patches end up in -next and if they are important for stable, they go to -fixes too. We don't want -next to be broken, but we can't wait until the next kernel cycle to get the bug fixes into the current kernel and stable. I don't know of a good way to make this smoother.
Alex
Thanks in advance,
Guilherme
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index ce64ca1c6e66..5c1193dd7d88 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -26,6 +26,7 @@
#include <linux/firmware.h> #include <linux/module.h> +#include <linux/dmi.h> #include <linux/pci.h> #include <linux/debugfs.h> #include <drm/drm_drv.h> @@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev) { unsigned long bo_size; const char *fw_name;
- const char *bios_ver; const struct common_firmware_header *hdr; unsigned char fw_check; unsigned int fw_shared_size, log_offset; @@ -159,6 +161,21 @@ int
amdgpu_vcn_sw_init(struct amdgpu_device *adev) if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) adev->vcn.indirect_sram = true;
/*
* Some Steam Deck's BIOS versions are incompatible with
the
* indirect SRAM mode, leading to amdgpu being unable to
get
* properly probed (and even potentially crashing the
kernel).
* Hence, check for these versions here - notice this is
* restricted to Vangogh (Deck's APU).
*/
bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
!strncmp("F7A0114", bios_ver, 7))) {
adev->vcn.indirect_sram = false;
dev_info(adev->dev,
"Steam Deck quirk: indirect SRAM disabled on
BIOS %s\n", bios_ver);
break; case IP_VERSION(3, 0, 16): fw_name = FIRMWARE_DIMGREY_CAVEFISH;}
-- 2.40.0
On 19/04/2023 10:16, Deucher, Alexander wrote:
[...]
This is quite strange for me, we have 2 commit hashes pointing to the *same* commit, and each one is present..in a different release !!?! Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
Alex, do you have an idea why sometimes commits from the AMD tree appear duplicate in mainline? Specially in different releases, this could cause some confusion I guess.
This is pretty common in the drm. The problem is there is not a good way to get bug fixes into both -next and -fixes without constant back merges. So the patches end up in -next and if they are important for stable, they go to -fixes too. We don't want -next to be broken, but we can't wait until the next kernel cycle to get the bug fixes into the current kernel and stable. I don't know of a good way to make this smoother.
Alex
Thanks Alex, seems quite complicated indeed.
So, let me check if I understood properly: there are 2 trees (-fixes and -next), and the problem is that their merge onto mainline happens apart and there are kind of duplicate commits, that were first merged on -fixes, then "re-merged" on -next, right?
Would be possible to clean the -next tree right before the merge, using some script to find commits there that are going to be merged but are already present? Then you'd have a -next-to-merge tree that is clean and doesn't present dups, avoiding this.
Disclaimer: I'm not a maintainer, maybe there are smart ways of doing that or perhaps my suggestion is silly and unfeasible heh But seems likely that other maintainers face this problem as well and came up with some solution - avoiding the dups would be a good thing, IMO, if possible.
Cheers,
Guilherme
[Public]
-----Original Message----- From: Guilherme G. Piccoli gpiccoli@igalia.com Sent: Wednesday, April 19, 2023 10:15 AM To: Deucher, Alexander Alexander.Deucher@amd.com Cc: stable@vger.kernel.org; gregkh@linuxfoundation.org; sashal@kernel.org; amd-gfx@lists.freedesktop.org; Zhu, James James.Zhu@amd.com; Liu, Leo Leo.Liu@amd.com; kernel@gpiccoli.net; kernel-dev@igalia.com Subject: Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
On 19/04/2023 10:16, Deucher, Alexander wrote:
[...]
This is quite strange for me, we have 2 commit hashes pointing to the *same* commit, and each one is present..in a different release !!?! Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable misses it, since it only contains 9a8cc8cabc1e (which is the same
patch!).
Alex, do you have an idea why sometimes commits from the AMD tree appear duplicate in mainline? Specially in different releases, this could cause some confusion I guess.
This is pretty common in the drm. The problem is there is not a good way
to get bug fixes into both -next and -fixes without constant back merges. So the patches end up in -next and if they are important for stable, they go to - fixes too. We don't want -next to be broken, but we can't wait until the next kernel cycle to get the bug fixes into the current kernel and stable. I don't know of a good way to make this smoother.
Alex
Thanks Alex, seems quite complicated indeed.
So, let me check if I understood properly: there are 2 trees (-fixes and -next), and the problem is that their merge onto mainline happens apart and there are kind of duplicate commits, that were first merged on -fixes, then "re- merged" on -next, right?
Each subsystem works on new features, then when the merge window opens, Linus pulls them into mainline. At that point, mainline goes into RCs and then mainline is bug fixes only. Meanwhile in parallel, each subsystem is working on new feature development for the next merge window (subsystem -next trees). The trees tend to lag behind mainline a bit. Most developers focus on them in support of upcoming features. They are usually also where bugs are fixed. If a bug is fixed in the -next tree, what's the best way to get it into mainline? I guess ideally it would be fixed in mainline and them mainline would be merged into everyone's -next branch, but that's not always practical. Often times new features depend on bug fixes and then you'd end up stalling new development waiting for a back merge, or you'd have to rebase your -next branches regularly so that they would shed any patches in mainline which is not great either. We try and cherry-pick -x when we can to show the relationship.
Would be possible to clean the -next tree right before the merge, using some script to find commits there that are going to be merged but are already present? Then you'd have a -next-to-merge tree that is clean and doesn't present dups, avoiding this.
There's no real way to clean it without rewriting history. You'd basically need to do regular backmerges and rebases of the -next trees. Also then what do you do if you already have a feature in -next (say Dave or Daniel have already pulled in my latest amdgpu PR for -next) and you realize that one of those patches is an important bug fix for mainline? If the drm -next tree rebased then all of the other drm driver trees that feed into it would need to rebase as well otherwise they'd have stale history.
You also have the case of a fix in -next needing to land in mainline, but due to differences in the trees, it needs backporting to apply properly.
Disclaimer: I'm not a maintainer, maybe there are smart ways of doing that or perhaps my suggestion is silly and unfeasible heh But seems likely that other maintainers face this problem as well and came up with some solution - avoiding the dups would be a good thing, IMO, if possible.
No worries. I agree. I haven't seen anything so far that really addresses handles this effectively.
Alex
Cheers,
Guilherme
On 19/04/2023 17:04, Deucher, Alexander wrote:
[...]
So, let me check if I understood properly: there are 2 trees (-fixes and -next), and the problem is that their merge onto mainline happens apart and there are kind of duplicate commits, that were first merged on -fixes, then "re- merged" on -next, right?
Each subsystem works on new features, then when the merge window opens, Linus pulls them into mainline. At that point, mainline goes into RCs and then mainline is bug fixes only. Meanwhile in parallel, each subsystem is working on new feature development for the next merge window (subsystem -next trees). The trees tend to lag behind mainline a bit. Most developers focus on them in support of upcoming features. They are usually also where bugs are fixed. If a bug is fixed in the -next tree, what's the best way to get it into mainline? I guess ideally it would be fixed in mainline and them mainline would be merged into everyone's -next branch, but that's not always practical. Often times new features depend on bug fixes and then you'd end up stalling new development waiting for a back merge, or you'd have to rebase your -next branches regularly so that they would shed any patches in mainline which is not great either. We try and cherry-pick -x when we can to show the relationship.
Would be possible to clean the -next tree right before the merge, using some script to find commits there that are going to be merged but are already present? Then you'd have a -next-to-merge tree that is clean and doesn't present dups, avoiding this.
There's no real way to clean it without rewriting history. You'd basically need to do regular backmerges and rebases of the -next trees. Also then what do you do if you already have a feature in -next (say Dave or Daniel have already pulled in my latest amdgpu PR for -next) and you realize that one of those patches is an important bug fix for mainline? If the drm -next tree rebased then all of the other drm driver trees that feed into it would need to rebase as well otherwise they'd have stale history.
You also have the case of a fix in -next needing to land in mainline, but due to differences in the trees, it needs backporting to apply properly.
Disclaimer: I'm not a maintainer, maybe there are smart ways of doing that or perhaps my suggestion is silly and unfeasible heh But seems likely that other maintainers face this problem as well and came up with some solution - avoiding the dups would be a good thing, IMO, if possible.
No worries. I agree. I haven't seen anything so far that really addresses handles this effectively.
Alex
Thanks a bunch Alex, I appreciate your time detailing the issue, which seems...quite complex and annoying, indeed ={
What I'll do from now on is trying to check all hashes that match a commit, so I can add more than one fixes tag if that makes sense. At least, this way I can prevent missing fixes in stable like this patch heh
@Greg, can you pick this one? Thanks!
Cheers,
Guilherme
On Thu, Apr 20, 2023 at 09:36:28AM -0300, Guilherme G. Piccoli wrote:
On 19/04/2023 17:04, Deucher, Alexander wrote:
[...]
So, let me check if I understood properly: there are 2 trees (-fixes and -next), and the problem is that their merge onto mainline happens apart and there are kind of duplicate commits, that were first merged on -fixes, then "re- merged" on -next, right?
Each subsystem works on new features, then when the merge window opens, Linus pulls them into mainline. At that point, mainline goes into RCs and then mainline is bug fixes only. Meanwhile in parallel, each subsystem is working on new feature development for the next merge window (subsystem -next trees). The trees tend to lag behind mainline a bit. Most developers focus on them in support of upcoming features. They are usually also where bugs are fixed. If a bug is fixed in the -next tree, what's the best way to get it into mainline? I guess ideally it would be fixed in mainline and them mainline would be merged into everyone's -next branch, but that's not always practical. Often times new features depend on bug fixes and then you'd end up stalling new development waiting for a back merge, or you'd have to rebase your -next branches regularly so that they would shed any patches in mainline which is not great either. We try and cherry-pick -x when we can to show the relationship.
Would be possible to clean the -next tree right before the merge, using some script to find commits there that are going to be merged but are already present? Then you'd have a -next-to-merge tree that is clean and doesn't present dups, avoiding this.
There's no real way to clean it without rewriting history. You'd basically need to do regular backmerges and rebases of the -next trees. Also then what do you do if you already have a feature in -next (say Dave or Daniel have already pulled in my latest amdgpu PR for -next) and you realize that one of those patches is an important bug fix for mainline? If the drm -next tree rebased then all of the other drm driver trees that feed into it would need to rebase as well otherwise they'd have stale history.
You also have the case of a fix in -next needing to land in mainline, but due to differences in the trees, it needs backporting to apply properly.
Disclaimer: I'm not a maintainer, maybe there are smart ways of doing that or perhaps my suggestion is silly and unfeasible heh But seems likely that other maintainers face this problem as well and came up with some solution - avoiding the dups would be a good thing, IMO, if possible.
No worries. I agree. I haven't seen anything so far that really addresses handles this effectively.
Alex
Thanks a bunch Alex, I appreciate your time detailing the issue, which seems...quite complex and annoying, indeed ={
And it drives me crazy, I hate seeing drm patches show up in my stable queue and I put them at the end of the list for this very reason. I've complained about it for years, but oh well...
@Greg, can you pick this one? Thanks!
Which "one" are you referring to here?
confused,
greg k-h
On 20/04/2023 09:42, gregkh@linuxfoundation.org wrote:
[...]
@Greg, can you pick this one? Thanks!
Which "one" are you referring to here?
confused,
greg k-h
This one, sent in this email thread.
The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream hash(es) is 542a56e8eb44 heh
Cheers,
Guilherme
On Thu, Apr 20, 2023 at 09:59:17AM -0300, Guilherme G. Piccoli wrote:
On 20/04/2023 09:42, gregkh@linuxfoundation.org wrote:
[...]
@Greg, can you pick this one? Thanks!
Which "one" are you referring to here?
confused,
greg k-h
This one, sent in this email thread.
I don't have "this email thread" anymore, remember, some of us get thousand+ emails a day...
The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream hash(es) is 542a56e8eb44 heh
But that commit says it fixes a problem in the 6.2 tree, why is this relevant for 6.1.y?
thanks,
greg k-h
On Thu, Apr 20, 2023 at 11:04 AM gregkh@linuxfoundation.org gregkh@linuxfoundation.org wrote:
On Thu, Apr 20, 2023 at 09:59:17AM -0300, Guilherme G. Piccoli wrote:
On 20/04/2023 09:42, gregkh@linuxfoundation.org wrote:
[...]
@Greg, can you pick this one? Thanks!
Which "one" are you referring to here?
confused,
greg k-h
This one, sent in this email thread.
I don't have "this email thread" anymore, remember, some of us get thousand+ emails a day...
The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream hash(es) is 542a56e8eb44 heh
But that commit says it fixes a problem in the 6.2 tree, why is this relevant for 6.1.y?
It fixes a generic issue with certain sbios versions.
Alex
thanks,
greg k-h
On 20/04/2023 12:02, gregkh@linuxfoundation.org wrote:
[...]
Which "one" are you referring to here?
confused,
greg k-h
This one, sent in this email thread.
I don't have "this email thread" anymore, remember, some of us get thousand+ emails a day...
I don't really understand the issue to be honest, we are talking in the very email thread! The email was sent April/18, it's not old or anything.
But in any case, for reference, this is the original email from the lore archives: https://lore.kernel.org/stable/20230418221522.1287942-1-gpiccoli@igalia.com/
The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream hash(es) is 542a56e8eb44 heh
But that commit says it fixes a problem in the 6.2 tree, why is this relevant for 6.1.y?
That is explained in the email and the very reason for that, is the duplicate hashes we are discussing here.
The fix commit in question points the "Fixes:" tag to 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode"), which appears to be in 6.2 tree, right?
But notice that 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode") is the *same* offender and..is present on 6.1 !
In other words, when I first wrote this fix, I just checked the tree quickly and came up with "Fixes: 82132ecc5432", but to be thorough, I should have pointed the fixes tag to 9a8cc8cabc1e, to pick it on 6.1.y.
tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm hereby requesting the merge. Some backport/context adjustment was necessary and it was properly tested in the Steam Deck.
Thanks,
Guilherme
On Thu, Apr 20, 2023 at 12:36:00PM -0300, Guilherme G. Piccoli wrote:
On 20/04/2023 12:02, gregkh@linuxfoundation.org wrote:
[...]
Which "one" are you referring to here?
confused,
greg k-h
This one, sent in this email thread.
I don't have "this email thread" anymore, remember, some of us get thousand+ emails a day...
I don't really understand the issue to be honest, we are talking in the very email thread! The email was sent April/18, it's not old or anything.
That's 3000+ emails ago for me :)
But in any case, for reference, this is the original email from the lore archives: https://lore.kernel.org/stable/20230418221522.1287942-1-gpiccoli@igalia.com/
The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream hash(es) is 542a56e8eb44 heh
But that commit says it fixes a problem in the 6.2 tree, why is this relevant for 6.1.y?
That is explained in the email and the very reason for that, is the duplicate hashes we are discussing here.
The fix commit in question points the "Fixes:" tag to 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode"), which appears to be in 6.2 tree, right?
But notice that 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode") is the *same* offender and..is present on 6.1 !
In other words, when I first wrote this fix, I just checked the tree quickly and came up with "Fixes: 82132ecc5432", but to be thorough, I should have pointed the fixes tag to 9a8cc8cabc1e, to pick it on 6.1.y.
tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm hereby requesting the merge. Some backport/context adjustment was necessary and it was properly tested in the Steam Deck.
Ok, we'll queue it up soon, thanks.
greg k-h
On 20/04/2023 12:56, gregkh@linuxfoundation.org wrote:
[...] That's 3000+ emails ago for me :)
/head_exploding
this is > 1000 emails per day, wow...my sympathies to you heh
[...]
tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm hereby requesting the merge. Some backport/context adjustment was necessary and it was properly tested in the Steam Deck.
Ok, we'll queue it up soon, thanks.
greg k-h
Thanks =)
On Tue, Apr 18, 2023 at 07:15:22PM -0300, Guilherme G. Piccoli wrote:
commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
The VCN firmware loading path enables the indirect SRAM mode if it's advertised as supported. We might have some cases of FW issues that prevents this mode to working properly though, ending-up in a failed probe. An example below, observed in the Steam Deck:
[...] [drm] failed to load ucode VCN0_RAM(0x3A) [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000) amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110) [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110 amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init [...]
Disabling the VCN block circumvents this, but it's a very invasive workaround that turns off the entire feature. So, let's add a quirk on VCN loading that checks for known problematic BIOSes on Vangogh, so we can proactively disable the indirect SRAM mode and allow the HW proper probe and VCN IP block to work fine.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385 Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode") Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode") Cc: stable@vger.kernel.org Cc: James Zhu James.Zhu@amd.com Cc: Leo Liu leo.liu@amd.com Signed-off-by: Guilherme G. Piccoli gpiccoli@igalia.com Signed-off-by: Alex Deucher alexander.deucher@amd.com
Hi folks, this was build/boot tested on Deck. I've also adjusted the context, function was reworked on 6.2.
But what a surprise was for me not see this fix already in 6.1.y, since I've CCed stable, and the reason for that is really peculiar:
Now queued up, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org