On Fri, Jul 07, 2023 at 06:07:49PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting for once, to make this easily accessible to everyone.
So what's the way forward now? It sounded like Jarkko wanted to apply the patch from this thread days ago, but that didn't happen afaics. Then the message below showed up, but Mario's patch wasn't applied either.
Is this intentional, or did something somewhere fall through the cracks?
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
I haven't seen any update to Jarkko's repo.
My patch resolves the immediate issue being seen on the ppc system, and was mostly just me asking why non-x86 systems should go through this AMD-specific code at all.
The vio bus shutdown code only does the remove call when kexec is in progress. The pnp and platform bus type shutdown calls do not do something similar so maybe the check in Mario's patch isn't needed, but I don't think it would hurt to have it in there.
Regards, Jerry
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
#regzbot poke
On 05.07.23 19:04, Jerry Snitselaar wrote:
On Fri, Jun 30, 2023 at 01:07:00PM +0300, Jarkko Sakkinen wrote:
On Thu Jun 29, 2023 at 11:41 PM EEST, Jerry Snitselaar wrote:
tpm_amd_is_rng_defective is for dealing with an issue related to the AMD firmware TPM, so on non-x86 architectures just have it inline and return false.
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Peter Huewe <peterhuewe@gmx.de>
Cc: stable@vger.kernel.org
Cc: Linux regressions mailing list <regressions@lists.linux.dev>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Aneesh Kumar K. V <aneesh.kumar@linux.ibm.com>
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Closes: https://lore.kernel.org/lkml/99B81401-DB46-49B9-B321-CF832B50CAC3@linux.ibm....
Fixes: f1324bbc4011 ("tpm: disable hwrng for fTPM on some AMD designs")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
 drivers/char/tpm/tpm-chip.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index cd48033b804a..cf5499e51999 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -518,6 +518,7 @@ static int tpm_add_legacy_sysfs(struct tpm_chip *chip)
  * 6.x.y.z series: 6.0.18.6 +
  * 3.x.y.z series: 3.57.y.5 +
  */
+#ifdef CONFIG_X86
 static bool tpm_amd_is_rng_defective(struct tpm_chip *chip)
 {
 	u32 val1, val2;
@@ -566,6 +567,12 @@ static bool tpm_amd_is_rng_defective(struct tpm_chip *chip)
 	return true;
 }
+#else
+static inline bool tpm_amd_is_rng_defective(struct tpm_chip *chip)
+{
+	return false;
+}
+#endif /* CONFIG_X86 */
 
 static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait)
 {
-- 
2.38.1
Sanity check, this was the right patch, right?
I'll apply it.
BR, Jarkko
Sorry, I've been dealing with a family health issue the past week. It wasn't clear to me why chip->ops was null when I first took a look, but I think I understand now looking at it again this morning. The stack trace shows it in the device_shutdown() path:
[ 34.381674] NIP [c0000000009db1e4] tpm_amd_is_rng_defective+0x74/0x240
[ 34.381681] LR [c0000000009db928] tpm_chip_unregister+0x138/0x160
[ 34.381685] Call Trace:
[ 34.381686] [c00000009742faa0] [c0000000009db928] tpm_chip_unregister+0x138/0x160
[ 34.381690] [c00000009742fae0] [c0000000009eab94] tpm_ibmvtpm_remove+0x34/0x130
[ 34.381695] [c00000009742fb50] [c000000000115738] vio_bus_remove+0x58/0xd0
[ 34.381701] [c00000009742fb90] [c000000000a01ecc] device_shutdown+0x21c/0x39c
[ 34.381705] [c00000009742fc20] [c0000000001a2684] kernel_restart_prepare+0x54/0x70
[ 34.381710] [c00000009742fc40] [c000000000292c48] kernel_kexec+0xa8/0x100
[ 34.381714] [c00000009742fcb0] [c0000000001a2cd4] __do_sys_reboot+0x214/0x2c0
[ 34.381718] [c00000009742fe10] [c000000000034adc] system_call_exception+0x13c/0x340
[ 34.381723] [c00000009742fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
So I think what happened is:
device_shutdown
  -> dev->class->shutdown_pre (tpm_class_shutdown)   // clears chip->ops
  -> dev->bus->shutdown (vio_bus_shutdown)
     -> vio_bus_remove
        -> viodrv->remove (tpm_ibmvtpm_remove)
           -> tpm_chip_unregister
              -> tpm_amd_is_rng_defective
                 -> oops!
I guess anything that gets called in the tpm_chip_unregister path should be doing a check of chip->ops prior to using it. So I think Mario's patch would still be a good thing to have.
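To make the ordering concrete, here is a small userspace sketch of it. This is not kernel code: every mock_* name is a hypothetical stand-in, and the assumption that the check reaches the hardware through chip->ops (here a request_locality callback) is mine, based on the trace above. It only illustrates why a NULL check on ops keeps the unregister path safe after the class shutdown hook has run.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_chip;

/* Hypothetical stand-in for the driver callbacks hanging off the chip. */
struct mock_ops {
	int (*request_locality)(struct mock_chip *chip);
};

/* Hypothetical stand-in for the chip; not the kernel's struct tpm_chip. */
struct mock_chip {
	const struct mock_ops *ops;	/* cleared by the class shutdown hook */
	bool defective;			/* pretend firmware state */
};

/* Mock callback: always succeeds. */
int mock_request_locality(struct mock_chip *chip)
{
	(void)chip;
	return 0;
}

/* Models class->shutdown_pre: after this, ops must not be dereferenced. */
void mock_class_shutdown(struct mock_chip *chip)
{
	chip->ops = NULL;
}

/* Models a check reached from the unregister path, guarded on ops. */
bool mock_is_rng_defective(struct mock_chip *chip)
{
	if (!chip->ops)		/* without this guard: NULL dereference */
		return false;
	if (chip->ops->request_locality(chip) != 0)
		return false;
	return chip->defective;
}

/* Models the ordering in the trace: class hook first, then bus remove. */
bool mock_device_shutdown(struct mock_chip *chip)
{
	mock_class_shutdown(chip);
	return mock_is_rng_defective(chip);
}
```

With ops set, the mock check goes through the callback as usual. Once mock_device_shutdown() has cleared ops, the guarded check simply returns false instead of dereferencing a NULL pointer.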
Regards, Jerry