6.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Avadhut Naik avadhut.naik@amd.com
commit d7ac083f095d894a0b8ac0573516bfd035e6b25a upstream.
Currently, when a CMCI storm detected on a Machine Check bank, subsides, the bank's corresponding bit in the mce_poll_banks per-CPU variable is cleared unconditionally by cmci_storm_end().
On AMD SMCA systems, this essentially disables polling on that particular bank on that CPU. Consequently, any subsequent correctable errors or storms will not be logged.
Since AMD SMCA systems allow banks to be managed by both polling and interrupts, the polling banks bitmap for a CPU, i.e., mce_poll_banks, should not be modified when a storm subsides.
Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation") Signed-off-by: Avadhut Naik avadhut.naik@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251121190542.2447913-2-avadhut.naik@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/mce/threshold.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/mce/threshold.c +++ b/arch/x86/kernel/cpu/mce/threshold.c @@ -85,7 +85,8 @@ void cmci_storm_end(unsigned int bank) { struct mca_storm_desc *storm = this_cpu_ptr(&storm_desc);
- __clear_bit(bank, this_cpu_ptr(mce_poll_banks)); + if (!mce_flags.amd_threshold) + __clear_bit(bank, this_cpu_ptr(mce_poll_banks)); storm->banks[bank].history = 0; storm->banks[bank].in_storm_mode = false;