Hi Rafael,
On 8/8/23 15:37, Rafael J. Wysocki wrote:
Hi,
On Tue, Aug 8, 2023 at 3:31 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 8/8/23 13:18, Rafael J. Wysocki wrote:
On Tue, Aug 8, 2023 at 10:36 AM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 8/7/23 17:19, Hans de Goede wrote:
Hi All,
On 8/6/23 17:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
I've spend most of today analysing the situation / this problem :
213031 - MEDION notebook internal keyboard not recognized / not working correctly https://bugzilla.kernel.org/show_bug.cgi?id=213031
This is the bug that started it all, the issue here was overriding a level low DSDT entry:
IRQ (Level, ActiveLow, Exclusive, ) {1}
With an edge high entry from the MADT, note that edge high is the default mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1 in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
At first a fix was attempted to not use the MADT override unless the DSDT entry was edge high. But that caused regressions, so a switch to a DMI based approach was used instead. Noteworthy is that some of the regressions benefitted from a MADT override to high edge for IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages in the dmesg of the machine with the regression.
*** fast forward to today ***
The DMI quirk based approach seems to have worked well for the Ice Lake era problems from approx. 3 years ago. But on AMD Zen based systems the situation seems to be more complex. Not using the MADT override is a problem for quite a few models. But using the MADT override is a problem on quite a few other models ...
Looking at the status quo for v6.4 where MADT overriding by default is not used, 3 bugs have been filed where the override is actually necessary (note dmesg snippets with patched kernel to enable MADT override):
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394
Aya Neo Air Plus - AMD Ryzen 7 6800U
[ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.410670] ACPI: IRQ 1 override to edge, low(!)
217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406
HP Pavilion Aero 13 - AMD Ryzen 7735U
[ 0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.361640] ACPI: IRQ 1 override to edge, low(!)
217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Asus TUF FA617NS - AMD Ryzen 7 7735HS
Noteworthy DSTD keyboard resource:
IRQNoFlags () {1}
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) ACPI: IRQ 1 override to edge, low(!)
So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry for IRQ 1.
Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for which a quirk was added in commit 9946e39fe8d0 to force the override even though it it Zen based breaks this pattern:
[ 0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.341347] ACPI: IRQ 1 override to edge, high(!)
Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems to be a strong indicator that MADT overriding should be used in that case and can be used to at least reduce the amount of DMI quirks.
Another interesting data point is that all the devices for which DMI quirks are present for which MADT overriding should not be used for IRQ 1 all have a DSDT entry with the IRQ configured as level low and exclusive.
I think that the best thing to do might be to go back to the original approach from:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?...
and then limit this to IRQ1. Also maybe inverting the check to:
static bool irq_is_legacy(struct acpi_resource_irq *irq) { return !(irq->triggering == ACPI_LEVEL_SENSITIVE && irq->polarity == ACPI_ACTIVE_LOW && irq->shareable == ACPI_EXCLUSIVE); }
But I need to check if this will work for all the new Zen models for which we got bug reports after the recent dropping of 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
So today I have started with continueing the investigation looking at laptop models where we used to not override because of:
if (boot_cpu_has(X86_FEATURE_ZEN)) return false;
And where removing this and thus using the override has led to a regression.
Looking at the acpidump-s from the following bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=2229165 https://bugzilla.redhat.com/show_bug.cgi?id=2229317 https://bugzilla.kernel.org/show_bug.cgi?id=217726
All of these use the following settings for the kbd in the DSDT:
IRQ (Edge, ActiveLow, Shared, ) {1}
So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
{ tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
So I have 2 plans to move forward with this:
Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
- Revert a9c4a912b7dc
- Limit the acpi_dev_irq_override() check to only ever skip the IRQ override (return false) for GSI 1.
- Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case use the override even on ZEN, fixing:
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406 217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Which are known AMD ZEN based laptops which do need the override for IRQ 1.
This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.
Sounds reasonable to me.
It looks like using the IRQ 1 configuration left by the BIOS as is would be the best choice unless that is not viable for some reason.
Agreed, do you have a suggestion how to do that ? I've been looking at this but I've gotten a bit lost in all the layers of ioapic code.
It looks like we need to add a flag to acpi_register_gsi() for it to register a gsi while keeping the IOAPIC settings as is (or a new function) but it is not clear to me how to implement this.
An alternative method would be to call irq_get_trigger_type() for IRQ 1 and use that for the IRQ trigger info when calling acpi_register_gsi(), but I think we need to have the IRQ registered / added to the IRQ domain first ?
I need to take a deeper look into this, which I guess is going to take some time.
If you figure this out in the meantime, please let me know.
I don't see myself having time to dive deeper into this anytime soon, so if you can take a look (as time permits) that would be great.
Regards,
Hans