Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com --- drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf67..0800a9d77558 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = { { } };
+static const struct dmi_system_id lenovo_laptop[] = { + { + .ident = "LENOVO IdeaPad Flex 5 14ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82R9"), + }, + }, + { + .ident = "LENOVO IdeaPad Flex 5 16ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82RA"), + }, + }, + { } +}; + +static const struct dmi_system_id tongfang_gm_rg[] = { + { + .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"), + }, + }, + { } +}; + +static const struct dmi_system_id maingear_laptop[] = { + { + .ident = "MAINGEAR Vector Pro 2 15", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"), + } + }, + { + .ident = "MAINGEAR Vector Pro 2 17", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"), + }, + }, + { } +}; + static const struct dmi_system_id lg_laptop[] = { { .ident = "LG Electronics 17U70P", @@ -493,6 +539,10 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, + { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, + { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, };
@@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, return entry->override; }
+#ifdef CONFIG_X86 + /* + * IRQ override isn't needed on modern AMD Zen systems and + * this override breaks active low IRQs on AMD Ryzen 6000 and + * newer systems. Skip it. + */ + if (boot_cpu_has(X86_FEATURE_ZEN)) + return false; +#endif + return true; }
On 2023-08-06 17:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
Other reports for various Lenovo laptops: https://lore.kernel.org/all/596b9c4a-fb83-a8ab-3a44-6052d83fa546@augustwiker... https://bugzilla.kernel.org/show_bug.cgi?id=217718 https://bugzilla.kernel.org/show_bug.cgi?id=217731
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
FWIW only reverting a9c4a912b7dc will also cause a regression because it did also fix the keyboard on some devices, see the links in its commit message.
Regards, August Wikerfors
On 06.08.23 17:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Thx for this.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
Not that it matters much, but there is at least one more:
https://lore.kernel.org/lkml/596b9c4a-fb83-a8ab-3a44-6052d83fa546@augustwike...
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
I tend to think this is the right thing to do. But for completeness to ensure everybody is aware of this: I'm pretty sure (Mario will know for sure) this will cause a regression for a few users, because a9c4a912b7dc was meant fixing a regression for them. Not sure how many users this might affect, but I guess it are at least those linked to in a9c4a912b7dc:
https://bugzilla.kernel.org/show_bug.cgi?id=217336 https://bugzilla.kernel.org/show_bug.cgi?id=217406 https://bugzilla.kernel.org/show_bug.cgi?id=217394
I assume that could be avoided by adding quirk entries for their machines. Some of the tickets have the data to do so.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
On 8/6/23 10:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Reverting it is going to help a bunch of machines but cause regressions for others. How do you prioritize which to fix when it comes to a regression?
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf67..0800a9d77558 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = { { } }; +static const struct dmi_system_id lenovo_laptop[] = {
- {
.ident = "LENOVO IdeaPad Flex 5 14ALC7",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
DMI_MATCH(DMI_PRODUCT_NAME, "82R9"),
},
- },
- {
.ident = "LENOVO IdeaPad Flex 5 16ALC7",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
DMI_MATCH(DMI_PRODUCT_NAME, "82RA"),
},
- },
- { }
+};
+static const struct dmi_system_id tongfang_gm_rg[] = {
- {
.ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD",
.matches = {
DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"),
},
- },
- { }
+};
+static const struct dmi_system_id maingear_laptop[] = {
- {
.ident = "MAINGEAR Vector Pro 2 15",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"),
}
- },
- {
.ident = "MAINGEAR Vector Pro 2 17",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"),
},
- },
- { }
+};
- static const struct dmi_system_id lg_laptop[] = { { .ident = "LG Electronics 17U70P",
@@ -493,6 +539,10 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
- { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
- { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
- { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
- { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, };
@@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, return entry->override; } +#ifdef CONFIG_X86
- /*
* IRQ override isn't needed on modern AMD Zen systems and
* this override breaks active low IRQs on AMD Ryzen 6000 and
* newer systems. Skip it.
*/
- if (boot_cpu_has(X86_FEATURE_ZEN))
return false;
+#endif
- return true; }
Hi Mario,
On 8/6/23 19:13, Mario Limonciello wrote:
On 8/6/23 10:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Reverting it is going to help a bunch of machines but cause regressions for others. How do you prioritize which to fix when it comes to a regression?
I understand where you are coming from. But we are dealing with a lot of bug reports from users of recent kernel versions (so not on LTS distros) where things used to work fine. Which is pretty much the definition of a regression.
OTOH the do not override IRQ on Zen behavior has been with us for a while now, so things which are broken by it have been broken for a while and have only started working since 6.4.7.
So to me it is clear that we first need to revert to the old state, so that users for which everything was working fine get back to a working system.
I realize this will unfix some very recently fixed systems, but notice the very recently bit here most users are no in 6.4.7 or .8 yet, so most users will not even have gotten the fixing effect OTOH many users are seeing breakage now.
So IMHO we should first get back to the known bad state, instead of introducing a new unknown bad state like a9c4a912b7dc did.
If there are known systems which will need an override with a9c4a912b7dc reverted, then we should probably add those to the override table right away.
And I think what we also need is a way to specify an IRQ trigger-type override for IRQ 1 on the kernel commandline, so that we can easily ask users to try a different trigger-type without them needing to build a kernel with an updated quirk table.
As Linus mentioned already, we really need to find a proper fix for this, but for now my main priority is to stop the influx of new Fedora bugzilla and bugzilla.kernel.org bugs caused by a9c4a912b7dc, counting the extra bug links added in this thread it seems we are up to 6 known bug reports for this and that likely is just the tip of the iceberg.
Regards,
Hans
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf67..0800a9d77558 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = { { } }; +static const struct dmi_system_id lenovo_laptop[] = { + { + .ident = "LENOVO IdeaPad Flex 5 14ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82R9"), + }, + }, + { + .ident = "LENOVO IdeaPad Flex 5 16ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82RA"), + }, + }, + { } +};
+static const struct dmi_system_id tongfang_gm_rg[] = { + { + .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"), + }, + }, + { } +};
+static const struct dmi_system_id maingear_laptop[] = { + { + .ident = "MAINGEAR Vector Pro 2 15", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"), + } + }, + { + .ident = "MAINGEAR Vector Pro 2 17", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"), + }, + }, + { } +};
static const struct dmi_system_id lg_laptop[] = { { .ident = "LG Electronics 17U70P", @@ -493,6 +539,10 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, + { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, + { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, }; @@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, return entry->override; } +#ifdef CONFIG_X86 + /* + * IRQ override isn't needed on modern AMD Zen systems and + * this override breaks active low IRQs on AMD Ryzen 6000 and + * newer systems. Skip it. + */ + if (boot_cpu_has(X86_FEATURE_ZEN)) + return false; +#endif
return true; }
On 8/6/23 13:20, Hans de Goede wrote:
Hi Mario,
On 8/6/23 19:13, Mario Limonciello wrote:
On 8/6/23 10:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Reverting it is going to help a bunch of machines but cause regressions for others. How do you prioritize which to fix when it comes to a regression?
I understand where you are coming from. But we are dealing with a lot of bug reports from users of recent kernel versions (so not on LTS distros) where things used to work fine. Which is pretty much the definition of a regression.
OTOH the do not override IRQ on Zen behavior has been with us for a while now, so things which are broken by it have been broken for a while and have only started working since 6.4.7.
So to me it is clear that we first need to revert to the old state, so that users for which everything was working fine get back to a working system.
I realize this will unfix some very recently fixed systems, but notice the very recently bit here most users are no in 6.4.7 or .8 yet, so most users will not even have gotten the fixing effect OTOH many users are seeing breakage now.
So IMHO we should first get back to the known bad state, instead of introducing a new unknown bad state like a9c4a912b7dc did.
If there are known systems which will need an override with a9c4a912b7dc reverted, then we should probably add those to the override table right away.
And I think what we also need is a way to specify an IRQ trigger-type override for IRQ 1 on the kernel commandline, so that we can easily ask users to try a different trigger-type without them needing to build a kernel with an updated quirk table.
As Linus mentioned already, we really need to find a proper fix for this, but for now my main priority is to stop the influx of new Fedora bugzilla and bugzilla.kernel.org bugs caused by a9c4a912b7dc, counting the extra bug links added in this thread it seems we are up to 6 known bug reports for this and that likely is just the tip of the iceberg.
Regards,
Hans
We haven't even given a try to fixing it; I think the revert is still hasty.
I don't have a machine that can reproduce this failure, but I did confirm that my keyboard still works with this:
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf678..b74d7d8cc8630 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -19,7 +19,7 @@ #include <linux/dmi.h>
#ifdef CONFIG_X86 -#define valid_IRQ(i) (((i) != 0) && ((i) != 2)) +#define valid_IRQ(i) ((i) > 2) static inline bool acpi_iospace_resource_valid(struct resource *res) { /* On X86 IO space is limited to the [0 - 64K] IO port range */
Can we perhaps see if that works instead for some affected people?
If it does we should be able to throw away the entire quirks table.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf67..0800a9d77558 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = { { } }; +static const struct dmi_system_id lenovo_laptop[] = { + { + .ident = "LENOVO IdeaPad Flex 5 14ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82R9"), + }, + }, + { + .ident = "LENOVO IdeaPad Flex 5 16ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82RA"), + }, + }, + { } +};
+static const struct dmi_system_id tongfang_gm_rg[] = { + { + .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"), + }, + }, + { } +};
+static const struct dmi_system_id maingear_laptop[] = { + { + .ident = "MAINGEAR Vector Pro 2 15", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"), + } + }, + { + .ident = "MAINGEAR Vector Pro 2 17", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"), + }, + }, + { } +};
static const struct dmi_system_id lg_laptop[] = { { .ident = "LG Electronics 17U70P", @@ -493,6 +539,10 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, + { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, + { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, }; @@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, return entry->override; } +#ifdef CONFIG_X86 + /* + * IRQ override isn't needed on modern AMD Zen systems and + * this override breaks active low IRQs on AMD Ryzen 6000 and + * newer systems. Skip it. + */ + if (boot_cpu_has(X86_FEATURE_ZEN)) + return false; +#endif
return true; }
On 07.08.23 06:38, Mario Limonciello wrote:
On 8/6/23 13:20, Hans de Goede wrote:
On 8/6/23 19:13, Mario Limonciello wrote:
On 8/6/23 10:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Reverting it is going to help a bunch of machines but cause regressions for others. How do you prioritize which to fix when it comes to a regression?
It's up to Linus in the end, but to echo what Hans already said: I guess Linus stance in this case would be along the lines of "let's get back to a known state, even if we known that state has problems as well". And that state is the mainline release 6.4 . Sure, Greg picked this up for stable, but apart from that things boil down to "we tried something in 6.5-rc1, it did work well, so let's revert this until we work our a proper solution that can be applied in a later cycle, no harm done".
We haven't even given a try to fixing it; I think the revert is still hasty.
Some urgency is required, as (a) the patch made it into a stable release (b) get closer to the end of the merge window and want to avoid last minute changes.
[...] Can we perhaps see if that works instead for some affected people? [...]
I'd say: it would be best to resolve this now with a revert, then we might even reach the next stable-rc release (Greg unusually releases those on Mon/Tue/Wed) and thus the next stable release. But let's please definitely resolve this this week in mainline before -rc6.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
Hi,
On 8/7/23 06:38, Mario Limonciello wrote:
On 8/6/23 13:20, Hans de Goede wrote:
Hi Mario,
On 8/6/23 19:13, Mario Limonciello wrote:
On 8/6/23 10:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Reverting it is going to help a bunch of machines but cause regressions for others. How do you prioritize which to fix when it comes to a regression?
I understand where you are coming from. But we are dealing with a lot of bug reports from users of recent kernel versions (so not on LTS distros) where things used to work fine. Which is pretty much the definition of a regression.
OTOH the do not override IRQ on Zen behavior has been with us for a while now, so things which are broken by it have been broken for a while and have only started working since 6.4.7.
So to me it is clear that we first need to revert to the old state, so that users for which everything was working fine get back to a working system.
I realize this will unfix some very recently fixed systems, but notice the very recently bit here most users are no in 6.4.7 or .8 yet, so most users will not even have gotten the fixing effect OTOH many users are seeing breakage now.
So IMHO we should first get back to the known bad state, instead of introducing a new unknown bad state like a9c4a912b7dc did.
If there are known systems which will need an override with a9c4a912b7dc reverted, then we should probably add those to the override table right away.
And I think what we also need is a way to specify an IRQ trigger-type override for IRQ 1 on the kernel commandline, so that we can easily ask users to try a different trigger-type without them needing to build a kernel with an updated quirk table.
As Linus mentioned already, we really need to find a proper fix for this, but for now my main priority is to stop the influx of new Fedora bugzilla and bugzilla.kernel.org bugs caused by a9c4a912b7dc, counting the extra bug links added in this thread it seems we are up to 6 known bug reports for this and that likely is just the tip of the iceberg.
Regards,
Hans
We haven't even given a try to fixing it; I think the revert is still hasty.
I don't have a machine that can reproduce this failure, but I did confirm that my keyboard still works with this:
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf678..b74d7d8cc8630 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -19,7 +19,7 @@ #include <linux/dmi.h>
#ifdef CONFIG_X86 -#define valid_IRQ(i) (((i) != 0) && ((i) != 2)) +#define valid_IRQ(i) ((i) > 2) static inline bool acpi_iospace_resource_valid(struct resource *res) { /* On X86 IO space is limited to the [0 - 64K] IO port range */
Can we perhaps see if that works instead for some affected people?
That does not just skip the override stuff, it will make the kernel return irqresource_disabled(res, 1) for the kbd IRQ:
static inline void irqresource_disabled(struct resource *res, u32 irq) { res->start = irq; res->end = irq; res->flags |= IORESOURCE_IRQ | IORESOURCE_DISABLED | IORESOURCE_UNSET; }
Now not much code seems to actually check the IORESOURCE_DISABLED | IORESOURCE_UNSET flags, so this might still work but it does not seem like the right thing to do.
Regards,
Hans
If it does we should be able to throw away the entire quirks table.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf67..0800a9d77558 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = { { } }; +static const struct dmi_system_id lenovo_laptop[] = { + { + .ident = "LENOVO IdeaPad Flex 5 14ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82R9"), + }, + }, + { + .ident = "LENOVO IdeaPad Flex 5 16ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82RA"), + }, + }, + { } +};
+static const struct dmi_system_id tongfang_gm_rg[] = { + { + .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD", + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"), + }, + }, + { } +};
+static const struct dmi_system_id maingear_laptop[] = { + { + .ident = "MAINGEAR Vector Pro 2 15", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"), + } + }, + { + .ident = "MAINGEAR Vector Pro 2 17", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"), + DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"), + }, + }, + { } +};
static const struct dmi_system_id lg_laptop[] = { { .ident = "LG Electronics 17U70P", @@ -493,6 +539,10 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, + { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, + { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, }; @@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, return entry->override; } +#ifdef CONFIG_X86 + /* + * IRQ override isn't needed on modern AMD Zen systems and + * this override breaks active low IRQs on AMD Ryzen 6000 and + * newer systems. Skip it. + */ + if (boot_cpu_has(X86_FEATURE_ZEN)) + return false; +#endif
return true; }
We haven't even given a try to fixing it; I think the revert is still hasty.
I don't have a machine that can reproduce this failure, but I did confirm that my keyboard still works with this:
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf678..b74d7d8cc8630 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -19,7 +19,7 @@ #include <linux/dmi.h>
#ifdef CONFIG_X86 -#define valid_IRQ(i) (((i) != 0) && ((i) != 2)) +#define valid_IRQ(i) ((i) > 2) static inline bool acpi_iospace_resource_valid(struct resource *res) { /* On X86 IO space is limited to the [0 - 64K] IO port range */
Can we perhaps see if that works instead for some affected people?
That does not just skip the override stuff, it will make the kernel return irqresource_disabled(res, 1) for the kbd IRQ:
static inline void irqresource_disabled(struct resource *res, u32 irq) { res->start = irq; res->end = irq; res->flags |= IORESOURCE_IRQ | IORESOURCE_DISABLED | IORESOURCE_UNSET; }
Now not much code seems to actually check the IORESOURCE_DISABLED | IORESOURCE_UNSET flags, so this might still work but it does not seem like the right thing to do.
Right; so it makes the resource get skipped when PNP is enumerated.
To me it seems that it should match the theme of 'don't reprogram the polarity and triggering for the IRQ'. When the IRQ is set up by i8042 then it will try to go and get a resource and wouldn't find one. So when the IRQ is set up it should match the values already programmed.
I do wonder if both IRQ 1 and IRQ 12 both would need to be skipped for this to work (if people are affected by both i8042 devices).
Hi,
On 8/7/23 13:39, Mario Limonciello wrote:
We haven't even given a try to fixing it; I think the revert is still hasty.
I don't have a machine that can reproduce this failure, but I did confirm that my keyboard still works with this:
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf678..b74d7d8cc8630 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -19,7 +19,7 @@ #include <linux/dmi.h>
#ifdef CONFIG_X86 -#define valid_IRQ(i) (((i) != 0) && ((i) != 2)) +#define valid_IRQ(i) ((i) > 2) static inline bool acpi_iospace_resource_valid(struct resource *res) { /* On X86 IO space is limited to the [0 - 64K] IO port range */
Can we perhaps see if that works instead for some affected people?
That does not just skip the override stuff, it will make the kernel return irqresource_disabled(res, 1) for the kbd IRQ:
static inline void irqresource_disabled(struct resource *res, u32 irq) { res->start = irq; res->end = irq; res->flags |= IORESOURCE_IRQ | IORESOURCE_DISABLED | IORESOURCE_UNSET; }
Now not much code seems to actually check the IORESOURCE_DISABLED | IORESOURCE_UNSET flags, so this might still work but it does not seem like the right thing to do.
Right; so it makes the resource get skipped when PNP is enumerated.
Which AFAICT means that PNP enumerated i8042-s will not have any IRQ assigned at all and this will not work.
Regards,
Hans
On 8/7/2023 12:06, Hans de Goede wrote:
Hi,
On 8/7/23 13:39, Mario Limonciello wrote:
We haven't even given a try to fixing it; I think the revert is still hasty.
I don't have a machine that can reproduce this failure, but I did confirm that my keyboard still works with this:
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf678..b74d7d8cc8630 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -19,7 +19,7 @@ #include <linux/dmi.h>
#ifdef CONFIG_X86 -#define valid_IRQ(i) (((i) != 0) && ((i) != 2)) +#define valid_IRQ(i) ((i) > 2) static inline bool acpi_iospace_resource_valid(struct resource *res) { /* On X86 IO space is limited to the [0 - 64K] IO port range */
Can we perhaps see if that works instead for some affected people?
That does not just skip the override stuff, it will make the kernel return irqresource_disabled(res, 1) for the kbd IRQ:
static inline void irqresource_disabled(struct resource *res, u32 irq) { res->start = irq; res->end = irq; res->flags |= IORESOURCE_IRQ | IORESOURCE_DISABLED | IORESOURCE_UNSET; }
Now not much code seems to actually check the IORESOURCE_DISABLED | IORESOURCE_UNSET flags, so this might still work but it does not seem like the right thing to do.
Right; so it makes the resource get skipped when PNP is enumerated.
Which AFAICT means that PNP enumerated i8042-s will not have any IRQ assigned at all and this will not work.
Regards,
Hans
Yeah; I did some experimentation with some other machines and confirmed this doesn't work reliably on everything.
Your newly proposed direction sounds good to me though. If you can post a new patch for that I'll test on the stuff I have.
Hi All,
On 8/6/23 17:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
I've spend most of today analysing the situation / this problem :
213031 - MEDION notebook internal keyboard not recognized / not working correctly https://bugzilla.kernel.org/show_bug.cgi?id=213031
This is the bug that started it all, the issue here was overriding a level low DSDT entry:
IRQ (Level, ActiveLow, Exclusive, ) {1}
With an edge high entry from the MADT, note that edge high is the default mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1 in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
At first a fix was attempted to not use the MADT override unless the DSDT entry was edge high. But that caused regressions, so a switch to a DMI based approach was used instead. Noteworthy is that some of the regressions benefitted from a MADT override to high edge for IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages in the dmesg of the machine with the regression.
*** fast forward to today ***
The DMI quirk based approach seems to have worked well for the Ice Lake era problems from approx. 3 years ago. But on AMD Zen based systems the situation seems to be more complex. Not using the MADT override is a problem for quite a few models. But using the MADT override is a problem on quite a few other models ...
Looking at the status quo for v6.4 where MADT overriding by default is not used, 3 bugs have been filed where the override is actually necessary (note dmesg snippets with patched kernel to enable MADT override):
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394
Aya Neo Air Plus - AMD Ryzen 7 6800U
[ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.410670] ACPI: IRQ 1 override to edge, low(!)
217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406
HP Pavilion Aero 13 - AMD Ryzen 7735U
[ 0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.361640] ACPI: IRQ 1 override to edge, low(!)
217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Asus TUF FA617NS - AMD Ryzen 7 7735HS
Noteworthy DSTD keyboard resource:
IRQNoFlags () {1}
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) ACPI: IRQ 1 override to edge, low(!)
So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry for IRQ 1.
Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for which a quirk was added in commit 9946e39fe8d0 to force the override even though it it Zen based breaks this pattern:
[ 0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.341347] ACPI: IRQ 1 override to edge, high(!)
Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems to be a strong indicator that MADT overriding should be used in that case and can be used to at least reduce the amount of DMI quirks.
Another interesting data point is that all the devices for which DMI quirks are present for which MADT overriding should not be used for IRQ 1 all have a DSDT entry with the IRQ configured as level low and exclusive.
I think that the best thing to do might be to go back to the original approach from:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?...
and then limit this to IRQ1. Also maybe inverting the check to:
static bool irq_is_legacy(struct acpi_resource_irq *irq) { return !(irq->triggering == ACPI_LEVEL_SENSITIVE && irq->polarity == ACPI_ACTIVE_LOW && irq->shareable == ACPI_EXCLUSIVE); }
But I need to check if this will work for all the new Zen models for which we got bug reports after the recent dropping of 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
Regards,
Hans
drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 1dd8d5aebf67..0800a9d77558 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = { { } }; +static const struct dmi_system_id lenovo_laptop[] = {
- {
.ident = "LENOVO IdeaPad Flex 5 14ALC7",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
DMI_MATCH(DMI_PRODUCT_NAME, "82R9"),
},
- },
- {
.ident = "LENOVO IdeaPad Flex 5 16ALC7",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
DMI_MATCH(DMI_PRODUCT_NAME, "82RA"),
},
- },
- { }
+};
+static const struct dmi_system_id tongfang_gm_rg[] = {
- {
.ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD",
.matches = {
DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"),
},
- },
- { }
+};
+static const struct dmi_system_id maingear_laptop[] = {
- {
.ident = "MAINGEAR Vector Pro 2 15",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"),
}
- },
- {
.ident = "MAINGEAR Vector Pro 2 17",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"),
},
- },
- { }
+};
static const struct dmi_system_id lg_laptop[] = { { .ident = "LG Electronics 17U70P", @@ -493,6 +539,10 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
- { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
- { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
- { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
- { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
}; @@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, return entry->override; } +#ifdef CONFIG_X86
- /*
* IRQ override isn't needed on modern AMD Zen systems and
* this override breaks active low IRQs on AMD Ryzen 6000 and
* newer systems. Skip it.
*/
- if (boot_cpu_has(X86_FEATURE_ZEN))
return false;
+#endif
- return true;
}
Hi,
On 8/7/23 17:19, Hans de Goede wrote:
Hi All,
On 8/6/23 17:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
I've spend most of today analysing the situation / this problem :
213031 - MEDION notebook internal keyboard not recognized / not working correctly https://bugzilla.kernel.org/show_bug.cgi?id=213031
This is the bug that started it all, the issue here was overriding a level low DSDT entry:
IRQ (Level, ActiveLow, Exclusive, ) {1}
With an edge high entry from the MADT, note that edge high is the default mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1 in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
At first a fix was attempted to not use the MADT override unless the DSDT entry was edge high. But that caused regressions, so a switch to a DMI based approach was used instead. Noteworthy is that some of the regressions benefitted from a MADT override to high edge for IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages in the dmesg of the machine with the regression.
*** fast forward to today ***
The DMI quirk based approach seems to have worked well for the Ice Lake era problems from approx. 3 years ago. But on AMD Zen based systems the situation seems to be more complex. Not using the MADT override is a problem for quite a few models. But using the MADT override is a problem on quite a few other models ...
Looking at the status quo for v6.4 where MADT overriding by default is not used, 3 bugs have been filed where the override is actually necessary (note dmesg snippets with patched kernel to enable MADT override):
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394
Aya Neo Air Plus - AMD Ryzen 7 6800U
[ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.410670] ACPI: IRQ 1 override to edge, low(!)
217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406
HP Pavilion Aero 13 - AMD Ryzen 7735U
[ 0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.361640] ACPI: IRQ 1 override to edge, low(!)
217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Asus TUF FA617NS - AMD Ryzen 7 7735HS
Noteworthy DSTD keyboard resource:
IRQNoFlags () {1}
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) ACPI: IRQ 1 override to edge, low(!)
So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry for IRQ 1.
Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for which a quirk was added in commit 9946e39fe8d0 to force the override even though it it Zen based breaks this pattern:
[ 0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.341347] ACPI: IRQ 1 override to edge, high(!)
Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems to be a strong indicator that MADT overriding should be used in that case and can be used to at least reduce the amount of DMI quirks.
Another interesting data point is that all the devices for which DMI quirks are present for which MADT overriding should not be used for IRQ 1 all have a DSDT entry with the IRQ configured as level low and exclusive.
I think that the best thing to do might be to go back to the original approach from:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?...
and then limit this to IRQ1. Also maybe inverting the check to:
static bool irq_is_legacy(struct acpi_resource_irq *irq) { return !(irq->triggering == ACPI_LEVEL_SENSITIVE && irq->polarity == ACPI_ACTIVE_LOW && irq->shareable == ACPI_EXCLUSIVE); }
But I need to check if this will work for all the new Zen models for which we got bug reports after the recent dropping of 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
So today I have started with continueing the investigation looking at laptop models where we used to not override because of:
if (boot_cpu_has(X86_FEATURE_ZEN)) return false;
And where removing this and thus using the override has led to a regression.
Looking at the acpidump-s from the following bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=2229165 https://bugzilla.redhat.com/show_bug.cgi?id=2229317 https://bugzilla.kernel.org/show_bug.cgi?id=217726
All of these use the following settings for the kbd in the DSDT:
IRQ (Edge, ActiveLow, Shared, ) {1}
So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
{ tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
So I have 2 plans to move forward with this:
Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
1. Revert a9c4a912b7dc 2. Limit the acpi_dev_irq_override() check to only ever skip the IRQ override (return false) for GSI 1. 3. Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case use the override even on ZEN, fixing:
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406 217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Which are known AMD ZEN based laptops which do need the override for IRQ 1.
This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.
Regards,
Hans
On Tue, Aug 8, 2023 at 10:36 AM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 8/7/23 17:19, Hans de Goede wrote:
Hi All,
On 8/6/23 17:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
I've spend most of today analysing the situation / this problem :
213031 - MEDION notebook internal keyboard not recognized / not working correctly https://bugzilla.kernel.org/show_bug.cgi?id=213031
This is the bug that started it all, the issue here was overriding a level low DSDT entry:
IRQ (Level, ActiveLow, Exclusive, ) {1}
With an edge high entry from the MADT, note that edge high is the default mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1 in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
At first a fix was attempted to not use the MADT override unless the DSDT entry was edge high. But that caused regressions, so a switch to a DMI based approach was used instead. Noteworthy is that some of the regressions benefitted from a MADT override to high edge for IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages in the dmesg of the machine with the regression.
*** fast forward to today ***
The DMI quirk based approach seems to have worked well for the Ice Lake era problems from approx. 3 years ago. But on AMD Zen based systems the situation seems to be more complex. Not using the MADT override is a problem for quite a few models. But using the MADT override is a problem on quite a few other models ...
Looking at the status quo for v6.4 where MADT overriding by default is not used, 3 bugs have been filed where the override is actually necessary (note dmesg snippets with patched kernel to enable MADT override):
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394
Aya Neo Air Plus - AMD Ryzen 7 6800U
[ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.410670] ACPI: IRQ 1 override to edge, low(!)
217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406
HP Pavilion Aero 13 - AMD Ryzen 7735U
[ 0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.361640] ACPI: IRQ 1 override to edge, low(!)
217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Asus TUF FA617NS - AMD Ryzen 7 7735HS
Noteworthy DSTD keyboard resource:
IRQNoFlags () {1}
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) ACPI: IRQ 1 override to edge, low(!)
So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry for IRQ 1.
Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for which a quirk was added in commit 9946e39fe8d0 to force the override even though it it Zen based breaks this pattern:
[ 0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.341347] ACPI: IRQ 1 override to edge, high(!)
Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems to be a strong indicator that MADT overriding should be used in that case and can be used to at least reduce the amount of DMI quirks.
Another interesting data point is that all the devices for which DMI quirks are present for which MADT overriding should not be used for IRQ 1 all have a DSDT entry with the IRQ configured as level low and exclusive.
I think that the best thing to do might be to go back to the original approach from:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?...
and then limit this to IRQ1. Also maybe inverting the check to:
static bool irq_is_legacy(struct acpi_resource_irq *irq) { return !(irq->triggering == ACPI_LEVEL_SENSITIVE && irq->polarity == ACPI_ACTIVE_LOW && irq->shareable == ACPI_EXCLUSIVE); }
But I need to check if this will work for all the new Zen models for which we got bug reports after the recent dropping of 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
So today I have started with continueing the investigation looking at laptop models where we used to not override because of:
if (boot_cpu_has(X86_FEATURE_ZEN)) return false;
And where removing this and thus using the override has led to a regression.
Looking at the acpidump-s from the following bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=2229165 https://bugzilla.redhat.com/show_bug.cgi?id=2229317 https://bugzilla.kernel.org/show_bug.cgi?id=217726
All of these use the following settings for the kbd in the DSDT:
IRQ (Edge, ActiveLow, Shared, ) {1}
So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
{ tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
So I have 2 plans to move forward with this:
Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
- Revert a9c4a912b7dc
- Limit the acpi_dev_irq_override() check to only ever skip the IRQ override (return false) for GSI 1.
- Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case use the override even on ZEN, fixing:
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406 217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Which are known AMD ZEN based laptops which do need the override for IRQ 1.
This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.
Sounds reasonable to me.
It looks like using the IRQ 1 configuration left by the BIOS as is would be the best choice unless that is not viable for some reason.
Hi,
On 8/8/23 13:18, Rafael J. Wysocki wrote:
On Tue, Aug 8, 2023 at 10:36 AM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 8/7/23 17:19, Hans de Goede wrote:
Hi All,
On 8/6/23 17:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
I've spend most of today analysing the situation / this problem :
213031 - MEDION notebook internal keyboard not recognized / not working correctly https://bugzilla.kernel.org/show_bug.cgi?id=213031
This is the bug that started it all, the issue here was overriding a level low DSDT entry:
IRQ (Level, ActiveLow, Exclusive, ) {1}
With an edge high entry from the MADT, note that edge high is the default mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1 in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
At first a fix was attempted to not use the MADT override unless the DSDT entry was edge high. But that caused regressions, so a switch to a DMI based approach was used instead. Noteworthy is that some of the regressions benefitted from a MADT override to high edge for IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages in the dmesg of the machine with the regression.
*** fast forward to today ***
The DMI quirk based approach seems to have worked well for the Ice Lake era problems from approx. 3 years ago. But on AMD Zen based systems the situation seems to be more complex. Not using the MADT override is a problem for quite a few models. But using the MADT override is a problem on quite a few other models ...
Looking at the status quo for v6.4 where MADT overriding by default is not used, 3 bugs have been filed where the override is actually necessary (note dmesg snippets with patched kernel to enable MADT override):
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394
Aya Neo Air Plus - AMD Ryzen 7 6800U
[ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.410670] ACPI: IRQ 1 override to edge, low(!)
217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406
HP Pavilion Aero 13 - AMD Ryzen 7735U
[ 0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.361640] ACPI: IRQ 1 override to edge, low(!)
217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Asus TUF FA617NS - AMD Ryzen 7 7735HS
Noteworthy DSTD keyboard resource:
IRQNoFlags () {1}
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) ACPI: IRQ 1 override to edge, low(!)
So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry for IRQ 1.
Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for which a quirk was added in commit 9946e39fe8d0 to force the override even though it it Zen based breaks this pattern:
[ 0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.341347] ACPI: IRQ 1 override to edge, high(!)
Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems to be a strong indicator that MADT overriding should be used in that case and can be used to at least reduce the amount of DMI quirks.
Another interesting data point is that all the devices for which DMI quirks are present for which MADT overriding should not be used for IRQ 1 all have a DSDT entry with the IRQ configured as level low and exclusive.
I think that the best thing to do might be to go back to the original approach from:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?...
and then limit this to IRQ1. Also maybe inverting the check to:
static bool irq_is_legacy(struct acpi_resource_irq *irq) { return !(irq->triggering == ACPI_LEVEL_SENSITIVE && irq->polarity == ACPI_ACTIVE_LOW && irq->shareable == ACPI_EXCLUSIVE); }
But I need to check if this will work for all the new Zen models for which we got bug reports after the recent dropping of 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
So today I have started with continueing the investigation looking at laptop models where we used to not override because of:
if (boot_cpu_has(X86_FEATURE_ZEN)) return false;
And where removing this and thus using the override has led to a regression.
Looking at the acpidump-s from the following bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=2229165 https://bugzilla.redhat.com/show_bug.cgi?id=2229317 https://bugzilla.kernel.org/show_bug.cgi?id=217726
All of these use the following settings for the kbd in the DSDT:
IRQ (Edge, ActiveLow, Shared, ) {1}
So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
{ tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
So I have 2 plans to move forward with this:
Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
- Revert a9c4a912b7dc
- Limit the acpi_dev_irq_override() check to only ever skip the IRQ override (return false) for GSI 1.
- Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case use the override even on ZEN, fixing:
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406 217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Which are known AMD ZEN based laptops which do need the override for IRQ 1.
This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.
Sounds reasonable to me.
It looks like using the IRQ 1 configuration left by the BIOS as is would be the best choice unless that is not viable for some reason.
Agreed, do you have a suggestion how to do that ? I've been looking at this but I've gotten a bit lost in all the layers of ioapic code.
It looks like we need to add a flag to acpi_register_gsi() for it to register a gsi while keeping the IOAPIC settings as is (or a new function) but it is not clear to me how to implement this.
An alternative method would be to call irq_get_trigger_type() for IRQ 1 and use that for the IRQ trigger info when calling acpi_register_gsi(), but I think we need to have the IRQ registered / added to the IRQ domain first ?
Regards,
Hans
Hi,
On Tue, Aug 8, 2023 at 3:31 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 8/8/23 13:18, Rafael J. Wysocki wrote:
On Tue, Aug 8, 2023 at 10:36 AM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 8/7/23 17:19, Hans de Goede wrote:
Hi All,
On 8/6/23 17:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
I've spend most of today analysing the situation / this problem :
213031 - MEDION notebook internal keyboard not recognized / not working correctly https://bugzilla.kernel.org/show_bug.cgi?id=213031
This is the bug that started it all, the issue here was overriding a level low DSDT entry:
IRQ (Level, ActiveLow, Exclusive, ) {1}
With an edge high entry from the MADT, note that edge high is the default mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1 in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
At first a fix was attempted to not use the MADT override unless the DSDT entry was edge high. But that caused regressions, so a switch to a DMI based approach was used instead. Noteworthy is that some of the regressions benefitted from a MADT override to high edge for IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages in the dmesg of the machine with the regression.
*** fast forward to today ***
The DMI quirk based approach seems to have worked well for the Ice Lake era problems from approx. 3 years ago. But on AMD Zen based systems the situation seems to be more complex. Not using the MADT override is a problem for quite a few models. But using the MADT override is a problem on quite a few other models ...
Looking at the status quo for v6.4 where MADT overriding by default is not used, 3 bugs have been filed where the override is actually necessary (note dmesg snippets with patched kernel to enable MADT override):
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394
Aya Neo Air Plus - AMD Ryzen 7 6800U
[ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.410670] ACPI: IRQ 1 override to edge, low(!)
217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406
HP Pavilion Aero 13 - AMD Ryzen 7735U
[ 0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.361640] ACPI: IRQ 1 override to edge, low(!)
217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Asus TUF FA617NS - AMD Ryzen 7 7735HS
Noteworthy DSTD keyboard resource:
IRQNoFlags () {1}
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) ACPI: IRQ 1 override to edge, low(!)
So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry for IRQ 1.
Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for which a quirk was added in commit 9946e39fe8d0 to force the override even though it it Zen based breaks this pattern:
[ 0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.341347] ACPI: IRQ 1 override to edge, high(!)
Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems to be a strong indicator that MADT overriding should be used in that case and can be used to at least reduce the amount of DMI quirks.
Another interesting data point is that all the devices for which DMI quirks are present for which MADT overriding should not be used for IRQ 1 all have a DSDT entry with the IRQ configured as level low and exclusive.
I think that the best thing to do might be to go back to the original approach from:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?...
and then limit this to IRQ1. Also maybe inverting the check to:
static bool irq_is_legacy(struct acpi_resource_irq *irq) { return !(irq->triggering == ACPI_LEVEL_SENSITIVE && irq->polarity == ACPI_ACTIVE_LOW && irq->shareable == ACPI_EXCLUSIVE); }
But I need to check if this will work for all the new Zen models for which we got bug reports after the recent dropping of 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
So today I have started with continueing the investigation looking at laptop models where we used to not override because of:
if (boot_cpu_has(X86_FEATURE_ZEN)) return false;
And where removing this and thus using the override has led to a regression.
Looking at the acpidump-s from the following bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=2229165 https://bugzilla.redhat.com/show_bug.cgi?id=2229317 https://bugzilla.kernel.org/show_bug.cgi?id=217726
All of these use the following settings for the kbd in the DSDT:
IRQ (Edge, ActiveLow, Shared, ) {1}
So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
{ tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
So I have 2 plans to move forward with this:
Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
- Revert a9c4a912b7dc
- Limit the acpi_dev_irq_override() check to only ever skip the IRQ override (return false) for GSI 1.
- Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case use the override even on ZEN, fixing:
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406 217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Which are known AMD ZEN based laptops which do need the override for IRQ 1.
This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.
Sounds reasonable to me.
It looks like using the IRQ 1 configuration left by the BIOS as is would be the best choice unless that is not viable for some reason.
Agreed, do you have a suggestion how to do that ? I've been looking at this but I've gotten a bit lost in all the layers of ioapic code.
It looks like we need to add a flag to acpi_register_gsi() for it to register a gsi while keeping the IOAPIC settings as is (or a new function) but it is not clear to me how to implement this.
An alternative method would be to call irq_get_trigger_type() for IRQ 1 and use that for the IRQ trigger info when calling acpi_register_gsi(), but I think we need to have the IRQ registered / added to the IRQ domain first ?
I need to take a deeper look into this, which I guess is going to take some time.
If you figure this out in the meantime, please let me know.
Hi Rafael,
On 8/8/23 15:37, Rafael J. Wysocki wrote:
Hi,
On Tue, Aug 8, 2023 at 3:31 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 8/8/23 13:18, Rafael J. Wysocki wrote:
On Tue, Aug 8, 2023 at 10:36 AM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 8/7/23 17:19, Hans de Goede wrote:
Hi All,
On 8/6/23 17:14, Hans de Goede wrote:
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") is causing keyboard problems for quite a log of AMD based laptop users, leading to many bug reports.
Revert this change for now, until we can come up with a better fix for the PS/2 IRQ trigger-type/polarity problems on some x86 laptops.
Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726 Cc: Mario Limonciello mario.limonciello@amd.com Cc: Linux regressions mailing list regressions@lists.linux.dev Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com
I've spend most of today analysing the situation / this problem :
213031 - MEDION notebook internal keyboard not recognized / not working correctly https://bugzilla.kernel.org/show_bug.cgi?id=213031
This is the bug that started it all, the issue here was overriding a level low DSDT entry:
IRQ (Level, ActiveLow, Exclusive, ) {1}
With an edge high entry from the MADT, note that edge high is the default mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1 in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
At first a fix was attempted to not use the MADT override unless the DSDT entry was edge high. But that caused regressions, so a switch to a DMI based approach was used instead. Noteworthy is that some of the regressions benefitted from a MADT override to high edge for IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages in the dmesg of the machine with the regression.
*** fast forward to today ***
The DMI quirk based approach seems to have worked well for the Ice Lake era problems from approx. 3 years ago. But on AMD Zen based systems the situation seems to be more complex. Not using the MADT override is a problem for quite a few models. But using the MADT override is a problem on quite a few other models ...
Looking at the status quo for v6.4 where MADT overriding by default is not used, 3 bugs have been filed where the override is actually necessary (note dmesg snippets with patched kernel to enable MADT override):
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394
Aya Neo Air Plus - AMD Ryzen 7 6800U
[ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.410670] ACPI: IRQ 1 override to edge, low(!)
217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406
HP Pavilion Aero 13 - AMD Ryzen 7735U
[ 0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) [ 0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.361640] ACPI: IRQ 1 override to edge, low(!)
217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Asus TUF FA617NS - AMD Ryzen 7 7735HS
Noteworthy DSTD keyboard resource:
IRQNoFlags () {1}
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) ACPI: IRQ 1 override to edge, low(!)
So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry for IRQ 1.
Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for which a quirk was added in commit 9946e39fe8d0 to force the override even though it it Zen based breaks this pattern:
[ 0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [ 0.341347] ACPI: IRQ 1 override to edge, high(!)
Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems to be a strong indicator that MADT overriding should be used in that case and can be used to at least reduce the amount of DMI quirks.
Another interesting data point is that all the devices for which DMI quirks are present for which MADT overriding should not be used for IRQ 1 all have a DSDT entry with the IRQ configured as level low and exclusive.
I think that the best thing to do might be to go back to the original approach from:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?...
and then limit this to IRQ1. Also maybe inverting the check to:
static bool irq_is_legacy(struct acpi_resource_irq *irq) { return !(irq->triggering == ACPI_LEVEL_SENSITIVE && irq->polarity == ACPI_ACTIVE_LOW && irq->shareable == ACPI_EXCLUSIVE); }
But I need to check if this will work for all the new Zen models for which we got bug reports after the recent dropping of 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
So today I have started with continueing the investigation looking at laptop models where we used to not override because of:
if (boot_cpu_has(X86_FEATURE_ZEN)) return false;
And where removing this and thus using the override has led to a regression.
Looking at the acpidump-s from the following bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=2229165 https://bugzilla.redhat.com/show_bug.cgi?id=2229317 https://bugzilla.kernel.org/show_bug.cgi?id=217726
All of these use the following settings for the kbd in the DSDT:
IRQ (Edge, ActiveLow, Shared, ) {1}
So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
{ tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
So I have 2 plans to move forward with this:
Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
- Revert a9c4a912b7dc
- Limit the acpi_dev_irq_override() check to only ever skip the IRQ override (return false) for GSI 1.
- Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case use the override even on ZEN, fixing:
217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons https://bugzilla.kernel.org/show_bug.cgi?id=217394 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs https://bugzilla.kernel.org/show_bug.cgi?id=217406 217336 - keyboard not working Asus TUF FA617NS https://bugzilla.kernel.org/show_bug.cgi?id=217336
Which are known AMD ZEN based laptops which do need the override for IRQ 1.
This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.
Sounds reasonable to me.
It looks like using the IRQ 1 configuration left by the BIOS as is would be the best choice unless that is not viable for some reason.
Agreed, do you have a suggestion how to do that ? I've been looking at this but I've gotten a bit lost in all the layers of ioapic code.
It looks like we need to add a flag to acpi_register_gsi() for it to register a gsi while keeping the IOAPIC settings as is (or a new function) but it is not clear to me how to implement this.
An alternative method would be to call irq_get_trigger_type() for IRQ 1 and use that for the IRQ trigger info when calling acpi_register_gsi(), but I think we need to have the IRQ registered / added to the IRQ domain first ?
I need to take a deeper look into this, which I guess is going to take some time.
If you figure this out in the meantime, please let me know.
I don't see myself having time to dive deeper into this anytime soon, so if you can take a look (as time permits) that would be great.
Regards,
Hans
linux-stable-mirror@lists.linaro.org