On Mon, May 17 2021 at 21:25, Maximilian Luz wrote:
> On 5/17/21 8:40 PM, Thomas Gleixner wrote:
>> Can you please add "apic=verbose" to the kernel command line and provide full dmesg output for a kernel w/o your patch and one with your patch applied?
> I don't actually own an affected device, but I'm sure Sachi can provide you with that.
Ok.
> As far as we can tell, due to the NULL PIC being chosen nr_legacy_irqs() returns 0. That in turn causes mp_check_pin_attr() to return false because is_level and active_low don't seem to match the expected values.
Ok.
> That check is essentially ignored if nr_legacy_irqs() returns a high enough value.
Close enough.
> I guess that might also be a firmware bug here? Not sure where the expected values come from.
They come from the interrupt override ACPI table and if not supplied then irq 0-15 is preset with default values, which are type=edge and polarity=high, i.e. the opposite of what the failing driver wants.
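To make the defaulting concrete, here is a rough standalone model of it. This is a sketch only, not the code in arch/x86; every name in it (model_override(), model_fill_defaults(), model_attr_matches(), struct irq_attr) is made up for illustration, and it only assumes 16 legacy IRQs with edge/high as the preset.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LEGACY 16	/* IRQ 0-15; illustrative constant */

enum trigger  { TRIG_EDGE, TRIG_LEVEL };
enum polarity { POL_HIGH, POL_LOW };

/* Hypothetical per-IRQ state, not a kernel structure */
struct irq_attr {
	enum trigger	trigger;
	enum polarity	polarity;
	bool		overridden;
};

static struct irq_attr legacy[NR_LEGACY];

/* Record an override, as an INT_SRC_OVR entry in the MADT would */
static void model_override(int irq, enum trigger t, enum polarity p)
{
	assert(irq >= 0 && irq < NR_LEGACY);
	legacy[irq] = (struct irq_attr){ t, p, true };
}

/* Preset every legacy IRQ which has no override with edge/high */
static void model_fill_defaults(void)
{
	for (int i = 0; i < NR_LEGACY; i++) {
		if (!legacy[i].overridden)
			legacy[i] = (struct irq_attr){ TRIG_EDGE, POL_HIGH, false };
	}
}

/* True if the attributes a driver requests match what was preset */
static bool model_attr_matches(int irq, enum trigger t, enum polarity p)
{
	assert(irq >= 0 && irq < NR_LEGACY);
	return legacy[irq].trigger == t && legacy[irq].polarity == p;
}
```

With an override recorded for IRQ9 only, a level/low request on IRQ9 matches while the same request on IRQ7 does not, because IRQ7 got the edge/high default.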
The ACPI table lacks an override entry for IRQ7. I looked at one of the dmesg files in that github thread and that has overrides:
[ 0.111674] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.111681] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.111688] ACPI: IRQ0 used by override.
[ 0.111692] ACPI: IRQ9 used by override.
IRQ7 should have a corresponding entry as IRQ9 has:
https://github.com/linux-surface/acpidumps/blob/4da0148744164cea0c924dab92f4...
Subtable Type : 02 [Interrupt Source Override]
Length : 0A
Bus : 00
Source : 07
Interrupt : 00000007
Flags (decoded below) : 000F
Polarity : 3
Trigger Mode : 3
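For reference, the two decoded fields fall out of the flags word like this: bits 0-1 are the polarity and bits 2-3 the trigger mode, with 3 meaning active low and level respectively. A small standalone sketch; the helper and macro names are made up for illustration and are not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of the MADT interrupt source override flags */
#define INTI_POLARITY_MASK	0x3u
#define INTI_TRIGGER_SHIFT	2
#define INTI_TRIGGER_MASK	0x3u

/* Bits 0-1: polarity (3 = active low) */
static unsigned int inti_polarity(uint16_t flags)
{
	return flags & INTI_POLARITY_MASK;
}

/* Bits 2-3: trigger mode (3 = level) */
static unsigned int inti_trigger(uint16_t flags)
{
	return (flags >> INTI_TRIGGER_SHIFT) & INTI_TRIGGER_MASK;
}

/* Check the helpers against the 000F flags value from the entry above */
static void inti_selftest(void)
{
	assert(inti_polarity(0x000F) == 3);
	assert(inti_trigger(0x000F) == 3);
}
```

Those are the same 3/3 values which the hack below hands to mp_override_legacy_irq() as polarity and trigger.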
> Sachi can probably walk you through this a bit better as she's the one who tracked this down. See also [1, 2] and following comments.
Impressive detective work!
Sachi, can you please try the hack below to confirm the above?
It's not meant to be a solution, but it's the most trivial way to validate this.
I'm pretty sure that Windows on Surface does not care about the PIC at all. Whether that's on purpose to save power or just because Windows ignores the PIC completely by now does not matter at all. No idea how that repeated poking on the PIC makes it come alive either and TBH, I don't care too much about it simply because Linux is able to cope with a missing PIC as long as the ACPI tables are correct.
I'm way too tired to think about a proper solution for that problem and I noticed another related issue in that dmesg output:
[ 0.272448] Failed to register legacy timer interrupt
It's not a problem which causes failures, but it's related to the missing PIC.
Needs some more thoughts with brain awake...
Thanks,
        tglx
---
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -21,6 +21,7 @@
 #include <linux/efi-bgrt.h>
 #include <linux/serial_core.h>
 #include <linux/pgtable.h>
+#include <linux/dmi.h>
 
 #include <asm/e820/api.h>
 #include <asm/irqdomain.h>
@@ -1155,6 +1156,17 @@ static void __init mp_config_acpi_legacy
 	}
 }
 
+static const struct dmi_system_id surface_quirk[] __initconst = {
+	{
+		.ident = "Microsoft Surface Laptop 4 (AMD)",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
+			DMI_MATCH(DMI_PRODUCT_SKU, "Surface_Laptop_4_1952:1953")
+		},
+	},
+	{}
+};
+
 /*
  * Parse IOAPIC related entries in MADT
  * returns 0 on success, < 0 on error
@@ -1212,6 +1224,11 @@ static int __init acpi_parse_madt_ioapic
 		acpi_sci_ioapic_setup(acpi_gbl_FADT.sci_interrupt, 0, 0,
 				      acpi_gbl_FADT.sci_interrupt);
 
+	if (dmi_check_system(surface_quirk)) {
+		pr_warn("Surface hack: Override irq 7\n");
+		mp_override_legacy_irq(7, 3, 3, 7);
+	}
+
 	/* Fill in identity legacy mappings where no override */
 	mp_config_acpi_legacy_irqs();