Some BIOS-es contain a bug where they add addresses which map to system RAM in the PCI host bridge window returned by the ACPI _CRS method, see commit 4dc2287c1805 ("x86: avoid E820 regions when allocating address space").
To work around this bug, Linux has excluded E820 reserved addresses when allocating addresses from the PCI host bridge window since 2010.
Recently (2019) some systems have shown up with E820 reservations which cover the entire _CRS returned PCI bridge memory window, causing all attempts to assign memory to PCI BARs which have not been set up by the BIOS to fail. For example, here are the relevant dmesg bits from a Lenovo IdeaPad 3 15IIL 81WE:
 [mem 0x000000004bc50000-0x00000000cfffffff] reserved
 pci_bus 0000:00: root bus resource [mem 0x65400000-0xbfffffff window]
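To make the arithmetic behind the dmesg excerpt above concrete, here is a minimal userspace sketch of what happens when the reserved range is clipped out of the host bridge window. The names struct range, range_clip(), range_is_empty() and ideapad_window_after_clip() are hypothetical and exist only for this illustration. The clipping rule (keep the larger remainder below or above the conflict) is an assumption loosely modeled on resource_clip() in arch/x86/kernel/resource.c, not a copy of the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range; a simplified stand-in for struct resource. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove [start, end] from *res, keeping whichever remainder (below or
 * above the conflict) is larger. Assumed behavior, for illustration only.
 */
static void range_clip(struct range *res, uint64_t start, uint64_t end)
{
	uint64_t low = 0, high = 0;

	if (res->end < start || res->start > end)
		return;	/* no overlap, nothing to clip */

	if (res->start < start)
		low = start - res->start;	/* bytes left below the conflict */
	if (res->end > end)
		high = res->end - end;		/* bytes left above the conflict */

	if (low > high)
		res->end = start - 1;
	else
		res->start = end + 1;
}

/* A range whose start lies above its end contains no addresses. */
static bool range_is_empty(const struct range *res)
{
	return res->start > res->end;
}

/* Clip the E820 reserved range from the dmesg excerpt out of the window. */
static struct range ideapad_window_after_clip(void)
{
	struct range window = { 0x65400000ULL, 0xbfffffffULL };

	range_clip(&window, 0x4bc50000ULL, 0xcfffffffULL);
	return window;
}
```

With the IdeaPad values the reserved range starts below the window and ends above it, so nothing remains on either side: the clipped window ends up with start 0xd0000000 and end 0xbfffffff, which is empty. That is why no BAR can be assigned from it.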
The ACPI specifications appear to allow this new behavior:
The relationship between E820 and ACPI _CRS is not really very clear. ACPI v6.3, sec 15, table 15-374, says AddressRangeReserved means:
This range of addresses is in use or reserved by the system and is not to be included in the allocatable memory pool of the operating system's memory manager.
and it may be used when:
The address range is in use by a memory-mapped system device.
Furthermore, sec 15.2 says:
Address ranges defined for baseboard memory-mapped I/O devices, such as APICs, are returned as reserved.
A PCI host bridge qualifies as a baseboard memory-mapped I/O device, and its apertures are in use and certainly should not be included in the general allocatable pool, so the fact that some BIOS-es report the PCI aperture as "reserved" in E820 doesn't seem like a BIOS bug.
So it seems that excluding E820 reserved addresses is a mistake.
Ideally Linux would fully stop excluding E820 reserved addresses, but then the old systems this was added for will regress. Instead keep the old behavior for old systems, while ignoring the E820 reservations for any systems from now on.
Old systems are defined here as BIOS year < 2018. This cutoff was chosen to make sure that E820 reservations will not be used on the currently affected systems, while also taking into account that the systems for which the E820 checking was originally added may have received BIOS updates for quite a while (esp. CVE-related ones), giving them a more recent BIOS year than 2010.
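The cutoff described above can be sketched as a small predicate. This is a userspace illustration only: the function name should_exclude_e820() and the macro E820_IGNORE_FROM_BIOS_YEAR are hypothetical, and using a negative value to stand for "BIOS year unknown" is an assumption made for the example. The only part taken from the patch is the comparison against 2018.

```c
#include <stdbool.h>

/* Hypothetical name for the cutoff year used by the patch (2018). */
#define E820_IGNORE_FROM_BIOS_YEAR 2018

/*
 * Return true when E820 reserved regions should still be excluded from
 * the host bridge window, i.e. on "old" systems. Any value below the
 * cutoff, including a negative "unknown" placeholder (an assumption of
 * this sketch), keeps the pre-existing behavior.
 */
static bool should_exclude_e820(int bios_year)
{
	return bios_year < E820_IGNORE_FROM_BIOS_YEAR;
}
```

Because the check is "year >= 2018 means skip", a system whose BIOS year cannot be determined falls on the conservative side and keeps the old exclusion behavior.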
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206459
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1868899
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1871793
BugLink: https://bugs.launchpad.net/bugs/1878279
BugLink: https://bugs.launchpad.net/bugs/1931715
BugLink: https://bugs.launchpad.net/bugs/1932069
BugLink: https://bugs.launchpad.net/bugs/1921649
Cc: Benoit Grégoire benoitg@coeus.ca
Cc: Hui Wang hui.wang@canonical.com
Cc: stable@vger.kernel.org
Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com
Acked-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
Acked-by: Bjorn Helgaas bhelgaas@google.com
Signed-off-by: Hans de Goede hdegoede@redhat.com
---
Changes in v6:
- Remove the possibility to change the behavior from the commandline because of worries that users may use this to paper over other problems
Changes in v5:
- Drop mention of Windows behavior from the commit msg, replace with a reference to the specs
- Improve documentation in Documentation/admin-guide/kernel-parameters.txt
- Reword the big comment added, use "PCI host bridge window" in it and drop all references to Windows
Changes in v4:
- Rewrap the big comment block to fit in 80 columns
- Add Rafael's Acked-by
- Add Cc: stable@vger.kernel.org
Changes in v3:
- Commit msg tweaks (drop dmesg timestamps, typo fix)
- Use "defined(CONFIG_...)" instead of "defined CONFIG_..."
- Add Mika's Reviewed-by
Changes in v2:
- Replace the per model DMI quirk approach with disabling E820 reservations checking for all systems with a BIOS year >= 2018
- Add documentation for the new kernel-parameters to Documentation/admin-guide/kernel-parameters.txt
---
Other patches trying to address the same issue:
https://lore.kernel.org/r/20210624095324.34906-1-hui.wang@canonical.com
https://lore.kernel.org/r/20200617164734.84845-1-mika.westerberg@linux.intel...
V1 patch:
https://lore.kernel.org/r/20211005150956.303707-1-hdegoede@redhat.com
---
 arch/x86/kernel/resource.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
index 9b9fb7882c20..9ae64f9af956 100644
--- a/arch/x86/kernel/resource.c
+++ b/arch/x86/kernel/resource.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/dmi.h>
 #include <linux/ioport.h>
 #include <asm/e820/api.h>
 
@@ -23,11 +24,31 @@ static void resource_clip(struct resource *res, resource_size_t start,
 		res->start = end + 1;
 }
 
+/*
+ * Some BIOS-es contain a bug where they add addresses which map to
+ * system RAM in the PCI host bridge window returned by the ACPI _CRS
+ * method, see commit 4dc2287c1805 ("x86: avoid E820 regions when
+ * allocating address space"). To avoid this Linux by default excludes
+ * E820 reservations when allocating addresses since 2010.
+ * In 2019 some systems have shown-up with E820 reservations which cover
+ * the entire _CRS returned PCI host bridge window, causing all attempts
+ * to assign memory to PCI BARs to fail if Linux uses E820 reservations.
+ *
+ * Ideally Linux would fully stop using E820 reservations, but then
+ * the old systems this was added for will regress.
+ * Instead keep the old behavior for old systems, while ignoring the
+ * E820 reservations for any systems from now on.
+ */
 static void remove_e820_regions(struct resource *avail)
 {
-	int i;
+	int i, year = dmi_get_bios_year();
 	struct e820_entry *entry;
 
+	if (year >= 2018)
+		return;
+
+	pr_info_once("PCI: Removing E820 reservations from host bridge windows\n");
+
 	for (i = 0; i < e820_table->nr_entries; i++) {
 		entry = &e820_table->entries[i];
 
Hi All,
On 12/17/21 15:13, Hans de Goede wrote:
ping ?
Regards,
Hans
On Mon, Jan 10, 2022 at 12:41:37PM +0100, Hans de Goede wrote:
Hi All,
On 12/17/21 15:13, Hans de Goede wrote:
ping ?
Thanks, Hans. Maybe I'm quixotic, but I'm still hoping for an approach based on firmware behavior instead of firmware date. If nobody else tries, I will eventually try myself, but I don't have any ETA.
Bjorn
Hi,
On 1/10/22 18:11, Bjorn Helgaas wrote:
Thanks, Hans. Maybe I'm quixotic, but I'm still hoping for an approach based on firmware behavior instead of firmware date. If nobody else tries, I will eventually try myself, but I don't have any ETA.
I really do NOT see how doing a better approach later blocks merging the date based fix now ?
The date based approach can simply be replaced by any better solution later.
Can we please merge the date based approach now so people's broken systems get fixed now, rather than at some unknown later time ?
Regards,
Hans
On Mon, Jan 10, 2022 at 10:25 PM Hans de Goede hdegoede@redhat.com wrote:
Hi,
On 1/10/22 18:11, Bjorn Helgaas wrote:
Thanks, Hans. Maybe I'm quixotic, but I'm still hoping for an approach based on firmware behavior instead of firmware date. If nobody else tries, I will eventually try myself, but I don't have any ETA.
I really do NOT see how doing a better approach later blocks merging the date based fix now ?
The date based approach can simply be replaced by any better solution later.
Agreed.
Can we please merge the date based approach now so people's broken systems get fixed now, rather than at some unknown later time ?
OK, I'll queue it up. Thanks!
linux-stable-mirror@lists.linaro.org