Russ Anderson rja@hpe.com writes:
On Sun, Mar 24, 2024 at 11:31:39AM +0100, Ingo Molnar wrote:
- Steve Wahl steve.wahl@hpe.com wrote:
Some systems have ACPI tables that don't include everything that needs to be mapped for a successful kexec. These systems rely on identity maps that include the full gigabyte surrounding any smaller region requested for kexec success. Without this, they fail to kexec and end up doing a full firmware reboot.
So, reduce the use of GB pages only on systems where this is known to be necessary (specifically, UV systems).
Signed-off-by: Steve Wahl steve.wahl@hpe.com Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.") Reported-by: Pavin Joseph me@pavinjoseph.com
Sigh, why was d794734c9bbf marked for a -stable backport? The commit never explains ...
I will try to explain, since Steve is offline. That commit fixes a legitimate bug where more address range is mapped (1G) than the requested address range. The fix avoids the issue of cpu speculativly loading beyond the requested range, which inludes specutalive loads from reserved memory. That is why it was marked for -stable.
To call that a bug presumes that the memory type range registers were not setup properly by the boot firmware.
I think I saw something that the existence of memory type range registers is changing/has changed in recent cpus, but historically it has been the job of the memory type range registers to ensure that the attributes of specific addresses are correct.
The memory attributes should guide the speculation.
To depend upon page tables to ensure the attributes are correct would presumably require a cpu that does not have support for disabling page tables in 32bit mode and does not have 16bit mode.
On older systems (I haven't looked lately) I have seen all kinds of oddities in the descriptions of memory. Like not describing the memory at address 0 where the real mode IDT lives. So I am not at all certain any firmware information can be depended upon or reasonably expected to be complete. For a while there was no concept of firmware memory areas so on some older systems it was actually required for their to be gaps in the description of memory provided to the system, so that operating systems would not touch memory used by the firmware.
Which definitely means in the case of kexec there are legitimate reasons to access memory areas that are well known but have not always been descried by the boot firmware. So the assertion that it is necessarily a firmware bug for not describing all of memory of memory is at least historically incorrect on x86_64.
There may be different requirements for the kexec identity map and the ordinary kernel boot type memory map and as we look at solutions that can reasonably be explored
Some memory ends up not being mapped, but it is not clear if it is due to some other bug, such as bios not accurately providing the right memory map or some other kernel code path did not map what it should.
The 1G mapping covers up that type issue.
I have seen this assertion repeated several times, and at least historically on x86_64 it is most definitely false. The E820 map which was the primary information source for a long time could not describe all of memory so depending upon it to be complete is erroneous.
When there's boot breakage with new patches, we back out the bad patch and re-try in 99.9% of the cases.
Steve can certainly merge his two patches and resubmit, to replace the reverted original patch. He should be on in the morning to speak for himself.
I am going to push back and suggest that this is perhaps a bug in the HPE UV systems firmware not setting up the cpus memory type range registers correctly.
Unless those systems are using new fangled cpus that don't have 16bit and 32bit support, and don't implement memory type range registers, I don't see how something that only affects HPE UV systems could be anything except an HPE UV specific bug.
Eric