Ingo Molnar mingo@kernel.org writes:
- Pavin Joseph me@pavinjoseph.com wrote:
On 3/29/24 13:45, Ingo Molnar wrote:
Just to clarify, we have the following 3 upstream (and soon to be upstream) versions:
v1: pre-d794734c9bbf kernels
v2: d794734c9bbf x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
v3: c567f2948f57 Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
Where v1 and v3 ought to be the same in behavior.
So what does the failure matrix look like on your systems? Is my understanding accurate:
Slight correction:
      regular boot | regular kexec | nogbpages boot | nogbpages kexec boot
     --------------|---------------|----------------|---------------------
 v1:  OK           | OK            | OK             | FAIL
 v2:  OK           | FAIL          | OK             | FAIL
Thanks!
So the question is now: does anyone have a theory about in what fashion the kexec nogbpages bootup differs from the regular nogbpages bootup to break on your system?
I'd have expected the described root cause of the firmware not properly enumerating all memory areas that need to be mapped to cause trouble on regular, non-kexec nogbpages bootups too. What makes the kexec bootup special to trigger this crash?
My blind hunch would be something in the first 1MiB being different. The first 1MiB is where all of the historical stuff is and where I have seen historical memory maps be less than perfectly accurate.
If changing what is mapped is the difference between success and failure, it sounds like a page fault is being triggered somewhere dark and hard to debug, and that in turn is becoming a triple fault.
Pavin Joseph, is there any chance you can provide your memory map? Perhaps just the output of cat /proc/iomem?
If I have something to go on other than works/doesn't work I can probably say something intelligent.
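For anyone wanting to compare the low-memory part of the map between a regular boot and a kexec boot, a minimal sketch of filtering /proc/iomem down to the entries that start below 1MiB might look like the following. The helper names (parse_iomem, below_1mib, report) are made up for illustration, and the parser assumes the usual "start-end : name" layout with two spaces of indentation per nesting level. Note that /proc/iomem shows all-zero addresses to unprivileged users, so it needs to be read as root.

```python
import re

ONE_MIB = 0x100000  # the "first 1MiB" boundary discussed above

# Matches lines such as "  000c0000-000c7fff : Video ROM".
# Assumption: nesting is shown by two spaces of indentation per level.
_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Return (depth, start, end, name) tuples from /proc/iomem-style text."""
    entries = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # skip anything that is not a resource line
        indent, start, end, name = m.groups()
        entries.append((len(indent) // 2, int(start, 16), int(end, 16), name))
    return entries


def below_1mib(entries):
    """Keep only the entries whose start address lies in the first 1MiB."""
    return [e for e in entries if e[1] < ONE_MIB]


def report(path="/proc/iomem"):
    """Format the sub-1MiB entries of the given file, one per line."""
    with open(path) as f:
        entries = below_1mib(parse_iomem(f.read()))
    return "\n".join(
        f"{'  ' * depth}{start:08x}-{end:08x} : {name}"
        for depth, start, end, name in entries
    )
```

Running print(report()) as root once after a normal boot and once after a kexec boot, then diffing the two outputs, would show whether the first 1MiB is described differently in the two cases. This only covers what the kernel reports in /proc/iomem; it says nothing about which ranges the kexec identity map actually covers.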
Eric