On Fri, Mar 22, 2024 at 10:40:37AM -0700, Dave Hansen wrote:
On 3/22/24 10:31, Eric W. Biederman wrote:
I'd much rather add synthetic entries to the memory maps that have this information than hack around it by assuming that things are within a gigabyte.
So this change is a partial revert of a change that broke kexec in existing configurations, made to fix a regression that breaks kexec.
Hi, Dave!
Let's back up for a second:
- Mapping extra memory on UV systems causes halts[1]
- Mapping extra memory on UV systems breaks kexec (this thread)
These are the same. The most reliable way to create the problem[1] on UV is a kexec to a kdump kernel, because of the typical placement of the kdump kernel active region with respect to the reserved addresses that cause the halts. (The distros we typically run place the crashkernel just below the highest reserved region, where a gbpage can include both.)
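To make the placement issue concrete, here is a minimal sketch of the arithmetic. The helper names and the addresses in the usage note are hypothetical and not taken from the kernel. It only checks whether rounding a requested region out to 1 GiB boundaries would also cover part of a reserved range.

```c
#include <stdbool.h>
#include <stdint.h>

#define GB_PAGE_SIZE (1ULL << 30)   /* 1 GiB, the size of an x86-64 gbpage */

/* Hypothetical helpers: round an address down/up to a 1 GiB boundary. */
static uint64_t gb_floor(uint64_t addr)
{
	return addr & ~(GB_PAGE_SIZE - 1);
}

static uint64_t gb_ceil(uint64_t addr)
{
	return (addr + GB_PAGE_SIZE - 1) & ~(GB_PAGE_SIZE - 1);
}

/*
 * Return true if mapping [start, end) with whole gbpages would also map
 * some part of the reserved range [res_start, res_end).
 */
static bool gbpage_span_hits_reserved(uint64_t start, uint64_t end,
				      uint64_t res_start, uint64_t res_end)
{
	uint64_t map_start = gb_floor(start);
	uint64_t map_end = gb_ceil(end);

	/* Standard half-open interval overlap test. */
	return map_start < res_end && res_start < map_end;
}
```

For example, with a made-up crashkernel region at 0x6f000000-0x7f000000 and a made-up reserved range at 0x7f000000-0x80000000, the gbpage that covers the crashkernel spans 0x40000000-0x80000000 and therefore includes the reserved range.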
What you didn't state here is the third bullet that this patch addresses.
* Neglecting to map extra memory on some (firmware buggy?) non-UV systems breaks kexec.
So we're in a pickle. I understand your concern for kexec. But I'm concerned that fixing the kexec problem will re-expose us to the [1] problem.
Steve, can you explain a bit why this patch doesn't re-expose the kernel to the [1] bug?
This patch still has UV systems avoid gbpages that extend far outside the actual requested regions, but allows full gbpages on other systems. On UV systems, the new gbpage algorithm is followed. On non-UV systems, gbpages are allowed even for requests that don't cover a complete gbpage -- essentially the former algorithm, but using the new code.
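As a rough illustration of that split (not the patch itself), the decision could be sketched as below. The function name and the `strict` flag are hypothetical. The sketch also assumes that "the new gbpage algorithm" means a gbpage is used only when the request covers the whole 1 GiB page, and that [addr, next) lies within a single gbpage.

```c
#include <stdbool.h>
#include <stdint.h>

#define GB_PAGE_SIZE (1ULL << 30)              /* 1 GiB */
#define GB_PAGE_MASK (~(GB_PAGE_SIZE - 1))

/*
 * Hypothetical helper: decide whether the chunk [addr, next), which is
 * assumed to lie within one 1 GiB page, may be mapped with a gbpage.
 *
 * strict == true  models the UV case: only use a gbpage when the request
 *                 starts and ends on a 1 GiB boundary.
 * strict == false models the non-UV case: a gbpage is allowed even when
 *                 the request covers only part of it.
 */
static bool may_use_gbpage(uint64_t addr, uint64_t next, bool strict)
{
	if (!strict)
		return true;

	/* Both ends must be 1 GiB aligned, so nothing extra gets mapped. */
	return (addr & ~GB_PAGE_MASK) == 0 && (next & ~GB_PAGE_MASK) == 0;
}
```

With `strict` set, a request for 0x40000000-0x50000000 would fall back to smaller pages, while the same request with `strict` clear would get the full gbpage at 0x40000000.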
Hope that makes sense.
I would probably consider this buggy firmware, but I got enough reports of this regression (from Pavin Joseph, Eric Hagberg, and Sara Brofeldt, all of whom tested the patch to see if it cured the regression) that it seemed everyone would want it fixed quickly and point fingers later.
In the private debugging exchanges with Pavin, I got some printks of the regions that were mapped, and in one exchange I hard-coded the regions not covered on his particular system back into the table; there were four regions left out. I added all four in one patch. I could have dived in further to diagnose which of the missing regions were actually necessary to get kexec to succeed, but I couldn't see what I would do with that information once I had it, as I don't see a way to generalize this to other platforms exhibiting the problem.
Thanks,
--> Steve