On May 15, 2024 10:42:49 AM PDT, Ard Biesheuvel <ardb@kernel.org> wrote:
(cc Kees)
On Wed, 15 May 2024 at 19:32, Chaney, Ben <bchaney@akamai.com> wrote:
Hello, I encountered an issue when upgrading to 6.1.89 from 6.1.77. This upgrade caused a breakage in emulated persistent memory. Significant amounts of memory are missing from a pmem device:
fdisk -l /dev/pmem*
Disk /dev/pmem0: 355.9 GiB, 382117871616 bytes, 746323968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/pmem1: 25.38 GiB, 27246198784 bytes, 53215232 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
The memmap parameter that created these pmem devices is “memmap=364416M!28672M,367488M!419840M”, which should cause a much larger amount of memory to be allocated to /dev/pmem1. The amount of missing memory and the device it is missing from are randomized on each reboot. Some memory is missing on almost every boot, though not on all of them. Notably, the memory missing from these devices is not reclaimed by the system for general use. The system in question has 768GB of memory split evenly across two NUMA nodes. When the error occurs, the following error messages also show up in dmesg:
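To put a number on the shortfall, here is a minimal sketch that parses the memmap= string quoted above and compares each requested size with the byte counts from the fdisk output. It assumes that pmem0 and pmem1 correspond to the two entries in order, which the report does not state explicitly, and it only handles decimal values with K/M/G/T suffixes, unlike the kernel's own parser. The function names are made up for illustration.

```python
import re

# Binary multipliers for the size suffixes used on the kernel command line.
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert a string such as '364416M' to bytes (decimal digits only)."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if not m:
        raise ValueError(f"unsupported size: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def parse_pmem_regions(memmap):
    """Return (start, size) byte pairs for every 'size!start' entry."""
    regions = []
    for entry in memmap.split(","):
        if "!" not in entry:
            continue  # other memmap forms ($, #, @) are ignored here
        size, start = entry.split("!", 1)
        regions.append((parse_size(start), parse_size(size)))
    return regions


def missing_bytes(memmap, reported):
    """Requested size minus reported size, per device, in bytes."""
    regions = parse_pmem_regions(memmap)
    return [size - got for (_, size), got in zip(regions, reported)]


MEMMAP = "364416M!28672M,367488M!419840M"
# Byte counts from the fdisk output; the ordering is an assumption.
REPORTED = [382117871616, 27246198784]

if __name__ == "__main__":
    for dev, gap in enumerate(missing_bytes(MEMMAP, REPORTED)):
        print(f"pmem{dev}: {gap / (1 << 30):.3f} GiB missing")
```

Under that assumption, pmem0 matches its request exactly (364416 MiB), while pmem1 reports 25984 MiB against a requested 367488 MiB, a gap of 333.5 GiB.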
[ 5.318317] nd_pmem namespace1.0: [mem 0x5c2042c000-0x5ff7ffffff flags 0x200] misaligned, unable to map
[ 5.335073] nd_pmem: probe of namespace1.0 failed with error -95
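The address range in that message can be checked against the memmap= reservations with a few lines of arithmetic. The sketch below is only an illustration: the region list is copied from the command line quoted earlier, and the 2 MiB alignment is an assumption about what nd_pmem requires, not something the log states.

```python
# (start MiB, size MiB) taken from memmap=364416M!28672M,367488M!419840M
PMEM_REGIONS_MIB = [(28672, 364416), (419840, 367488)]

# Range reported by nd_pmem for namespace1.0 in the dmesg line.
NS_START, NS_END = 0x5C2042C000, 0x5FF7FFFFFF

# Assumed alignment requirement (2 MiB); the log does not give the value.
ALIGN = 2 << 20


def containing_region(start, end, regions_mib):
    """Return (index, first byte, last byte) of the reservation holding the range."""
    for idx, (r_start_mib, r_size_mib) in enumerate(regions_mib):
        r_start = r_start_mib << 20
        r_end = r_start + (r_size_mib << 20) - 1
        if r_start <= start and end <= r_end:
            return idx, r_start, r_end
    return None


if __name__ == "__main__":
    hit = containing_region(NS_START, NS_END, PMEM_REGIONS_MIB)
    if hit is None:
        print("range is outside every memmap= reservation")
    else:
        idx, r_start, r_end = hit
        print(f"inside reservation {idx}: {r_start:#x}-{r_end:#x}")
        print(f"ends at reservation end: {NS_END == r_end}")
        print(f"start offset within {ALIGN:#x} block: {NS_START % ALIGN:#x}")
```

With these inputs the range lies inside the first reservation and ends exactly where that reservation ends, but it begins at an address that is not a multiple of 2 MiB. That would be consistent with the reservation having been split at an arbitrary point, though the log alone does not prove what split it.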
Bisection implicates 2dfaeac3f38e4e550d215204eedd97a061fdc118 as the patch that first caused the issue. I believe the cause of the issue is that the EFI stub is randomizing the location of the decompressed kernel without accounting for the memory map, and it is clobbering some of the memory that has been reserved for pmem.
Does using 'nokaslr' on the kernel command line work around this?
I think in this particular case, we could just disable physical KASLR (but retain virtual KASLR) if memmap= appears on the kernel command line, on the basis that emulated persistent memory is somewhat of a niche use case, and physical KASLR is not as important as virtual KASLR (which shouldn't be implicated in this).
Yeah, that seems reasonable to me, as long as we print a notice in dmesg that physical KASLR was disabled due to memmap's physical reservation. If this usage becomes more common, we should find a better way, though.
This reminds me a bit of the work Steve has been exploring: https://lore.kernel.org/all/20240509163310.2aa0b2e1@rorschach.local.home/