On Thu, 31 Oct 2024 at 08:55, Jiri Slaby <jirislaby@kernel.org> wrote:
On 25. 10. 24, 9:30, Ard Biesheuvel wrote:
To me, it seems like the use of EFI_ACPI_RECLAIM_MEMORY in this case simply tickles a bug in the firmware that causes it to corrupt the memory attributes table. The fact that cold boot behaves differently is a strong indicator here.
I didn't see the results of the memory attribute table dumps on the bugzilla thread, but dumping this table from EFI is not very useful because it will get regenerated/updated at ExitBootServices() time. Unfortunately, that also takes away the console so capturing the state of that table before the EFI stub boots the kernel is not an easy thing to do.
Is the memattr table completely corrupted? It also has a version field, and only versions 1 and 2 are defined so we might use that to detect corruption.
So, from today's test: https://bugzilla.suse.com/attachment.cgi?id=878296
efi: memattr: efi_memattr_init: tab=0x7752f018 ver=1
size=16+2*1705287680=0x00000000cb494010
version is NOT corrupted :).
OK, so the struct looks like this:

    typedef struct {
            u32 version;
            u32 num_entries;
            u32 desc_size;
            u32 flags;
            efi_memory_desc_t entry[];
    } efi_memory_attributes_table_t;

and in the correct case, num_entries == 45 and desc_size == 48.
It is quite easy to sanity check this structure: desc_size should be equal to the desc_size in the memory map, and num_entries can never exceed 2x the number of entries in the EFI memory map.
I'll go and implement something that performs the check right after ExitBootServices(), and just drops the table if it is bogus (it isn't that important anyway).
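
For illustration only, the checks described above could look roughly like the sketch below. It is a standalone userspace approximation, not the actual EFI stub code: the struct name memattr_hdr, the helper memattr_hdr_is_sane() and its parameters are all hypothetical, and the header is modelled with plain uint32_t fields and without the trailing entry[] array.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed 16-byte header of the memory attributes table (entry[] omitted). */
struct memattr_hdr {
	uint32_t version;
	uint32_t num_entries;
	uint32_t desc_size;
	uint32_t flags;
};

/*
 * Hypothetical helper, not the kernel's implementation.
 * map_desc_size and map_nr_entries are assumed to come from the EFI
 * memory map obtained at ExitBootServices() time.
 */
static bool memattr_hdr_is_sane(const struct memattr_hdr *hdr,
				uint32_t map_desc_size,
				uint32_t map_nr_entries)
{
	/* Only versions 1 and 2 are defined. */
	if (hdr->version != 1 && hdr->version != 2)
		return false;

	/* Descriptor size must match the one used by the memory map. */
	if (hdr->desc_size != map_desc_size)
		return false;

	/* At most 2x the memory map entries; 64-bit math avoids wrap. */
	if ((uint64_t)hdr->num_entries > 2 * (uint64_t)map_nr_entries)
		return false;

	return true;
}
```

With desc_size == 48 and num_entries == 45 from the correct case, and an invented memory map of 100 entries, this returns true. The values from the log above (2 and 1705287680) are rejected whichever of the two is taken to be num_entries, since neither reading satisfies both the size and the count check.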