From: Ard Biesheuvel <ardb(a)kernel.org>
The legacy decompressor has elaborate logic to ensure that the
randomized physical placement of the decompressed kernel image does not
conflict with any memory reservations, including ones specified on the
command line using mem=, memmap=, efi_fake_mem= or hugepages=, which are
taken into account by the kernel proper at a later stage.
When booting in EFI mode, it is the firmware's job to ensure that the
chosen range does not conflict with any memory reservations that it
knows about, and this is trivially achieved by using the firmware's
memory allocation APIs.
That leaves reservations specified on the command line, though, which
the firmware knows nothing about, as these regions have no other special
significance to the platform. Since commit
a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
these reservations are not taken into account when randomizing the
physical placement, which may result in conflicts where the memory
cannot be reserved by the kernel proper because its own executable image
resides there.
To avoid having to duplicate or reuse the existing complicated logic,
disable physical KASLR entirely when such overrides are specified. These
are mostly diagnostic tools or niche features, and physical KASLR (as
opposed to virtual KASLR, which is much more important as it affects the
memory addresses observed by code executing in the kernel) is something
we can live without.
Closes: https://lkml.kernel.org/r/FA5F6719-8824-4B04-803E-82990E65E627%40akamai.com
Reported-by: Ben Chaney <bchaney(a)akamai.com>
Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
Cc: <stable(a)vger.kernel.org> # v6.1+
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
drivers/firmware/efi/libstub/x86-stub.c | 28 +++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index d5a8182cf2e1..1983fd3bf392 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -776,6 +776,26 @@ static void error(char *str)
efi_warn("Decompression failed: %s\n", str);
}
+static const char *cmdline_memmap_override;
+
+static efi_status_t parse_options(const char *cmdline)
+{
+ static const char opts[][14] = {
+ "mem=", "memmap=", "efi_fake_mem=", "hugepages="
+ };
+
+ for (int i = 0; i < ARRAY_SIZE(opts); i++) {
+ const char *p = strstr(cmdline, opts[i]);
+
+ if (p == cmdline || (p > cmdline && isspace(p[-1]))) {
+ cmdline_memmap_override = opts[i];
+ break;
+ }
+ }
+
+ return efi_parse_options(cmdline);
+}
+
static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
{
unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
@@ -807,6 +827,10 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
!memcmp(efistub_fw_vendor(), ami, sizeof(ami))) {
efi_debug("AMI firmware v2.0 or older detected - disabling physical KASLR\n");
seed[0] = 0;
+ } else if (cmdline_memmap_override) {
+ efi_info("%s detected on the kernel command line - disabling physical KASLR\n",
+ cmdline_memmap_override);
+ seed[0] = 0;
}
boot_params_ptr->hdr.loadflags |= KASLR_FLAG;
@@ -883,7 +907,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
#ifdef CONFIG_CMDLINE_BOOL
- status = efi_parse_options(CONFIG_CMDLINE);
+ status = parse_options(CONFIG_CMDLINE);
if (status != EFI_SUCCESS) {
efi_err("Failed to parse options\n");
goto fail;
@@ -892,7 +916,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
if (!IS_ENABLED(CONFIG_CMDLINE_OVERRIDE)) {
unsigned long cmdline_paddr = ((u64)hdr->cmd_line_ptr |
((u64)boot_params->ext_cmd_line_ptr << 32));
- status = efi_parse_options((char *)cmdline_paddr);
+ status = parse_options((char *)cmdline_paddr);
if (status != EFI_SUCCESS) {
efi_err("Failed to parse options\n");
goto fail;
--
2.45.0.rc1.225.g2a3ae87e7f-goog
The canary in collect_domain_accesses() can be triggered when trying to
link a root mount point. This cannot work in practice because this
directory is mounted, but the VFS check is done after the call to
security_path_link().
Do not use source directory's d_parent when the source directory is the
mount point.
Add tests to check error codes when renaming or linking a mount root
directory. This previously triggered a kernel warning. The
linux/mount.h file is not sorted with other headers to ease backport to
Linux 6.1 .
Cc: Günther Noack <gnoack(a)google.com>
Cc: Paul Moore <paul(a)paul-moore.com>
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+bf4903dc7e12b18ebc87(a)syzkaller.appspotmail.com
Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER")
Closes: https://lore.kernel.org/r/000000000000553d3f0618198200@google.com
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
Link: https://lore.kernel.org/r/20240516181935.1645983-2-mic@digikod.net
---
security/landlock/fs.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 22d8b7c28074..7877a64cc6b8 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1110,6 +1110,7 @@ static int current_check_refer_path(struct dentry *const old_dentry,
bool allow_parent1, allow_parent2;
access_mask_t access_request_parent1, access_request_parent2;
struct path mnt_dir;
+ struct dentry *old_parent;
layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS] = {},
layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS] = {};
@@ -1157,9 +1158,17 @@ static int current_check_refer_path(struct dentry *const old_dentry,
mnt_dir.mnt = new_dir->mnt;
mnt_dir.dentry = new_dir->mnt->mnt_root;
+ /*
+ * old_dentry may be the root of the common mount point and
+ * !IS_ROOT(old_dentry) at the same time (e.g. with open_tree() and
+ * OPEN_TREE_CLONE). We do not need to call dget(old_parent) because
+ * we keep a reference to old_dentry.
+ */
+ old_parent = (old_dentry == mnt_dir.dentry) ? old_dentry :
+ old_dentry->d_parent;
+
/* new_dir->dentry is equal to new_dentry->d_parent */
- allow_parent1 = collect_domain_accesses(dom, mnt_dir.dentry,
- old_dentry->d_parent,
+ allow_parent1 = collect_domain_accesses(dom, mnt_dir.dentry, old_parent,
&layer_masks_parent1);
allow_parent2 = collect_domain_accesses(
dom, mnt_dir.dentry, new_dir->dentry, &layer_masks_parent2);
--
2.45.0
Hello,
I encountered an issue when upgrading to 6.1.89 from 6.1.77. This upgrade caused a breakage in emulated persistent memory. Significant amounts of memory are missing from a pmem device:
fdisk -l /dev/pmem*
Disk /dev/pmem0: 355.9 GiB, 382117871616 bytes, 746323968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/pmem1: 25.38 GiB, 27246198784 bytes, 53215232 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
The memmap parameter that created these pmem devices is “memmap=364416M!28672M,367488M!419840M”, which should cause a much larger amount of memory to be allocated to /dev/pmem1. The amount of missing memory and the device it is missing from is randomized on each reboot. There is some amount of memory missing in almost all cases, but not 100% of the time. Notably, the memory that is missing from these devices is not reclaimed by the system for general use. This system in question has 768GB of memory split evenly across two NUMA nodes.
When the error occurs, there are also the following error messages showing up in dmesg:
[ 5.318317] nd_pmem namespace1.0: [mem 0x5c2042c000-0x5ff7ffffff flags 0x200] misaligned, unable to map
[ 5.335073] nd_pmem: probe of namespace1.0 failed with error -95
Bisection implicates 2dfaeac3f38e4e550d215204eedd97a061fdc118 as the patch that first caused the issue. I believe the cause of the issue is that the EFI stub is randomizing the location of the decompressed kernel without accounting for the memory map, and it is clobbering some of the memory that has been reserved for pmem.
Thank you,
Ben Chaney