The most critical issue with vm.memfd_noexec=2 (the fact that passing
MFD_EXEC would bypass it entirely[1]) has been fixed in Andrew's
tree[2], but there are still some outstanding issues that need to be
addressed:
* vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
because it will make it far to difficult to ever migrate. Instead it
should imply MFD_EXEC.
* The dmesg warnings are pr_warn_once(), which on most systems means
that they will be used up by systemd or some other boot process and
userspace developers will never see it.
- For the !(flags & (MFD_EXEC | MFD_NOEXEC_SEAL)) case, outputting a
rate-limited message to the kernel log is necessary to tell
userspace that they should add the new flags.
Arguably the most ideal way to deal with the spam concern[3,4]
while still prompting userspace to switch to the new flags would be
to only log the warning once per task or something similar.
However, adding something to task_struct for tracking this would be
needless bloat for a single pr_warn_ratelimited().
So just switch to pr_info_ratelimited() to avoid spamming the log
with something that isn't a real warning. There's lots of
info-level stuff in dmesg, it seems really unlikely that this
should be an actual problem. Most programs are already switching to
the new flags anyway.
- For the vm.memfd_noexec=2 case, we need to log a warning for every
failure because otherwise userspace will have no idea why their
previously working program started returning -EACCES (previously
-EINVAL) from memfd_create(2). pr_warn_once() is simply wrong here.
* The racheting mechanism for vm.memfd_noexec makes it incredibly
unappealing for most users to enable the sysctl because enabling it
on &init_pid_ns means you need a system reboot to unset it. Given the
actual security threat being protected against, CAP_SYS_ADMIN users
being restricted in this way makes little sense.
The argument for this ratcheting by the original author was that it
allows you to have a hierarchical setting that cannot be unset by
child pidnses, but this is not accurate -- changing the parent
pidns's vm.memfd_noexec setting to be more restrictive didn't affect
children.
Instead, switch the vm.memfd_noexec sysctl to be properly
hierarchical and allow CAP_SYS_ADMIN users (in the pidns's owning
userns) to lower the setting as long as it is not lower than the
parent's effective setting. This change also makes it so that
changing a parent pidns's vm.memfd_noexec will affect all
descendants, providing a properly hierarchical setting. The
performance impact of this is incredibly minimal since the maximum
depth of pidns is 32 and it is only checked during memfd_create(2)
and unshare(CLONE_NEWPID).
* The memfd selftests would not exit with a non-zero error code when
certain tests that ran in a forked process (specifically the ones
related to MFD_EXEC and MFD_NOEXEC_SEAL) failed.
[1]: https://lore.kernel.org/all/ZJwcsU0vI-nzgOB_@codewreck.org/
[2]: https://lore.kernel.org/all/20230705063315.3680666-1-jeffxu@google.com/
[3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundatio…
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v2:
- Make vm.memfd_noexec restrictions properly hierarchical.
- Allow vm.memfd_noexec setting to be lowered by CAP_SYS_ADMIN as long
as it is not lower than the parent's effective setting.
- Fix the logging behaviour related to the new flags and
vm.memfd_noexec=2.
- Add more thorough tests for vm.memfd_noexec in selftests.
- v1: <https://lore.kernel.org/r/20230713143406.14342-1-cyphar@cyphar.com>
---
Aleksa Sarai (5):
selftests: memfd: error out test process when child test fails
memfd: do not -EACCES old memfd_create() users with vm.memfd_noexec=2
memfd: improve userspace warnings for missing exec-related flags
memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy
selftests: improve vm.memfd_noexec sysctl tests
include/linux/pid_namespace.h | 39 ++--
kernel/pid.c | 3 +
kernel/pid_namespace.c | 6 +-
kernel/pid_sysctl.h | 28 ++-
mm/memfd.c | 33 ++-
tools/testing/selftests/memfd/memfd_test.c | 332 +++++++++++++++++++++++------
6 files changed, 322 insertions(+), 119 deletions(-)
---
base-commit: 3ff995246e801ea4de0a30860a1d8da4aeb538e7
change-id: 20230803-memfd-vm-noexec-uapi-fixes-ace725c67b0f
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
Hi all,
Recently "memfd: improve userspace warnings for missing exec-related
flags" was merged. On my system, this is a regression, not an
improvement, because the entire 256k kernel log buffer (default on x86)
is filled with these warnings and "__do_sys_memfd_create: 122 callbacks
suppressed". I haven't investigated too closely, but the most likely
cause is Wayland libraries.
This is too serious of a consequence for using an old API, especially
considering how recently the flags were added. The vast majority of
software has not had time to add the flags: glibc does not define the
macros until 2.38 which was released less than one month ago, man-pages
does not document the flags, and according to Debian Code Search, only
systemd, stress-ng, and strace actually pass either of these flags.
Furthermore, since old kernels reject unknown flags, it's not just a
matter of defining and passing the flag; every program needs to
add logic to handle EINVAL and try again.
Some other way needs to be found to encourage userspace to add the
flags; otherwise, this message will be patched out because the kernel
log becomes unusable after running unupdated programs, which will still
exist even after upstreams are fixed. In particular, AppImages,
flatpaks, snaps, and similar app bundles contain vendored Wayland
libraries which can be difficult or impossible to update.
Thanks,
Alex.
The vendor check introduced by commit 554b841d4703 ("tpm: Disable RNG for
all AMD fTPMs") doesn't work properly on Intel fTPM. The TPM doesn't reply
at bootup and returns back the command code.
As this isn't crucial for anything but AMD fTPM and AMD fTPM works, throw
away the error code to let Intel fTPM continue to work.
Cc: stable(a)vger.kernel.org
Fixes: 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs")
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217804
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/char/tpm/tpm_crb.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 9eb1a18590123..b0e9931fe436c 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -472,8 +472,7 @@ static int crb_check_flags(struct tpm_chip *chip)
if (ret)
return ret;
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL);
- if (ret)
+ if (tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL))
goto release;
if (val == 0x414D4400U /* AMD */)
--
2.34.1
When introducing support for processed channels I needed
to invert the expression:
if (!iio_channel_has_info(schan, IIO_CHAN_INFO_RAW) ||
!iio_channel_has_info(schan, IIO_CHAN_INFO_SCALE))
dev_err(dev, "source channel does not support raw/scale\n");
To the inverse, meaning detect when we can usse raw+scale
rather than when we can not. This was the result:
if (iio_channel_has_info(schan, IIO_CHAN_INFO_RAW) ||
iio_channel_has_info(schan, IIO_CHAN_INFO_SCALE))
dev_info(dev, "using raw+scale source channel\n");
Ooops. Spot the error. Yep old George Boole came up and bit me.
That should be an &&.
The current code "mostly works" because we have not run into
systems supporting only raw but not scale or only scale but not
raw, and I doubt there are few using the rescaler on anything
such, but let's fix the logic.
Cc: Liam Beguin <liambeguin(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: 53ebee949980 ("iio: afe: iio-rescale: Support processed channels")
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
drivers/iio/afe/iio-rescale.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/afe/iio-rescale.c b/drivers/iio/afe/iio-rescale.c
index 7e511293d6d1..dc426e1484f0 100644
--- a/drivers/iio/afe/iio-rescale.c
+++ b/drivers/iio/afe/iio-rescale.c
@@ -278,7 +278,7 @@ static int rescale_configure_channel(struct device *dev,
chan->ext_info = rescale->ext_info;
chan->type = rescale->cfg->type;
- if (iio_channel_has_info(schan, IIO_CHAN_INFO_RAW) ||
+ if (iio_channel_has_info(schan, IIO_CHAN_INFO_RAW) &&
iio_channel_has_info(schan, IIO_CHAN_INFO_SCALE)) {
dev_info(dev, "using raw+scale source channel\n");
} else if (iio_channel_has_info(schan, IIO_CHAN_INFO_PROCESSED)) {
--
2.35.3
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit b1e213a9e31c20206f111ec664afcf31cbfe0dbb ]
On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence it could have a fully functional s390 kernel
with CONFIG_PCI=n, then the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that it
won't be built to cause below compiling error if PCI is unset.
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index b64ae02c26f8c..81de833ccd041 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -210,6 +210,7 @@ config FSL_DMA
config FSL_EDMA
tristate "Freescale eDMA engine support"
depends on OF
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
@@ -279,6 +280,7 @@ config IMX_SDMA
config INTEL_IDMA64
tristate "Intel integrated DMA 64-bit support"
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.40.1
Hello Greg, Hello Jens, Hello stable team,
would you please accept some backports to v6.1-stable for io_uring()?
io_uring() fails on parisc because of some missing upstream patches.
Since 6.1 is currently used in debian and gentoo as main kernel we
face some build errors due to the missing patches.
Here are the 3 steps I'm asking for (for kernel 6.1-stable only, the others are OK):
1) cherry-pick this upstream commit:
commit 567b35159e76997e95b643b9a8a5d9d2198f2522
Author: John David Anglin <dave(a)parisc-linux.org>
Date: Sun Feb 26 18:03:33 2023 +0000
parisc: Cleanup mmap implementation regarding color alignment
2) cherry-pick this upstream commit:
commit b5d89408b9fb21258f7c371d6d48a674f60f7181
Author: Helge Deller <deller(a)gmx.de>
Date: Fri Jun 30 12:36:09 2023 +0200
parisc: sys_parisc: parisc_personality() is called from asm code
3) apply the patch below as manual backport:
I think this is the least invasive change and I wasn't able to otherwise
simply pull in the upstream patches without touching code I don't want
to touch (and keep life easier for Jens if he wants to backport other
patches later).
Thanks!
Helge
From: Helge Deller <deller(a)gmx.de>
Date: Mon, 28 Aug 2023 23:07:49 +0200
Subject: [PATCH] io_uring/parisc: Adjust pgoff in io_uring mmap() for parisc
Vidra Jonas reported issues on parisc with libuv which then triggers
build errors with cmake. Debugging shows that those issues stem from
io_uring().
I was not able to easily pull in upstream commits directly, so here
is IMHO the least invasive manual backport of the following upstream
commits to fix the cache aliasing issues on parisc on kernel 6.1
with io_uring:
56675f8b9f9b ("io_uring/parisc: Adjust pgoff in io_uring mmap() for parisc")
32832a407a71 ("io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()")
d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements")
With this patch kernel 6.1 has all relevant mmap changes and is
identical to kernel 6.5 with regard to mmap() in io_uring.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Reported-by: Vidra.Jonas(a)seznam.cz
Link: https://lore.kernel.org/linux-parisc/520.NvTX.6mXZpmfh4Ju.1awpAS@seznam.cz/
Cc: Sam James <sam(a)gentoo.org>
Cc: John David Anglin <dave.anglin(a)bell.net>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ed8e9deae284..b0e47fe1eb4b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -72,6 +72,7 @@
#include <linux/io_uring.h>
#include <linux/audit.h>
#include <linux/security.h>
+#include <asm/shmparam.h>
#define CREATE_TRACE_POINTS
#include <trace/events/io_uring.h>
@@ -3110,6 +3111,49 @@ static __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
return remap_pfn_range(vma, vma->vm_start, pfn, sz, vma->vm_page_prot);
}
+static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ void *ptr;
+
+ /*
+ * Do not allow to map to user-provided address to avoid breaking the
+ * aliasing rules. Userspace is not able to guess the offset address of
+ * kernel kmalloc()ed memory area.
+ */
+ if (addr)
+ return -EINVAL;
+
+ ptr = io_uring_validate_mmap_request(filp, pgoff, len);
+ if (IS_ERR(ptr))
+ return -ENOMEM;
+
+ /*
+ * Some architectures have strong cache aliasing requirements.
+ * For such architectures we need a coherent mapping which aliases
+ * kernel memory *and* userspace memory. To achieve that:
+ * - use a NULL file pointer to reference physical memory, and
+ * - use the kernel virtual address of the shared io_uring context
+ * (instead of the userspace-provided address, which has to be 0UL
+ * anyway).
+ * - use the same pgoff which the get_unmapped_area() uses to
+ * calculate the page colouring.
+ * For architectures without such aliasing requirements, the
+ * architecture will return any suitable mapping because addr is 0.
+ */
+ filp = NULL;
+ flags |= MAP_SHARED;
+ pgoff = 0; /* has been translated to ptr above */
+#ifdef SHM_COLOUR
+ addr = (uintptr_t) ptr;
+ pgoff = addr >> PAGE_SHIFT;
+#else
+ addr = 0UL;
+#endif
+ return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+}
+
#else /* !CONFIG_MMU */
static int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
@@ -3324,6 +3368,8 @@ static const struct file_operations io_uring_fops = {
#ifndef CONFIG_MMU
.get_unmapped_area = io_uring_nommu_get_unmapped_area,
.mmap_capabilities = io_uring_nommu_mmap_capabilities,
+#else
+ .get_unmapped_area = io_uring_mmu_get_unmapped_area,
#endif
.poll = io_uring_poll,
#ifdef CONFIG_PROC_FS