It seems that the most critical issue with vm.memfd_noexec=2 (the fact that passing MFD_EXEC would bypass it entirely[1]) has been fixed in Andrew's tree[2], but there are still some outstanding issues that need to be addressed:
* The dmesg warnings use pr_warn_once, which on most systems means that they will be used up by systemd or some other boot process and userspace developers will never see them. The original patch posted to the ML used pr_warn_ratelimited, but the merged patch changed this (with a comment about it being "per review"). Given that the current warnings are useless in practice, pr_warn_ratelimited makes far more sense.
* vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls because that will make it far too difficult to ever migrate. Instead it should imply MFD_EXEC.
* The ratcheting mechanism for vm.memfd_noexec doesn't make sense as a security mechanism because a CAP_SYS_ADMIN capable user can create executable binaries in a hidden tmpfs very easily, not to mention the many other things they can do.
* The memfd selftests would not exit with a non-zero error code when certain tests that ran in a forked process (specifically the ones related to MFD_EXEC and MFD_NOEXEC_SEAL) failed.
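
To illustrate the last point, here is a minimal userspace sketch of propagating a forked child's failure back to the parent. It is not the code from the selftest patch: run_in_child(), passing_test() and failing_test() are hypothetical names used only for this example, and the sketch assumes each test body returns 0 on success.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test bodies: return 0 on success, non-zero on failure. */
static int passing_test(void) { return 0; }
static int failing_test(void) { return 1; }

/*
 * Run test_fn in a forked child. Returns 0 only if the child exited
 * normally with status 0; returns -1 if fork/waitpid failed, the child
 * exited with a non-zero status, or the child was killed by a signal.
 */
static int run_in_child(int (*test_fn)(void))
{
	pid_t pid;
	int status;

	assert(test_fn != NULL);

	/* Flush stdio so buffered output is not emitted twice after fork. */
	fflush(NULL);

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return -1;
	}
	if (pid == 0)
		_exit(test_fn() ? EXIT_FAILURE : EXIT_SUCCESS);

	if (waitpid(pid, &status, 0) < 0) {
		perror("waitpid");
		return -1;
	}
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0;

	return -1;
}
```

A parent test process would then treat a non-zero return from run_in_child() as fatal (for example by calling abort() or exit(EXIT_FAILURE)), so that the failure shows up in the overall exit code rather than being silently dropped.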
(This patchset is based on top of Jeff Xu's patches[2] fixing the MFD_EXEC bug in vm.memfd_noexec=2.)
[1]: https://lore.kernel.org/all/ZJwcsU0vI-nzgOB_@codewreck.org/
[2]: https://lore.kernel.org/all/20230705063315.3680666-1-jeffxu@google.com/
Aleksa Sarai (3):
  memfd: cleanups for vm.memfd_noexec handling
  memfd: remove racheting feature from vm.memfd_noexec
  selftests: memfd: error out test process when child test fails

 include/linux/pid_namespace.h              | 16 +++------
 kernel/pid_sysctl.h                        |  7 ----
 mm/memfd.c                                 | 32 +++++++----------
 tools/testing/selftests/memfd/memfd_test.c | 41 ++++++++++++++++++----
 4 files changed, 51 insertions(+), 45 deletions(-)