On 2024-05-23, Jeff Xu jeffxu@google.com wrote:
On Thu, May 23, 2024 at 1:24 AM David Rheinsberg david@readahead.eu wrote:
Hi
On Thu, May 23, 2024, at 4:25 AM, Barnabás Pőcze wrote:
On Thursday, May 23, 2024 at 1:23, Andrew Morton akpm@linux-foundation.org wrote:
It's a change to a userspace API, yes? Please let's have a detailed description of why this is OK. Why it won't affect any existing users.
Yes, it is a uAPI change. To trigger a user-visible change, a program has to
- create a memfd
- with MFD_NOEXEC_SEAL,
- without MFD_ALLOW_SEALING;
- try to add seals / check the seals.
This change in essence reverts the kernel's behaviour to that of Linux <6.3, where only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly on those kernels, it will likely work correctly after this change.
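For anyone who wants to see which behaviour a given kernel has, a small probe along these lines might help. It is only a sketch: the function name `noexec_seal_implies_sealing` and the `DEMO_MAIN` macro are made up for illustration, and the fallback value for `MFD_NOEXEC_SEAL` is the one from the kernel uAPI header for libc headers that predate the flag.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback for older libc headers; value from the kernel uAPI header. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif

/*
 * Create a memfd with MFD_NOEXEC_SEAL but without MFD_ALLOW_SEALING and
 * look at its initial seals.
 *
 * Returns 1 if further seals can still be added (F_SEAL_SEAL not set),
 * 0 if sealing is disabled (F_SEAL_SEAL set), and -1 if the probe could
 * not run, e.g. on a kernel that does not know MFD_NOEXEC_SEAL.
 */
int noexec_seal_implies_sealing(void)
{
	int fd = memfd_create("seal-probe", MFD_CLOEXEC | MFD_NOEXEC_SEAL);
	if (fd < 0)
		return -1;

	int seals = fcntl(fd, F_GET_SEALS);
	close(fd);
	if (seals < 0)
		return -1;

	return (seals & F_SEAL_SEAL) ? 0 : 1;
}

#ifdef DEMO_MAIN /* hypothetical macro, only used by this sketch */
int main(void)
{
	printf("probe result: %d\n", noexec_seal_implies_sealing());
	return 0;
}
#endif
```

A result of 1 would indicate the implicit `MFD_ALLOW_SEALING` behaviour, and 0 the behaviour where only `MFD_ALLOW_SEALING` enables sealing.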
I have looked through Debian Code Search and GitHub, searching for `MFD_NOEXEC_SEAL`. And I could find only a single breakage that this change would cause: dbus-broker has its own memfd_create() wrapper that is aware of this implicit `MFD_ALLOW_SEALING` behaviour[0], and tries to work around it. This workaround will break. Luckily, however, as far as I could tell this only affects the test suite of dbus-broker, not its normal operations, so I believe it should be fine. I have prepared a PR with a fix[1].
We asked for exactly this fix before, so I very much support this. Our test-suite in `dbus-broker` merely verifies what the current kernel behavior is (just like the kernel selftests). I am certainly ok if the kernel breaks it. I will gladly adapt the test-suite.
Previous discussion was in:
[PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
Note that this fix is particularly important in combination with `vm.memfd_noexec=2`, since this breaks existing user-space by enabling sealing on all memfds unconditionally. I also encourage backporting to stable kernels.
Also with vm.memfd_noexec=1. I think that problem must be addressed either with this patch, or with a new flag.
Regarding vm.memfd_noexec, on another topic. I think in addition to vm.memfd_noexec = 1 and 2, there still could be another state: 3
- =0: Do nothing.
- =1: Add MFD_NOEXEC_SEAL if the application didn't set MFD_EXEC or MFD_NOEXEC_SEAL (to help with the migration).
- =2: Reject all calls without MFD_NOEXEC_SEAL (the whole system doesn't allow executable memfds).
- =3: The application must set MFD_EXEC or MFD_NOEXEC_SEAL explicitly, or else the call is rejected.
3 is useful because it lets applications choose what to use, and forces applications to migrate to new semantics (this is what 2 did before 9876cfe8). The caveat is that 3 is less restrictive than 2, so this must be documented clearly.
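To make the four states easier to compare, here is a rough userspace model of them. It is a sketch, not kernel code: it follows the list exactly as written in this mail, which may differ from what any given kernel does. The function name `model_memfd_noexec`, the `DEMO_MAIN` macro, and the choice of `-EACCES` for rejected calls are assumptions for illustration; the flag values are the ones from the kernel uAPI header.

```c
#include <errno.h>
#include <stdio.h>

/* Flag values from the kernel uAPI header, for headers that lack them. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif

/*
 * Model of the proposed vm.memfd_noexec levels 0..3.
 * Returns the flags memfd_create() would continue with, or a negative
 * errno if the call would be rejected.
 */
long model_memfd_noexec(int level, unsigned int flags)
{
	unsigned int exec_bits = flags & (MFD_EXEC | MFD_NOEXEC_SEAL);

	/* Asking for both at once is rejected regardless of the sysctl. */
	if (exec_bits == (MFD_EXEC | MFD_NOEXEC_SEAL))
		return -EINVAL;

	switch (level) {
	case 0: /* do nothing */
		return (long)flags;
	case 1: /* default to MFD_NOEXEC_SEAL when neither flag is given */
		if (!exec_bits)
			flags |= MFD_NOEXEC_SEAL;
		return (long)flags;
	case 2: /* only MFD_NOEXEC_SEAL callers are accepted */
		if (!(flags & MFD_NOEXEC_SEAL))
			return -EACCES; /* assumed errno */
		return (long)flags;
	case 3: /* caller must choose explicitly */
		if (!exec_bits)
			return -EACCES; /* assumed errno */
		return (long)flags;
	default:
		return -EINVAL;
	}
}

#ifdef DEMO_MAIN /* hypothetical macro, only used by this sketch */
int main(void)
{
	for (int level = 0; level <= 3; level++)
		printf("level %d, flags 0 -> %ld\n",
		       level, model_memfd_noexec(level, 0));
	return 0;
}
#endif
```

In this model a legacy caller that passes neither flag is accepted at 0 and 1 but rejected at both 2 and 3, while a caller passing `MFD_EXEC` is rejected only at 2.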
As discussed at the time, "you must use this flag" is not a useful setting for a general purpose operating system because it explicitly disables backwards compatibility (breaking any application that was written in the past 10 years!) for no reason other than "new is better".
As I suggested when we fixed the semantics of vm.memfd_noexec, if you really want to block a particular flag from not being set, seccomp lets you do this incredibly easily without acting as a footgun for admins. Yes, vm.memfd_noexec can break programs that use executable memfds, but that is the point of the sysctl -- making vm.memfd_noexec break programs that don't use executable memfds (they are only guilty of being written before mid-2023) is not useful.
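As a rough illustration of the seccomp route, a classic-BPF filter that rejects memfd_create() calls carrying neither MFD_EXEC nor MFD_NOEXEC_SEAL could look something like this. It is a sketch under several assumptions: the names `memfd_flag_filter`, `install_memfd_flag_filter` and `DEMO_MAIN` are made up, `EPERM` is an arbitrary choice of error, the flags argument is read assuming a little-endian layout, and the architecture check that a real filter needs is left out for brevity.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Flag values from the kernel uAPI header, for headers that lack them. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif

/* NOTE: a production filter must also validate seccomp_data.arch. */
static struct sock_filter memfd_flag_filter[] = {
	/* [0] load the syscall number */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [1] anything other than memfd_create jumps to [5] (allow) */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_memfd_create, 0, 3),
	/* [2] load the low 32 bits of argument 1 (flags), little-endian */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, args[1])),
	/* [3] either flag set jumps to [5] (allow), else fall to [4] */
	BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, MFD_EXEC | MFD_NOEXEC_SEAL, 1, 0),
	/* [4] neither flag given: fail the call with EPERM */
	BPF_STMT(BPF_RET | BPF_K,
		 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	/* [5] allow */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Install the filter for the calling thread; returns 0 on success. */
int install_memfd_flag_filter(void)
{
	struct sock_fprog prog = {
		.len = sizeof(memfd_flag_filter) / sizeof(memfd_flag_filter[0]),
		.filter = memfd_flag_filter,
	};

	/* Needed so an unprivileged process may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

#ifdef DEMO_MAIN /* hypothetical macro, only used by this sketch */
int main(void)
{
	if (install_memfd_flag_filter() != 0) {
		perror("install filter");
		return 1;
	}
	long fd = syscall(__NR_memfd_create, "legacy", 0U);
	printf("memfd_create without exec flags: %ld (errno %d)\n", fd, errno);
	return 0;
}
#endif
```

Since a filter like this is installed per process tree, an admin could apply it only to the services that are meant to be strict, leaving the sysctl alone.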
In addition, making 3 less restrictive than 2 would make the original restriction mechanism useless. A malicious process could raise the setting to 3 and disable the "protection" (as discussed before, I really don't understand the threat model here, but making it easy to disable is pretty clearly a problem). You could change the policy, but now you're adding more complexity for a feature that IMO doesn't really make sense in the first place.
-Jeff
Reviewed-by: David Rheinsberg david@readahead.eu
Thanks David