From: Jeff Xu jeffxu@google.com
By default, memfd_create() creates a non-sealable MFD, unless the MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created with that flag is sealable, even though MFD_ALLOW_SEALING is not set. This patch changes MFD_NOEXEC_SEAL to be non-sealable by default, unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL is new, we expect not many applications will rely on the nature of MFD_NOEXEC_SEAL being sealable. In most cases, the application already sets MFD_ALLOW_SEALING if they need a sealable MFD.
Additionally, this enhances the useability of pid namespace sysctl vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD to be sealable. This means, any application that does not desire this behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to migrate/enforce non-executable MFD. This adjustment ensures that applications can anticipate that the sealable characteristic will remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás used Debian Code Search and GitHub to try to find potential breakages and could only find a single one. Dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to work around it [1]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[2]. In addition, David Rheinsberg also raised similar fix in [3]
[1]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8... [2]: https://github.com/bus1/dbus-broker/pull/366 [3]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
History ====== V2: update commit message. add testcase for vm.memfd_noexec add documentation.
V1: https://lore.kernel.org/lkml/20240513191544.94754-1-pobrn@protonmail.com/
Jeff Xu (2): memfd: fix MFD_NOEXEC_SEAL to be non-sealable by default memfd:add MEMFD_NOEXEC_SEAL documentation
Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/mfd_noexec.rst | 90 ++++++++++++++++++++++ mm/memfd.c | 9 +-- tools/testing/selftests/memfd/memfd_test.c | 26 ++++++- 4 files changed, 120 insertions(+), 6 deletions(-) create mode 100644 Documentation/userspace-api/mfd_noexec.rst
From: Jeff Xu jeffxu@google.com
By default, memfd_create() creates a non-sealable MFD, unless the MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created with that flag is sealable, even though MFD_ALLOW_SEALING is not set. This patch changes MFD_NOEXEC_SEAL to be non-sealable by default, unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL is new, we expect not many applications will rely on the nature of MFD_NOEXEC_SEAL being sealable. In most cases, the application already sets MFD_ALLOW_SEALING if they need a sealable MFD.
Additionally, this enhances the useability of pid namespace sysctl vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD to be sealable. This means, any application that does not desire this behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to migrate/enforce non-executable MFD. This adjustment ensures that applications can anticipate that the sealable characteristic will remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás used Debian Code Search and GitHub to try to find potential breakages and could only find a single one. Dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to work around it [1]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[2]. In addition, David Rheinsberg also raised similar fix in [3]
[1]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8... [2]: https://github.com/bus1/dbus-broker/pull/366 [3]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
Cc: stable@vger.kernel.org Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Barnabás Pőcze pobrn@protonmail.com Signed-off-by: Jeff Xu jeffxu@google.com Reviewed-by: David Rheinsberg david@readahead.eu --- mm/memfd.c | 9 ++++---- tools/testing/selftests/memfd/memfd_test.c | 26 +++++++++++++++++++++- 2 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/mm/memfd.c b/mm/memfd.c index 7d8d3ab3fa37..8b7f6afee21d 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111; file_seals = memfd_file_seals_ptr(file); - if (file_seals) { - *file_seals &= ~F_SEAL_SEAL; + if (file_seals) *file_seals |= F_SEAL_EXEC; - } - } else if (flags & MFD_ALLOW_SEALING) { - /* MFD_EXEC and MFD_ALLOW_SEALING are set */ + } + + if (flags & MFD_ALLOW_SEALING) { file_seals = memfd_file_seals_ptr(file); if (file_seals) *file_seals &= ~F_SEAL_SEAL; diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index 95af2d78fd31..8579a93d006b 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -1151,7 +1151,7 @@ static void test_noexec_seal(void) mfd_def_size, MFD_CLOEXEC | MFD_NOEXEC_SEAL); mfd_assert_mode(fd, 0666); - mfd_assert_has_seals(fd, F_SEAL_EXEC); + mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC); mfd_fail_chmod(fd, 0777); close(fd); } @@ -1169,6 +1169,14 @@ static void test_sysctl_sysctl0(void) mfd_assert_has_seals(fd, 0); mfd_assert_chmod(fd, 0644); close(fd); + + fd = mfd_assert_new("kern_memfd_sysctl_0_dfl", + mfd_def_size, + MFD_CLOEXEC); + mfd_assert_mode(fd, 0777); + mfd_assert_has_seals(fd, F_SEAL_SEAL); + mfd_assert_chmod(fd, 0644); + close(fd); }
static void test_sysctl_set_sysctl0(void) @@ -1206,6 +1214,14 @@ static void test_sysctl_sysctl1(void) mfd_assert_has_seals(fd, F_SEAL_EXEC); mfd_fail_chmod(fd, 0777); close(fd); + + fd = mfd_assert_new("kern_memfd_sysctl_1_noexec_nosealable", + mfd_def_size, + MFD_CLOEXEC | MFD_NOEXEC_SEAL); + mfd_assert_mode(fd, 0666); + mfd_assert_has_seals(fd, F_SEAL_EXEC | F_SEAL_SEAL); + mfd_fail_chmod(fd, 0777); + close(fd); }
static void test_sysctl_set_sysctl1(void) @@ -1238,6 +1254,14 @@ static void test_sysctl_sysctl2(void) mfd_assert_has_seals(fd, F_SEAL_EXEC); mfd_fail_chmod(fd, 0777); close(fd); + + fd = mfd_assert_new("kern_memfd_sysctl_2_noexec_notsealable", + mfd_def_size, + MFD_CLOEXEC | MFD_NOEXEC_SEAL); + mfd_assert_mode(fd, 0666); + mfd_assert_has_seals(fd, F_SEAL_EXEC | F_SEAL_SEAL); + mfd_fail_chmod(fd, 0777); + close(fd); }
static void test_sysctl_set_sysctl2(void)
Hi
On Fri, May 24, 2024, at 5:39 AM, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@google.com
By default, memfd_create() creates a non-sealable MFD, unless the MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created with that flag is sealable, even though MFD_ALLOW_SEALING is not set. This patch changes MFD_NOEXEC_SEAL to be non-sealable by default, unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL is new, we expect not many applications will rely on the nature of MFD_NOEXEC_SEAL being sealable. In most cases, the application already sets MFD_ALLOW_SEALING if they need a sealable MFD.
This does not really reflect the effort that went into this. Shouldn't this be something along the lines of:
This is a non-backward compatible change. However, MFD_NOEXEC_SEAL was only recently introduced and a codesearch revealed no breaking users apart from dbus-broker unit-tests (which have a patch pending and explicitly support this change).
Additionally, this enhances the useability of pid namespace sysctl vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD to be sealable. This means, any application that does not desire this behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to migrate/enforce non-executable MFD. This adjustment ensures that applications can anticipate that the sealable characteristic will remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás used Debian Code Search and GitHub to try to find potential breakages and could only find a single one. Dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to work around it [1]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[2]. In addition, David Rheinsberg also raised similar fix in [3]
Cc: stable@vger.kernel.org Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Barnabás Pőcze pobrn@protonmail.com Signed-off-by: Jeff Xu jeffxu@google.com Reviewed-by: David Rheinsberg david@readahead.eu
Looks good! Thanks! David
Hi David and Barnabás
On Fri, May 24, 2024 at 7:15 AM David Rheinsberg david@readahead.eu wrote:
Hi
On Fri, May 24, 2024, at 5:39 AM, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@google.com
By default, memfd_create() creates a non-sealable MFD, unless the MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created with that flag is sealable, even though MFD_ALLOW_SEALING is not set. This patch changes MFD_NOEXEC_SEAL to be non-sealable by default, unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL is new, we expect not many applications will rely on the nature of MFD_NOEXEC_SEAL being sealable. In most cases, the application already sets MFD_ALLOW_SEALING if they need a sealable MFD.
This does not really reflect the effort that went into this. Shouldn't this be something along the lines of:
This is a non-backward compatible change. However, MFD_NOEXEC_SEAL was only recently introduced and a codesearch revealed no breaking users apart from dbus-broker unit-tests (which have a patch pending and explicitly support this change).
Actually, I think we might need to hold on to this change. With debian code search, I found more codes that already use MFD_NOEXEC_SEAL without MFD_ALLOW_SEALING. e.g. systemd [1], [2] [3]
I'm not sure if this will break more applications not-knowingly that have started relying on MFD_NOEXEC_SEAL being sealable. The feature has been out for more than a year.
Would you consider my augments in [4] to make MFD to be sealable by default ?
At this moment, I'm willing to add a document to clarify that MFD_NOEXEC_SEAL is sealable by default, and that an app that needs non-sealable MFD can set SEAL_SEAL. Because both MFD_NOEXEC_SEAL and vm.memfd_noexec are new, I don't think it breaks the existing ABI, and vm.memfd_noexec=0 is there for backward compatibility reasons. Besides, I honestly think there is little reason that MFD needs to be non-sealable by default. There might be few rare cases, but the majority of apps don't need that. On the flip side, the fact that MFD is set up to be sealable by default is a nice bonus for an app - it makes it easier for apps to use the sealing feature.
What do you think ?
Thanks -Jeff
[1] https://codesearch.debian.net/search?q=MFD_NOEXEC_SEAL [2] https://codesearch.debian.net/show?file=systemd_256~rc3-5%2Fsrc%2Fhome%2Fhom... [3] https://sources.debian.org/src/elogind/255.5-1debian1/src/shared/serialize.c... [4] https://lore.kernel.org/lkml/CALmYWFuPBEM2DE97mQvB2eEgSO9Dvt=uO9OewMhGfhGCY6...
Additionally, this enhances the useability of pid namespace sysctl vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD to be sealable. This means, any application that does not desire this behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to migrate/enforce non-executable MFD. This adjustment ensures that applications can anticipate that the sealable characteristic will remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás used Debian Code Search and GitHub to try to find potential breakages and could only find a single one. Dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to work around it [1]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[2]. In addition, David Rheinsberg also raised similar fix in [3]
Cc: stable@vger.kernel.org Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Barnabás Pőcze pobrn@protonmail.com Signed-off-by: Jeff Xu jeffxu@google.com Reviewed-by: David Rheinsberg david@readahead.eu
Looks good! Thanks! David
Hi
2024. május 29., szerda 23:30 keltezéssel, Jeff Xu jeffxu@google.com írta:
Hi David and Barnabás
On Fri, May 24, 2024 at 7:15 AM David Rheinsberg david@readahead.eu wrote:
Hi
On Fri, May 24, 2024, at 5:39 AM, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@google.com
By default, memfd_create() creates a non-sealable MFD, unless the MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created with that flag is sealable, even though MFD_ALLOW_SEALING is not set. This patch changes MFD_NOEXEC_SEAL to be non-sealable by default, unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL is new, we expect not many applications will rely on the nature of MFD_NOEXEC_SEAL being sealable. In most cases, the application already sets MFD_ALLOW_SEALING if they need a sealable MFD.
This does not really reflect the effort that went into this. Shouldn't this be something along the lines of:
This is a non-backward compatible change. However, MFD_NOEXEC_SEAL was only recently introduced and a codesearch revealed no breaking users apart from dbus-broker unit-tests (which have a patch pending and explicitly support this change).
Actually, I think we might need to hold on to this change. With debian code search, I found more codes that already use MFD_NOEXEC_SEAL without MFD_ALLOW_SEALING. e.g. systemd [1], [2] [3]
Yes, I have looked at those as well, and as far as I could tell, they are not affected. Have I missed something?
Regards, Barnabás
I'm not sure if this will break more applications not-knowingly that have started relying on MFD_NOEXEC_SEAL being sealable. The feature has been out for more than a year.
Would you consider my augments in [4] to make MFD to be sealable by default ?
At this moment, I'm willing to add a document to clarify that MFD_NOEXEC_SEAL is sealable by default, and that an app that needs non-sealable MFD can set SEAL_SEAL. Because both MFD_NOEXEC_SEAL and vm.memfd_noexec are new, I don't think it breaks the existing ABI, and vm.memfd_noexec=0 is there for backward compatibility reasons. Besides, I honestly think there is little reason that MFD needs to be non-sealable by default. There might be few rare cases, but the majority of apps don't need that. On the flip side, the fact that MFD is set up to be sealable by default is a nice bonus for an app - it makes it easier for apps to use the sealing feature.
What do you think ?
Thanks -Jeff
[1] https://codesearch.debian.net/search?q=MFD_NOEXEC_SEAL [2] https://codesearch.debian.net/show?file=systemd_256~rc3-5%2Fsrc%2Fhome%2Fhom... [3] https://sources.debian.org/src/elogind/255.5-1debian1/src/shared/serialize.c... [4] https://lore.kernel.org/lkml/CALmYWFuPBEM2DE97mQvB2eEgSO9Dvt=uO9OewMhGfhGCY6...
Additionally, this enhances the useability of pid namespace sysctl vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD to be sealable. This means, any application that does not desire this behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to migrate/enforce non-executable MFD. This adjustment ensures that applications can anticipate that the sealable characteristic will remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás used Debian Code Search and GitHub to try to find potential breakages and could only find a single one. Dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to work around it [1]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[2]. In addition, David Rheinsberg also raised similar fix in [3]
Cc: stable@vger.kernel.org Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Barnabás Pőcze pobrn@protonmail.com Signed-off-by: Jeff Xu jeffxu@google.com Reviewed-by: David Rheinsberg david@readahead.eu
Looks good! Thanks! David
On Wed, May 29, 2024 at 2:46 PM Barnabás Pőcze pobrn@protonmail.com wrote:
Hi
- május 29., szerda 23:30 keltezéssel, Jeff Xu jeffxu@google.com írta:
Hi David and Barnabás
On Fri, May 24, 2024 at 7:15 AM David Rheinsberg david@readahead.eu wrote:
Hi
On Fri, May 24, 2024, at 5:39 AM, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@google.com
By default, memfd_create() creates a non-sealable MFD, unless the MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created with that flag is sealable, even though MFD_ALLOW_SEALING is not set. This patch changes MFD_NOEXEC_SEAL to be non-sealable by default, unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL is new, we expect not many applications will rely on the nature of MFD_NOEXEC_SEAL being sealable. In most cases, the application already sets MFD_ALLOW_SEALING if they need a sealable MFD.
This does not really reflect the effort that went into this. Shouldn't this be something along the lines of:
This is a non-backward compatible change. However, MFD_NOEXEC_SEAL was only recently introduced and a codesearch revealed no breaking users apart from dbus-broker unit-tests (which have a patch pending and explicitly support this change).
Actually, I think we might need to hold on to this change. With debian code search, I found more codes that already use MFD_NOEXEC_SEAL without MFD_ALLOW_SEALING. e.g. systemd [1], [2] [3]
Yes, I have looked at those as well, and as far as I could tell, they are not affected. Have I missed something?
In the example, the MFD was created then passed into somewhere else (safe_fork_full, open_serialization_fd, etc.), the scope and usage of mfd isn't that clear to me, you might have checked all the user cases. In addition, MFD_NOEXEC_SEAL exists in libc and rust and go lib. I don't know if debian code search is sufficient to cover enough apps . There is a certain risk.
Fundamentally, I'm not convinced that making MFD default-non-sealable has meaningful benefit, especially when MFD_NOEXEC_SEAL is new.
Regards, Barnabás
I'm not sure if this will break more applications not-knowingly that have started relying on MFD_NOEXEC_SEAL being sealable. The feature has been out for more than a year.
Would you consider my augments in [4] to make MFD to be sealable by default ?
At this moment, I'm willing to add a document to clarify that MFD_NOEXEC_SEAL is sealable by default, and that an app that needs non-sealable MFD can set SEAL_SEAL. Because both MFD_NOEXEC_SEAL and vm.memfd_noexec are new, I don't think it breaks the existing ABI, and vm.memfd_noexec=0 is there for backward compatibility reasons. Besides, I honestly think there is little reason that MFD needs to be non-sealable by default. There might be few rare cases, but the majority of apps don't need that. On the flip side, the fact that MFD is set up to be sealable by default is a nice bonus for an app - it makes it easier for apps to use the sealing feature.
What do you think ?
Thanks -Jeff
[1] https://codesearch.debian.net/search?q=MFD_NOEXEC_SEAL [2] https://codesearch.debian.net/show?file=systemd_256~rc3-5%2Fsrc%2Fhome%2Fhom... [3] https://sources.debian.org/src/elogind/255.5-1debian1/src/shared/serialize.c... [4] https://lore.kernel.org/lkml/CALmYWFuPBEM2DE97mQvB2eEgSO9Dvt=uO9OewMhGfhGCY6...
Additionally, this enhances the useability of pid namespace sysctl vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD to be sealable. This means, any application that does not desire this behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to migrate/enforce non-executable MFD. This adjustment ensures that applications can anticipate that the sealable characteristic will remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás used Debian Code Search and GitHub to try to find potential breakages and could only find a single one. Dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to work around it [1]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[2]. In addition, David Rheinsberg also raised similar fix in [3]
Cc: stable@vger.kernel.org Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Barnabás Pőcze pobrn@protonmail.com Signed-off-by: Jeff Xu jeffxu@google.com Reviewed-by: David Rheinsberg david@readahead.eu
Looks good! Thanks! David
2024. május 30., csütörtök 0:24 keltezéssel, Jeff Xu jeffxu@google.com írta:
On Wed, May 29, 2024 at 2:46 PM Barnabás Pőcze pobrn@protonmail.com wrote:
Hi
- május 29., szerda 23:30 keltezéssel, Jeff Xu jeffxu@google.com írta:
Hi David and Barnabás
On Fri, May 24, 2024 at 7:15 AM David Rheinsberg david@readahead.eu wrote:
Hi
On Fri, May 24, 2024, at 5:39 AM, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@google.com
By default, memfd_create() creates a non-sealable MFD, unless the MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created with that flag is sealable, even though MFD_ALLOW_SEALING is not set. This patch changes MFD_NOEXEC_SEAL to be non-sealable by default, unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL is new, we expect not many applications will rely on the nature of MFD_NOEXEC_SEAL being sealable. In most cases, the application already sets MFD_ALLOW_SEALING if they need a sealable MFD.
This does not really reflect the effort that went into this. Shouldn't this be something along the lines of:
This is a non-backward compatible change. However, MFD_NOEXEC_SEAL was only recently introduced and a codesearch revealed no breaking users apart from dbus-broker unit-tests (which have a patch pending and explicitly support this change).
Actually, I think we might need to hold on to this change. With debian code search, I found more codes that already use MFD_NOEXEC_SEAL without MFD_ALLOW_SEALING. e.g. systemd [1], [2] [3]
Yes, I have looked at those as well, and as far as I could tell, they are not affected. Have I missed something?
In the example, the MFD was created then passed into somewhere else (safe_fork_full, open_serialization_fd, etc.), the scope and usage of mfd isn't that clear to me, you might have checked all the user cases. In addition, MFD_NOEXEC_SEAL exists in libc and rust and go lib. I don't know if debian code search is sufficient to cover enough apps . There is a certain risk.
Fundamentally, I'm not convinced that making MFD default-non-sealable has meaningful benefit, especially when MFD_NOEXEC_SEAL is new.
Certainly, there is always a risk, I did not mean to imply that there isn't. However, I believe this risk is low enough to at least warrant an attempt at eliminating this inconsistency. It can always be reverted if it turns out that the effects have been vastly underestimated by me.
So I would still like to see this change merged.
Regards, Barnabás Pőcze
Regards, Barnabás
I'm not sure if this will break more applications not-knowingly that have started relying on MFD_NOEXEC_SEAL being sealable. The feature has been out for more than a year.
Would you consider my augments in [4] to make MFD to be sealable by default ?
At this moment, I'm willing to add a document to clarify that MFD_NOEXEC_SEAL is sealable by default, and that an app that needs non-sealable MFD can set SEAL_SEAL. Because both MFD_NOEXEC_SEAL and vm.memfd_noexec are new, I don't think it breaks the existing ABI, and vm.memfd_noexec=0 is there for backward compatibility reasons. Besides, I honestly think there is little reason that MFD needs to be non-sealable by default. There might be few rare cases, but the majority of apps don't need that. On the flip side, the fact that MFD is set up to be sealable by default is a nice bonus for an app - it makes it easier for apps to use the sealing feature.
What do you think ?
Thanks -Jeff
[1] https://codesearch.debian.net/search?q=MFD_NOEXEC_SEAL [2] https://codesearch.debian.net/show?file=systemd_256~rc3-5%2Fsrc%2Fhome%2Fhom... [3] https://sources.debian.org/src/elogind/255.5-1debian1/src/shared/serialize.c... [4] https://lore.kernel.org/lkml/CALmYWFuPBEM2DE97mQvB2eEgSO9Dvt=uO9OewMhGfhGCY6...
Additionally, this enhances the useability of pid namespace sysctl vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD to be sealable. This means, any application that does not desire this behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to migrate/enforce non-executable MFD. This adjustment ensures that applications can anticipate that the sealable characteristic will remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás used Debian Code Search and GitHub to try to find potential breakages and could only find a single one. Dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to work around it [1]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[2]. In addition, David Rheinsberg also raised similar fix in [3]
Cc: stable@vger.kernel.org Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Barnabás Pőcze pobrn@protonmail.com Signed-off-by: Jeff Xu jeffxu@google.com Reviewed-by: David Rheinsberg david@readahead.eu
Looks good! Thanks! David
Hi Barnabás
On Fri, May 31, 2024 at 11:56 AM Barnabás Pőcze pobrn@protonmail.com wrote:
- május 30., csütörtök 0:24 keltezéssel, Jeff Xu jeffxu@google.com írta:
On Wed, May 29, 2024 at 2:46 PM Barnabás Pőcze pobrn@protonmail.com wrote:
Hi
- május 29., szerda 23:30 keltezéssel, Jeff Xu jeffxu@google.com írta:
Hi David and Barnabás
On Fri, May 24, 2024 at 7:15 AM David Rheinsberg david@readahead.eu wrote:
Hi
On Fri, May 24, 2024, at 5:39 AM, jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@google.com
By default, memfd_create() creates a non-sealable MFD, unless the MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created with that flag is sealable, even though MFD_ALLOW_SEALING is not set. This patch changes MFD_NOEXEC_SEAL to be non-sealable by default, unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL is new, we expect not many applications will rely on the nature of MFD_NOEXEC_SEAL being sealable. In most cases, the application already sets MFD_ALLOW_SEALING if they need a sealable MFD.
This does not really reflect the effort that went into this. Shouldn't this be something along the lines of:
This is a non-backward compatible change. However, MFD_NOEXEC_SEAL was only recently introduced and a codesearch revealed no breaking users apart from dbus-broker unit-tests (which have a patch pending and explicitly support this change).
Actually, I think we might need to hold on to this change. With debian code search, I found more codes that already use MFD_NOEXEC_SEAL without MFD_ALLOW_SEALING. e.g. systemd [1], [2] [3]
Yes, I have looked at those as well, and as far as I could tell, they are not affected. Have I missed something?
In the example, the MFD was created then passed into somewhere else (safe_fork_full, open_serialization_fd, etc.), the scope and usage of mfd isn't that clear to me, you might have checked all the user cases. In addition, MFD_NOEXEC_SEAL exists in libc and rust and go lib. I don't know if debian code search is sufficient to cover enough apps . There is a certain risk.
Fundamentally, I'm not convinced that making MFD default-non-sealable has meaningful benefit, especially when MFD_NOEXEC_SEAL is new.
Certainly, there is always a risk, I did not mean to imply that there isn't. However, I believe this risk is low enough to at least warrant an attempt at eliminating this inconsistency. It can always be reverted if it turns out that the effects have been vastly underestimated by me.
So I would still like to see this change merged.
The MFD_NOEXEC_SEAL is a new flag, technically, ABI is not broken. The sysctl vm.memfd_noexec=1 or 2, is meant to help migration/enforcement of MFD_NOEXEC_SEAL, so it will break application if it is used pre-maturely, that is by-design.
I think the main problem here is lack of documentation, instead of a code bug. ABI change shouldn't be treated lightly, given the risk, I would like to keep the API the same and add the documentation instead. I think that is the best route forward.
Best Regards, -Jeff
Regards, Barnabás Pőcze
Regards, Barnabás
I'm not sure if this will break more applications not-knowingly that have started relying on MFD_NOEXEC_SEAL being sealable. The feature has been out for more than a year.
Would you consider my augments in [4] to make MFD to be sealable by default ?
At this moment, I'm willing to add a document to clarify that MFD_NOEXEC_SEAL is sealable by default, and that an app that needs non-sealable MFD can set SEAL_SEAL. Because both MFD_NOEXEC_SEAL and vm.memfd_noexec are new, I don't think it breaks the existing ABI, and vm.memfd_noexec=0 is there for backward compatibility reasons. Besides, I honestly think there is little reason that MFD needs to be non-sealable by default. There might be few rare cases, but the majority of apps don't need that. On the flip side, the fact that MFD is set up to be sealable by default is a nice bonus for an app - it makes it easier for apps to use the sealing feature.
What do you think ?
Thanks -Jeff
[1] https://codesearch.debian.net/search?q=MFD_NOEXEC_SEAL [2] https://codesearch.debian.net/show?file=systemd_256~rc3-5%2Fsrc%2Fhome%2Fhom... [3] https://sources.debian.org/src/elogind/255.5-1debian1/src/shared/serialize.c... [4] https://lore.kernel.org/lkml/CALmYWFuPBEM2DE97mQvB2eEgSO9Dvt=uO9OewMhGfhGCY6...
Additionally, this enhances the useability of pid namespace sysctl vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD to be sealable. This means, any application that does not desire this behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to migrate/enforce non-executable MFD. This adjustment ensures that applications can anticipate that the sealable characteristic will remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás used Debian Code Search and GitHub to try to find potential breakages and could only find a single one. Dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to work around it [1]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[2]. In addition, David Rheinsberg also raised similar fix in [3]
Cc: stable@vger.kernel.org Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Barnabás Pőcze pobrn@protonmail.com Signed-off-by: Jeff Xu jeffxu@google.com Reviewed-by: David Rheinsberg david@readahead.eu
Looks good! Thanks! David
From: Jeff Xu jeffxu@google.com
Add documentation for MFD_NOEXEC_SEAL and MFD_EXEC
Cc: stable@vger.kernel.org Signed-off-by: Jeff Xu jeffxu@google.com --- Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/mfd_noexec.rst | 90 ++++++++++++++++++++++ 2 files changed, 91 insertions(+) create mode 100644 Documentation/userspace-api/mfd_noexec.rst
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index 5926115ec0ed..8a251d71fa6e 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -32,6 +32,7 @@ Security-related interfaces seccomp_filter landlock lsm + mfd_noexec spec_ctrl tee
diff --git a/Documentation/userspace-api/mfd_noexec.rst b/Documentation/userspace-api/mfd_noexec.rst new file mode 100644 index 000000000000..6f11ad86b076 --- /dev/null +++ b/Documentation/userspace-api/mfd_noexec.rst @@ -0,0 +1,90 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================== +Introduction of non executable mfd +================================== +:Author: + Daniel Verkamp dverkamp@chromium.org + Jeff Xu jeffxu@google.com + +:Contributor: + Aleksa Sarai cyphar@cyphar.com + Barnabás Pőcze pobrn@protonmail.com + David Rheinsberg david@readahead.eu + +Since Linux introduced the memfd feature, memfd have always had their +execute bit set, and the memfd_create() syscall doesn't allow setting +it differently. + +However, in a secure by default system, such as ChromeOS, (where all +executables should come from the rootfs, which is protected by Verified +boot), this executable nature of memfd opens a door for NoExec bypass +and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm +process created a memfd to share the content with an external process, +however the memfd is overwritten and used for executing arbitrary code +and root escalation. [2] lists more VRP in this kind. + +On the other hand, executable memfd has its legit use, runc uses memfd’s +seal and executable feature to copy the contents of the binary then +execute them, for such system, we need a solution to differentiate runc's +use of executable memfds and an attacker's [3]. + +To address those above. + - Let memfd_create() set X bit at creation time. + - Let memfd be sealed for modifying X bit when NX is set. + - A new pid namespace sysctl: vm.memfd_noexec to help applications to + migrating and enforcing non-executable MFD. + +User API +======== +``int memfd_create(const char *name, unsigned int flags)`` + +``MFD_NOEXEC_SEAL`` + When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created + with NX. F_SEAL_EXEC is set and the memfd can't be modified to + add X later. + This is the most common case for the application to use memfd. + +``MFD_EXEC`` + When MFD_EXEC bit is set in the ``flags``, memfd is created with X. + +Note: + ``MFD_NOEXEC_SEAL`` and ``MFD_EXEC`` doesn't change the sealable + characteristic of memfd, which is controlled by ``MFD_ALLOW_SEALING``. + + +Sysctl: +======== +``pid namespaced sysctl vm.memfd_noexec`` + +The new pid namespaced sysctl vm.memfd_noexec has 3 values: + + - 0: MEMFD_NOEXEC_SCOPE_EXEC + memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like + MFD_EXEC was set. + + - 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL + memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like + MFD_NOEXEC_SEAL was set. + + - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED + memfd_create() without MFD_NOEXEC_SEAL will be rejected. + +The sysctl allows finer control of memfd_create for old-software that +doesn't set the executable bit, for example, a container with +vm.memfd_noexec=1 means the old-software will create non-executable memfd +by default while new-software can create executable memfd by setting +MFD_EXEC. + +The value of memfd_noexec is passed to child namespace at creation time, +in addition, the setting is hierarchical, i.e. during memfd_create, +we will search from current ns to root ns and use the most restrictive +setting. + +Reference: +========== +[1] https://crbug.com/1305267 + +[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20mem... + +[3] https://lwn.net/Articles/781013/
Hi Aleksa
On Thu, May 23, 2024 at 8:39 PM jeffxu@chromium.org wrote:
From: Jeff Xu jeffxu@google.com
Add documentation for MFD_NOEXEC_SEAL and MFD_EXEC
Cc: stable@vger.kernel.org Signed-off-by: Jeff Xu jeffxu@google.com
Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/mfd_noexec.rst | 90 ++++++++++++++++++++++ 2 files changed, 91 insertions(+) create mode 100644 Documentation/userspace-api/mfd_noexec.rst
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index 5926115ec0ed..8a251d71fa6e 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -32,6 +32,7 @@ Security-related interfaces seccomp_filter landlock lsm
- mfd_noexec spec_ctrl tee
diff --git a/Documentation/userspace-api/mfd_noexec.rst b/Documentation/userspace-api/mfd_noexec.rst new file mode 100644 index 000000000000..6f11ad86b076 --- /dev/null +++ b/Documentation/userspace-api/mfd_noexec.rst @@ -0,0 +1,90 @@ +.. SPDX-License-Identifier: GPL-2.0
+================================== +Introduction of non executable mfd +================================== +:Author:
- Daniel Verkamp dverkamp@chromium.org
- Jeff Xu jeffxu@google.com
+:Contributor:
Aleksa Sarai <cyphar@cyphar.com>
Barnabás Pőcze <pobrn@protonmail.com>
David Rheinsberg <david@readahead.eu>
+Since Linux introduced the memfd feature, memfd have always had their +execute bit set, and the memfd_create() syscall doesn't allow setting +it differently.
+However, in a secure by default system, such as ChromeOS, (where all +executables should come from the rootfs, which is protected by Verified +boot), this executable nature of memfd opens a door for NoExec bypass +and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm +process created a memfd to share the content with an external process, +however the memfd is overwritten and used for executing arbitrary code +and root escalation. [2] lists more VRP in this kind.
+On the other hand, executable memfd has its legit use, runc uses memfd’s +seal and executable feature to copy the contents of the binary then +execute them, for such system, we need a solution to differentiate runc's +use of executable memfds and an attacker's [3].
+To address those above.
- Let memfd_create() set X bit at creation time.
- Let memfd be sealed for modifying X bit when NX is set.
- A new pid namespace sysctl: vm.memfd_noexec to help applications to
- migrating and enforcing non-executable MFD.
+User API +======== +``int memfd_create(const char *name, unsigned int flags)``
+``MFD_NOEXEC_SEAL``
When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created
with NX. F_SEAL_EXEC is set and the memfd can't be modified to
add X later.
This is the most common case for the application to use memfd.
+``MFD_EXEC``
When MFD_EXEC bit is set in the ``flags``, memfd is created with X.
+Note:
``MFD_NOEXEC_SEAL`` and ``MFD_EXEC`` doesn't change the sealable
characteristic of memfd, which is controlled by ``MFD_ALLOW_SEALING``.
+Sysctl: +======== +``pid namespaced sysctl vm.memfd_noexec``
+The new pid namespaced sysctl vm.memfd_noexec has 3 values:
- 0: MEMFD_NOEXEC_SCOPE_EXEC
memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
MFD_EXEC was set.
- 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
MFD_NOEXEC_SEAL was set.
- 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
memfd_create() without MFD_NOEXEC_SEAL will be rejected.
+The sysctl allows finer control of memfd_create for old-software that +doesn't set the executable bit, for example, a container with +vm.memfd_noexec=1 means the old-software will create non-executable memfd +by default while new-software can create executable memfd by setting +MFD_EXEC.
+The value of memfd_noexec is passed to child namespace at creation time, +in addition, the setting is hierarchical, i.e. during memfd_create, +we will search from current ns to root ns and use the most restrictive +setting.
Can you please help to review the sysctl part to check if I captured your change correctly ?
Thanks -Jeff
+Reference: +========== +[1] https://crbug.com/1305267
+[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20mem...
+[3] https://lwn.net/Articles/781013/
2.45.1.288.g0e0cd299f1-goog
linux-kselftest-mirror@lists.linaro.org