This series is based on torvalds/master, but additionally the run_vmtests.sh changes assume my refactor [1] has been applied first.
The series is split up like so:

- Patch 1 is a simple fixup which we should take in any case (even by itself).
- Patches 2-4 add the feature, basic support for it to the selftest, and docs.
- Patches 5-6 make the selftest configurable, so you can test one or the other
  instead of always both. If we decide this is overcomplicated, we could just
  drop these two patches and take the rest of the series.
[1]: https://patchwork.kernel.org/project/linux-mm/patch/20220421224928.1848230-1...
Changelog:

v1->v2:
  - Add documentation update.
  - Test *both* userfaultfd(2) and /dev/userfaultfd via the selftest.
Axel Rasmussen (6):
  selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh
  userfaultfd: add /dev/userfaultfd for fine grained access control
  userfaultfd: selftests: modify selftest to use /dev/userfaultfd
  userfaultfd: update documentation to describe /dev/userfaultfd
  userfaultfd: selftests: make /dev/userfaultfd testing configurable
  selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh

 Documentation/admin-guide/mm/userfaultfd.rst | 38 +++++++++-
 Documentation/admin-guide/sysctl/vm.rst      |  3 +
 fs/userfaultfd.c                             | 79 ++++++++++++++++----
 include/uapi/linux/userfaultfd.h             |  4 +
 tools/testing/selftests/vm/run_vmtests.sh    | 11 ++-
 tools/testing/selftests/vm/userfaultfd.c     | 60 +++++++++++++--
 6 files changed, 170 insertions(+), 25 deletions(-)

--
2.36.0.rc2.479.g8af0fa9b8e-goog
This not being included was just a simple oversight. There are certain features (like minor fault support) which are only enabled on shared mappings, so without including hugetlb_shared we actually lose a significant amount of test coverage.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/run_vmtests.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index a2302b5faaf2..5065dbd89bdb 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -121,9 +121,11 @@ run_test ./gup_test -a
 run_test ./gup_test -ct -F 0x1 0 19 0x1000
 
 run_test ./userfaultfd anon 20 16
-# Test requires source and destination huge pages. Size of source
-# (half_ufd_size_MB) is passed as argument to test.
+# Hugetlb tests require source and destination huge pages. Pass in half the
+# size ($half_ufd_size_MB), which is used for *each*.
 run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
+run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+rm -f "$mnt"/uffd-test
 run_test ./userfaultfd shmem 20 16
 
 #cleanup
On 4/22/22 3:29 PM, Axel Rasmussen wrote:
[...]
Looks good to me.
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
thanks, -- Shuah
On Fri, Apr 22, 2022 at 02:29:40PM -0700, Axel Rasmussen wrote:
This not being included was just a simple oversight. There are certain features (like minor fault support) which are only enabled on shared mappings, so without including hugetlb_shared we actually lose a significant amount of test coverage.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
Reviewed-by: Peter Xu <peterx@redhat.com>
Historically, it has been shown that intercepting kernel faults with userfaultfd (thereby forcing the kernel to wait for an arbitrary amount of time) can be exploited, or at least can make some kinds of exploits easier. So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we changed things so, in order for kernel faults to be handled by userfaultfd, either the process needs CAP_SYS_PTRACE, or this sysctl must be configured so that any unprivileged user can do it.
In a typical implementation of a hypervisor with live migration (take QEMU/KVM as one such example), we do indeed need to be able to handle kernel faults. But, both options above are less than ideal:
- Toggling the sysctl increases attack surface by allowing any unprivileged user to do it.
- Granting the live migration process CAP_SYS_PTRACE gives it this ability, but *also* the ability to "observe and control the execution of another process [...], and examine and change [its] memory and registers" (from ptrace(2)). This isn't something we need or want to be able to do, so granting this permission violates the "principle of least privilege".
This is all a long winded way to say: we want a more fine-grained way to grant access to userfaultfd, without granting other additional permissions at the same time.
To achieve this, add a /dev/userfaultfd misc device. This device provides an alternative to the userfaultfd(2) syscall for the creation of new userfaultfds. The idea is, any userfaultfds created this way will be able to handle kernel faults, without the caller having any special capabilities. Access to this mechanism is instead restricted using e.g. standard filesystem permissions.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 fs/userfaultfd.c                 | 79 ++++++++++++++++++++++++++------
 include/uapi/linux/userfaultfd.h |  4 ++
 2 files changed, 69 insertions(+), 14 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index aa0c47cb0d16..16d7573ab41a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -29,6 +29,7 @@
 #include <linux/ioctl.h>
 #include <linux/security.h>
 #include <linux/hugetlb.h>
+#include <linux/miscdevice.h>
 
 int sysctl_unprivileged_userfaultfd __read_mostly;
 
@@ -65,6 +66,8 @@ struct userfaultfd_ctx {
 	unsigned int flags;
 	/* features requested from the userspace */
 	unsigned int features;
+	/* whether or not to handle kernel faults */
+	bool handle_kernel_faults;
 	/* released */
 	bool released;
 	/* memory mappings are changing because of non-cooperative event */
@@ -410,13 +413,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
-	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
-	    ctx->flags & UFFD_USER_MODE_ONLY) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
+	if (!(vmf->flags & FAULT_FLAG_USER) && !ctx->handle_kernel_faults)
 		goto out;
-	}
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -2064,19 +2062,33 @@ static void init_once_userfaultfd_ctx(void *mem)
 	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
-SYSCALL_DEFINE1(userfaultfd, int, flags)
+static inline bool userfaultfd_allowed(bool is_syscall, int flags)
+{
+	bool kernel_faults = !(flags & UFFD_USER_MODE_ONLY);
+	bool allow_unprivileged = sysctl_unprivileged_userfaultfd;
+
+	/* userfaultfd(2) access is controlled by sysctl + capability. */
+	if (is_syscall && kernel_faults) {
+		if (!allow_unprivileged && !capable(CAP_SYS_PTRACE))
+			return false;
+	}
+
+	/*
+	 * For /dev/userfaultfd, access is to be controlled using e.g.
+	 * permissions on the device node. We assume this is correctly
+	 * configured by userspace, so we simply allow access here.
+	 */
+
+	return true;
+}
+
+static int new_userfaultfd(bool is_syscall, int flags)
 {
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd &&
-	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
-	    !capable(CAP_SYS_PTRACE)) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
+	if (!userfaultfd_allowed(is_syscall, flags))
 		return -EPERM;
-	}
 
 	BUG_ON(!current->mm);
 
@@ -2095,6 +2107,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	refcount_set(&ctx->refcount, 1);
 	ctx->flags = flags;
 	ctx->features = 0;
+	/*
+	 * If UFFD_USER_MODE_ONLY is not set, then userfaultfd_allowed() above
+	 * decided that kernel faults were allowed and should be handled.
+	 */
+	ctx->handle_kernel_faults = !(flags & UFFD_USER_MODE_ONLY);
 	ctx->released = false;
 	atomic_set(&ctx->mmap_changing, 0);
 	ctx->mm = current->mm;
@@ -2110,8 +2127,42 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	return fd;
 }
 
+SYSCALL_DEFINE1(userfaultfd, int, flags)
+{
+	return new_userfaultfd(true, flags);
+}
+
+static int userfaultfd_dev_open(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+static long userfaultfd_dev_ioctl(struct file *file, unsigned int cmd, unsigned long flags)
+{
+	if (cmd != USERFAULTFD_IOC_NEW)
+		return -EINVAL;
+
+	return new_userfaultfd(false, flags);
+}
+
+static const struct file_operations userfaultfd_dev_fops = {
+	.open = userfaultfd_dev_open,
+	.unlocked_ioctl = userfaultfd_dev_ioctl,
+	.compat_ioctl = compat_ptr_ioctl,
+	.owner = THIS_MODULE,
+	.llseek = noop_llseek,
+};
+
+static struct miscdevice userfaultfd_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "userfaultfd",
+	.fops = &userfaultfd_dev_fops
+};
+
 static int __init userfaultfd_init(void)
 {
+	WARN_ON(misc_register(&userfaultfd_misc));
+
 	userfaultfd_ctx_cachep = kmem_cache_create("userfaultfd_ctx_cache",
 						sizeof(struct userfaultfd_ctx),
 						0,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index ef739054cb1c..032a35b3bbd2 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -12,6 +12,10 @@
 
 #include <linux/types.h>
 
+/* ioctls for /dev/userfaultfd */
+#define USERFAULTFD_IOC 0xAA
+#define USERFAULTFD_IOC_NEW _IOWR(USERFAULTFD_IOC, 0x00, int)
+
 /*
  * If the UFFDIO_API is upgraded someday, the UFFDIO_UNREGISTER and
  * UFFDIO_WAKE ioctls should be defined as _IOW and not as _IOR. In
On Fri, Apr 22, 2022 at 02:29:41PM -0700, Axel Rasmussen wrote: [...]
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -12,6 +12,10 @@
 
 #include <linux/types.h>
 
+/* ioctls for /dev/userfaultfd */
+#define USERFAULTFD_IOC 0xAA
+#define USERFAULTFD_IOC_NEW _IOWR(USERFAULTFD_IOC, 0x00, int)

Why is this new ioctl defined using _IOWR()? Since it neither reads from user memory nor writes into user memory, it should rather be defined using _IO(), shouldn't it?
You're right, [1] says _IO is appropriate for ioctls which only take an integer argument. I'll send a v3 with this fix, although I might wait a bit for any other review comments before doing so. Thanks for taking a look!
[1]: https://www.kernel.org/doc/html/latest/driver-api/ioctl.html
On Mon, Apr 25, 2022 at 1:32 PM Dmitry V. Levin ldv@altlinux.org wrote:
[...]
-- ldv
On Tue, Apr 26, 2022 at 6:00 PM Axel Rasmussen axelrasmussen@google.com wrote:
You're right, [1] says _IO is appropriate for ioctls which only take an integer argument. I'll send a v3 with this fix, although I might wait a bit for any other review comments before doing so. Thanks for taking a look!
If there are no other command codes, you could also set .compat_ioctl to the same function pointer as .unlocked_ioctl. The compat_ptr_ioctl conversion is only needed when there are commands that take a pointer.

Arnd
Axel,
On Fri, Apr 22, 2022 at 02:29:41PM -0700, Axel Rasmussen wrote:
@@ -65,6 +66,8 @@ struct userfaultfd_ctx {
 	unsigned int flags;
 	/* features requested from the userspace */
 	unsigned int features;
+	/* whether or not to handle kernel faults */
+	bool handle_kernel_faults;
Could you help explain why we need this bool? I failed to figure out myself on the difference against "!(ctx->flags & UFFD_USER_MODE_ONLY)".
Thanks,
On Tue, Apr 26, 2022 at 1:33 PM Peter Xu peterx@redhat.com wrote:
Axel,
On Fri, Apr 22, 2022 at 02:29:41PM -0700, Axel Rasmussen wrote:
@@ -65,6 +66,8 @@ struct userfaultfd_ctx {
 	unsigned int flags;
 	/* features requested from the userspace */
 	unsigned int features;
+	/* whether or not to handle kernel faults */
+	bool handle_kernel_faults;
Could you help explain why we need this bool? I failed to figure out myself on the difference against "!(ctx->flags & UFFD_USER_MODE_ONLY)".
Ah, yeah you're right, we can get rid of it and just rely on UFFD_USER_MODE_ONLY.
Just to add context, in a previous version I never sent out, I had:
ctx->handle_kernel_faults = userfaultfd_allowed(...);
That's wrong for other reasons, but if we were going to do that we'd have to store the result, since it's a function not just of the flags, but also of the method used to create the userfaultfd. I changed this without also dropping the boolean, which can now be cleaned up. I'll include this change in a v3.
Thanks,
-- Peter Xu
We clearly want to ensure both userfaultfd(2) and /dev/userfaultfd keep working into the future, so just run the test twice, using each interface.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 31 ++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 92a4516f8f0d..12ae742a9981 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -77,6 +77,9 @@ static int bounces;
 #define TEST_SHMEM 3
 static int test_type;
 
+/* test using /dev/userfaultfd, instead of userfaultfd(2) */
+static bool test_dev_userfaultfd;
+
 /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
 #define ALARM_INTERVAL_SECS 10
 static volatile bool test_uffdio_copy_eexist = true;
@@ -383,13 +386,31 @@ static void assert_expected_ioctls_present(uint64_t mode, uint64_t ioctls)
 	}
 }
 
+static void __userfaultfd_open_dev(void)
+{
+	int fd;
+
+	uffd = -1;
+	fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
+	if (fd < 0)
+		return;
+
+	uffd = ioctl(fd, USERFAULTFD_IOC_NEW,
+		     O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
+	close(fd);
+}
+
 static void userfaultfd_open(uint64_t *features)
 {
 	struct uffdio_api uffdio_api;
 
-	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
+	if (test_dev_userfaultfd)
+		__userfaultfd_open_dev();
+	else
+		uffd = syscall(__NR_userfaultfd,
+			       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
 	if (uffd < 0)
-		err("userfaultfd syscall not available in this kernel");
+		err("creating userfaultfd failed");
 	uffd_flags = fcntl(uffd, F_GETFD, NULL);
 
 	uffdio_api.api = UFFD_API;
@@ -1698,6 +1719,12 @@ int main(int argc, char **argv)
 	}
 	printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
 	       nr_pages, nr_pages_per_cpu);
+
+	test_dev_userfaultfd = false;
+	if (userfaultfd_stress())
+		return 1;
+
+	test_dev_userfaultfd = true;
 	return userfaultfd_stress();
 }
On 4/22/22 3:29 PM, Axel Rasmussen wrote:
We clearly want to ensure both userfaultfd(2) and /dev/userfaultfd keep working into the future, so just run the test twice, using each interface.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 31 ++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 92a4516f8f0d..12ae742a9981 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -77,6 +77,9 @@ static int bounces;
 #define TEST_SHMEM 3
 static int test_type;
 
+/* test using /dev/userfaultfd, instead of userfaultfd(2) */
+static bool test_dev_userfaultfd;
+
 /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
 #define ALARM_INTERVAL_SECS 10
 static volatile bool test_uffdio_copy_eexist = true;
@@ -383,13 +386,31 @@ static void assert_expected_ioctls_present(uint64_t mode, uint64_t ioctls)
 	}
 }
 
+static void __userfaultfd_open_dev(void)
+{
+	int fd;
+
+	uffd = -1;
+	fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
+	if (fd < 0)
+		return;
+
+	uffd = ioctl(fd, USERFAULTFD_IOC_NEW,
+		     O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
+	close(fd);
+}
+
 static void userfaultfd_open(uint64_t *features)
 {
 	struct uffdio_api uffdio_api;
 
-	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
+	if (test_dev_userfaultfd)
+		__userfaultfd_open_dev();
+	else
+		uffd = syscall(__NR_userfaultfd,
+			       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
 	if (uffd < 0)
-		err("userfaultfd syscall not available in this kernel");
+		err("creating userfaultfd failed");

This isn't an error as in test failure. This will be a skip because of unmet dependencies. Also if this test requires root access, please check for that and make that a skip as well.

 	uffd_flags = fcntl(uffd, F_GETFD, NULL);
 
 	uffdio_api.api = UFFD_API;
@@ -1698,6 +1719,12 @@ int main(int argc, char **argv)
 	}
 	printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
 	       nr_pages, nr_pages_per_cpu);
+
+	test_dev_userfaultfd = false;
+	if (userfaultfd_stress())
+		return 1;
+
+	test_dev_userfaultfd = true;
 	return userfaultfd_stress();
 }
thanks, -- Shuah
On Tue, Apr 26, 2022 at 9:16 AM Shuah Khan skhan@linuxfoundation.org wrote:
On 4/22/22 3:29 PM, Axel Rasmussen wrote:
[...]
+		uffd = syscall(__NR_userfaultfd,
+			       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
 	if (uffd < 0)
-		err("userfaultfd syscall not available in this kernel");
+		err("creating userfaultfd failed");
This isn't an error as in test failure. This will be a skip because of unmet dependencies. Also if this test requires root access, please check for that and make that a skip as well.
Testing with the userfaultfd syscall doesn't require any special permissions (root or otherwise).
But testing with /dev/userfaultfd will require access to that device node, which is root:root by default, but the system administrator may have changed this. In general I think this will only fail due to a) lack of kernel support or b) lack of permissions though, so always exiting with KSFT_SKIP here seems reasonable. I'll make that change in v3.
[...]
thanks, -- Shuah
Explain the different ways to create a new userfaultfd, and how access control works for each way.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 Documentation/admin-guide/mm/userfaultfd.rst | 38 ++++++++++++++++++--
 Documentation/admin-guide/sysctl/vm.rst      |  3 ++
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 6528036093e1..4c079b5377d4 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
 Design
 ======
 
-Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
+Userspace creates a new userfaultfd, initializes it, and registers one or more
+regions of virtual memory with it. Then, any page faults which occur within the
+region(s) result in a message being delivered to the userfaultfd, notifying
+userspace of the fault.
 
 The ``userfaultfd`` (aside from registering and unregistering virtual
 memory ranges) provides two primary functionalities:
@@ -39,7 +42,7 @@ Vmas are not suitable for page- (or hugepage) granular fault tracking
 when dealing with virtual address spaces that could span
 Terabytes. Too many vmas would be needed for that.
 
-The ``userfaultfd`` once opened by invoking the syscall, can also be
+The ``userfaultfd``, once created, can also be
 passed using unix domain sockets to a manager process, so the same
 manager process could handle the userfaults of a multitude of
 different processes without them being aware about what is going on
@@ -50,6 +53,37 @@ is a corner case that would currently return ``-EBUSY``).
 API
 ===
 
+Creating a userfaultfd
+----------------------
+
+There are two mechanisms to create a userfaultfd. There are various ways to
+restrict this too, since userfaultfds which handle kernel page faults have
+historically been a useful tool for exploiting the kernel.
+
+The first is the userfaultfd(2) syscall. Access to this is controlled in several
+ways:
+
+- By default, the userfaultfd will be able to handle kernel page faults. This
+  can be disabled by passing in UFFD_USER_MODE_ONLY.
+
+- If vm.unprivileged_userfaultfd is 0, then the caller must *either* have
+  CAP_SYS_PTRACE, or pass in UFFD_USER_MODE_ONLY.
+
+- If vm.unprivileged_userfaultfd is 1, then no particular privilege is needed to
+  use this syscall, even if UFFD_USER_MODE_ONLY is *not* set.
+
+Alternatively, userfaultfds can be created by opening /dev/userfaultfd, and
+issuing a USERFAULTFD_IOC_NEW ioctl to this device. Access to this device is
+controlled via normal filesystem permissions (user/group/mode for example) - no
+additional permission (capability/sysctl) is needed to be able to handle kernel
+faults this way. This is useful because it allows e.g. a specific user or group
+to be able to create kernel-fault-handling userfaultfds, without allowing it
+more broadly, or granting more privileges in addition to that particular ability
+(CAP_SYS_PTRACE). In other words, it allows permissions to be minimized.
+
+Initializing up a userfaultfd
+------------------------
+
 When first opened the ``userfaultfd`` must be enabled invoking the
 ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
 a later API version) which will specify the ``read/POLLIN`` protocol
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f4804ce37c58..8682d5fbc8ea 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -880,6 +880,9 @@ calls without any restrictions.
 
 The default value is 0.
 
+An alternative to this sysctl / the userfaultfd(2) syscall is to create
+userfaultfds via /dev/userfaultfd. See
+Documentation/admin-guide/mm/userfaultfd.rst.
 
 user_reserve_kbytes
 ===================
On 4/22/22 3:29 PM, Axel Rasmussen wrote:
Explain the different ways to create a new userfaultfd, and how access control works for each way.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 Documentation/admin-guide/mm/userfaultfd.rst | 38 ++++++++++++++++++--
 Documentation/admin-guide/sysctl/vm.rst      |  3 ++
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 6528036093e1..4c079b5377d4 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
 Design
 ======
 
-Userfaults are delivered and resolved through the ``userfaultfd`` syscall.

Please keep this sentence in there and rephrase it to indicate how it was done in the past.

Also explain here why this new approach is better than the syscall approach before getting into the below details.

+Userspace creates a new userfaultfd, initializes it, and registers one or more
+regions of virtual memory with it. Then, any page faults which occur within the
+region(s) result in a message being delivered to the userfaultfd, notifying
+userspace of the fault.
 
 The ``userfaultfd`` (aside from registering and unregistering virtual
 memory ranges) provides two primary functionalities:
@@ -39,7 +42,7 @@ Vmas are not suitable for page- (or hugepage) granular fault tracking
 when dealing with virtual address spaces that could span
 Terabytes. Too many vmas would be needed for that.
 
-The ``userfaultfd`` once opened by invoking the syscall, can also be
+The ``userfaultfd``, once created, can also be

This sentence is too short and would look odd. Combine the sentences so it renders well in the generated doc.

 passed using unix domain sockets to a manager process, so the same
 manager process could handle the userfaults of a multitude of
 different processes without them being aware about what is going on
@@ -50,6 +53,37 @@ is a corner case that would currently return ``-EBUSY``).
 API
 ===
 
+Creating a userfaultfd
+----------------------
+
+There are two mechanisms to create a userfaultfd. There are various ways to
+restrict this too, since userfaultfds which handle kernel page faults have
+historically been a useful tool for exploiting the kernel.
+
+The first is the userfaultfd(2) syscall. Access to this is controlled in several
+ways:
+
+- By default, the userfaultfd will be able to handle kernel page faults. This
+  can be disabled by passing in UFFD_USER_MODE_ONLY.
+
+- If vm.unprivileged_userfaultfd is 0, then the caller must *either* have
+  CAP_SYS_PTRACE, or pass in UFFD_USER_MODE_ONLY.
+
+- If vm.unprivileged_userfaultfd is 1, then no particular privilege is needed to
+  use this syscall, even if UFFD_USER_MODE_ONLY is *not* set.
+
+Alternatively, userfaultfds can be created by opening /dev/userfaultfd, and
+issuing a USERFAULTFD_IOC_NEW ioctl to this device. Access to this device is

New ioctl? I thought we are moving away from using ioctls?

+controlled via normal filesystem permissions (user/group/mode for example) - no
+additional permission (capability/sysctl) is needed to be able to handle kernel
+faults this way. This is useful because it allows e.g. a specific user or group
+to be able to create kernel-fault-handling userfaultfds, without allowing it
+more broadly, or granting more privileges in addition to that particular ability
+(CAP_SYS_PTRACE). In other words, it allows permissions to be minimized.
+
+Initializing up a userfaultfd
+------------------------

This will very likely generate a doc warning; extend the dashes to the entire length of the subtitle.

+
 When first opened the ``userfaultfd`` must be enabled invoking the
 ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
 a later API version) which will specify the ``read/POLLIN`` protocol
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f4804ce37c58..8682d5fbc8ea 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -880,6 +880,9 @@ calls without any restrictions.
 
 The default value is 0.
 
+An alternative to this sysctl / the userfaultfd(2) syscall is to create
+userfaultfds via /dev/userfaultfd. See
+Documentation/admin-guide/mm/userfaultfd.rst.
 
 user_reserve_kbytes
 ===================
thanks, -- Shuah
On Tue, Apr 26, 2022 at 9:46 AM Shuah Khan skhan@linuxfoundation.org wrote:
On 4/22/22 3:29 PM, Axel Rasmussen wrote:
[...]
-Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
Please keep this sentence in there and rephrase it to indicate how it was done in the past.
Also explain here why this new approach is better than the syscall approach before getting into the below details.
Hmm, so the old sentence I think was incorrect already. Notifications of *the faults* aren't delivered and resolved through the syscall. Rather, the syscall just gives you a file descriptor, and then notification / resolution of faults happens through the file descriptor, not through the syscall. So I think it needs to be reworded in any case.
I think the overall structure of the doc as-is makes the most sense as well - first explain how this will be used at a very high level, and then go into the details (first how to create a userfaultfd, then how to use it).
So, in the end I reworded the "Creating a userfaultfd" section, to cover the two things you mentioned:
- Which is the "older" way and which is the "newer" way
- What the benefit of the newer way is
Hopefully this addresses the comment? I can tweak it more if needed. In any case, thanks for taking a look at this series!
+Userspace creates a new userfaultfd, initializes it, and registers one or more
+regions of virtual memory with it. Then, any page faults which occur within the
+region(s) result in a message being delivered to the userfaultfd, notifying
+userspace of the fault.
 
 The ``userfaultfd`` (aside from registering and unregistering virtual
 memory ranges) provides two primary functionalities:
@@ -39,7 +42,7 @@ Vmas are not suitable for page- (or hugepage) granular fault tracking
 when dealing with virtual address spaces that could span
 Terabytes. Too many vmas would be needed for that.
 
-The ``userfaultfd`` once opened by invoking the syscall, can also be
+The ``userfaultfd``, once created, can also be

This sentence is too short and would look odd. Combine the sentences so it renders well in the generated doc.
Not 100% sure I understood the concern, but I do think it makes sense to move "Vmas are not suitable ..." up into the same paragraph with the other sentence about scalability. I'll do this in v3 as it looks a bit nicer. This leaves the "The userfaultfd, once created, ..." part alone, though. I think s/once opened by invoking the syscall/once created/ is correct, since there are now various ways to create it. I also think that second comma technically should have been there even in the previous version.
 passed using unix domain sockets to a manager process, so the same manager process could handle the userfaults of a multitude of different processes without them being aware about what is going on
@@ -50,6 +53,37 @@ is a corner case that would currently return ``-EBUSY``).
 API
 ===

+Creating a userfaultfd
+----------------------
+There are two mechanisms to create a userfaultfd. There are various ways to
+restrict this too, since userfaultfds which handle kernel page faults have
+historically been a useful tool for exploiting the kernel.
+
+The first is the userfaultfd(2) syscall. Access to this is controlled in several
+ways:
+
+- By default, the userfaultfd will be able to handle kernel page faults. This
+  can be disabled by passing in UFFD_USER_MODE_ONLY.
+
+- If vm.unprivileged_userfaultfd is 0, then the caller must *either* have
+  CAP_SYS_PTRACE, or pass in UFFD_USER_MODE_ONLY.
+
+- If vm.unprivileged_userfaultfd is 1, then no particular privilege is needed to
+  use this syscall, even if UFFD_USER_MODE_ONLY is *not* set.
+
+Alternatively, userfaultfds can be created by opening /dev/userfaultfd, and
+issuing a USERFAULTFD_IOC_NEW ioctl to this device. Access to this device is
New ioctl? I thought we are moving away from using ioctls?
Hmm, looking at alternatives [1] am not sure I see a viable one:
We could have defined a new "userfaultfdfs" filesystem, but it seems to me to be overkill for this feature.
We could have used a syscall instead and supported fine-grained access control with a new capability, but this approach was rejected [2] generally because we prefer to avoid adding capabilities, and this new capability's scope (just userfaultfd) was considered too narrow.
So, I'm not sure of another better way to do this. I suppose one could argue that the dislike of ioctls outweighs the usefulness of this feature, but to me at least the tradeoff seems worth it. :)
[1]: https://www.kernel.org/doc/html/latest/driver-api/ioctl.html#alternatives-to...
[2]: https://lkml.org/lkml/2022/2/24/1012
+controlled via normal filesystem permissions (user/group/mode for example) - no
+additional permission (capability/sysctl) is needed to be able to handle kernel
+faults this way. This is useful because it allows e.g. a specific user or group
+to be able to create kernel-fault-handling userfaultfds, without allowing it
+more broadly, or granting more privileges in addition to that particular ability
+(CAP_SYS_PTRACE). In other words, it allows permissions to be minimized.
+Initializing up a userfaultfd
+------------------------
This will very likely generate a doc warning - extend the dashes to the entire length of the subtitle.
I'll fix this in v3.
 When first opened the ``userfaultfd`` must be enabled invoking the ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or a later API version) which will specify the ``read/POLLIN`` protocol
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f4804ce37c58..8682d5fbc8ea 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -880,6 +880,9 @@ calls without any restrictions.
The default value is 0.
+An alternative to this sysctl / the userfaultfd(2) syscall is to create
+userfaultfds via /dev/userfaultfd. See
+Documentation/admin-guide/mm/userfaultfd.rst.
user_reserve_kbytes
thanks, -- Shuah
Instead of always testing both userfaultfd(2) and /dev/userfaultfd, let the user choose which to test.
As with other test features, change the behavior based on a new command line flag. Introduce the idea of "test mods", which are generic (not specific to a test type) modifications to the behavior of the test. This is sort of borrowed from this RFC patch series [1], but simplified a bit.
The benefit is, in "typical" configurations this test is somewhat slow (say, 30sec or something). Testing both clearly doubles it, so it may not always be desirable, as users are likely to use one or the other, but never both, in the "real world".
[1]: https://patchwork.kernel.org/project/linux-mm/patch/20201129004548.1619714-1...
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 41 +++++++++++++++++-------
 1 file changed, 30 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 12ae742a9981..274522704e40 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -142,8 +142,17 @@ static void usage(void)
 {
 	fprintf(stderr, "\nUsage: ./userfaultfd <test type> <MiB> <bounces> "
 		"[hugetlbfs_file]\n\n");
+
 	fprintf(stderr, "Supported <test type>: anon, hugetlb, "
 		"hugetlb_shared, shmem\n\n");
+
+	fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. "
+		"Supported mods:\n");
+	fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n");
+	fprintf(stderr, "\nExample test mod usage:\n");
+	fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n");
+	fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n");
+
 	fprintf(stderr, "Examples:\n\n");
 	fprintf(stderr, "%s", examples);
 	exit(1);
@@ -1610,8 +1619,6 @@ unsigned long default_huge_page_size(void)

 static void set_test_type(const char *type)
 {
-	uint64_t features = UFFD_API_FEATURES;
-
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
@@ -1631,10 +1638,28 @@ static void set_test_type(const char *type)
 		test_type = TEST_SHMEM;
 		uffd_test_ops = &shmem_uffd_test_ops;
 		test_uffdio_minor = true;
-	} else {
-		err("Unknown test type: %s", type);
+	}
+}
+
+static void parse_test_type_arg(const char *raw_type)
+{
+	char *buf = strdup(raw_type);
+	uint64_t features = UFFD_API_FEATURES;
+
+	while (buf) {
+		const char *token = strsep(&buf, ":");
+
+		if (!test_type)
+			set_test_type(token);
+		else if (!strcmp(token, "dev"))
+			test_dev_userfaultfd = true;
+		else
+			err("unrecognized test mod '%s'", token);
 	}

+	if (!test_type)
+		err("failed to parse test type argument: '%s'", raw_type);
+
 	if (test_type == TEST_HUGETLB)
 		page_size = default_huge_page_size();
 	else
@@ -1681,7 +1706,7 @@ int main(int argc, char **argv)
 		err("failed to arm SIGALRM");
 	alarm(ALARM_INTERVAL_SECS);

-	set_test_type(argv[1]);
+	parse_test_type_arg(argv[1]);

 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size /
@@ -1719,12 +1744,6 @@ int main(int argc, char **argv)
 	}
 	printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n", nr_pages, nr_pages_per_cpu);
-
-	test_dev_userfaultfd = false;
-	if (userfaultfd_stress())
-		return 1;
-
-	test_dev_userfaultfd = true;
 	return userfaultfd_stress();
 }
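The "type:mod" splitting introduced by the patch above can be illustrated in isolation. The following is a minimal standalone sketch, not the selftest code itself: struct parsed_arg and parse_type_arg() are hypothetical names, and where the selftest sets globals and calls err(), this version fills in a struct and returns -1 so that it can be tried outside the selftest. It also frees the strdup() copy, which requires keeping the original pointer because strsep() advances its argument.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; the selftest itself sets globals instead. */
struct parsed_arg {
	char type[32];	/* first token, e.g. "anon" or "hugetlb_shared" */
	bool use_dev;	/* true if the "dev" mod was given */
};

/* Returns 0 on success, -1 on an empty test type or an unknown mod. */
static int parse_type_arg(const char *raw, struct parsed_arg *res)
{
	char *copy, *cursor, *token;
	int ret = 0;

	assert(raw != NULL && res != NULL);
	memset(res, 0, sizeof(*res));

	/* strsep() modifies its input, so work on a private copy. */
	copy = strdup(raw);
	if (!copy)
		return -1;
	cursor = copy;

	/* The first ':'-separated token is the test type. */
	token = strsep(&cursor, ":");
	if (!*token || strlen(token) >= sizeof(res->type)) {
		ret = -1;
	} else {
		strcpy(res->type, token);

		/* Every remaining token must be a known mod. */
		while (cursor) {
			token = strsep(&cursor, ":");
			if (!strcmp(token, "dev")) {
				res->use_dev = true;
			} else {
				ret = -1;
				break;
			}
		}
	}

	free(copy);
	return ret;
}
```

With this sketch, "anon:dev" yields type "anon" with use_dev set, "hugetlb_shared" yields use_dev unset, and "anon:bogus" is rejected.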
On 4/22/22 3:29 PM, Axel Rasmussen wrote:
Instead of always testing both userfaultfd(2) and /dev/userfaultfd, let the user choose which to test.
As with other test features, change the behavior based on a new command line flag. Introduce the idea of "test mods", which are generic (not specific to a test type) modifications to the behavior of the test. This is sort of borrowed from this RFC patch series [1], but simplified a bit.
The benefit is, in "typical" configurations this test is somewhat slow (say, 30sec or something). Testing both clearly doubles it, so it may not always be desirable, as users are likely to use one or the other, but never both, in the "real world".
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
tools/testing/selftests/vm/userfaultfd.c | 41 +++++++++++++++++------- 1 file changed, 30 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 12ae742a9981..274522704e40 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -142,8 +142,17 @@ static void usage(void)
 {
 	fprintf(stderr, "\nUsage: ./userfaultfd <test type> <MiB> <bounces> "
 		"[hugetlbfs_file]\n\n");
+
Remove the extra blank line here.
 	fprintf(stderr, "Supported <test type>: anon, hugetlb, "
 		"hugetlb_shared, shmem\n\n");
+
Remove the extra blank line here.
+	fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. "
+		"Supported mods:\n");
+	fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n");
+	fprintf(stderr, "\nExample test mod usage:\n");
+	fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n");
+	fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n");
+
 	fprintf(stderr, "Examples:\n\n");
 	fprintf(stderr, "%s", examples);
Update examples above with new test cases if any.
 	exit(1);
@@ -1610,8 +1619,6 @@ unsigned long default_huge_page_size(void)

 static void set_test_type(const char *type)
 {
-	uint64_t features = UFFD_API_FEATURES;
-
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
@@ -1631,10 +1638,28 @@ static void set_test_type(const char *type)
 		test_type = TEST_SHMEM;
 		uffd_test_ops = &shmem_uffd_test_ops;
 		test_uffdio_minor = true;
-	} else {
-		err("Unknown test type: %s", type);
+	}
At this point, it might make it so much easier and maintainable if we were to use getopt instead of parsing options.
+}
+
+static void parse_test_type_arg(const char *raw_type)
+{
+	char *buf = strdup(raw_type);
+	uint64_t features = UFFD_API_FEATURES;
+
+	while (buf) {
+		const char *token = strsep(&buf, ":");
+
+		if (!test_type)
+			set_test_type(token);
+		else if (!strcmp(token, "dev"))
+			test_dev_userfaultfd = true;
+		else
+			err("unrecognized test mod '%s'", token);
 	}

+	if (!test_type)
+		err("failed to parse test type argument: '%s'", raw_type);
+
 	if (test_type == TEST_HUGETLB)
 		page_size = default_huge_page_size();
 	else
@@ -1681,7 +1706,7 @@ int main(int argc, char **argv)
 		err("failed to arm SIGALRM");
 	alarm(ALARM_INTERVAL_SECS);

-	set_test_type(argv[1]);
+	parse_test_type_arg(argv[1]);

 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size /
@@ -1719,12 +1744,6 @@ int main(int argc, char **argv)
 	}
 	printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n", nr_pages, nr_pages_per_cpu);
-
-	test_dev_userfaultfd = false;
-	if (userfaultfd_stress())
-		return 1;
-
-	test_dev_userfaultfd = true;
 	return userfaultfd_stress();
 }
Same comments as before on fail vs. skip conditions to watch out for and report them correctly.
thanks, -- Shuah
On Tue, Apr 26, 2022 at 9:56 AM Shuah Khan skhan@linuxfoundation.org wrote:
On 4/22/22 3:29 PM, Axel Rasmussen wrote:
Instead of always testing both userfaultfd(2) and /dev/userfaultfd, let the user choose which to test.
As with other test features, change the behavior based on a new command line flag. Introduce the idea of "test mods", which are generic (not specific to a test type) modifications to the behavior of the test. This is sort of borrowed from this RFC patch series [1], but simplified a bit.
The benefit is, in "typical" configurations this test is somewhat slow (say, 30sec or something). Testing both clearly doubles it, so it may not always be desirable, as users are likely to use one or the other, but never both, in the "real world".
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
tools/testing/selftests/vm/userfaultfd.c | 41 +++++++++++++++++------- 1 file changed, 30 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 12ae742a9981..274522704e40 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -142,8 +142,17 @@ static void usage(void)
 {
 	fprintf(stderr, "\nUsage: ./userfaultfd <test type> <MiB> <bounces> "
 		"[hugetlbfs_file]\n\n");
+
Remove the extra blank line here.
 	fprintf(stderr, "Supported <test type>: anon, hugetlb, "
 		"hugetlb_shared, shmem\n\n");
+
Remove the extra blank line here.
+	fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. "
+		"Supported mods:\n");
+	fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n");
+	fprintf(stderr, "\nExample test mod usage:\n");
+	fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n");
+	fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n");
+
 	fprintf(stderr, "Examples:\n\n");
 	fprintf(stderr, "%s", examples);
Update examples above with new test cases if any.
Will fix the above comments in v3.
 	exit(1);
@@ -1610,8 +1619,6 @@ unsigned long default_huge_page_size(void)

 static void set_test_type(const char *type)
 {
-	uint64_t features = UFFD_API_FEATURES;
-
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
@@ -1631,10 +1638,28 @@ static void set_test_type(const char *type)
 		test_type = TEST_SHMEM;
 		uffd_test_ops = &shmem_uffd_test_ops;
 		test_uffdio_minor = true;
-	} else {
-		err("Unknown test type: %s", type);
+	}
At this point, it might make it so much easier and maintainable if we were to use getopt instead of parsing options.
Agreed, I'd like that as well. But, since it's a bigger refactor that affects all test types, I think it may be cleaner to leave it for a follow-up series.
+}
+
+static void parse_test_type_arg(const char *raw_type)
+{
+	char *buf = strdup(raw_type);
+	uint64_t features = UFFD_API_FEATURES;
+
+	while (buf) {
+		const char *token = strsep(&buf, ":");
+
+		if (!test_type)
+			set_test_type(token);
+		else if (!strcmp(token, "dev"))
+			test_dev_userfaultfd = true;
+		else
+			err("unrecognized test mod '%s'", token);
 	}

+	if (!test_type)
+		err("failed to parse test type argument: '%s'", raw_type);
+
 	if (test_type == TEST_HUGETLB)
 		page_size = default_huge_page_size();
 	else
@@ -1681,7 +1706,7 @@ int main(int argc, char **argv)
 		err("failed to arm SIGALRM");
 	alarm(ALARM_INTERVAL_SECS);

-	set_test_type(argv[1]);
+	parse_test_type_arg(argv[1]);

 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size /
@@ -1719,12 +1744,6 @@ int main(int argc, char **argv)
 	}
 	printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n", nr_pages, nr_pages_per_cpu);
-
-	test_dev_userfaultfd = false;
-	if (userfaultfd_stress())
-		return 1;
-
-	test_dev_userfaultfd = true;
 	return userfaultfd_stress();
 }
Same comments as before on fail vs. skip conditions to watch out for and report them correctly.
I think in v3 things will be correct. Basically, in the skip cases we just exit(KSFT_SKIP) directly, instead of relying on the return value here. I'll take a pass and double check though before sending v3.
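As a rough illustration of the skip-versus-fail distinction discussed here, a helper along the following lines could decide the exit code when creating the userfaultfd fails. This is only a sketch: uffd_errno_to_ksft() is a hypothetical name, the exit code values are copied locally from tools/testing/selftests/kselftest.h so the snippet stands alone, and the choice of which errno values count as "skip" is an assumption, not what v3 necessarily does.

```c
#include <assert.h>
#include <errno.h>

/* Local copies of the kselftest exit codes, so this compiles standalone. */
#define KSFT_PASS 0
#define KSFT_FAIL 1
#define KSFT_SKIP 4

/*
 * Hypothetical helper: map the errno from a failed attempt to create a
 * userfaultfd onto a kselftest exit code. Treating "feature absent" and
 * "not permitted" as skips is an assumption made for this sketch.
 */
static int uffd_errno_to_ksft(int err)
{
	assert(err >= 0);

	switch (err) {
	case 0:
		return KSFT_PASS;
	case ENOENT:	/* no /dev/userfaultfd node on this system */
	case ENOSYS:	/* userfaultfd(2) not available */
	case EACCES:	/* device node permissions deny access */
	case EPERM:	/* caller lacks the required privilege */
		return KSFT_SKIP;
	default:
		return KSFT_FAIL;
	}
}
```

A caller would then do something like exit(uffd_errno_to_ksft(errno)) right at the point of failure, rather than propagating a generic non-zero return value up to main().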
thanks, -- Shuah
This new mode was recently added to the userfaultfd selftest. We want to exercise both userfaultfd(2) as well as /dev/userfaultfd, so add both test cases to the script.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/run_vmtests.sh | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index 5065dbd89bdb..57f01505c719 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -121,12 +121,17 @@ run_test ./gup_test -a
 run_test ./gup_test -ct -F 0x1 0 19 0x1000

 run_test ./userfaultfd anon 20 16
+run_test ./userfaultfd anon:dev 20 16
 # Hugetlb tests require source and destination huge pages. Pass in half the
 # size ($half_ufd_size_MB), which is used for *each*.
 run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
+run_test ./userfaultfd hugetlb:dev "$half_ufd_size_MB" 32
 run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
 rm -f "$mnt"/uffd-test
+run_test ./userfaultfd hugetlb_shared:dev "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+rm -f "$mnt"/uffd-test
 run_test ./userfaultfd shmem 20 16
+run_test ./userfaultfd shmem:dev 20 16

 #cleanup
 umount "$mnt"
On 4/22/22 3:29 PM, Axel Rasmussen wrote:
This new mode was recently added to the userfaultfd selftest. We want to exercise both userfaultfd(2) as well as /dev/userfaultfd, so add both test cases to the script.
Signed-off-by: Axel Rasmussen axelrasmussen@google.com
tools/testing/selftests/vm/run_vmtests.sh | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index 5065dbd89bdb..57f01505c719 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -121,12 +121,17 @@ run_test ./gup_test -a
 run_test ./gup_test -ct -F 0x1 0 19 0x1000

 run_test ./userfaultfd anon 20 16
+run_test ./userfaultfd anon:dev 20 16
 # Hugetlb tests require source and destination huge pages. Pass in half the
 # size ($half_ufd_size_MB), which is used for *each*.
 run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
+run_test ./userfaultfd hugetlb:dev "$half_ufd_size_MB" 32
 run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
 rm -f "$mnt"/uffd-test
+run_test ./userfaultfd hugetlb_shared:dev "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+rm -f "$mnt"/uffd-test
 run_test ./userfaultfd shmem 20 16
+run_test ./userfaultfd shmem:dev 20 16

 #cleanup
 umount "$mnt"
Looks good to me.
Reviewed-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah