Hi,
Changes since v1:
1) Fixed a krobot-reported mistake, which also uncovered a pre-existing
bug. Thanks to Kirill for recommending the fix.
2) Added another small fix: changed the data type to unsigned int, as
pointed out by Ira.
2) Added a "Fixes:" line, thanks to Kirill and Aneesh for pinpointing the
commit.
3) Collected Acked-by and Suggested-by's.
Original cover letter, edited slightly
======================================
These trivial fixes apply to today's linux.git (5.4-rc3).
I found these while polishing up the Next And Final get_user_pages()+dma
tracking patchset (which is in final testing and passing nicely...so far).
Anyway, as these two patches apply cleanly both before and after the larger
gup/dma upcoming patchset, I thought it best to send this out separately,
in order to avoid muddying the waters more than usual.
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Keith Busch <keith.busch(a)intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
John Hubbard (2):
mm/gup_benchmark: add a missing "w" to getopt string
mm/gup: fix a misnamed "write" argument, and a related bug
mm/gup.c | 14 ++++++++------
tools/testing/selftests/vm/gup_benchmark.c | 2 +-
2 files changed, 9 insertions(+), 7 deletions(-)
--
2.23.0
Patch series implements hugetlb_cgroup reservation usage and limits, which
track hugetlb reservations rather than hugetlb memory faulted in. Details of
the approach is 1/7.
Changes in v5:
- Moved the bulk of the description to the first patch in the series.
- Clang formatted the entire series.
- Split off 'hugetlb: remove duplicated code' and 'hugetlb: region_chg provides
only cache entry' into their own patch series.
- Added comments to HUGETLB_RES enum.
- Fixed bug in 'hugetlb: disable region_add file_region coalescing' calculating
the wrong number of regions_needed in some cases.
- Changed sleeps in test to proper conditions.
- Misc fixes in test based on shuah@ review.
Changes in v4:
- Split up 'hugetlb_cgroup: add accounting for shared mappings' into 4 patches
for better isolation and context on the indvidual changes:
- hugetlb_cgroup: add accounting for shared mappings
- hugetlb: disable region_add file_region coalescing
- hugetlb: remove duplicated code
- hugetlb: region_chg provides only cache entry
- Fixed resv->adds_in_progress accounting.
- Retained behavior that region_add never fails, in earlier patchsets region_add
could return failure.
- Fixed libhugetlbfs failure.
- Minor fix to the added tests that was preventing them from running on some
environments.
Changes in v3:
- Addressed comments of Hillf Danton:
- Added docs.
- cgroup_files now uses enum.
- Various readability improvements.
- Addressed comments of Mike Kravetz.
- region_* functions no longer coalesce file_region entries in the resv_map.
- region_add() and region_chg() refactored to make them much easier to
understand and remove duplicated code so this patch doesn't add too much
complexity.
- Refactored common functionality into helpers.
Changes in v2:
- Split the patch into a 5 patch series.
- Fixed patch subject.
Mina Almasry (7):
hugetlb_cgroup: Add hugetlb_cgroup reservation counter
hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
hugetlb_cgroup: add reservation accounting for private mappings
hugetlb: disable region_add file_region coalescing
hugetlb_cgroup: add accounting for shared mappings
hugetlb_cgroup: Add hugetlb_cgroup reservation tests
hugetlb_cgroup: Add hugetlb_cgroup reservation docs
.../admin-guide/cgroup-v1/hugetlb.rst | 85 +++-
include/linux/hugetlb.h | 31 +-
include/linux/hugetlb_cgroup.h | 33 +-
mm/hugetlb.c | 423 +++++++++++-----
mm/hugetlb_cgroup.c | 190 ++++++--
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 1 +
.../selftests/vm/charge_reserved_hugetlb.sh | 461 ++++++++++++++++++
.../selftests/vm/write_hugetlb_memory.sh | 22 +
.../testing/selftests/vm/write_to_hugetlbfs.c | 250 ++++++++++
10 files changed, 1306 insertions(+), 191 deletions(-)
create mode 100755 tools/testing/selftests/vm/charge_reserved_hugetlb.sh
create mode 100644 tools/testing/selftests/vm/write_hugetlb_memory.sh
create mode 100644 tools/testing/selftests/vm/write_to_hugetlbfs.c
--
2.23.0.351.gc4317032e6-goog
Hi,
These trivial fixes apply to today's linux.git (5.4-rc3, or maybe -rc4, by
the time I send this).
I found these while polishing up the Next And Final get_user_pages()+dma
tracking patchset (which is in final testing and passing nicely...so far).
Anyway, as these two patches apply cleanly both before and after the larger
gup/dma upcoming patchset, I thought it best to send this out separately,
in order to avoid muddying the waters more than usual.
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Keith Busch <keith.busch(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
John Hubbard (2):
mm/gup_benchmark: add a missing "w" to getopt string
mm/gup: fix a misnamed "write" argument: should be "flags"
mm/gup.c | 12 +++++++-----
tools/testing/selftests/vm/gup_benchmark.c | 2 +-
2 files changed, 8 insertions(+), 6 deletions(-)
--
2.23.0
Reset all signal handlers of the child not set to SIG_IGN to SIG_DFL.
Mutually exclusive with CLONE_SIGHAND to not disturb other thread's
signal handler.
In the spirit of closer cooperation between glibc developers and kernel
developers (cf. [2]) this patchset came out of a discussion on the glibc
mailing list for improving posix_spawn() (cf. [1], [3], [4]). Kernel
support for this feature has been explicitly requested by glibc and I
see no reason not to help them with this.
The child helper process on Linux posix_spawn must ensure that no signal
handlers are enabled, so the signal disposition must be either SIG_DFL
or SIG_IGN. However, it requires a sigprocmask to obtain the current
signal mask and at least _NSIG sigaction calls to reset the signal
handlers for each posix_spawn call or complex state tracking that might
lead to data corruption in glibc. Adding this flags lets glibc avoid
these problems.
[1]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00149.html
[3]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00158.html
[4]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00160.html
[2]: https://lwn.net/Articles/799331/
'[...] by asking for better cooperation with the C-library projects
in general. They should be copied on patches containing ABI
changes, for example. I noted that there are often times where
C-library developers wish the kernel community had done things
differently; how could those be avoided in the future? Members of
the audience suggested that more glibc developers should perhaps
join the linux-api list. The other suggestion was to "copy Florian
on everything".'
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: libc-alpha(a)sourceware.org
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
---
include/uapi/linux/sched.h | 3 +++
kernel/fork.c | 11 ++++++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 99335e1f4a27..c583720f689f 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -33,6 +33,9 @@
#define CLONE_NEWNET 0x40000000 /* New network namespace */
#define CLONE_IO 0x80000000 /* Clone io context */
+/* Flags for the clone3() syscall */
+#define CLONE3_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
+
#ifndef __ASSEMBLY__
/**
* struct clone_args - arguments for the clone3 syscall
diff --git a/kernel/fork.c b/kernel/fork.c
index 1f6c45f6a734..661f8d1f3881 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1517,6 +1517,11 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
spin_lock_irq(¤t->sighand->siglock);
memcpy(sig->action, current->sighand->action, sizeof(sig->action));
spin_unlock_irq(¤t->sighand->siglock);
+
+ /* Reset all signal handler not set to SIG_IGN to SIG_DFL. */
+ if (clone_flags & CLONE3_CLEAR_SIGHAND)
+ flush_signal_handlers(tsk, 0);
+
return 0;
}
@@ -2567,7 +2572,7 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
* All lower bits of the flag word are taken.
* Verify that no other unknown flags are passed along.
*/
- if (kargs->flags & ~CLONE_LEGACY_FLAGS)
+ if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE3_CLEAR_SIGHAND))
return false;
/*
@@ -2577,6 +2582,10 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
if (kargs->flags & (CLONE_DETACHED | CSIGNAL))
return false;
+ if ((kargs->flags & (CLONE_SIGHAND | CLONE3_CLEAR_SIGHAND)) ==
+ (CLONE_SIGHAND | CLONE3_CLEAR_SIGHAND))
+ return false;
+
if ((kargs->flags & (CLONE_THREAD | CLONE_PARENT)) &&
kargs->exit_signal)
return false;
--
2.23.0
Reset all signal handlers of the child not set to SIG_IGN to SIG_DFL.
Mutually exclusive with CLONE_SIGHAND to not disturb other thread's
signal handler.
In the spirit of closer cooperation between glibc developers and kernel
developers (cf. [2]) this patchset came out of a discussion on the glibc
mailing list for improving posix_spawn() (cf. [1], [3], [4]). Kernel
support for this feature has been explicitly requested by glibc and I
see no reason not to help them with this.
The child helper process on Linux posix_spawn must ensure that no signal
handlers are enabled, so the signal disposition must be either SIG_DFL
or SIG_IGN. However, it requires a sigprocmask to obtain the current
signal mask and at least _NSIG sigaction calls to reset the signal
handlers for each posix_spawn call or complex state tracking that might
lead to data corruption in glibc. Adding this flags lets glibc avoid
these problems.
[1]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00149.html
[3]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00158.html
[4]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00160.html
[2]: https://lwn.net/Articles/799331/
'[...] by asking for better cooperation with the C-library projects
in general. They should be copied on patches containing ABI
changes, for example. I noted that there are often times where
C-library developers wish the kernel community had done things
differently; how could those be avoided in the future? Members of
the audience suggested that more glibc developers should perhaps
join the linux-api list. The other suggestion was to "copy Florian
on everything".'
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: libc-alpha(a)sourceware.org
Cc: linux-api(a)vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
---
/* v1 */
Link: https://lore.kernel.org/r/20191010133518.5420-1-christian.brauner@ubuntu.com
/* v2 */
- Florian Weimer <fweimer(a)redhat.com>:
- update comment in clone3_args_valid()
---
include/uapi/linux/sched.h | 3 +++
kernel/fork.c | 16 +++++++++++-----
2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 99335e1f4a27..c583720f689f 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -33,6 +33,9 @@
#define CLONE_NEWNET 0x40000000 /* New network namespace */
#define CLONE_IO 0x80000000 /* Clone io context */
+/* Flags for the clone3() syscall */
+#define CLONE3_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
+
#ifndef __ASSEMBLY__
/**
* struct clone_args - arguments for the clone3 syscall
diff --git a/kernel/fork.c b/kernel/fork.c
index 1f6c45f6a734..0a0269cb2c18 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1517,6 +1517,11 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
spin_lock_irq(¤t->sighand->siglock);
memcpy(sig->action, current->sighand->action, sizeof(sig->action));
spin_unlock_irq(¤t->sighand->siglock);
+
+ /* Reset all signal handler not set to SIG_IGN to SIG_DFL. */
+ if (clone_flags & CLONE3_CLEAR_SIGHAND)
+ flush_signal_handlers(tsk, 0);
+
return 0;
}
@@ -2563,11 +2568,8 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
static bool clone3_args_valid(const struct kernel_clone_args *kargs)
{
- /*
- * All lower bits of the flag word are taken.
- * Verify that no other unknown flags are passed along.
- */
- if (kargs->flags & ~CLONE_LEGACY_FLAGS)
+ /* Verify that no unknown flags are passed along. */
+ if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE3_CLEAR_SIGHAND))
return false;
/*
@@ -2577,6 +2579,10 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
if (kargs->flags & (CLONE_DETACHED | CSIGNAL))
return false;
+ if ((kargs->flags & (CLONE_SIGHAND | CLONE3_CLEAR_SIGHAND)) ==
+ (CLONE_SIGHAND | CLONE3_CLEAR_SIGHAND))
+ return false;
+
if ((kargs->flags & (CLONE_THREAD | CLONE_PARENT)) &&
kargs->exit_signal)
return false;
--
2.23.0
Hey everyone,
/* v2 */
This is the patchset coming out of the KSummit session Kees and I gave
in Lisbon last week (cf. [3] which also contains slides with more
details on related things such as deep argument inspection).
The simple idea is to extend the seccomp notifier to allow for the
continuation of a syscall. The rationale for this can be found in the
commit message to [1]. For the curious there is more detail in [2].
This patchset would unblock supervising an extended set of syscalls such
as mount() where a privileged process is supervising the syscalls of a
lesser privileged process and emulates the syscall for the latter in
userspace.
For more comments on security see [1] and the comments in
include/uapi/linux/seccomp.h added by this patchset.
Kees, if you prefer a pr the series can be pulled from:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/seccomp-notify-syscall-continue-v5.5
For anyone who wants to play with this it's sitting in:
https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=se…
/* v1 */
Link: https://lore.kernel.org/r/20190919095903.19370-1-christian.brauner@ubuntu.c…
- Kees Cook <keescook(a)chromium.org>:
- dropped patch because it is already present in linux-next
[PATCH 2/4] seccomp: add two missing ptrace ifdefines
Link: https://lore.kernel.org/r/20190918084833.9369-3-christian.brauner@ubuntu.com
/* v0 */
Link: https://lore.kernel.org/r/20190918084833.9369-1-christian.brauner@ubuntu.com
Thanks!
Christian
*** BLURB HERE ***
Christian Brauner (3):
seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: avoid overflow in implicit constant conversion
seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
include/uapi/linux/seccomp.h | 28 +++++
kernel/seccomp.c | 28 ++++-
tools/testing/selftests/seccomp/seccomp_bpf.c | 110 +++++++++++++++++-
3 files changed, 159 insertions(+), 7 deletions(-)
--
2.23.0