The patch titled
Subject: drivers/base/memory: add memory block to memory group after registration succeeded
has been added to the -mm tree. Its filename is
drivers-base-memory-add-memory-block-to-memory-group-after-registration-succeeded.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/drivers-base-memory-add-memory-bl…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/drivers-base-memory-add-memory-bl…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: drivers/base/memory: add memory block to memory group after registration succeeded
If register_memory() fails, we freed the memory block but already added
the memory block to the group list, not good. Let's defer adding the
block to the memory group to after registering the memory block device.
We do handle it properly during unregister_memory(), but that's not
called when the registration fails.
Link: https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com
Fixes: 028fc57a1c36 ("drivers/base/memory: introduce "memory groups" to logically group memory blocks")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/base/memory.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/base/memory.c~drivers-base-memory-add-memory-block-to-memory-group-after-registration-succeeded
+++ a/drivers/base/memory.c
@@ -663,14 +663,16 @@ static int init_memory_block(unsigned lo
mem->nr_vmemmap_pages = nr_vmemmap_pages;
INIT_LIST_HEAD(&mem->group_next);
+ ret = register_memory(mem);
+ if (ret)
+ return ret;
+
if (group) {
mem->group = group;
list_add(&mem->group_next, &group->memory_blocks);
}
- ret = register_memory(mem);
-
- return ret;
+ return 0;
}
static int add_memory_block(unsigned long base_section_nr)
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
mm-streamline-cow-logic-in-do_swap_page.patch
mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
mm-khugepaged-remove-reuse_swap_page-usage.patch
mm-swapfile-remove-stale-reuse_swap_page.patch
mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
drivers-base-memory-add-memory-block-to-memory-group-after-registration-succeeded.patch
proc-vmcore-fix-possible-deadlock-on-concurrent-mmap-and-read.patch
The patch titled
Subject: mm/kmemleak: avoid scanning potential huge holes
has been added to the -mm tree. Its filename is
mm-kmemleak-avoid-scanning-potential-huge-holes.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-kmemleak-avoid-scanning-potent…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-kmemleak-avoid-scanning-potent…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Lang Yu <lang.yu(a)amd.com>
Subject: mm/kmemleak: avoid scanning potential huge holes
When using devm_request_free_mem_region() and devm_memremap_pages() to add
ZONE_DEVICE memory, if requested free mem region's end pfn were huge(e.g.,
0x400000000), the node_end_pfn() will be also huge (see
move_pfn_range_to_zone()). Thus it creates a huge hole between
node_start_pfn() and node_end_pfn().
We found on some AMD APUs, amdkfd requested such a free mem region and
created a huge hole. In such a case, following code snippet was just
doing busy test_bit() looping on the huge hole.
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_online_page(pfn);
if (!page)
continue;
...
}
So we got a soft lockup:
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
RIP: 0010:pfn_to_online_page+0x5/0xd0
Call Trace:
? kmemleak_scan+0x16a/0x440
kmemleak_write+0x306/0x3a0
? common_file_perm+0x72/0x170
full_proxy_write+0x5c/0x90
vfs_write+0xb9/0x260
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
I did some tests with the patch.
(1) amdgpu module unloaded
before the patch:
real 0m0.976s
user 0m0.000s
sys 0m0.968s
after the patch:
real 0m0.981s
user 0m0.000s
sys 0m0.973s
(2) amdgpu module loaded
before the patch:
real 0m35.365s
user 0m0.000s
sys 0m35.354s
after the patch:
real 0m1.049s
user 0m0.000s
sys 0m1.042s
Link: https://lkml.kernel.org/r/20211108140029.721144-1-lang.yu@amd.com
Signed-off-by: Lang Yu <lang.yu(a)amd.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- a/mm/kmemleak.c~mm-kmemleak-avoid-scanning-potential-huge-holes
+++ a/mm/kmemleak.c
@@ -1410,7 +1410,8 @@ static void kmemleak_scan(void)
{
unsigned long flags;
struct kmemleak_object *object;
- int i;
+ struct zone *zone;
+ int __maybe_unused i;
int new_leaks = 0;
jiffies_last_scan = jiffies;
@@ -1450,9 +1451,9 @@ static void kmemleak_scan(void)
* Struct page scanning for each node.
*/
get_online_mems();
- for_each_online_node(i) {
- unsigned long start_pfn = node_start_pfn(i);
- unsigned long end_pfn = node_end_pfn(i);
+ for_each_populated_zone(zone) {
+ unsigned long start_pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = zone_end_pfn(zone);
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -1461,8 +1462,8 @@ static void kmemleak_scan(void)
if (!page)
continue;
- /* only scan pages belonging to this node */
- if (page_to_nid(page) != i)
+ /* only scan pages belonging to this zone */
+ if (page_zone(page) != zone)
continue;
/* only scan if page is in use */
if (page_count(page) == 0)
_
Patches currently in -mm which might be from lang.yu(a)amd.com are
mm-kmemleak-avoid-scanning-potential-huge-holes.patch
The patch titled
Subject: exec: force single empty string when argv is empty
has been added to the -mm tree. Its filename is
exec-force-single-empty-string-when-argv-is-empty.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/exec-force-single-empty-string-wh…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/exec-force-single-empty-string-wh…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Kees Cook <keescook(a)chromium.org>
Subject: exec: force single empty string when argv is empty
Quoting[1] Ariadne Conill:
"In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting
a scenario where argc < 1. POSIX 2017 also recommends this behaviour,
but it is not an explicit requirement[2]:
The argument arg0 should point to a filename string that is
associated with the process being started by one of the exec
functions.
...
Interestingly, Michael Kerrisk opened an issue about this in 2008[3],
but there was no consensus to support fixing this issue then.
Hopefully now that CVE-2021-4034 shows practical exploitative use[4]
of this bug in a shellcode, we can reconsider.
This issue is being tracked in the KSPP issue tracker[5]."
While the initial code searches[6][7] turned up what appeared to be
mostly corner case tests, trying to that just reject argv == NULL
(or an immediately terminated pointer list) quickly started tripping[8]
existing userspace programs.
The next best approach is forcing a single empty string into argv and
adjusting argc to match. The number of programs depending on argc == 0
seems a smaller set than those calling execve with a NULL argv.
Account for the additional stack space in bprm_stack_limits(). Inject an
empty string when argc == 0 (and set argc = 1). Warn about the case so
userspace has some notice about the change:
process './argc0' launched './argc0' with NULL argv: empty string added
Additionally WARN() and reject NULL argv usage for kernel threads.
[1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.or…
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=8408
[4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[5] https://github.com/KSPP/linux/issues/176
[6] https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+…
[7] https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%…
[8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>Reported-by: Ariadne Conill <ariadne(a)dereferenced.org>
Reported-by: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Rich Felker <dalias(a)libc.org>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/exec.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
--- a/fs/exec.c~exec-force-single-empty-string-when-argv-is-empty
+++ a/fs/exec.c
@@ -495,8 +495,14 @@ static int bprm_stack_limits(struct linu
* the stack. They aren't stored until much later when we can't
* signal to the parent that the child has run out of stack space.
* Instead, calculate it here so it's possible to fail gracefully.
+ *
+ * In the case of argc = 0, make sure there is space for adding a
+ * empty string (which will bump argc to 1), to ensure confused
+ * userspace programs don't start processing from argv[1], thinking
+ * argc can never be 0, to keep them from walking envp by accident.
+ * See do_execveat_common().
*/
- ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
+ ptr_size = (min(bprm->argc, 1) + bprm->envc) * sizeof(void *);
if (limit <= ptr_size)
return -E2BIG;
limit -= ptr_size;
@@ -1897,6 +1903,9 @@ static int do_execveat_common(int fd, st
}
retval = count(argv, MAX_ARG_STRINGS);
+ if (retval == 0)
+ pr_warn_once("process '%s' launched '%s' with NULL argv: empty string added\n",
+ current->comm, bprm->filename);
if (retval < 0)
goto out_free;
bprm->argc = retval;
@@ -1923,6 +1932,19 @@ static int do_execveat_common(int fd, st
if (retval < 0)
goto out_free;
+ /*
+ * When argv is empty, add an empty string ("") as argv[0] to
+ * ensure confused userspace programs that start processing
+ * from argv[1] won't end up walking envp. See also
+ * bprm_stack_limits().
+ */
+ if (bprm->argc == 0) {
+ retval = copy_string_kernel("", bprm);
+ if (retval < 0)
+ goto out_free;
+ bprm->argc = 1;
+ }
+
retval = bprm_execve(bprm, fd, filename, flags);
out_free:
free_bprm(bprm);
@@ -1951,6 +1973,8 @@ int kernel_execve(const char *kernel_fil
}
retval = count_strings_kernel(argv);
+ if (WARN_ON_ONCE(retval == 0))
+ retval = -EINVAL;
if (retval < 0)
goto out_free;
bprm->argc = retval;
_
Patches currently in -mm which might be from keescook(a)chromium.org are
kconfigdebug-make-debug_info-selectable-from-a-choice.patch
kconfigdebug-make-debug_info-selectable-from-a-choice-fix.patch
exec-force-single-empty-string-when-argv-is-empty.patch
The patch titled
Subject: fs/exec: require argv[0] presence in do_execveat_common()
has been removed from the -mm tree. Its filename was
fs-exec-require-argv-presence-in-do_execveat_common.patch
This patch was dropped because an alternative patch was merged
------------------------------------------------------
From: Ariadne Conill <ariadne(a)dereferenced.org>
Subject: fs/exec: require argv[0] presence in do_execveat_common()
In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting a
scenario where argc < 1. POSIX 2017 also recommends this behaviour, but
it is not an explicit requirement[0]:
The argument arg0 should point to a filename string that is
associated with the process being started by one of the exec
functions.
To ensure that execve(2) with argc < 1 is not a useful tool for shellcode
to use, we can validate this in do_execveat_common() and fail for this
scenario, effectively blocking successful exploitation of CVE-2021-4034
and similar bugs which depend on execve(2) working with argc < 1.
We use -EINVAL for this case, mirroring recent changes to FreeBSD and
OpenBSD. -EINVAL is also used by QNX for this, while Solaris uses
-EFAULT.
In earlier versions of the patch, it was proposed that we create a fake
argv for applications to use when argc < 1, but it was concluded that it
would be better to just fail the execve(2) in these cases, as launching a
process with an empty or NULL argv[0] was likely to just cause more
problems.
Interestingly, Michael Kerrisk opened an issue about this in 2008[1], but
there was no consensus to support fixing this issue then. Hopefully now
that CVE-2021-4034 shows practical exploitative use[2] of this bug in a
shellcode, we can reconsider.
This issue is being tracked in the KSPP issue tracker[3].
There are a few[4][5] minor edge cases (primarily in test suites) that are
caught by this, but we plan to work with the projects to fix those edge
cases.
[0]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=8408
[2]: https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[3]: https://github.com/KSPP/linux/issues/176
[4]: https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+…
[5]: https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%…
Link: https://lkml.kernel.org/r/20220127000724.15106-1-ariadne@dereferenced.org
Signed-off-by: Ariadne Conill <ariadne(a)dereferenced.org>
Reported-by: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Rich Felker <dalias(a)libc.org>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/exec.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/fs/exec.c~fs-exec-require-argv-presence-in-do_execveat_common
+++ a/fs/exec.c
@@ -1897,6 +1897,10 @@ static int do_execveat_common(int fd, st
}
retval = count(argv, MAX_ARG_STRINGS);
+ if (retval == 0) {
+ pr_warn_once("Attempted to run process '%s' with NULL argv\n", bprm->filename);
+ retval = -EINVAL;
+ }
if (retval < 0)
goto out_free;
bprm->argc = retval;
_
Patches currently in -mm which might be from ariadne(a)dereferenced.org are