During fuzz testing, the following warning was discovered:
different return values (15 and 11) from vsnprintf("%*pbl
", ...)
test:keyward is WARNING in kvasprintf
WARNING: CPU: 55 PID: 1168477 at lib/kasprintf.c:30 kvasprintf+0x121/0x130
Call Trace:
kvasprintf+0x121/0x130
kasprintf+0xa6/0xe0
bitmap_print_to_buf+0x89/0x100
core_siblings_list_read+0x7e/0xb0
kernfs_file_read_iter+0x15b/0x270
new_sync_read+0x153/0x260
vfs_read+0x215/0x290
ksys_read+0xb9/0x160
do_syscall_64+0x56/0x100
entry_SYSCALL_64_after_hwframe+0x78/0xe2
The call trace shows that kvasprintf() reported this warning during the
printing of core_siblings_list. kvasprintf() has several steps:
(1) First, calculate the length of the resulting formatted string.
(2) Allocate a buffer based on the returned length.
(3) Then, perform the actual string formatting.
(4) Check whether the lengths of the formatted strings returned in
steps (1) and (2) are consistent.
If the core_cpumask is modified between steps (1) and (3), the lengths
obtained in these two steps may not match. Indeed our test includes cpu
hotplugging, which should modify core_cpumask while printing.
To fix this issue, cache the cpumask into a temporary variable before
calling cpumap_print_{list, cpumask}_to_buf(), to keep it unchanged
during the printing process.
Fixes: bb9ec13d156e ("topology: use bin_attribute to break the size limitation of cpumap ABI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Li Huafei <lihuafei1(a)huawei.com>
---
Changes in v2:
- Return an error when calling alloc_cpumask_var() fails instead of
returning a size of 0.
- Add Cc (to stable) tag.
---
drivers/base/topology.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index 89f98be5c5b9..d293cbd253e4 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -27,9 +27,17 @@ static ssize_t name##_read(struct file *file, struct kobject *kobj, \
loff_t off, size_t count) \
{ \
struct device *dev = kobj_to_dev(kobj); \
+ cpumask_var_t mask; \
+ ssize_t n; \
\
- return cpumap_print_bitmask_to_buf(buf, topology_##mask(dev->id), \
- off, count); \
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL)) \
+ return -ENOMEM; \
+ \
+ cpumask_copy(mask, topology_##mask(dev->id)); \
+ n = cpumap_print_bitmask_to_buf(buf, mask, off, count); \
+ free_cpumask_var(mask); \
+ \
+ return n; \
} \
\
static ssize_t name##_list_read(struct file *file, struct kobject *kobj, \
@@ -37,9 +45,17 @@ static ssize_t name##_list_read(struct file *file, struct kobject *kobj, \
loff_t off, size_t count) \
{ \
struct device *dev = kobj_to_dev(kobj); \
+ cpumask_var_t mask; \
+ ssize_t n; \
+ \
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL)) \
+ return -ENOMEM; \
+ \
+ cpumask_copy(mask, topology_##mask(dev->id)); \
+ n = cpumap_print_list_to_buf(buf, mask, off, count); \
+ free_cpumask_var(mask); \
\
- return cpumap_print_list_to_buf(buf, topology_##mask(dev->id), \
- off, count); \
+ return n; \
}
define_id_show_func(physical_package_id, "%d");
--
2.25.1
A missing or empty dma-ranges in a DT node implies a 1:1 mapping for dma
translations. In this specific case, the current behaviour is to zero out
the entire specifier so that the translation could be carried on as an
offset from zero. This includes address specifier that has flags (e.g.
PCI ranges).
Once the flags portion has been zeroed, the translation chain is broken
since the mapping functions will check the upcoming address specifier
against mismatching flags, always failing the 1:1 mapping and its entire
purpose of always succeeding.
Set to zero only the address portion while passing the flags through.
Fixes: dbbdee94734b ("of/address: Merge all of the bus translation code")
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrea della Porta <andrea.porta(a)suse.com>
Tested-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/of/address.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 286f0c161e33..72b6accff21c 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -455,7 +455,8 @@ static int of_translate_one(struct device_node *parent, struct of_bus *bus,
}
if (ranges == NULL || rlen == 0) {
offset = of_read_number(addr, na);
- memset(addr, 0, pna * 4);
+ /* copy the address while preserving the flags */
+ memset(addr + pbus->flag_cells, 0, (pna - pbus->flag_cells) * 4);
pr_debug("empty ranges; 1:1 translation\n");
goto finish;
}
--
2.35.3
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
One alternative to the fix Christian proposed in
https://lore.kernel.org/dri-devel/20241024124159.4519-3-christian.koenig@am…
is to replace the rather complex open coded sorting loops with the kernel
standard sort followed by a context squashing pass.
Proposed advantage of this would be readability but one concern Christian
raised was that there could be many fences, that they are typically mostly
sorted, and so the kernel's heap sort would be much worse by the proposed
algorithm.
I had a look running some games and vkcube to see what are the typical
number of input fences. Tested scenarios:
1) Hogwarts Legacy under Gamescope
450 calls per second to __dma_fence_unwrap_merge.
Percentages per number of fences buckets, before and after checking for
signalled status, sorting and flattening:
N Before After
0 0.91%
1 69.40%
2-3 28.72% 9.4% (90.6% resolved to one fence)
4-5 0.93%
6-9 0.03%
10+
2) Cyberpunk 2077 under Gamescope
1050 calls per second, amounting to 0.01% CPU time according to perf top.
N Before After
0 1.13%
1 52.30%
2-3 40.34% 55.57%
4-5 1.46% 0.50%
6-9 2.44%
10+ 2.34%
3) vkcube under Plasma
90 calls per second.
N Before After
0
1
2-3 100% 0% (Ie. all resolved to a single fence)
4-5
6-9
10+
In the case of vkcube all invocations in the 2-3 bucket were actually
just two input fences.
From these numbers it looks like the heap sort should not be a
disadvantage, given how the dominant case is <= 2 input fences which heap
sort solves with just one compare and swap. (And for the case of one input
fence we have a fast path in the previous patch.)
A complementary possibility is to implement a different sorting algorithm
under the same API as the kernel's sort() and so keep the simplicity,
potentially moving the new sort under lib/ if it would be found more
widely useful.
v2:
* Hold on to fence references and reduce commentary. (Christian)
* Record and use latest signaled timestamp in the 2nd loop too.
* Consolidate zero or one fences fast paths.
v3:
* Reverse the seqno sort order for a simpler squashing pass. (Christian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3617
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v6.0+
---
drivers/dma-buf/dma-fence-unwrap.c | 128 ++++++++++++++---------------
1 file changed, 61 insertions(+), 67 deletions(-)
diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
index b19d0adf6086..6345062731f1 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -12,6 +12,7 @@
#include <linux/dma-fence-chain.h>
#include <linux/dma-fence-unwrap.h>
#include <linux/slab.h>
+#include <linux/sort.h>
/* Internal helper to start new array iteration, don't use directly */
static struct dma_fence *
@@ -59,6 +60,25 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor)
}
EXPORT_SYMBOL_GPL(dma_fence_unwrap_next);
+
+static int fence_cmp(const void *_a, const void *_b)
+{
+ struct dma_fence *a = *(struct dma_fence **)_a;
+ struct dma_fence *b = *(struct dma_fence **)_b;
+
+ if (a->context < b->context)
+ return -1;
+ else if (a->context > b->context)
+ return 1;
+
+ if (dma_fence_is_later(b, a))
+ return 1;
+ else if (dma_fence_is_later(a, b))
+ return -1;
+
+ return 0;
+}
+
/* Implementation for the dma_fence_merge() marco, don't use directly */
struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence **fences,
@@ -67,8 +87,7 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence_array *result;
struct dma_fence *tmp, **array;
ktime_t timestamp;
- unsigned int i;
- size_t count;
+ int i, j, count;
count = 0;
timestamp = ns_to_ktime(0);
@@ -96,80 +115,55 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
if (!array)
return NULL;
- /*
- * This trashes the input fence array and uses it as position for the
- * following merge loop. This works because the dma_fence_merge()
- * wrapper macro is creating this temporary array on the stack together
- * with the iterators.
- */
- for (i = 0; i < num_fences; ++i)
- fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]);
-
count = 0;
- do {
- unsigned int sel;
-
-restart:
- tmp = NULL;
- for (i = 0; i < num_fences; ++i) {
- struct dma_fence *next;
-
- while (fences[i] && dma_fence_is_signaled(fences[i]))
- fences[i] = dma_fence_unwrap_next(&iter[i]);
-
- next = fences[i];
- if (!next)
- continue;
-
- /*
- * We can't guarantee that inpute fences are ordered by
- * context, but it is still quite likely when this
- * function is used multiple times. So attempt to order
- * the fences by context as we pass over them and merge
- * fences with the same context.
- */
- if (!tmp || tmp->context > next->context) {
- tmp = next;
- sel = i;
-
- } else if (tmp->context < next->context) {
- continue;
-
- } else if (dma_fence_is_later(tmp, next)) {
- fences[i] = dma_fence_unwrap_next(&iter[i]);
- goto restart;
+ for (i = 0; i < num_fences; ++i) {
+ dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
+ if (!dma_fence_is_signaled(tmp)) {
+ array[count++] = dma_fence_get(tmp);
} else {
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- goto restart;
+ ktime_t t = dma_fence_timestamp(tmp);
+
+ if (ktime_after(t, timestamp))
+ timestamp = t;
}
}
-
- if (tmp) {
- array[count++] = dma_fence_get(tmp);
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- }
- } while (tmp);
-
- if (count == 0) {
- tmp = dma_fence_allocate_private_stub(ktime_get());
- goto return_tmp;
}
- if (count == 1) {
- tmp = array[0];
- goto return_tmp;
- }
+ if (count == 0 || count == 1)
+ goto return_fastpath;
+
+ sort(array, count, sizeof(*array), fence_cmp, NULL);
- result = dma_fence_array_create(count, array,
- dma_fence_context_alloc(1),
- 1, false);
- if (!result) {
- for (i = 0; i < count; i++)
+ /*
+ * Only keep the most recent fence for each context.
+ */
+ j = 0;
+ for (i = 1; i < count; i++) {
+ if (array[i]->context == array[j]->context)
dma_fence_put(array[i]);
- tmp = NULL;
- goto return_tmp;
+ else
+ array[++j] = array[i];
}
- return &result->base;
+ count = ++j;
+
+ if (count > 1) {
+ result = dma_fence_array_create(count, array,
+ dma_fence_context_alloc(1),
+ 1, false);
+ if (!result) {
+ for (i = 0; i < count; i++)
+ dma_fence_put(array[i]);
+ tmp = NULL;
+ goto return_tmp;
+ }
+ return &result->base;
+ }
+
+return_fastpath:
+ if (count == 0)
+ tmp = dma_fence_allocate_private_stub(timestamp);
+ else
+ tmp = array[0];
return_tmp:
kfree(array);
--
2.46.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5de195060b2e251a835f622759550e6202167641
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111159-powwow-tweezers-dc64@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5de195060b2e251a835f622759550e6202167641 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Date: Tue, 29 Oct 2024 18:11:48 +0000
Subject: [PATCH] mm: resolve faulty mmap_region() error path behaviour
The mmap_region() function is somewhat terrifying, with spaghetti-like
control flow and numerous means by which issues can arise and incomplete
state, memory leaks and other unpleasantness can occur.
A large amount of the complexity arises from trying to handle errors late
in the process of mapping a VMA, which forms the basis of recently
observed issues with resource leaks and observable inconsistent state.
Taking advantage of previous patches in this series we move a number of
checks earlier in the code, simplifying things by moving the core of the
logic into a static internal function __mmap_region().
Doing this allows us to perform a number of checks up front before we do
any real work, and allows us to unwind the writable unmap check
unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE
validation unconditionally also.
We move a number of things here:
1. We preallocate memory for the iterator before we call the file-backed
memory hook, allowing us to exit early and avoid having to perform
complicated and error-prone close/free logic. We carefully free
iterator state on both success and error paths.
2. The enclosing mmap_region() function handles the mapping_map_writable()
logic early. Previously the logic had the mapping_map_writable() at the
point of mapping a newly allocated file-backed VMA, and a matching
mapping_unmap_writable() on success and error paths.
We now do this unconditionally if this is a file-backed, shared writable
mapping. If a driver changes the flags to eliminate VM_MAYWRITE, however
doing so does not invalidate the seal check we just performed, and we in
any case always decrement the counter in the wrapper.
We perform a debug assert to ensure a driver does not attempt to do the
opposite.
3. We also move arch_validate_flags() up into the mmap_region()
function. This is only relevant on arm64 and sparc64, and the check is
only meaningful for SPARC with ADI enabled. We explicitly add a warning
for this arch if a driver invalidates this check, though the code ought
eventually to be fixed to eliminate the need for this.
With all of these measures in place, we no longer need to explicitly close
the VMA on error paths, as we place all checks which might fail prior to a
call to any driver mmap hook.
This eliminates an entire class of errors, makes the code easier to reason
about and more robust.
Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.17302246…
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Tested-by: Mark Brown <broonie(a)kernel.org>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Helge Deller <deller(a)gmx.de>
Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mmap.c b/mm/mmap.c
index aee5fa08ae5d..79d541f1502b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1358,20 +1358,18 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
return do_vmi_munmap(&vmi, mm, start, len, uf, false);
}
-unsigned long mmap_region(struct file *file, unsigned long addr,
+static unsigned long __mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
pgoff_t pglen = PHYS_PFN(len);
- struct vm_area_struct *merge;
unsigned long charged = 0;
struct vma_munmap_struct vms;
struct ma_state mas_detach;
struct maple_tree mt_detach;
unsigned long end = addr + len;
- bool writable_file_mapping = false;
int error;
VMA_ITERATOR(vmi, mm, addr);
VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
@@ -1445,28 +1443,26 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vm_flags_init(vma, vm_flags);
vma->vm_page_prot = vm_get_page_prot(vm_flags);
+ if (vma_iter_prealloc(&vmi, vma)) {
+ error = -ENOMEM;
+ goto free_vma;
+ }
+
if (file) {
vma->vm_file = get_file(file);
error = mmap_file(file, vma);
if (error)
- goto unmap_and_free_vma;
-
- if (vma_is_shared_maywrite(vma)) {
- error = mapping_map_writable(file->f_mapping);
- if (error)
- goto close_and_free_vma;
-
- writable_file_mapping = true;
- }
+ goto unmap_and_free_file_vma;
+ /* Drivers cannot alter the address of the VMA. */
+ WARN_ON_ONCE(addr != vma->vm_start);
/*
- * Expansion is handled above, merging is handled below.
- * Drivers should not alter the address of the VMA.
+ * Drivers should not permit writability when previously it was
+ * disallowed.
*/
- if (WARN_ON((addr != vma->vm_start))) {
- error = -EINVAL;
- goto close_and_free_vma;
- }
+ VM_WARN_ON_ONCE(vm_flags != vma->vm_flags &&
+ !(vm_flags & VM_MAYWRITE) &&
+ (vma->vm_flags & VM_MAYWRITE));
vma_iter_config(&vmi, addr, end);
/*
@@ -1474,6 +1470,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* vma again as we may succeed this time.
*/
if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
+ struct vm_area_struct *merge;
+
vmg.flags = vma->vm_flags;
/* If this fails, state is reset ready for a reattempt. */
merge = vma_merge_new_range(&vmg);
@@ -1491,7 +1489,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma = merge;
/* Update vm_flags to pick up the change. */
vm_flags = vma->vm_flags;
- goto unmap_writable;
+ goto file_expanded;
}
vma_iter_config(&vmi, addr, end);
}
@@ -1500,26 +1498,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
} else if (vm_flags & VM_SHARED) {
error = shmem_zero_setup(vma);
if (error)
- goto free_vma;
+ goto free_iter_vma;
} else {
vma_set_anonymous(vma);
}
- if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) {
- error = -EACCES;
- goto close_and_free_vma;
- }
-
- /* Allow architectures to sanity-check the vm_flags */
- if (!arch_validate_flags(vma->vm_flags)) {
- error = -EINVAL;
- goto close_and_free_vma;
- }
-
- if (vma_iter_prealloc(&vmi, vma)) {
- error = -ENOMEM;
- goto close_and_free_vma;
- }
+#ifdef CONFIG_SPARC64
+ /* TODO: Fix SPARC ADI! */
+ WARN_ON_ONCE(!arch_validate_flags(vm_flags));
+#endif
/* Lock the VMA since it is modified after insertion into VMA tree */
vma_start_write(vma);
@@ -1533,10 +1520,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
*/
khugepaged_enter_vma(vma, vma->vm_flags);
- /* Once vma denies write, undo our temporary denial count */
-unmap_writable:
- if (writable_file_mapping)
- mapping_unmap_writable(file->f_mapping);
+file_expanded:
file = vma->vm_file;
ksm_add_vma(vma);
expanded:
@@ -1569,23 +1553,17 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_set_page_prot(vma);
- validate_mm(mm);
return addr;
-close_and_free_vma:
- vma_close(vma);
+unmap_and_free_file_vma:
+ fput(vma->vm_file);
+ vma->vm_file = NULL;
- if (file || vma->vm_file) {
-unmap_and_free_vma:
- fput(vma->vm_file);
- vma->vm_file = NULL;
-
- vma_iter_set(&vmi, vma->vm_end);
- /* Undo any partial mapping done by a device driver. */
- unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
- }
- if (writable_file_mapping)
- mapping_unmap_writable(file->f_mapping);
+ vma_iter_set(&vmi, vma->vm_end);
+ /* Undo any partial mapping done by a device driver. */
+ unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
+free_iter_vma:
+ vma_iter_free(&vmi);
free_vma:
vm_area_free(vma);
unacct_error:
@@ -1595,10 +1573,43 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
abort_munmap:
vms_abort_munmap_vmas(&vms, &mas_detach);
gather_failed:
- validate_mm(mm);
return error;
}
+unsigned long mmap_region(struct file *file, unsigned long addr,
+ unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
+ struct list_head *uf)
+{
+ unsigned long ret;
+ bool writable_file_mapping = false;
+
+ /* Check to see if MDWE is applicable. */
+ if (map_deny_write_exec(vm_flags, vm_flags))
+ return -EACCES;
+
+ /* Allow architectures to sanity-check the vm_flags. */
+ if (!arch_validate_flags(vm_flags))
+ return -EINVAL;
+
+ /* Map writable and ensure this isn't a sealed memfd. */
+ if (file && is_shared_maywrite(vm_flags)) {
+ int error = mapping_map_writable(file->f_mapping);
+
+ if (error)
+ return error;
+ writable_file_mapping = true;
+ }
+
+ ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf);
+
+ /* Clear our write mapping regardless of error. */
+ if (writable_file_mapping)
+ mapping_unmap_writable(file->f_mapping);
+
+ validate_mm(current->mm);
+ return ret;
+}
+
static int __vm_munmap(unsigned long start, size_t len, bool unlock)
{
int ret;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5de195060b2e251a835f622759550e6202167641
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111151-threaten-calamari-7920@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5de195060b2e251a835f622759550e6202167641 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Date: Tue, 29 Oct 2024 18:11:48 +0000
Subject: [PATCH] mm: resolve faulty mmap_region() error path behaviour
The mmap_region() function is somewhat terrifying, with spaghetti-like
control flow and numerous means by which issues can arise and incomplete
state, memory leaks and other unpleasantness can occur.
A large amount of the complexity arises from trying to handle errors late
in the process of mapping a VMA, which forms the basis of recently
observed issues with resource leaks and observable inconsistent state.
Taking advantage of previous patches in this series we move a number of
checks earlier in the code, simplifying things by moving the core of the
logic into a static internal function __mmap_region().
Doing this allows us to perform a number of checks up front before we do
any real work, and allows us to unwind the writable unmap check
unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE
validation unconditionally also.
We move a number of things here:
1. We preallocate memory for the iterator before we call the file-backed
memory hook, allowing us to exit early and avoid having to perform
complicated and error-prone close/free logic. We carefully free
iterator state on both success and error paths.
2. The enclosing mmap_region() function handles the mapping_map_writable()
logic early. Previously the logic had the mapping_map_writable() at the
point of mapping a newly allocated file-backed VMA, and a matching
mapping_unmap_writable() on success and error paths.
We now do this unconditionally if this is a file-backed, shared writable
mapping. If a driver changes the flags to eliminate VM_MAYWRITE, however
doing so does not invalidate the seal check we just performed, and we in
any case always decrement the counter in the wrapper.
We perform a debug assert to ensure a driver does not attempt to do the
opposite.
3. We also move arch_validate_flags() up into the mmap_region()
function. This is only relevant on arm64 and sparc64, and the check is
only meaningful for SPARC with ADI enabled. We explicitly add a warning
for this arch if a driver invalidates this check, though the code ought
eventually to be fixed to eliminate the need for this.
With all of these measures in place, we no longer need to explicitly close
the VMA on error paths, as we place all checks which might fail prior to a
call to any driver mmap hook.
This eliminates an entire class of errors, makes the code easier to reason
about and more robust.
Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.17302246…
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Tested-by: Mark Brown <broonie(a)kernel.org>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Helge Deller <deller(a)gmx.de>
Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mmap.c b/mm/mmap.c
index aee5fa08ae5d..79d541f1502b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1358,20 +1358,18 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
return do_vmi_munmap(&vmi, mm, start, len, uf, false);
}
-unsigned long mmap_region(struct file *file, unsigned long addr,
+static unsigned long __mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
pgoff_t pglen = PHYS_PFN(len);
- struct vm_area_struct *merge;
unsigned long charged = 0;
struct vma_munmap_struct vms;
struct ma_state mas_detach;
struct maple_tree mt_detach;
unsigned long end = addr + len;
- bool writable_file_mapping = false;
int error;
VMA_ITERATOR(vmi, mm, addr);
VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
@@ -1445,28 +1443,26 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vm_flags_init(vma, vm_flags);
vma->vm_page_prot = vm_get_page_prot(vm_flags);
+ if (vma_iter_prealloc(&vmi, vma)) {
+ error = -ENOMEM;
+ goto free_vma;
+ }
+
if (file) {
vma->vm_file = get_file(file);
error = mmap_file(file, vma);
if (error)
- goto unmap_and_free_vma;
-
- if (vma_is_shared_maywrite(vma)) {
- error = mapping_map_writable(file->f_mapping);
- if (error)
- goto close_and_free_vma;
-
- writable_file_mapping = true;
- }
+ goto unmap_and_free_file_vma;
+ /* Drivers cannot alter the address of the VMA. */
+ WARN_ON_ONCE(addr != vma->vm_start);
/*
- * Expansion is handled above, merging is handled below.
- * Drivers should not alter the address of the VMA.
+ * Drivers should not permit writability when previously it was
+ * disallowed.
*/
- if (WARN_ON((addr != vma->vm_start))) {
- error = -EINVAL;
- goto close_and_free_vma;
- }
+ VM_WARN_ON_ONCE(vm_flags != vma->vm_flags &&
+ !(vm_flags & VM_MAYWRITE) &&
+ (vma->vm_flags & VM_MAYWRITE));
vma_iter_config(&vmi, addr, end);
/*
@@ -1474,6 +1470,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* vma again as we may succeed this time.
*/
if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) {
+ struct vm_area_struct *merge;
+
vmg.flags = vma->vm_flags;
/* If this fails, state is reset ready for a reattempt. */
merge = vma_merge_new_range(&vmg);
@@ -1491,7 +1489,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma = merge;
/* Update vm_flags to pick up the change. */
vm_flags = vma->vm_flags;
- goto unmap_writable;
+ goto file_expanded;
}
vma_iter_config(&vmi, addr, end);
}
@@ -1500,26 +1498,15 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
} else if (vm_flags & VM_SHARED) {
error = shmem_zero_setup(vma);
if (error)
- goto free_vma;
+ goto free_iter_vma;
} else {
vma_set_anonymous(vma);
}
- if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) {
- error = -EACCES;
- goto close_and_free_vma;
- }
-
- /* Allow architectures to sanity-check the vm_flags */
- if (!arch_validate_flags(vma->vm_flags)) {
- error = -EINVAL;
- goto close_and_free_vma;
- }
-
- if (vma_iter_prealloc(&vmi, vma)) {
- error = -ENOMEM;
- goto close_and_free_vma;
- }
+#ifdef CONFIG_SPARC64
+ /* TODO: Fix SPARC ADI! */
+ WARN_ON_ONCE(!arch_validate_flags(vm_flags));
+#endif
/* Lock the VMA since it is modified after insertion into VMA tree */
vma_start_write(vma);
@@ -1533,10 +1520,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
*/
khugepaged_enter_vma(vma, vma->vm_flags);
- /* Once vma denies write, undo our temporary denial count */
-unmap_writable:
- if (writable_file_mapping)
- mapping_unmap_writable(file->f_mapping);
+file_expanded:
file = vma->vm_file;
ksm_add_vma(vma);
expanded:
@@ -1569,23 +1553,17 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_set_page_prot(vma);
- validate_mm(mm);
return addr;
-close_and_free_vma:
- vma_close(vma);
+unmap_and_free_file_vma:
+ fput(vma->vm_file);
+ vma->vm_file = NULL;
- if (file || vma->vm_file) {
-unmap_and_free_vma:
- fput(vma->vm_file);
- vma->vm_file = NULL;
-
- vma_iter_set(&vmi, vma->vm_end);
- /* Undo any partial mapping done by a device driver. */
- unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
- }
- if (writable_file_mapping)
- mapping_unmap_writable(file->f_mapping);
+ vma_iter_set(&vmi, vma->vm_end);
+ /* Undo any partial mapping done by a device driver. */
+ unmap_region(&vmi.mas, vma, vmg.prev, vmg.next);
+free_iter_vma:
+ vma_iter_free(&vmi);
free_vma:
vm_area_free(vma);
unacct_error:
@@ -1595,10 +1573,43 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
abort_munmap:
vms_abort_munmap_vmas(&vms, &mas_detach);
gather_failed:
- validate_mm(mm);
return error;
}
+unsigned long mmap_region(struct file *file, unsigned long addr,
+ unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
+ struct list_head *uf)
+{
+ unsigned long ret;
+ bool writable_file_mapping = false;
+
+ /* Check to see if MDWE is applicable. */
+ if (map_deny_write_exec(vm_flags, vm_flags))
+ return -EACCES;
+
+ /* Allow architectures to sanity-check the vm_flags. */
+ if (!arch_validate_flags(vm_flags))
+ return -EINVAL;
+
+ /* Map writable and ensure this isn't a sealed memfd. */
+ if (file && is_shared_maywrite(vm_flags)) {
+ int error = mapping_map_writable(file->f_mapping);
+
+ if (error)
+ return error;
+ writable_file_mapping = true;
+ }
+
+ ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf);
+
+ /* Clear our write mapping regardless of error. */
+ if (writable_file_mapping)
+ mapping_unmap_writable(file->f_mapping);
+
+ validate_mm(current->mm);
+ return ret;
+}
+
static int __vm_munmap(unsigned long start, size_t len, bool unlock)
{
int ret;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5baf8b037debf4ec60108ccfeccb8636d1dbad81
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111138-moving-borough-7e09@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5baf8b037debf4ec60108ccfeccb8636d1dbad81 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Date: Tue, 29 Oct 2024 18:11:47 +0000
Subject: [PATCH] mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
Currently MTE is permitted in two circumstances (desiring to use MTE
having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is
specified, as checked by arch_calc_vm_flag_bits() and actualised by
setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is
shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap
hook is activated in mmap_region().
The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also
set is the arm64 implementation of arch_validate_flags().
Unfortunately, we intend to refactor mmap_region() to perform this check
earlier, meaning that in the case of a shmem backing we will not have
invoked shmem_mmap() yet, causing the mapping to fail spuriously.
It is inappropriate to set this architecture-specific flag in general mm
code anyway, so a sensible resolution of this issue is to instead move the
check somewhere else.
We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via
the arch_calc_vm_flag_bits() call.
This is an appropriate place to do this as we already check for the
MAP_ANONYMOUS case here, and the shmem file case is simply a variant of
the same idea - we permit RAM-backed memory.
This requires a modification to the arch_calc_vm_flag_bits() signature to
pass in a pointer to the struct file associated with the mapping, however
this is not too egregious as this is only used by two architectures anyway
- arm64 and parisc.
So this patch performs this adjustment and removes the unnecessary
assignment of VM_MTE_ALLOWED in shmem_mmap().
[akpm(a)linux-foundation.org: fix whitespace, per Catalin]
Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.17302246…
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Suggested-by: Catalin Marinas <catalin.marinas(a)arm.com>
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Helge Deller <deller(a)gmx.de>
Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 9e39217b4afb..798d965760d4 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -6,6 +6,8 @@
#ifndef BUILD_VDSO
#include <linux/compiler.h>
+#include <linux/fs.h>
+#include <linux/shmem_fs.h>
#include <linux/types.h>
static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
@@ -31,19 +33,21 @@ static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
}
#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
-static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
+static inline unsigned long arch_calc_vm_flag_bits(struct file *file,
+ unsigned long flags)
{
/*
* Only allow MTE on anonymous mappings as these are guaranteed to be
* backed by tags-capable memory. The vm_flags may be overridden by a
* filesystem supporting MTE (RAM-based).
*/
- if (system_supports_mte() && (flags & MAP_ANONYMOUS))
+ if (system_supports_mte() &&
+ ((flags & MAP_ANONYMOUS) || shmem_file(file)))
return VM_MTE_ALLOWED;
return 0;
}
-#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
+#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags)
static inline bool arch_validate_prot(unsigned long prot,
unsigned long addr __always_unused)
diff --git a/arch/parisc/include/asm/mman.h b/arch/parisc/include/asm/mman.h
index 89b6beeda0b8..663f587dc789 100644
--- a/arch/parisc/include/asm/mman.h
+++ b/arch/parisc/include/asm/mman.h
@@ -2,6 +2,7 @@
#ifndef __ASM_MMAN_H__
#define __ASM_MMAN_H__
+#include <linux/fs.h>
#include <uapi/asm/mman.h>
/* PARISC cannot allow mdwe as it needs writable stacks */
@@ -11,7 +12,7 @@ static inline bool arch_memory_deny_write_exec_supported(void)
}
#define arch_memory_deny_write_exec_supported arch_memory_deny_write_exec_supported
-static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
+static inline unsigned long arch_calc_vm_flag_bits(struct file *file, unsigned long flags)
{
/*
* The stack on parisc grows upwards, so if userspace requests memory
@@ -23,6 +24,6 @@ static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
return 0;
}
-#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags)
+#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags)
#endif /* __ASM_MMAN_H__ */
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 8ddca62d6460..a842783ffa62 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -2,6 +2,7 @@
#ifndef _LINUX_MMAN_H
#define _LINUX_MMAN_H
+#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/percpu_counter.h>
@@ -94,7 +95,7 @@ static inline void vm_unacct_memory(long pages)
#endif
#ifndef arch_calc_vm_flag_bits
-#define arch_calc_vm_flag_bits(flags) 0
+#define arch_calc_vm_flag_bits(file, flags) 0
#endif
#ifndef arch_validate_prot
@@ -151,13 +152,13 @@ calc_vm_prot_bits(unsigned long prot, unsigned long pkey)
* Combine the mmap "flags" argument into "vm_flags" used internally.
*/
static inline unsigned long
-calc_vm_flag_bits(unsigned long flags)
+calc_vm_flag_bits(struct file *file, unsigned long flags)
{
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
_calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
- arch_calc_vm_flag_bits(flags);
+ arch_calc_vm_flag_bits(file, flags);
}
unsigned long vm_commit_limit(void);
diff --git a/mm/mmap.c b/mm/mmap.c
index ab71d4c3464c..aee5fa08ae5d 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -344,7 +344,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
* to. we assume access permissions have been handled by the open
* of the memory object, so we don't do any here.
*/
- vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
+ vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(file, flags) |
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
/* Obtain the address to map to. we verify (or select) it and ensure
diff --git a/mm/nommu.c b/mm/nommu.c
index 635d028d647b..e9b5f527ab5b 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -842,7 +842,7 @@ static unsigned long determine_vm_flags(struct file *file,
{
unsigned long vm_flags;
- vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(flags);
+ vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(file, flags);
if (!file) {
/*
diff --git a/mm/shmem.c b/mm/shmem.c
index 4ba1d00fabda..e87f5d6799a7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2733,9 +2733,6 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
if (ret)
return ret;
- /* arm64 - allow memory tagging on RAM-based files */
- vm_flags_set(vma, VM_MTE_ALLOWED);
-
file_accessed(file);
/* This is anonymous shared memory if it is unlinked at the time of mmap */
if (inode->i_nlink)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111110-dubbed-hydration-c1be@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Date: Tue, 29 Oct 2024 18:11:46 +0000
Subject: [PATCH] mm: refactor map_deny_write_exec()
Refactor the map_deny_write_exec() to not unnecessarily require a VMA
parameter but rather to accept VMA flags parameters, which allows us to
use this function early in mmap_region() in a subsequent commit.
While we're here, we refactor the function to be more readable and add
some additional documentation.
Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.17302246…
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Jann Horn <jannh(a)google.com>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Helge Deller <deller(a)gmx.de>
Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mman.h b/include/linux/mman.h
index bcb201ab7a41..8ddca62d6460 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -188,16 +188,31 @@ static inline bool arch_memory_deny_write_exec_supported(void)
*
* d) mmap(PROT_READ | PROT_EXEC)
* mmap(PROT_READ | PROT_EXEC | PROT_BTI)
+ *
+ * This is only applicable if the user has set the Memory-Deny-Write-Execute
+ * (MDWE) protection mask for the current process.
+ *
+ * @old specifies the VMA flags the VMA originally possessed, and @new the ones
+ * we propose to set.
+ *
+ * Return: false if proposed change is OK, true if not ok and should be denied.
*/
-static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags)
+static inline bool map_deny_write_exec(unsigned long old, unsigned long new)
{
+ /* If MDWE is disabled, we have nothing to deny. */
if (!test_bit(MMF_HAS_MDWE, ¤t->mm->flags))
return false;
- if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
+ /* If the new VMA is not executable, we have nothing to deny. */
+ if (!(new & VM_EXEC))
+ return false;
+
+ /* Under MDWE we do not accept newly writably executable VMAs... */
+ if (new & VM_WRITE)
return true;
- if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
+ /* ...nor previously non-executable VMAs becoming executable. */
+ if (!(old & VM_EXEC))
return true;
return false;
diff --git a/mm/mmap.c b/mm/mmap.c
index ac0604f146f6..ab71d4c3464c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1505,7 +1505,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_set_anonymous(vma);
}
- if (map_deny_write_exec(vma, vma->vm_flags)) {
+ if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) {
error = -EACCES;
goto close_and_free_vma;
}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0c5d6d06107d..6f450af3252e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -810,7 +810,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
break;
}
- if (map_deny_write_exec(vma, newflags)) {
+ if (map_deny_write_exec(vma->vm_flags, newflags)) {
error = -EACCES;
break;
}
diff --git a/mm/vma.h b/mm/vma.h
index 75558b5e9c8c..d58068c0ff2e 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -42,7 +42,7 @@ struct vma_munmap_struct {
int vma_count; /* Number of vmas that will be removed */
bool unlock; /* Unlock after the munmap */
bool clear_ptes; /* If there are outstanding PTE to be cleared */
- /* 1 byte hole */
+ /* 2 byte hole */
unsigned long nr_pages; /* Number of pages being removed */
unsigned long locked_vm; /* Number of locked pages */
unsigned long nr_accounted; /* Number of VM_ACCOUNT pages */