The supported codec bitmask is populated from the payload sent by the
venus firmware. There is a possible case where all the bits in the codec
bitmask are set. In that case, the core caps for the decoder are filled
and all MAX_CODEC_NUM entries are used up. Filling the caps for the
encoder then accesses the caps array beyond index 32, leading to an OOB
write.
Fix this by counting the supported encoders and decoders. If the combined
count exceeds the maximum, skip accessing the caps array.
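For illustration, the kernel's Hamming-weight helpers could express the
same guard without a local bit-counting routine. A minimal sketch,
assuming dec_codecs and enc_codecs are the unsigned long bitmasks used
with for_each_set_bit() below (not the code posted in this patch):

	/* bail out if the firmware reports more codecs than the caps array holds */
	if (hweight_long(core->dec_codecs) + hweight_long(core->enc_codecs) >
	    MAX_CODEC_NUM)
		return;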
Cc: stable(a)vger.kernel.org
Fixes: 1a73374a04e5 ("media: venus: hfi_parser: add common capability parser")
Signed-off-by: Vikash Garodia <quic_vgarodia(a)quicinc.com>
---
drivers/media/platform/qcom/venus/hfi_parser.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/media/platform/qcom/venus/hfi_parser.c b/drivers/media/platform/qcom/venus/hfi_parser.c
index ec73cac..651e215 100644
--- a/drivers/media/platform/qcom/venus/hfi_parser.c
+++ b/drivers/media/platform/qcom/venus/hfi_parser.c
@@ -14,11 +14,26 @@
typedef void (*func)(struct hfi_plat_caps *cap, const void *data,
unsigned int size);
+static int count_setbits(u32 input)
+{
+ u32 count = 0;
+
+ while (input > 0) {
+ if ((input & 1) == 1)
+ count++;
+ input >>= 1;
+ }
+ return count;
+}
+
static void init_codecs(struct venus_core *core)
{
struct hfi_plat_caps *caps = core->caps, *cap;
unsigned long bit;
+ if ((count_setbits(core->dec_codecs) + count_setbits(core->enc_codecs)) > MAX_CODEC_NUM)
+ return;
+
for_each_set_bit(bit, &core->dec_codecs, MAX_CODEC_NUM) {
cap = &caps[core->codecs_count++];
cap->codec = BIT(bit);
--
2.7.4
From: Arnd Bergmann <arnd(a)arndb.de>
bpf_probe_read_kernel() has a __weak definition in core.c and another
definition with an incompatible prototype in kernel/trace/bpf_trace.c,
when CONFIG_BPF_EVENTS is enabled.
Since the two are incompatible, there cannot be a shared declaration in
a header file, but the lack of a prototype causes a W=1 warning:
kernel/bpf/core.c:1638:12: error: no previous prototype for 'bpf_probe_read_kernel' [-Werror=missing-prototypes]
On 32-bit architectures, the local prototype
u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
passes its arguments in different registers than the one in bpf_trace.c
BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
const void *, unsafe_ptr)
which uses 64-bit arguments in pairs of registers.
As both versions of the function are fairly simple and only really
differ in one line, just move them into a header file as an inline
function that does not add any overhead for the bpf_trace.c callers
and actually avoids a function call for the other one.
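To make the prototype mismatch concrete: BPF_CALL_3() generates a wrapper
whose parameters are all u64, so the two definitions roughly compare as
follows (a hedged sketch of the macro expansion, not code from this
patch):

	/* what BPF_CALL_3(bpf_probe_read_kernel, ...) roughly declares: */
	u64 bpf_probe_read_kernel(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);

	/* the __weak fallback previously defined in kernel/bpf/core.c: */
	u64 bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr);

On a 32-bit target the u64 parameters occupy register pairs, so a call
compiled against one prototype but resolved to the other definition ends
up with "size" and "unsafe_ptr" in the wrong registers.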
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/ac25cb0f-b804-1649-3afb-1dc6138c2716@iogearbox.…
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
--
v4: rewrite again to use a shared inline helper
v3: clarify changelog text further.
v2: rewrite completely to fix the mismatch.
---
include/linux/bpf.h | 12 ++++++++++++
kernel/bpf/core.c | 10 ++--------
kernel/trace/bpf_trace.c | 11 -----------
3 files changed, 14 insertions(+), 19 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ceaa8c23287fc..abe75063630b8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2661,6 +2661,18 @@ static inline void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
}
#endif /* CONFIG_BPF_SYSCALL */
+static __always_inline int
+bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
+{
+ int ret = -EFAULT;
+
+ if (IS_ENABLED(CONFIG_BPF_EVENTS))
+ ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
+ if (unlikely(ret < 0))
+ memset(dst, 0, size);
+ return ret;
+}
+
void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
struct btf_mod_pair *used_btfs, u32 len);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index dd70c58c9d3a3..9cdf53bfb8bd3 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1634,12 +1634,6 @@ bool bpf_opcode_in_insntable(u8 code)
}
#ifndef CONFIG_BPF_JIT_ALWAYS_ON
-u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
-{
- memset(dst, 0, size);
- return -EFAULT;
-}
-
/**
* ___bpf_prog_run - run eBPF program on a given context
* @regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers
@@ -1930,8 +1924,8 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
DST = *(SIZE *)(unsigned long) (SRC + insn->off); \
CONT; \
LDX_PROBE_MEM_##SIZEOP: \
- bpf_probe_read_kernel(&DST, sizeof(SIZE), \
- (const void *)(long) (SRC + insn->off)); \
+ bpf_probe_read_kernel_common(&DST, sizeof(SIZE), \
+ (const void *)(long) (SRC + insn->off)); \
DST = *((SIZE *)&DST); \
CONT;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index c92eb8c6ff08d..83bde2475ae54 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -223,17 +223,6 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto = {
.arg3_type = ARG_ANYTHING,
};
-static __always_inline int
-bpf_probe_read_kernel_common(void *dst, u32 size, const void *unsafe_ptr)
-{
- int ret;
-
- ret = copy_from_kernel_nofault(dst, unsafe_ptr, size);
- if (unlikely(ret < 0))
- memset(dst, 0, size);
- return ret;
-}
-
BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
const void *, unsafe_ptr)
{
--
2.39.2
Bison and flex generate C files from the source (.y and .l) files. When
the O= option is used, the generated files are saved in a separate
directory, but the default build rule assumes the .c files are in the
source directory. So it might pick up stale files left over from an old
version. The same is true for the pmu-events files.
For example, the following command would cause a build failure:
$ git checkout v6.3
$ make -C tools/perf # build in the same directory
$ git checkout v6.5-rc2
$ mkdir build # create a build directory
$ make -C tools/perf O=build # build in a different directory but it
# refers to files in the source directory
Let's update the build rules to handle those cases explicitly and make
them depend on the files in the output directory.
Note that it's not a complete fix and it needs the next patch for the
include path too.
Fixes: 80eeb67fe577 ("perf jevents: Program to convert JSON file")
Cc: stable(a)vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
---
tools/build/Makefile.build | 8 ++++++++
tools/perf/pmu-events/Build | 4 ++++
2 files changed, 12 insertions(+)
diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index 89430338a3d9..f9396696fcbf 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -117,6 +117,14 @@ $(OUTPUT)%.s: %.c FORCE
$(call rule_mkdir)
$(call if_changed_dep,cc_s_c)
+$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
+$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
+
# Gather build data:
# obj-y - list of build objects
# subdir-y - list of directories to nest
diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build
index 150765f2baee..f38a27765604 100644
--- a/tools/perf/pmu-events/Build
+++ b/tools/perf/pmu-events/Build
@@ -35,3 +35,7 @@ $(PMU_EVENTS_C): $(JSON) $(JSON_TEST) $(JEVENTS_PY) $(METRIC_PY) $(METRIC_TEST_L
$(call rule_mkdir)
$(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) $(JEVENTS_MODEL) pmu-events/arch $@
endif
+
+$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C)
+ $(call rule_mkdir)
+ $(call if_changed_dep,$(host)cc_o_c)
--
2.41.0.487.g6d72f3e995-goog
From: liubo <liubo254(a)huawei.com>
In commit 474098edac26 ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()"), FOLL_NUMA was removed and replaced by
the gup_can_follow_protnone interface.
However, when a user-space process uses transparent huge pages, the
memory usage reported through /proc/pid/smaps_rollup is not consistent
with the RSS in /proc/pid/status.
Related examples are as follows:
cat /proc/15427/status
VmRSS: 20973024 kB
RssAnon: 20971616 kB
RssFile: 1408 kB
RssShmem: 0 kB
cat /proc/15427/smaps_rollup
00400000-7ffcc372d000 ---p 00000000 00:00 0 [rollup]
Rss: 14419432 kB
Pss: 14418079 kB
Pss_Dirty: 14418016 kB
Pss_Anon: 14418016 kB
Pss_File: 63 kB
Pss_Shmem: 0 kB
Anonymous: 14418016 kB
LazyFree: 0 kB
AnonHugePages: 14417920 kB
The root cause is that, while walking the page tables, smaps_pmd_entry()
does not count pages mapped PROT_NONE, resulting in the discrepancy.
Therefore, when looking up pages through the follow_trans_huge_pmd
interface, add the FOLL_FORCE flag so that PROT_NONE-mapped pages are
counted as well, which solves the above problem.
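Why FOLL_FORCE has that effect: since the commit named in the Fixes tag,
follow_trans_huge_pmd() refuses to return PROT_NONE (NUMA-hinting)
entries unless the caller is allowed to follow them. A hedged sketch of
the relevant checks introduced by that commit (paraphrased from memory,
not code from this patch):

	/* include/linux/mm.h, roughly: */
	static inline bool gup_can_follow_protnone(unsigned int flags)
	{
		/* only FOLL_FORCE callers may look up PROT_NONE-mapped pages */
		return flags & FOLL_FORCE;
	}

	/* mm/huge_memory.c, follow_trans_huge_pmd(), roughly: */
	if (pmd_protnone(*pmd) && !gup_can_follow_protnone(flags))
		return NULL;

Passing FOLL_DUMP | FOLL_FORCE therefore lets smaps_pmd_entry() obtain
and count the page for a PROT_NONE-mapped THP.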
Signed-off-by: liubo <liubo254(a)huawei.com>
Cc: stable(a)vger.kernel.org
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Signed-off-by: David Hildenbrand <david(a)redhat.com> # AKPM fixups, cc stable
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
fs/proc/task_mmu.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c1e6531cb02a..7075ce11dc7d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -571,8 +571,12 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
bool migration = false;
if (pmd_present(*pmd)) {
- /* FOLL_DUMP will return -EFAULT on huge zero page */
- page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
+ /*
+ * FOLL_DUMP will return -EFAULT on huge zero page
+ * FOLL_FORCE follow a PROT_NONE mapped page
+ */
+ page = follow_trans_huge_pmd(vma, addr, pmd,
+ FOLL_DUMP | FOLL_FORCE);
} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
--
2.41.0
The quilt patch titled
Subject: mm/memory-failure: fix hardware poison check in unpoison_memory()
has been removed from the -mm tree. Its filename was
mm-memory-failure-fix-hardware-poison-check-in-unpoison_memory.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Subject: mm/memory-failure: fix hardware poison check in unpoison_memory()
Date: Mon, 17 Jul 2023 11:18:12 -0700
It was pointed out[1] that using folio_test_hwpoison() is wrong as we need
to check the individual page that has the poison. folio_test_hwpoison()
only checks the head page, so go back to using PageHWPoison().
User-visible effects include existing hwpoison-inject tests possibly
failing, as unpoisoning a single subpage could lead to unpoisoning an
entire folio. Memory unpoisoning could also not work as expected because
the function bails out early after checking only the head page rather
than the actually poisoned subpage.
[1]: https://lore.kernel.org/lkml/ZLIbZygG7LqSI9xe@casper.infradead.org/
Link: https://lkml.kernel.org/r/20230717181812.167757-1-sidhartha.kumar@oracle.com
Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Reported-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory-failure.c~mm-memory-failure-fix-hardware-poison-check-in-unpoison_memory
+++ a/mm/memory-failure.c
@@ -2487,7 +2487,7 @@ int unpoison_memory(unsigned long pfn)
goto unlock_mutex;
}
- if (!folio_test_hwpoison(folio)) {
+ if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
pfn, &unpoison_rs);
goto unlock_mutex;
_
Patches currently in -mm which might be from sidhartha.kumar(a)oracle.com are
mm-increase-usage-of-folio_next_index-helper.patch
mm-memory-convert-do_page_mkwrite-to-use-folios.patch
mm-memory-convert-wp_page_shared-to-use-folios.patch
mm-memory-convert-do_shared_fault-to-folios.patch
mm-memory-convert-do_read_fault-to-use-folios.patch
mm-memory-pass-folio-into-do_page_mkwrite.patch
mm-hugetlb-get-rid-of-page_hstate.patch
The quilt patch titled
Subject: proc/vmcore: fix signedness bug in read_from_oldmem()
has been removed from the -mm tree. Its filename was
proc-vmcore-fix-signedness-bug-in-read_from_oldmem.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Dan Carpenter <dan.carpenter(a)linaro.org>
Subject: proc/vmcore: fix signedness bug in read_from_oldmem()
Date: Tue, 25 Jul 2023 20:03:16 +0300
The bug is in the error handling:
	if (tmp < nr_bytes) {
"tmp" can hold negative error codes, but because "nr_bytes" has type
size_t the negative error codes are treated as very high positive values
(success). Fix this by changing "nr_bytes" to type ssize_t. The
"nr_bytes" variable is used to store values between 1 and PAGE_SIZE,
which fit in ssize_t without any issue.
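As a stand-alone illustration of the pitfall (hypothetical values, not
code from this patch): when one operand of the comparison has type
size_t, the signed operand is converted to unsigned, so a negative error
code compares as a huge positive number and the error branch is never
taken:

	size_t nr_bytes = 64;
	ssize_t tmp = -EFAULT;		/* e.g. -14 */

	if (tmp < nr_bytes)		/* (size_t)-14 is huge, so this is false */
		pr_err("short read\n");	/* never reached */

Declaring nr_bytes as ssize_t keeps the comparison signed, so the error
is detected.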
Link: https://lkml.kernel.org/r/b55f7eed-1c65-4adc-95d1-6c7c65a54a6e@moroto.mount…
Fixes: 5d8de293c224 ("vmcore: convert copy_oldmem_page() to take an iov_iter")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/vmcore.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/proc/vmcore.c~proc-vmcore-fix-signedness-bug-in-read_from_oldmem
+++ a/fs/proc/vmcore.c
@@ -132,7 +132,7 @@ ssize_t read_from_oldmem(struct iov_iter
u64 *ppos, bool encrypted)
{
unsigned long pfn, offset;
- size_t nr_bytes;
+ ssize_t nr_bytes;
ssize_t read = 0, tmp;
int idx;
_
Patches currently in -mm which might be from dan.carpenter(a)linaro.org are
The quilt patch titled
Subject: mm: lock VMA in dup_anon_vma() before setting ->anon_vma
has been removed from the -mm tree. Its filename was
mm-lock-vma-in-dup_anon_vma-before-setting-anon_vma.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm: lock VMA in dup_anon_vma() before setting ->anon_vma
Date: Fri, 21 Jul 2023 05:46:43 +0200
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA. This currently happens while `dst` is not write-locked.
This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock. This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.
This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.
This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.
I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.
For a kernel-assisted reproducer, see the notes section of the patch mail.
Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/mmap.c~mm-lock-vma-in-dup_anon_vma-before-setting-anon_vma
+++ a/mm/mmap.c
@@ -615,6 +615,7 @@ static inline int dup_anon_vma(struct vm
* anon pages imported.
*/
if (src->anon_vma && !dst->anon_vma) {
+ vma_start_write(dst);
dst->anon_vma = src->anon_vma;
return anon_vma_clone(dst, src);
}
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-dont-drop-vma-locks-in-mm_drop_all_locks.patch
The quilt patch titled
Subject: mm: fix memory ordering for mm_lock_seq and vm_lock_seq
has been removed from the -mm tree. Its filename was
mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm: fix memory ordering for mm_lock_seq and vm_lock_seq
Date: Sat, 22 Jul 2023 00:51:07 +0200
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.
A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().
userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):
userfaultfd_register
userfaultfd_set_vm_flags
vm_flags_reset
vma_start_write
down_write(&vma->vm_lock->lock)
vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
up_write(&vma->vm_lock->lock)
vm_flags_init
[sets VM_UFFD_* in __vm_flags]
vma->vm_userfaultfd_ctx.ctx = ctx
mmap_write_unlock
vma_end_write_all
WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked. That's bad, we definitely need a
store-release for the unlock operation.
The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock. There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:
lock_vma_under_rcu
vma_start_read
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
down_read_trylock(&vma->vm_lock->lock)
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
userfaultfd_armed
checks vma->vm_flags & __VM_UFFD_FLAGS
Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked. To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.
Some of the comment wording is based on suggestions by Suren.
BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything. I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write(). If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.
Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 29 +++++++++++++++++++++++------
include/linux/mm_types.h | 28 ++++++++++++++++++++++++++++
include/linux/mmap_lock.h | 10 ++++++++--
3 files changed, 59 insertions(+), 8 deletions(-)
--- a/include/linux/mmap_lock.h~mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq
+++ a/include/linux/mmap_lock.h
@@ -76,8 +76,14 @@ static inline void mmap_assert_write_loc
static inline void vma_end_write_all(struct mm_struct *mm)
{
mmap_assert_write_locked(mm);
- /* No races during update due to exclusive mmap_lock being held */
- WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
+ /*
+ * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
+ * mmap_lock being held.
+ * We need RELEASE semantics here to ensure that preceding stores into
+ * the VMA take effect before we unlock it with this store.
+ * Pairs with ACQUIRE semantics in vma_start_read().
+ */
+ smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
}
#else
static inline void vma_end_write_all(struct mm_struct *mm) {}
--- a/include/linux/mm.h~mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq
+++ a/include/linux/mm.h
@@ -641,8 +641,14 @@ static inline void vma_numab_state_free(
*/
static inline bool vma_start_read(struct vm_area_struct *vma)
{
- /* Check before locking. A race might cause false locked result. */
- if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+ /*
+ * Check before locking. A race might cause false locked result.
+ * We can use READ_ONCE() for the mm_lock_seq here, and don't need
+ * ACQUIRE semantics, because this is just a lockless check whose result
+ * we don't rely on for anything - the mm_lock_seq read against which we
+ * need ordering is below.
+ */
+ if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))
return false;
if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
@@ -653,8 +659,13 @@ static inline bool vma_start_read(struct
* False unlocked result is impossible because we modify and check
* vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
* modification invalidates all existing locks.
+ *
+ * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
+ * racing with vma_end_write_all(), we only start reading from the VMA
+ * after it has been unlocked.
+ * This pairs with RELEASE semantics in vma_end_write_all().
*/
- if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+ if (unlikely(vma->vm_lock_seq == smp_load_acquire(&vma->vm_mm->mm_lock_seq))) {
up_read(&vma->vm_lock->lock);
return false;
}
@@ -676,7 +687,7 @@ static bool __is_vma_write_locked(struct
* current task is holding mmap_write_lock, both vma->vm_lock_seq and
* mm->mm_lock_seq can't be concurrently modified.
*/
- *mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+ *mm_lock_seq = vma->vm_mm->mm_lock_seq;
return (vma->vm_lock_seq == *mm_lock_seq);
}
@@ -688,7 +699,13 @@ static inline void vma_start_write(struc
return;
down_write(&vma->vm_lock->lock);
- vma->vm_lock_seq = mm_lock_seq;
+ /*
+ * We should use WRITE_ONCE() here because we can have concurrent reads
+ * from the early lockless pessimistic check in vma_start_read().
+ * We don't really care about the correctness of that early check, but
+ * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
+ */
+ WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
up_write(&vma->vm_lock->lock);
}
@@ -702,7 +719,7 @@ static inline bool vma_try_start_write(s
if (!down_write_trylock(&vma->vm_lock->lock))
return false;
- vma->vm_lock_seq = mm_lock_seq;
+ WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
up_write(&vma->vm_lock->lock);
return true;
}
--- a/include/linux/mm_types.h~mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq
+++ a/include/linux/mm_types.h
@@ -514,6 +514,20 @@ struct vm_area_struct {
};
#ifdef CONFIG_PER_VMA_LOCK
+ /*
+ * Can only be written (using WRITE_ONCE()) while holding both:
+ * - mmap_lock (in write mode)
+ * - vm_lock->lock (in write mode)
+ * Can be read reliably while holding one of:
+ * - mmap_lock (in read or write mode)
+ * - vm_lock->lock (in read or write mode)
+ * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
+ * while holding nothing (except RCU to keep the VMA struct allocated).
+ *
+ * This sequence counter is explicitly allowed to overflow; sequence
+ * counter reuse can only lead to occasional unnecessary use of the
+ * slowpath.
+ */
int vm_lock_seq;
struct vma_lock *vm_lock;
@@ -679,6 +693,20 @@ struct mm_struct {
* by mmlist_lock
*/
#ifdef CONFIG_PER_VMA_LOCK
+ /*
+ * This field has lock-like semantics, meaning it is sometimes
+ * accessed with ACQUIRE/RELEASE semantics.
+ * Roughly speaking, incrementing the sequence number is
+ * equivalent to releasing locks on VMAs; reading the sequence
+ * number can be part of taking a read lock on a VMA.
+ *
+ * Can be modified under write mmap_lock using RELEASE
+ * semantics.
+ * Can be read with no other protection when holding write
+ * mmap_lock.
+ * Can be read with ACQUIRE semantics if not holding write
+ * mmap_lock.
+ */
int mm_lock_seq;
#endif
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-dont-drop-vma-locks-in-mm_drop_all_locks.patch