This is the start of the stable review cycle for the 5.15.73 release.
There are 37 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 12 Oct 2022 07:03:19 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.73-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.73-rc1
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
rpmsg: qcom: glink: replace strncpy() with strscpy_pad()
Johan Hovold <johan(a)kernel.org>
USB: serial: ftdi_sio: fix 300 bps rate for SIO
Tadeusz Struk <tadeusz.struk(a)linaro.org>
usb: mon: make mmapped memory read only
Vlad Buslov <vladbu(a)nvidia.com>
net/mlx5: Disable irq when locking lag_lock
Tamizh Chelvam Raja <quic_tamizhr(a)quicinc.com>
wifi: cfg80211: fix MCS divisor value
Naoya Horiguchi <naoya.horiguchi(a)nec.com>
mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()
Miaohe Lin <linmiaohe(a)huawei.com>
mm/huge_memory: minor cleanup for split_huge_pages_all
Ian Rogers <irogers(a)google.com>
perf parse-events: Identify broken modifiers
Brian Norris <briannorris(a)chromium.org>
mmc: core: Terminate infinite loop in SD-UHS voltage switch
ChanWoo Lee <cw9316.lee(a)samsung.com>
mmc: core: Replace with already defined values for readability
zhikzhai <zhikai.zhai(a)amd.com>
drm/amd/display: skip audio setup when audio stream is enabled
Hugo Hu <hugo.hu(a)amd.com>
drm/amd/display: update gamut remap if plane has changed
Michael Strauss <michael.strauss(a)amd.com>
drm/amd/display: Assume an LTTPR is always present on fixed_vs links
Leo Li <sunpeng.li(a)amd.com>
drm/amd/display: Fix double cursor on non-video RGB MPO
Jianglei Nie <niejianglei2021(a)163.com>
net: atlantic: fix potential memory leak in aq_ndev_close()
David Gow <davidgow(a)google.com>
arch: um: Mark the stack non-executable to fix a binutils warning
Lukas Straub <lukasstraub2(a)web.de>
um: Cleanup compiler warning in arch/x86/um/tls_32.c
Lukas Straub <lukasstraub2(a)web.de>
um: Cleanup syscall_handler_t cast in syscalls_32.h
Jaroslav Kysela <perex(a)perex.cz>
ALSA: hda/hdmi: Fix the converter reuse for the silent stream
Oleksandr Mazur <oleksandr.mazur(a)plvision.eu>
net: marvell: prestera: add support for for Aldrin2
Haimin Zhang <tcs.kernel(a)gmail.com>
net/ieee802154: fix uninit value bug in dgram_sendmsg
Letu Ren <fantasquex(a)gmail.com>
scsi: qedf: Fix a UAF bug in __qedf_probe()
Sergei Antonov <saproj(a)gmail.com>
ARM: dts: fix Moxa SDIO 'compatible', remove 'sdhci' misnomer
Swati Agarwal <swati.agarwal(a)xilinx.com>
dmaengine: xilinx_dma: Report error in case of dma_set_mask_and_coherent API failure
Swati Agarwal <swati.agarwal(a)xilinx.com>
dmaengine: xilinx_dma: cleanup for fetching xlnx,num-fstores property
Swati Agarwal <swati.agarwal(a)xilinx.com>
dmaengine: xilinx_dma: Fix devm_platform_ioremap_resource error handling
Cristian Marussi <cristian.marussi(a)arm.com>
firmware: arm_scmi: Add SCMI PM driver remove routine
Cristian Marussi <cristian.marussi(a)arm.com>
firmware: arm_scmi: Harden accesses to the sensor domains
Cristian Marussi <cristian.marussi(a)arm.com>
firmware: arm_scmi: Improve checks in the info_get operations
Dongliang Mu <mudongliangabcd(a)gmail.com>
fs: fix UAF/GPF bug in nilfs_mdt_destroy
Mikulas Patocka <mpatocka(a)redhat.com>
provide arch_test_bit_acquire for architectures that define test_bit
Mikulas Patocka <mpatocka(a)redhat.com>
wait_on_bit: add an acquire memory barrier
Yang Shi <shy828301(a)gmail.com>
powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush
Yang Shi <shy828301(a)gmail.com>
mm: gup: fix the fast GUP race against THP collapse
Jalal Mostafa <jalal.a.mostapha(a)gmail.com>
xsk: Inherit need_wakeup flag for shared sockets
Shuah Khan <skhan(a)linuxfoundation.org>
docs: update mediator information in CoC docs
Sami Tolvanen <samitolvanen(a)google.com>
Makefile.extrawarn: Move -Wcast-function-type-strict to W=1
-------------
Diffstat:
.../devicetree/bindings/dma/moxa,moxart-dma.txt | 4 +-
.../process/code-of-conduct-interpretation.rst | 2 +-
Makefile | 4 +-
arch/alpha/include/asm/bitops.h | 7 +++
arch/arm/boot/dts/moxart-uc7112lx.dts | 2 +-
arch/arm/boot/dts/moxart.dtsi | 4 +-
arch/hexagon/include/asm/bitops.h | 15 ++++++
arch/ia64/include/asm/bitops.h | 7 +++
arch/m68k/include/asm/bitops.h | 6 +++
arch/powerpc/mm/book3s64/radix_pgtable.c | 9 ----
arch/s390/include/asm/bitops.h | 7 +++
arch/sh/include/asm/bitops-op32.h | 7 +++
arch/um/Makefile | 8 ++++
arch/x86/include/asm/bitops.h | 21 +++++++++
arch/x86/um/shared/sysdep/syscalls_32.h | 5 +-
arch/x86/um/tls_32.c | 6 ---
arch/x86/um/vdso/Makefile | 2 +-
drivers/dma/xilinx/xilinx_dma.c | 21 +++++----
drivers/firmware/arm_scmi/clock.c | 6 ++-
drivers/firmware/arm_scmi/scmi_pm_domain.c | 20 ++++++++
drivers/firmware/arm_scmi/sensors.c | 25 ++++++++--
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 ++++-
drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 8 ++++
.../amd/display/dc/dce110/dce110_hw_sequencer.c | 6 ++-
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 1 +
drivers/mmc/core/sd.c | 3 +-
drivers/net/ethernet/aquantia/atlantic/aq_main.c | 3 --
.../net/ethernet/marvell/prestera/prestera_pci.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/lag.c | 55 +++++++++++++---------
drivers/rpmsg/qcom_glink_native.c | 2 +-
drivers/rpmsg/qcom_smd.c | 4 +-
drivers/scsi/qedf/qedf_main.c | 5 --
drivers/usb/mon/mon_bin.c | 5 ++
drivers/usb/serial/ftdi_sio.c | 3 +-
fs/inode.c | 7 ++-
.../asm-generic/bitops/instrumented-non-atomic.h | 12 +++++
include/asm-generic/bitops/non-atomic.h | 14 ++++++
include/linux/buffer_head.h | 2 +-
include/linux/scmi_protocol.h | 4 +-
include/linux/wait_bit.h | 8 ++--
include/net/ieee802154_netdev.h | 37 +++++++++++++++
include/net/xsk_buff_pool.h | 2 +-
kernel/sched/wait_bit.c | 2 +-
mm/gup.c | 34 ++++++++++---
mm/huge_memory.c | 13 +++--
mm/khugepaged.c | 10 ++--
net/ieee802154/socket.c | 42 +++++++++--------
net/wireless/util.c | 2 +-
net/xdp/xsk.c | 4 +-
net/xdp/xsk_buff_pool.c | 5 +-
scripts/Makefile.extrawarn | 1 +
sound/pci/hda/patch_hdmi.c | 1 +
tools/perf/util/parse-events.y | 10 ++++
53 files changed, 374 insertions(+), 132 deletions(-)
The quilt patch titled
Subject: nilfs2: fix leak of nilfs_root in case of writer thread creation failure
has been removed from the -mm tree. Its filename was
nilfs2-fix-leak-of-nilfs_root-in-case-of-writer-thread-creation-failure.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix leak of nilfs_root in case of writer thread creation failure
Date: Fri, 7 Oct 2022 17:52:26 +0900
If nilfs_attach_log_writer() fails to create a log writer thread, it
frees the log writer's data structure without any cleanup. Since
commit e912a5b66837 ("nilfs2: use root object to get ifile"), this leaks
struct nilfs_root, and with it an ifile metadata inode and a kobject
embedded in that struct.
In addition, if the kernel is booted with panic_on_warn, the above
ifile metadata inode leak will cause the following panic when the
nilfs2 kernel module is removed:
kmem_cache_destroy nilfs2_inode_cache: Slab cache still has objects when
called from nilfs_destroy_cachep+0x16/0x3a [nilfs2]
WARNING: CPU: 8 PID: 1464 at mm/slab_common.c:494 kmem_cache_destroy+0x138/0x140
...
RIP: 0010:kmem_cache_destroy+0x138/0x140
Code: 00 20 00 00 e8 a9 55 d8 ff e9 76 ff ff ff 48 8b 53 60 48 c7 c6 20 70 65 86 48 c7 c7 d8 69 9c 86 48 8b 4c 24 28 e8 ef 71 c7 00 <0f> 0b e9 53 ff ff ff c3 48 81 ff ff 0f 00 00 77 03 31 c0 c3 53 48
...
Call Trace:
<TASK>
? nilfs_palloc_freev.cold.24+0x58/0x58 [nilfs2]
nilfs_destroy_cachep+0x16/0x3a [nilfs2]
exit_nilfs_fs+0xa/0x1b [nilfs2]
__x64_sys_delete_module+0x1d9/0x3a0
? __sanitizer_cov_trace_pc+0x1a/0x50
? syscall_trace_enter.isra.19+0x119/0x190
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
...
</TASK>
Kernel panic - not syncing: panic_on_warn set ...
This patch fixes these issues by calling nilfs_detach_log_writer() cleanup
function if spawning the log writer thread fails.
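As a rough userspace illustration of why a bare kfree() is not enough here, the sketch below models an object that takes a reference when it is created. The types and functions (struct root, struct writer, writer_attach(), writer_detach()) are hypothetical stand-ins and not the nilfs2 code; the point is only that the error path goes through the same teardown as a normal detach.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for nilfs_root and the log writer. */
struct root { int refcount; };
struct writer { struct root *root; };

static struct writer *writer_new(struct root *r)
{
	struct writer *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	r->refcount++;		/* the writer pins the root object */
	w->root = r;
	return w;
}

/* Full teardown: drop every reference the writer took, then free it. */
static void writer_detach(struct writer **wp)
{
	struct writer *w = *wp;

	if (!w)
		return;
	w->root->refcount--;
	free(w);
	*wp = NULL;
}

/* thread_start_err models the result of starting the writer thread. */
static int writer_attach(struct writer **wp, struct root *r,
			 int thread_start_err)
{
	*wp = writer_new(r);
	if (!*wp)
		return -ENOMEM;
	if (thread_start_err)
		writer_detach(wp);	/* a plain free(*wp) would leak the ref */
	return thread_start_err;
}
```

With a plain free() on the failure path, the reference count in this model would stay elevated forever, which mirrors the leaked inode and kobject described above.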
Link: https://lkml.kernel.org/r/20221007085226.57667-1-konishi.ryusuke@gmail.com
Fixes: e912a5b66837 ("nilfs2: use root object to get ifile")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+7381dc4ad60658ca4c05(a)syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-leak-of-nilfs_root-in-case-of-writer-thread-creation-failure
+++ a/fs/nilfs2/segment.c
@@ -2786,10 +2786,9 @@ int nilfs_attach_log_writer(struct super
inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);
err = nilfs_segctor_start_thread(nilfs->ns_writer);
- if (err) {
- kfree(nilfs->ns_writer);
- nilfs->ns_writer = NULL;
- }
+ if (unlikely(err))
+ nilfs_detach_log_writer(sb);
+
return err;
}
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
has been removed from the -mm tree. Its filename was
nilfs2-fix-null-pointer-dereference-at-nilfs_bmap_lookup_at_level.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
Date: Sun, 2 Oct 2022 12:08:04 +0900
If the i_mode field in the inode of a metadata file is corrupted on disk,
the initialization of the bmap structure, which should have been done
from nilfs_read_inode_common(), can be skipped. This causes a
lockdep warning followed by a NULL pointer dereference at
nilfs_bmap_lookup_at_level().
This patch fixes these issues by adding a missing sanity check for the
i_mode field of the metadata file's inode.
Link: https://lkml.kernel.org/r/20221002030804.29978-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+2b32eb36c1a825b7a74c(a)syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 2 ++
1 file changed, 2 insertions(+)
--- a/fs/nilfs2/inode.c~nilfs2-fix-null-pointer-dereference-at-nilfs_bmap_lookup_at_level
+++ a/fs/nilfs2/inode.c
@@ -455,6 +455,8 @@ int nilfs_read_inode_common(struct inode
inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+ if (nilfs_is_metadata_file_inode(inode) && !S_ISREG(inode->i_mode))
+ return -EIO; /* this inode is for metadata and corrupted */
if (inode->i_nlink == 0)
return -ESTALE; /* this inode is deleted */
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-leak-of-nilfs_root-in-case-of-writer-thread-creation-failure.patch
The quilt patch titled
Subject: nilfs2: fix use-after-free bug of struct nilfs_root
has been removed from the -mm tree. Its filename was
nilfs2-fix-use-after-free-bug-of-struct-nilfs_root.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix use-after-free bug of struct nilfs_root
Date: Tue, 4 Oct 2022 00:05:19 +0900
If the beginning of the inode bitmap area is corrupted on disk, an inode
with the same inode number as the root inode can be allocated and fail
soon after. In this case, the subsequent call to nilfs_clear_inode() on
that bogus root inode will wrongly decrement the reference counter of
struct nilfs_root, and this will erroneously free struct nilfs_root,
causing kernel oopses.
This fixes the problem by changing nilfs_new_inode() to skip reserved
inode numbers while repairing the inode bitmap.
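The skip-and-repair loop can be modelled with a small bitmap allocator. This is a sketch under stated assumptions: the 32-bit bitmap, bitmap_alloc(), alloc_user_ino() and the FIRST_USER_INO value are all illustrative and do not correspond to the nilfs2 ifile implementation.

```c
#include <assert.h>
#include <errno.h>

#define NBITS 32
#define FIRST_USER_INO 11u	/* assumed value, stands in for NILFS_USER_INO */

/* Allocate the lowest clear bit and mark it as used. */
static int bitmap_alloc(unsigned int *bitmap, unsigned int *ino)
{
	for (unsigned int i = 0; i < NBITS; i++) {
		if (!(*bitmap & (1u << i))) {
			*bitmap |= 1u << i;
			*ino = i;
			return 0;
		}
	}
	return -ENOSPC;
}

/*
 * Allocate an inode number for a user file. If a corrupted bitmap hands
 * out a reserved number, discard it and allocate again. The discarded
 * bits stay set, which is what repairs the bitmap.
 */
static int alloc_user_ino(unsigned int *bitmap, unsigned int *ino)
{
	int err = bitmap_alloc(bitmap, ino);

	while (!err && *ino < FIRST_USER_INO)
		err = bitmap_alloc(bitmap, ino);
	return err;
}
```

Starting from an all-clear (fully corrupted) bitmap, the first call marks every reserved bit as used and returns the first user inode number; later calls then behave normally.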
Link: https://lkml.kernel.org/r/20221003150519.39789-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+b8c672b0e22615c80fe0(a)syzkaller.appspotmail.com
Reported-by: Khalid Masum <khalid.masum.92(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/inode.c~nilfs2-fix-use-after-free-bug-of-struct-nilfs_root
+++ a/fs/nilfs2/inode.c
@@ -328,6 +328,7 @@ struct inode *nilfs_new_inode(struct ino
struct inode *inode;
struct nilfs_inode_info *ii;
struct nilfs_root *root;
+ struct buffer_head *bh;
int err = -ENOMEM;
ino_t ino;
@@ -343,11 +344,25 @@ struct inode *nilfs_new_inode(struct ino
ii->i_state = BIT(NILFS_I_NEW);
ii->i_root = root;
- err = nilfs_ifile_create_inode(root->ifile, &ino, &ii->i_bh);
+ err = nilfs_ifile_create_inode(root->ifile, &ino, &bh);
if (unlikely(err))
goto failed_ifile_create_inode;
/* reference count of i_bh inherits from nilfs_mdt_read_block() */
+ if (unlikely(ino < NILFS_USER_INO)) {
+ nilfs_warn(sb,
+ "inode bitmap is inconsistent for reserved inodes");
+ do {
+ brelse(bh);
+ err = nilfs_ifile_create_inode(root->ifile, &ino, &bh);
+ if (unlikely(err))
+ goto failed_ifile_create_inode;
+ } while (ino < NILFS_USER_INO);
+
+ nilfs_info(sb, "repaired inode bitmap for reserved inodes");
+ }
+ ii->i_bh = bh;
+
atomic64_inc(&root->inodes_count);
inode_init_owner(&init_user_ns, inode, dir, mode);
inode->i_ino = ino;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-null-pointer-dereference-at-nilfs_bmap_lookup_at_level.patch
nilfs2-fix-leak-of-nilfs_root-in-case-of-writer-thread-creation-failure.patch
The quilt patch titled
Subject: mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-races-when-looking-up-a-cont-pte-pmd-size-hugetlb-page.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Subject: mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
Date: Thu, 1 Sep 2022 18:41:31 +0800
Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb, which
means they support not only PMD/PUD size hugetlb (2M and 1G), but also
CONT-PTE/PMD sizes (64K and 32M) when a 4K page size is specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte(). However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb is not stable under
the current pte lock in follow_page_pte(): the entry can still be migrated
or poisoned, which can cause potential race issues even though the lookup
is done under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page via the move_pages() syscall under the lock. Another thread B can
migrate the CONT-PTE hugetlb page at the same time, which causes thread A
to get an incorrect page. If thread A also wants to do page migration, a
data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().
To fix the above issues, rename follow_huge_pmd() to follow_huge_pmd_pte()
so that it handles both PMD and PTE level size hugetlb, and use
huge_pte_lock() to get the correct pte entry lock and keep the pte entry
stable.
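Because the renamed helper now serves several huge page sizes, the sub-page offset in the patch is derived from the hstate's huge page mask instead of PMD_MASK. The arithmetic can be checked with a small userspace sketch; subpage_index() is a hypothetical helper, and a 4K base page (PAGE_SHIFT of 12) is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4K base pages */

/*
 * Index of the base page that holds 'address' inside a huge page of
 * 'huge_page_size' bytes (which must be a power of two).
 */
static uint64_t subpage_index(uint64_t address, uint64_t huge_page_size)
{
	uint64_t huge_mask = ~(huge_page_size - 1);

	return (address & ~huge_mask) >> PAGE_SHIFT;
}
```

For the same address, a 64K CONT-PTE page and a 2M PMD page give different indexes, so a mask hard-coded to the PMD size would pick the wrong sub-page for the smaller huge page.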
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb:
refactor find_num_contig()"). Patch series "Support for contiguous pte
hugepages", v4. However, I do not believe these code paths were
executed until migration support was added with 5480280d3f2d ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages"). I would go
with 5480280d3f2d for the Fixes: target.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.16620175…
Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 8 ++++----
mm/gup.c | 14 +++++++++++++-
mm/hugetlb.c | 27 +++++++++++++--------------
3 files changed, 30 insertions(+), 19 deletions(-)
--- a/include/linux/hugetlb.h~mm-hugetlb-fix-races-when-looking-up-a-cont-pte-pmd-size-hugetlb-page
+++ a/include/linux/hugetlb.h
@@ -207,8 +207,8 @@ struct page *follow_huge_addr(struct mm_
struct page *follow_huge_pd(struct vm_area_struct *vma,
unsigned long address, hugepd_t hpd,
int flags, int pdshift);
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags);
+struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address,
+ int flags);
struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int flags);
struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
@@ -312,8 +312,8 @@ static inline struct page *follow_huge_p
return NULL;
}
-static inline struct page *follow_huge_pmd(struct mm_struct *mm,
- unsigned long address, pmd_t *pmd, int flags)
+static inline struct page *follow_huge_pmd_pte(struct vm_area_struct *vma,
+ unsigned long address, int flags)
{
return NULL;
}
--- a/mm/gup.c~mm-hugetlb-fix-races-when-looking-up-a-cont-pte-pmd-size-hugetlb-page
+++ a/mm/gup.c
@@ -530,6 +530,18 @@ static struct page *follow_page_pte(stru
if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
(FOLL_PIN | FOLL_GET)))
return ERR_PTR(-EINVAL);
+
+ /*
+ * Considering PTE level hugetlb, like continuous-PTE hugetlb on
+ * ARM64 architecture.
+ */
+ if (is_vm_hugetlb_page(vma)) {
+ page = follow_huge_pmd_pte(vma, address, flags);
+ if (page)
+ return page;
+ return no_page_table(vma, flags);
+ }
+
retry:
if (unlikely(pmd_bad(*pmd)))
return no_page_table(vma, flags);
@@ -662,7 +674,7 @@ static struct page *follow_pmd_mask(stru
if (pmd_none(pmdval))
return no_page_table(vma, flags);
if (pmd_huge(pmdval) && is_vm_hugetlb_page(vma)) {
- page = follow_huge_pmd(mm, address, pmd, flags);
+ page = follow_huge_pmd_pte(vma, address, flags);
if (page)
return page;
return no_page_table(vma, flags);
--- a/mm/hugetlb.c~mm-hugetlb-fix-races-when-looking-up-a-cont-pte-pmd-size-hugetlb-page
+++ a/mm/hugetlb.c
@@ -6946,12 +6946,13 @@ follow_huge_pd(struct vm_area_struct *vm
}
struct page * __weak
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags)
+follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, int flags)
{
+ struct hstate *h = hstate_vma(vma);
+ struct mm_struct *mm = vma->vm_mm;
struct page *page = NULL;
spinlock_t *ptl;
- pte_t pte;
+ pte_t *ptep, pte;
/*
* FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
@@ -6961,17 +6962,15 @@ follow_huge_pmd(struct mm_struct *mm, un
return NULL;
retry:
- ptl = pmd_lockptr(mm, pmd);
- spin_lock(ptl);
- /*
- * make sure that the address range covered by this pmd is not
- * unmapped from other threads.
- */
- if (!pmd_huge(*pmd))
- goto out;
- pte = huge_ptep_get((pte_t *)pmd);
+ ptep = huge_pte_offset(mm, address, huge_page_size(h));
+ if (!ptep)
+ return NULL;
+
+ ptl = huge_pte_lock(h, mm, ptep);
+ pte = huge_ptep_get(ptep);
if (pte_present(pte)) {
- page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
+ page = pte_page(pte) +
+ ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
/*
* try_grab_page() should always succeed here, because: a) we
* hold the pmd (ptl) lock, and b) we've just checked that the
@@ -6987,7 +6986,7 @@ retry:
} else {
if (is_hugetlb_entry_migration(pte)) {
spin_unlock(ptl);
- __migration_entry_wait_huge((pte_t *)pmd, ptl);
+ __migration_entry_wait_huge(ptep, ptl);
goto retry;
}
/*
_
Patches currently in -mm which might be from baolin.wang(a)linux.alibaba.com are
The patch titled
Subject: mm/mmap: undo ->mmap() when arch_validate_flags() fails
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mmap-undo-mmap-when-arch_validate_flags-fails.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Carlos Llamas <cmllamas(a)google.com>
Subject: mm/mmap: undo ->mmap() when arch_validate_flags() fails
Date: Fri, 30 Sep 2022 00:38:43 +0000
Commit c462ac288f2c ("mm: Introduce arch_validate_flags()") added a late
check in mmap_region() to let architectures validate vm_flags. The check
needs to happen after calling ->mmap() as the flags can potentially be
modified during this callback.
If the arch_validate_flags() check fails we unmap and free the vma. However,
the error path fails to undo the ->mmap() call that previously succeeded
and depending on the specific ->mmap() implementation this translates to
reference increments, memory allocations and other operations that will
not be cleaned up.
There are several places (mainly device drivers) where this is an issue.
However, one specific example is bpf_map_mmap() which keeps count of the
mappings in map->writecnt. The count is incremented on ->mmap() and then
decremented on vm_ops->close(). When arch_validate_flags() fails this
count is off since bpf_map_mmap_close() is never called.
One can reproduce this issue on arm64 devices with MTE support. Here the
vm_flags are checked to only allow VM_MTE if VM_MTE_ALLOWED has been set
previously. From userspace, it is then enough to pass the PROT_MTE flag to
the mmap() syscall to trigger the arch_validate_flags() failure.
The following program reproduces this issue:
#include <stdio.h>
#include <unistd.h>
#include <linux/unistd.h>
#include <linux/bpf.h>
#include <sys/mman.h>
int main(void)
{
	union bpf_attr attr = {
		.map_type = BPF_MAP_TYPE_ARRAY,
		.key_size = sizeof(int),
		.value_size = sizeof(long long),
		.max_entries = 256,
		.map_flags = BPF_F_MMAPABLE,
	};
	int fd;

	fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
	mmap(NULL, 4096, PROT_WRITE | PROT_MTE, MAP_SHARED, fd, 0);
	return 0;
}
By manually adding some log statements to the vm_ops callbacks we can
confirm that when passing PROT_MTE to mmap() the map->writecnt is off upon
->release():
With PROT_MTE flag:
root@debian:~# ./bpf-test
[ 111.263874] bpf_map_write_active_inc: map=9 writecnt=1
[ 111.288763] bpf_map_release: map=9 writecnt=1
Without PROT_MTE flag:
root@debian:~# ./bpf-test
[ 157.816912] bpf_map_write_active_inc: map=10 writecnt=1
[ 157.830442] bpf_map_write_active_dec: map=10 writecnt=0
[ 157.832396] bpf_map_release: map=10 writecnt=0
This patch fixes the above issue by calling vm_ops->close() when the
arch_validate_flags() check fails. After this we can proceed to unmap and
free the vma on the error path.
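The accounting problem and its fix can be modelled in userspace. The sketch below is not the mm code: struct mapping, struct map_ops, driver_mmap() and map_region() are hypothetical names, with driver_mmap() standing in for a ->mmap() hook that takes a count which only the close callback releases.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mapping;

struct map_ops {
	void (*close)(struct mapping *m);
};

struct mapping {
	const struct map_ops *ops;
	int *writecnt;
};

static void counted_close(struct mapping *m)
{
	(*m->writecnt)--;
}

static const struct map_ops counted_ops = { .close = counted_close };

/* Models a driver ->mmap() hook that takes a count on success. */
static int driver_mmap(struct mapping *m, int *writecnt)
{
	m->ops = &counted_ops;
	m->writecnt = writecnt;
	(*writecnt)++;
	return 0;
}

/* flags_valid models the result of the late flag validation. */
static int map_region(struct mapping *m, int *writecnt, int flags_valid)
{
	int err = driver_mmap(m, writecnt);

	if (err)
		return err;
	if (!flags_valid) {
		/* Undo the successful mmap hook before bailing out. */
		if (m->ops && m->ops->close)
			m->ops->close(m);
		return -EINVAL;
	}
	return 0;
}
```

Without the close call on the validation failure path, the counter in this model would remain at 1 after the failed mapping, matching the writecnt=1 seen at release time in the log above.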
Link: https://lkml.kernel.org/r/20220930003844.1210987-1-cmllamas@google.com
Fixes: c462ac288f2c ("mm: Introduce arch_validate_flags()")
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Reviewed-by: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/mmap.c~mm-mmap-undo-mmap-when-arch_validate_flags-fails
+++ a/mm/mmap.c
@@ -1797,7 +1797,7 @@ unsigned long mmap_region(struct file *f
if (!arch_validate_flags(vma->vm_flags)) {
error = -EINVAL;
if (file)
- goto unmap_and_free_vma;
+ goto close_and_free_vma;
else
goto free_vma;
}
@@ -1844,6 +1844,9 @@ out:
return addr;
+close_and_free_vma:
+ if (vma->vm_ops && vma->vm_ops->close)
+ vma->vm_ops->close(vma);
unmap_and_free_vma:
fput(vma->vm_file);
vma->vm_file = NULL;
_
Patches currently in -mm which might be from cmllamas(a)google.com are
mm-mmap-undo-mmap-when-arch_validate_flags-fails.patch
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The following commands caused a crash:
# cd /sys/kernel/tracing
# echo 's:open char file[]' > dynamic_events
# echo 'hist:keys=common_pid:file=filename:onchange($file).trace(open,$file)' > events/syscalls/sys_enter_openat/trigger
# echo 1 > events/synthetic/open/enable
BOOM!
The problem is that the synthetic event field "char file[]" will read
the value given to it as a string without any memory checks to make sure
the address is valid. The above example will pass in the user space
address and the synthetic event code will happily call strlen() on it
and then strscpy(), either of which will cause an oops when accessing
user space addresses.
Use the helper functions from trace_kprobe and trace_eprobe that can
read strings safely (and actually succeed when the address is from user
space and the memory is mapped in).
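The fallback behaviour can be sketched in userspace as below. This is only a model: fetch_string() is a hypothetical stand-in for a fault-tolerant copy helper, and an unreadable address is represented by NULL because real fault handling is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAULT_STR "(fault)"

/*
 * Hypothetical stand-in for a nofault string copy. Returns the number of
 * bytes stored including the terminating NUL, or -1 when the source
 * cannot be read (modelled here as a NULL pointer).
 */
static long fetch_string(char *dst, size_t size, const char *src)
{
	size_t n = 0;

	if (!src || size == 0)
		return -1;
	while (n < size - 1 && src[n] != '\0') {
		dst[n] = src[n];
		n++;
	}
	dst[n] = '\0';
	return (long)n + 1;
}

/* Store the string, or a fixed marker when the source is unreadable. */
static long store_string(char *dst, size_t size, const char *src)
{
	long ret = fetch_string(dst, size, src);

	if (ret < 0) {
		if (size < sizeof(FAULT_STR))
			return -1;
		memcpy(dst, FAULT_STR, sizeof(FAULT_STR));
		ret = (long)sizeof(FAULT_STR);
	}
	return ret;
}
```

The caller always ends up with a valid, terminated string and a length to account for, so a bad pointer degrades into a visible marker in the trace instead of an oops.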
Cc: stable(a)vger.kernel.org
Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_events_synth.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 5e8c07aef071..eae15bde883d 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -17,6 +17,8 @@
/* for gfp flag names */
#include <linux/trace_events.h>
#include <trace/events/mmflags.h>
+#include "trace_probe.h"
+#include "trace_probe_kernel.h"
#include "trace_synth.h"
@@ -409,6 +411,7 @@ static unsigned int trace_string(struct synth_trace_event *entry,
{
unsigned int len = 0;
char *str_field;
+ int ret;
if (is_dynamic) {
u32 data_offset;
@@ -417,19 +420,28 @@ static unsigned int trace_string(struct synth_trace_event *entry,
data_offset += event->n_u64 * sizeof(u64);
data_offset += data_size;
- str_field = (char *)entry + data_offset;
-
- len = strlen(str_val) + 1;
- strscpy(str_field, str_val, len);
-
+ len = kern_fetch_store_strlen(str_val) + 1;
+ if (len == 1)
+ len = strlen("fault") + 1;
data_offset |= len << 16;
*(u32 *)&entry->fields[*n_u64] = data_offset;
+ kern_fetch_store_string((unsigned long)str_val, &entry->fields[*n_u64], entry);
+
(*n_u64)++;
} else {
str_field = (char *)&entry->fields[*n_u64];
- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+ if ((unsigned long)str_val < TASK_SIZE)
+ ret = strncpy_from_user_nofault(str_field, str_val, STR_VAR_LEN_MAX);
+ else
+#endif
+ ret = strncpy_from_kernel_nofault(str_field, str_val, STR_VAR_LEN_MAX);
+
+ if (ret < 0)
+ strcpy(str_field, "(fault)");
+
(*n_u64) += STR_VAR_LEN_MAX / sizeof(u64);
}
@@ -462,7 +474,9 @@ static notrace void trace_event_raw_event_synth(void *__data,
val_idx = var_ref_idx[field_pos];
str_val = (char *)(long)var_ref_vals[val_idx];
- len = strlen(str_val) + 1;
+ len = kern_fetch_store_strlen(str_val) + 1;
+ if (len == 1)
+ len = strlen("(fault)") + 1;
fields_size += len;
}
--
2.35.1