Below error was reported in a 32-bit kernel build:
static_call.c:(.ref.text+0x46): undefined reference to `cpu_wants_rethunk_at'
make[1]: [Makefile:1234: vmlinux] Error
This is because the definition of cpu_wants_rethunk_at() depends on
CONFIG_STACK_VALIDATION which is only enabled in 64-bit mode.
Define the empty function for CONFIG_STACK_VALIDATION=n, rethunk mitigation
is anyways not supported without it.
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Fixes: 5d19a0574b75 ("x86/its: Add support for ITS-safe return thunk")
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Link: https://lore.kernel.org/stable/0f597436-5da6-4319-b918-9f57bde5634a@roeck-u…
---
arch/x86/include/asm/alternative.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 1797f80c10de..a5f704dbb4a1 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -98,7 +98,7 @@ static inline u8 *its_static_thunk(int reg)
}
#endif
-#ifdef CONFIG_RETHUNK
+#if defined(CONFIG_RETHUNK) && defined(CONFIG_STACK_VALIDATION)
extern bool cpu_wants_rethunk(void);
extern bool cpu_wants_rethunk_at(void *addr);
#else
--
2.34.1
The quilt patch titled
Subject: mm/khugepaged: fix race with folio split/free using temporary reference
has been removed from the -mm tree. Its filename was
mm-khugepaged-fix-race-with-folio-split-free-using-temporary-reference.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shivank Garg <shivankg(a)amd.com>
Subject: mm/khugepaged: fix race with folio split/free using temporary reference
Date: Mon, 26 May 2025 18:28:18 +0000
hpage_collapse_scan_file() calls is_refcount_suitable(), which in turn
calls folio_mapcount(). folio_mapcount() checks folio_test_large() before
proceeding to folio_large_mapcount(), but there is a race window where the
folio may get split/freed between these checks, triggering:
VM_WARN_ON_FOLIO(!folio_test_large(folio), folio)
Take a temporary reference to the folio in hpage_collapse_scan_file().
This stabilizes the folio during refcount check and prevents incorrect
large folio detection due to concurrent split/free. Use helper
folio_expected_ref_count() + 1 to compare with folio_ref_count() instead
of using is_refcount_suitable().
Link: https://lkml.kernel.org/r/20250526182818.37978-1-shivankg@amd.com
Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
Signed-off-by: Shivank Garg <shivankg(a)amd.com>
Reported-by: syzbot+2b99589e33edbe9475ca(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Dev Jain <dev.jain(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Bharata B Rao <bharata(a)amd.com>
Cc: Fengwei Yin <fengwei.yin(a)intel.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mariano Pache <npache(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
--- a/mm/khugepaged.c~mm-khugepaged-fix-race-with-folio-split-free-using-temporary-reference
+++ a/mm/khugepaged.c
@@ -2293,6 +2293,17 @@ static int hpage_collapse_scan_file(stru
continue;
}
+ if (!folio_try_get(folio)) {
+ xas_reset(&xas);
+ continue;
+ }
+
+ if (unlikely(folio != xas_reload(&xas))) {
+ folio_put(folio);
+ xas_reset(&xas);
+ continue;
+ }
+
if (folio_order(folio) == HPAGE_PMD_ORDER &&
folio->index == start) {
/* Maybe PMD-mapped */
@@ -2303,23 +2314,27 @@ static int hpage_collapse_scan_file(stru
* it's safe to skip LRU and refcount checks before
* returning.
*/
+ folio_put(folio);
break;
}
node = folio_nid(folio);
if (hpage_collapse_scan_abort(node, cc)) {
result = SCAN_SCAN_ABORT;
+ folio_put(folio);
break;
}
cc->node_load[node]++;
if (!folio_test_lru(folio)) {
result = SCAN_PAGE_LRU;
+ folio_put(folio);
break;
}
- if (!is_refcount_suitable(folio)) {
+ if (folio_expected_ref_count(folio) + 1 != folio_ref_count(folio)) {
result = SCAN_PAGE_COUNT;
+ folio_put(folio);
break;
}
@@ -2331,6 +2346,7 @@ static int hpage_collapse_scan_file(stru
*/
present += folio_nr_pages(folio);
+ folio_put(folio);
if (need_resched()) {
xas_pause(&xas);
_
Patches currently in -mm which might be from shivankg(a)amd.com are
The patch titled
Subject: KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMY
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kvm-s390-rename-prot_none-to-prot_type_dummy.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMY
Date: Mon, 19 May 2025 15:56:57 +0100
The enum type prot_type declared in arch/s390/kvm/gaccess.c declares an
unfortunate identifier within it - PROT_NONE.
This clashes with the protection bit define from the uapi for mmap()
declared in include/uapi/asm-generic/mman-common.h, which is indeed what
those casually reading this code would assume this to refer to.
This means that any changes which subsequently alter headers in any way
which results in the uapi header being imported here will cause build
errors.
Resolve the issue by renaming PROT_NONE to PROT_TYPE_DUMMY.
Link: https://lkml.kernel.org/r/20250519145657.178365-1-lorenzo.stoakes@oracle.com
Fixes: b3cefd6bf16e ("KVM: s390: Pass initialized arg even if unused")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Suggested-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505140943.IgHDa9s7-lkp@intel.com/
Acked-by: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Acked-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
Acked-by: Yang Shi <yang(a)os.amperecomputing.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/s390/kvm/gaccess.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/s390/kvm/gaccess.c~kvm-s390-rename-prot_none-to-prot_type_dummy
+++ a/arch/s390/kvm/gaccess.c
@@ -318,7 +318,7 @@ enum prot_type {
PROT_TYPE_DAT = 3,
PROT_TYPE_IEP = 4,
/* Dummy value for passing an initialized value when code != PGM_PROTECTION */
- PROT_NONE,
+ PROT_TYPE_DUMMY,
};
static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
@@ -334,7 +334,7 @@ static int trans_exc_ending(struct kvm_v
switch (code) {
case PGM_PROTECTION:
switch (prot) {
- case PROT_NONE:
+ case PROT_TYPE_DUMMY:
/* We should never get here, acts like termination */
WARN_ON_ONCE(1);
break;
@@ -804,7 +804,7 @@ static int guest_range_to_gpas(struct kv
gpa = kvm_s390_real_to_abs(vcpu, ga);
if (!kvm_is_gpa_in_memslot(vcpu->kvm, gpa)) {
rc = PGM_ADDRESSING;
- prot = PROT_NONE;
+ prot = PROT_TYPE_DUMMY;
}
}
if (rc)
@@ -962,7 +962,7 @@ int access_guest_with_key(struct kvm_vcp
if (rc == PGM_PROTECTION)
prot = PROT_TYPE_KEYC;
else
- prot = PROT_NONE;
+ prot = PROT_TYPE_DUMMY;
rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot, terminate);
}
out_unlock:
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
kvm-s390-rename-prot_none-to-prot_type_dummy.patch
tools-testing-vma-add-missing-function-stub.patch
Mask the value read before returning it. The value read over the
parallel bus via the AXI ADC IP block contains both the address and
the data, but callers expect val to only contain the data.
axi_adc_raw_write() takes a u32 parameter, so addr was the wrong type.
This wasn't causing any issues but is corrected anyway since we are
touching the same line to add a new variable.
Cc: stable(a)vger.kernel.org
Fixes: 79c47485e438 ("iio: adc: adi-axi-adc: add support for AD7606 register writing")
Signed-off-by: David Lechner <dlechner(a)baylibre.com>
---
Changes in v2:
- Use ADI_AXI_REG_VALUE_MASK instead of hard-coding 0xFF.
- Introduce local variable and use FIELD_PREP() instead of modifying val.
- Link to v1: https://lore.kernel.org/r/20250530-iio-adc-adi-axi-adc-fix-ad7606_bus_reg_r…
---
drivers/iio/adc/adi-axi-adc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/adc/adi-axi-adc.c b/drivers/iio/adc/adi-axi-adc.c
index cf942c043457ccea49207c3900153ee371b3774f..fc745297bcb82cf2cf7f30c7fcf9bba2d861a48c 100644
--- a/drivers/iio/adc/adi-axi-adc.c
+++ b/drivers/iio/adc/adi-axi-adc.c
@@ -445,7 +445,7 @@ static int axi_adc_raw_read(struct iio_backend *back, u32 *val)
static int ad7606_bus_reg_read(struct iio_backend *back, u32 reg, u32 *val)
{
struct adi_axi_adc_state *st = iio_backend_get_priv(back);
- int addr;
+ u32 addr, reg_val;
guard(mutex)(&st->lock);
@@ -455,7 +455,9 @@ static int ad7606_bus_reg_read(struct iio_backend *back, u32 reg, u32 *val)
*/
addr = FIELD_PREP(ADI_AXI_REG_ADDRESS_MASK, reg) | ADI_AXI_REG_READ_BIT;
axi_adc_raw_write(back, addr);
- axi_adc_raw_read(back, val);
+ axi_adc_raw_read(back, ®_val);
+
+ *val = FIELD_GET(ADI_AXI_REG_VALUE_MASK, reg_val);
/* Write 0x0 on the bus to get back to ADC mode */
axi_adc_raw_write(back, 0);
---
base-commit: 7cdfbc0113d087348b8e65dd79276d0f57b89a10
change-id: 20250530-iio-adc-adi-axi-adc-fix-ad7606_bus_reg_read-f2bbb503db8b
Best regards,
--
David Lechner <dlechner(a)baylibre.com>
From: Helge Deller <deller(a)gmx.de>
The dfa blob stream for the aa_dfa_unpack() function is expected to be aligned
on a 8 byte boundary.
The static nulldfa_src[] and stacksplitdfa_src[] arrays store the inital
apparmor dfa blob streams, but since they are declared as an array-of-chars
the compiler and linker will only ensure a "char" (1-byte) alignment.
Add an __aligned(8) annotation to the arrays to tell the linker to always
align them on a 8-byte boundary. This avoids runtime warnings at startup on
alignment-sensitive platforms like parisc such as:
Kernel: unaligned access to 0x7f2a584a in aa_dfa_unpack+0x124/0x788 (iir 0xca0109f)
Kernel: unaligned access to 0x7f2a584e in aa_dfa_unpack+0x210/0x788 (iir 0xca8109c)
Kernel: unaligned access to 0x7f2a586a in aa_dfa_unpack+0x278/0x788 (iir 0xcb01090)
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: stable(a)vger.kernel.org
---
security/apparmor/lsm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 9b6c2f157f83..531bde29cccb 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -2149,12 +2149,12 @@ static int __init apparmor_nf_ip_init(void)
__initcall(apparmor_nf_ip_init);
#endif
-static char nulldfa_src[] = {
+static char nulldfa_src[] __aligned(8) = {
#include "nulldfa.in"
};
static struct aa_dfa *nulldfa;
-static char stacksplitdfa_src[] = {
+static char stacksplitdfa_src[] __aligned(8) = {
#include "stacksplitdfa.in"
};
struct aa_dfa *stacksplitdfa;
--
2.47.0
On Mon, May 26, 2025 at 6:40 AM Wang Yugui <wangyugui(a)e16-tech.com> wrote:
>
> Hi,
> Cc: Filipe Manana
>
> I noticed 6.1.140 build failure(fs/btrfs/discard.c:247:5: error: implicit declaration of function 'ASSERT')
>
> fs/btrfs/discard.c: In function 'peek_discard_list':
> fs/btrfs/discard.c:247:5: error: implicit declaration of function 'ASSERT'; did you mean 'IS_ERR'? [-Werror=implicit-function-declaration]
> ASSERT(block_group->discard_index !=
> ^~~~~~
> IS_ERR
>
> It seems realted to the patch(btrfs-fix-discard-worker-infinite-loop-after-disabling-discard.patch).
Yes, it is.
The patch, like most stable backports, was automatically picked by the
stable scripts and added to stable releases.
I assume that before the release was made, it was compile tested by
the stable automatic processes.
I just tried it, and it compiles successfully for me:
fdmanana 11:56:26 ~/git/hub/linux ((v6.12))> git clean -xfd
fdmanana 11:57:27 ~/git/hub/linux ((v6.12))> git co v6.1.140
fdmanana 11:59:53 ~/git/hub/linux ((v6.1.140))> make defconfig
HOSTCC scripts/basic/fixdep
HOSTCC scripts/kconfig/conf.o
HOSTCC scripts/kconfig/confdata.o
HOSTCC scripts/kconfig/expr.o
LEX scripts/kconfig/lexer.lex.c
YACC scripts/kconfig/parser.tab.[ch]
HOSTCC scripts/kconfig/lexer.lex.o
HOSTCC scripts/kconfig/menu.o
HOSTCC scripts/kconfig/parser.tab.o
HOSTCC scripts/kconfig/preprocess.o
HOSTCC scripts/kconfig/symbol.o
HOSTCC scripts/kconfig/util.o
HOSTLD scripts/kconfig/conf
*** Default configuration is based on 'x86_64_defconfig'
#
# configuration written to .config
#
# Run make menuconfig to enable btrfs and all its config options
fdmanana 12:01:55 ~/git/hub/linux ((v6.1.140))> make menuconfig
fdmanana 12:02:17 ~/git/hub/linux ((v6.1.140))> grep BTRFS .config
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y
CONFIG_BTRFS_FS_REF_VERIFY=y
fdmanana 12:06:08 ~/git/hub/linux ((v6.1.140))> make fs/btrfs/btrfs.ko
SYNC include/config/auto.conf
CALL scripts/checksyscalls.sh
DESCEND objtool
CC [M] fs/btrfs/super.o
CC [M] fs/btrfs/ctree.o
CC [M] fs/btrfs/extent-tree.o
CC [M] fs/btrfs/print-tree.o
CC [M] fs/btrfs/root-tree.o
CC [M] fs/btrfs/dir-item.o
CC [M] fs/btrfs/file-item.o
CC [M] fs/btrfs/inode-item.o
CC [M] fs/btrfs/disk-io.o
CC [M] fs/btrfs/transaction.o
CC [M] fs/btrfs/inode.o
CC [M] fs/btrfs/file.o
CC [M] fs/btrfs/tree-defrag.o
CC [M] fs/btrfs/extent_map.o
CC [M] fs/btrfs/sysfs.o
CC [M] fs/btrfs/struct-funcs.o
CC [M] fs/btrfs/xattr.o
CC [M] fs/btrfs/ordered-data.o
CC [M] fs/btrfs/extent_io.o
CC [M] fs/btrfs/volumes.o
CC [M] fs/btrfs/async-thread.o
CC [M] fs/btrfs/ioctl.o
CC [M] fs/btrfs/locking.o
CC [M] fs/btrfs/orphan.o
CC [M] fs/btrfs/export.o
CC [M] fs/btrfs/tree-log.o
CC [M] fs/btrfs/free-space-cache.o
CC [M] fs/btrfs/lzo.o
CC [M] fs/btrfs/zstd.o
CC [M] fs/btrfs/compression.o
CC [M] fs/btrfs/delayed-ref.o
CC [M] fs/btrfs/relocation.o
CC [M] fs/btrfs/delayed-inode.o
CC [M] fs/btrfs/scrub.o
CC [M] fs/btrfs/backref.o
CC [M] fs/btrfs/ulist.o
CC [M] fs/btrfs/qgroup.o
CC [M] fs/btrfs/send.o
CC [M] fs/btrfs/dev-replace.o
CC [M] fs/btrfs/raid56.o
CC [M] fs/btrfs/uuid-tree.o
CC [M] fs/btrfs/props.o
CC [M] fs/btrfs/free-space-tree.o
CC [M] fs/btrfs/tree-checker.o
CC [M] fs/btrfs/space-info.o
CC [M] fs/btrfs/block-rsv.o
CC [M] fs/btrfs/delalloc-space.o
CC [M] fs/btrfs/block-group.o
CC [M] fs/btrfs/discard.o
CC [M] fs/btrfs/reflink.o
CC [M] fs/btrfs/subpage.o
CC [M] fs/btrfs/tree-mod-log.o
CC [M] fs/btrfs/extent-io-tree.o
CC [M] fs/btrfs/acl.o
CC [M] fs/btrfs/check-integrity.o
CC [M] fs/btrfs/ref-verify.o
CC [M] fs/btrfs/tests/free-space-tests.o
CC [M] fs/btrfs/tests/extent-buffer-tests.o
CC [M] fs/btrfs/tests/btrfs-tests.o
CC [M] fs/btrfs/tests/extent-io-tests.o
CC [M] fs/btrfs/tests/inode-tests.o
CC [M] fs/btrfs/tests/qgroup-tests.o
CC [M] fs/btrfs/tests/free-space-tree-tests.o
CC [M] fs/btrfs/tests/extent-map-tests.o
LD [M] fs/btrfs/btrfs.o
make[3]: 'fs/btrfs/btrfs.mod' is up to date.
MODPOST modules-only.symvers
WARNING: vmlinux.o is missing.
Modules may not have dependencies or modversions.
You may get many unresolved symbol warnings.
WARNING: modpost: "bio_associate_blkg" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: "zstd_cstream_workspace_bound" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: "folio_invalidate" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: "__page_file_index" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: "strcpy" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: "inode_init_owner" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: "generic_fillattr" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: "is_vmalloc_addr" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: "krealloc" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: "vfs_fsync_range" [fs/btrfs/btrfs.ko] undefined!
WARNING: modpost: suppressed 521 unresolved symbol warnings because
there were too many)
LD [M] fs/btrfs/btrfs.ko
fdmanana 12:11:12 ~/git/hub/linux ((v6.1.140))> gcc --version
gcc (Debian 9.3.0-18) 9.3.0
>
> I walked around it with the following patch.
In the future please cc the stable list when you find problems with
stable backports.
>
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index 98bce18..9ffe5c4 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -7,6 +7,7 @@
> #include <linux/math64.h>
> #include <linux/sizes.h>
> #include <linux/workqueue.h>
> +#include "messages.h"
> #include "ctree.h"
> #include "block-group.h"
> #include "discard.h"
>
>
> Best Regards
> Wang Yugui (wangyugui(a)e16-tech.com)
> 2025/05/26
>
>
>
Hi,
We’re excited to share exclusive access to the Integrated Systems Europe 2025 Visitor Contact List!
Our database includes 88,351 verified visitor contacts, giving you a powerful resource to connect with key industry professionals.
Each contact includes: Contact Name, Job Title, Company Name, Physical Address, Phone Number, Official Email Address, and more.
Interested in the list? Simply reply with “Send me Pricing” to get full details.
Kind regards,
Garnet Conwell
Sr. Marketing Manager
To opt out of future messages, just reply “Unfollow.”
From: Kairui Song <kasong(a)tencent.com>
On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache
lookup, and try to move the found folio to the faulting vma when.
Currently, it relies on the PTE value check to ensure the moved folio
still belongs to the src swap entry, which turns out is not reliable.
While working and reviewing the swap table series with Barry, following
existing race is observed and reproduced [1]:
( move_pages_pte is moving src_pte to dst_pte, where src_pte is a
swap entry PTE holding swap entry S1, and S1 isn't in the swap cache.)
CPU1 CPU2
userfaultfd_move
move_pages_pte()
entry = pte_to_swp_entry(orig_src_pte);
// Here it got entry = S1
... < Somehow interrupted> ...
<swapin src_pte, alloc and use folio A>
// folio A is just a new allocated folio
// and get installed into src_pte
<frees swap entry S1>
// src_pte now points to folio A, S1
// has swap count == 0, it can be freed
// by folio_swap_swap or swap
// allocator's reclaim.
<try to swap out another folio B>
// folio B is a folio in another VMA.
<put folio B to swap cache using S1 >
// S1 is freed, folio B could use it
// for swap out with no problem.
...
folio = filemap_get_folio(S1)
// Got folio B here !!!
... < Somehow interrupted again> ...
<swapin folio B and free S1>
// Now S1 is free to be used again.
<swapout src_pte & folio A using S1>
// Now src_pte is a swap entry pte
// holding S1 again.
folio_trylock(folio)
move_swap_pte
double_pt_lock
is_pte_pages_stable
// Check passed because src_pte == S1
folio_move_anon_rmap(...)
// Moved invalid folio B here !!!
The race window is very short and requires multiple collisions of
multiple rare events, so it's very unlikely to happen, but with a
deliberately constructed reproducer and increased time window, it can be
reproduced [1].
It's also possible that folio (A) is swapped in, and swapped out again
after the filemap_get_folio lookup, in such case folio (A) may stay in
swap cache so it needs to be moved too. In this case we should also try
again so kernel won't miss a folio move.
Fix this by checking if the folio is the valid swap cache folio after
acquiring the folio lock, and checking the swap cache again after
acquiring the src_pte lock.
SWP_SYNCRHONIZE_IO path does make the problem more complex, but so far
we don't need to worry about that since folios only might get exposed to
swap cache in the swap out path, and it's covered in this patch too by
checking the swap cache again after acquiring src_pte lock.
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Closes: https://lore.kernel.org/linux-mm/CAMgjq7B1K=6OOrK2OUZ0-tqCzi+EJt+2_K97TPGoS… [1]
Signed-off-by: Kairui Song <kasong(a)tencent.com>
---
mm/userfaultfd.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index bc473ad21202..a1564d205dfb 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -15,6 +15,7 @@
#include <linux/mmu_notifier.h>
#include <linux/hugetlb.h>
#include <linux/shmem_fs.h>
+#include <linux/delay.h>
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include "internal.h"
@@ -1086,6 +1087,8 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
spinlock_t *dst_ptl, spinlock_t *src_ptl,
struct folio *src_folio)
{
+ swp_entry_t entry;
+
double_pt_lock(dst_ptl, src_ptl);
if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
@@ -1102,6 +1105,19 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
if (src_folio) {
folio_move_anon_rmap(src_folio, dst_vma);
src_folio->index = linear_page_index(dst_vma, dst_addr);
+ } else {
+ /*
+ * Check again after acquiring the src_pte lock. Or we might
+ * miss a new loaded swap cache folio.
+ */
+ entry = pte_to_swp_entry(orig_src_pte);
+ src_folio = filemap_get_folio(swap_address_space(entry),
+ swap_cache_index(entry));
+ if (!IS_ERR_OR_NULL(src_folio)) {
+ double_pt_unlock(dst_ptl, src_ptl);
+ folio_put(src_folio);
+ return -EAGAIN;
+ }
}
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
@@ -1409,6 +1425,16 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
folio_lock(src_folio);
goto retry;
}
+ /*
+ * Check if the folio still belongs to the target swap entry after
+ * acquiring the lock. Folio can be freed in the swap cache while
+ * not locked.
+ */
+ if (unlikely(!folio_test_swapcache(folio) ||
+ entry.val != folio->swap.val)) {
+ err = -EAGAIN;
+ goto out;
+ }
}
err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
--
2.49.0