An incomplete backport of the series "xfs: log intent item recovery
should reconstruct defer work state" [1] leads to a kernel crash when
running the xfs/235 test on top of 6.6.y stable.
Tested (briefly) with my local xfstests setup. Additional testing would
be much appreciated.
[1]: https://lore.kernel.org/linux-xfs/170191741007.1195961.10092536809136830257…
XFS (loop1): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x4d9/0x610 (fs/xfs/xfs_trans.c:1097). Shutting down filesystem.
XFS (loop1): Please unmount the filesystem and rectify the problem(s)
general protection fault, probably for non-canonical address 0xdffffc000000000c: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
CPU: 1 PID: 2011 Comm: mount Not tainted 6.6.84-rc2+ #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
RIP: 0010:xlog_recover_cancel_intents+0xad/0x1b0
Call Trace:
<TASK>
xlog_recover_finish+0x7f6/0x9a0
xfs_log_mount_finish+0x386/0x650
xfs_mountfs+0x1405/0x1fb0
xfs_fs_fill_super+0x11d6/0x1ca0
get_tree_bdev+0x3b4/0x650
vfs_get_tree+0x92/0x370
path_mount+0x13b9/0x1f10
__x64_sys_mount+0x286/0x310
do_syscall_64+0x39/0x90
entry_SYSCALL_64_after_hwframe+0x78/0xe2
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:xlog_recover_cancel_intents+0xad/0x1b0
Link to the original bug report [2].
[2]: https://lore.kernel.org/stable/6pxyzwujo52p4bp2otliyssjcvsfydd6ju32eusdlyhz…
Found by Linux Verification Center (linuxtesting.org).
Darrick J. Wong (4):
  xfs: recreate work items when recovering intent items
  xfs: dump the recovered xattri log item if corruption happens
  xfs: use xfs_defer_finish_one to finish recovered work items
  xfs: move ->iop_recover to xfs_defer_op_type
fs/xfs/libxfs/xfs_defer.c | 22 ++++-
fs/xfs/libxfs/xfs_defer.h | 14 +++
fs/xfs/libxfs/xfs_log_recover.h | 4 +-
fs/xfs/xfs_attr_item.c | 115 ++++++++++++------------
fs/xfs/xfs_bmap_item.c | 92 ++++++++++---------
fs/xfs/xfs_extfree_item.c | 117 +++++++++++--------------
fs/xfs/xfs_log_recover.c | 37 ++++----
fs/xfs/xfs_refcount_item.c | 127 +++++++++------------------
fs/xfs/xfs_rmap_item.c | 151 ++++++++++++++++----------------
fs/xfs/xfs_trans.h | 4 -
10 files changed, 326 insertions(+), 357 deletions(-)
--
2.49.0
Mounting a corrupted filesystem with a directory that contains a '.' dir
entry with rec_len == block size results in an out-of-bounds read (later
on, when the corrupted directory is removed).
ext4_empty_dir() assumes every ext4 directory contains at least '.'
and '..' as directory entries in the first data block. It first loads
the '.' dir entry, performs sanity checks by calling ext4_check_dir_entry()
and then uses its rec_len member to compute the location of '..' dir
entry (in ext4_next_entry). It assumes the '..' dir entry fits into the
same data block.
If the rec_len of '.' is precisely one block (4KB), it slips through the
sanity checks (it is considered the last directory entry in the data
block) and leaves "struct ext4_dir_entry_2 *de" pointing just past the
memory slot allocated to the data block. The following call to
ext4_check_dir_entry() on the new value of de then dereferences this
pointer, which results in an out-of-bounds memory access.
Fix this by extending __ext4_check_dir_entry() to reject '.' dir
entries that reach the end of the data block. The phony dir entries used
for checksums are not affected, since their name_len is zero.
Note: This is reported by KASAN as use-after-free in case another
structure was recently freed from the slot past the bound, but it is
really an OOB read.
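To make the failure mode concrete, here is a minimal userspace sketch of
the check. It is not the kernel code: all toy_* names are hypothetical,
the entry layout is a simplified stand-in for struct ext4_dir_entry_2,
fields are stored in host byte order rather than little-endian, and only
a subset of the real sanity checks is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified entry layout: inode (4 bytes), rec_len (2), name_len (1),
 * file_type (1), then the name. Host byte order is an assumption made
 * for brevity. */
#define TOY_HDR_LEN 8u

/* Write one toy directory entry at byte `offset` of `block`. */
static void toy_put_entry(unsigned char *block, size_t offset, uint32_t inode,
			  uint16_t rec_len, const char *name)
{
	uint8_t name_len = (uint8_t)strlen(name);

	memcpy(block + offset, &inode, sizeof(inode));
	memcpy(block + offset + 4, &rec_len, sizeof(rec_len));
	block[offset + 6] = name_len;
	block[offset + 7] = 2;	/* arbitrary "directory" type tag */
	memcpy(block + offset + TOY_HDR_LEN, name, name_len);
}

/*
 * Return NULL if the entry at `offset` looks sane for a block of `size`
 * bytes, or a short error string otherwise.
 */
static const char *toy_check_dir_entry(const unsigned char *block,
				       size_t size, size_t offset)
{
	uint16_t rec_len;
	uint8_t name_len;
	size_t next_offset;

	assert(block != NULL);
	if (offset + TOY_HDR_LEN > size)
		return "entry header does not fit in block";

	memcpy(&rec_len, block + offset + 4, sizeof(rec_len));
	name_len = block[offset + 6];
	next_offset = offset + rec_len;

	if (rec_len < TOY_HDR_LEN + name_len)
		return "rec_len is too small for name_len";
	if (rec_len % 4 != 0)
		return "rec_len % 4 != 0";
	if (next_offset > size)
		return "directory entry overrun";
	/*
	 * The added check: a '.' entry that ends exactly at the block
	 * boundary leaves no room for '..' in the same block.
	 */
	if (next_offset == size && name_len == 1 &&
	    block[offset + TOY_HDR_LEN] == '.')
		return "'.' directory cannot be the last in data block";
	return NULL;
}
```

With a 4096-byte block, a '.' entry with rec_len 12 followed by a '..'
entry with rec_len 4084 passes, while a '.' entry with rec_len 4096 is
rejected by the last check instead of being treated as a valid final
entry.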
This issue was found by the syzkaller tool.
Call Trace:
[ 38.594108] BUG: KASAN: slab-use-after-free in __ext4_check_dir_entry+0x67e/0x710
[ 38.594649] Read of size 2 at addr ffff88802b41a004 by task syz-executor/5375
[ 38.595158]
[ 38.595288] CPU: 0 UID: 0 PID: 5375 Comm: syz-executor Not tainted 6.14.0-rc7 #1
[ 38.595298] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 38.595304] Call Trace:
[ 38.595308] <TASK>
[ 38.595311] dump_stack_lvl+0xa7/0xd0
[ 38.595325] print_address_description.constprop.0+0x2c/0x3f0
[ 38.595339] ? __ext4_check_dir_entry+0x67e/0x710
[ 38.595349] print_report+0xaa/0x250
[ 38.595359] ? __ext4_check_dir_entry+0x67e/0x710
[ 38.595368] ? kasan_addr_to_slab+0x9/0x90
[ 38.595378] kasan_report+0xab/0xe0
[ 38.595389] ? __ext4_check_dir_entry+0x67e/0x710
[ 38.595400] __ext4_check_dir_entry+0x67e/0x710
[ 38.595410] ext4_empty_dir+0x465/0x990
[ 38.595421] ? __pfx_ext4_empty_dir+0x10/0x10
[ 38.595432] ext4_rmdir.part.0+0x29a/0xd10
[ 38.595441] ? __dquot_initialize+0x2a7/0xbf0
[ 38.595455] ? __pfx_ext4_rmdir.part.0+0x10/0x10
[ 38.595464] ? __pfx___dquot_initialize+0x10/0x10
[ 38.595478] ? down_write+0xdb/0x140
[ 38.595487] ? __pfx_down_write+0x10/0x10
[ 38.595497] ext4_rmdir+0xee/0x140
[ 38.595506] vfs_rmdir+0x209/0x670
[ 38.595517] ? lookup_one_qstr_excl+0x3b/0x190
[ 38.595529] do_rmdir+0x363/0x3c0
[ 38.595537] ? __pfx_do_rmdir+0x10/0x10
[ 38.595544] ? strncpy_from_user+0x1ff/0x2e0
[ 38.595561] __x64_sys_unlinkat+0xf0/0x130
[ 38.595570] do_syscall_64+0x5b/0x180
[ 38.595583] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: ac27a0ec112a0 ("[PATCH] ext4: initial copy of files from ext3")
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Cc: "Theodore Ts'o" <tytso(a)mit.edu>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: linux-ext4(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: Mahmoud Adam <mngyadam(a)amazon.com>
Cc: stable(a)vger.kernel.org
Cc: security(a)kernel.org
---
v1: https://lore.kernel.org/all/20250319110134.10071-1-acsjakub@amazon.com/
v1->v2:
- optimize condition as per suggestions
- remove questions
- move this section to correct place
I ran 'kvm-xfstests smoke' and '-c ext4/4k -g quick' as suggested in
reply to v1. Some were skipped, none failed.
fs/ext4/dir.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 02d47a64e8d1..253992fcf57c 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -104,6 +104,9 @@ int __ext4_check_dir_entry(const char *function, unsigned int line,
 	else if (unlikely(le32_to_cpu(de->inode) >
 			le32_to_cpu(EXT4_SB(dir->i_sb)->s_es->s_inodes_count)))
 		error_msg = "inode out of bounds";
+	else if (unlikely(next_offset == size && de->name_len == 1 &&
+			  de->name[0] == '.'))
+		error_msg = "'.' directory cannot be the last in data block";
 	else
 		return 0;
--
2.47.1
The patch titled
Subject: mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-zswap-fix-crypto_free_acomp-deadlock-in-zswap_cpu_comp_dead.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Subject: mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
Date: Wed, 26 Feb 2025 18:56:25 +0000
Currently, zswap_cpu_comp_dead() calls crypto_free_acomp() while holding
the per-CPU acomp_ctx mutex. crypto_free_acomp() then holds scomp_lock
(through crypto_exit_scomp_ops_async()).
On the other hand, crypto_alloc_acomp_node() holds the scomp_lock (through
crypto_scomp_init_tfm()), and then allocates memory. If the allocation
results in reclaim, we may attempt to hold the per-CPU acomp_ctx mutex.
The above dependencies can cause an ABBA deadlock. For example in the
following scenario:
(1) Task A running on CPU #1:
crypto_alloc_acomp_node()
Holds scomp_lock
Enters reclaim
Reads per_cpu_ptr(pool->acomp_ctx, 1)
(2) Task A is descheduled
(3) CPU #1 goes offline
zswap_cpu_comp_dead(CPU #1)
Holds per_cpu_ptr(pool->acomp_ctx, 1)
Calls crypto_free_acomp()
Waits for scomp_lock
(4) Task A running on CPU #2:
Waits for per_cpu_ptr(pool->acomp_ctx, 1) // Read on CPU #1
DEADLOCK
Since there is no requirement to call crypto_free_acomp() with the per-CPU
acomp_ctx mutex held in zswap_cpu_comp_dead(), move it after the mutex is
unlocked. Also move the acomp_request_free() and kfree() calls for
consistency and to avoid any potential subtle locking dependencies in the
future.
With this, only setting acomp_ctx fields to NULL occurs with the mutex
held. This is similar to how zswap_cpu_comp_prepare() only initializes
acomp_ctx fields with the mutex held, after performing all allocations
before holding the mutex.
Opportunistically, move the NULL check on acomp_ctx so that it takes place
before the mutex dereference.
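The "detach under the lock, free after unlocking" pattern described
above can be sketched in userspace as follows. This is only an
illustration under stated assumptions: struct toy_ctx and the toy_*
functions are hypothetical stand-ins for the per-CPU acomp_ctx and the
zswap hotplug callbacks, a pthread mutex stands in for the kernel mutex,
and malloc()/free() stand in for the crypto and kmalloc calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU compression context. */
struct toy_ctx {
	pthread_mutex_t mutex;
	void *req;
	void *acomp;
	void *buffer;
};

/*
 * Do all allocations first, then publish the pointers under the mutex.
 * The caller is assumed to have initialized ctx->mutex already.
 */
static int toy_ctx_prepare(struct toy_ctx *ctx)
{
	void *req, *acomp, *buffer;

	assert(ctx != NULL);
	req = malloc(64);
	acomp = malloc(64);
	buffer = malloc(4096);
	if (!req || !acomp || !buffer) {
		free(req);
		free(acomp);
		free(buffer);
		return -1;
	}

	pthread_mutex_lock(&ctx->mutex);
	ctx->req = req;
	ctx->acomp = acomp;
	ctx->buffer = buffer;
	pthread_mutex_unlock(&ctx->mutex);
	return 0;
}

/*
 * Only clear the fields while holding the mutex; free the resources
 * after unlocking, so that the release path never takes another lock
 * (here: whatever free() might take) with ctx->mutex held.
 */
static int toy_ctx_dead(struct toy_ctx *ctx)
{
	void *req, *acomp, *buffer;

	/* Check the pointer before touching ctx->mutex. */
	if (ctx == NULL)
		return 0;

	pthread_mutex_lock(&ctx->mutex);
	req = ctx->req;
	acomp = ctx->acomp;
	buffer = ctx->buffer;
	ctx->req = NULL;
	ctx->acomp = NULL;
	ctx->buffer = NULL;
	pthread_mutex_unlock(&ctx->mutex);

	free(req);
	free(acomp);
	free(buffer);
	return 0;
}
```

The critical section shrinks to a handful of pointer assignments, which
is what removes the mutex -> scomp_lock edge from the lock ordering in
the scenario above.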
Link: https://lkml.kernel.org/r/20250226185625.2672936-1-yosry.ahmed@linux.dev
Fixes: 12dcb0ef5406 ("mm: zswap: properly synchronize freeing resources during CPU hotunplug")
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Co-developed-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Reported-by: syzbot+1a517ccfcbc6a7ab0f82(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67bcea51.050a0220.bbfd1.0096.GAE@google.com/
Acked-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Eric Biggers <ebiggers(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
--- a/mm/zswap.c~mm-zswap-fix-crypto_free_acomp-deadlock-in-zswap_cpu_comp_dead
+++ a/mm/zswap.c
@@ -881,18 +881,32 @@ static int zswap_cpu_comp_dead(unsigned
 {
 	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
 	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
+	struct acomp_req *req;
+	struct crypto_acomp *acomp;
+	u8 *buffer;
+
+	if (IS_ERR_OR_NULL(acomp_ctx))
+		return 0;
 	mutex_lock(&acomp_ctx->mutex);
-	if (!IS_ERR_OR_NULL(acomp_ctx)) {
-		if (!IS_ERR_OR_NULL(acomp_ctx->req))
-			acomp_request_free(acomp_ctx->req);
-		acomp_ctx->req = NULL;
-		if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
-			crypto_free_acomp(acomp_ctx->acomp);
-		kfree(acomp_ctx->buffer);
-	}
+	req = acomp_ctx->req;
+	acomp = acomp_ctx->acomp;
+	buffer = acomp_ctx->buffer;
+	acomp_ctx->req = NULL;
+	acomp_ctx->acomp = NULL;
+	acomp_ctx->buffer = NULL;
 	mutex_unlock(&acomp_ctx->mutex);
+	/*
+	 * Do the actual freeing after releasing the mutex to avoid subtle
+	 * locking dependencies causing deadlocks.
+	 */
+	if (!IS_ERR_OR_NULL(req))
+		acomp_request_free(req);
+	if (!IS_ERR_OR_NULL(acomp))
+		crypto_free_acomp(acomp);
+	kfree(buffer);
+
 	return 0;
 }
_
Patches currently in -mm which might be from yosry.ahmed(a)linux.dev are
mm-zswap-fix-crypto_free_acomp-deadlock-in-zswap_cpu_comp_dead.patch
Hello:
This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni(a)redhat.com>:
On Fri, 14 Mar 2025 21:11:30 +0100 you wrote:
> Here are 3 unrelated fixes for the net tree.
>
> - Patch 1: fix data stream corruption when ending up not sending an
> ADD_ADDR.
>
> - Patch 2: fix missing getsockopt(IPV6_V6ONLY) support -- the set part
> is supported.
>
> [...]
Here is the summary with links:
- [net,1/3] mptcp: Fix data stream corruption in the address announcement
(no matching commit)
- [net,2/3] mptcp: sockopt: fix getting IPV6_V6ONLY
https://git.kernel.org/netdev/net-next/c/8c3963375988
- [net,3/3] mptcp: sockopt: fix getting freebind & transparent
https://git.kernel.org/netdev/net-next/c/e2f4ac7bab22
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Hello,
The following change seems to cause kexec to fail intermittently (in
roughly 50% of attempts) on the stable 6.1 kernels, 6.1.129 and later:
6821918f4519 ("x86/kexec: Allocate PGD for x86_64 transition page tables
separately")
The commit message for that commit states that it is dependent on
another change "x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating
userspace page tables" but that change does not seem to have been done
in the 6.1 kernel series, which could explain why the 6821918f4519
change causes problems for 6.1.
This appears to be a problem only for 6.1; the 6.6 and later stable
kernels are not affected. I think the reason is that the change
"x86/kexec: Allocate PGD for x86_64 transition page tables separately"
relies on things that are not available in 6.1.
In the tests I have done, kexec is called via u-root.
More details are available here:
https://git.glasklar.is/system-transparency/core/stboot/-/issues/227
Cheers,
Elias