April 2025 - Linux-stable-mirror

FAILED: patch "[PATCH] btrfs: check folio mapping after unlock in" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 3e74859ee35edc33a022c3f3971df066ea0ca6b9 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024123045-parka-sublet-a95d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 3e74859ee35edc33a022c3f3971df066ea0ca6b9 Mon Sep 17 00:00:00 2001 From: Boris Burkov <boris(a)bur.io> Date: Fri, 13 Dec 2024 12:22:32 -0800 Subject: [PATCH] btrfs: check folio mapping after unlock in relocate_one_folio() When we call btrfs_read_folio() to bring a folio uptodate, we unlock the folio. The result of that is that a different thread can modify the mapping (like remove it with invalidate) before we call folio_lock(). This results in an invalid page and we need to try again. In particular, if we are relocating concurrently with aborting a transaction, this can result in a crash like the following: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 76 PID: 1411631 Comm: kworker/u322:5 Workqueue: events_unbound btrfs_reclaim_bgs_work RIP: 0010:set_page_extent_mapped+0x20/0xb0 RSP: 0018:ffffc900516a7be8 EFLAGS: 00010246 RAX: ffffea009e851d08 RBX: ffffea009e0b1880 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffc900516a7b90 RDI: ffffea009e0b1880 RBP: 0000000003573000 R08: 0000000000000001 R09: ffff88c07fd2f3f0 R10: 0000000000000000 R11: 0000194754b575be R12: 0000000003572000 R13: 0000000003572fff R14: 0000000000100cca R15: 0000000005582fff FS: 0000000000000000(0000) GS:ffff88c07fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000407d00f002 CR4: 00000000007706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die+0x78/0xc0 ? page_fault_oops+0x2a8/0x3a0 ? __switch_to+0x133/0x530 ? wq_worker_running+0xa/0x40 ? exc_page_fault+0x63/0x130 ? asm_exc_page_fault+0x22/0x30 ? set_page_extent_mapped+0x20/0xb0 relocate_file_extent_cluster+0x1a7/0x940 relocate_data_extent+0xaf/0x120 relocate_block_group+0x20f/0x480 btrfs_relocate_block_group+0x152/0x320 btrfs_relocate_chunk+0x3d/0x120 btrfs_reclaim_bgs_work+0x2ae/0x4e0 process_scheduled_works+0x184/0x370 worker_thread+0xc6/0x3e0 ? blk_add_timer+0xb0/0xb0 kthread+0xae/0xe0 ? flush_tlb_kernel_range+0x90/0x90 ret_from_fork+0x2f/0x40 ? flush_tlb_kernel_range+0x90/0x90 ret_from_fork_asm+0x11/0x20 </TASK> This occurs because cleanup_one_transaction() calls destroy_delalloc_inodes() which calls invalidate_inode_pages2() which takes the folio_lock before setting mapping to NULL. We fail to check this, and subsequently call set_extent_mapping(), which assumes that mapping != NULL (in fact it asserts that in debug mode) Note that the "fixes" patch here is not the one that introduced the race (the very first iteration of this code from 2009) but a more recent change that made this particular crash happen in practice. Fixes: e7f1326cc24e ("btrfs: set page extent mapped after read_folio in relocate_one_page") CC: stable(a)vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Boris Burkov <boris(a)bur.io> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index bf267bdfa8f8..db8b42f674b7 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2902,6 +2902,7 @@ static int relocate_one_folio(struct reloc_control *rc, const bool use_rst = btrfs_need_stripe_tree_update(fs_info, rc->block_group->flags); ASSERT(index <= last_index); +again: folio = filemap_lock_folio(inode->i_mapping, index); if (IS_ERR(folio)) { @@ -2937,6 +2938,11 @@ static int relocate_one_folio(struct reloc_control *rc, ret = -EIO; goto release_folio; } + if (folio->mapping != inode->i_mapping) { + folio_unlock(folio); + folio_put(folio); + goto again; + } } /*

1 month, 3 weeks

4
5
0 0

FAILED: patch "[PATCH] btrfs: check folio mapping after unlock in" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 3e74859ee35edc33a022c3f3971df066ea0ca6b9 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024123042-limelight-doily-8703@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 3e74859ee35edc33a022c3f3971df066ea0ca6b9 Mon Sep 17 00:00:00 2001 From: Boris Burkov <boris(a)bur.io> Date: Fri, 13 Dec 2024 12:22:32 -0800 Subject: [PATCH] btrfs: check folio mapping after unlock in relocate_one_folio() When we call btrfs_read_folio() to bring a folio uptodate, we unlock the folio. The result of that is that a different thread can modify the mapping (like remove it with invalidate) before we call folio_lock(). This results in an invalid page and we need to try again. In particular, if we are relocating concurrently with aborting a transaction, this can result in a crash like the following: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 76 PID: 1411631 Comm: kworker/u322:5 Workqueue: events_unbound btrfs_reclaim_bgs_work RIP: 0010:set_page_extent_mapped+0x20/0xb0 RSP: 0018:ffffc900516a7be8 EFLAGS: 00010246 RAX: ffffea009e851d08 RBX: ffffea009e0b1880 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffc900516a7b90 RDI: ffffea009e0b1880 RBP: 0000000003573000 R08: 0000000000000001 R09: ffff88c07fd2f3f0 R10: 0000000000000000 R11: 0000194754b575be R12: 0000000003572000 R13: 0000000003572fff R14: 0000000000100cca R15: 0000000005582fff FS: 0000000000000000(0000) GS:ffff88c07fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000407d00f002 CR4: 00000000007706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die+0x78/0xc0 ? page_fault_oops+0x2a8/0x3a0 ? __switch_to+0x133/0x530 ? wq_worker_running+0xa/0x40 ? exc_page_fault+0x63/0x130 ? asm_exc_page_fault+0x22/0x30 ? set_page_extent_mapped+0x20/0xb0 relocate_file_extent_cluster+0x1a7/0x940 relocate_data_extent+0xaf/0x120 relocate_block_group+0x20f/0x480 btrfs_relocate_block_group+0x152/0x320 btrfs_relocate_chunk+0x3d/0x120 btrfs_reclaim_bgs_work+0x2ae/0x4e0 process_scheduled_works+0x184/0x370 worker_thread+0xc6/0x3e0 ? blk_add_timer+0xb0/0xb0 kthread+0xae/0xe0 ? flush_tlb_kernel_range+0x90/0x90 ret_from_fork+0x2f/0x40 ? flush_tlb_kernel_range+0x90/0x90 ret_from_fork_asm+0x11/0x20 </TASK> This occurs because cleanup_one_transaction() calls destroy_delalloc_inodes() which calls invalidate_inode_pages2() which takes the folio_lock before setting mapping to NULL. We fail to check this, and subsequently call set_extent_mapping(), which assumes that mapping != NULL (in fact it asserts that in debug mode) Note that the "fixes" patch here is not the one that introduced the race (the very first iteration of this code from 2009) but a more recent change that made this particular crash happen in practice. Fixes: e7f1326cc24e ("btrfs: set page extent mapped after read_folio in relocate_one_page") CC: stable(a)vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Boris Burkov <boris(a)bur.io> Signed-off-by: David Sterba <dsterba(a)suse.com> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index bf267bdfa8f8..db8b42f674b7 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2902,6 +2902,7 @@ static int relocate_one_folio(struct reloc_control *rc, const bool use_rst = btrfs_need_stripe_tree_update(fs_info, rc->block_group->flags); ASSERT(index <= last_index); +again: folio = filemap_lock_folio(inode->i_mapping, index); if (IS_ERR(folio)) { @@ -2937,6 +2938,11 @@ static int relocate_one_folio(struct reloc_control *rc, ret = -EIO; goto release_folio; } + if (folio->mapping != inode->i_mapping) { + folio_unlock(folio); + folio_put(folio); + goto again; + } } /*

1 month, 3 weeks

3
2
0 0

[PATCH] ext4: inline: fix len overflow in ext4_prepare_inline_data

by Thadeu Lima de Souza Cascardo

When running the following code on an ext4 filesystem with inline_data feature enabled, it will lead to the bug below. fd = open("file1", O_RDWR | O_CREAT | O_TRUNC, 0666); ftruncate(fd, 30); pwrite(fd, "a", 1, (1UL << 40) + 5UL); That happens because write_begin will succeed as when ext4_generic_write_inline_data calls ext4_prepare_inline_data, pos + len will be truncated, leading to ext4_prepare_inline_data parameter to be 6 instead of 0x10000000006. Then, later when write_end is called, we hit: BUG_ON(pos + len > EXT4_I(inode)->i_inline_size); at ext4_write_inline_data. Fix it by using a loff_t type for the len parameter in ext4_prepare_inline_data instead of an unsigned int. [ 44.545164] ------------[ cut here ]------------ [ 44.545530] kernel BUG at fs/ext4/inline.c:240! [ 44.545834] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 44.546172] CPU: 3 UID: 0 PID: 343 Comm: test Not tainted 6.15.0-rc2-00003-g9080916f4863 #45 PREEMPT(full) 112853fcebfdb93254270a7959841d2c6aa2c8bb [ 44.546523] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 44.546523] RIP: 0010:ext4_write_inline_data+0xfe/0x100 [ 44.546523] Code: 3c 0e 48 83 c7 48 48 89 de 5b 41 5c 41 5d 41 5e 41 5f 5d e9 e4 fa 43 01 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc 0f 0b <0f> 0b 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 48 83 ec 20 49 [ 44.546523] RSP: 0018:ffffb342008b79a8 EFLAGS: 00010216 [ 44.546523] RAX: 0000000000000001 RBX: ffff9329c579c000 RCX: 0000010000000006 [ 44.546523] RDX: 000000000000003c RSI: ffffb342008b79f0 RDI: ffff9329c158e738 [ 44.546523] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 [ 44.546523] R10: 00007ffffffff000 R11: ffffffff9bd0d910 R12: 0000006210000000 [ 44.546523] R13: fffffc7e4015e700 R14: 0000010000000005 R15: ffff9329c158e738 [ 44.546523] FS: 00007f4299934740(0000) GS:ffff932a60179000(0000) knlGS:0000000000000000 [ 44.546523] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 44.546523] CR2: 00007f4299a1ec90 CR3: 0000000002886002 CR4: 0000000000770eb0 [ 44.546523] PKRU: 55555554 [ 44.546523] Call Trace: [ 44.546523] <TASK> [ 44.546523] ext4_write_inline_data_end+0x126/0x2d0 [ 44.546523] generic_perform_write+0x17e/0x270 [ 44.546523] ext4_buffered_write_iter+0xc8/0x170 [ 44.546523] vfs_write+0x2be/0x3e0 [ 44.546523] __x64_sys_pwrite64+0x6d/0xc0 [ 44.546523] do_syscall_64+0x6a/0xf0 [ 44.546523] ? __wake_up+0x89/0xb0 [ 44.546523] ? xas_find+0x72/0x1c0 [ 44.546523] ? next_uptodate_folio+0x317/0x330 [ 44.546523] ? set_pte_range+0x1a6/0x270 [ 44.546523] ? filemap_map_pages+0x6ee/0x840 [ 44.546523] ? ext4_setattr+0x2fa/0x750 [ 44.546523] ? do_pte_missing+0x128/0xf70 [ 44.546523] ? security_inode_post_setattr+0x3e/0xd0 [ 44.546523] ? ___pte_offset_map+0x19/0x100 [ 44.546523] ? handle_mm_fault+0x721/0xa10 [ 44.546523] ? do_user_addr_fault+0x197/0x730 [ 44.546523] ? do_syscall_64+0x76/0xf0 [ 44.546523] ? arch_exit_to_user_mode_prepare+0x1e/0x60 [ 44.546523] ? irqentry_exit_to_user_mode+0x79/0x90 [ 44.546523] entry_SYSCALL_64_after_hwframe+0x55/0x5d [ 44.546523] RIP: 0033:0x7f42999c6687 [ 44.546523] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff [ 44.546523] RSP: 002b:00007ffeae4a7930 EFLAGS: 00000202 ORIG_RAX: 0000000000000012 [ 44.546523] RAX: ffffffffffffffda RBX: 00007f4299934740 RCX: 00007f42999c6687 [ 44.546523] RDX: 0000000000000001 RSI: 000055ea6149200f RDI: 0000000000000003 [ 44.546523] RBP: 00007ffeae4a79a0 R08: 0000000000000000 R09: 0000000000000000 [ 44.546523] R10: 0000010000000005 R11: 0000000000000202 R12: 0000000000000000 [ 44.546523] R13: 00007ffeae4a7ac8 R14: 00007f4299b86000 R15: 000055ea61493dd8 [ 44.546523] </TASK> [ 44.546523] Modules linked in: [ 44.568501] ---[ end trace 0000000000000000 ]--- [ 44.568889] RIP: 0010:ext4_write_inline_data+0xfe/0x100 [ 44.569328] Code: 3c 0e 48 83 c7 48 48 89 de 5b 41 5c 41 5d 41 5e 41 5f 5d e9 e4 fa 43 01 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc 0f 0b <0f> 0b 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 48 83 ec 20 49 [ 44.570931] RSP: 0018:ffffb342008b79a8 EFLAGS: 00010216 [ 44.571356] RAX: 0000000000000001 RBX: ffff9329c579c000 RCX: 0000010000000006 [ 44.571959] RDX: 000000000000003c RSI: ffffb342008b79f0 RDI: ffff9329c158e738 [ 44.572571] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 [ 44.573148] R10: 00007ffffffff000 R11: ffffffff9bd0d910 R12: 0000006210000000 [ 44.573748] R13: fffffc7e4015e700 R14: 0000010000000005 R15: ffff9329c158e738 [ 44.574335] FS: 00007f4299934740(0000) GS:ffff932a60179000(0000) knlGS:0000000000000000 [ 44.575027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 44.575520] CR2: 00007f4299a1ec90 CR3: 0000000002886002 CR4: 0000000000770eb0 [ 44.576112] PKRU: 55555554 [ 44.576338] Kernel panic - not syncing: Fatal exception [ 44.576517] Kernel Offset: 0x1a600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Reported-by: syzbot+fe2a25dae02a207717a0(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fe2a25dae02a207717a0 Fixes: f19d5870cbf7 ("ext4: add normal write support for inline data") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com> Cc: stable(a)vger.kernel.org --- fs/ext4/inline.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index 2c9b762925c72f2ff5a402b02500370bc1eb0eb1..e5e6bf0d338b965a885fb99581f9ed5e51c5257c 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -397,7 +397,7 @@ static int ext4_update_inline_data(handle_t *handle, struct inode *inode, } static int ext4_prepare_inline_data(handle_t *handle, struct inode *inode, - unsigned int len) + loff_t len) { int ret, size, no_expand; struct ext4_inode_info *ei = EXT4_I(inode); --- base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444 change-id: 20250415-ext4-prepare-inline-overflow-8db0e747cb16 Best regards, -- Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>

1 month, 3 weeks

4
3
0 0

v2 [PATCH] ocfs2: fix panic in failed foilio allocation

by Mark Tinguely

In the page to order 0 folio conversion series, the commit 7e119cff9d0a, "ocfs2: convert w_pages to w_folios" and commit 9a5e08652dc4b, "ocfs2: use an array of folios instead of an array of pages", saves -ENOMEM in the folio array upon allocation failure and calls the folio array free code. The folio array free code expects either valid folio pointers or NULL. Finding the -ENOMEM will result in a panic. Fix by NULLing the error folio entry. Signed-off-by: Mark Tinguely <mark.tinguely(a)oracle.com> Cc: stable(a)vger.kernel.org Cc: Changwei Ge <gechangwei(a)live.cn> Cc: Joel Becker <jlbec(a)evilplan.org> Cc: Junxiao Bi <junxiao.bi(a)oracle.com> Cc: Mark Fasheh <mark(a)fasheh.com> Cc: Matthew Wilcox <willy(a)infradead.org> --- v2: sorry, ocfs2_grab_folios() needs the same change. the other callers do not need the change. --- fs/ocfs2/alloc.c | 1 + fs/ocfs2/aops.c | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c index b8ac85b548c7..821cb7874685 100644 --- a/fs/ocfs2/alloc.c +++ b/fs/ocfs2/alloc.c @@ -6918,6 +6918,7 @@ static int ocfs2_grab_folios(struct inode *inode, loff_t start, loff_t end, if (IS_ERR(folios[numfolios])) { ret = PTR_ERR(folios[numfolios]); mlog_errno(ret); + folios[numfolios] = NULL; goto out; } diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 40b6bce12951..89aadc6cdd87 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -1071,6 +1071,7 @@ static int ocfs2_grab_folios_for_write(struct address_space *mapping, if (IS_ERR(wc->w_folios[i])) { ret = PTR_ERR(wc->w_folios[i]); mlog_errno(ret); + wc->w_folios[i] = NULL; goto out; } } -- 2.39.5 (Apple Git-154)

1 month, 3 weeks

3
3
0 0

[PATCH 0/5] net: ch9200: fix various bugs and improve qinheng ch9200 driver

by Qasim Ijaz

This patch series aims to fix various issues throughout the QinHeng CH9200 driver. This driver fails to handle various failures, which in one case has lead to a uninit access bug found via syzbot. Upon reviewing the driver I fixed a few more issues which I have included in this patch series. Parts of this series are the product of discussions and suggestions I had from others like Andrew Lunn, Simon Horman and Jakub Kicinski you can view those discussions below: Link: <https://lore.kernel.org/all/20250319112156.48312-1-qasdev00@gmail.com> Link: <https://lore.kernel.org/all/20250218002443.11731-1-qasdev00@gmail.com/> Link: <https://lore.kernel.org/all/20250311161157.49065-1-qasdev00@gmail.com/> Qasim Ijaz (5): fix uninitialised access bug during mii_nway_restart remove extraneous return that prevents error propagation fail fast on control_read() failures during get_mac_address() add missing error handling in ch9200_bind() avoid triggering NWay restart on non-zero PHY ID drivers/net/usb/ch9200.c | 61 ++++++++++++++++++++++++++-------------- 1 file changed, 40 insertions(+), 21 deletions(-) -- 2.39.5

1 month, 3 weeks

4
20
0 0

[PATCH] wifi: ath11k: fix rx completion meta data corruption

by Johan Hovold

Add the missing memory barrier to make sure that the REO dest ring descriptor is read after the head pointer to avoid using stale data on weakly ordered architectures like aarch64. This may fix the ring-buffer corruption worked around by commit f9fff67d2d7c ("wifi: ath11k: Fix SKB corruption in REO destination ring") by silently discarding data, and may possibly also address user reported errors like: ath11k_pci 0006:01:00.0: msdu_done bit in attention is not set Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41 Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") Cc: stable(a)vger.kernel.org # 5.6 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218005 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> --- As I reported here: https://lore.kernel.org/lkml/Z9G5zEOcTdGKm7Ei@hovoldconsulting.com/ the ath11k and ath12k appear to be missing a number of memory barriers that are required on weakly ordered architectures like aarch64 to avoid memory corruption issues. Here's a fix for one more such case which people already seem to be hitting. Note that I've seen one "msdu_done" bit not set warning also with this patch so whether it helps with that at all remains to be seen. I'm CCing Jens and Steev that see these warnings frequently and that may be able to help out with testing. Johan drivers/net/wireless/ath/ath11k/dp_rx.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c index 029ecf51c9ef..0a57b337e4c6 100644 --- a/drivers/net/wireless/ath/ath11k/dp_rx.c +++ b/drivers/net/wireless/ath/ath11k/dp_rx.c @@ -2646,7 +2646,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, struct ath11k *ar; struct hal_reo_dest_ring *desc; enum hal_reo_dest_ring_push_reason push_reason; - u32 cookie; + u32 cookie, info0, rx_msdu_info0, rx_mpdu_info0; int i; for (i = 0; i < MAX_RADIOS; i++) @@ -2659,11 +2659,14 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, try_again: ath11k_hal_srng_access_begin(ab, srng); + /* Make sure descriptor is read after the head pointer. */ + dma_rmb(); + while (likely(desc = (struct hal_reo_dest_ring *)ath11k_hal_srng_dst_get_next_entry(ab, srng))) { cookie = FIELD_GET(BUFFER_ADDR_INFO1_SW_COOKIE, - desc->buf_addr_info.info1); + READ_ONCE(desc->buf_addr_info.info1)); buf_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_BUF_ID, cookie); mac_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_PDEV_ID, cookie); @@ -2692,8 +2695,9 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, num_buffs_reaped[mac_id]++; + info0 = READ_ONCE(desc->info0); push_reason = FIELD_GET(HAL_REO_DEST_RING_INFO0_PUSH_REASON, - desc->info0); + info0); if (unlikely(push_reason != HAL_REO_DEST_RING_PUSH_REASON_ROUTING_INSTRUCTION)) { dev_kfree_skb_any(msdu); @@ -2701,18 +2705,21 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, continue; } - rxcb->is_first_msdu = !!(desc->rx_msdu_info.info0 & + rx_msdu_info0 = READ_ONCE(desc->rx_msdu_info.info0); + rx_mpdu_info0 = READ_ONCE(desc->rx_mpdu_info.info0); + + rxcb->is_first_msdu = !!(rx_msdu_info0 & RX_MSDU_DESC_INFO0_FIRST_MSDU_IN_MPDU); - rxcb->is_last_msdu = !!(desc->rx_msdu_info.info0 & + rxcb->is_last_msdu = !!(rx_msdu_info0 & RX_MSDU_DESC_INFO0_LAST_MSDU_IN_MPDU); - rxcb->is_continuation = !!(desc->rx_msdu_info.info0 & + rxcb->is_continuation = !!(rx_msdu_info0 & RX_MSDU_DESC_INFO0_MSDU_CONTINUATION); rxcb->peer_id = FIELD_GET(RX_MPDU_DESC_META_DATA_PEER_ID, - desc->rx_mpdu_info.meta_data); + READ_ONCE(desc->rx_mpdu_info.meta_data)); rxcb->seq_no = FIELD_GET(RX_MPDU_DESC_INFO0_SEQ_NUM, - desc->rx_mpdu_info.info0); + rx_mpdu_info0); rxcb->tid = FIELD_GET(HAL_REO_DEST_RING_INFO0_RX_QUEUE_NUM, - desc->info0); + info0); rxcb->mac_id = mac_id; __skb_queue_tail(&msdu_list[mac_id], msdu); -- 2.48.1

1 month, 3 weeks

5
7
0 0

[PATCH] wifi: ath11k: fix ring-buffer corruption

by Johan Hovold

Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes breaks and the log fills up with errors like: ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492 ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484 which based on a quick look at the driver seemed to indicate some kind of ring-buffer corruption. Miaoqing Pan tracked it down to the host seeing the updated destination ring head pointer before the updated descriptor, and the error handling for that in turn leaves the ring buffer in an inconsistent state. Add the missing memory barrier to make sure that the descriptor is read after the head pointer to address the root cause of the corruption while fixing up the error handling in case there are ever any (ordering) bugs on the device side. Note that the READ_ONCE() are only needed to avoid compiler mischief in case the ring-buffer helpers are ever inlined. Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41 Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218623 Link: https://lore.kernel.org/20250310010217.3845141-3-quic_miaoqing@quicinc.com Cc: Miaoqing Pan <quic_miaoqing(a)quicinc.com> Cc: stable(a)vger.kernel.org # 5.6 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> --- drivers/net/wireless/ath/ath11k/ce.c | 11 +++++------ drivers/net/wireless/ath/ath11k/hal.c | 4 ++-- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/ath/ath11k/ce.c b/drivers/net/wireless/ath/ath11k/ce.c index e66e86bdec20..9d8efec46508 100644 --- a/drivers/net/wireless/ath/ath11k/ce.c +++ b/drivers/net/wireless/ath/ath11k/ce.c @@ -393,11 +393,10 @@ static int ath11k_ce_completed_recv_next(struct ath11k_ce_pipe *pipe, goto err; } + /* Make sure descriptor is read after the head pointer. */ + dma_rmb(); + *nbytes = ath11k_hal_ce_dst_status_get_length(desc); - if (*nbytes == 0) { - ret = -EIO; - goto err; - } *skb = pipe->dest_ring->skb[sw_index]; pipe->dest_ring->skb[sw_index] = NULL; @@ -430,8 +429,8 @@ static void ath11k_ce_recv_process_cb(struct ath11k_ce_pipe *pipe) dma_unmap_single(ab->dev, ATH11K_SKB_RXCB(skb)->paddr, max_nbytes, DMA_FROM_DEVICE); - if (unlikely(max_nbytes < nbytes)) { - ath11k_warn(ab, "rxed more than expected (nbytes %d, max %d)", + if (unlikely(max_nbytes < nbytes || nbytes == 0)) { + ath11k_warn(ab, "unexpected rx length (nbytes %d, max %d)", nbytes, max_nbytes); dev_kfree_skb_any(skb); continue; diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c index 61f4b6dd5380..8cb1505a5a0c 100644 --- a/drivers/net/wireless/ath/ath11k/hal.c +++ b/drivers/net/wireless/ath/ath11k/hal.c @@ -599,7 +599,7 @@ u32 ath11k_hal_ce_dst_status_get_length(void *buf) struct hal_ce_srng_dst_status_desc *desc = buf; u32 len; - len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, desc->flags); + len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, READ_ONCE(desc->flags)); desc->flags &= ~HAL_CE_DST_STATUS_DESC_FLAGS_LEN; return len; @@ -829,7 +829,7 @@ void ath11k_hal_srng_access_begin(struct ath11k_base *ab, struct hal_srng *srng) srng->u.src_ring.cached_tp = *(volatile u32 *)srng->u.src_ring.tp_addr; } else { - srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr; + srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr); /* Try to prefetch the next descriptor in the ring */ if (srng->flags & HAL_SRNG_FLAGS_CACHED) -- 2.48.1

1 month, 3 weeks

7
6
0 0

[PATCH stable 5.15/6.1/6.6] af_unix: Clear oob_skb in scan_inflight().

by Kuniyuki Iwashima

Embryo socket is not queued in gc_candidates, so we can't drop a reference held by its oob_skb. Let's say we create listener and embryo sockets, send the listener's fd to the embryo as OOB data, and close() them without recv()ing the OOB data. There is a self-reference cycle like listener -> embryo.oob_skb -> listener , so this must be cleaned up by GC. Otherwise, the listener's refcnt is not released and sockets are leaked: # unshare -n # cat /proc/net/protocols | grep UNIX-STREAM UNIX-STREAM 1024 0 -1 NI 0 yes kernel ... # python3 >>> from array import array >>> from socket import * >>> >>> s = socket(AF_UNIX, SOCK_STREAM) >>> s.bind('\0test\0') >>> s.listen() >>> >>> c = socket(AF_UNIX, SOCK_STREAM) >>> c.connect(s.getsockname()) >>> c.sendmsg([b'x'], [(SOL_SOCKET, SCM_RIGHTS, array('i', [s.fileno()]))], MSG_OOB) 1 >>> quit() # cat /proc/net/protocols | grep UNIX-STREAM UNIX-STREAM 1024 3 -1 NI 0 yes kernel ... ^^^ 3 sockets still in use after FDs are close()d Let's drop the embryo socket's oob_skb ref in scan_inflight(). This also fixes a racy access to oob_skb that commit 9841991a446c ("af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.") fixed for the new Tarjan's algo-based GC. Fixes: 314001f0bf92 ("af_unix: Add OOB support") Reported-by: Lei Lu <llfamsec(a)gmail.com> Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com> --- This has no upstream commit because I replaced the entire GC in 6.10 and the new GC does not have this bug, and this fix is only applicable to the old GC (<= 6.9), thus for 5.15/6.1/6.6. --- --- net/unix/garbage.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 2a758531e102..b3fbdf129944 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -102,13 +102,14 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), /* Process the descriptors of this socket */ int nfd = UNIXCB(skb).fp->count; struct file **fp = UNIXCB(skb).fp->fp; + struct unix_sock *u; while (nfd--) { /* Get the socket the fd matches if it indeed does so */ struct sock *sk = unix_get_socket(*fp++); if (sk) { - struct unix_sock *u = unix_sk(sk); + u = unix_sk(sk); /* Ignore non-candidates, they could * have been added to the queues after @@ -122,6 +123,13 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), } } if (hit && hitlist != NULL) { +#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + u = unix_sk(x); + if (u->oob_skb) { + WARN_ON_ONCE(skb_unref(u->oob_skb)); + u->oob_skb = NULL; + } +#endif __skb_unlink(skb, &x->sk_receive_queue); __skb_queue_tail(hitlist, skb); } @@ -299,17 +307,9 @@ void unix_gc(void) * which are creating the cycle(s). */ skb_queue_head_init(&hitlist); - list_for_each_entry(u, &gc_candidates, link) { + list_for_each_entry(u, &gc_candidates, link) scan_children(&u->sk, inc_inflight, &hitlist); -#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb = NULL; - } -#endif - } - /* not_cycle_list contains those sockets which do not make up a * cycle. Restore these to the inflight list. */ -- 2.39.5 (Apple Git-154)

1 month, 3 weeks

5
8
0 0

[PATCH RESEND] drm/xe/display: Add check for alloc_ordered_workqueue()

by Haoxiang Li

Add check for the return value of alloc_ordered_workqueue() in xe_display_create() to catch potential exception. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Cc: stable(a)vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com> --- drivers/gpu/drm/xe/display/xe_display.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c index 0b0aca7a25af..18062cfb265f 100644 --- a/drivers/gpu/drm/xe/display/xe_display.c +++ b/drivers/gpu/drm/xe/display/xe_display.c @@ -104,6 +104,8 @@ int xe_display_create(struct xe_device *xe) spin_lock_init(&xe->display.fb_tracking.lock); xe->display.hotplug.dp_wq = alloc_ordered_workqueue("xe-dp", 0); + if (!xe->display.hotplug.dp_wq) + return -ENOMEM; return drmm_add_action_or_reset(&xe->drm, display_destroy, NULL); } -- 2.25.1

1 month, 3 weeks

2
1
0 0

[PATCH v2 RESEND] drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue()

by Haoxiang Li

Add check for the return value of alloc_ordered_workqueue() and alloc_workqueue(). Furthermore, if some allocations fail, cleanup works are added to avoid potential memory leak problem. Fixes: 40053823baad ("drm/i915/display: move modeset probe/remove functions to intel_display_driver.c") Cc: stable(a)vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com> --- Changes in v2: - Split the compound conditional statement into separate conditional statements to facilitate cleanup works. - Add cleanup works to destory work queues if allocations fail, and modify the later goto destination to do the full excercise. - modify the patch description. Thanks, Jani! --- .../drm/i915/display/intel_display_driver.c | 30 +++++++++++++++---- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display_driver.c b/drivers/gpu/drm/i915/display/intel_display_driver.c index 31740a677dd8..ac94561715dc 100644 --- a/drivers/gpu/drm/i915/display/intel_display_driver.c +++ b/drivers/gpu/drm/i915/display/intel_display_driver.c @@ -241,31 +241,45 @@ int intel_display_driver_probe_noirq(struct intel_display *display) intel_dmc_init(display); display->wq.modeset = alloc_ordered_workqueue("i915_modeset", 0); + if (!display->wq.modeset) { + ret = -ENOMEM; + goto cleanup_vga_client_pw_domain_dmc; + } + display->wq.flip = alloc_workqueue("i915_flip", WQ_HIGHPRI | WQ_UNBOUND, WQ_UNBOUND_MAX_ACTIVE); + if (!display->wq.flip) { + ret = -ENOMEM; + goto cleanup_wq_modeset; + } + display->wq.cleanup = alloc_workqueue("i915_cleanup", WQ_HIGHPRI, 0); + if (!display->wq.cleanup) { + ret = -ENOMEM; + goto cleanup_wq_flip; + } intel_mode_config_init(display); ret = intel_cdclk_init(display); if (ret) - goto cleanup_vga_client_pw_domain_dmc; + goto cleanup_wq_cleanup; ret = intel_color_init(display); if (ret) - goto cleanup_vga_client_pw_domain_dmc; + goto cleanup_wq_cleanup; ret = intel_dbuf_init(i915); if (ret) - goto cleanup_vga_client_pw_domain_dmc; + goto cleanup_wq_cleanup; ret = intel_bw_init(i915); if (ret) - goto cleanup_vga_client_pw_domain_dmc; + goto cleanup_wq_cleanup; ret = intel_pmdemand_init(display); if (ret) - goto cleanup_vga_client_pw_domain_dmc; + goto cleanup_wq_cleanup; intel_init_quirks(display); @@ -273,6 +287,12 @@ int intel_display_driver_probe_noirq(struct intel_display *display) return 0; +cleanup_wq_cleanup: + destroy_workqueue(display->wq.cleanup); +cleanup_wq_flip: + destroy_workqueue(display->wq.flip); +cleanup_wq_modeset: + destroy_workqueue(display->wq.modeset); cleanup_vga_client_pw_domain_dmc: intel_dmc_fini(display); intel_power_domains_driver_remove(display); -- 2.25.1

1 month, 3 weeks

2
1
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror April 2025