We are seeing deadlock in cifs code while updating volume in
cifs_reconnect. There are few fixes available in stable trees
already. This series backports some patches back to 5.4 stable.
__schedule+0x268/0x6e0
schedule+0x2f/0xa0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.isra.7+0x20b/0x470
? dfs_cache_update_vol+0x45/0x2a0 [cifs]
dfs_cache_update_vol+0x45/0x2a0 [cifs]
cifs_reconnect+0x6f2/0xef0 [cifs]
cifs_handle_standard+0x18d/0x1b0 [cifs]
cifs_demultiplex_thread+0xa5c/0xc90 [cifs]
? cifs_handle_standard+0x1b0/0x1b0 [cifs]
Paulo Alcantara (SUSE) (5):
cifs: Clean up DFS referral cache
cifs: Get rid of kstrdup_const()'d paths
cifs: Introduce helpers for finding TCP connection
cifs: Merge is_path_valid() into get_normalized_path()
cifs: Fix potential deadlock when updating vol in cifs_reconnect()
fs/cifs/dfs_cache.c | 701 +++++++++++++++++++++++---------------------
1 file changed, 372 insertions(+), 329 deletions(-)
--
2.40.1
Hi Pablo,
While checking netfilter backports to the stable series, I noticed
that 6e1acfa387b9 ("netfilter: nf_tables: validate registers coming
from userspace.") was backported in various series for stable, and
included in 4.14.316, 4.19.284, 5.4.244, 5.15.32, 5.16.18, 5.17.1,
where the original fix was in 5.18-rc1.
While the commit has
Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
the 6e1acfa387b9 change got not backported to the 5.10.y series.
The backports to the other series are
https://lore.kernel.org/stable/20230516151606.4892-1-pablo@netfilter.org/https://lore.kernel.org/stable/20230516150613.4566-1-pablo@netfilter.org/https://lore.kernel.org/stable/20230516144435.4010-1-pablo@netfilter.org/
Pablo, was this an oversight and can the change as well be applied to
5.10.y?
From looking at the 5.4.y series, from the stable dependencies,
08a01c11a5bb ("netfilter: nftables: statify nft_parse_register()") is
missing in 5.10.y, then 6e1acfa387b9 ("netfilter: nf_tables: validate
registers coming from userspace.") can be applied (almost, the comment
needs to be dropped, as done in the backports).
I'm right now not understanding what I'm missing that it was for 5.4.y
but not 5.10.y after the report of the failed apply by Greg.
At least the two attached bring 5.10.y inline with 5.4.y up to 4) from
https://lore.kernel.org/stable/20230516144435.4010-1-pablo@netfilter.org/
but I'm unsure if you want/need as well the remaining 5), 6), 7), 8)
and 9).
Regards,
Salvatore
Linux 6.2 and newer are (mostly) unbootable on my old HP 6730b laptop, the 6.1.30 works still fine.
The weirdest thing is that newer kernels (like 6.3.4 and 6.4-rc3) may boot ok on the first try, but when rebooting, the very same version doesn't boot.
Some times, when trying to boot, I get this message repeated forever:
ACPI Error: No handler or method for GPE [XX], disabling event (20221020/evgpe-839)
On newer kernels, the date is 20230331 instead of 20221020. There is also some other error, but I can't read it as it gets overwritten by the other ACPI error, see image linked at the end.
And some times, the screen will just stay completely blank.
I tried booting with acpi=off, but it does not help.
I bisected and this is the first bad commit 7e68dd7d07a2
"Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next"
As the later kernels had the seemingly random booting behaviour (mentioned above), I retested the last good one 7c4a6309e27f by booting it several times and it boots every time.
I tried getting some boot logs, but the boot process does not go far enough to make any logs.
Kernel .config file: https://0x0.st/Hqt1.txt
Environment (outputs of a working Linux 6.1 build):
Software (output of the ver_linux script): https://0x0.st/Hqte.txt
Processor information (from /proc/cpuinfo): https://0x0.st/Hqt2.txt
Module information (from /proc/modules): https://0x0.st/HqtL.txt
/proc/ioports: https://0x0.st/Hqt9.txt
/proc/iomem: https://0x0.st/Hqtf.txt
PCI information ('lspci -vvv' as root): https://0x0.st/HqtO.txt
SCSI information (from /proc/scsi/scsi)
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: KINGSTON SVP200S Rev: C4
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: hp Model: CDDVDW TS-L633M Rev: 0301
Type: CD-ROM ANSI SCSI revision: 05
Distribution: Arch Linux
Boot manager: systemd-boot (UEFI)
git bisect log: https://0x0.st/Hqgx.txt
ACPI Error (sorry for the dusty screen): https://0x0.st/HqEk.jpeg
#regzbot ^introduced 7e68dd7d07a2
Best regards
Sami Korkalainen
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 54abe19e00cfcc5a72773d15cd00ed19ab763439
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062336-squall-impotence-3b78@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54abe19e00cfcc5a72773d15cd00ed19ab763439 Mon Sep 17 00:00:00 2001
From: Rafael Aquini <aquini(a)redhat.com>
Date: Tue, 6 Jun 2023 19:36:13 -0400
Subject: [PATCH] writeback: fix dereferencing NULL mapping->host on
writeback_page_template
When commit 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for
wait_on_page_writeback()") repurposed the writeback_dirty_page trace event
as a template to create its new wait_on_page_writeback trace event, it
ended up opening a window to NULL pointer dereference crashes due to the
(infrequent) occurrence of a race where an access to a page in the
swap-cache happens concurrently with the moment this page is being written
to disk and the tracepoint is enabled:
BUG: kernel NULL pointer dereference, address: 0000000000000040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023
RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0
Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d
RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044
RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006
R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000
R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200
FS: 00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x76/0x170
? kernelmode_fixup_or_oops+0x84/0x110
? exc_page_fault+0x65/0x150
? asm_exc_page_fault+0x22/0x30
? trace_event_raw_event_writeback_folio_template+0x76/0xf0
folio_wait_writeback+0x6b/0x80
shmem_swapin_folio+0x24a/0x500
? filemap_get_entry+0xe3/0x140
shmem_get_folio_gfp+0x36e/0x7c0
? find_busiest_group+0x43/0x1a0
shmem_fault+0x76/0x2a0
? __update_load_avg_cfs_rq+0x281/0x2f0
__do_fault+0x33/0x130
do_read_fault+0x118/0x160
do_pte_missing+0x1ed/0x2a0
__handle_mm_fault+0x566/0x630
handle_mm_fault+0x91/0x210
do_user_addr_fault+0x22c/0x740
exc_page_fault+0x65/0x150
asm_exc_page_fault+0x22/0x30
This problem arises from the fact that the repurposed writeback_dirty_page
trace event code was written assuming that every pointer to mapping
(struct address_space) would come from a file-mapped page-cache object,
thus mapping->host would always be populated, and that was a valid case
before commit 19343b5bdd16. The swap-cache address space
(swapper_spaces), however, doesn't populate its ->host (struct inode)
pointer, thus leading to the crashes in the corner-case aforementioned.
commit 19343b5bdd16 ended up breaking the assignment of __entry->name and
__entry->ino for the wait_on_page_writeback tracepoint -- both dependent
on mapping->host carrying a pointer to a valid inode. The assignment of
__entry->name was fixed by commit 68f23b89067f ("memcg: fix a crash in
wb_workfn when a device disappears"), and this commit fixes the remaining
case, for __entry->ino.
Link: https://lkml.kernel.org/r/20230606233613.1290819-1-aquini@redhat.com
Fixes: 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()")
Signed-off-by: Rafael Aquini <aquini(a)redhat.com>
Reviewed-by: Yafang Shao <laoar.shao(a)gmail.com>
Cc: Aristeu Rozanski <aris(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 86b2a82da546..54e353c9f919 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -68,7 +68,7 @@ DECLARE_EVENT_CLASS(writeback_folio_template,
strscpy_pad(__entry->name,
bdi_dev_name(mapping ? inode_to_bdi(mapping->host) :
NULL), 32);
- __entry->ino = mapping ? mapping->host->i_ino : 0;
+ __entry->ino = (mapping && mapping->host) ? mapping->host->i_ino : 0;
__entry->index = folio->index;
),
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 782e53d0c14420858dbf0f8f797973c150d3b6d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062316-swooned-scurvy-040f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 782e53d0c14420858dbf0f8f797973c150d3b6d7 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Mon, 12 Jun 2023 11:14:56 +0900
Subject: [PATCH] nilfs2: prevent general protection fault in
nilfs_clear_dirty_page()
In a syzbot stress test that deliberately causes file system errors on
nilfs2 with a corrupted disk image, it has been reported that
nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a
general protection fault.
In nilfs_clear_dirty_pages(), when looking up dirty pages from the page
cache and calling nilfs_clear_dirty_page() for each dirty page/folio
retrieved, the back reference from the argument page to "mapping" may have
been changed to NULL (and possibly others). It is necessary to check this
after locking the page/folio.
So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio
after locking it in nilfs_clear_dirty_pages() if the back reference
"mapping" from the page/folio is different from the "mapping" that held
the page/folio just before.
Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+53369d11851d8f26735c(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 5cf30827f244..b4e54d079b7d 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -370,7 +370,15 @@ void nilfs_clear_dirty_pages(struct address_space *mapping, bool silent)
struct folio *folio = fbatch.folios[i];
folio_lock(folio);
- nilfs_clear_dirty_page(&folio->page, silent);
+
+ /*
+ * This folio may have been removed from the address
+ * space by truncation or invalidation when the lock
+ * was acquired. Skip processing in that case.
+ */
+ if (likely(folio->mapping == mapping))
+ nilfs_clear_dirty_page(&folio->page, silent);
+
folio_unlock(folio);
}
folio_batch_release(&fbatch);