The quilt patch titled
Subject: x86/kgdb: fix hang on failed breakpoint removal
has been removed from the -mm tree. Its filename was
x86-kgdb-fix-hang-on-failed-breakpoint-removal.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: Florian Rommel <mail(a)florommel.de>
Subject: x86/kgdb: fix hang on failed breakpoint removal
Date: Mon, 12 Aug 2024 01:22:08 +0200
On x86, occasionally, the removal of a breakpoint (i.e., removal of the
int3 instruction) fails because the text_mutex is taken by another CPU
(mainly due to the static_key mechanism, I think). The function
kgdb_skipexception catches exceptions from these spurious int3
instructions, bails out of KGDB, and continues execution from the previous
PC address.
However, this leads to an endless loop between the int3 instruction and
kgdb_skipexception, since the still-present int3 instruction triggers
again. This effectively hangs the system.
With this patch, we try to remove the offending spurious int3 instruction
in kgdb_skipexception before continuing execution. This may take a few
attempts until the concurrent holders of the text_mutex have released it,
but it eventually succeeds and the kernel can continue.
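To illustrate that dynamic, the sketch below is a minimal user-space model,
not the kernel code: a patching lock stands in for text_mutex, a holder
thread stands in for another CPU doing a static_key update, and each failed
trylock corresponds to one more spurious int3 trap before removal finally
succeeds. All names and timings here are invented for the model.

/* remove_retry_model.c -- illustrative user-space model, not kernel code. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t text_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for text_mutex */
static bool breakpoint_present = true;                        /* stand-in for the int3 byte */

/* Stand-in for the removal path: give up if the patching lock is contended. */
static int remove_breakpoint(void)
{
	if (pthread_mutex_trylock(&text_lock) != 0)
		return -1;                 /* would be -EBUSY in the kernel */
	breakpoint_present = false;        /* "patch the original byte back in" */
	pthread_mutex_unlock(&text_lock);
	return 0;
}

static void *lock_holder(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&text_lock);    /* another CPU holding the lock */
	usleep(50 * 1000);
	pthread_mutex_unlock(&text_lock);
	return NULL;
}

int main(void)
{
	pthread_t t;
	int attempts = 0;

	pthread_create(&t, NULL, lock_holder, NULL);
	usleep(1000);                      /* let the holder grab the lock first */

	/* Each failed attempt corresponds to one more spurious int3 trap. */
	while (breakpoint_present) {
		attempts++;
		if (remove_breakpoint() == 0)
			break;
		usleep(10 * 1000);
	}
	printf("breakpoint removed after %d attempt(s)\n", attempts);
	pthread_join(t, NULL);
	return 0;
}

Compile with "cc -pthread"; the printed attempt count varies with how long
the holder keeps the lock, which mirrors the "may take a few attempts"
behaviour described above.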
Link: https://lkml.kernel.org/r/20240811232208.234261-3-mail@florommel.de
Signed-off-by: Florian Rommel <mail(a)florommel.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Daniel Thompson <daniel.thompson(a)linaro.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Douglas Anderson <dianders(a)chromium.org>
Cc: Geert Uytterhoeven <geert+renesas(a)glider.be>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jason Wessel <jason.wessel(a)windriver.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/kgdb.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
--- a/arch/x86/kernel/kgdb.c~x86-kgdb-fix-hang-on-failed-breakpoint-removal
+++ a/arch/x86/kernel/kgdb.c
@@ -723,7 +723,31 @@ void kgdb_arch_exit(void)
int kgdb_skipexception(int exception, struct pt_regs *regs)
{
if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) {
+ struct kgdb_bkpt *bpt;
+ int i, error;
+
regs->ip -= 1;
+
+ /*
+ * Try to remove the spurious int3 instruction.
+ * These int3s can result from failed breakpoint removals
+ * in kgdb_arch_remove_breakpoint.
+ */
+ for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
+ if (kgdb_break[i].bpt_addr == regs->ip &&
+ kgdb_break[i].state == BP_REMOVED &&
+ (kgdb_break[i].type == BP_BREAKPOINT ||
+ kgdb_break[i].type == BP_POKE_BREAKPOINT)) {
+ bpt = &kgdb_break[i];
+ break;
+ }
+ }
+ if (!bpt)
+ return 1;
+ error = kgdb_arch_remove_breakpoint(bpt);
+ if (error)
+ pr_err("skipexception: breakpoint remove failed: %lx\n",
+ bpt->bpt_addr);
return 1;
}
return 0;
_
Patches currently in -mm which might be from mail(a)florommel.de are
The quilt patch titled
Subject: x86/kgdb: convert early breakpoints to poke breakpoints
has been removed from the -mm tree. Its filename was
x86-kgdb-convert-early-breakpoints-to-poke-breakpoints.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: Florian Rommel <mail(a)florommel.de>
Subject: x86/kgdb: convert early breakpoints to poke breakpoints
Date: Mon, 12 Aug 2024 01:22:07 +0200
Patch series "kgdb: x86: fix breakpoint removal problems".
This series fixes two problems with KGDB on x86 concerning the removal
of breakpoints, causing the kernel to hang. Note that breakpoint
removal is not only performed when explicitly deleting a breakpoint,
but also happens before continuing execution or single stepping.
This patch (of 2):
On x86, after booting, the kernel text is read-only. Then, KGDB has to
use the text_poke mechanism to install software breakpoints. KGDB uses a
special (x86-specific) breakpoint type for these kinds of breakpoints
(BP_POKE_BREAKPOINT). When removing a breakpoint, KGDB always adheres to
the breakpoint's original installation method, which is determined by its
type.
Before this fix, early (non-"poke") breakpoints could not be removed after
the kernel text had been set read-only, since the original code-patching
mechanism was no longer allowed to remove the breakpoints. Eventually,
this even caused the kernel to hang (a loop between the int3 instruction
and the function kgdb_skipexception).
With this patch, we convert early breakpoints to "poke" breakpoints after
the kernel text has been made read-only. This makes them removable later.
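As a rough illustration of why the conversion matters, here is a user-space
model (not the kernel implementation): once the "text" has been made
read-only, the early installation method (a plain store) can no longer undo
the breakpoint, while a poke-style removal that temporarily lifts the write
protection still can. The mmap/mprotect calls merely stand in for the kernel
text mapping and mark_rodata_ro().

/* ro_text_model.c -- illustrative user-space model, not kernel code. */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	unsigned char *text = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (text == MAP_FAILED)
		return 1;

	text[0] = 0x90;             /* original instruction byte (NOP) */
	text[0] = 0xCC;             /* "early" breakpoint: plain write, text still RW */

	mprotect(text, pagesz, PROT_READ);   /* mark_rodata_ro() analogue */

	/* A direct-write removal would now fault (text[0] = 0x90 -> SIGSEGV).
	 * A "poke"-style removal lifts the protection around the write instead.
	 */
	mprotect(text, pagesz, PROT_READ | PROT_WRITE);
	text[0] = 0x90;
	mprotect(text, pagesz, PROT_READ);

	printf("byte restored: %#x\n", text[0]);
	munmap(text, pagesz);
	return 0;
}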
Link: https://lkml.kernel.org/r/20240811232208.234261-1-mail@florommel.de
Link: https://lkml.kernel.org/r/20240811232208.234261-2-mail@florommel.de
Signed-off-by: Florian Rommel <mail(a)florommel.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Daniel Thompson <daniel.thompson(a)linaro.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Douglas Anderson <dianders(a)chromium.org>
Cc: Geert Uytterhoeven <geert+renesas(a)glider.be>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jason Wessel <jason.wessel(a)windriver.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/kgdb.c | 14 ++++++++++++++
include/linux/kgdb.h | 3 +++
init/main.c | 1 +
kernel/debug/debug_core.c | 7 ++++++-
4 files changed, 24 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/kgdb.c~x86-kgdb-convert-early-breakpoints-to-poke-breakpoints
+++ a/arch/x86/kernel/kgdb.c
@@ -623,6 +623,20 @@ out:
return retval;
}
+void kgdb_after_mark_readonly(void)
+{
+ int i;
+
+ /* Convert all breakpoints in rodata to BP_POKE_BREAKPOINT. */
+ for (i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
+ if (kgdb_break[i].state != BP_UNDEFINED &&
+ kgdb_break[i].type == BP_BREAKPOINT &&
+ is_kernel_text(kgdb_break[i].bpt_addr)) {
+ kgdb_break[i].type = BP_POKE_BREAKPOINT;
+ }
+ }
+}
+
static void kgdb_hw_overflow_handler(struct perf_event *event,
struct perf_sample_data *data, struct pt_regs *regs)
{
--- a/include/linux/kgdb.h~x86-kgdb-convert-early-breakpoints-to-poke-breakpoints
+++ a/include/linux/kgdb.h
@@ -98,6 +98,8 @@ extern int dbg_set_reg(int regno, void *
# define KGDB_MAX_BREAKPOINTS 1000
#endif
+extern struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS];
+
#define KGDB_HW_BREAKPOINT 1
/*
@@ -360,6 +362,7 @@ extern bool dbg_is_early;
extern void __init dbg_late_init(void);
extern void kgdb_panic(const char *msg);
extern void kgdb_free_init_mem(void);
+extern void kgdb_after_mark_readonly(void);
#else /* ! CONFIG_KGDB */
#define in_dbg_master() (0)
#define dbg_late_init()
--- a/init/main.c~x86-kgdb-convert-early-breakpoints-to-poke-breakpoints
+++ a/init/main.c
@@ -1441,6 +1441,7 @@ static void mark_readonly(void)
mark_rodata_ro();
debug_checkwx();
rodata_test();
+ kgdb_after_mark_readonly();
} else if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)) {
pr_info("Kernel memory protection disabled.\n");
} else if (IS_ENABLED(CONFIG_ARCH_HAS_STRICT_KERNEL_RWX)) {
--- a/kernel/debug/debug_core.c~x86-kgdb-convert-early-breakpoints-to-poke-breakpoints
+++ a/kernel/debug/debug_core.c
@@ -98,7 +98,7 @@ module_param(kgdbreboot, int, 0644);
* Holds information about breakpoints in a kernel. These breakpoints are
* added and removed by gdb.
*/
-static struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS] = {
+struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS] = {
[0 ... KGDB_MAX_BREAKPOINTS-1] = { .state = BP_UNDEFINED }
};
@@ -452,6 +452,11 @@ void kgdb_free_init_mem(void)
}
}
+void __weak kgdb_after_mark_readonly(void)
+{
+ /* Weak implementation, may be overridden by arch code */
+}
+
#ifdef CONFIG_KGDB_KDB
void kdb_dump_stack_on_cpu(int cpu)
{
_
Patches currently in -mm which might be from mail(a)florommel.de are
x86-kgdb-fix-hang-on-failed-breakpoint-removal.patch
The patch titled
Subject: nilfs2: fix missing cleanup on rollforward recovery error
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-missing-cleanup-on-rollforward-recovery-error.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix missing cleanup on rollforward recovery error
Date: Sat, 10 Aug 2024 15:52:42 +0900
In an error injection test of a routine for mount-time recovery, KASAN
found a use-after-free bug.
It turned out that if data recovery was performed using partial logs
created by dsync writes, but an error occurred before starting the log
writer to create a recovered checkpoint, the inodes whose data had been
recovered were left in the ns_dirty_files list of the nilfs object and
were not freed.
Fix this issue by cleaning up inodes that have read the recovery data if
the recovery routine fails midway before the log writer starts.
Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com
Fixes: 0f3e1c7f23f8 ("nilfs2: recovery functions")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/recovery.c | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)
--- a/fs/nilfs2/recovery.c~nilfs2-fix-missing-cleanup-on-rollforward-recovery-error
+++ a/fs/nilfs2/recovery.c
@@ -716,6 +716,33 @@ static void nilfs_finish_roll_forward(st
}
/**
+ * nilfs_abort_roll_forward - cleaning up after a failed rollforward recovery
+ * @nilfs: nilfs object
+ */
+static void nilfs_abort_roll_forward(struct the_nilfs *nilfs)
+{
+ struct nilfs_inode_info *ii, *n;
+ LIST_HEAD(head);
+
+ /* Abandon inodes that have read recovery data */
+ spin_lock(&nilfs->ns_inode_lock);
+ list_splice_init(&nilfs->ns_dirty_files, &head);
+ spin_unlock(&nilfs->ns_inode_lock);
+ if (list_empty(&head))
+ return;
+
+ set_nilfs_purging(nilfs);
+ list_for_each_entry_safe(ii, n, &head, i_dirty) {
+ spin_lock(&nilfs->ns_inode_lock);
+ list_del_init(&ii->i_dirty);
+ spin_unlock(&nilfs->ns_inode_lock);
+
+ iput(&ii->vfs_inode);
+ }
+ clear_nilfs_purging(nilfs);
+}
+
+/**
* nilfs_salvage_orphan_logs - salvage logs written after the latest checkpoint
* @nilfs: nilfs object
* @sb: super block instance
@@ -773,15 +800,19 @@ int nilfs_salvage_orphan_logs(struct the
if (unlikely(err)) {
nilfs_err(sb, "error %d writing segment for recovery",
err);
- goto failed;
+ goto put_root;
}
nilfs_finish_roll_forward(nilfs, ri);
}
- failed:
+put_root:
nilfs_put_root(root);
return err;
+
+failed:
+ nilfs_abort_roll_forward(nilfs);
+ goto put_root;
}
/**
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-protect-references-to-superblock-parameters-exposed-in-sysfs.patch
nilfs2-fix-missing-cleanup-on-rollforward-recovery-error.patch
The patch titled
Subject: nilfs2: protect references to superblock parameters exposed in sysfs
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-protect-references-to-superblock-parameters-exposed-in-sysfs.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: protect references to superblock parameters exposed in sysfs
Date: Sun, 11 Aug 2024 19:03:20 +0900
The superblock buffers of nilfs2 can not only be overwritten at runtime
for modifications/repairs, but they are also regularly swapped, replaced
during resizing, and even abandoned when degrading to one side due to
backing device issues. So, accessing them requires mutual exclusion using
the reader/writer semaphore "nilfs->ns_sem".
Some sysfs attribute show methods read this superblock buffer without the
necessary mutual exclusion, which can cause problems with pointer
dereferencing and memory access, so fix it.
Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com
Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/sysfs.c | 43 +++++++++++++++++++++++++++++++++----------
1 file changed, 33 insertions(+), 10 deletions(-)
--- a/fs/nilfs2/sysfs.c~nilfs2-protect-references-to-superblock-parameters-exposed-in-sysfs
+++ a/fs/nilfs2/sysfs.c
@@ -836,9 +836,15 @@ ssize_t nilfs_dev_revision_show(struct n
struct the_nilfs *nilfs,
char *buf)
{
- struct nilfs_super_block **sbp = nilfs->ns_sbp;
- u32 major = le32_to_cpu(sbp[0]->s_rev_level);
- u16 minor = le16_to_cpu(sbp[0]->s_minor_rev_level);
+ struct nilfs_super_block *raw_sb;
+ u32 major;
+ u16 minor;
+
+ down_read(&nilfs->ns_sem);
+ raw_sb = nilfs->ns_sbp[0];
+ major = le32_to_cpu(raw_sb->s_rev_level);
+ minor = le16_to_cpu(raw_sb->s_minor_rev_level);
+ up_read(&nilfs->ns_sem);
return sysfs_emit(buf, "%d.%d\n", major, minor);
}
@@ -856,8 +862,13 @@ ssize_t nilfs_dev_device_size_show(struc
struct the_nilfs *nilfs,
char *buf)
{
- struct nilfs_super_block **sbp = nilfs->ns_sbp;
- u64 dev_size = le64_to_cpu(sbp[0]->s_dev_size);
+ struct nilfs_super_block *raw_sb;
+ u64 dev_size;
+
+ down_read(&nilfs->ns_sem);
+ raw_sb = nilfs->ns_sbp[0];
+ dev_size = le64_to_cpu(raw_sb->s_dev_size);
+ up_read(&nilfs->ns_sem);
return sysfs_emit(buf, "%llu\n", dev_size);
}
@@ -879,9 +890,15 @@ ssize_t nilfs_dev_uuid_show(struct nilfs
struct the_nilfs *nilfs,
char *buf)
{
- struct nilfs_super_block **sbp = nilfs->ns_sbp;
+ struct nilfs_super_block *raw_sb;
+ ssize_t len;
+
+ down_read(&nilfs->ns_sem);
+ raw_sb = nilfs->ns_sbp[0];
+ len = sysfs_emit(buf, "%pUb\n", raw_sb->s_uuid);
+ up_read(&nilfs->ns_sem);
- return sysfs_emit(buf, "%pUb\n", sbp[0]->s_uuid);
+ return len;
}
static
@@ -889,10 +906,16 @@ ssize_t nilfs_dev_volume_name_show(struc
struct the_nilfs *nilfs,
char *buf)
{
- struct nilfs_super_block **sbp = nilfs->ns_sbp;
+ struct nilfs_super_block *raw_sb;
+ ssize_t len;
+
+ down_read(&nilfs->ns_sem);
+ raw_sb = nilfs->ns_sbp[0];
+ len = scnprintf(buf, sizeof(raw_sb->s_volume_name), "%s\n",
+ raw_sb->s_volume_name);
+ up_read(&nilfs->ns_sem);
- return scnprintf(buf, sizeof(sbp[0]->s_volume_name), "%s\n",
- sbp[0]->s_volume_name);
+ return len;
}
static const char dev_readme_str[] =
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-protect-references-to-superblock-parameters-exposed-in-sysfs.patch
The quilt patch titled
Subject: nilfs2: fix state management in error path of log writing function
has been removed from the -mm tree. Its filename was
nilfs2-fix-state-management-in-error-path-of-log-writing-function.patch
This patch was dropped because it was withdrawn
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix state management in error path of log writing function
Date: Thu, 8 Aug 2024 08:07:42 +0900
After commit a694291a6211 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests continuously
even if user data blocks were split into multiple logs across segments,
but two potential flaws were introduced in its error handling.
First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set on
pages/folios will remain uncleared. This causes page cache operations to
hang waiting for the writeback flag. For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
an inode is evicted from memory, will hang.
Second, the NILFS_I_COLLECTED flag set on normal inodes remains uncleared.
As a result, if the next log write involves checkpoint creation, that is
fine, but if it is a partial log write without checkpoint creation, inodes
with NILFS_I_COLLECTED set are erroneously removed from the
"sc_dirty_files" list, and their data and b-tree blocks may not be written
to the device, corrupting the block mapping.
Fix these issues by correcting the jump destination of the error branch in
nilfs_segctor_do_construct() and the condition for calling
nilfs_redirty_inodes(), which clears the NILFS_I_COLLECTED flag.
Link: https://lkml.kernel.org/r/20240807230742.11151-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-state-management-in-error-path-of-log-writing-function
+++ a/fs/nilfs2/segment.c
@@ -2056,7 +2056,7 @@ static int nilfs_segctor_do_construct(st
err = nilfs_segctor_begin_construction(sci, nilfs);
if (unlikely(err))
- goto out;
+ goto failed;
/* Update time stamp */
sci->sc_seg_ctime = ktime_get_real_seconds();
@@ -2120,10 +2120,9 @@ static int nilfs_segctor_do_construct(st
return err;
failed_to_write:
- if (sci->sc_stage.flags & NILFS_CF_IFILE_STARTED)
- nilfs_redirty_inodes(&sci->sc_dirty_files);
-
failed:
+ if (mode == SC_LSEG_SR && nilfs_sc_cstage_get(sci) >= NILFS_ST_IFILE)
+ nilfs_redirty_inodes(&sci->sc_dirty_files);
if (nilfs_doing_gc())
nilfs_redirty_inodes(&sci->sc_gc_inodes);
nilfs_segctor_abort_construction(sci, nilfs, err);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The patch titled
Subject: Squashfs: sanity check symbolic link size
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
squashfs-sanity-check-symbolic-link-size.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: Squashfs: sanity check symbolic link size
Date: Sun, 11 Aug 2024 21:13:01 +0100
Syzkiller reports a "KMSAN: uninit-value in pick_link" bug.
This is caused by an uninitialised page, which is ultimately caused
by a corrupted symbolic link size read from disk.
The corrupted symlink size leads to an uninitialised page through the
following sequence of events:
1. squashfs_read_inode() is called to read the symbolic
link from disk. This assigns the corrupted value
3875536935 to inode->i_size.
2. Later squashfs_symlink_read_folio() is called, which assigns
this corrupted value to the length variable; length, being a
signed int, overflows and becomes a negative number.
3. The loop that then fills in the page contents checks that the
number of copied bytes is less than length; since length is
negative, the loop is skipped entirely, leaving the page
uninitialised.
This patch adds a sanity check ensuring that the symbolic link
size does not exceed PAGE_SIZE.
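For concreteness, here is a small standalone C program, not the squashfs
code itself (the loop bound is schematic), showing how the corrupted size
wraps to a negative signed value and causes the fill loop to be skipped:

/* symlink_size_overflow.c -- illustration of the overflow described above. */
#include <stdio.h>

int main(void)
{
	unsigned int symlink_size = 3875536935U;  /* corrupted value from disk */
	int length = symlink_size;                /* implementation-defined wrap, typically -419430361 */
	int bytes = 0;

	printf("length = %d\n", length);

	while (bytes < length) {                  /* never entered: length < 0 */
		/* ...copy symlink data into the page... */
		bytes += 4096;
	}
	printf("bytes copied = %d (page left uninitialised)\n", bytes);
	return 0;
}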
Link: https://lkml.kernel.org/r/20240811201301.13076-1-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: Lizhi Xu <lizhi.xu(a)windriver.com>
Reported-by: syzbot+24ac24ff58dc5b0d26b9(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000a90e8c061e86a76b@google.com/
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Phillip Lougher <phillip(a)squashfs.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/inode.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/fs/squashfs/inode.c~squashfs-sanity-check-symbolic-link-size
+++ a/fs/squashfs/inode.c
@@ -279,8 +279,13 @@ int squashfs_read_inode(struct inode *in
if (err < 0)
goto failed_read;
- set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_size = le32_to_cpu(sqsh_ino->symlink_size);
+ if (inode->i_size > PAGE_SIZE) {
+ ERROR("Corrupted symlink\n");
+ return -EINVAL;
+ }
+
+ set_nlink(inode, le32_to_cpu(sqsh_ino->nlink));
inode->i_op = &squashfs_symlink_inode_ops;
inode_nohighmem(inode);
inode->i_data.a_ops = &squashfs_symlink_aops;
_
Patches currently in -mm which might be from phillip(a)squashfs.org.uk are
squashfs-sanity-check-symbolic-link-size.patch
The patch titled
Subject: userfaultfd: don't BUG_ON() if khugepaged yanks our page table
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
userfaultfd-dont-bug_on-if-khugepaged-yanks-our-page-table.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: userfaultfd: don't BUG_ON() if khugepaged yanks our page table
Date: Tue, 13 Aug 2024 22:25:22 +0200
Since khugepaged was changed to allow retracting page tables in file
mappings without holding the mmap lock, these BUG_ON()s are wrong - get
rid of them.
We could also remove the preceding "if (unlikely(...))" block, but then we
could reach pte_offset_map_lock() with transhuge pages not just for file
mappings but also for anonymous mappings - which would probably be fine
but I think is not necessarily expected.
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-2-5efa61078a41@goog…
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-dont-bug_on-if-khugepaged-yanks-our-page-table
+++ a/mm/userfaultfd.c
@@ -807,9 +807,10 @@ retry:
err = -EFAULT;
break;
}
-
- BUG_ON(pmd_none(*dst_pmd));
- BUG_ON(pmd_trans_huge(*dst_pmd));
+ /*
+ * For shmem mappings, khugepaged is allowed to remove page
+ * tables under us; pte_offset_map_lock() will deal with that.
+ */
err = mfill_atomic_pte(dst_pmd, dst_vma, dst_addr,
src_addr, flags, &folio);
_
Patches currently in -mm which might be from jannh(a)google.com are
userfaultfd-fix-checks-for-huge-pmds.patch
userfaultfd-dont-bug_on-if-khugepaged-yanks-our-page-table.patch
mm-fix-harmless-type-confusion-in-lock_vma_under_rcu.patch
The patch titled
Subject: userfaultfd: fix checks for huge PMDs
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
userfaultfd-fix-checks-for-huge-pmds.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: userfaultfd: fix checks for huge PMDs
Date: Tue, 13 Aug 2024 22:25:21 +0200
Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.
The pmd_trans_huge() code in mfill_atomic() is wrong in three different
ways depending on kernel version:
1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
the right two race windows) - I've tested this in a kernel build with
some extra mdelay() calls. See the commit message for a description
of the race scenario.
On older kernels (before 6.5), I think the same bug can even
theoretically lead to accessing transhuge page contents as a page table
if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
detecting PMDs that don't point to page tables.
On older kernels (before 6.5), you'd just have to win a single fairly
wide race to hit this.
I've tested this on 6.1 stable by racing migration (with a mdelay()
patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
to yank page tables out from under us (though I haven't tested that),
so I think the BUG_ON() checks in mfill_atomic() are just wrong.
I decided to write two separate fixes for these (one fix for bugs 1+2, one
fix for bug 3), so that the first fix can be backported to kernels
affected by bugs 1+2.
This patch (of 2):
This fixes two issues.
I discovered that the following race can occur:
mfill_atomic                            other thread
============                            ============
                                        <zap PMD>
pmdp_get_lockless() [reads none pmd]
<bail if trans_huge>
<if none:>
                                        <pagefault creates transhuge zeropage>
__pte_alloc [no-op]
                                        <zap PMD>
<bail if pmd_trans_huge(*dst_pmd)>
BUG_ON(pmd_none(*dst_pmd))
I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.
On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.
The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).
On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.
Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation; amd64
looks different).
If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.
As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.
Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@goog…
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@goog…
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reported-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Closes: https://lore.kernel.org/r/59bf3c2e-d58b-41af-ab10-3e631d802229@bytedance.com
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-fix-checks-for-huge-pmds
+++ a/mm/userfaultfd.c
@@ -787,21 +787,23 @@ retry:
}
dst_pmdval = pmdp_get_lockless(dst_pmd);
- /*
- * If the dst_pmd is mapped as THP don't
- * override it and just be strict.
- */
- if (unlikely(pmd_trans_huge(dst_pmdval))) {
- err = -EEXIST;
- break;
- }
if (unlikely(pmd_none(dst_pmdval)) &&
unlikely(__pte_alloc(dst_mm, dst_pmd))) {
err = -ENOMEM;
break;
}
- /* If an huge pmd materialized from under us fail */
- if (unlikely(pmd_trans_huge(*dst_pmd))) {
+ dst_pmdval = pmdp_get_lockless(dst_pmd);
+ /*
+ * If the dst_pmd is THP don't override it and just be strict.
+ * (This includes the case where the PMD used to be THP and
+ * changed back to none after __pte_alloc().)
+ */
+ if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
+ pmd_devmap(dst_pmdval))) {
+ err = -EEXIST;
+ break;
+ }
+ if (unlikely(pmd_bad(dst_pmdval))) {
err = -EFAULT;
break;
}
_
Patches currently in -mm which might be from jannh(a)google.com are
userfaultfd-fix-checks-for-huge-pmds.patch
userfaultfd-dont-bug_on-if-khugepaged-yanks-our-page-table.patch
mm-fix-harmless-type-confusion-in-lock_vma_under_rcu.patch
The patch titled
Subject: mm: vmalloc: ensure vmap_block is initialised before adding to queue
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vmalloc-ensure-vmap_block-is-initialised-before-adding-to-queue.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Will Deacon <will(a)kernel.org>
Subject: mm: vmalloc: ensure vmap_block is initialised before adding to queue
Date: Mon, 12 Aug 2024 18:16:06 +0100
Commit 8c61291fd850 ("mm: fix incorrect vbq reference in
purge_fragmented_block") extended the 'vmap_block' structure to contain a
'cpu' field which is set at allocation time to the id of the initialising
CPU.
When a new 'vmap_block' is being instantiated by new_vmap_block(), the
partially initialised structure is added to the local 'vmap_block_queue'
xarray before the 'cpu' field has been initialised. If another CPU is
concurrently walking the xarray (e.g. via vm_unmap_aliases()), then it
may perform an out-of-bounds access to the remote queue thanks to an
uninitialised index.
This has been observed as UBSAN errors in Android:
| Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP
|
| Call trace:
| purge_fragmented_block+0x204/0x21c
| _vm_unmap_aliases+0x170/0x378
| vm_unmap_aliases+0x1c/0x28
| change_memory_common+0x1dc/0x26c
| set_memory_ro+0x18/0x24
| module_enable_ro+0x98/0x238
| do_init_module+0x1b0/0x310
Move the initialisation of 'vb->cpu' in new_vmap_block() ahead of the
addition to the xarray.
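The bug is an instance of a familiar class: an object is published into a
shared lookup structure before it is fully initialised, so a concurrent
walker can observe the uninitialised field. The sketch below is a user-space
model of the init-before-publish rule the fix restores; it is not the kernel
code (the kernel publishes through an xarray under its own locking), and the
C11 atomics here are only stand-ins for that publication step.

/* publish_order_model.c -- illustrative model, not the kernel code. */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct block {
	int cpu;                 /* consumers index per-CPU data with this */
	int dirty;
};

static _Atomic(struct block *) registry[16];   /* stand-in for the shared lookup slot */

static struct block *new_block(int slot, int cpu)
{
	struct block *b = malloc(sizeof(*b));

	b->dirty = 0;
	b->cpu = cpu;            /* initialise *before* publishing... */

	/* ...then publish, so a concurrent reader that finds the pointer also
	 * sees the initialised fields.  The bug was the reverse order:
	 * publish first, set ->cpu afterwards.
	 */
	atomic_store_explicit(&registry[slot], b, memory_order_release);
	return b;
}

int main(void)
{
	new_block(0, 3);
	struct block *b = atomic_load_explicit(&registry[0], memory_order_acquire);
	printf("cpu = %d\n", b->cpu);
	free(b);
	return 0;
}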
Link: https://lkml.kernel.org/r/20240812171606.17486-1-will@kernel.org
Fixes: 8c61291fd850 ("mm: fix incorrect vbq reference in purge_fragmented_block")
Signed-off-by: Will Deacon <will(a)kernel.org>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Cc: Hailong.Liu <hailong.liu(a)oppo.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmalloc.c~mm-vmalloc-ensure-vmap_block-is-initialised-before-adding-to-queue
+++ a/mm/vmalloc.c
@@ -2626,6 +2626,7 @@ static void *new_vmap_block(unsigned int
vb->dirty_max = 0;
bitmap_set(vb->used_map, 0, (1UL << order));
INIT_LIST_HEAD(&vb->free_list);
+ vb->cpu = raw_smp_processor_id();
xa = addr_to_vb_xa(va->va_start);
vb_idx = addr_to_vb_idx(va->va_start);
@@ -2642,7 +2643,6 @@ static void *new_vmap_block(unsigned int
* integrity together with list_for_each_rcu from read
* side.
*/
- vb->cpu = raw_smp_processor_id();
vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
_
Patches currently in -mm which might be from will(a)kernel.org are
mm-vmalloc-ensure-vmap_block-is-initialised-before-adding-to-queue.patch
The patch titled
Subject: alloc_tag: mark pages reserved during CMA activation as not tagged
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
alloc_tag-mark-pages-reserved-during-cma-activation-as-not-tagged.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: alloc_tag: mark pages reserved during CMA activation as not tagged
Date: Tue, 13 Aug 2024 08:07:57 -0700
During CMA activation, pages in the CMA area are prepared and then freed
without being allocated. This triggers warnings when memory allocation
debug config (CONFIG_MEM_ALLOC_PROFILING_DEBUG) is enabled. Fix this by
marking these pages not tagged before freeing them.
Link: https://lkml.kernel.org/r/20240813150758.855881-2-surenb@google.com
Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Sourav Panda <souravpanda(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.10]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mm_init.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/mm_init.c~alloc_tag-mark-pages-reserved-during-cma-activation-as-not-tagged
+++ a/mm/mm_init.c
@@ -2244,6 +2244,8 @@ void __init init_cma_reserved_pageblock(
set_pageblock_migratetype(page, MIGRATE_CMA);
set_page_refcounted(page);
+ /* pages were reserved and not allocated */
+ clear_page_tag_ref(page);
__free_pages(page, pageblock_order);
adjust_managed_page_count(page, pageblock_nr_pages);
_
Patches currently in -mm which might be from surenb(a)google.com are
alloc_tag-introduce-clear_page_tag_ref-helper-function.patch
alloc_tag-mark-pages-reserved-during-cma-activation-as-not-tagged.patch