There is nothing preventing kernel memory allocators from allocating a
page that overlaps with PTR_ERR(), except for architecture-specific
code that setup memblock.
It was discovered that RISCV architecture doesn't setup memblock
corectly, leading to a page overlapping with PTR_ERR() being allocated,
and subsequently crashing the kernel (link in Close: )
The reported crash has nothing to do with PTR_ERR(): the last page
(at address 0xfffff000) being allocated leads to an unexpected
arithmetic overflow in ext4; but still, this page shouldn't be
allocated in the first place.
Because PTR_ERR() is an architecture-independent thing, we shouldn't
ask every single architecture to set this up. There may be other
architectures beside RISCV that have the same problem.
Fix this one and for all by reserving the physical memory page that
may be mapped to the last virtual memory page as part of low memory.
Unfortunately, this means if there is actual memory at this reserved
location, that memory will become inaccessible. However, if this page
is not reserved, it can only be accessed as high memory, so this
doesn't matter if high memory is not supported. Even if high memory is
supported, it is still only one page.
Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong…
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: <stable(a)vger.kernel.org> # all versions
---
init/main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/init/main.c b/init/main.c
index 881f6230ee59..f8d2793c4641 100644
--- a/init/main.c
+++ b/init/main.c
@@ -900,6 +900,7 @@ void start_kernel(void)
page_address_init();
pr_notice("%s", linux_banner);
early_security_init();
+ memblock_reserve(__pa(-PAGE_SIZE), PAGE_SIZE); /* reserve last page for ERR_PTR */
setup_arch(&command_line);
setup_boot_config();
setup_command_line(command_line);
--
2.39.2
The NOP op flags should have been checked from beginning like any other
opcode, otherwise NOP may not be extended with the op flags.
Given both liburing and Rust io-uring crate always zeros SQE op flags, just
ignore users which play raw NOP uring interface without zeroing SQE, because
NOP is just for test purpose. Then we can save one NOP2 opcode.
Suggested-by: Jens Axboe <axboe(a)kernel.dk>
Fixes: 2b188cc1bb85 ("Add io_uring IO interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
io_uring/nop.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/io_uring/nop.c b/io_uring/nop.c
index d956599a3c1b..1a4e312dfe51 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -12,6 +12,8 @@
int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
+ if (READ_ONCE(sqe->rw_flags))
+ return -EINVAL;
return 0;
}
--
2.42.0
After upgrading to rc7 from rc6 on a system with NVIDIA GP104 using
the nouveau driver, I get no display output from the kernel (only the
output from GRUB shows on the primary display). Nonetheless, I was
able to SSH to the system and get the kernel log from dmesg. I found
errors from nouveau in it. Grepping it for nouveau gives me this:
[ 0.367379] nouveau 0000:01:00.0: NVIDIA GP104 (134000a1)
[ 0.474499] nouveau 0000:01:00.0: bios: version 86.04.50.80.13
[ 0.474620] nouveau 0000:01:00.0: pmu: firmware unavailable
[ 0.474977] nouveau 0000:01:00.0: fb: 8192 MiB GDDR5
[ 0.484371] nouveau 0000:01:00.0: sec2(acr): mbox 00000001 00000000
[ 0.484377] nouveau 0000:01:00.0: sec2(acr):load: boot failed: -5
[ 0.484379] nouveau 0000:01:00.0: acr: init failed, -5
[ 0.484466] nouveau 0000:01:00.0: init failed with -5
[ 0.484468] nouveau: DRM-master:00000000:00000080: init failed with -5
[ 0.484470] nouveau 0000:01:00.0: DRM-master: Device allocation failed: -5
[ 0.485078] nouveau 0000:01:00.0: probe with driver nouveau failed with error -50
I bisected between v6.9-rc6 and v6.9-rc7 and that identified commit
52a6947bf576 ("drm/nouveau/firmware: Fix SG_DEBUG error with
nvkm_firmware_ctor()") as the first bad commit. I then rebuilt
v6.9-rc7 with just that commit reverted and the problem does not
occur.
Please let me know if there are any additional details I can provide
that would be helpful, or if I should reproduce the failure with
additional debugging options enabled, etc.
Cheers,
-- Dan
The patch titled
Subject: fs/proc: fix softlockup in __read_vmcore
has been added to the -mm mm-nonmm-unstable branch. Its filename is
fs-proc-fix-softlockup-in-__read_vmcore.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: fs/proc: fix softlockup in __read_vmcore
Date: Tue, 7 May 2024 09:18:58 -0400
While taking a kernel core dump with makedumpfile on a larger system,
softlockup messages often appear.
While softlockup warnings can be harmless, they can also interfere with
things like RCU freeing memory, which can be problematic when the kdump
kexec image is configured with as little memory as possible.
Avoid the softlockup, and give things like work items and RCU a chance to
do their thing during __read_vmcore by adding a cond_resched.
Link: https://lkml.kernel.org/r/20240507091858.36ff767f@imladris.surriel.com
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/vmcore.c | 2 ++
1 file changed, 2 insertions(+)
--- a/fs/proc/vmcore.c~fs-proc-fix-softlockup-in-__read_vmcore
+++ a/fs/proc/vmcore.c
@@ -383,6 +383,8 @@ static ssize_t __read_vmcore(struct iov_
/* leave now if filled buffer already */
if (!iov_iter_count(iter))
return acc;
+
+ cond_resched();
}
list_for_each_entry(m, &vmcore_list, list) {
_
Patches currently in -mm which might be from riel(a)surriel.com are
fs-proc-fix-softlockup-in-__read_vmcore.patch
Fuzzing reports a possible deadlock in jbd2_log_wait_commit.
The problem occurs in ext4_ind_migrate due to an incorrect order of
unlocking of the journal and write semaphores - the order of unlocking
must be the reverse of the order of locking.
Found by Linux Verification Center (linuxtesting.org) with syzkaller.
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Signed-off-by: Artem Sadovnikov <ancowi69(a)gmail.com>
Signed-off-by: Mikhail Ukhin <mish.uxin2012(a)yandex.ru>
---
v2: New addresses have been added and Ritesh Harjani has been noted as a
reviewer.
fs/ext4/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index b0ea646454ac..59290356aa5b 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -663,8 +663,8 @@ int ext4_ind_migrate(struct inode *inode)
if (unlikely(ret2 && !ret))
ret = ret2;
errout:
- ext4_journal_stop(handle);
up_write(&EXT4_I(inode)->i_data_sem);
+ ext4_journal_stop(handle);
out_unlock:
percpu_up_write(&sbi->s_writepages_rwsem);
return ret;
--
2.25.1