With these features enabled, the EEVDF scheduler introduces a large
performance degradation, observed in multiple database tests on kernel
versions using EEVDF, across multiple architectures (x86, aarch64, amd64)
and CPU generations.
Disable both features by default to minimize the performance impact.
Cc: <stable(a)vger.kernel.org> # 6.6.x
Fixes: 86bfbb7ce4f6 ("sched/fair: Add lag based placement")
Fixes: 63304558ba5d ("sched/eevdf: Curb wakeup-preemption")
Signed-off-by: Cristian Prundeanu <cpru(a)amazon.com>
---
kernel/sched/features.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index a3d331dd2d8f..8a5ca80665b3 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -4,7 +4,7 @@
* Using the avg_vruntime, do the right thing and preserve lag across
* sleep+wake cycles. EEVDF placement strategy #1, #2 if disabled.
*/
-SCHED_FEAT(PLACE_LAG, true)
+SCHED_FEAT(PLACE_LAG, false)
/*
* Give new tasks half a slice to ease into the competition.
*/
@@ -17,7 +17,7 @@ SCHED_FEAT(PLACE_REL_DEADLINE, true)
* Inhibit (wakeup) preemption until the current task has either matched the
* 0-lag point or until is has exhausted it's slice.
*/
-SCHED_FEAT(RUN_TO_PARITY, true)
+SCHED_FEAT(RUN_TO_PARITY, false)
/*
* Allow wakeup of tasks with a shorter slice to cancel RUN_TO_PARITY for
* current.
--
2.40.1
Hi, Conor
Thanks for your patch.
> From: Conor Dooley <conor.dooley(a)microchip.com>
>
> Aurelien reported probe failures due to the csi node being enabled without
> having a camera attached to it. A camera was in the initial submissions, but
> was removed from the dts, as it had not actually been present on the board,
> but was from an addon board used by the developer of the relevant drivers.
> The non-camera pipeline nodes were not disabled when this happened and
> the probe failures are problematic for Debian. Disable them.
>
> CC: stable(a)vger.kernel.org
> Fixes: 28ecaaa5af192 ("riscv: dts: starfive: jh7110: Add camera subsystem
> nodes")
The hash here is 13 characters long; it should be 12: "Fixes: 28ecaaa5af19 ..."
Best Regards
Changhuang.
> Closes: https://lore.kernel.org/all/Zw1-vcN4CoVkfLjU@aurel32.net/
> Reported-by: Aurelien Jarno <aurelien(a)aurel32.net>
> Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
> ---
> CC: Emil Renner Berthing <kernel(a)esmil.dk>
> CC: Rob Herring <robh(a)kernel.org>
> CC: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
> CC: Conor Dooley <conor+dt(a)kernel.org>
> CC: Changhuang Liang <changhuang.liang(a)starfivetech.com>
> CC: devicetree(a)vger.kernel.org
> CC: linux-riscv(a)lists.infradead.org
> CC: linux-kernel(a)vger.kernel.org
> ---
> arch/riscv/boot/dts/starfive/jh7110-common.dtsi | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> index c7771b3b64758..d6c55f1cc96a9 100644
> --- a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> +++ b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> @@ -128,7 +128,6 @@ &camss {
> assigned-clocks = <&ispcrg JH7110_ISPCLK_DOM4_APB_FUNC>,
> <&ispcrg JH7110_ISPCLK_MIPI_RX0_PXL>;
> assigned-clock-rates = <49500000>, <198000000>;
> - status = "okay";
>
> ports {
> #address-cells = <1>;
> @@ -151,7 +150,6 @@ camss_from_csi2rx: endpoint { &csi2rx {
> assigned-clocks = <&ispcrg JH7110_ISPCLK_VIN_SYS>;
> assigned-clock-rates = <297000000>;
> - status = "okay";
>
> ports {
> #address-cells = <1>;
> --
> 2.45.2
The patch titled
Subject: ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-pass-u64-to-ocfs2_truncate_inline-maybe-overflow.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Edward Adam Davis <eadavis(a)qq.com>
Subject: ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow
Date: Wed, 16 Oct 2024 19:43:47 +0800
Syzbot reported a kernel BUG in ocfs2_truncate_inline(). There are two
reasons for this: first, the parameter value passed is greater than
ocfs2_max_inline_data_with_xattr(); second, the start and end parameters of
ocfs2_truncate_inline() are "unsigned int", so a u64 argument can overflow them.
So, add a sanity check for byte_start and byte_len right before
ocfs2_truncate_inline() in ocfs2_remove_inode_range(): if they are greater
than ocfs2_max_inline_data_with_xattr(), return -EINVAL.
Link: https://lkml.kernel.org/r/tencent_D48DB5122ADDAEDDD11918CFB68D93258C07@qq.c…
Fixes: 1afc32b95233 ("ocfs2: Write support for inline data")
Signed-off-by: Edward Adam Davis <eadavis(a)qq.com>
Reported-by: syzbot+81092778aac03460d6b7(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=81092778aac03460d6b7
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/file.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/fs/ocfs2/file.c~ocfs2-pass-u64-to-ocfs2_truncate_inline-maybe-overflow
+++ a/fs/ocfs2/file.c
@@ -1784,6 +1784,14 @@ int ocfs2_remove_inode_range(struct inod
return 0;
if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) {
+ int id_count = ocfs2_max_inline_data_with_xattr(inode->i_sb, di);
+
+ if (byte_start > id_count || byte_start + byte_len > id_count) {
+ ret = -EINVAL;
+ mlog_errno(ret);
+ goto out;
+ }
+
ret = ocfs2_truncate_inline(inode, di_bh, byte_start,
byte_start + byte_len, 0);
if (ret) {
_
Patches currently in -mm which might be from eadavis(a)qq.com are
ocfs2-pass-u64-to-ocfs2_truncate_inline-maybe-overflow.patch
Jeongjun Park <aha310510(a)gmail.com> wrote:
>
> I got the following KCSAN report during syzbot testing:
>
> ==================================================================
> BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current
>
> write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1:
> inode_set_ctime_to_ts include/linux/fs.h:1638 [inline]
> inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626
> shmem_mknod+0x117/0x180 mm/shmem.c:3443
> shmem_create+0x34/0x40 mm/shmem.c:3497
> lookup_open fs/namei.c:3578 [inline]
> open_last_lookups fs/namei.c:3647 [inline]
> path_openat+0xdbc/0x1f00 fs/namei.c:3883
> do_filp_open+0xf7/0x200 fs/namei.c:3913
> do_sys_openat2+0xab/0x120 fs/open.c:1416
> do_sys_open fs/open.c:1431 [inline]
> __do_sys_openat fs/open.c:1447 [inline]
> __se_sys_openat fs/open.c:1442 [inline]
> __x64_sys_openat+0xf3/0x120 fs/open.c:1442
> x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0:
> inode_get_ctime_nsec include/linux/fs.h:1623 [inline]
> inode_get_ctime include/linux/fs.h:1629 [inline]
> generic_fillattr+0x1dd/0x2f0 fs/stat.c:62
> shmem_getattr+0x17b/0x200 mm/shmem.c:1157
> vfs_getattr_nosec fs/stat.c:166 [inline]
> vfs_getattr+0x19b/0x1e0 fs/stat.c:207
> vfs_statx_path fs/stat.c:251 [inline]
> vfs_statx+0x134/0x2f0 fs/stat.c:315
> vfs_fstatat+0xec/0x110 fs/stat.c:341
> __do_sys_newfstatat fs/stat.c:505 [inline]
> __se_sys_newfstatat+0x58/0x260 fs/stat.c:499
> __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499
> x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> value changed: 0x2755ae53 -> 0x27ee44d3
>
> Since there is no special protection when shmem_getattr() calls
> generic_fillattr(), a data race can occur with functions such as
> shmem_unlink() or shmem_mknod(). This can cause unexpected results, so
> commenting it out is not enough.
>
> Therefore, when calling generic_fillattr() from shmem_getattr(), it is
> appropriate to protect the inode using inode_lock_shared() and
> inode_unlock_shared() to prevent the data race.
>
Cc: stable(a)vger.kernel.org
I think this patch should be applied for the next rc and also to the stable
trees. If generic_fillattr() is called without holding the read lock, a
data race can occur on inode member variables, which can cause unexpected
behavior. This problem is also present in several stable versions, so I think
it should be fixed as soon as possible.
Regards,
Jeongjun Park
> Reported-by: syzbot <syzkaller(a)googlegroups.com>
> Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat")
> Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
> ---
> mm/shmem.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 5a77acf6ac6a..9beeb47c3743 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1154,7 +1154,9 @@ static int shmem_getattr(struct mnt_idmap *idmap,
> stat->attributes_mask |= (STATX_ATTR_APPEND |
> STATX_ATTR_IMMUTABLE |
> STATX_ATTR_NODUMP);
> + inode_lock_shared(inode);
> generic_fillattr(idmap, request_mask, inode, stat);
> + inode_unlock_shared(inode);
>
> if (shmem_is_huge(inode, 0, false, NULL, 0))
> stat->blksize = HPAGE_PMD_SIZE;
> --
The patch titled
Subject: mm/gup: stop leaking pinned pages in low memory conditions
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-gup-stop-leaking-pinned-pages-in-low-memory-conditions.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: John Hubbard <jhubbard(a)nvidia.com>
Subject: mm/gup: stop leaking pinned pages in low memory conditions
Date: Wed, 16 Oct 2024 13:22:42 -0700
If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM) family
of functions, and requests "too many" pages, then the call will
erroneously leave pages pinned. This is visible in user space as an
actual memory leak.
Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls to
exhaust memory.
The root cause of the problem is this sequence, within
__gup_longterm_locked():
__get_user_pages_locked()
rc = check_and_migrate_movable_pages()
...which gets retried in a loop. The loop error handling is incomplete,
clearly due to a somewhat unusual and complicated tri-state error API.
But anyway, if -ENOMEM, or in fact, any unexpected error is returned from
check_and_migrate_movable_pages(), then __gup_longterm_locked() happily
returns the error, while leaving the pages pinned.
In the failed case, which is an app that requests (via a device driver)
30720000000 bytes to be pinned, and then exits, I see this:
$ grep foll /proc/vmstat
nr_foll_pin_acquired 7502048
nr_foll_pin_released 2048
And after applying this patch, it returns to balanced pins:
$ grep foll /proc/vmstat
nr_foll_pin_acquired 7502048
nr_foll_pin_released 7502048
Fix this by unpinning the pages that __get_user_pages_locked() has
pinned, in such error cases.
Link: https://lkml.kernel.org/r/20241016202242.456953-1-jhubbard@nvidia.com
Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Shigeru Yoshida <syoshida(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/mm/gup.c~mm-gup-stop-leaking-pinned-pages-in-low-memory-conditions
+++ a/mm/gup.c
@@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct
/* FOLL_LONGTERM implies FOLL_PIN */
rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
+
+ /*
+ * The __get_user_pages_locked() call happens before we know
+ * that whether it's possible to successfully complete the whole
+ * operation. To compensate for this, if we get an unexpected
+ * error (such as -ENOMEM) then we must unpin everything, before
+ * erroring out.
+ */
+ if (rc != -EAGAIN && rc != 0)
+ unpin_user_pages(pages, nr_pinned_pages);
+
} while (rc == -EAGAIN);
memalloc_pin_restore(flags);
return rc ? rc : nr_pinned_pages;
_
Patches currently in -mm which might be from jhubbard(a)nvidia.com are
mm-gup-stop-leaking-pinned-pages-in-low-memory-conditions.patch
kaslr-rename-physmem_end-and-physmem_end-to-direct_map_physmem_end.patch
The patch titled
Subject: x86/traps: move kmsan check after instrumentation_begin
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
x86-traps-move-kmsan-check-after-instrumentation_begin.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Subject: x86/traps: move kmsan check after instrumentation_begin
Date: Wed, 16 Oct 2024 20:24:07 +0500
During an x86_64 kernel build with CONFIG_KMSAN, objtool emits the following warning:
AR built-in.a
AR vmlinux.a
LD vmlinux.o
vmlinux.o: warning: objtool: handle_bug+0x4: call to
kmsan_unpoison_entry_regs() leaves .noinstr.text section
OBJCOPY modules.builtin.modinfo
GEN modules.builtin
MODPOST Module.symvers
CC .vmlinux.export.o
Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes
the warning.
The decode_bug(regs->ip, &imm) call is left before the KMSAN unpoisoning
because it is followed by an early return; moving it after
instrumentation_begin() results in the warning "return with
instrumentation enabled". Hence, I'm concerned that regs will not be KMSAN
unpoisoned if `ud_type == BUG_NONE` is true.
Link: https://lkml.kernel.org/r/20241016152407.3149001-1-snovitoll@gmail.com
Fixes: ba54d194f8da ("x86/traps: avoid KMSAN bugs originating from handle_bug()")
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Reviewed-by: Alexander Potapenko <glider(a)google.com>
Cc: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/traps.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/traps.c~x86-traps-move-kmsan-check-after-instrumentation_begin
+++ a/arch/x86/kernel/traps.c
@@ -261,12 +261,6 @@ static noinstr bool handle_bug(struct pt
int ud_type;
u32 imm;
- /*
- * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug()
- * is a rare case that uses @regs without passing them to
- * irqentry_enter().
- */
- kmsan_unpoison_entry_regs(regs);
ud_type = decode_bug(regs->ip, &imm);
if (ud_type == BUG_NONE)
return handled;
@@ -276,6 +270,12 @@ static noinstr bool handle_bug(struct pt
*/
instrumentation_begin();
/*
+ * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug()
+ * is a rare case that uses @regs without passing them to
+ * irqentry_enter().
+ */
+ kmsan_unpoison_entry_regs(regs);
+ /*
* Since we're emulating a CALL with exceptions, restore the interrupt
* state to what it was at the exception site.
*/
_
Patches currently in -mm which might be from snovitoll(a)gmail.com are
x86-traps-move-kmsan-check-after-instrumentation_begin.patch
mm-kasan-kmsan-copy_from-to_kernel_nofault.patch