The patch titled
Subject: mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-page_alloc-fix-pcp-count-race-between-drain_pages_zone-vs-__rmqueue_pcplist.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Li Zhijian <lizhijian(a)fujitsu.com>
Subject: mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
Date: Tue, 23 Jul 2024 14:44:28 +0800
It's expected that no page should be left in pcp_list after calling
zone_pcp_disable() in offline_pages(). Previously, it's observed that
offline_pages() gets stuck [1] due to some pages remaining in pcp_list.
Cause:
There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
involving the pcp->count variable. See below scenario:
CPU0 CPU1
---------------- ---------------
spin_lock(&pcp->lock);
__rmqueue_pcplist() {
zone_pcp_disable() {
/* list is empty */
if (list_empty(list)) {
/* add pages to pcp_list */
alloced = rmqueue_bulk()
mutex_lock(&pcp_batch_high_lock)
...
__drain_all_pages() {
drain_pages_zone() {
/* read pcp->count, it's 0 here */
count = READ_ONCE(pcp->count)
/* 0 means nothing to drain */
/* update pcp->count */
pcp->count += alloced << order;
...
...
spin_unlock(&pcp->lock);
In this case, after calling zone_pcp_disable() though, there are still some
pages in pcp_list. And these pages in pcp_list are neither movable nor
isolated, offline_pages() gets stuck as a result.
Solution:
Expand the scope of the pcp->lock to also protect pcp->count in
drain_pages_zone(), to ensure no pages are left in the pcp list after
zone_pcp_disable()
[1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujit…
Link: https://lkml.kernel.org/r/20240723064428.1179519-1-lizhijian@fujitsu.com
Fixes: 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock")
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Reported-by: Yao Xingtao <yaoxt.fnst(a)fujitsu.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-fix-pcp-count-race-between-drain_pages_zone-vs-__rmqueue_pcplist
+++ a/mm/page_alloc.c
@@ -2343,16 +2343,20 @@ void drain_zone_pages(struct zone *zone,
static void drain_pages_zone(unsigned int cpu, struct zone *zone)
{
struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
- int count = READ_ONCE(pcp->count);
-
- while (count) {
- int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
- count -= to_drain;
+ int count;
+ do {
spin_lock(&pcp->lock);
- free_pcppages_bulk(zone, to_drain, pcp, 0);
+ count = pcp->count;
+ if (count) {
+ int to_drain = min(count,
+ pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+
+ free_pcppages_bulk(zone, to_drain, pcp, 0);
+ count -= to_drain;
+ }
spin_unlock(&pcp->lock);
- }
+ } while (count);
}
/*
_
Patches currently in -mm which might be from lizhijian(a)fujitsu.com are
mm-page_alloc-fix-pcp-count-race-between-drain_pages_zone-vs-__rmqueue_pcplist.patch
The patch titled
Subject: scripts/gdb: fix lx-mounts command error
has been added to the -mm mm-nonmm-unstable branch. Its filename is
scripts-gdb-fix-lx-mounts-command-error.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kuan-Ying Lee <kuan-ying.lee(a)canonical.com>
Subject: scripts/gdb: fix lx-mounts command error
Date: Tue, 23 Jul 2024 14:48:59 +0800
(gdb) lx-mounts
mount super_block devname pathname fstype options
Python Exception <class 'gdb.error'>: There is no member named list.
Error occurred in Python: There is no member named list.
We encounter the above issue after commit 2eea9ce4310d ("mounts: keep
list of mounts in an rbtree"). The commit move a mount from list into
rbtree.
So we can instead use rbtree to iterate all mounts information.
Link: https://lkml.kernel.org/r/20240723064902.124154-4-kuan-ying.lee@canonical.c…
Fixes: 2eea9ce4310d ("mounts: keep list of mounts in an rbtree")
Signed-off-by: Kuan-Ying Lee <kuan-ying.lee(a)canonical.com>
Cc: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/proc.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/scripts/gdb/linux/proc.py~scripts-gdb-fix-lx-mounts-command-error
+++ a/scripts/gdb/linux/proc.py
@@ -18,6 +18,7 @@ from linux import utils
from linux import tasks
from linux import lists
from linux import vfs
+from linux import rbtree
from struct import *
@@ -172,8 +173,7 @@ values of that process namespace"""
gdb.write("{:^18} {:^15} {:>9} {} {} options\n".format(
"mount", "super_block", "devname", "pathname", "fstype"))
- for mnt in lists.list_for_each_entry(namespace['list'],
- mount_ptr_type, "mnt_list"):
+ for mnt in rbtree.rb_inorder_for_each_entry(namespace['mounts'], mount_ptr_type, "mnt_node"):
devname = mnt['mnt_devname'].string()
devname = devname if devname else "none"
_
Patches currently in -mm which might be from kuan-ying.lee(a)canonical.com are
scripts-gdb-fix-timerlist-parsing-issue.patch
scripts-gdb-add-iteration-function-for-rbtree.patch
scripts-gdb-fix-lx-mounts-command-error.patch
scripts-gdb-add-lx-stack_depot_lookup-command.patch
scripts-gdb-add-lx-kasan_mem_to_shadow-command.patch
The patch titled
Subject: scripts/gdb: add iteration function for rbtree
has been added to the -mm mm-nonmm-unstable branch. Its filename is
scripts-gdb-add-iteration-function-for-rbtree.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kuan-Ying Lee <kuan-ying.lee(a)canonical.com>
Subject: scripts/gdb: add iteration function for rbtree
Date: Tue, 23 Jul 2024 14:48:58 +0800
Add inorder iteration function for rbtree usage.
This is a preparation patch for the next patch to fix the gdb mounts
issue.
Link: https://lkml.kernel.org/r/20240723064902.124154-3-kuan-ying.lee@canonical.c…
Fixes: 2eea9ce4310d ("mounts: keep list of mounts in an rbtree")
Signed-off-by: Kuan-Ying Lee <kuan-ying.lee(a)canonical.com>
Cc: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/rbtree.py | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/scripts/gdb/linux/rbtree.py~scripts-gdb-add-iteration-function-for-rbtree
+++ a/scripts/gdb/linux/rbtree.py
@@ -9,6 +9,18 @@ from linux import utils
rb_root_type = utils.CachedType("struct rb_root")
rb_node_type = utils.CachedType("struct rb_node")
+def rb_inorder_for_each(root):
+ def inorder(node):
+ if node:
+ yield from inorder(node['rb_left'])
+ yield node
+ yield from inorder(node['rb_right'])
+
+ yield from inorder(root['rb_node'])
+
+def rb_inorder_for_each_entry(root, gdbtype, member):
+ for node in rb_inorder_for_each(root):
+ yield utils.container_of(node, gdbtype, member)
def rb_first(root):
if root.type == rb_root_type.get_type():
_
Patches currently in -mm which might be from kuan-ying.lee(a)canonical.com are
scripts-gdb-fix-timerlist-parsing-issue.patch
scripts-gdb-add-iteration-function-for-rbtree.patch
scripts-gdb-fix-lx-mounts-command-error.patch
scripts-gdb-add-lx-stack_depot_lookup-command.patch
scripts-gdb-add-lx-kasan_mem_to_shadow-command.patch
The patch titled
Subject: scripts/gdb: fix timerlist parsing issue
has been added to the -mm mm-nonmm-unstable branch. Its filename is
scripts-gdb-fix-timerlist-parsing-issue.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kuan-Ying Lee <kuan-ying.lee(a)canonical.com>
Subject: scripts/gdb: fix timerlist parsing issue
Date: Tue, 23 Jul 2024 14:48:57 +0800
Patch series "Fix some GDB command error and add some GDB commands", v3.
Fix some GDB command errors and add some useful GDB commands.
This patch (of 5):
Commit 7988e5ae2be7 ("tick: Split nohz and highres features from
nohz_mode") and commit 7988e5ae2be7 ("tick: Split nohz and highres
features from nohz_mode") move 'tick_stopped' and 'nohz_mode' to flags
field which will break the gdb lx-mounts command:
(gdb) lx-timerlist
Python Exception <class 'gdb.error'>: There is no member named nohz_mode.
Error occurred in Python: There is no member named nohz_mode.
(gdb) lx-timerlist
Python Exception <class 'gdb.error'>: There is no member named tick_stopped.
Error occurred in Python: There is no member named tick_stopped.
We move 'tick_stopped' and 'nohz_mode' to flags field instead.
Link: https://lkml.kernel.org/r/20240723064902.124154-1-kuan-ying.lee@canonical.c…
Link: https://lkml.kernel.org/r/20240723064902.124154-2-kuan-ying.lee@canonical.c…
Fixes: a478ffb2ae23 ("tick: Move individual bit features to debuggable mask accesses")
Fixes: 7988e5ae2be7 ("tick: Split nohz and highres features from nohz_mode")
Signed-off-by: Kuan-Ying Lee <kuan-ying.lee(a)canonical.com>
Cc: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/timerlist.py | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
--- a/scripts/gdb/linux/timerlist.py~scripts-gdb-fix-timerlist-parsing-issue
+++ a/scripts/gdb/linux/timerlist.py
@@ -87,21 +87,22 @@ def print_cpu(hrtimer_bases, cpu, max_cl
text += "\n"
if constants.LX_CONFIG_TICK_ONESHOT:
- fmts = [(" .{} : {}", 'nohz_mode'),
- (" .{} : {} nsecs", 'last_tick'),
- (" .{} : {}", 'tick_stopped'),
- (" .{} : {}", 'idle_jiffies'),
- (" .{} : {}", 'idle_calls'),
- (" .{} : {}", 'idle_sleeps'),
- (" .{} : {} nsecs", 'idle_entrytime'),
- (" .{} : {} nsecs", 'idle_waketime'),
- (" .{} : {} nsecs", 'idle_exittime'),
- (" .{} : {} nsecs", 'idle_sleeptime'),
- (" .{}: {} nsecs", 'iowait_sleeptime'),
- (" .{} : {}", 'last_jiffies'),
- (" .{} : {}", 'next_timer'),
- (" .{} : {} nsecs", 'idle_expires')]
- text += "\n".join([s.format(f, ts[f]) for s, f in fmts])
+ TS_FLAG_STOPPED = 1 << 1
+ TS_FLAG_NOHZ = 1 << 4
+ text += f" .{'nohz':15s}: {int(bool(ts['flags'] & TS_FLAG_NOHZ))}\n"
+ text += f" .{'last_tick':15s}: {ts['last_tick']}\n"
+ text += f" .{'tick_stopped':15s}: {int(bool(ts['flags'] & TS_FLAG_STOPPED))}\n"
+ text += f" .{'idle_jiffies':15s}: {ts['idle_jiffies']}\n"
+ text += f" .{'idle_calls':15s}: {ts['idle_calls']}\n"
+ text += f" .{'idle_sleeps':15s}: {ts['idle_sleeps']}\n"
+ text += f" .{'idle_entrytime':15s}: {ts['idle_entrytime']} nsecs\n"
+ text += f" .{'idle_waketime':15s}: {ts['idle_waketime']} nsecs\n"
+ text += f" .{'idle_exittime':15s}: {ts['idle_exittime']} nsecs\n"
+ text += f" .{'idle_sleeptime':15s}: {ts['idle_sleeptime']} nsecs\n"
+ text += f" .{'iowait_sleeptime':15s}: {ts['iowait_sleeptime']} nsecs\n"
+ text += f" .{'last_jiffies':15s}: {ts['last_jiffies']}\n"
+ text += f" .{'next_timer':15s}: {ts['next_timer']}\n"
+ text += f" .{'idle_expires':15s}: {ts['idle_expires']} nsecs\n"
text += "\njiffies: {}\n".format(jiffies)
text += "\n"
_
Patches currently in -mm which might be from kuan-ying.lee(a)canonical.com are
scripts-gdb-fix-timerlist-parsing-issue.patch
scripts-gdb-add-iteration-function-for-rbtree.patch
scripts-gdb-fix-lx-mounts-command-error.patch
scripts-gdb-add-lx-stack_depot_lookup-command.patch
scripts-gdb-add-lx-kasan_mem_to_shadow-command.patch
From: Zhang Yi <yi.zhang(a)huawei.com>
commit 4df031ff5876d94b48dd9ee486ba5522382a06b2 upstream
After commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <=
isize"), i_disksize could always be updated to i_size in ext4_setattr(),
and we could sure that i_disksize <= i_size since holding inode lock and
if i_disksize < i_size there are delalloc writes pending in the range
upto i_size. If the end of the current write is <= i_size, there's no
need to touch i_disksize since writeback will push i_disksize upto
i_size eventually. So we can switch to check i_size instead of
i_disksize in ext4_da_write_end() when write to the end of the file.
we also could remove ext4_mark_inode_dirty() together because we defer
inode dirtying to generic_write_end() or ext4_da_write_inline_data_end().
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-2-yi.zhang@huawei.com
Reviewed-by: Cheng Nie <niecheng1(a)uniontech.com>
Signed-off-by: Dandan Zhang <zhangdandan(a)uniontech.com>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
---
fs/ext4/inode.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 646285fbc9fc..d8a8e4ee5ff8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3207,35 +3207,37 @@ static int ext4_da_write_end(struct file *file,
end = start + copied - 1;
/*
- * generic_write_end() will run mark_inode_dirty() if i_size
- * changes. So let's piggyback the i_disksize mark_inode_dirty
- * into that.
+ * Since we are holding inode lock, we are sure i_disksize <=
+ * i_size. We also know that if i_disksize < i_size, there are
+ * delalloc writes pending in the range upto i_size. If the end of
+ * the current write is <= i_size, there's no need to touch
+ * i_disksize since writeback will push i_disksize upto i_size
+ * eventually. If the end of the current write is > i_size and
+ * inside an allocated block (ext4_da_should_update_i_disksize()
+ * check), we need to update i_disksize here as neither
+ * ext4_writepage() nor certain ext4_writepages() paths not
+ * allocating blocks update i_disksize.
+ *
+ * Note that we defer inode dirtying to generic_write_end() /
+ * ext4_da_write_inline_data_end().
*/
new_i_size = pos + copied;
- if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
+ if (copied && new_i_size > inode->i_size) {
if (ext4_has_inline_data(inode) ||
- ext4_da_should_update_i_disksize(page, end)) {
+ ext4_da_should_update_i_disksize(page, end))
ext4_update_i_disksize(inode, new_i_size);
- /* We need to mark inode dirty even if
- * new_i_size is less that inode->i_size
- * bu greater than i_disksize.(hint delalloc)
- */
- ext4_mark_inode_dirty(handle, inode);
- }
}
if (write_mode != CONVERT_INLINE_DATA &&
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
ext4_has_inline_data(inode))
- ret2 = ext4_da_write_inline_data_end(inode, pos, len, copied,
+ ret = ext4_da_write_inline_data_end(inode, pos, len, copied,
page);
else
- ret2 = generic_write_end(file, mapping, pos, len, copied,
+ ret = generic_write_end(file, mapping, pos, len, copied,
page, fsdata);
- copied = ret2;
- if (ret2 < 0)
- ret = ret2;
+ copied = ret;
ret2 = ext4_journal_stop(handle);
if (!ret)
ret = ret2;
--
2.43.4