The patch titled
Subject: mm/memcontrol.c: add missed css_put()
has been removed from the -mm tree. Its filename was
mm-memcontrol-fix-do-not-put-the-css-reference.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm/memcontrol.c: add missed css_put()
We should put the css reference when memory allocation failed.
Link: http://lkml.kernel.org/r/20200614122653.98829-1-songmuchun@bytedance.com
Fixes: f0a3a24b532d ("mm: memcg/slab: rework non-root kmem_cache lifecycle management")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-do-not-put-the-css-reference
+++ a/mm/memcontrol.c
@@ -2772,8 +2772,10 @@ static void memcg_schedule_kmem_cache_cr
return;
cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
- if (!cw)
+ if (!cw) {
+ css_put(&memcg->css);
return;
+ }
cw->memcg = memcg;
cw->cachep = cachep;
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
The patch titled
Subject: mm: memcontrol: handle div0 crash race condition in memory.low
has been removed from the -mm tree. Its filename was
mm-memcontrol-handle-div0-crash-race-condition-in-memorylow.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: memcontrol: handle div0 crash race condition in memory.low
Tejun reports seeing rare div0 crashes in memory.low stress testing:
[37228.504582] RIP: 0010:mem_cgroup_calculate_protection+0xed/0x150
[37228.505059] Code: 0f 46 d1 4c 39 d8 72 57 f6 05 16 d6 42 01 40 74 1f 4c 39 d8 76 1a 4c 39 d1 76 15 4c 29 d1 4c 29 d8 4d 29 d9 31 d2 48 0f af c1 <49> f7 f1 49 01 c2 4c 89 96 38 01 00 00 5d c3 48 0f af c7 31 d2 49
[37228.506254] RSP: 0018:ffffa14e01d6fcd0 EFLAGS: 00010246
[37228.506769] RAX: 000000000243e384 RBX: 0000000000000000 RCX: 0000000000008f4b
[37228.507319] RDX: 0000000000000000 RSI: ffff8b89bee84000 RDI: 0000000000000000
[37228.507869] RBP: ffffa14e01d6fcd0 R08: ffff8b89ca7d40f8 R09: 0000000000000000
[37228.508376] R10: 0000000000000000 R11: 00000000006422f7 R12: 0000000000000000
[37228.508881] R13: ffff8b89d9617000 R14: ffff8b89bee84000 R15: ffffa14e01d6fdb8
[37228.509397] FS: 0000000000000000(0000) GS:ffff8b8a1f1c0000(0000) knlGS:0000000000000000
[37228.509917] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37228.510442] CR2: 00007f93b1fc175b CR3: 000000016100a000 CR4: 0000000000340ea0
[37228.511076] Call Trace:
[37228.511561] shrink_node+0x1e5/0x6c0
[37228.512044] balance_pgdat+0x32d/0x5f0
[37228.512521] kswapd+0x1d7/0x3d0
[37228.513346] ? wait_woken+0x80/0x80
[37228.514170] kthread+0x11c/0x160
[37228.514983] ? balance_pgdat+0x5f0/0x5f0
[37228.515797] ? kthread_park+0x90/0x90
[37228.516593] ret_from_fork+0x1f/0x30
This happens when parent_usage == siblings_protected. We check that usage
is bigger than protected, which should imply parent_usage being bigger
than siblings_protected. However, we don't read (or even update) these
values atomically, and they can be out of sync as the memory state changes
under us. A bit of fluctuation around the target protection isn't a big
deal, but we need to handle the div0 case.
Check the parent state explicitly to make sure we have a reasonable
positive value for the divisor.
Link: http://lkml.kernel.org/r/20200615140658.601684-1-hannes@cmpxchg.org
Fixes: 8a931f801340 ("mm: memcontrol: recursive memory.low protection")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Tejun Heo <tj(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Chris Down <chris(a)chrisdown.name>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-handle-div0-crash-race-condition-in-memorylow
+++ a/mm/memcontrol.c
@@ -6360,11 +6360,16 @@ static unsigned long effective_protectio
* We're using unprotected memory for the weight so that if
* some cgroups DO claim explicit protection, we don't protect
* the same bytes twice.
+ *
+ * Check both usage and parent_usage against the respective
+ * protected values. One should imply the other, but they
+ * aren't read atomically - make sure the division is sane.
*/
if (!(cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_RECURSIVE_PROT))
return ep;
-
- if (parent_effective > siblings_protected && usage > protected) {
+ if (parent_effective > siblings_protected &&
+ parent_usage > siblings_protected &&
+ usage > protected) {
unsigned long unclaimed;
unclaimed = parent_effective - siblings_protected;
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
mm-memcontrol-decouple-reference-counting-from-page-accounting.patch
The patch titled
Subject: mm: fix swap cache node allocation mask
has been removed from the -mm tree. Its filename was
mm-fix-swap-cache-node-allocation-mask.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm: fix swap cache node allocation mask
https://bugzilla.kernel.org/show_bug.cgi?id=208085 reports that a slightly
overcommitted load, testing swap and zram along with i915, splats and
keeps on splatting, when it had better fail less noisily:
gnome-shell: page allocation failure: order:0,
mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE),
nodemask=(null),cpuset=/,mems_allowed=0
CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1
Call Trace:
dump_stack+0x64/0x88
warn_alloc.cold+0x75/0xd9
__alloc_pages_slowpath.constprop.0+0xcfa/0xd30
__alloc_pages_nodemask+0x2df/0x320
alloc_slab_page+0x195/0x310
allocate_slab+0x3c5/0x440
___slab_alloc+0x40c/0x5f0
__slab_alloc+0x1c/0x30
kmem_cache_alloc+0x20e/0x220
xas_nomem+0x28/0x70
add_to_swap_cache+0x321/0x400
__read_swap_cache_async+0x105/0x240
swap_cluster_readahead+0x22c/0x2e0
shmem_swapin+0x8e/0xc0
shmem_swapin_page+0x196/0x740
shmem_getpage_gfp+0x3a2/0xa60
shmem_read_mapping_page_gfp+0x32/0x60
shmem_get_pages+0x155/0x5e0 [i915]
__i915_gem_object_get_pages+0x68/0xa0 [i915]
i915_vma_pin+0x3fe/0x6c0 [i915]
eb_add_vma+0x10b/0x2c0 [i915]
i915_gem_do_execbuffer+0x704/0x3430 [i915]
i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915]
drm_ioctl_kernel+0x86/0xd0 [drm]
drm_ioctl+0x206/0x390 [drm]
ksys_ioctl+0x82/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5b/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported on 5.7, but it goes back really to 3.1: when
shmem_read_mapping_page_gfp() was implemented for use by i915, and
allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but
missed swapin's "& GFP_KERNEL" mask for page tree node allocation in
__read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits
from what page cache uses, but GFP_RECLAIM_MASK is now what's needed.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils
Fixes: 68da9f055755 ("tmpfs: pass gfp to shmem_getpage_gfp")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reported-by: Chris Murphy <lists(a)colorremedies.com>
Analyzed-by: Vlastimil Babka <vbabka(a)suse.cz>
Analyzed-by: Matthew Wilcox <willy(a)infradead.org>
Tested-by: Chris Murphy <lists(a)colorremedies.com>
Cc: <stable(a)vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swap_state.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/swap_state.c~mm-fix-swap-cache-node-allocation-mask
+++ a/mm/swap_state.c
@@ -21,7 +21,7 @@
#include <linux/vmalloc.h>
#include <linux/swap_slots.h>
#include <linux/huge_mm.h>
-
+#include "internal.h"
/*
* swapper_space is a fiction, retained to simplify the path through
@@ -429,7 +429,7 @@ struct page *__read_swap_cache_async(swp
__SetPageSwapBacked(page);
/* May fail (-ENOMEM) if XArray node allocation failed. */
- if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL)) {
+ if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK)) {
put_swap_page(page, entry);
goto fail_unlock;
}
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-vmstat-add-events-for-pmd-based-thp-migration-without-split-fix.patch
The patch titled
Subject: mm/slab: use memzero_explicit() in kzfree()
has been removed from the -mm tree. Its filename was
mm-slab-use-memzero_explicit-in-kzfree.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: mm/slab: use memzero_explicit() in kzfree()
The kzfree() function is normally used to clear some sensitive
information, like encryption keys, in the buffer before freeing it back to
the pool. Memset() is currently used for buffer clearing. However
unlikely, there is still a non-zero probability that the compiler may
choose to optimize away the memory clearing especially if LTO is being
used in the future. To make sure that this optimization will never
happen, memzero_explicit(), which is introduced in v3.18, is now used in
kzfree() to future-proof it.
Link: http://lkml.kernel.org/r/20200616154311.12314-2-longman@redhat.com
Fixes: 3ef0e5ba4673 ("slab: introduce kzfree()")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Cc: James Morris <jmorris(a)namei.org>
Cc: "Serge E. Hallyn" <serge(a)hallyn.com>
Cc: Joe Perches <joe(a)perches.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: "Jason A . Donenfeld" <Jason(a)zx2c4.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/slab_common.c~mm-slab-use-memzero_explicit-in-kzfree
+++ a/mm/slab_common.c
@@ -1726,7 +1726,7 @@ void kzfree(const void *p)
if (unlikely(ZERO_OR_NULL_PTR(mem)))
return;
ks = ksize(mem);
- memset(mem, 0, ks);
+ memzero_explicit(mem, ks);
kfree(mem);
}
EXPORT_SYMBOL(kzfree);
_
Patches currently in -mm which might be from longman(a)redhat.com are
mm-treewide-rename-kzfree-to-kfree_sensitive.patch
sched-mm-optimize-current_gfp_context.patch
The patch titled
Subject: mm, slab: fix sign conversion problem in memcg_uncharge_slab()
has been removed from the -mm tree. Its filename was
mm-slab-fix-sign-conversion-problem-in-memcg_uncharge_slab.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: mm, slab: fix sign conversion problem in memcg_uncharge_slab()
It was found that running the LTP test on a PowerPC system could produce
erroneous values in /proc/meminfo, like:
MemTotal: 531915072 kB
MemFree: 507962176 kB
MemAvailable: 1100020596352 kB
Using bisection, the problem is tracked down to commit 9c315e4d7d8c ("mm:
memcg/slab: cache page number in memcg_(un)charge_slab()").
In memcg_uncharge_slab() with a "int order" argument:
unsigned int nr_pages = 1 << order;
:
mod_lruvec_state(lruvec, cache_vmstat_idx(s), -nr_pages);
The mod_lruvec_state() function will eventually call the
__mod_zone_page_state() which accepts a long argument. Depending on the
compiler and how inlining is done, "-nr_pages" may be treated as a
negative number or a very large positive number. Apparently, it was
treated as a large positive number in that PowerPC system leading to
incorrect stat counts. This problem hasn't been seen in x86-64 yet,
perhaps the gcc compiler there has some slight difference in behavior.
It is fixed by making nr_pages a signed value. For consistency, a similar
change is applied to memcg_charge_slab() as well.
Link: http://lkml.kernel.org/r/20200620184719.10994-1-longman@redhat.com
Fixes: 9c315e4d7d8c ("mm: memcg/slab: cache page number in memcg_(un)charge_slab()").
Signed-off-by: Waiman Long <longman(a)redhat.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/slab.h~mm-slab-fix-sign-conversion-problem-in-memcg_uncharge_slab
+++ a/mm/slab.h
@@ -348,7 +348,7 @@ static __always_inline int memcg_charge_
gfp_t gfp, int order,
struct kmem_cache *s)
{
- unsigned int nr_pages = 1 << order;
+ int nr_pages = 1 << order;
struct mem_cgroup *memcg;
struct lruvec *lruvec;
int ret;
@@ -388,7 +388,7 @@ out:
static __always_inline void memcg_uncharge_slab(struct page *page, int order,
struct kmem_cache *s)
{
- unsigned int nr_pages = 1 << order;
+ int nr_pages = 1 << order;
struct mem_cgroup *memcg;
struct lruvec *lruvec;
_
Patches currently in -mm which might be from longman(a)redhat.com are
mm-treewide-rename-kzfree-to-kfree_sensitive.patch
sched-mm-optimize-current_gfp_context.patch
The patch titled
Subject: ocfs2: fix value of OCFS2_INVALID_SLOT
has been removed from the -mm tree. Its filename was
ocfs2-fix-value-of-ocfs2_invalid_slot.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Junxiao Bi <junxiao.bi(a)oracle.com>
Subject: ocfs2: fix value of OCFS2_INVALID_SLOT
In the ocfs2 disk layout, slot number is 16 bits, but in ocfs2
implementation, slot number is 32 bits. Usually this will not cause any
issue, because slot number is converted from u16 to u32, but
OCFS2_INVALID_SLOT was defined as -1, when an invalid slot number from
disk was obtained, its value was (u16)-1, and it was converted to u32.
Then the following checking in get_local_system_inode will be always
skipped:
static struct inode **get_local_system_inode(struct ocfs2_super *osb,
int type,
u32 slot)
{
BUG_ON(slot == OCFS2_INVALID_SLOT);
...
}
Link: http://lkml.kernel.org/r/20200616183829.87211-5-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/ocfs2_fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ocfs2/ocfs2_fs.h~ocfs2-fix-value-of-ocfs2_invalid_slot
+++ a/fs/ocfs2/ocfs2_fs.h
@@ -290,7 +290,7 @@
#define OCFS2_MAX_SLOTS 255
/* Slot map indicator for an empty slot */
-#define OCFS2_INVALID_SLOT -1
+#define OCFS2_INVALID_SLOT ((u16)-1)
#define OCFS2_VOL_UUID_LEN 16
#define OCFS2_MAX_VOL_LABEL_LEN 64
_
Patches currently in -mm which might be from junxiao.bi(a)oracle.com are
The patch titled
Subject: ocfs2: load global_inode_alloc
has been removed from the -mm tree. Its filename was
ocfs2-load-global_inode_alloc.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Junxiao Bi <junxiao.bi(a)oracle.com>
Subject: ocfs2: load global_inode_alloc
Set global_inode_alloc as OCFS2_FIRST_ONLINE_SYSTEM_INODE, that will make
it load during mount. It can be used to test whether some global/system
inodes are valid. One use case is that nfsd will test whether root inode
is valid.
Link: http://lkml.kernel.org/r/20200616183829.87211-3-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/ocfs2_fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ocfs2/ocfs2_fs.h~ocfs2-load-global_inode_alloc
+++ a/fs/ocfs2/ocfs2_fs.h
@@ -326,8 +326,8 @@ struct ocfs2_system_inode_info {
enum {
BAD_BLOCK_SYSTEM_INODE = 0,
GLOBAL_INODE_ALLOC_SYSTEM_INODE,
+#define OCFS2_FIRST_ONLINE_SYSTEM_INODE GLOBAL_INODE_ALLOC_SYSTEM_INODE
SLOT_MAP_SYSTEM_INODE,
-#define OCFS2_FIRST_ONLINE_SYSTEM_INODE SLOT_MAP_SYSTEM_INODE
HEARTBEAT_SYSTEM_INODE,
GLOBAL_BITMAP_SYSTEM_INODE,
USER_QUOTA_SYSTEM_INODE,
_
Patches currently in -mm which might be from junxiao.bi(a)oracle.com are
The patch titled
Subject: ocfs2: avoid inode removal while nfsd is accessing it
has been removed from the -mm tree. Its filename was
ocfs2-avoid-inode-removed-while-nfsd-access-it.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Junxiao Bi <junxiao.bi(a)oracle.com>
Subject: ocfs2: avoid inode removal while nfsd is accessing it
Patch series "ocfs2: fix nfsd over ocfs2 issues", v2.
This is a series of patches to fix issues on nfsd over ocfs2. patch 1 is
to avoid inode removed while nfsd access it patch 2 & 3 is to fix a panic
issue.
This patch (of 4):
When nfsd is getting file dentry using handle or parent dentry of some
dentry, one cluster lock is used to avoid inode removed from other node,
but it still could be removed from local node, so use a rw lock to avoid
this.
Link: http://lkml.kernel.org/r/20200616183829.87211-1-junxiao.bi@oracle.com
Link: http://lkml.kernel.org/r/20200616183829.87211-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/dlmglue.c | 17 ++++++++++++++++-
fs/ocfs2/ocfs2.h | 1 +
2 files changed, 17 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/dlmglue.c~ocfs2-avoid-inode-removed-while-nfsd-access-it
+++ a/fs/ocfs2/dlmglue.c
@@ -689,6 +689,12 @@ static void ocfs2_nfs_sync_lock_res_init
&ocfs2_nfs_sync_lops, osb);
}
+static void ocfs2_nfs_sync_lock_init(struct ocfs2_super *osb)
+{
+ ocfs2_nfs_sync_lock_res_init(&osb->osb_nfs_sync_lockres, osb);
+ init_rwsem(&osb->nfs_sync_rwlock);
+}
+
void ocfs2_trim_fs_lock_res_init(struct ocfs2_super *osb)
{
struct ocfs2_lock_res *lockres = &osb->osb_trim_fs_lockres;
@@ -2855,6 +2861,11 @@ int ocfs2_nfs_sync_lock(struct ocfs2_sup
if (ocfs2_is_hard_readonly(osb))
return -EROFS;
+ if (ex)
+ down_write(&osb->nfs_sync_rwlock);
+ else
+ down_read(&osb->nfs_sync_rwlock);
+
if (ocfs2_mount_local(osb))
return 0;
@@ -2873,6 +2884,10 @@ void ocfs2_nfs_sync_unlock(struct ocfs2_
if (!ocfs2_mount_local(osb))
ocfs2_cluster_unlock(osb, lockres,
ex ? LKM_EXMODE : LKM_PRMODE);
+ if (ex)
+ up_write(&osb->nfs_sync_rwlock);
+ else
+ up_read(&osb->nfs_sync_rwlock);
}
int ocfs2_trim_fs_lock(struct ocfs2_super *osb,
@@ -3340,7 +3355,7 @@ int ocfs2_dlm_init(struct ocfs2_super *o
local:
ocfs2_super_lock_res_init(&osb->osb_super_lockres, osb);
ocfs2_rename_lock_res_init(&osb->osb_rename_lockres, osb);
- ocfs2_nfs_sync_lock_res_init(&osb->osb_nfs_sync_lockres, osb);
+ ocfs2_nfs_sync_lock_init(osb);
ocfs2_orphan_scan_lock_res_init(&osb->osb_orphan_scan.os_lockres, osb);
osb->cconn = conn;
--- a/fs/ocfs2/ocfs2.h~ocfs2-avoid-inode-removed-while-nfsd-access-it
+++ a/fs/ocfs2/ocfs2.h
@@ -395,6 +395,7 @@ struct ocfs2_super
struct ocfs2_lock_res osb_super_lockres;
struct ocfs2_lock_res osb_rename_lockres;
struct ocfs2_lock_res osb_nfs_sync_lockres;
+ struct rw_semaphore nfs_sync_rwlock;
struct ocfs2_lock_res osb_trim_fs_lockres;
struct mutex obs_trim_fs_mutex;
struct ocfs2_dlm_debug *osb_dlm_debug;
_
Patches currently in -mm which might be from junxiao.bi(a)oracle.com are
The patch titled
Subject: mm, compaction: make capture control handling safe wrt interrupts
has been removed from the -mm tree. Its filename was
mm-compaction-make-capture-control-handling-safe-wrt-interrupts.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, compaction: make capture control handling safe wrt interrupts
Hugh reports:
: While stressing compaction, one run oopsed on NULL capc->cc in
: __free_one_page()'s task_capc(zone): compact_zone_order() had been
: interrupted, and a page was being freed in the return from interrupt.
:
: Though you would not expect it from the source, both gccs I was using (a
: 4.8.1 and a 7.5.0) had chosen to compile compact_zone_order() with the
: ".cc = &cc" implemented by mov %rbx,-0xb0(%rbp) immediately before callq
: compact_zone - long after the "current->capture_control = &capc". An
: interrupt in between those finds capc->cc NULL (zeroed by an earlier rep
: stos).
:
: This could presumably be fixed by a barrier() before setting
: current->capture_control in compact_zone_order(); but would also need more
: care on return from compact_zone(), in order not to risk leaking a page
: captured by interrupt just before capture_control is reset.
:
: Maybe that is the preferable fix, but I felt safer for task_capc() to
: exclude the rather surprising possibility of capture at interrupt time.
I have checked that gcc10 also behaves the same.
The advantage of fix in compact_zone_order() is that we don't add another
test in the page freeing hot path, and that it might prevent future
problems if we stop exposing pointers to uninitialized structures in
current task.
So this patch implements the suggestion for compact_zone_order() with
barrier() (and WRITE_ONCE() to prevent store tearing) for setting
current->capture_control, and prevents page leaking with
WRITE_ONCE/READ_ONCE in the proper order.
Link: http://lkml.kernel.org/r/20200616082649.27173-1-vbabka@suse.cz
Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Hugh Dickins <hughd(a)google.com>
Suggested-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Cc: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Li Wang <liwang(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org> [5.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
--- a/mm/compaction.c~mm-compaction-make-capture-control-handling-safe-wrt-interrupts
+++ a/mm/compaction.c
@@ -2316,15 +2316,26 @@ static enum compact_result compact_zone_
.page = NULL,
};
- current->capture_control = &capc;
+ /*
+ * Make sure the structs are really initialized before we expose the
+ * capture control, in case we are interrupted and the interrupt handler
+ * frees a page.
+ */
+ barrier();
+ WRITE_ONCE(current->capture_control, &capc);
ret = compact_zone(&cc, &capc);
VM_BUG_ON(!list_empty(&cc.freepages));
VM_BUG_ON(!list_empty(&cc.migratepages));
- *capture = capc.page;
- current->capture_control = NULL;
+ /*
+ * Make sure we hide capture control first before we read the captured
+ * page pointer, otherwise an interrupt could free and capture a page
+ * and we would leak it.
+ */
+ WRITE_ONCE(current->capture_control, NULL);
+ *capture = READ_ONCE(capc.page);
return ret;
}
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-slub-extend-slub_debug-syntax-for-multiple-blocks.patch
mm-slub-make-some-slub_debug-related-attributes-read-only.patch
mm-slub-remove-runtime-allocation-order-changes.patch
mm-slub-make-remaining-slub_debug-related-attributes-read-only.patch
mm-slub-make-reclaim_account-attribute-read-only.patch
mm-slub-introduce-static-key-for-slub_debug.patch
mm-slub-introduce-kmem_cache_debug_flags.patch
mm-slub-introduce-kmem_cache_debug_flags-fix.patch
mm-slub-extend-checks-guarded-by-slub_debug-static-key.patch
mm-slab-slub-move-and-improve-cache_from_obj.patch
mm-slab-slub-improve-error-reporting-and-overhead-of-cache_from_obj.patch
mm-slab-slub-improve-error-reporting-and-overhead-of-cache_from_obj-fix.patch
mm-page_alloc-use-unlikely-in-task_capc.patch