The patch titled
Subject: z3fold: fix memory leak in kmem cache
has been removed from the -mm tree. Its filename was
z3fold-fix-memory-leak-in-kmem-cache.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vitaly Wool <vitalywool@gmail.com>
Subject: z3fold: fix memory leak in kmem cache
Currently there is a leak in init_z3fold_page() -- it allocates handles
from the kmem cache even for headless pages, but those handles are then
never used and never freed, so the kmem cache may eventually get
exhausted. This patch provides a fix for that.
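For reference, the resulting allocation order looks roughly like this (a
sketch condensed from the diff below; the page-flag setup and the
remaining header initialization are elided):

    static struct z3fold_header *init_z3fold_page(struct page *page,
                    bool headless, struct z3fold_pool *pool, gfp_t gfp)
    {
        struct z3fold_header *zhdr = page_address(page);
        struct z3fold_buddy_slots *slots;

        /* ... page-flag initialization ... */
        if (headless)
            return zhdr;    /* headless: no handle slots, nothing to leak */

        slots = alloc_slots(pool, gfp); /* allocated only when needed */
        if (!slots)
            return NULL;
        /* ... remaining header and slots initialization ... */
        return zhdr;
    }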
Link: http://lkml.kernel.org/r/20190917185352.44cf285d3ebd9e64548de5de@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Tested-by: Markus Linnala <markus.linnala@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/z3fold.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
--- a/mm/z3fold.c~z3fold-fix-memory-leak-in-kmem-cache
+++ a/mm/z3fold.c
@@ -295,14 +295,11 @@ static void z3fold_unregister_migration(
}
/* Initializes the z3fold header of a newly allocated z3fold page */
-static struct z3fold_header *init_z3fold_page(struct page *page,
+static struct z3fold_header *init_z3fold_page(struct page *page, bool headless,
struct z3fold_pool *pool, gfp_t gfp)
{
struct z3fold_header *zhdr = page_address(page);
- struct z3fold_buddy_slots *slots = alloc_slots(pool, gfp);
-
- if (!slots)
- return NULL;
+ struct z3fold_buddy_slots *slots;
INIT_LIST_HEAD(&page->lru);
clear_bit(PAGE_HEADLESS, &page->private);
@@ -310,6 +307,12 @@ static struct z3fold_header *init_z3fold
clear_bit(NEEDS_COMPACTING, &page->private);
clear_bit(PAGE_STALE, &page->private);
clear_bit(PAGE_CLAIMED, &page->private);
+ if (headless)
+ return zhdr;
+
+ slots = alloc_slots(pool, gfp);
+ if (!slots)
+ return NULL;
spin_lock_init(&zhdr->page_lock);
kref_init(&zhdr->refcount);
@@ -930,7 +933,7 @@ retry:
if (!page)
return -ENOMEM;
- zhdr = init_z3fold_page(page, pool, gfp);
+ zhdr = init_z3fold_page(page, bud == HEADLESS, pool, gfp);
if (!zhdr) {
__free_page(page);
return -ENOMEM;
_
Patches currently in -mm which might be from vitalywool@gmail.com are
The patch titled
Subject: z3fold: fix retry mechanism in page reclaim
has been removed from the -mm tree. Its filename was
z3fold-fix-retry-mechanism-in-page-reclaim.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vitaly Wool <vitalywool@gmail.com>
Subject: z3fold: fix retry mechanism in page reclaim
z3fold_reclaim_page()'s retry mechanism is broken: on a second iteration
it still carries zhdr over from the first one, so zhdr is no longer in
line with the struct page being examined. That leads to crashes when the
system is stressed.
Fix that by moving the zhdr assignment up.
While at it, protect against using already freed handles by having
z3fold_reclaim_page() use a local slots structure of its own.
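Condensed from the diff below, the per-page body of the reclaim loop now
looks roughly as follows (locking and the eviction calls are elided);
note that zhdr is re-derived on every iteration and that handles are
encoded into an on-stack slots structure:

    struct z3fold_buddy_slots slots;    /* on stack, so still usable even
                                           after z3fold_free() zeroes
                                           zhdr->slots */

    /* for each page on pool->lru, oldest first: */
    if (test_and_set_bit(PAGE_CLAIMED, &page->private)) {
        page = NULL;    /* claimed by free: drop the stale pointer */
        continue;
    }
    if (unlikely(PageIsolated(page))) {
        clear_bit(PAGE_CLAIMED, &page->private);
        page = NULL;
        continue;
    }
    zhdr = page_address(page);  /* refreshed for the current page */
    if (test_bit(PAGE_HEADLESS, &page->private))
        break;

    /* ... later, with the page locked: */
    if (zhdr->first_chunks)
        first_handle = __encode_handle(zhdr, &slots, FIRST);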
Link: http://lkml.kernel.org/r/20190908162919.830388dc7404d1e2c80f4095@gmail.com
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Reported-by: Markus Linnala <markus.linnala@gmail.com>
Reported-by: Chris Murphy <bugzilla@colorremedies.com>
Reported-by: Agustín Dall'Alba <agustin@dallalba.com.ar>
Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/z3fold.c | 49 ++++++++++++++++++++++++++++++++++---------------
1 file changed, 34 insertions(+), 15 deletions(-)
--- a/mm/z3fold.c~z3fold-fix-retry-mechanism-in-page-reclaim
+++ a/mm/z3fold.c
@@ -366,9 +366,10 @@ static inline int __idx(struct z3fold_he
* Encodes the handle of a particular buddy within a z3fold page
* Pool lock should be held as this function accesses first_num
*/
-static unsigned long encode_handle(struct z3fold_header *zhdr, enum buddy bud)
+static unsigned long __encode_handle(struct z3fold_header *zhdr,
+ struct z3fold_buddy_slots *slots,
+ enum buddy bud)
{
- struct z3fold_buddy_slots *slots;
unsigned long h = (unsigned long)zhdr;
int idx = 0;
@@ -385,11 +386,15 @@ static unsigned long encode_handle(struc
if (bud == LAST)
h |= (zhdr->last_chunks << BUDDY_SHIFT);
- slots = zhdr->slots;
slots->slot[idx] = h;
return (unsigned long)&slots->slot[idx];
}
+static unsigned long encode_handle(struct z3fold_header *zhdr, enum buddy bud)
+{
+ return __encode_handle(zhdr, zhdr->slots, bud);
+}
+
/* Returns the z3fold page where a given handle is stored */
static inline struct z3fold_header *handle_to_z3fold_header(unsigned long h)
{
@@ -624,6 +629,7 @@ static void do_compact_page(struct z3fol
}
if (unlikely(PageIsolated(page) ||
+ test_bit(PAGE_CLAIMED, &page->private) ||
test_bit(PAGE_STALE, &page->private))) {
z3fold_page_unlock(zhdr);
return;
@@ -1100,6 +1106,7 @@ static int z3fold_reclaim_page(struct z3
struct z3fold_header *zhdr = NULL;
struct page *page = NULL;
struct list_head *pos;
+ struct z3fold_buddy_slots slots;
unsigned long first_handle = 0, middle_handle = 0, last_handle = 0;
spin_lock(&pool->lock);
@@ -1118,16 +1125,22 @@ static int z3fold_reclaim_page(struct z3
/* this bit could have been set by free, in which case
* we pass over to the next page in the pool.
*/
- if (test_and_set_bit(PAGE_CLAIMED, &page->private))
+ if (test_and_set_bit(PAGE_CLAIMED, &page->private)) {
+ page = NULL;
continue;
+ }
- if (unlikely(PageIsolated(page)))
+ if (unlikely(PageIsolated(page))) {
+ clear_bit(PAGE_CLAIMED, &page->private);
+ page = NULL;
continue;
+ }
+ zhdr = page_address(page);
if (test_bit(PAGE_HEADLESS, &page->private))
break;
- zhdr = page_address(page);
if (!z3fold_page_trylock(zhdr)) {
+ clear_bit(PAGE_CLAIMED, &page->private);
zhdr = NULL;
continue; /* can't evict at this point */
}
@@ -1145,26 +1158,30 @@ static int z3fold_reclaim_page(struct z3
if (!test_bit(PAGE_HEADLESS, &page->private)) {
/*
- * We need encode the handles before unlocking, since
- * we can race with free that will set
- * (first|last)_chunks to 0
+ * We need encode the handles before unlocking, and
+ * use our local slots structure because z3fold_free
+ * can zero out zhdr->slots and we can't do much
+ * about that
*/
first_handle = 0;
last_handle = 0;
middle_handle = 0;
if (zhdr->first_chunks)
- first_handle = encode_handle(zhdr, FIRST);
+ first_handle = __encode_handle(zhdr, &slots,
+ FIRST);
if (zhdr->middle_chunks)
- middle_handle = encode_handle(zhdr, MIDDLE);
+ middle_handle = __encode_handle(zhdr, &slots,
+ MIDDLE);
if (zhdr->last_chunks)
- last_handle = encode_handle(zhdr, LAST);
+ last_handle = __encode_handle(zhdr, &slots,
+ LAST);
/*
* it's safe to unlock here because we hold a
* reference to this page
*/
z3fold_page_unlock(zhdr);
} else {
- first_handle = encode_handle(zhdr, HEADLESS);
+ first_handle = __encode_handle(zhdr, &slots, HEADLESS);
last_handle = middle_handle = 0;
}
@@ -1194,9 +1211,9 @@ next:
spin_lock(&pool->lock);
list_add(&page->lru, &pool->lru);
spin_unlock(&pool->lock);
+ clear_bit(PAGE_CLAIMED, &page->private);
} else {
z3fold_page_lock(zhdr);
- clear_bit(PAGE_CLAIMED, &page->private);
if (kref_put(&zhdr->refcount,
release_z3fold_page_locked)) {
atomic64_dec(&pool->pages_nr);
@@ -1211,6 +1228,7 @@ next:
list_add(&page->lru, &pool->lru);
spin_unlock(&pool->lock);
z3fold_page_unlock(zhdr);
+ clear_bit(PAGE_CLAIMED, &page->private);
}
/* We started off locked to we need to lock the pool back */
@@ -1315,7 +1333,8 @@ static bool z3fold_page_isolate(struct p
VM_BUG_ON_PAGE(!PageMovable(page), page);
VM_BUG_ON_PAGE(PageIsolated(page), page);
- if (test_bit(PAGE_HEADLESS, &page->private))
+ if (test_bit(PAGE_HEADLESS, &page->private) ||
+ test_bit(PAGE_CLAIMED, &page->private))
return false;
zhdr = page_address(page);
_
Patches currently in -mm which might be from vitalywool@gmail.com are
The patch titled
Subject: Revert "mm/z3fold.c: fix race between migration and destruction"
has been removed from the -mm tree. Its filename was
revert-mm-z3foldc-fix-race-between-migration-and-destruction.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vitaly Wool <vitalywool@gmail.com>
Subject: Revert "mm/z3fold.c: fix race between migration and destruction"
With the original commit applied, z3fold_zpool_destroy() may get blocked
on wait_event() for an indefinite time. Revert this commit for the time
being to get rid of that problem, since the issue the original commit
addresses is less severe.
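For context, the reverted code (quoted in full in the diff below) blocks
in z3fold_destroy_pool() as sketched here; as written, the wait condition
only becomes true while isolated pages still exist, so a pool that never
isolates another page waits forever:

    spin_lock(&pool->lock);
    pool->destroying = true;
    spin_unlock(&pool->lock);
    /* pool_isolated_are_drained() returns pool->isolated == 0, so this
     * waits until pages *are* isolated -- indefinitely on an idle pool */
    wait_event(pool->isolate_wait, !pool_isolated_are_drained(pool));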
Link: http://lkml.kernel.org/r/20190910123142.7a9c8d2de4d0acbc0977c602@gmail.com
Fixes: d776aaa9895eb6eb77 ("mm/z3fold.c: fix race between migration and destruction")
Reported-by: Agustín Dall'Alba <agustin@dallalba.com.ar>
Signed-off-by: Vitaly Wool <vitalywool@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/z3fold.c | 90 --------------------------------------------------
1 file changed, 90 deletions(-)
--- a/mm/z3fold.c~revert-mm-z3foldc-fix-race-between-migration-and-destruction
+++ a/mm/z3fold.c
@@ -41,7 +41,6 @@
#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
-#include <linux/wait.h>
#include <linux/zpool.h>
#include <linux/magic.h>
@@ -146,8 +145,6 @@ struct z3fold_header {
* @release_wq: workqueue for safe page release
* @work: work_struct for safe page release
* @inode: inode for z3fold pseudo filesystem
- * @destroying: bool to stop migration once we start destruction
- * @isolated: int to count the number of pages currently in isolation
*
* This structure is allocated at pool creation time and maintains metadata
* pertaining to a particular z3fold pool.
@@ -166,11 +163,8 @@ struct z3fold_pool {
const struct zpool_ops *zpool_ops;
struct workqueue_struct *compact_wq;
struct workqueue_struct *release_wq;
- struct wait_queue_head isolate_wait;
struct work_struct work;
struct inode *inode;
- bool destroying;
- int isolated;
};
/*
@@ -775,7 +769,6 @@ static struct z3fold_pool *z3fold_create
goto out_c;
spin_lock_init(&pool->lock);
spin_lock_init(&pool->stale_lock);
- init_waitqueue_head(&pool->isolate_wait);
pool->unbuddied = __alloc_percpu(sizeof(struct list_head)*NCHUNKS, 2);
if (!pool->unbuddied)
goto out_pool;
@@ -815,15 +808,6 @@ out:
return NULL;
}
-static bool pool_isolated_are_drained(struct z3fold_pool *pool)
-{
- bool ret;
-
- spin_lock(&pool->lock);
- ret = pool->isolated == 0;
- spin_unlock(&pool->lock);
- return ret;
-}
/**
* z3fold_destroy_pool() - destroys an existing z3fold pool
* @pool: the z3fold pool to be destroyed
@@ -833,22 +817,6 @@ static bool pool_isolated_are_drained(st
static void z3fold_destroy_pool(struct z3fold_pool *pool)
{
kmem_cache_destroy(pool->c_handle);
- /*
- * We set pool-> destroying under lock to ensure that
- * z3fold_page_isolate() sees any changes to destroying. This way we
- * avoid the need for any memory barriers.
- */
-
- spin_lock(&pool->lock);
- pool->destroying = true;
- spin_unlock(&pool->lock);
-
- /*
- * We need to ensure that no pages are being migrated while we destroy
- * these workqueues, as migration can queue work on either of the
- * workqueues.
- */
- wait_event(pool->isolate_wait, !pool_isolated_are_drained(pool));
/*
* We need to destroy pool->compact_wq before pool->release_wq,
@@ -1339,28 +1307,6 @@ static u64 z3fold_get_pool_size(struct z
return atomic64_read(&pool->pages_nr);
}
-/*
- * z3fold_dec_isolated() expects to be called while pool->lock is held.
- */
-static void z3fold_dec_isolated(struct z3fold_pool *pool)
-{
- assert_spin_locked(&pool->lock);
- VM_BUG_ON(pool->isolated <= 0);
- pool->isolated--;
-
- /*
- * If we have no more isolated pages, we have to see if
- * z3fold_destroy_pool() is waiting for a signal.
- */
- if (pool->isolated == 0 && waitqueue_active(&pool->isolate_wait))
- wake_up_all(&pool->isolate_wait);
-}
-
-static void z3fold_inc_isolated(struct z3fold_pool *pool)
-{
- pool->isolated++;
-}
-
static bool z3fold_page_isolate(struct page *page, isolate_mode_t mode)
{
struct z3fold_header *zhdr;
@@ -1387,34 +1333,6 @@ static bool z3fold_page_isolate(struct p
spin_lock(&pool->lock);
if (!list_empty(&page->lru))
list_del(&page->lru);
- /*
- * We need to check for destruction while holding pool->lock, as
- * otherwise destruction could see 0 isolated pages, and
- * proceed.
- */
- if (unlikely(pool->destroying)) {
- spin_unlock(&pool->lock);
- /*
- * If this page isn't stale, somebody else holds a
- * reference to it. Let't drop our refcount so that they
- * can call the release logic.
- */
- if (unlikely(kref_put(&zhdr->refcount,
- release_z3fold_page_locked))) {
- /*
- * If we get here we have kref problems, so we
- * should freak out.
- */
- WARN(1, "Z3fold is experiencing kref problems\n");
- z3fold_page_unlock(zhdr);
- return false;
- }
- z3fold_page_unlock(zhdr);
- return false;
- }
-
-
- z3fold_inc_isolated(pool);
spin_unlock(&pool->lock);
z3fold_page_unlock(zhdr);
return true;
@@ -1483,10 +1401,6 @@ static int z3fold_page_migrate(struct ad
queue_work_on(new_zhdr->cpu, pool->compact_wq, &new_zhdr->work);
- spin_lock(&pool->lock);
- z3fold_dec_isolated(pool);
- spin_unlock(&pool->lock);
-
page_mapcount_reset(page);
put_page(page);
return 0;
@@ -1506,14 +1420,10 @@ static void z3fold_page_putback(struct p
INIT_LIST_HEAD(&page->lru);
if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) {
atomic64_dec(&pool->pages_nr);
- spin_lock(&pool->lock);
- z3fold_dec_isolated(pool);
- spin_unlock(&pool->lock);
return;
}
spin_lock(&pool->lock);
list_add(&page->lru, &pool->lru);
- z3fold_dec_isolated(pool);
spin_unlock(&pool->lock);
z3fold_page_unlock(zhdr);
}
_
Patches currently in -mm which might be from vitalywool@gmail.com are
The patch titled
Subject: kernel/sysctl.c: do not override max_threads provided by userspace
has been added to the -mm tree. Its filename is
kernel-sysctlc-do-not-override-max_threads-provided-by-userspace.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/kernel-sysctlc-do-not-override-max…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/kernel-sysctlc-do-not-override-max…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Michal Hocko <mhocko@suse.com>
Subject: kernel/sysctl.c: do not override max_threads provided by userspace
Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe
limits") because that patch causes a regression for any workload which
needs to override the kernel's auto-tuning of the limit.
set_max_threads() implements a boot-time guesstimate to provide a
sensible limit on concurrently running threads so that runaways will not
deplete all the memory. This is a good thing in general, but there are
workloads which might need to increase this limit for an application to
run (reportedly WebSphere MQ is affected), and that is simply not
possible after the mentioned change. It is also very dubious to override
an admin decision by an estimate that has no direct relation to the
correctness of kernel operation.
Fix this by dropping set_max_threads() from sysctl_max_threads() so any
value is accepted as long as it fits into MAX_THREADS, which is important
to check because allowing more threads could break the internal robust
futex restriction. While at it, do not use MIN_THREADS as the lower
boundary, because it too is only a heuristic for the automatic
estimation, and an admin might have a good reason to prevent new threads
from being created even below this limit.
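For illustration, with this patch applied an admin can again raise the
limit past the guesstimate, e.g. via "sysctl -w kernel.threads-max=400000"
or the equivalent minimal userspace sketch below (400000 is an arbitrary
example value; requires root):

    #include <stdio.h>

    int main(void)
    {
        /* write a new limit to the threads-max sysctl */
        FILE *f = fopen("/proc/sys/kernel/threads-max", "w");
        if (!f) {
            perror("/proc/sys/kernel/threads-max");
            return 1;
        }
        fprintf(f, "%d\n", 400000);
        fclose(f);
        return 0;
    }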
This became more severe when x86 switched from 8k to 16k kernel stacks:
since 6538b8ea886e ("x86_64: expand kernel stack to 16K") (3.16) we use
THREAD_SIZE_ORDER = 2, and that halved the auto-tuned value.
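For reference, the auto-tuning (set_max_threads() in kernel/fork.c,
simplified here; the overflow check is elided) sizes the limit so that
kernel stacks can consume at most one eighth of RAM:

    threads = div64_u64((u64)nr_pages * (u64)PAGE_SIZE,
                        (u64)THREAD_SIZE * 8UL);
    max_threads = clamp_t(u64, threads, MIN_THREADS, MAX_THREADS);

Doubling THREAD_SIZE therefore halves the guesstimate.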
In the particular case:
  3.12: kernel.threads-max = 515561
  4.4:  kernel.threads-max = 200000
Neither of the two values is really insane on a 32GB machine.
I am not sure we want/need to tune the max_threads value further. If
anything the tuning should be removed altogether if proven not useful in
general. But we definitely need a way to override this auto-tuning.
Link: http://lkml.kernel.org/r/20190922065801.GB18814@dhcp22.suse.cz
Fixes: 16db3d3f1170 ("kernel/sysctl.c: threads-max observe limits")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
kernel/fork.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/fork.c~kernel-sysctlc-do-not-override-max_threads-provided-by-userspace
+++ a/kernel/fork.c
@@ -2920,7 +2920,7 @@ int sysctl_max_threads(struct ctl_table
struct ctl_table t;
int ret;
int threads = max_threads;
- int min = MIN_THREADS;
+ int min = 1;
int max = MAX_THREADS;
t = *table;
@@ -2932,7 +2932,7 @@ int sysctl_max_threads(struct ctl_table
if (ret || !write)
return ret;
- set_max_threads(threads);
+ max_threads = threads;
return 0;
}
_
Patches currently in -mm which might be from mhocko@suse.com are
mm-oom-consider-present-pages-for-the-node-size.patch
memcg-kmem-deprecate-kmemlimit_in_bytes.patch
memcg-kmem-do-not-fail-__gfp_nofail-charges.patch
kernel-sysctlc-do-not-override-max_threads-provided-by-userspace.patch