On Wed, Jan 8, 2025 at 4:12 PM Sridhar, Kanchana P kanchana.p.sridhar@intel.com wrote:
-----Original Message-----
From: Yosry Ahmed yosryahmed@google.com
Sent: Wednesday, January 8, 2025 2:25 PM
To: Andrew Morton akpm@linux-foundation.org
Cc: Johannes Weiner hannes@cmpxchg.org; Nhat Pham nphamcs@gmail.com; Chengming Zhou chengming.zhou@linux.dev; Vitaly Wool vitalywool@gmail.com; Barry Song baohua@kernel.org; Sam Sun samsun1006219@gmail.com; Sridhar, Kanchana P kanchana.p.sridhar@intel.com; linux-mm@kvack.org; linux-kernel@vger.kernel.org; Yosry Ahmed yosryahmed@google.com; stable@vger.kernel.org
Subject: [PATCH v2] mm: zswap: properly synchronize freeing resources during CPU hotunplug
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the current CPU at the beginning of the operation is retrieved and used throughout. However, since neither preemption nor migration is disabled, it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use, we run into a UAF bug as some of the resources attached to the acomp_ctx are freed during hotunplug in zswap_cpu_comp_dead() (i.e. acomp_ctx.buffer, acomp_ctx.req, or acomp_ctx.acomp).
The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration") when the switch to the crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was retrieved using get_cpu_ptr() which disables preemption and makes sure the CPU cannot go away from under us. Preemption cannot be disabled with the crypto_acomp API as a sleepable context is needed.
Use the acomp_ctx.mutex to synchronize CPU hotplug callbacks allocating and freeing resources with compression/decompression paths. Make sure that acomp_ctx.req is NULL when the resources are freed. In the compression/decompression paths, check if acomp_ctx.req is NULL after acquiring the mutex (meaning the CPU was offlined) and retry on the new CPU.
The initialization of acomp_ctx.mutex is moved from the CPU hotplug callback to the pool initialization where it belongs (where the mutex is allocated). In addition to adding clarity, this makes sure that CPU hotplug cannot reinitialize a mutex that is already locked by compression/decompression.
Previously, a fix was attempted by holding cpus_read_lock() [1]. This would have introduced a potential deadlock, as code already holding the lock can fall into reclaim and enter zswap. A fix using SRCU for synchronization was also attempted, but Johannes pointed out that synchronize_srcu() cannot be used in CPU hotplug notifiers [2].
Alternative fixes that were considered/attempted and could have worked:
- Refcounting the per-CPU acomp_ctx. This involves complexity in handling the race between the refcount dropping to zero in zswap_[de]compress() and the refcount being re-initialized when the CPU is onlined.
- Disabling migration before getting the per-CPU acomp_ctx [3], but that's discouraged and is a much bigger hammer than needed, and could result in subtle performance issues.
[1] https://lkml.kernel.org/20241219212437.2714151-1-yosryahmed@google.com/
[2] https://lkml.kernel.org/20250107074724.1756696-2-yosryahmed@google.com/
[3] https://lkml.kernel.org/20250107222236.2715883-2-yosryahmed@google.com/
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed yosryahmed@google.com
Reported-by: Johannes Weiner hannes@cmpxchg.org
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun samsun1006219@gmail.com
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/
This applies on top of the latest mm-hotfixes-unstable, with 'Revert "mm: zswap: fix race between [de]compression and CPU hotunplug"' applied and with 'mm: zswap: disable migration while using per-CPU acomp_ctx' dropped.
v1 -> v2:
- Move the initialization of the mutex to pool initialization.
- Use the mutex to also synchronize with the CPU hotplug callback (i.e. zswap_cpu_comp_prep()).
- Naming cleanups.
mm/zswap.c | 60 +++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 46 insertions(+), 14 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb236..4d7e564732267 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -251,7 +251,7 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 	struct zswap_pool *pool;
 	char name[38]; /* 'zswap' + 32 char (max) num + \0 */
 	gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
-	int ret;
+	int ret, cpu;
 
 	if (!zswap_has_pool) {
 		/* if either are unset, pool initialization failed, and we
@@ -285,6 +285,9 @@ static struct zswap_pool *zswap_pool_create(char *type, char *compressor)
 		goto error;
 	}
 
+	for_each_possible_cpu(cpu)
+		mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
+
 	ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
 				       &pool->node);
 	if (ret)
@@ -821,11 +824,12 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 	struct acomp_req *req;
 	int ret;
 
-	mutex_init(&acomp_ctx->mutex);
-
+	mutex_lock(&acomp_ctx->mutex);
 	acomp_ctx->buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
-	if (!acomp_ctx->buffer)
-		return -ENOMEM;
+	if (!acomp_ctx->buffer) {
+		ret = -ENOMEM;
+		goto buffer_fail;
+	}
 
 	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
 	if (IS_ERR(acomp)) {
@@ -844,6 +848,8 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 		ret = -ENOMEM;
 		goto req_fail;
 	}
+
+	/* acomp_ctx->req must be NULL if the acomp_ctx is not fully initialized */
 	acomp_ctx->req = req;
For this to happen, shouldn't we directly assign:

	acomp_ctx->req = acomp_request_alloc(acomp_ctx->acomp);
	if (!acomp_ctx->req) { ... }
For the initial zswap_cpu_comp_prepare() call on a CPU, it doesn't matter because a failure will cause a failure of initialization or CPU onlining. For calls after a CPU is offlined, zswap_cpu_comp_dead() will have set acomp_ctx->req to NULL, so leaving it unmodified keeps it as NULL.
Although I suppose the comment is not really clear because zswap_cpu_comp_prepare() could fail before acomp_request_alloc() is ever called and this would not be a problem (as the onlining will fail).
Maybe it's best to just drop the comment.
Andrew, could you please fold this in?
diff --git a/mm/zswap.c b/mm/zswap.c
index 4d7e564732267..30f5a27a68620 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -848,8 +848,6 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 		ret = -ENOMEM;
 		goto req_fail;
 	}
-
-	/* acomp_ctx->req must be NULL if the acomp_ctx is not fully initialized */
 	acomp_ctx->req = req;
 
 	crypto_init_wait(&acomp_ctx->wait);
I was wondering how error conditions encountered in zswap_cpu_comp_prepare() impact zswap_[de]compress(). This is probably unrelated to this patch itself, but is my understanding correct that an error in this procedure will cause zswap_enabled to be set to false, which will make any zswap_store() calls fail early?
Yeah, that is my understanding for the initial call. For later calls, when a CPU gets onlined, I think the onlining fails and the acomp_ctx is not used.