On Wed, Jan 8, 2025 at 11:47 AM Barry Song <baohua@kernel.org> wrote:
On Wed, Jan 8, 2025 at 11:22 AM Yosry Ahmed <yosryahmed@google.com> wrote:
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the current CPU at the beginning of the operation is retrieved and used throughout. However, since neither preemption nor migration are disabled, it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use, we run into a UAF bug as the resources attached to the acomp_ctx are freed during hotunplug in zswap_cpu_comp_dead().
The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration") when the switch to the crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was retrieved using get_cpu_ptr() which disables preemption and makes sure the CPU cannot go away from under us. Preemption cannot be disabled with the crypto_acomp API as a sleepable context is needed.
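Roughly, the difference looks like this (a simplified sketch, not the exact code of either version):

	/*
	 * Old pattern: get_cpu_ptr() disables preemption, so the task stays on
	 * this CPU and the CPU cannot be offlined under us, but nothing in the
	 * section is allowed to sleep.
	 */
	tfm = *get_cpu_ptr(pool->tfm);
	/* ... compress without sleeping ... */
	put_cpu_ptr(pool->tfm);

	/*
	 * Current pattern: raw_cpu_ptr() provides no such protection, and both
	 * the mutex and the acomp request may sleep, so the task can migrate
	 * and the original CPU can be hotunplugged while acomp_ctx is in use.
	 */
	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
	mutex_lock(&acomp_ctx->mutex);
	/* ... crypto_acomp_compress() may wait ... */
	mutex_unlock(&acomp_ctx->mutex);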
Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to per-acomp_ctx") increased the UAF surface area by making the per-CPU buffers dynamic, adding yet another resource that can be freed from under zswap compression/decompression by CPU hotunplug.
This cannot be fixed by holding cpus_read_lock(), as it is possible for code already holding the lock to fall into reclaim and enter zswap (causing a deadlock). It also cannot be fixed by wrapping the usage of acomp_ctx in an SRCU critical section and using synchronize_srcu() in zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed in CPU-hotplug notifiers (see Documentation/RCU/Design/Requirements/Requirements.rst).
This can be fixed by refcounting the acomp_ctx, but it involves complexity in handling the race between the refcount dropping to zero in zswap_[de]compress() and the refcount being re-initialized when the CPU is onlined.
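A rough sketch of that alternative (hypothetical, not part of this patch; it assumes a refcount_t refcnt field added to struct crypto_acomp_ctx, and acomp_ctx_free() is a made-up teardown helper):

static struct crypto_acomp_ctx *acomp_ctx_get(struct crypto_acomp_ctx __percpu *pcpu)
{
	struct crypto_acomp_ctx *ctx = raw_cpu_ptr(pcpu);

	/*
	 * Fails if zswap_cpu_comp_dead() already dropped the last reference;
	 * the caller would have to retry, and that retry racing with the
	 * refcount being re-initialized when the CPU comes back online is the
	 * complexity referred to above.
	 */
	if (!refcount_inc_not_zero(&ctx->refcnt))
		return NULL;
	return ctx;
}

static void acomp_ctx_put(struct crypto_acomp_ctx *ctx)
{
	if (refcount_dec_and_test(&ctx->refcnt))
		acomp_ctx_free(ctx);	/* hypothetical teardown helper */
}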
Keep things simple for now and just disable migration while using the per-CPU acomp_ctx to block CPU hotunplug until the usage is over.
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg...
 mm/zswap.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb236..ecd86153e8a32 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
 	return 0;
 }
+/* Remain on the CPU while using its acomp_ctx to stop it from going offline */
+static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *acomp_ctx)
+{
+	migrate_disable();
I'm not entirely sure, but I feel it is quite unsafe. Allowing sleep between migrate_disable() and migrate_enable() would require the entire scheduler, runqueue, waitqueue, and CPU hotplug machinery to be aware that the task is pinned to a specific CPU.

If there is no sleep during this window, it seems to be only a runqueue issue: CPU hotplug can wait for the task to be unpinned, since the task always stays on the runqueue. However, if sleep is involved, the situation becomes significantly more complex.
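Concretely, the pattern the patch introduces looks like this once inlined (a simplified sketch); the task may sleep at several points while migration is disabled, so the scheduler has to bring it back to the same CPU after every such sleep:

	migrate_disable();
	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
	mutex_lock(&acomp_ctx->mutex);	/* may sleep and be woken up later */
	/* crypto_acomp_compress()/decompress() may also sleep or wait */
	mutex_unlock(&acomp_ctx->mutex);
	migrate_enable();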
After double-checking the scheduler's code, it seems fine. When a task is scheduled out, __schedule() will restrict its allowed CPUs to the current one via:
migrate_disable_switch(rq, prev);
static void migrate_disable_switch(struct rq *rq, struct task_struct *p)
{
	struct affinity_context ac = {
		.new_mask  = cpumask_of(rq->cpu),
		.flags     = SCA_MIGRATE_DISABLE,
	};

	if (likely(!p->migration_disabled))
		return;

	if (p->cpus_ptr != &p->cpus_mask)
		return;

	/*
	 * Violates locking rules! see comment in __do_set_cpus_allowed().
	 */
	__do_set_cpus_allowed(p, &ac);
}
and when the task is woken up, the previous CPU will be selected:
/*
 * The caller (fork, wakeup) owns p->pi_lock, ->cpus_ptr is stable.
 */
static inline int select_task_rq(struct task_struct *p, int cpu, int wake_flags)
{
	lockdep_assert_held(&p->pi_lock);

	if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
		cpu = p->sched_class->select_task_rq(p, cpu, wake_flags);
	else
		cpu = cpumask_any(p->cpus_ptr);
	...
}
Anyway, not an expert :-) Hopefully, other experts can provide their input to confirm whether sleeping during migrate_disable() is all good.
If static data doesn't consume much memory, it could be the simplest solution.
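If "static data" here means per-CPU buffers that are never freed on CPU offline (my reading; zswap_dstmem below is only illustrative), the sketch would be roughly:

/*
 * Statically sized per-CPU scratch buffer: nothing to free in
 * zswap_cpu_comp_dead(), so no UAF window for the buffer, at the cost of
 * two pages of per-CPU memory per possible CPU even while zswap is idle.
 */
static DEFINE_PER_CPU(u8, zswap_dstmem[2 * PAGE_SIZE]);

Callers would then take a pointer to this CPU's buffer (via this_cpu_ptr()) instead of using acomp_ctx->buffer; the acomp request and tfm freed in zswap_cpu_comp_dead() would still need separate protection, though.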
+	return raw_cpu_ptr(acomp_ctx);
+}
+
+static void acomp_ctx_put_cpu(void)
+{
+	migrate_enable();
+}
+
 static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 			   struct zswap_pool *pool)
 {
@@ -893,8 +905,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	gfp_t gfp;
 	u8 *dst;
-	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
-
+	acomp_ctx = acomp_ctx_get_cpu(pool->acomp_ctx);
 	mutex_lock(&acomp_ctx->mutex);

 	dst = acomp_ctx->buffer;
@@ -950,6 +961,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 		zswap_reject_alloc_fail++;

 	mutex_unlock(&acomp_ctx->mutex);
+	acomp_ctx_put_cpu();
 	return comp_ret == 0 && alloc_ret == 0;
 }
@@ -960,7 +972,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	struct crypto_acomp_ctx *acomp_ctx;
 	u8 *src;

-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	acomp_ctx = acomp_ctx_get_cpu(entry->pool->acomp_ctx);
 	mutex_lock(&acomp_ctx->mutex);

 	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
@@ -990,6 +1002,7 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)

 	if (src != acomp_ctx->buffer)
 		zpool_unmap_handle(zpool, entry->handle);
+	acomp_ctx_put_cpu();
 }
/*********************************
2.47.1.613.gc27f4b7a9f-goog
Thanks
Barry