On Tue, Jan 7, 2025 at 12:16 PM Yosry Ahmed <yosryahmed@google.com> wrote:
On Tue, Jan 7, 2025 at 10:13 AM Yosry Ahmed <yosryahmed@google.com> wrote:
On Tue, Jan 7, 2025 at 10:03 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
On Tue, Jan 07, 2025 at 07:47:24AM +0000, Yosry Ahmed wrote:
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the current CPU at the beginning of the operation is retrieved and used throughout. However, since neither preemption nor migration are disabled, it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use, we run into a UAF bug as the resources attached to the acomp_ctx are freed during hotunplug in zswap_cpu_comp_dead().
The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration") when the switch to the crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was retrieved using get_cpu_ptr() which disables preemption and makes sure the CPU cannot go away from under us. Preemption cannot be disabled with the crypto_acomp API as a sleepable context is needed.
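To illustrate the old pattern, here is a sketch approximating the pre-1ec3b5fe6eec read side (identifiers are from memory of the old mm/zswap.c, not the exact removed lines):

```c
/*
 * Sketch of the old pattern: get_cpu_ptr() disables preemption, so the
 * task cannot migrate and the CPU cannot be hotunplugged from under us
 * until the matching put_cpu_ptr(). crypto_acomp cannot use this
 * because its operations may sleep, which is forbidden with preemption
 * disabled.
 */
struct crypto_comp *tfm;

tfm = *get_cpu_ptr(entry->pool->tfm);
ret = crypto_comp_decompress(tfm, src, entry->length, dst, &dlen);
put_cpu_ptr(entry->pool->tfm);
```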
Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to per-acomp_ctx") increased the UAF surface area by making the per-CPU buffers dynamic, adding yet another resource that can be freed from under zswap compression/decompression by CPU hotunplug.
There are a few ways to fix this:
(a) Add a refcount for acomp_ctx.
(b) Disable migration while using the per-CPU acomp_ctx.
(c) Use SRCU to wait for other CPUs using the acomp_ctx of the CPU being
    hotunplugged. Normal RCU cannot be used as a sleepable context is
    required.
Implement (c) since it's simpler than (a), and (b) involves using migrate_disable() which is apparently undesired (see huge comment in include/linux/preempt.h).
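The read side of (c), the other half of the patch not quoted in the hunk below, would look roughly like this (a sketch, field names approximated from mm/zswap.c):

```c
/*
 * Sketch (not the quoted hunk): stay inside an SRCU read-side critical
 * section for as long as the per-CPU acomp_ctx is in use, so that
 * zswap_cpu_comp_dead() can wait for in-flight users with
 * synchronize_srcu(). Unlike rcu_read_lock(), srcu_read_lock() permits
 * sleeping, which the crypto_acomp API requires.
 */
static bool zswap_compress(struct page *page, struct zswap_entry *entry,
			   struct zswap_pool *pool)
{
	struct crypto_acomp_ctx *acomp_ctx;
	int srcu_idx;
	bool ret;

	srcu_idx = srcu_read_lock(&acomp_srcu);
	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
	/* ... existing compression using acomp_ctx ... */
	srcu_read_unlock(&acomp_srcu, srcu_idx);
	return ret;
}
```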
Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
Reported-by: Sam Sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg...
 mm/zswap.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb236..add1406d693b8 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -864,12 +864,22 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 	return ret;
 }

+DEFINE_STATIC_SRCU(acomp_srcu);
+
 static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
 {
 	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
 	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);

 	if (!IS_ERR_OR_NULL(acomp_ctx)) {
+		/*
+		 * Even though the acomp_ctx should not be currently in use on
+		 * @cpu, it may still be used by compress/decompress operations
+		 * that started on @cpu and migrated to a different CPU. Wait
+		 * for such usages to complete; any new usages would be a bug.
+		 */
+		synchronize_srcu(&acomp_srcu);
The docs suggest you can't solve it like that :(
Documentation/RCU/Design/Requirements/Requirements.rst:
Also unlike other RCU flavors, synchronize_srcu() may **not** be invoked from CPU-hotplug notifiers, due to the fact that SRCU grace periods make use of timers and the possibility of timers being temporarily “stranded” on the outgoing CPU. This stranding of timers means that timers posted to the outgoing CPU will not fire until late in the CPU-hotplug process. The problem is that if a notifier is waiting on an SRCU grace period, that grace period is waiting on a timer, and that timer is stranded on the outgoing CPU, then the notifier will never be awakened, in other words, deadlock has occurred. This same situation of course also prohibits srcu_barrier() from being invoked from CPU-hotplug notifiers.
Thanks for checking, I completely missed this. I guess it only works with SRCU if we use call_srcu(), but then we need to copy the pointers to a new struct to avoid racing with the CPU getting onlined again. Otherwise we can just bite the bullet and add a refcount, or use migrate_disable() despite that being undesirable.
Do you have a favorite? :)
I briefly looked into refcounting. The annoying thing is that we need to handle the race between putting the last refcount in zswap_compress()/zswap_decompress(), and the CPU getting onlined again and re-initializing the refcount. One way to do it would be to put all dynamically allocated resources in the same struct as the new refcount, and use RCU + refcounts to allocate and free the struct as a whole.
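As a rough userspace model of that idea (hypothetical, not kernel code; `acomp_res`, `res_alloc`, etc. are made-up names): the dynamically allocated resources and the refcount live in one struct, so whichever put is last, from a migrated compress operation or from the hotunplug callback, frees everything at once and a re-onlined CPU simply allocates a fresh struct instead of re-initializing a live one.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define PAGE_SZ 4096	/* stand-in for the kernel's PAGE_SIZE */

/* Hypothetical model: resources and refcount share one allocation. */
struct acomp_res {
	atomic_int ref;
	char *buffer;	/* stands in for acomp_ctx->buffer etc. */
};

/* CPU-online path: allocate the struct holding one reference. */
static struct acomp_res *res_alloc(void)
{
	struct acomp_res *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->buffer = malloc(PAGE_SZ);
	atomic_init(&r->ref, 1);
	return r;
}

/* Taken by compress/decompress before touching the resources. */
static void res_get(struct acomp_res *r)
{
	atomic_fetch_add(&r->ref, 1);
}

/* Dropped when done; the last put frees. Returns true if it freed. */
static bool res_put(struct acomp_res *r)
{
	if (atomic_fetch_sub(&r->ref, 1) == 1) {
		free(r->buffer);
		free(r);
		return true;
	}
	return false;
}
```

The part this model glosses over is exactly the hard part mentioned above: making the lookup of the struct race-free against it being freed, which is where the RCU protection would come in.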
I am leaning toward just disabling migration for now tbh unless there are objections to that, especially this close to the v6.13 release.
(Sorry for going back and forth on this, I am essentially thinking out loud)
Actually, as Kanchana mentioned before, we should be able to just hold the mutex in zswap_cpu_comp_dead() before freeing the dynamic resources. The mutex is allocated when the pool is created and will not go away during CPU hotunplug AFAICT. It confused me before because we call mutex_init() in zswap_cpu_comp_prepare(), but it really should be in zswap_pool_create() after we allocate the pool->acomp_ctx.
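A sketch of that idea (not a tested patch; assumes mutex_init() has been moved to zswap_pool_create() so the mutex stays valid across hotplug cycles):

```c
/*
 * Sketch: teardown serializes with in-flight compress/decompress
 * operations by holding acomp_ctx->mutex, which is initialized once at
 * pool creation and therefore outlives CPU hotunplug.
 */
static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
{
	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);

	mutex_lock(&acomp_ctx->mutex);
	/* free acomp_ctx->buffer, the acomp request, and the tfm as before */
	mutex_unlock(&acomp_ctx->mutex);
	return 0;
}
```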