On Wed, Jan 8, 2025 at 12:23 PM Sridhar, Kanchana P kanchana.p.sridhar@intel.com wrote:
-----Original Message----- From: Yosry Ahmed yosryahmed@google.com Sent: Wednesday, January 8, 2025 8:15 AM To: Andrew Morton akpm@linux-foundation.org Cc: Johannes Weiner hannes@cmpxchg.org; Nhat Pham nphamcs@gmail.com; Chengming Zhou chengming.zhou@linux.dev; Vitaly Wool vitalywool@gmail.com; Barry Song baohua@kernel.org; Sam Sun samsun1006219@gmail.com; Sridhar, Kanchana P kanchana.p.sridhar@intel.com; linux-mm@kvack.org; linux- kernel@vger.kernel.org; Yosry Ahmed yosryahmed@google.com; stable@vger.kernel.org Subject: [PATCH] mm: zswap: properly synchronize freeing resources during CPU hotunplug
In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the current CPU at the beginning of the operation is retrieved and used throughout. However, since neither preemption nor migration are disabled, it is possible that the operation continues on a different CPU.
If the original CPU is hotunplugged while the acomp_ctx is still in use, we run into a UAF bug as some of the resources attached to the acomp_ctx are freed during hotunplug in zswap_cpu_comp_dead().
The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration") when the switch to the crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was retrieved using get_cpu_ptr() which disables preemption and makes sure the CPU cannot go away from under us. Preemption cannot be disabled with the crypto_acomp API as a sleepable context is needed.
During CPU hotunplug, hold the acomp_ctx.mutex before freeing any resources, and set acomp_ctx.req to NULL when it is freed. In the compress/decompress paths, after acquiring the acomp_ctx.mutex make sure that acomp_ctx.req is not NULL (i.e. acomp_ctx resources were not freed by CPU hotunplug). Otherwise, retry with the acomp_ctx from the new CPU.
This adds proper synchronization to ensure that the acomp_ctx resources are not freed from under compress/decompress paths.
Note that the per-CPU acomp_ctx itself (including the mutex) is not freed during CPU hotunplug, only acomp_ctx.req, acomp_ctx.buffer, and acomp_ctx.acomp. So it is safe to acquire the acomp_ctx.mutex of a CPU after it is hotunplugged.
Only other fail-proofing I can think of is to initialize the mutex right after the per-cpu acomp_ctx is allocated in zswap_pool_create() and de-couple it from the cpu onlining. This further clarifies the intent for this mutex to be used at the same lifetime scope as the acomp_ctx itself, independent of cpu hotplug/hotunplug.
I mentioned doing this initially then dismissed it as a readability improvement. However, I think it's actually required for correctness. It's possible that the CPU becomes online again after acomp_ctx_get_cpu_lock() decides to retry but before it unlocks the mutex, in which case the CPU being onlined will reinitialize an already locked mutex.
I will add that and send a v2 shortly.