On Wed, Jan 8, 2025 at 12:34 PM Yosry Ahmed yosryahmed@google.com wrote:
On Tue, Jan 7, 2025 at 9:00 PM Chengming Zhou chengming.zhou@linux.dev wrote: Please correct me if I am wrong, but my understanding is that memory allocated with alloc_percpu() is allocated for each *possible* CPU, and does not go away when CPUs are offlined. We allocate the per-CPU crypto_acomp_ctx structs with alloc_percpu() (including the mutex), so they should not go away with CPU offlining.
OTOH, we allocate the crypto_acomp_ctx.acompx, crypto_acomp_ctx.req, and crypto_acomp_ctx.buffer only for online CPUs through the CPU hotplug notifiers (i.e. zswap_cpu_comp_prepare() and zswap_cpu_comp_dead()). These are the resources that can go away with CPU offlining, and what we need to protect about.
The approach I am taking here is to hold the per-CPU mutex in the CPU offlining code while we free these resources, and set crypto_acomp_ctx.req to NULL. In acomp_ctx_get_cpu_locked(), we hold the mutex of the current CPU, and check if crypto_acomp_ctx.req is NULL.
If it is NULL, then the CPU is offlined between raw_cpu_ptr() and acquiring the mutex, and we retry on the new CPU that we end up on. If it is not NULL, then we are guaranteed that the resources will not be freed by CPU offlining until acomp_ctx_put_unlock() is called and the mutex is unlocked.
Ah you're right, that makes a lot of sense now :)