6.17-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pasha Tatashin pasha.tatashin@soleen.com
commit a2fff99f92dae9c0eaf0d75de3def70ec68dad92 upstream.
KHO memory preservation metadata is preserved in 512 byte chunks which requires their allocation from slab allocator. Slabs are not safe to be used with KHO because of kfence, and because partial slabs may lead leaks to the next kernel. Change the size to be PAGE_SIZE.
The kfence specifically may cause memory corruption, where it randomly provides slab objects that can be within the scratch area. The reason for that is that kfence allocates its objects prior to KHO scratch is marked as CMA region.
While this change could potentially increase metadata overhead on systems with sparsely preserved memory, this is being mitigated by ongoing work to reduce sparseness during preservation via 1G guest pages. Furthermore, this change aligns with future work on a stateless KHO, which will also use page-sized bitmaps for its radix tree metadata.
Link: https://lkml.kernel.org/r/20251021000852.2924827-3-pasha.tatashin@soleen.com Fixes: fc33e4b44b27 ("kexec: enable KHO support for memory preservation") Signed-off-by: Pasha Tatashin pasha.tatashin@soleen.com Reviewed-by: Mike Rapoport (Microsoft) rppt@kernel.org Reviewed-by: Pratyush Yadav pratyush@kernel.org Cc: Alexander Graf graf@amazon.com Cc: Christian Brauner brauner@kernel.org Cc: David Matlack dmatlack@google.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Jonathan Corbet corbet@lwn.net Cc: Masahiro Yamada masahiroy@kernel.org Cc: Miguel Ojeda ojeda@kernel.org Cc: Randy Dunlap rdunlap@infradead.org Cc: Samiullah Khawaja skhawaja@google.com Cc: Tejun Heo tj@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kexec_handover.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
--- a/kernel/kexec_handover.c +++ b/kernel/kexec_handover.c @@ -52,10 +52,10 @@ early_param("kho", kho_parse_enable); * Keep track of memory that is to be preserved across KHO. * * The serializing side uses two levels of xarrays to manage chunks of per-order - * 512 byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order of a - * 1TB system would fit inside a single 512 byte bitmap. For order 0 allocations - * each bitmap will cover 16M of address space. Thus, for 16G of memory at most - * 512K of bitmap memory will be needed for order 0. + * PAGE_SIZE byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order + * of a 8TB system would fit inside a single 4096 byte bitmap. For order 0 + * allocations each bitmap will cover 128M of address space. Thus, for 16G of + * memory at most 512K of bitmap memory will be needed for order 0. * * This approach is fully incremental, as the serialization progresses folios * can continue be aggregated to the tracker. The final step, immediately prior @@ -63,12 +63,14 @@ early_param("kho", kho_parse_enable); * successor kernel to parse. */
-#define PRESERVE_BITS (512 * 8) +#define PRESERVE_BITS (PAGE_SIZE * 8)
struct kho_mem_phys_bits { DECLARE_BITMAP(preserve, PRESERVE_BITS); };
+static_assert(sizeof(struct kho_mem_phys_bits) == PAGE_SIZE); + struct kho_mem_phys { /* * Points to kho_mem_phys_bits, a sparse bitmap array. Each bit is sized @@ -93,19 +95,19 @@ struct kho_serialization { struct khoser_mem_chunk *preserved_mem_map; };
-static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz) +static void *xa_load_or_alloc(struct xarray *xa, unsigned long index) { void *res = xa_load(xa, index);
if (res) return res;
- void *elm __free(kfree) = kzalloc(sz, GFP_KERNEL); + void *elm __free(kfree) = kzalloc(PAGE_SIZE, GFP_KERNEL);
if (!elm) return ERR_PTR(-ENOMEM);
- if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm), sz))) + if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm), PAGE_SIZE))) return ERR_PTR(-EINVAL);
res = xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL); @@ -175,8 +177,7 @@ static int __kho_preserve_order(struct k } }
- bits = xa_load_or_alloc(&physxa->phys_bits, pfn_high / PRESERVE_BITS, - sizeof(*bits)); + bits = xa_load_or_alloc(&physxa->phys_bits, pfn_high / PRESERVE_BITS); if (IS_ERR(bits)) return PTR_ERR(bits);