The page allocator does a lot of stuff that is not visible to the user in any deterministic way. But this stuff is still important and it would be nice to test that behaviour.
KUnit is a tool for unit-testing kernel-internal APIs. This is an attempt to adapt it to the page allocator.
I have been hacking on this as a way to try and test the code I'm writing for my ASI page_alloc integration proposal [0]. It's been extremely useful to be able to "just call it and see what it does". So I wanna gather some feedback on whether this basic idea is of interest before I invest too much more time in it.
You can run these tests like this:
  tools/testing/kunit/kunit.py run \
          --arch=x86_64 --kernel_args="movablecore=2G" \
          --qemu_args="-m 4G" --kunitconfig mm/.kunitconfig
Unit-testing code that has mutable global variables can be a pain. Unit-testing code with mutable global variables _that can change concurrently with the tests_ is basically impossible. So, we need some way to isolate an "instance" of the allocator that doesn't refer to any such concurrently-mutated state.
Luckily, the allocator only has one really important global variable: node_data. So, the approach here is to carve out a subset of that variable which is as isolated as possible from the rest of the system, which can be used for deterministic testing. This is achieved by creating a fake "isolated" node at boot, and plugging in memory at test init time.
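In outline, the flow the patches below implement looks like this (a sketch only; the function names are the ones used in the series):

    /* Boot (mm/numa_memblks.c): reserve a spare node ID for testing. */
    make_isolated_node();    /* calls node_set_isolated(node) */

    /* Suite init (mm/page_alloc_test.c): steal a memory block from node 0. */
    offline_and_remove_memory(start, size);
    add_memory(isolated_node, start, size, MMOP_ONLINE);

    /* Tests: allocate with a nodemask containing only isolated_node. */

    /* Suite exit: give the memory back to node 0. */
    remove_memory(start, size);
    add_memory(0, start, size, MMOP_ONLINE);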
This is an RFC and not a PATCH because:
1. I have not taken much care to ensure the isolation is complete. There are probably sources of flakiness and nondeterminism in here.
2. I suspect the basic idea might be over-complicated: do we really need memory hotplug here? Do we even need the instance of the allocator we're testing to have actual memory behind the pages it's allocating, or could we just hallucinate a new region of vmemmap without any of that awkwardness?
One significant downside of relying on memory hotplug is that the test won't run if we can't hotplug anything out. That means you have to fiddle with the platform to even run the tests - see the --kernel_args and --qemu_args I had to add to my kunit.py command above.
So yeah, other suggestions welcome.
2b. I'm not very confident I'm using the hotplug API properly.
3. There's no point in merging this without having at least a few tests that are actually interesting!
Maybe a "build it and they will come" approach can be justified to some extent, but there's a nonzero cost to the infrastructure so we should probably have some confidence that they will indeed come.
[0] https://lore.kernel.org/linux-mm/20250129144320.2675822-1-jackmanb@google.co...
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
Brendan Jackman (4):
      kunit: Allocate assertion data with GFP_ATOMIC
      mm/page_alloc_test: Add empty KUnit boilerplate
      mm/page_alloc_test: Add logic to isolate a node for testing
      mm/page_alloc_test: Add smoke-test for page allocation
 drivers/base/memory.c    |   5 +-
 include/linux/memory.h   |   4 +
 include/linux/nodemask.h |  13 +++
 kernel/kthread.c         |   3 +
 lib/kunit/assert.c       |   2 +-
 lib/kunit/resource.c     |   2 +-
 lib/kunit/test.c         |   2 +-
 mm/.kunitconfig          |  10 ++
 mm/Kconfig               |   8 ++
 mm/Makefile              |   2 +
 mm/internal.h            |  11 ++
 mm/memory_hotplug.c      |  26 +++--
 mm/numa_memblks.c        |  22 ++++
 mm/page_alloc.c          |  37 +++++-
 mm/page_alloc_test.c     | 296 +++++++++++++++++++++++++++++++++++++++++++++++
 15 files changed, 429 insertions(+), 14 deletions(-)
---
base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
change-id: 20250219-page-alloc-kunit-df76ef8d8eb9
Best regards,
At present KUnit doesn't handle assertions happening in atomic contexts. A later commit will add tests that make assertions with spinlocks held.
In preparation, switch to GFP_ATOMIC.
"Just use GFP_ATOMIC" is not generally a solution to this kind of problem: since it uses up memory reserves, instead it should be only used when truly needed.
However, for test code that should not be expected to run in production systems it seems tolerable, given that it avoids creating more complex APIs.
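To make that concrete, the sort of call site the later tests add looks roughly like this (a hypothetical sketch; the function name here is made up): an assertion made with a pcp spinlock held, where a failing expectation causes KUnit to allocate memory for the report, and that allocation must not sleep:

    /*
     * Hypothetical sketch: an assertion made under a spinlock. If it
     * fails, KUnit allocates a string stream to format the report;
     * with GFP_KERNEL that allocation could sleep, which is invalid
     * while the lock is held.
     */
    static void check_pcplist_empty_locked(struct kunit *test,
                                           struct per_cpu_pages *pcp)
    {
            unsigned long flags;

            spin_lock_irqsave(&pcp->lock, flags);
            KUNIT_EXPECT_TRUE(test, list_empty(&pcp->lists[0]));
            spin_unlock_irqrestore(&pcp->lock, flags);
    }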
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 lib/kunit/assert.c   | 2 +-
 lib/kunit/resource.c | 2 +-
 lib/kunit/test.c     | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/kunit/assert.c b/lib/kunit/assert.c
index 867aa5c4bccf764757e190948b8e3a2439116786..f08656c5fb247b510c4215445cc307ed1205a96c 100644
--- a/lib/kunit/assert.c
+++ b/lib/kunit/assert.c
@@ -101,7 +101,7 @@ VISIBLE_IF_KUNIT bool is_literal(const char *text, long long value)
 	if (strlen(text) != len)
 		return false;
 
-	buffer = kmalloc(len+1, GFP_KERNEL);
+	buffer = kmalloc(len+1, GFP_ATOMIC);
 	if (!buffer)
 		return false;
diff --git a/lib/kunit/resource.c b/lib/kunit/resource.c
index f0209252b179f8b48d47ecc244c468ed80e23bdc..eac511af4f8d7843d58c4e3976c77a9c4def86a7 100644
--- a/lib/kunit/resource.c
+++ b/lib/kunit/resource.c
@@ -98,7 +98,7 @@ int kunit_add_action(struct kunit *test, void (*action)(void *), void *ctx)
 
 	KUNIT_ASSERT_NOT_NULL_MSG(test, action, "Tried to action a NULL function!");
 
-	action_ctx = kzalloc(sizeof(*action_ctx), GFP_KERNEL);
+	action_ctx = kzalloc(sizeof(*action_ctx), GFP_ATOMIC);
 	if (!action_ctx)
 		return -ENOMEM;
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 146d1b48a0965e8aaddb6162928f408bbb542645..08d0ff51bd85845a08b40cd3933dd588bd10bddf 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -279,7 +279,7 @@ static void kunit_fail(struct kunit *test, const struct kunit_loc *loc,
 
 	kunit_set_failure(test);
 
-	stream = kunit_alloc_string_stream(test, GFP_KERNEL);
+	stream = kunit_alloc_string_stream(test, GFP_ATOMIC);
 	if (IS_ERR(stream)) {
 		WARN(true, "Could not allocate stream to print failed assertion in %s:%d\n",
Add the Kbuild plumbing to create a new KUnit suite. Create the suite, with no tests inside it.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/.kunitconfig      |  2 ++
 mm/Kconfig           |  8 ++++++++
 mm/Makefile          |  2 ++
 mm/page_alloc_test.c | 21 +++++++++++++++++++++
 4 files changed, 33 insertions(+)
diff --git a/mm/.kunitconfig b/mm/.kunitconfig
new file mode 100644
index 0000000000000000000000000000000000000000..fcc28557fa1c1412b21f9dbddbf6a63adca6f2b4
--- /dev/null
+++ b/mm/.kunitconfig
@@ -0,0 +1,2 @@
+CONFIG_KUNIT=y
+CONFIG_PAGE_ALLOC_KUNIT_TEST=y
\ No newline at end of file
diff --git a/mm/Kconfig b/mm/Kconfig
index 1b501db064172cf54f1b1259893dfba676473c56..1fac51c536c66243a1321195a78eb40668386158 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1358,6 +1358,14 @@ config PT_RECLAIM
 
 	  Note: now only empty user PTE page table pages will be reclaimed.
 
+config PAGE_ALLOC_KUNIT_TEST
+	tristate "KUnit test for page allocator" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  Builds unit tests for page allocator.
+
+	  If unsure, say N.
 
 source "mm/damon/Kconfig"
 
diff --git a/mm/Makefile b/mm/Makefile
index 850386a67b3e0e3b543b27691a6512c448815697..7b8018e0e6510881fac6e4295fdd1472e38d743d 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -61,6 +61,8 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
 page-alloc-y := page_alloc.o
 page-alloc-$(CONFIG_SHUFFLE_PAGE_ALLOCATOR) += shuffle.o
 
+obj-$(CONFIG_PAGE_ALLOC_KUNIT_TEST) += page_alloc_test.o
+
 # Give 'memory_hotplug' its own module-parameter namespace
 memory-hotplug-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 
diff --git a/mm/page_alloc_test.c b/mm/page_alloc_test.c
new file mode 100644
index 0000000000000000000000000000000000000000..377dfdd50a3c6928e15210cc87d5399c1db80da7
--- /dev/null
+++ b/mm/page_alloc_test.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/errname.h>
+#include <linux/list.h>
+#include <linux/gfp.h>
+#include <linux/memory.h>
+#include <linux/nodemask.h>
+#include <linux/percpu.h>
+#include <linux/smp.h>
+
+#include <kunit/test.h>
+
+static struct kunit_case test_cases[] = { {} };
+
+static struct kunit_suite test_suite = {
+	.name = "page_alloc",
+	.test_cases = test_cases,
+};
+kunit_test_suite(test_suite);
+
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
In order to test the page allocator, we need an "instance" of the page allocator that is not subject to unpredictable perturbation by the live system. The closest thing that we have to an "instance" of the allocator is a NUMA node.
So, introduce a new concept of an "isolated" node. This is an extension of the existing concept of a "fake" node, with the addition that nothing else in the system will touch it unless instructed to by the test code.
The node is created during boot but has no memory nor any CPUs attached. It is not on any other node's fallback lists. Any code that pays general attention to NODE_DATA in such a way that might cause the page allocator data structures to be modified asynchronously to the test is enlightened to ignore it via the node_isolated() helper.
Then, during initialization of the allocator test suite, hotplug out some memory and then plug it back in to the isolated node. The node can then be used for testing.
Because it's easy to miss code that needs enlightenment, which can lead to confusing test behaviour, also add some defensive checks to try and detect interference with the isolated node before the start of the test.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 drivers/base/memory.c    |   5 +-
 include/linux/memory.h   |   4 ++
 include/linux/nodemask.h |  13 +++++
 kernel/kthread.c         |   3 +
 mm/.kunitconfig          |  10 +++-
 mm/Kconfig               |   2 +-
 mm/internal.h            |  11 ++++
 mm/memory_hotplug.c      |  26 ++++++---
 mm/numa_memblks.c        |  22 ++++++++
 mm/page_alloc.c          |  37 +++++++++++-
 mm/page_alloc_test.c     | 142 ++++++++++++++++++++++++++++++++++++++++++++++-
 11 files changed, 260 insertions(+), 15 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 348c5dbbfa68ad30d34b344ace1dd8deac0e1947..cdb893d7f13324862ee0943df080440d19fbd957 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -26,6 +26,8 @@
 #include <linux/atomic.h>
 #include <linux/uaccess.h>
 
+#include <kunit/visibility.h>
+
 #define MEMORY_CLASS_NAME	"memory"
 
 static const char *const online_type_to_str[] = {
@@ -183,7 +185,7 @@ static inline unsigned long memblk_nr_poison(struct memory_block *mem)
 /*
  * Must acquire mem_hotplug_lock in write mode.
  */
-static int memory_block_online(struct memory_block *mem)
+VISIBLE_IF_KUNIT int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -250,6 +252,7 @@ static int memory_block_online(struct memory_block *mem)
 	mem_hotplug_done();
 	return ret;
 }
+EXPORT_SYMBOL_IF_KUNIT(memory_block_online);
 
 /*
  * Must acquire mem_hotplug_lock in write mode.
diff --git a/include/linux/memory.h b/include/linux/memory.h
index c0afee5d126ef65d420770e1f8669842c499c8de..99139a6e9c11a407a8d7bfb17b7bbe3d276048ff 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -177,6 +177,10 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
 		register_memory_notifier(&fn##_mem_nb);		\
 })
 
+#ifdef CONFIG_KUNIT
+int memory_block_online(struct memory_block *mem);
+#endif
+
 #ifdef CONFIG_NUMA
 void memory_block_add_nid(struct memory_block *mem, int nid,
 			  enum meminit_context context);
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 9fd7a0ce9c1a7336df46f12622867e6786a5c0a9..6ea38963487e1fbb800eab69e5e6413aa17a8047 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -536,6 +536,19 @@ static __always_inline int node_random(const nodemask_t *maskp)
 #define for_each_node(node)	   for_each_node_state(node, N_POSSIBLE)
 #define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
 
+
+#ifdef CONFIG_PAGE_ALLOC_KUNIT_TEST
+/*
+ * An isolated node is a fake node for testing, that boots with no memory and no
+ * attached CPUs, and nothing should touch it except for test code.
+ */
+extern bool node_isolated(int node);
+/* Only one isolated node is supported at present and it cannot be un-isolated. */
+extern void node_set_isolated(int node);
+#else
+static inline bool node_isolated(int node) { return false; }
+#endif /* CONFIG_PAGE_ALLOC_KUNIT_TEST */
+
 /*
  * For nodemask scratch area.
  * NODEMASK_ALLOC(type, name) allocates an object with a specified type and
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5dc5b0d7238e85ad4074076e4036062c7bfcae74..93f65c5935cba8a59c7d3df2e36335130c3e1f71 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -9,6 +9,7 @@
  */
 #include <uapi/linux/sched/types.h>
 #include <linux/mm.h>
+#include <linux/mmdebug.h>
 #include <linux/mmu_context.h>
 #include <linux/sched.h>
 #include <linux/sched/mm.h>
@@ -511,6 +512,8 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
 	struct kthread_create_info *create = kmalloc(sizeof(*create),
 						     GFP_KERNEL);
 
+	VM_WARN_ON(node != NUMA_NO_NODE && node_isolated(node));
+
 	if (!create)
 		return ERR_PTR(-ENOMEM);
 	create->threadfn = threadfn;
diff --git a/mm/.kunitconfig b/mm/.kunitconfig
index fcc28557fa1c1412b21f9dbddbf6a63adca6f2b4..4ff4e1654c3e9b364072d33bfffb3a2336825859 100644
--- a/mm/.kunitconfig
+++ b/mm/.kunitconfig
@@ -1,2 +1,10 @@
 CONFIG_KUNIT=y
-CONFIG_PAGE_ALLOC_KUNIT_TEST=y
\ No newline at end of file
+CONFIG_PAGE_ALLOC_KUNIT_TEST=y
+
+# Required for NUMA
+CONFIG_SMP=y
+# Used by tests to carve out fake node for isolating page_alloc data.
+CONFIG_NUMA=y
+CONFIG_NUMA_EMU=y
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_MEMORY_HOTREMOVE=y
\ No newline at end of file
diff --git a/mm/Kconfig b/mm/Kconfig
index 1fac51c536c66243a1321195a78eb40668386158..64c3794120002a839f56e3feb284c6d5c2635f40 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1360,7 +1360,7 @@ config PT_RECLAIM
 
 config PAGE_ALLOC_KUNIT_TEST
 	tristate "KUnit test for page allocator" if !KUNIT_ALL_TESTS
-	depends on KUNIT
+	depends on KUNIT && NUMA && MEMORY_HOTREMOVE
 	default KUNIT_ALL_TESTS
 	help
 	  Builds unit tests for page allocator.
diff --git a/mm/internal.h b/mm/internal.h
index 109ef30fee11f8b399f6bac42eab078cd51e01a5..9dbe5853b90b53ff261ba1b2fca12eabfda1a9de 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1545,5 +1545,16 @@ static inline bool reclaim_pt_is_enabled(unsigned long start, unsigned long end,
 }
 #endif /* CONFIG_PT_RECLAIM */
 
+#ifdef CONFIG_PAGE_ALLOC_KUNIT_TEST
+/*
+ * Note that node_isolated() is separate, that's a "public API". But only
+ * test code needs to look up which node is isolated.
+ */
+extern int isolated_node;
+#endif
+
+#ifdef CONFIG_KUNIT
+void drain_pages(unsigned int cpu);
+#endif
 
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e3655f07dd6e33efb3e811cab07f240649487441..968c23b6f347cf6a0c30d00cb556166b8df9c9c3 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1198,10 +1198,12 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	arg.nr_pages = nr_pages;
 	node_states_check_changes_online(nr_pages, zone, &arg);
 
-	ret = memory_notify(MEM_GOING_ONLINE, &arg);
-	ret = notifier_to_errno(ret);
-	if (ret)
-		goto failed_addition;
+	if (!node_isolated(nid)) {
+		ret = memory_notify(MEM_GOING_ONLINE, &arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_addition;
+	}
 
 	/*
 	 * Fixup the number of isolated pageblocks before marking the sections
@@ -1242,19 +1244,27 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	/* reinitialise watermarks and update pcp limits */
 	init_per_zone_wmark_min();
 
-	kswapd_run(nid);
-	kcompactd_run(nid);
+	/*
+	 * Don't run daemons on the special test node, if that needs to be
+	 * tested the test should run it.
+	 */
+	if (!node_isolated(nid)) {
+		kswapd_run(nid);
+		kcompactd_run(nid);
+	}
 
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_ONLINE, &arg);
+	if (!node_isolated(nid))
+		memory_notify(MEM_ONLINE, &arg);
 	return 0;
 
 failed_addition:
 	pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
-	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	if (!node_isolated(nid))
+		memory_notify(MEM_CANCEL_ONLINE, &arg);
 	remove_pfn_range_from_zone(zone, pfn, nr_pages);
 	return ret;
 }
diff --git a/mm/numa_memblks.c b/mm/numa_memblks.c
index ff4054f4334dae42ee3b3668da18bba01dc3cd8b..190c879af2c779df1be448c45c43b0570bb6c308 100644
--- a/mm/numa_memblks.c
+++ b/mm/numa_memblks.c
@@ -7,6 +7,8 @@
 #include <linux/numa.h>
 #include <linux/numa_memblks.h>
 
+#include "internal.h"
+
 int numa_distance_cnt;
 static u8 *numa_distance;
 
@@ -371,6 +373,24 @@ static void __init numa_clear_kernel_node_hotplug(void)
 	}
 }
 
+#ifdef CONFIG_PAGE_ALLOC_KUNIT_TEST
+static inline void make_isolated_node(void)
+{
+	int node;
+
+	node = num_possible_nodes();
+	if (!numa_valid_node(node)) {
+		pr_err("All node IDs used, can't fake another.\n");
+	} else {
+		node_set(num_possible_nodes(), node_possible_map);
+		node_set_isolated(node);
+	}
+}
+#else
+static inline void make_isolated_node(void) { }
+#endif
+
+
 static int __init numa_register_meminfo(struct numa_meminfo *mi)
 {
 	int i;
@@ -381,6 +401,8 @@ static int __init numa_register_meminfo(struct numa_meminfo *mi)
 	if (WARN_ON(nodes_empty(node_possible_map)))
 		return -EINVAL;
 
+	make_isolated_node();
+
 	for (i = 0; i < mi->nr_blks; i++) {
 		struct numa_memblk *mb = &mi->blk[i];
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 579789600a3c7bfb7b0d847d51af702a9d4b139a..9472da738119589150db26126dfcf808e2dc9371 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -56,6 +56,7 @@
 #include <linux/cacheinfo.h>
 #include <linux/pgalloc_tag.h>
 #include <asm/div64.h>
+#include <kunit/visibility.h>
 #include "internal.h"
 #include "shuffle.h"
 #include "page_reporting.h"
@@ -291,6 +292,26 @@ static bool __free_unaccepted(struct page *page);
 
 int page_group_by_mobility_disabled __read_mostly;
 
+/*
+ * Test harness for KUnit - pick a node that we will never allocate from, except
+ * for in the page allocator tests.
+ */
+#ifdef CONFIG_PAGE_ALLOC_KUNIT_TEST
+int isolated_node = NUMA_NO_NODE;
+EXPORT_SYMBOL(isolated_node);
+
+void node_set_isolated(int node)
+{
+	WARN_ON(isolated_node != NUMA_NO_NODE);
+	isolated_node = node;
+}
+
+bool node_isolated(int node)
+{
+	return node == isolated_node;
+}
+#endif /* CONFIG_PAGE_ALLOC_KUNIT_TEST */
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 /*
  * During boot we initialize deferred pages on-demand, as needed, but once
@@ -2410,7 +2431,7 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 /*
  * Drain pcplists of all zones on the indicated processor.
  */
-static void drain_pages(unsigned int cpu)
+VISIBLE_IF_KUNIT void drain_pages(unsigned int cpu)
 {
 	struct zone *zone;
 
@@ -2418,6 +2439,7 @@ static void drain_pages(unsigned int cpu)
 		drain_pages_zone(cpu, zone);
 	}
 }
+EXPORT_SYMBOL_IF_KUNIT(drain_pages);
 
 /*
  * Spill all of this CPU's per-cpu pages back into the buddy allocator.
@@ -5087,6 +5109,8 @@ int find_next_best_node(int node, nodemask_t *used_node_mask)
 	}
 
 	for_each_node_state(n, N_MEMORY) {
+		if (node_isolated(n))
+			continue;
 
 		/* Don't want a node to appear more than once */
 		if (node_isset(n, *used_node_mask))
@@ -5134,8 +5158,17 @@ static void build_zonelists_in_node_order(pg_data_t *pgdat, int *node_order,
 
 	for (i = 0; i < nr_nodes; i++) {
 		int nr_zones;
+		int other_nid = node_order[i];
+		pg_data_t *node = NODE_DATA(other_nid);
 
-		pg_data_t *node = NODE_DATA(node_order[i]);
+		/*
+		 * Never fall back to the isolated node. The isolated node has
+		 * to be able to fall back to other nodes because that fallback
+		 * is relied on for allocating data structures that describe the
+		 * node.
+		 */
+		if (node_isolated(other_nid) && other_nid != pgdat->node_id)
+			continue;
 
 		nr_zones = build_zonerefs_node(node, zonerefs);
 		zonerefs += nr_zones;
diff --git a/mm/page_alloc_test.c b/mm/page_alloc_test.c
index 377dfdd50a3c6928e15210cc87d5399c1db80da7..c6bcfcaf61b57ca35ad1b5fc48fd07d0402843bc 100644
--- a/mm/page_alloc_test.c
+++ b/mm/page_alloc_test.c
@@ -3,19 +3,157 @@
 #include <linux/list.h>
 #include <linux/gfp.h>
 #include <linux/memory.h>
+#include <linux/memory_hotplug.h>
 #include <linux/nodemask.h>
 #include <linux/percpu.h>
 #include <linux/smp.h>
 
 #include <kunit/test.h>
 
+#include "internal.h"
+
+#define EXPECT_PCPLIST_EMPTY(test, zone, cpu, pindex) ({		\
+	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); \
+	struct page *page;						\
+									\
+	lockdep_assert_held(&pcp->lock);				\
+	page = list_first_entry_or_null(				\
+		&pcp->lists[pindex], struct page, pcp_list);		\
+									\
+	if (page) {							\
+		KUNIT_FAIL(test, "PCPlist %d on CPU %d wasn't empty", i, cpu); \
+		dump_page(page, "unexpectedly on pcplist");		\
+	}								\
+})
+
+static void action_drain_pages_all(void *unused)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu)
+		drain_pages(cpu);
+}
+
+/* Runs before each test. */
+static int test_init(struct kunit *test)
+{
+	struct zone *zone_normal;
+	int cpu;
+
+	if (isolated_node == NUMA_NO_NODE)
+		kunit_skip(test, "No fake NUMA node ID allocated");
+
+	zone_normal = &NODE_DATA(isolated_node)->node_zones[ZONE_NORMAL];
+
+	/*
+	 * Nothing except these tests should be allocating from the fake node so
+	 * the pcplists should be empty. Obviously this is racy but at least it
+	 * can probabilistically detect issues that would otherwise make for
+	 * really confusing test results.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct per_cpu_pages *pcp = per_cpu_ptr(zone_normal->per_cpu_pageset, cpu);
+		unsigned long flags;
+		int i;
+
+		spin_lock_irqsave(&pcp->lock, flags);
+		for (i = 0; i < ARRAY_SIZE(pcp->lists); i++)
+			EXPECT_PCPLIST_EMPTY(test, zone_normal, cpu, i);
+		spin_unlock_irqrestore(&pcp->lock, flags);
+	}
+
+	/* Also ensure we don't leave a mess for the next test. */
+	kunit_add_action(test, action_drain_pages_all, NULL);
+
+	return 0;
+}
+
+static int memory_block_online_cb(struct memory_block *mem, void *unused)
+{
+	return memory_block_online(mem);
+}
+
+struct region {
+	int node;
+	unsigned long start;
+	unsigned long size;
+};
+
+/*
+ * Unplug some memory from a "real" node and plug it into the isolated node, for
+ * use during the tests.
+ */
+static int populate_isolated_node(struct kunit_suite *suite)
+{
+	struct zone *zone_movable = &NODE_DATA(0)->node_zones[ZONE_MOVABLE];
+	phys_addr_t zone_start = zone_movable->zone_start_pfn << PAGE_SHIFT;
+	phys_addr_t zone_size = zone_movable->spanned_pages << PAGE_SHIFT;
+	unsigned long bs = memory_block_size_bytes();
+	u64 start = round_up(zone_start, bs);
+	/* Plug a memory block if we can find it. */
+	unsigned long size = round_down(min(zone_size, bs), bs);
+	int err;
+
+	if (!size) {
+		pr_err("Couldn't find ZONE_MOVABLE block to offline\n");
+		pr_err("Try setting/expanding movablecore=\n");
+		return -1;
+	}
+
+	err = offline_and_remove_memory(start, size);
+	if (err) {
+		pr_notice("Couldn't offline PFNs 0x%llx - 0x%llx\n",
+			  start >> PAGE_SHIFT, (start + size) >> PAGE_SHIFT);
+		return err;
+	}
+	err = add_memory(isolated_node, start, size, MMOP_ONLINE);
+	if (err) {
+		pr_notice("Couldn't add PFNs 0x%llx - 0x%llx\n",
+			  start >> PAGE_SHIFT, (start + size) >> PAGE_SHIFT);
+		goto add_and_online_memory;
+	}
+	err = walk_memory_blocks(start, size, NULL, memory_block_online_cb);
+	if (err) {
+		pr_notice("Couldn't online PFNs 0x%llx - 0x%llx\n",
+			  start >> PAGE_SHIFT, (start + size) >> PAGE_SHIFT);
+		goto remove_memory;
+	}
+
+	return 0;
+
+remove_memory:
+	if (WARN_ON(remove_memory(start, size)))
+		return err;
+add_and_online_memory:
+	if (WARN_ON(add_memory(0, start, size, MMOP_ONLINE)))
+		return err;
+	WARN_ON(walk_memory_blocks(start, size, NULL, memory_block_online_cb));
+	return err;
+}
+
+static void depopulate_isolated_node(struct kunit_suite *suite)
+{
+	unsigned long start, size = memory_block_size_bytes();
+
+	if (suite->suite_init_err)
+		return;
+
+	start = NODE_DATA(isolated_node)->node_start_pfn << PAGE_SHIFT;
+
+	WARN_ON(remove_memory(start, size));
+	WARN_ON(add_memory(0, start, size, MMOP_ONLINE));
+	WARN_ON(walk_memory_blocks(start, size, NULL, memory_block_online_cb));
+}
 static struct kunit_case test_cases[] = { {} };
 
-static struct kunit_suite test_suite = {
+struct kunit_suite page_alloc_test_suite = {
 	.name = "page_alloc",
 	.test_cases = test_cases,
+	.suite_init = populate_isolated_node,
+	.suite_exit = depopulate_isolated_node,
+	.init = test_init,
 };
-kunit_test_suite(test_suite);
+kunit_test_suite(page_alloc_test_suite);
 
 MODULE_LICENSE("GPL");
 MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
On Mon, Feb 24, 2025 at 02:47:13PM +0000, Brendan Jackman wrote:
In order to test the page allocator, we need an "instance" of the page allocator that is not subject to unpredictable perturbation by the live system. The closest thing that we have to an "instance" of the allocator is a NUMA node.
So, introduce a new concept of an "isolated" node. This is an extension of the existing concept of a "fake" node, with the addition that nothing else in the system will touch it unless instructed to by the test code.
The node is created during boot but has no memory nor any CPUs attached. It is not on any other node's fallback lists. Any code that pays general attention to NODE_DATA in such a way that might cause the page allocator data structures to be modified asynchronously to the test is enlightened to ignore it via the node_isolated() helper.
Then, during initialization of the allocator test suite, hotplug out some memory and then plug it back in to the isolated node. The node can then be used for testing.
Because it's easy to miss code that needs enlightenment, which can lead to confusing test behaviour, also add some defensive checks to try and detect interference with the isolated node before the start of the test.
Signed-off-by: Brendan Jackman jackmanb@google.com
 drivers/base/memory.c    |   5 +-
 include/linux/memory.h   |   4 ++
 include/linux/nodemask.h |  13 +++++
 kernel/kthread.c         |   3 +
 mm/.kunitconfig          |  10 +++-
 mm/Kconfig               |   2 +-
 mm/internal.h            |  11 ++++
 mm/memory_hotplug.c      |  26 ++++++---
 mm/numa_memblks.c        |  22 ++++++++
 mm/page_alloc.c          |  37 +++++++++++-
 mm/page_alloc_test.c     | 142 ++++++++++++++++++++++++++++++++++++++++++++++-
 11 files changed, 260 insertions(+), 15 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 348c5dbbfa68ad30d34b344ace1dd8deac0e1947..cdb893d7f13324862ee0943df080440d19fbd957 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -26,6 +26,8 @@ #include <linux/atomic.h> #include <linux/uaccess.h> +#include <kunit/visibility.h>
#define MEMORY_CLASS_NAME "memory" static const char *const online_type_to_str[] = { @@ -183,7 +185,7 @@ static inline unsigned long memblk_nr_poison(struct memory_block *mem) /*
- Must acquire mem_hotplug_lock in write mode.
*/ -static int memory_block_online(struct memory_block *mem) +VISIBLE_IF_KUNIT int memory_block_online(struct memory_block *mem) { unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr); unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; @@ -250,6 +252,7 @@ static int memory_block_online(struct memory_block *mem) mem_hotplug_done(); return ret; } +EXPORT_SYMBOL_IF_KUNIT(memory_block_online); /*
- Must acquire mem_hotplug_lock in write mode.
diff --git a/include/linux/memory.h b/include/linux/memory.h index c0afee5d126ef65d420770e1f8669842c499c8de..99139a6e9c11a407a8d7bfb17b7bbe3d276048ff 100644 --- a/include/linux/memory.h +++ b/include/linux/memory.h @@ -177,6 +177,10 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func, register_memory_notifier(&fn##_mem_nb); \ }) +#ifdef CONFIG_KUNIT
Why not CONFIG_PAGE_ALLOC_KUNIT_TEST?
+int memory_block_online(struct memory_block *mem); +#endif
#ifdef CONFIG_NUMA void memory_block_add_nid(struct memory_block *mem, int nid, enum meminit_context context); diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 9fd7a0ce9c1a7336df46f12622867e6786a5c0a9..6ea38963487e1fbb800eab69e5e6413aa17a8047 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -536,6 +536,19 @@ static __always_inline int node_random(const nodemask_t *maskp) #define for_each_node(node) for_each_node_state(node, N_POSSIBLE) #define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
+#ifdef CONFIG_PAGE_ALLOC_KUNIT_TEST +/*
- An isolated node is a fake node for testing, that boots with no memory and no
- attached CPUs, and nothing should touch it except for test code.
- */
+extern bool node_isolated(int node); +/* Only one isolated node is supported at present and it cannot be un-isolated. */ +extern void node_set_isolated(int node); +#else +static inline bool node_isolated(int node) { return false; } +#endif /* CONFIG_PAGE_ALLOC_KUNIT_TEST */
/*
- For nodemask scratch area.
- NODEMASK_ALLOC(type, name) allocates an object with a specified type and
diff --git a/kernel/kthread.c b/kernel/kthread.c index 5dc5b0d7238e85ad4074076e4036062c7bfcae74..93f65c5935cba8a59c7d3df2e36335130c3e1f71 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -9,6 +9,7 @@ */ #include <uapi/linux/sched/types.h> #include <linux/mm.h> +#include <linux/mmdebug.h> #include <linux/mmu_context.h> #include <linux/sched.h> #include <linux/sched/mm.h> @@ -511,6 +512,8 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data), struct kthread_create_info *create = kmalloc(sizeof(*create), GFP_KERNEL);
- VM_WARN_ON(node != NUMA_NO_NODE && node_isolated(node));
- if (!create) return ERR_PTR(-ENOMEM); create->threadfn = threadfn;
diff --git a/mm/.kunitconfig b/mm/.kunitconfig index fcc28557fa1c1412b21f9dbddbf6a63adca6f2b4..4ff4e1654c3e9b364072d33bfffb3a2336825859 100644 --- a/mm/.kunitconfig +++ b/mm/.kunitconfig @@ -1,2 +1,10 @@ CONFIG_KUNIT=y -CONFIG_PAGE_ALLOC_KUNIT_TEST=y \ No newline at end of file +CONFIG_PAGE_ALLOC_KUNIT_TEST=y
+# Required for NUMA +CONFIG_SMP=y +# Used by tests to carve out fake node for isolating page_alloc data. +CONFIG_NUMA=y +CONFIG_NUMA_EMU=y +CONFIG_MEMORY_HOTPLUG=y +CONFIG_MEMORY_HOTREMOVE=y \ No newline at end of file diff --git a/mm/Kconfig b/mm/Kconfig index 1fac51c536c66243a1321195a78eb40668386158..64c3794120002a839f56e3feb284c6d5c2635f40 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1360,7 +1360,7 @@ config PT_RECLAIM config PAGE_ALLOC_KUNIT_TEST tristate "KUnit test for page allocator" if !KUNIT_ALL_TESTS
- depends on KUNIT
- depends on KUNIT && NUMA && MEMORY_HOTREMOVE default KUNIT_ALL_TESTS help Builds unit tests for page allocator.
diff --git a/mm/internal.h b/mm/internal.h index 109ef30fee11f8b399f6bac42eab078cd51e01a5..9dbe5853b90b53ff261ba1b2fca12eabfda1a9de 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1545,5 +1545,16 @@ static inline bool reclaim_pt_is_enabled(unsigned long start, unsigned long end, } #endif /* CONFIG_PT_RECLAIM */ +#ifdef CONFIG_PAGE_ALLOC_KUNIT_TEST +/*
- Note that node_isolated() is separate, that's a "public API". But only
- test code needs to look up which node is isolated.
- */
+extern int isolated_node; +#endif
+#ifdef CONFIG_KUNIT
Same here, why not just put it in the above #ifdef?
+void drain_pages(unsigned int cpu); +#endif #endif /* __MM_INTERNAL_H */ diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index e3655f07dd6e33efb3e811cab07f240649487441..968c23b6f347cf6a0c30d00cb556166b8df9c9c3 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1198,10 +1198,12 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, arg.nr_pages = nr_pages; node_states_check_changes_online(nr_pages, zone, &arg);
- ret = memory_notify(MEM_GOING_ONLINE, &arg);
- ret = notifier_to_errno(ret);
- if (ret)
goto failed_addition;
- if (!node_isolated(nid)) {
ret = memory_notify(MEM_GOING_ONLINE, &arg);
ret = notifier_to_errno(ret);
if (ret)
goto failed_addition;
- }
/* * Fixup the number of isolated pageblocks before marking the sections @@ -1242,19 +1244,27 @@ int online_pages(unsigned long pfn, unsigned long nr_pages, /* reinitialise watermarks and update pcp limits */ init_per_zone_wmark_min();
- kswapd_run(nid);
- kcompactd_run(nid);
- /*
* Don't run daemons on the special test node, if that needs to be
* tested the test should run it.
*/
- if (!node_isolated(nid)) {
kswapd_run(nid);
kcompactd_run(nid);
- }
writeback_set_ratelimit();
- memory_notify(MEM_ONLINE, &arg);
- if (!node_isolated(nid))
memory_notify(MEM_ONLINE, &arg);
I am not familiar with this code, but I am wondering if we can move things around to have a single block of things we skip for isolated nodes. It depends on ordering dependencies, so we need someone who knows this code to tell us.
return 0; failed_addition: pr_debug("online_pages [mem %#010llx-%#010llx] failed\n", (unsigned long long) pfn << PAGE_SHIFT, (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
- memory_notify(MEM_CANCEL_ONLINE, &arg);
+ if (!node_isolated(nid))
+     memory_notify(MEM_CANCEL_ONLINE, &arg);
  remove_pfn_range_from_zone(zone, pfn, nr_pages);
  return ret;
}
[..]
-static struct kunit_suite test_suite = { +struct kunit_suite page_alloc_test_suite = {
We should probably just introduce the suite as page_alloc_test_suite to begin with?
.name = "page_alloc", .test_cases = test_cases,
- .suite_init = populate_isolated_node,
- .suite_exit = depopulate_isolated_node,
- .init = test_init,
}; -kunit_test_suite(test_suite); +kunit_test_suite(page_alloc_test_suite); MODULE_LICENSE("GPL"); MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
-- 2.48.1.601.g30ceb7b040-goog
On Mon, 24 Feb 2025 at 19:34, Yosry Ahmed yosry.ahmed@linux.dev wrote:
+#ifdef CONFIG_KUNIT
Why not CONFIG_PAGE_ALLOC_KUNIT_TEST?
VISIBLE_IF_KUNIT is paired with #ifdef CONFIG_KUNIT elsewhere (I think there might even be docs that do this in an example) so I just followed the pattern.
#ifdef CONFIG_KUNIT -> things are consistent and you just don't have to think about this very much.
#ifdef CONFIG_PAGE_ALLOC_KUNIT_TEST -> better scoping.
So yeah, shrug. Maybe David/Rae/Brendan has an opinion.
On Tue, 25 Feb 2025 at 12:20, Brendan Jackman jackmanb@google.com wrote:
On Mon, 24 Feb 2025 at 19:34, Yosry Ahmed yosry.ahmed@linux.dev wrote:
+#ifdef CONFIG_KUNIT
Why not CONFIG_PAGE_ALLOC_KUNIT_TEST?
VISIBLE_IF_KUNIT is paired with #ifdef CONFIG_KUNIT elsewhere (I think there might even be docs that do this in an example) so I just followed the pattern.
#ifdef CONFIG_KUNIT -> things are consistent and you just don't have to think about this very much.
#ifdef CONFIG_PAGE_ALLOC_KUNIT_TEST -> better scoping.
So yeah, shrug. Maybe David/Rae/Brendan has an opinion.
Oh, actually I rescind my shrug. If we used #ifdef CONFIG_PAGE_ALLOC_KUNIT_TEST, then -Wmissing-prototypes would fire for people running other KUnit tests.
So yeah the function needs to be non-static exactly when there's a prototype in the header.
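(For reference, paraphrasing include/kunit/visibility.h:

    #if IS_ENABLED(CONFIG_KUNIT)
        #define VISIBLE_IF_KUNIT                /* symbol is non-static */
        #define EXPORT_SYMBOL_IF_KUNIT(symbol) \
                EXPORT_SYMBOL_NS(symbol, "EXPORTED_FOR_KUNIT_TESTING")
    #else
        #define VISIBLE_IF_KUNIT static
        #define EXPORT_SYMBOL_IF_KUNIT(symbol)
    #endif

So the symbol is non-static exactly when CONFIG_KUNIT=y, which is what the prototype's #ifdef has to match.)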
This is the bare minimum to illustrate what KUnit code would look like that covers the page allocator.
Even this trivial test illustrates a couple of nice things that are possible when testing via KUnit:
1. We can directly assert that the correct zone was used. (Although note due to the simplistic setup, you can have any zone you like as long as it's ZONE_NORMAL).
2. We can assert that a page got freed. It's probably pretty unlikely that we'd have a bug that actually causes a page to get leaked by the allocator, but it serves as a good example of the kind of assertions we can make by judiciously peeking at allocator internals.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/page_alloc_test.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 138 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc_test.c b/mm/page_alloc_test.c
index c6bcfcaf61b57ca35ad1b5fc48fd07d0402843bc..0c4effb151f4cd31ec6a696615a9b6ae4964b332 100644
--- a/mm/page_alloc_test.c
+++ b/mm/page_alloc_test.c
@@ -26,6 +26,139 @@
 	}								\
 })
 
+#define EXPECT_WITHIN_ZONE(test, page, zone) ({				\
+	unsigned long pfn = page_to_pfn(page);				\
+	unsigned long start_pfn = zone->zone_start_pfn;			\
+	unsigned long end_pfn = start_pfn + zone->spanned_pages;	\
+									\
+	KUNIT_EXPECT_TRUE_MSG(test,					\
+			      pfn >= start_pfn && pfn < end_pfn,	\
+			      "Wanted PFN 0x%lx - 0x%lx, got 0x%lx",	\
+			      start_pfn, end_pfn, pfn);			\
+	KUNIT_EXPECT_PTR_EQ_MSG(test, page_zone(page), zone,		\
+				"Wanted %px (%s), got %px (%s)",	\
+				zone, zone->name, page_zone(page), page_zone(page)->name); \
+})
+
+static void action_nodemask_free(void *ctx)
+{
+	NODEMASK_FREE(ctx);
+}
+
+/*
+ * Call __alloc_pages_noprof with a nodemask containing only the nid.
+ *
+ * Never returns NULL.
+ */
+static inline struct page *alloc_pages_force_nid(struct kunit *test,
+						 gfp_t gfp, int order, int nid)
+{
+	NODEMASK_ALLOC(nodemask_t, nodemask, GFP_KERNEL);
+	struct page *page;
+
+	KUNIT_ASSERT_NOT_NULL(test, nodemask);
+	kunit_add_action(test, action_nodemask_free, &nodemask);
+	nodes_clear(*nodemask);
+	node_set(nid, *nodemask);
+
+	page = __alloc_pages_noprof(GFP_KERNEL, 0, nid, nodemask);
+	KUNIT_ASSERT_NOT_NULL(test, page);
+	return page;
+}
+
+static inline bool page_on_buddy_list(struct page *want_page, struct list_head *head)
+{
+	struct page *found_page;
+
+	list_for_each_entry(found_page, head, buddy_list) {
+		if (found_page == want_page)
+			return true;
+	}
+
+	return false;
+}
+
+/* Test case parameters that are independent of alloc order. */
+static const struct {
+	gfp_t gfp_flags;
+	enum zone_type want_zone;
+} alloc_fresh_gfps[] = {
+	/*
+	 * The way we currently set up the isolated node, everything ends up in
+	 * ZONE_NORMAL.
+	 */
+	{ .gfp_flags = GFP_KERNEL, .want_zone = ZONE_NORMAL },
+	{ .gfp_flags = GFP_ATOMIC, .want_zone = ZONE_NORMAL },
+	{ .gfp_flags = GFP_USER, .want_zone = ZONE_NORMAL },
+	{ .gfp_flags = GFP_DMA32, .want_zone = ZONE_NORMAL },
+};
+
+struct alloc_fresh_test_case {
+	int order;
+	int gfp_idx;
+};
+
+/* Generate test cases as the cross product of orders and alloc_fresh_gfps. */
+static const void *alloc_fresh_gen_params(const void *prev, char *desc)
+{
+	/* Buffer to avoid allocations. */
+	static struct alloc_fresh_test_case tc;
+
+	if (!prev) {
+		/* First call */
+		tc.order = 0;
+		tc.gfp_idx = 0;
+		return &tc;
+	}
+
+	tc.gfp_idx++;
+	if (tc.gfp_idx >= ARRAY_SIZE(alloc_fresh_gfps)) {
+		tc.gfp_idx = 0;
+		tc.order++;
+	}
+	if (tc.order > MAX_PAGE_ORDER)
+		/* Finished. */
+		return NULL;
+
+	snprintf(desc, KUNIT_PARAM_DESC_SIZE, "order %d %pGg\n",
+		 tc.order, &alloc_fresh_gfps[tc.gfp_idx].gfp_flags);
+	return &tc;
+}
+
+/* Smoke test: allocate from a node where everything is in a pristine state. */
+static void test_alloc_fresh(struct kunit *test)
+{
+	const struct alloc_fresh_test_case *tc = test->param_value;
+	gfp_t gfp_flags = alloc_fresh_gfps[tc->gfp_idx].gfp_flags;
+	enum zone_type want_zone_type = alloc_fresh_gfps[tc->gfp_idx].want_zone;
+	struct zone *want_zone = &NODE_DATA(isolated_node)->node_zones[want_zone_type];
+	struct list_head *buddy_list;
+	struct per_cpu_pages *pcp;
+	struct page *page, *merged_page;
+	int cpu;
+
+	page = alloc_pages_force_nid(test, gfp_flags, tc->order, isolated_node);
+
+	EXPECT_WITHIN_ZONE(test, page, want_zone);
+
+	cpu = get_cpu();
+	__free_pages(page, 0);
+	pcp = per_cpu_ptr(want_zone->per_cpu_pageset, cpu);
+	put_cpu();
+
+	/*
+	 * Should end up back in the free area when drained. Because everything
+	 * is free, it should get buddy-merged up to the maximum order.
+	 */
+	drain_zone_pages(want_zone, pcp);
+	KUNIT_EXPECT_TRUE(test, PageBuddy(page));
+	KUNIT_EXPECT_EQ(test, buddy_order(page), MAX_PAGE_ORDER);
+	KUNIT_EXPECT_TRUE(test, list_empty(&pcp->lists[MIGRATE_UNMOVABLE]));
+	merged_page = pfn_to_page(round_down(page_to_pfn(page), 1 << MAX_PAGE_ORDER));
+	buddy_list = &want_zone->free_area[MAX_PAGE_ORDER].free_list[MIGRATE_UNMOVABLE];
+	KUNIT_EXPECT_TRUE(test, page_on_buddy_list(merged_page, buddy_list));
+}
+
 static void action_drain_pages_all(void *unused)
 {
 	int cpu;
@@ -144,7 +277,11 @@ static void depopulate_isolated_node(struct kunit_suite *suite)
 	WARN_ON(add_memory(0, start, size, MMOP_ONLINE));
 	WARN_ON(walk_memory_blocks(start, size, NULL, memory_block_online_cb));
 }
-static struct kunit_case test_cases[] = { {} };
+
+static struct kunit_case test_cases[] = {
+	KUNIT_CASE_PARAM(test_alloc_fresh, alloc_fresh_gen_params),
+	{}
+};
 
 struct kunit_suite page_alloc_test_suite = {
 	.name = "page_alloc",
On Mon, Feb 24, 2025 at 02:47:14PM +0000, Brendan Jackman wrote:
This is the bare minimum to illustrate what KUnit code would look like that covers the page allocator.
Even this trivial test illustrates a couple of nice things that are possible when testing via KUnit:
1. We can directly assert that the correct zone was used. (Although note due to the simplistic setup, you can have any zone you like as long as it's ZONE_NORMAL).
2. We can assert that a page got freed. It's probably pretty unlikely that we'd have a bug that actually causes a page to get leaked by the allocator, but it serves as a good example of the kind of assertions we can make by judiciously peeking at allocator internals.
Signed-off-by: Brendan Jackman jackmanb@google.com
 mm/page_alloc_test.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 138 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc_test.c b/mm/page_alloc_test.c index c6bcfcaf61b57ca35ad1b5fc48fd07d0402843bc..0c4effb151f4cd31ec6a696615a9b6ae4964b332 100644 --- a/mm/page_alloc_test.c +++ b/mm/page_alloc_test.c @@ -26,6 +26,139 @@ } \ }) +#define EXPECT_WITHIN_ZONE(test, page, zone) ({ \
ultranit: the '\' above is unaligned with the ones below.
- unsigned long pfn = page_to_pfn(page); \
- unsigned long start_pfn = zone->zone_start_pfn; \
- unsigned long end_pfn = start_pfn + zone->spanned_pages; \
\
- KUNIT_EXPECT_TRUE_MSG(test, \
pfn >= start_pfn && pfn < end_pfn, \
"Wanted PFN 0x%lx - 0x%lx, got 0x%lx", \
start_pfn, end_pfn, pfn); \
- KUNIT_EXPECT_PTR_EQ_MSG(test, page_zone(page), zone, \
"Wanted %px (%s), got %px (%s)", \
zone, zone->name, page_zone(page), page_zone(page)->name); \
+})
+static void action_nodemask_free(void *ctx) +{
- NODEMASK_FREE(ctx);
+}
+/*
- Call __alloc_pages_noprof with a nodemask containing only the nid.
- Never returns NULL.
- */
+static inline struct page *alloc_pages_force_nid(struct kunit *test,
gfp_t gfp, int order, int nid)
+{
- NODEMASK_ALLOC(nodemask_t, nodemask, GFP_KERNEL);
For the sake of the test can't we just put the nodemask on the stack?
- struct page *page;
- KUNIT_ASSERT_NOT_NULL(test, nodemask);
- kunit_add_action(test, action_nodemask_free, &nodemask);
Why aren't we just freeing the nodemask after using it, before we make any assertions?
- nodes_clear(*nodemask);
- node_set(nid, *nodemask);
- page = __alloc_pages_noprof(GFP_KERNEL, 0, nid, nodemask);
- KUNIT_ASSERT_NOT_NULL(test, page);
- return page;
+}
+static inline bool page_on_buddy_list(struct page *want_page, struct list_head *head) +{
- struct page *found_page;
nit: This is just an iterator, we can probably just call it 'page'. 'found' is confusing for me because we didn't really search for it.
- list_for_each_entry(found_page, head, buddy_list) {
if (found_page == want_page)
return true;
- }
- return false;
+}
+/* Test case parameters that are independent of alloc order. */ +static const struct {
- gfp_t gfp_flags;
- enum zone_type want_zone;
+} alloc_fresh_gfps[] = {
- /*
* The way we currently set up the isolated node, everything ends up in
* ZONE_NORMAL.
*/
- { .gfp_flags = GFP_KERNEL, .want_zone = ZONE_NORMAL },
- { .gfp_flags = GFP_ATOMIC, .want_zone = ZONE_NORMAL },
- { .gfp_flags = GFP_USER, .want_zone = ZONE_NORMAL },
- { .gfp_flags = GFP_DMA32, .want_zone = ZONE_NORMAL },
+};
+struct alloc_fresh_test_case {
- int order;
- int gfp_idx;
+};
+/* Generate test cases as the cross product of orders and alloc_fresh_gfps. */ +static const void *alloc_fresh_gen_params(const void *prev, char *desc) +{
- /* Buffer to avoid allocations. */
- static struct alloc_fresh_test_case tc;
- if (!prev) {
/* First call */
tc.order = 0;
tc.gfp_idx = 0;
return &tc;
- }
We need to set 'tc' here to whatever 'prev' is pointing at, right?
- tc.gfp_idx++;
- if (tc.gfp_idx >= ARRAY_SIZE(alloc_fresh_gfps)) {
tc.gfp_idx = 0;
tc.order++;
- }
- if (tc.order > MAX_PAGE_ORDER)
/* Finished. */
return NULL;
- snprintf(desc, KUNIT_PARAM_DESC_SIZE, "order %d %pGg\n",
tc.order, &alloc_fresh_gfps[tc.gfp_idx].gfp_flags);
- return &tc;
+}
+/* Smoke test: allocate from a node where everything is in a pristine state. */ +static void test_alloc_fresh(struct kunit *test) +{
- const struct alloc_fresh_test_case *tc = test->param_value;
- gfp_t gfp_flags = alloc_fresh_gfps[tc->gfp_idx].gfp_flags;
- enum zone_type want_zone_type = alloc_fresh_gfps[tc->gfp_idx].want_zone;
- struct zone *want_zone = &NODE_DATA(isolated_node)->node_zones[want_zone_type];
- struct list_head *buddy_list;
- struct per_cpu_pages *pcp;
- struct page *page, *merged_page;
- int cpu;
- page = alloc_pages_force_nid(test, gfp_flags, tc->order, isolated_node);
- EXPECT_WITHIN_ZONE(test, page, want_zone);
- cpu = get_cpu();
- __free_pages(page, 0);
- pcp = per_cpu_ptr(want_zone->per_cpu_pageset, cpu);
- put_cpu();
- /*
* Should end up back in the free area when drained. Because everything
* is free, it should get buddy-merged up to the maximum order.
*/
- drain_zone_pages(want_zone, pcp);
- KUNIT_EXPECT_TRUE(test, PageBuddy(page));
- KUNIT_EXPECT_EQ(test, buddy_order(page), MAX_PAGE_ORDER);
- KUNIT_EXPECT_TRUE(test, list_empty(&pcp->lists[MIGRATE_UNMOVABLE]));
- merged_page = pfn_to_page(round_down(page_to_pfn(page), 1 << MAX_PAGE_ORDER));
- buddy_list = &want_zone->free_area[MAX_PAGE_ORDER].free_list[MIGRATE_UNMOVABLE];
- KUNIT_EXPECT_TRUE(test, page_on_buddy_list(merged_page, buddy_list));
+}
static void action_drain_pages_all(void *unused) { int cpu; @@ -144,7 +277,11 @@ static void depopulate_isolated_node(struct kunit_suite *suite) WARN_ON(add_memory(0, start, size, MMOP_ONLINE)); WARN_ON(walk_memory_blocks(start, size, NULL, memory_block_online_cb)); } -static struct kunit_case test_cases[] = { {} };
+static struct kunit_case test_cases[] = {
- KUNIT_CASE_PARAM(test_alloc_fresh, alloc_fresh_gen_params),
- {}
+}; struct kunit_suite page_alloc_test_suite = { .name = "page_alloc",
-- 2.48.1.601.g30ceb7b040-goog
On Mon, 24 Feb 2025 at 19:26, Yosry Ahmed yosry.ahmed@linux.dev wrote:
+static void action_nodemask_free(void *ctx) +{
NODEMASK_FREE(ctx);
+}
+/*
- Call __alloc_pages_noprof with a nodemask containing only the nid.
- Never returns NULL.
- */
+static inline struct page *alloc_pages_force_nid(struct kunit *test,
gfp_t gfp, int order, int nid)
+{
NODEMASK_ALLOC(nodemask_t, nodemask, GFP_KERNEL);
For the sake of the test can't we just put the nodemask on the stack?
Hm, I think whether or not it's test code is irrelevant to whether we can put it on the stack. Presumably the nodemask code is written as it is because nodemasks can be massive, so we can overflow the stack here just as easily and confusingly as anywhere else.
(I think we're not in a very deep stack here right now but KUnit could easily change that).
FWIW I think when using the mm/.kunitconfig provided in this series it does actually go on the stack.
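(Paraphrasing include/linux/nodemask.h for anyone following along - NODEMASK_ALLOC() only heap-allocates when the mask is big:

    #if NODES_SHIFT > 8 /* nodemask_t > 32 bytes */
    #define NODEMASK_ALLOC(type, name, gfp_flags) \
            type *name = kmalloc(sizeof(*name), gfp_flags)
    #define NODEMASK_FREE(m) kfree(m)
    #else
    #define NODEMASK_ALLOC(type, name, gfp_flags) type _##name, *name = &_##name
    #define NODEMASK_FREE(m) do {} while (0)
    #endif
)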
struct page *page;
KUNIT_ASSERT_NOT_NULL(test, nodemask);
kunit_add_action(test, action_nodemask_free, &nodemask);
Why aren't we just freeing the nodemask after using it, before we make any assertions?
I guess that's just a philosophical question; I usually default to writing KUnit code such that you can throw an assertion in ~anywhere and things just work.
But, I'm not passionate about it, I would also be fine with freeing it directly (it would certainly save quite a few lines of code).
+/* Generate test cases as the cross product of orders and alloc_fresh_gfps. */ +static const void *alloc_fresh_gen_params(const void *prev, char *desc) +{
/* Buffer to avoid allocations. */
static struct alloc_fresh_test_case tc;
if (!prev) {
/* First call */
tc.order = 0;
tc.gfp_idx = 0;
return &tc;
}
We need to set 'tc' here to whatever 'prev' is pointing at, right?
prev always points at tc (or is NULL). Sounds like it needs a comment to that effect!
(Note tc is static).
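Roughly, KUnit drives the generator like this (a simplified sketch of the loop in lib/kunit/test.c, details elided):

    char param_desc[KUNIT_PARAM_DESC_SIZE];
    const void *param = test_case->generate_params(NULL, param_desc);

    while (param) {
            /* ... run the case with test.param_value = param ... */
            param = test_case->generate_params(param, param_desc); /* prev == &tc */
    }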
Ack to everything else, thanks for the review!
On Mon, 24 Feb 2025 at 15:47, Brendan Jackman jackmanb@google.com wrote:
+static inline struct page *alloc_pages_force_nid(struct kunit *test,
gfp_t gfp, int order, int nid)
+{
NODEMASK_ALLOC(nodemask_t, nodemask, GFP_KERNEL);
struct page *page;
KUNIT_ASSERT_NOT_NULL(test, nodemask);
kunit_add_action(test, action_nodemask_free, &nodemask);
nodes_clear(*nodemask);
node_set(nid, *nodemask);
page = __alloc_pages_noprof(GFP_KERNEL, 0, nid, nodemask);
Oops, it's ignoring the gfp argument here.
{ .gfp_flags = GFP_DMA32, .want_zone = ZONE_NORMAL },
And with that fixed, it becomes clear DMA32 allocations can't be expected to succeed in this zone setup.
(Anyway, it's a bit of a silly test regardless, just something to illustrate the KUnit idea).
On 24.02.25 15:47, Brendan Jackman wrote:
The page allocator does a lot of stuff that is not visible to the user in any deterministic way. But this stuff is still important and it would be nice to test that behaviour.
KUnit is a tool for unit-testing kernel-internal APIs. This is an attempt to adapt it to the page allocator.
I have been hacking on this as a way to try and test the code I'm writing for my ASI page_alloc integration proposal [0]. It's been extremely useful to be able to "just call it and see what it does". So I wanna gather some feedback on whether this basic idea is of interest before I invest too much more time in it.
You can run these tests like this:
  tools/testing/kunit/kunit.py run \
          --arch=x86_64 --kernel_args="movablecore=2G" \
          --qemu_args="-m 4G" --kunitconfig mm/.kunitconfig
Unit-testing code that has mutable global variables can be a pain. Unit-testing code with mutable global variables _that can change concurrently with the tests_ is basically impossible. So, we need some way to isolate an "instance" of the allocator that doesn't refer to any such concurrently-mutated state.
Luckily, the allocator only has one really important global variable: node_data. So, the approach here is to carve out a subset of that variable which is as isolated as possible from the rest of the system, which can be used for deterministic testing. This is achieved by creating a fake "isolated" node at boot, and plugging in memory at test init time.
This is an RFC and not a PATCH because:
1. I have not taken much care to ensure the isolation is complete. There are probably sources of flakiness and nondeterminism in here.
2. I suspect the basic idea might be over-complicated: do we really need memory hotplug here? Do we even need the instance of the allocator we're testing to have actual memory behind the pages it's allocating, or could we just hallucinate a new region of vmemmap without any of that awkwardness?
One significant downside of relying on memory hotplug is that the test won't run if we can't hotplug anything out. That means you have to fiddle with the platform to even run the tests - see the --kernel_args and --qemu_args I had to add to my kunit.py command above.
So yeah, other suggestions welcome.
2b. I'm not very confident I'm using the hotplug API properly.
Me neither ;)
Dynamically adding memory to that "fake" node is certainly interesting, but which guarantees do we have that some other features (page migration, memory offlining, page reporting ...) don't interact in weird ways with this "fake" node? Adding special-casing all over the place for that feels wrong. I assume this is point 1. you note above.
So I don't quite love the idea on first sight ... but I haven't grasped all details of the full picture yet I'm afraid.
On Tue, Feb 25, 2025 at 11:01:47AM +0100, David Hildenbrand wrote:
This is an RFC and not a PATCH because:
1. I have not taken much care to ensure the isolation is complete. There are probably sources of flakiness and nondeterminism in here.
2. I suspect the basic idea might be over-complicated: do we really need memory hotplug here? Do we even need the instance of the allocator we're testing to have actual memory behind the pages it's allocating, or could we just hallucinate a new region of vmemmap without any of that awkwardness?
One significant downside of relying on memory hotplug is that the test won't run if we can't hotplug anything out. That means you have to fiddle with the platform to even run the tests - see the --kernel_args and --qemu_args I had to add to my kunit.py command above.
So yeah, other suggestions welcome.
2b. I'm not very confident I'm using the hotplug API properly.
Me neither ;)
Dynamically adding memory to that "fake" node is certainly interesting, but which guarantees do we have that some other features (page migration, memory offlining, page reporting ...) don't interact in weird ways with this "fake" node? Adding special-casing all over the place for that feels wrong. I assume this is point 1. you note above.
Yeah, basically this is the big downside. Changing the system we're trying to test in order to make it testable can't be avoided entirely, but I am also pretty unhappy with sprinkling "if (node_isolated(node))" everywhere.
I guess the ideal approach is one where, instead of having to modify the meaning of node_data, we just support replacing it completely, e.g.:

    struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
                                             int preferred_nid, nodemask_t *nodemask,
                                             struct pglist_data **node_data)
    {
            struct alloc_context ac = { .node_data = node_data };

            // ...
    }
Ideally this could be done in such a way that it disappears completely outside of KUnit builds; the interface should be private, and we'd wanna get rid of any unnecessary pointer chasing with stuff like:

    #ifdef CONFIG_PAGE_ALLOC_KUNIT_TESTS
    static inline struct pglist_data *ac_node_data(struct alloc_context *ac, int node)
    {
            return ac->node_data[node];
    }
    #else
    #define ac_node_data(ac, node) (NODE_DATA(node))
    #endif
I initially rejected this approach because it felt "too intrusive", but now that I've actually written this RFC I think it could be less intrusive than the node_isolated() thing I've proposed here.
The most obvious challenges I can see there are:
- There might be places that page_alloc.c calls out to that care about node_data but where we really don't want to plumb the alloc_context through (maybe vmscan.c is already such a place)?
- I dunno how many more such helpers we'd need beyond ac_node_data(), like do we need ac_nodes_possible_mask() etc etc etc?
But maybe worth a try - can you see any obvious reason this idea is stupid?
So I don't quite love the idea on first sight ... but I haven't grasped all details of the full picture yet I'm afraid.
Do you have any thoughts about "making up" memory instead of hot-unplugging real memory for test usage? That might simplify things a bit?
It seems possible that very little mm code cares if the memory we're managing actually exists. (For ASI code we did briefly experiment with tracking information about free pages in the page itself, but it's pretty sketchy and the presence of debug_pagealloc makes me think nobody does it today).
There might be arch-specific issues there, but for unit tests it seems OK if they don't work on every ISA.
On 25.02.25 13:56, Brendan Jackman wrote:
On Tue, Feb 25, 2025 at 11:01:47AM +0100, David Hildenbrand wrote:
This is an RFC and not a PATCH because:
1. I have not taken much care to ensure the isolation is complete. There are probably sources of flakiness and nondeterminism in here.
2. I suspect the basic idea might be over-complicated: do we really need memory hotplug here? Do we even need the instance of the allocator we're testing to have actual memory behind the pages it's allocating, or could we just hallucinate a new region of vmemmap without any of that awkwardness?
One significant downside of relying on memory hotplug is that the test won't run if we can't hotplug anything out. That means you have to fiddle with the platform to even run the tests - see the --kernel_args and --qemu_args I had to add to my kunit.py command above.
So yeah, other suggestions welcome.
2b. I'm not very confident I'm using the hotplug API properly.
Me neither ;)
Dynamically adding memory to that "fake" node is certainly interesting, but which guarantees do we have that some other features (page migration, memory offlining, page reporting ...) don't interact in weird ways with this "fake" node? Adding special-casing all over the place for that feels wrong. I assume this is point 1. you note above.
Yeah, basically this is the big downside. Changing the system we're trying to test in order to make it testable can't be avoided entirely, but I am also pretty unhappy with sprinkling "if (node_isolated(node))" everywhere.
I guess the ideal approach is one where, instead of having to modify the meaning of node_data, we just support replacing it completely, e.g.:

    struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
                                             int preferred_nid, nodemask_t *nodemask,
                                             struct pglist_data **node_data)
    {
            struct alloc_context ac = { .node_data = node_data };

            // ...
    }
Ideally this could be done in such a way that it disappears completely outside of KUnit builds; the interface should be private, and we'd wanna get rid of any unnecessary pointer chasing with stuff like:

    #ifdef CONFIG_PAGE_ALLOC_KUNIT_TESTS
    static inline struct pglist_data *ac_node_data(struct alloc_context *ac, int node)
    {
            return ac->node_data[node];
    }
    #else
    #define ac_node_data(ac, node) (NODE_DATA(node))
    #endif
I initially rejected this approach because it felt "too intrusive", but now that I've actually written this RFC I think it could be less intrusive than the node_isolated() thing I've proposed here.
The most obvious challenges I can see there are:
There might be places that page_alloc.c calls out to that care about node_data but where we really don't want to plumb the alloc_context through (maybe vmscan.c is already such a place)?
I dunno how many more such helpers we'd need beyond ac_node_data(), like do we need ac_nodes_possible_mask() etc etc etc?
But maybe worth a try - can you see any obvious reason this idea is stupid?
So I don't quite love the idea on first sight ... but I haven't grasped all details of the full picture yet I'm afraid.
Do you have any thoughts about "making up" memory instead of hot-unplugging real memory for test usage? That might simplify things a bit?
It seems possible that very little mm code cares if the memory we're managing actually exists. (For ASI code we did briefly experiment with tracking information about free pages in the page itself, but it's pretty sketchy and the presence of debug_pagealloc makes me think nobody does it today).
At least when it comes to the buddy, only page zeroing+poisoning should access actual page content.
So making up memory might actually work in quite some setups, assuming that it will never get allocated.
The "complicated" thing is still that we are trying to test parts of the buddy in a well-controlled way while other kernel infrastructure is using the buddy in rather uncontrolled ways.
There might be arch-specific issues there, but for unit tests it seems OK if they don't work on every ISA.
Just pointing it out: for memblock tests (tools/testing/memblock/) we actually compile memblock.c to be used in a user space application, stubbing all external function calls etc such that we get the basics running.
It'd probably be quite some work to get page_alloc.c into a similar shape; likely we'd have to move a lot of unrelated-to-the-tests stuff, and think about how to handle some nasty details like pcp etc. Just wondering, did you think about that option as well?
The nice thing about such an approach is that we can test the allocator without any possible side effects from the running system.
On Wed, 26 Feb 2025 at 12:52, David Hildenbrand david@redhat.com wrote:
It seems possible that very little mm code cares if the memory we're managing actually exists. (For ASI code we did briefly experiment with tracking information about free pages in the page itself, but it's pretty sketchy and the presence of debug_pagealloc makes me think nobody does it today).
At least when it comes to the buddy, only page zeroing+poisoning should access actual page content.
So making up memory might actually work in quite some setups, assuming that it will never get allocated.
The "complicated" thing is still that we are trying to test parts of the buddy in a well-controlled way while other kernel infrastructure is using the buddy in rather uncontrolled ways.
Thanks, yeah that makes sense, and I agree that's the hard part. If we can design a way to actually test the interface in an isolated way, where we get the "memory" that we use to do that is kinda secondary and can be changed later.
There might be arch-specific issues there, but for unit tests it seems OK if they don't work on every ISA.
Just pointing it out: for memblock tests (tools/testing/memblock/) we actually compile memblock.c to be used in a user space application, stubbing all external function calls etc such that we get the basics running.
It'd probably be quite some work to get page_alloc.c into a similar shape, likely we'd have to move a lot of unrelated-for-the tests stuff, and think about how to handle some nasty details like pcp etc. Just wondering, did you think about that option as well?
The nice thing about such an approach is that we can test the allcator without any possible side effects from the running system.
Yeah Lorenzo also pointed me to tools/testing/vma and I am pretty sold that it's a better approach than KUnit where it's possible. But, I'm doubtful about using it for page_alloc.
I think it could definitely be a good idea for the really core buddy logic (like rmqueue_buddy() and below), where I'm sure we could stub out stuff like percpu_* and locking and have the tests still be meaningful. But I'm not sure that really low-level code is calling out for more testing.
Whereas I suspect if you zoom out even just to the level of __alloc_frozen_pages_noprof(), it starts to get a bit impractical already. And that's where I really wanna get coverage.
Anyway, I'm thinking the next step here is to explore how to get away from the node_isolated() stuff in this RFC, so I'll keep that idea in mind and try to get a feel for whether it looks possible.