On Wed, 26 Feb 2025 at 12:52, David Hildenbrand <david@redhat.com> wrote:
It seems possible that very little mm code cares if the memory we're managing actually exists. (For ASI code we did briefly experiment with tracking information about free pages in the page itself, but it's pretty sketchy, and the presence of debug_pagealloc makes me think nobody does it today.)
At least when it comes to the buddy, only page zeroing+poisoning should access actual page content.
So making up memory might actually work in quite a few setups, assuming that it will never get allocated.
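To make the "made-up memory" point concrete, here is a minimal userspace sketch. It is a hypothetical toy, not kernel code, and names like `toy_alloc()`, `toy_free()` and `MAX_ORDER_TOY` are invented for illustration. It models a buddy allocator over a range of PFNs that has no backing memory at all. Splitting and merging only touch per-PFN metadata, much like the real buddy works on struct page. A block's buddy is found with `pfn ^ (1 << order)`, the same XOR trick as the kernel's `__find_buddy_pfn()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy: NR_PFNS "page frames" with no backing memory.
 * Only metadata exists: is this PFN the head of a free block, and of what order. */
#define MAX_ORDER_TOY 3
#define NR_PFNS (1UL << MAX_ORDER_TOY)

struct toy_page {
	bool free;          /* head of a free block? */
	unsigned int order; /* order of that free block (valid only if free) */
};

static struct toy_page pages[NR_PFNS];

/* Start with the whole range as one free block of the maximum order. */
static void toy_init(void)
{
	for (unsigned long pfn = 0; pfn < NR_PFNS; pfn++)
		pages[pfn] = (struct toy_page){ false, 0 };
	pages[0] = (struct toy_page){ true, MAX_ORDER_TOY };
}

/* Find the smallest free block of at least 'order', split it down, return its PFN (or -1). */
static long toy_alloc(unsigned int order)
{
	for (unsigned int cur = order; cur <= MAX_ORDER_TOY; cur++) {
		/* Blocks of order 'cur' are naturally aligned to 1 << cur. */
		for (unsigned long pfn = 0; pfn < NR_PFNS; pfn += 1UL << cur) {
			if (!pages[pfn].free || pages[pfn].order != cur)
				continue;
			pages[pfn].free = false;
			/* Split down to 'order', handing the upper half back each time. */
			for (unsigned int split = cur; split > order; split--) {
				unsigned long buddy = pfn + (1UL << (split - 1));
				pages[buddy].free = true;
				pages[buddy].order = split - 1;
			}
			return (long)pfn;
		}
	}
	return -1;
}

/* Free a block and merge it with its buddy for as long as the buddy is free. */
static void toy_free(unsigned long pfn, unsigned int order)
{
	assert(pfn < NR_PFNS && order <= MAX_ORDER_TOY);
	while (order < MAX_ORDER_TOY) {
		unsigned long buddy = pfn ^ (1UL << order);
		if (!pages[buddy].free || pages[buddy].order != order)
			break;
		pages[buddy].free = false;   /* buddy is absorbed into the merged block */
		pfn &= ~(1UL << order);      /* merged block starts at the lower PFN */
		order++;
	}
	pages[pfn].free = true;
	pages[pfn].order = order;
}
```

Nothing here ever dereferences a PFN. As long as nobody zeroes, poisons, or hands the "pages" to a real user, the range never needs to exist.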
The "complicated" thing is still that we are trying to test parts of the buddy in a well-controlled way while other kernel infrastructure is using the buddy in rather uncontrolled ways.
Thanks, yeah that makes sense, and I agree that's the hard part. If we can design a way to actually test the interface in an isolated way, then where we get the "memory" for that from is kinda secondary and can be changed later.
There might be arch-specific issues there, but for unit tests it seems OK if they don't work on every ISA.
Just pointing it out: for memblock tests (tools/testing/memblock/) we actually compile memblock.c into a user space application, stubbing out all external function calls etc., such that we get the basics running.
It'd probably be quite a lot of work to get page_alloc.c into a similar shape. We'd likely have to move a lot of stuff that's unrelated to the tests, and think about how to handle some nasty details like pcp etc. Just wondering, did you think about that option as well?
The nice thing about such an approach is that we can test the allocator without any possible side effects from the running system.
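A minimal sketch of that stubbing idea, assuming a plain single-threaded C userspace harness. The kernel primitives that the code under test calls (here `spin_lock()`, `spin_unlock()` and `this_cpu_inc()`) are replaced by userspace stand-ins, and the stand-ins double as checks for lock balance and call counts. `toy_rmqueue()`, `struct toy_zone` and `alloc_stat` are hypothetical and far simpler than anything in page_alloc.c. This does not reproduce how tools/testing/memblock is actually laid out.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Userspace stand-ins for kernel primitives (hypothetical stubs) ---- */
typedef struct { int held; } spinlock_t;
static int lock_calls;

static void spin_lock(spinlock_t *l)
{
	assert(!l->held);   /* catch recursive locking in tests */
	l->held = 1;
	lock_calls++;
}

static void spin_unlock(spinlock_t *l)
{
	assert(l->held);    /* catch unbalanced unlocks */
	l->held = 0;
}

/* One "CPU" in a single-threaded test: a plain global counter stands in for per-CPU data. */
#define this_cpu_inc(var) ((var)++)

/* ---- "Code under test": a hypothetical toy, not kernel code ---- */
struct toy_block { struct toy_block *next; };

struct toy_zone {
	spinlock_t lock;
	struct toy_block *free_list;
};

static unsigned long alloc_stat;

/* Pop one block off the zone's free list under the zone lock. */
static struct toy_block *toy_rmqueue(struct toy_zone *zone)
{
	struct toy_block *blk;

	spin_lock(&zone->lock);
	blk = zone->free_list;
	if (blk)
		zone->free_list = blk->next;
	spin_unlock(&zone->lock);

	if (blk)
		this_cpu_inc(alloc_stat);
	return blk;
}
```

Because the test is single-threaded with one "CPU", the stubs can stay trivial while still catching unbalanced lock/unlock pairs. That is roughly what would make stubbing out percpu_* and locking for the core buddy logic plausible.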
Yeah, Lorenzo also pointed me to tools/testing/vma, and I'm pretty sold that it's a better approach than KUnit where it's possible. But I'm doubtful about using it for page_alloc.
I think it could definitely be a good idea for the really core buddy logic (like rmqueue_buddy() and below), where I'm sure we could stub out stuff like percpu_* and locking and have the tests still be meaningful. But I'm not sure that really low-level code is calling out for more testing.
Whereas I suspect if you zoom out even just to the level of __alloc_frozen_pages_noprof(), it starts to get a bit impractical already. And that's where I really wanna get coverage.
Anyway, I'm thinking the next step here is to explore how to get away from the node_isolated() stuff in this RFC, so I'll keep that idea in mind and try to get a feel for whether it looks possible.