On 6 Mar 2025, at 11:21, Zi Yan wrote:
On 5 Mar 2025, at 17:38, Hugh Dickins wrote:
On Wed, 5 Mar 2025, Zi Yan wrote:
On 5 Mar 2025, at 16:03, Hugh Dickins wrote:
Beyond checking that, I didn't have time yesterday to investigate further, but I'll try again today (still using last weekend's mm.git).
I am trying to replicate your runs locally. Can you clarify your steps for “kernel builds on huge tmpfs while swapping to SSD”? Do you impose a memory limit so that anonymous memory is swapped to SSD, or do you make tmpfs itself swap to SSD?
Yeah, my heart sank a bit when I saw Andrew (with good intention) asking you to repeat my testing.
We could spend weeks going back and forth on that, and neither of us has weeks to spare.
"To fulfil contractual obligations" I'll mail you the tarfile I send out each time I'm asked for this; but I haven't updated that tarfile in four years, whereas I'm frequently tweaking things to match what's needed (most recently and relevantly, I guess enabling 64kB hugepages for anon and shmem in addition to the PMD-sized).
Please don't waste much of your time over trying to replicate what I'm doing: just give the scripts a glance, as a source for "oh, I could exercise something like that in my testing too" ideas.
Yes, I limit physical memory by booting with mem=1G, and also apply lower memcg v1 limits.
I made a point of saying "SSD" there because I'm not testing zram or zswap at all, whereas many others are testing those rather than disk.
swapoff, and ext4 on loop0 on tmpfs, feature in what I exercise, but are NOT relevant to the corruption I'm seeing here - that can occur before any swapoff, and it's always on the kernel build in tmpfs: the parallel build in ext4 on loop0 on tmpfs completes successfully.
Thanks for the scripts. I roughly replicated your setup as follows:
- boot a VM with 1GB memory and 8 cores;
- mount a tmpfs with huge=always and 200GB;
- clone the mainline kernel and use x86_64 defconfig (my gcc 14 gives errors during the old kernel builds); this takes about 2GB of space, so some of tmpfs is already swapped to SSD;
- create a new cgroupv2 and set memory.high to 700MB to induce memory swap during kernel compilation;
- run “while true; do echo 1 | sudo tee /proc/sys/vm/compact_memory >/dev/null; done” to trigger compaction all the time;
- build the kernel with make -j20.
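The steps above can be sketched as a small shell script. This is only an illustration under stated assumptions, not the exact commands that were run: the mount point, the cgroup name, the shallow clone of the kernel.org mainline tree, and the DRY_RUN switch are all hypothetical. It also assumes it is run as root inside a VM already booted with 1GB of memory and 8 cores, with swap on SSD enabled and the memory controller available to child cgroups. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: mount point, cgroup name, clone URL/depth and DRY_RUN are
# assumptions for illustration, not taken from the original setup.
set -eu

DRY_RUN="${DRY_RUN:-1}"              # 1 prints the commands, 0 executes them
MNT="${MNT:-/mnt/hugetmpfs}"         # hypothetical tmpfs mount point
CG="${CG:-/sys/fs/cgroup/kbuild}"    # hypothetical cgroup v2 directory
SRC="$MNT/linux"
REPO="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"

# Print a command line, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$*"
    fi
}

main() {
    # tmpfs with huge=always and a 200GB size limit.
    run "mkdir -p $MNT"
    run "mount -t tmpfs -o huge=always,size=200G tmpfs $MNT"

    # Mainline source with the x86_64 defconfig.
    run "git clone --depth 1 $REPO $SRC"
    run "make -C $SRC x86_64_defconfig"

    # Trigger compaction continuously in the background (real runs only).
    if [ "$DRY_RUN" = "0" ]; then
        ( while true; do echo 1 > /proc/sys/vm/compact_memory; done ) &
        trap "kill $! 2>/dev/null || true" EXIT
    fi

    # cgroup v2 with memory.high = 700MB; move this shell into it so that
    # the build started below inherits the limit.
    run "mkdir -p $CG"
    run "echo $((700 * 1024 * 1024)) > $CG/memory.high"
    run "echo $$ > $CG/cgroup.procs"

    # Parallel kernel build under memory pressure.
    run "make -C $SRC -j20"
}

main
```

Running it with DRY_RUN=0 executes the commands instead of printing them; the compaction loop is killed when the script exits.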
I ran the above on mm-everything-2025-03-05-03-54 plus the xarray fix v3, folio_split() with your fixes, and Minimize xa_node allocation during xarry split patches. The repo is at: https://github.com/x-y-z/linux-dev/tree/shmem_fix-mm-everything-2025-03-05-0....
It has run overnight for 30 kernel builds and no crash has happened so far. I wonder if you can give my repo a shot.
I just boosted khugepaged like you did and see no immediate crash. But I will let it run for longer.
I have run this overnight and have not seen any crash, so I assume it is stable. I am going to send V10 and resend Minimize xa_node allocation during xarry split.
Best Regards, Yan, Zi