On Fri 12-02-21 11:42:15, David Hildenbrand wrote:
On 12.02.21 11:33, Michal Hocko wrote:
[...]
I have to digest this, but my first impression is that this is more heavyweight than it needs to be. Pfn walkers should normally obey the node range at least. The first pfn is usually excluded but I haven't seen real
We've seen examples where this is not sufficient. Simple example:
Have your physical memory end within a memory section. Easy via QEMU, just do a "-m 4000M". The remaining part of the last section has fake/wrong node/zone info.
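To put rough numbers on the "-m 4000M" example, here is a minimal userspace sketch of the arithmetic. It assumes the x86-64 defaults of 4 KiB pages and 128 MiB memory sections, and it treats RAM as one contiguous range starting at pfn 0, which ignores the PCI hole below 4 GiB. The helper names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64 defaults, 4 KiB pages and 128 MiB memory sections. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 27
#define PAGES_PER_SECTION (1ULL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Hypothetical helper: convert a memory size in MiB to a pfn count. */
static uint64_t mib_to_pfns(uint64_t mib)
{
	assert(mib <= (UINT64_MAX >> 20)); /* guard the shift against overflow */
	return (mib << 20) >> PAGE_SHIFT;
}

/*
 * Hypothetical helper: number of pfns in the last section that lie past
 * the end of RAM, i.e. memmap entries with no memory behind them.
 */
static uint64_t trailing_hole_pfns(uint64_t end_pfn)
{
	uint64_t rem = end_pfn % PAGES_PER_SECTION;

	return rem ? PAGES_PER_SECTION - rem : 0;
}
```

Under these assumptions, 4000 MiB is 31 full sections plus 32 MiB, so trailing_hole_pfns(mib_to_pfns(4000)) gives 24576 pfns (96 MiB) of memmap in the last section that is not backed by RAM. A size such as 4096 MiB ends on a section boundary and gives 0.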
Does this really matter, though? If those pages are reserved then nobody will touch them regardless of their node/zone ids.
Hotplug memory. The node/zone gets resized such that PFN walkers might stumble over it.
The basic idea is to make sure that any initialized/"online" pfn belongs to exactly one node/zone and that the node/zone spans that PFN.
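As a rough illustration of that invariant, here is a small userspace model. The struct and function names are hypothetical stand-ins, not the kernel's struct zone or struct page; the span test mirrors the usual "start_pfn <= pfn < start_pfn + spanned_pages" check. Because each model page records a single zone index, the "exactly one" part holds by construction, and the sketch only checks that the recorded zone spans the pfn and agrees on the node id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct zone. */
struct model_zone {
	int      nid;           /* owning node id */
	uint64_t start_pfn;     /* first pfn spanned by the zone */
	uint64_t spanned_pages; /* span length, holes included */
};

/* Hypothetical per-pfn metadata, standing in for struct page. */
struct model_page {
	bool online;   /* memmap initialized, visible to pfn walkers */
	int  nid;      /* node id recorded in the page */
	int  zone_idx; /* index into the zones[] array */
};

/* True if pfn lies inside the zone's [start, start + spanned) range. */
static bool zone_spans(const struct model_zone *z, uint64_t pfn)
{
	return pfn >= z->start_pfn && pfn - z->start_pfn < z->spanned_pages;
}

/*
 * True if every online pfn in [0, nr) records a valid zone that spans
 * the pfn and whose node id matches the one stored in the page.
 */
static bool memmap_consistent(const struct model_page *pages, uint64_t nr,
			      const struct model_zone *zones, size_t nr_zones)
{
	assert(pages && zones);

	for (uint64_t pfn = 0; pfn < nr; pfn++) {
		const struct model_page *p = &pages[pfn];

		if (!p->online)
			continue; /* offline pfns are outside the invariant */
		if (p->zone_idx < 0 || (size_t)p->zone_idx >= nr_zones)
			return false;

		const struct model_zone *z = &zones[p->zone_idx];

		if (p->nid != z->nid || !zone_spans(z, pfn))
			return false;
	}
	return true;
}
```

In this model, a page in a hole that is marked online but carries zone info for a zone that does not span it makes memmap_consistent() return false, which is the situation a pfn walker could stumble over.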
Yeah, this sounds like a good idea, but what is the proper node for a hole between two ranges associated with different nodes/zones? This will always be a random number. We should have a clear way to tell "do not touch those pages" and PageReserved sounds like a good way to tell that.
problems with that. The VM_BUG_ON blowing up is really bad but, as said above, we can simply make it less offensive in the presence of reserved pages, as those shouldn't normally reach that path AFAICS.
Andrea tried working around it via PG_reserved pages and it resulted in quite some ugly code. Andrea also noted that we cannot rely on any random page walker to do the right thing when it comes to messed up node/zone info.
I am sorry, I haven't followed previous discussions. Has the removal of the VM_BUG_ON been considered as an immediate workaround?