On 05/28/2018 09:21 AM, Michal Hocko wrote:
On Fri 25-05-18 12:43:00, Andrew Morton wrote:
On Fri, 25 May 2018 15:08:53 +0200 Vlastimil Babka vbabka@suse.cz wrote:
we might consider this for 4.17 although I don't know if there's anything currently broken. Stable backports should be more important, but will have to be reviewed carefully, as the code went through many changes. BTW I think that also the ac->preferred_zoneref reset is currently useless if we don't also reset ac->nodemask from a mempolicy to NULL first (which we probably should for the OOM victims etc?), but I would leave that for a separate patch.
Confused. If nothing is currently broken then why is a backport needed? Presumably because we expect breakage in the future? Can you expand on this?
__GFP_THISNODE is documented to _use_ the given node. Allocating from a different one is a bug. Maybe the current code can cope with that or at least doesn't blow up in an obvious way but the bug is still there.
I am still not sure what to do about the zonelist reset. It still seems like an echo from the past
Hmm actually it seems that even at the time of commit 183f6371aac2 introduced the reset, the per-policy zonelists for MPOL_BIND policies were gone for years. Mempolicy only affects which node's zonelist is used, but that always contains all the nodes (unless __GFP_THISNODE) so there's no reason to get another node's zonelist to escape mempolicy restrictions.
Mempolicy restrictions are given as nodemask, so if we want to ignore them for OOM victims etc, we have to reset nodemask instead. But again we have to be careful in case the nodemask doesn't come from mempolicy, but from somebody who might be broken if we ignore it.
but using numa_node_id for __GFP_THISNODE is a clear bug because our task could have been migrated to a cpu on a different than requested node.
Acked-by: Michal Hocko mhocko@suse.com