On Tue 16-07-24 16:00:13, Kirill A. Shutemov wrote:
Unaccepted memory is considered unusable free memory, which is not counted as free on the zone watermark check. This causes get_page_from_freelist() to accept more memory to hit the high watermark, but it creates problems in the reclaim path.
The reclaim path encounters a failed zone watermark check and attempts to reclaim memory. This is usually successful, but if there is little or no reclaimable memory, it can result in endless reclaim with little to no progress. This can occur early in the boot process, just after start of the init process when the only reclaimable memory is the page cache of the init executable and its libraries.
How does this happen when try_to_accept_memory() is the first thing done when the watermark check fails in the allocation path?
Could you describe the initial configuration of the system? How much unaccepted memory was there when this triggered?
To address this issue, teach shrink_node() and shrink_zones() to accept memory before attempting to reclaim.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Jianxiong Gao <jxgao@google.com>
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Cc: stable@vger.kernel.org # v6.5+
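To make the scenario in the commit message concrete, here is a toy userspace model of it. It is only a sketch: every name in it (toy_zone, toy_balance and so on) is hypothetical, the batch size of 32 is arbitrary, and it does not claim to reproduce the real allocator or reclaim paths. It only shows that a watermark check which ignores unaccepted pages cannot be satisfied by reclaim alone when little is reclaimable, while accepting first gets there.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; every name here is hypothetical, none is kernel API. */
struct toy_zone {
	unsigned long free;        /* accepted free pages, counted by the check */
	unsigned long unaccepted;  /* free but unaccepted pages, not counted */
	unsigned long reclaimable; /* pages that reclaim is able to free */
	unsigned long high_wmark;  /* target for the watermark check */
};

/* Watermark check: unaccepted pages do not count as free. */
static bool toy_wmark_ok(const struct toy_zone *z)
{
	return z->free >= z->high_wmark;
}

/* Move up to batch pages from unaccepted to free; report progress. */
static bool toy_accept(struct toy_zone *z, unsigned long batch)
{
	unsigned long n = z->unaccepted < batch ? z->unaccepted : batch;

	z->unaccepted -= n;
	z->free += n;
	return n > 0;
}

/* Free up to batch reclaimable pages; report progress. */
static bool toy_reclaim(struct toy_zone *z, unsigned long batch)
{
	unsigned long n = z->reclaimable < batch ? z->reclaimable : batch;

	z->reclaimable -= n;
	z->free += n;
	return n > 0;
}

/*
 * Run up to max_rounds of "check the watermark, then try to make progress".
 * Returns true if the watermark was met within the round budget.
 */
static bool toy_balance(struct toy_zone *z, bool accept_first, int max_rounds)
{
	assert(max_rounds > 0);
	for (int i = 0; i < max_rounds; i++) {
		if (toy_wmark_ok(z))
			return true;
		if (accept_first && toy_accept(z, 32))
			continue;
		toy_reclaim(z, 32); /* frees nothing once reclaimable is 0 */
	}
	return toy_wmark_ok(z);
}
```

With 10 free pages, 1000 unaccepted pages, 4 reclaimable pages and a high watermark of 100, the reclaim-only variant exhausts its round budget without meeting the watermark, whereas the accept-first variant meets it after a few rounds.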
[...]
 static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 {
 	unsigned long nr_reclaimed, nr_scanned, nr_node_reclaimed;
 	struct lruvec *target_lruvec;
 	bool reclaimable = false;

+	/* Try to accept memory before going for reclaim */
+	if (node_try_to_accept_memory(pgdat, sc)) {
+		if (!should_continue_reclaim(pgdat, 0, sc))
+			return;
+	}
This would need an exemption from the memcg reclaim.
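Roughly what I have in mind, as an untested userspace sketch rather than a proposed hunk. The struct definitions are minimal stand-ins so it compiles outside the kernel, node_try_to_accept_memory() is stubbed with assumed semantics (the real one comes from this patch), and accept_before_reclaim() is a hypothetical wrapper name. cgroup_reclaim() mirrors the existing helper in mm/vmscan.c. The reasoning is that memcg reclaim is triggered by a cgroup limit rather than by zone watermarks, so accepting memory does nothing for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins so the sketch compiles outside the kernel. */
struct mem_cgroup { int id; };
struct scan_control { struct mem_cgroup *target_mem_cgroup; };
typedef struct pglist_data { unsigned long unaccepted_pages; } pg_data_t;

/* Mirrors the kernel helper: true when reclaim targets a memcg. */
static bool cgroup_reclaim(const struct scan_control *sc)
{
	return sc->target_mem_cgroup != NULL;
}

/* Stub for the helper this patch introduces; its semantics are assumed. */
static bool node_try_to_accept_memory(pg_data_t *pgdat, struct scan_control *sc)
{
	(void)sc;
	if (!pgdat->unaccepted_pages)
		return false;
	pgdat->unaccepted_pages--;
	return true;
}

/* Hypothetical wrapper: only global reclaim tries to accept memory. */
static bool accept_before_reclaim(pg_data_t *pgdat, struct scan_control *sc)
{
	assert(pgdat && sc);

	/* memcg reclaim is driven by the cgroup limit, not zone watermarks */
	if (cgroup_reclaim(sc))
		return false;

	return node_try_to_accept_memory(pgdat, sc);
}
```

Whether the check belongs in shrink_node() itself or inside the new helper is a separate question; the sketch only shows the condition.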
 	if (lru_gen_enabled() && root_reclaim(sc)) {
 		lru_gen_shrink_node(pgdat, sc);
 		return;