On Thu 06-12-18 05:21:38, Naoya Horiguchi wrote:
On Wed, Dec 05, 2018 at 05:57:16PM +0100, Michal Hocko wrote:
On Wed 05-12-18 13:29:18, Michal Hocko wrote: [...]
After some more thinking I am not really sure the above reasoning is still true with the current upstream kernel. Maybe I just managed to confuse myself so please hold off on this patch for now. Testing by Oscar has shown this patch is helping but the changelog might need to be updated.
OK, so Oscar has nailed it down and it seems that 4.4 kernel we have been debugging on behaves slightly different. The underlying problem is the same though. So I have reworded the changelog and added "just in case" PageLRU handling. Naoya, maybe you have an argument that would make this void for current upstream kernels.
The following commit (not in 4.4.x stable tree) might explain the difference you experienced:
commit 286c469a988fbaf68e3a97ddf1e6c245c1446968 Author: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Date: Wed May 3 14:56:22 2017 -0700 mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page
This commit adds shake_page() for mlocked pages to make sure that the target page is flushed out from LRU cache. Without this shake_page(), subsequent delete_from_lru_cache() (from me_pagecache_clean()) fails to isolate it and the page will finally return back to LRU list. So this scenario leads to "hwpoisoned by still linked to LRU list" page.
OK, I see. So does that mean that the LRU handling is no longer needed and there is a guanratee that all kernels with the above commit cannot ever get an LRU page?