On Mon, Mar 23, 2020 at 03:29:41PM +0100, Michal Hocko wrote:
On Mon 23-03-20 10:16:59, Rafael Aquini wrote:
On Sun, Mar 22, 2020 at 09:31:04AM -0700, Shakeel Butt wrote:
On Sat, Mar 21, 2020 at 6:35 PM Rafael Aquini <aquini@redhat.com> wrote:
The changes in commit 9c4e6b1a7027f ("mm, mlock, vmscan: no more skipping pagevecs") break this test's expectation that the mlock syscall family immediately inserts the recently faulted pages into the UNEVICTABLE_LRU when MCL_ONFAULT is passed to the syscall as part of its flag set.
mlock* syscalls do not provide any guarantee that the pages will be in the unevictable LRU, only that the pages will not be paged out. The test is checking something very internal to the kernel, and this is expected to break.
It was a check expected to be satisfied before the commit, though. Inserting the mlocked pages directly into the unevictable LRU, skipping the pagevec, was established behavior before the aforementioned commit. Even though one could argue that userspace should not be aware of, or care about, such inner kernel circles, the program in question is not an ordinary userspace app but a kernel selftest that is supposed to check for functional correctness.
But mlock (in either mode) is not really forced to put pages on the UNEVICTABLE_LRU for correctness. If the test really depends on it, then the selftest is bogus. A real MCL_ONFAULT test should focus on the user-observable contract of this API: a new mapping doesn't fault in the pages during the mlock call, but the memory is locked once it has been faulted in. You can use different methods to observe locked memory, e.g. try to reclaim it and check that it stays resident, or check /proc/<pid>/smaps.
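A minimal sketch of the smaps-based check, in C like the kernel selftests: it maps one anonymous page, calls mlock2() with MLOCK_ONFAULT (the per-range counterpart of the onfault semantics MCL_ONFAULT gives mlockall()), and reads the "Locked:" field of the matching /proc/self/smaps entry before and after touching the page. The helper names (mlock2_onfault, smaps_locked_kb, check_onfault_contract) are hypothetical. The sketch assumes a kernel with mlock2() (4.4 or later) and that "Locked:" reports the resident size of a mlocked mapping, so it checks only the user-visible contract and never looks at which LRU the page sits on.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01 /* value from the kernel UAPI mman headers */
#endif

/* Hypothetical helper: invoke mlock2(2) via syscall(2), independent of libc support. */
static int mlock2_onfault(void *addr, size_t len)
{
#ifdef SYS_mlock2
	return (int)syscall(SYS_mlock2, addr, len, MLOCK_ONFAULT);
#else
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Hypothetical helper: return the "Locked:" value (kB) of the VMA in
 * /proc/self/smaps containing addr, or -1 if the VMA/field is not found.
 */
static long smaps_locked_kb(const void *addr)
{
	FILE *f = fopen("/proc/self/smaps", "r");
	unsigned long start, end, target = (unsigned long)addr;
	char *line = NULL;
	size_t cap = 0;
	int in_vma = 0;
	long kb = -1;

	if (!f)
		return -1;
	while (getline(&line, &cap, f) != -1) {
		/* VMA header lines start with "start-end" in hex. */
		if (sscanf(line, "%lx-%lx ", &start, &end) == 2) {
			in_vma = target >= start && target < end;
			continue;
		}
		if (in_vma && sscanf(line, "Locked: %ld kB", &kb) == 1)
			break;
	}
	free(line);
	fclose(f);
	return kb;
}

/*
 * Hypothetical helper: 0 if the onfault contract holds (nothing locked
 * right after mlock2(), one page locked after touching it), 1 if it is
 * violated, -1 if setup was refused (e.g. RLIMIT_MEMLOCK, no mlock2).
 */
static int check_onfault_contract(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret = 1;

	if (p == MAP_FAILED)
		return -1;
	if (mlock2_onfault(p, page) != 0) {
		munmap(p, page);
		return -1;
	}
	/* The mlock call itself must not fault the page in. */
	if (smaps_locked_kb(p) != 0)
		goto out;
	p[0] = 1; /* write fault populates the page */
	/* Once faulted in, the page must show up as locked memory. */
	if (smaps_locked_kb(p) == page / 1024)
		ret = 0;
out:
	munlock(p, page);
	munmap(p, page);
	return ret;
}
```

A result of -1 only says the environment refused the setup; a real selftest would report that as a skip rather than a failure.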
Again, I don't think the test is bogus, although it is (now) expecting something that is not guaranteed after the referred commit. The check for PG_unevictable being set on the page backing the mapping seems reasonable, as the flag is supposed to be there if everything went fine after the mlock call.
-- Rafael