+cc Greg for stable question
On Wed, Mar 19, 2025 at 11:22:40AM -0700, Andrei Vagin wrote:
On Mon, Feb 24, 2025 at 2:39 AM David Hildenbrand <david@redhat.com> wrote:
On 24.02.25 11:18, Lorenzo Stoakes wrote:
[snip]
Acked-by: David Hildenbrand <david@redhat.com>
Thanks! :)
Something that might be interesting is also extending the PAGEMAP_SCAN ioctl.
Yeah, funny you should mention that, I did see that, but on reading the man page it struck me that it requires the region to be registered with uffd, afaict? All the tests seem to establish uffd, and the man page implies it:
To start tracking the written state (flag) of a page or range of memory, the UFFD_FEATURE_WP_ASYNC must be enabled by UFFDIO_API ioctl(2) on userfaultfd and memory range must be registered with UFFDIO_REGISTER ioctl(2) in UFFDIO_REGISTER_MODE_WP mode.
It would be a bit of a weird edge case to add support there. I was excited when I first saw this ioctl, then disappointed afterwards... but maybe I got it wrong?
I never managed to review that fully, but I think that UFFD_FEATURE_WP_ASYNC thingy is only required for PM_SCAN_CHECK_WPASYNC and PM_SCAN_WP_MATCHING.
See pagemap_scan_test_walk().
I do recall that it works on any VMA.
Ah yes, tools/testing/selftests/mm/vm_util.c ends up using it for pagemap_is_swapped() and friends via page_entry_is() to sanity check that what pagemap gives us is consistent with what pagemap_scan gives us.
So it should work independent of the uffd magic. I might be wrong, though ...
PAGEMAP_SCAN can work without the UFFD magic. CRIU utilizes PAGEMAP_SCAN as a more efficient alternative to /proc/pid/pagemap: https://github.com/checkpoint-restore/criu/blob/d18912fc88f3dc7bde5fdfa35756...
Yeah, we ascertained that. It's on my list; LSF is coming up next week, so the timing isn't great, but I'll prioritise this when I'm back.
For CRIU, obtaining information about guard regions is critical. Without this functionality in the kernel, CRIU is broken. We probably should consider backporting these changes to the 6.13 and 6.14 stable branches.
I'm not sure on precedent for backporting a feature like this - Greg? Am happy to do it though.
As a stopgap, we can backport the pagemap feature if Greg feels this is appropriate?
[snip]
My thinking was that if you have a large VMA, with ordinary pagemap you have to copy 8 bytes per entry (and have room for that somewhere in user space). In theory, with the scanning feature, you can leave that ... scanning to the kernel and don't have to do any copying or allocate space for it in user space etc.
PAGEMAP_SCAN doesn't have this issue and it was one of the reasons to implement it.
Ack.
Thanks, Andrei
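
To put a rough number on the 8-bytes-per-entry copying cost of ordinary pagemap mentioned above, here is a small sketch. The helper name pagemap_buffer_bytes() is made up for this example; the only fact it relies on is that /proc/pid/pagemap exposes one 64-bit entry per virtual page.

```c
#include <stddef.h>
#include <stdint.h>

/* Each /proc/<pid>/pagemap entry is one 64-bit word per virtual page. */
#define PAGEMAP_ENTRY_BYTES sizeof(uint64_t)

/*
 * Hypothetical helper: bytes of user-space buffer needed to read the
 * pagemap entries covering a VMA of vma_bytes, rounding up to whole pages.
 */
size_t pagemap_buffer_bytes(size_t vma_bytes, size_t page_size)
{
    size_t pages = (vma_bytes + page_size - 1) / page_size;

    return pages * PAGEMAP_ENTRY_BYTES;
}
```

For a 1 GiB VMA with 4 KiB pages that is 262144 entries, i.e. 2 MiB copied to user space, whereas PAGEMAP_SCAN returns only the matching regions.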