On Thu, 6 Apr 2023 at 23:04, Muhammad Usama Anjum usama.anjum@collabora.com wrote:
On 4/7/23 1:00 AM, Michał Mirosław wrote:
On Thu, 6 Apr 2023 at 19:58, Muhammad Usama Anjum usama.anjum@collabora.com wrote:
[...]
cur->len += n_pages;
p->found_pages += n_pages;
if (p->max_pages && (p->found_pages == p->max_pages))
return PM_SCAN_FOUND_MAX_PAGES;
return 0;
}
if (!p->vec_index || ((p->vec_index + 1) < p->vec_len)) {
It looks that `if (p->vec_index < p->vec_len)` is enough here - if we have vec_len == 0 here, then we'd not fit the entry in the userspace buffer anyway. Am I missing something?
No. I'd explained it with diagram last time: https://lore.kernel.org/all/3c8d9ea0-1382-be0c-8dd2-d490eedd3b55@collabora.c...
I'll add a concise comment here.
So it seems, but I think the code changed a bit and maybe could be simplified now? Since p->vec_len == 0 is currently not valid, the field could count only the entries available in p->vec[] -- IOW: not include p->cur in the count.
I see. But this'll not work as we need to count p->cur to don't go above the maximum count, p->vec_size.
You can subtract 1 from p->vec_size before the page walk to account for the buffer in `cur`.
[...]
+static inline int pagemap_scan_deposit(struct pagemap_scan_private *p,
struct page_region __user *vec,
unsigned long *vec_index)
..._deposit() is used only in single place - please inline.
It is already inline.
Sorry. I mean: please paste the code in place of the single call.
I've made it a separate function to make the code look better in the caller function and logically easier to understand. This function is ugly. do_pagemap_scan() is also already very long function with lots of things happening. If you still insist, I'll remove this function.
Please do remove - it will make the copy to userspace code all neatly together.
[...]
*/
if (is_written && PM_SCAN_OP_IS_WP(p) &&
((end - start < HPAGE_SIZE) ||
(p->max_pages &&
(p->max_pages - p->found_pages) < n_pages))) {
split_huge_pmd(vma, pmd, start);
goto process_smaller_pages;
}
if (p->max_pages &&
p->found_pages + n_pages > p->max_pages)
n_pages = p->max_pages - p->found_pages;
ret = pagemap_scan_output(is_written, is_file, is_present,
is_swap, p, start, n_pages);
if (ret < 0)
return ret;
So let's simplify this:
if (p->max_pages && n_pages > max_pages - found_pages) n_pages = max_pages - found_pages;
if (is_written && DO_WP && n_pages != HPAGE_SIZE / PAGE_SIZE) { split_thp(); goto process_smaller_pages; }
Clever!! This looks very sleek.
BTW, THP handling could be extracted to a function that would return -EAGAIN if it has split the page or it wasn't a THP -- and that would mean `goto process_smaller_pages`.
Other functions in this file handle the THP in this same way. So it feels like more intuitive that we follow to same pattern in this file.
I'll leave it to you. Extracting THP support would avoid a goto and #ifdef inside a function, though (and make the function smaller).
/*
* Allocate smaller buffer to get output from inside the page walk
* functions and walk page range in PAGEMAP_WALK_SIZE size chunks. As
* we want to return output to user in compact form where no two
* consecutive regions should be continuous and have the same flags.
* So store the latest element in p.cur between different walks and
* store the p.cur at the end of the walk to the user buffer.
*/
p.vec = kmalloc_array(p.vec_len, sizeof(struct page_region),
GFP_KERNEL);
if (!p.vec)
return -ENOMEM;
walk_start = walk_end = start;
while (walk_end < end && !ret) {
The loop will stop if a previous iteration returned ENOSPC (and the error will be lost) - is it intended?
It is intentional. -ENOSPC means that the user buffer is full even though there was more memory to walk over. We don't treat this error. So when buffer gets full, we stop walking over further as user buffer has gotten full and return as success.
Thanks. What's the difference between -ENOSPC and PM_SCAN_FOUND_MAX_PAGES? They seem to result in the same effect (code flow).
-ENOSPC --> user buffer has been filled completely PM_SCAN_FOUND_MAX_PAGES --> max_pages have been found, user buffer may still have more space
What is the difference in code behaviour when those two cases are compared? (I'd expect none.)
Best Regards Michał Mirosław