Hi Nadav, Mike, Michał,
Can you please share your thoughts at [A] below?
On 2/23/23 12:10 AM, Nadav Amit wrote:
On Feb 20, 2023, at 5:24 AM, Muhammad Usama Anjum usama.anjum@collabora.com wrote:
+static inline int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
unsigned long end, struct mm_walk *walk)
+{
- struct pagemap_scan_private *p = walk->private;
- struct vm_area_struct *vma = walk->vma;
- unsigned long addr = end;
- spinlock_t *ptl;
- int ret = 0;
- pte_t *pte;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- ptl = pmd_trans_huge_lock(pmd, vma);
- if (ptl) {
bool pmd_wt;
pmd_wt = !is_pmd_uffd_wp(*pmd);
/*
* Break huge page into small pages if operation needs to be
performed is
* on a portion of the huge page.
*/
if (pmd_wt && IS_WP_ENGAGE_OP(p) && (end - start < HPAGE_SIZE)) {
spin_unlock(ptl);
split_huge_pmd(vma, pmd, start);
goto process_smaller_pages;
I think that such goto's are really confusing and should be avoided. And using 'else' (could have easily prevented the need for goto). It is not the best solution though, since I think it would have been better to invert the conditions.
Yeah, else can be used here. But then we'll have to add a tab to all the code after adding else. We have already so many tabs and very less space to right code. Not sure which is better.
goto’s are usually not the right solution. You can extract things into a different function if you have to.
I’m not sure why IS_GET_OP(p) might be false and what’s the meaning of taking the lock and dropping it in such a case. I think that the code can be simplified and additional condition nesting can be avoided.
Lock is taken and we check if pmd has UFFD_WP set or not. In the next version, the GET check has been removed as we have dropped WP_ENGAGE + !GET operation. So get is always specified and condition isn't needed.
Please comment on next version if you want anything more optimized.
--- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -305,4 +305,54 @@ typedef int __bitwise __kernel_rwf_t; #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ RWF_APPEND) +/* Pagemap ioctl */ +#define PAGEMAP_SCAN _IOWR('f', 16, struct pagemap_scan_arg)
+/* Bits are set in the bitmap of the page_region and masks in pagemap_scan_args */ +#define PAGE_IS_WRITTEN (1 << 0) +#define PAGE_IS_FILE (1 << 1) +#define PAGE_IS_PRESENT (1 << 2) +#define PAGE_IS_SWAPPED (1 << 3)
These names are way too generic and are likely to be misused for the wrong purpose. The "_IS_" part seems confusing as well. So I think the naming needs to be fixed and some new type (using typedef) or enum should be introduced to hold these flags. I understand it is part of uapi and it is less common there, but it is not unheard of and does make things clearer.
Do you think PM_SCAN_PAGE_IS_* work here?
Can we lose the IS somehow?
[A] Do you think these names would work better: PM_SCAN_WRITTEN_PAGE, PM_SCAN_FILE_PAGE, PM_SCAN_SWAP_PAGE, PM_SCAN_PRESENT_PAGE?
+/*
- struct page_region - Page region with bitmap flags
- @start: Start of the region
- @len: Length of the region
- bitmap: Bits sets for the region
- */
+struct page_region {
- __u64 start;
- __u64 len;
I presume in bytes. Would be useful to mention.
Length of region in pages.
Very unintuitive to me I must say. If the start is an address, I would expect the len to be in bytes.
The PAGEMAP_SCAN ioctl is working on page granularity level. We tell the user if a page has certain flags are not. Keeping length in bytes doesn't makes sense.