On 2/16/23 2:12 AM, Peter Xu wrote:
On Wed, Feb 15, 2023 at 03:03:09PM +0500, Muhammad Usama Anjum wrote:
On 2/15/23 1:59 AM, Peter Xu wrote: [..]
static inline bool is_pte_written(pte_t pte) { if ((pte_present(pte) && pte_uffd_wp(pte)) || (pte_swp_uffd_wp_any(pte))) return false; return (pte_present(pte) || is_swap_pte(pte)); }
Could you explain why you don't want to return dirty for !present? A page can be written then swapped out. Don't you want to know that happened (from dirty tracking POV)?
The code looks weird to me too.. We only have three types of ptes: (1) present, (2) swap, (3) none.
Then, "(pte_present() || is_swap_pte())" is the same as !pte_none(). Is that what you're really looking for?
Yes, this is what I've been trying to do. I'll use !pte_none() to make it simpler.
Ah I think I see what you wanted to do now.. But I'm afraid it won't work for all cases.
So IIUC the problem is anon pte can be empty, but since uffd-wp bit doesn't persist on anon (but none) ptes, then we got it lost and we cannot identify it from pages being written. Your solution will solve problem for anonymous, but I think it'll break file memories.
Example:
Consider one shmem page that got mapped, write protected (using UFFDIO_WP ioctl), written again (removing uffd-wp bit automatically), then zapped. The pte will be pte_none() but it's actually written, afaiu.
Maybe it's time we should introduce UFFD_FEATURE_WP_ZEROPAGE, so we'll need to install pte markers for anonymous too (then it will work similarly like shmem/hugetlbfs, that we'll report writting to zero pages), then you'll need to have the new UFFD_FEATURE_WP_ASYNC depend on it. With that I think you can keep using the old check and it should start to work.
Please let me know if my understanding is correct above.
Thank you for identifying it. Your understanding seems on point. I'll have research things up about PTE Markers. I'm looking at your patches about it [1]. Can you refer me to "mm alignment sessions" discussion in form of presentation or if any transcript is available?
No worry now, after a second thought I think zero page is better than pte markers, and I've got a patch that works for it here by injecting zero pages for anonymous:
https://lore.kernel.org/all/20230215210257.224243-1-peterx@redhat.com/
I think we'd also better to enforce your new WP_ASYNC feature bit to depend on this one, so fail the UFFDIO_API if WP_ASYNC && !WP_ZEROPAGE.
Could you please try by rebasing your work upon this one? Hope it'll work for you already. Note again that you'll need to go back to the old is_pte|pmd_written() to make things work always, I think.
Thank you so much for sending the ZEROPAGE patch. I've rebased my patches on top of it and my all tests for anon memory are passing. Now we don't need to touch the page before engaging wp. This is what we wanted to achieve. So !wp flag can be easily translated to soft-dirty flag (is_pte_soft_dirty = is_pte_wp).
I've only a few file mem and shmem tests. I'll write more tests.
[...]
I truly understand how you feel about export_prev_to_out(). It is really difficult to understand. Even I had to made a hard try to come up with the current code to avoid consuming a lot of kernel's memory while giving user the compact output. I can surely map both of these with a dirty looking macro. But I'm unable to find a decent macro to replace these. I think I'll put a comment some where to explain whats going-on.
So maybe I still missed something? I'll read the new version when it comes.
Lets reconvene in next patches if you feel like they can be improved.
Thanks,