huang ying <huang.ying.caritas@gmail.com> writes:
On Tue, Aug 16, 2022 at 3:39 PM Alistair Popple <apopple@nvidia.com> wrote:
migrate_vma_setup() has a fast path in migrate_vma_collect_pmd() that installs migration entries directly if it can lock the migrating page. When removing a dirty pte the dirty bit is supposed to be carried over to the underlying page to prevent it being lost.
Currently migrate_vma_*() can only be used for private anonymous mappings. That means loss of the dirty bit usually doesn't result in data loss because these pages are typically not file-backed. However, pages may be backed by swap storage, which can result in data loss if an attempt is made to migrate a dirty page that doesn't yet have the PageDirty flag set.
In this case, migration will fail due to unexpected references, but the dirty pte bit will be lost. If the page is subsequently reclaimed, data won't be written back to swap storage because the page is considered uptodate, resulting in data loss if the page is subsequently accessed.
Prevent this by copying the dirty bit to the page when removing the pte to match what try_to_migrate_one() does.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reported-by: Huang Ying <ying.huang@intel.com>
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Cc: stable@vger.kernel.org
Changes for v2:
- Fixed up Reported-by tag.
- Added Peter's Acked-by.
- Atomically read and clear the pte to prevent the dirty bit getting set after reading it.
- Added Fixes tag.
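As an illustration of why the pte is read and cleared atomically, here is a minimal userspace sketch. It is not kernel code: the toy_* names and the bit layout are hypothetical, and C11 atomics only stand in for the real pte helpers. It shows that a separate read and clear can miss a dirty bit set in between, whereas a single exchange cannot, and that the observed dirty bit is then carried over to the page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for a toy PTE; not any real architecture's. */
#define TOY_PTE_PRESENT ((uint64_t)1 << 0)
#define TOY_PTE_DIRTY   ((uint64_t)1 << 1)

/* Stand-in for struct page; 'dirty' models the PageDirty flag. */
struct toy_page {
	bool dirty;
};

/* Hook used to model another CPU touching the PTE at a chosen point. */
typedef void (*toy_hook_t)(_Atomic uint64_t *ptep);

/* Models the hardware setting the dirty bit on a write access. */
void toy_set_dirty(_Atomic uint64_t *ptep)
{
	atomic_fetch_or(ptep, TOY_PTE_DIRTY);
}

/*
 * Racy variant: the value is read first and cleared later. Anything that
 * sets the dirty bit between the two steps is missing from the result.
 */
uint64_t toy_read_then_clear(_Atomic uint64_t *ptep, toy_hook_t between)
{
	uint64_t pte = atomic_load(ptep);

	if (between)
		between(ptep);
	atomic_store(ptep, 0);
	return pte;
}

/* Atomic variant: read and clear in one step, so no update can be missed. */
uint64_t toy_get_and_clear(_Atomic uint64_t *ptep)
{
	return atomic_exchange(ptep, 0);
}

/* Clear the PTE and carry its dirty bit over to the page. */
void toy_unmap_for_migration(_Atomic uint64_t *ptep, struct toy_page *page)
{
	uint64_t pte = toy_get_and_clear(ptep);

	assert(pte & TOY_PTE_PRESENT);
	if (pte & TOY_PTE_DIRTY)
		page->dirty = true;
}
```

Calling toy_read_then_clear() with toy_set_dirty as the hook returns a value without the dirty bit even though the entry was dirtied before it was cleared, which is the window the atomic variant closes.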
 mm/migrate_device.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 27fb37d..e2d09e5 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -7,6 +7,7 @@
 #include <linux/export.h>
 #include <linux/memremap.h>
 #include <linux/migrate.h>
+#include <linux/mm.h>
 #include <linux/mm_inline.h>
 #include <linux/mmu_notifier.h>
 #include <linux/oom.h>
@@ -61,7 +62,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 	struct migrate_vma *migrate = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	struct mm_struct *mm = vma->vm_mm;
-	unsigned long addr = start, unmapped = 0;
+	unsigned long addr = start;
 	spinlock_t *ptl;
 	pte_t *ptep;

@@ -193,11 +194,10 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			bool anon_exclusive;
 			pte_t swp_pte;

+			flush_cache_page(vma, addr, pte_pfn(*ptep));
+			pte = ptep_clear_flush(vma, addr, ptep);
Although I think it's possible to batch the TLB flushing just before unlocking the PTL, the current code looks correct.
I think you might be right, but I'd rather deal with batched TLB flushing as a separate change that also implements it for normal migration, given we don't seem to do it there either.
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Thanks.
Best Regards,
Huang, Ying
 			anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
 			if (anon_exclusive) {
-				flush_cache_page(vma, addr, pte_pfn(*ptep));
-				ptep_clear_flush(vma, addr, ptep);
-
 				if (page_try_share_anon_rmap(page)) {
 					set_pte_at(mm, addr, ptep, pte);
 					unlock_page(page);
@@ -205,12 +205,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 					mpfn = 0;
 					goto next;
 				}
-			} else {
-				ptep_get_and_clear(mm, addr, ptep);
 			}

 			migrate->cpages++;

+			/* Set the dirty flag on the folio now the pte is gone. */
+			if (pte_dirty(pte))
+				folio_mark_dirty(page_folio(page));
+
 			/* Setup special migration page table entry */
 			if (mpfn & MIGRATE_PFN_WRITE)
 				entry = make_writable_migration_entry(
@@ -242,9 +244,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			 */
 			page_remove_rmap(page, vma, false);
 			put_page(page);
-
-			if (pte_present(pte))
-				unmapped++;
 		} else {
 			put_page(page);
 			mpfn = 0;
@@ -257,10 +256,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(ptep - 1, ptl);

-	/* Only flush the TLB if we actually modified any entries */
-	if (unmapped)
-		flush_tlb_range(walk->vma, start, end);
-
 	return 0;
 }
base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136
git-series 0.9.1