This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be writable even with uffd-wp bit set. It only happens with hugetlb private mappings, when someone firstly wr-protects a missing pte (which will install a pte marker), then a write to the same page without any prior access to the page.
Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before reaching hugetlb_wp() to avoid taking more locks that userfault won't need. However there's one CoW optimization path that can trigger hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.
This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit is detected. The new path will only trigger in the CoW optimization path because generic hugetlb_fault() (e.g. when a present pte was wr-protected) will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE won't be affected and can still be resolved, IOW only skip CoW not CoR.
This patch will be needed for v5.19+ hence copy stable.
Reported-by: Muhammad Usama Anjum usama.anjum@collabora.com Cc: linux-stable stable@vger.kernel.org Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu peterx@redhat.com ---
Notes:
v2 is not on the list but in an attachment in the reply; this v3 is mostly to make sure it's not the same as the patch used to be attached. Sorry Andrew, we need to drop the queued one as I rewrote the commit message.
Muhammad, I didn't attach your T-b because of the slight functional change. Please feel free to re-attach if it still works for you (which I believe should).
thanks, --- mm/hugetlb.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8bfd07f4c143..a58b3739ed4b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, struct folio *pagecache_folio, spinlock_t *ptl) { const bool unshare = flags & FAULT_FLAG_UNSHARE; - pte_t pte; + pte_t pte = huge_ptep_get(ptep); struct hstate *h = hstate_vma(vma); struct page *old_page; struct folio *new_folio; @@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr = address & huge_page_mask(h); struct mmu_notifier_range range;
+ /* + * Never handle CoW for uffd-wp protected pages. It should be only + * handled when the uffd-wp protection is removed. + * + * Note that only the CoW optimization path (in hugetlb_no_page()) + * can trigger this, because hugetlb_fault() will always resolve + * uffd-wp bit first. + */ + if (!unshare && huge_pte_uffd_wp(pte)) + return 0; + /* * hugetlb does not support FOLL_FORCE-style write faults that keep the * PTE mapped R/O such as maybe_mkwrite() would do. @@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, return 0; }
- pte = huge_ptep_get(ptep); old_page = pte_page(pte);
delayacct_wpcopy_start();
On 3/24/23 7:26 PM, Peter Xu wrote:
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be writable even with uffd-wp bit set. It only happens with hugetlb private mappings, when someone firstly wr-protects a missing pte (which will install a pte marker), then a write to the same page without any prior access to the page.
Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before reaching hugetlb_wp() to avoid taking more locks that userfault won't need. However there's one CoW optimization path that can trigger hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.
This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit is detected. The new path will only trigger in the CoW optimization path because generic hugetlb_fault() (e.g. when a present pte was wr-protected) will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE won't be affected and can still be resolved, IOW only skip CoW not CoR.
This patch will be needed for v5.19+ hence copy stable.
Reported-by: Muhammad Usama Anjum usama.anjum@collabora.com Cc: linux-stable stable@vger.kernel.org Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu peterx@redhat.com
Tested-by: Muhammad Usama Anjum usama.anjum@collabora.com
Notes:
v2 is not on the list but in an attachment in the reply; this v3 is mostly to make sure it's not the same as the patch used to be attached. Sorry Andrew, we need to drop the queued one as I rewrote the commit message.
Muhammad, I didn't attach your T-b because of the slight functional change. Please feel free to re-attach if it still works for you (which I believe should).
Thank you for the fix.
thanks,
mm/hugetlb.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8bfd07f4c143..a58b3739ed4b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, struct folio *pagecache_folio, spinlock_t *ptl) { const bool unshare = flags & FAULT_FLAG_UNSHARE;
- pte_t pte;
- pte_t pte = huge_ptep_get(ptep); struct hstate *h = hstate_vma(vma); struct page *old_page; struct folio *new_folio;
@@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr = address & huge_page_mask(h); struct mmu_notifier_range range;
- /*
* Never handle CoW for uffd-wp protected pages. It should be only
* handled when the uffd-wp protection is removed.
*
* Note that only the CoW optimization path (in hugetlb_no_page())
* can trigger this, because hugetlb_fault() will always resolve
* uffd-wp bit first.
*/
- if (!unshare && huge_pte_uffd_wp(pte))
return 0;
- /*
- hugetlb does not support FOLL_FORCE-style write faults that keep the
- PTE mapped R/O such as maybe_mkwrite() would do.
@@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, return 0; }
- pte = huge_ptep_get(ptep); old_page = pte_page(pte);
delayacct_wpcopy_start();
On 03/24/23 10:26, Peter Xu wrote:
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be writable even with uffd-wp bit set. It only happens with hugetlb private mappings, when someone firstly wr-protects a missing pte (which will install a pte marker), then a write to the same page without any prior access to the page.
Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before reaching hugetlb_wp() to avoid taking more locks that userfault won't need. However there's one CoW optimization path that can trigger hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.
This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit is detected. The new path will only trigger in the CoW optimization path because generic hugetlb_fault() (e.g. when a present pte was wr-protected) will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE won't be affected and can still be resolved, IOW only skip CoW not CoR.
This patch will be needed for v5.19+ hence copy stable.
Reported-by: Muhammad Usama Anjum usama.anjum@collabora.com Cc: linux-stable stable@vger.kernel.org Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu peterx@redhat.com
Notes:
v2 is not on the list but in an attachment in the reply; this v3 is mostly to make sure it's not the same as the patch used to be attached. Sorry Andrew, we need to drop the queued one as I rewrote the commit message.
My appologies! I saw the code path missed in v2 and assumed you did not think it applied. So, I said nothing. My bad!
Muhammad, I didn't attach your T-b because of the slight functional change. Please feel free to re-attach if it still works for you (which I believe should).
thanks,
mm/hugetlb.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8bfd07f4c143..a58b3739ed4b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, struct folio *pagecache_folio, spinlock_t *ptl) { const bool unshare = flags & FAULT_FLAG_UNSHARE;
- pte_t pte;
- pte_t pte = huge_ptep_get(ptep); struct hstate *h = hstate_vma(vma); struct page *old_page; struct folio *new_folio;
@@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr = address & huge_page_mask(h); struct mmu_notifier_range range;
- /*
* Never handle CoW for uffd-wp protected pages. It should be only
* handled when the uffd-wp protection is removed.
*
* Note that only the CoW optimization path (in hugetlb_no_page())
* can trigger this, because hugetlb_fault() will always resolve
* uffd-wp bit first.
*/
- if (!unshare && huge_pte_uffd_wp(pte))
return 0;
This looks correct. However, since the previous version looked correct I must ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems we would need to possibly propogate that uffd_wp to the new pte as in v2
- /*
- hugetlb does not support FOLL_FORCE-style write faults that keep the
- PTE mapped R/O such as maybe_mkwrite() would do.
@@ -5500,7 +5511,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, return 0; }
- pte = huge_ptep_get(ptep); old_page = pte_page(pte);
delayacct_wpcopy_start(); -- 2.39.1
On 24.03.23 23:27, Mike Kravetz wrote:
On 03/24/23 10:26, Peter Xu wrote:
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be writable even with uffd-wp bit set. It only happens with hugetlb private mappings, when someone firstly wr-protects a missing pte (which will install a pte marker), then a write to the same page without any prior access to the page.
Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before reaching hugetlb_wp() to avoid taking more locks that userfault won't need. However there's one CoW optimization path that can trigger hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.
This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit is detected. The new path will only trigger in the CoW optimization path because generic hugetlb_fault() (e.g. when a present pte was wr-protected) will resolve the uffd-wp bit already. Also make sure anonymous UNSHARE won't be affected and can still be resolved, IOW only skip CoW not CoR.
This patch will be needed for v5.19+ hence copy stable.
Reported-by: Muhammad Usama Anjum usama.anjum@collabora.com Cc: linux-stable stable@vger.kernel.org Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection") Signed-off-by: Peter Xu peterx@redhat.com
Notes:
v2 is not on the list but in an attachment in the reply; this v3 is mostly to make sure it's not the same as the patch used to be attached. Sorry Andrew, we need to drop the queued one as I rewrote the commit message.
My appologies! I saw the code path missed in v2 and assumed you did not think it applied. So, I said nothing. My bad!
Muhammad, I didn't attach your T-b because of the slight functional change. Please feel free to re-attach if it still works for you (which I believe should).
thanks,
mm/hugetlb.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8bfd07f4c143..a58b3739ed4b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, struct folio *pagecache_folio, spinlock_t *ptl) { const bool unshare = flags & FAULT_FLAG_UNSHARE;
- pte_t pte;
- pte_t pte = huge_ptep_get(ptep); struct hstate *h = hstate_vma(vma); struct page *old_page; struct folio *new_folio;
@@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr = address & huge_page_mask(h); struct mmu_notifier_range range;
- /*
* Never handle CoW for uffd-wp protected pages. It should be only
* handled when the uffd-wp protection is removed.
*
* Note that only the CoW optimization path (in hugetlb_no_page())
* can trigger this, because hugetlb_fault() will always resolve
* uffd-wp bit first.
*/
- if (!unshare && huge_pte_uffd_wp(pte))
return 0;
This looks correct. However, since the previous version looked correct I must ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems we would need to possibly propogate that uffd_wp to the new pte as in v2
We can. A reproducer would share an anon hugetlb page because parent and child. In the parent, we would uffd-wp that page. We could trigger unsharing by R/O-pinning that page.
On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote:
@@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr = address & huge_page_mask(h); struct mmu_notifier_range range;
- /*
* Never handle CoW for uffd-wp protected pages. It should be only
* handled when the uffd-wp protection is removed.
*
* Note that only the CoW optimization path (in hugetlb_no_page())
* can trigger this, because hugetlb_fault() will always resolve
* uffd-wp bit first.
*/
- if (!unshare && huge_pte_uffd_wp(pte))
return 0;
This looks correct. However, since the previous version looked correct I must ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems we would need to possibly propogate that uffd_wp to the new pte as in v2
Good point, thanks for spotting!
We can. A reproducer would share an anon hugetlb page because parent and child. In the parent, we would uffd-wp that page. We could trigger unsharing by R/O-pinning that page.
Right. This seems to be a separate bug.. It should be triggered in totally different context and much harder due to rare use of RO pins, meanwhile used with userfault-wp.
If both of you agree, I can prepare a separate patch for this bug, and I'll better prepare a reproducer/selftest with it.
On 03/26/23 10:46, Peter Xu wrote:
On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote:
@@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr = address & huge_page_mask(h); struct mmu_notifier_range range;
- /*
* Never handle CoW for uffd-wp protected pages. It should be only
* handled when the uffd-wp protection is removed.
*
* Note that only the CoW optimization path (in hugetlb_no_page())
* can trigger this, because hugetlb_fault() will always resolve
* uffd-wp bit first.
*/
- if (!unshare && huge_pte_uffd_wp(pte))
return 0;
This looks correct. However, since the previous version looked correct I must ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems we would need to possibly propogate that uffd_wp to the new pte as in v2
Good point, thanks for spotting!
We can. A reproducer would share an anon hugetlb page because parent and child. In the parent, we would uffd-wp that page. We could trigger unsharing by R/O-pinning that page.
Right. This seems to be a separate bug.. It should be triggered in totally different context and much harder due to rare use of RO pins, meanwhile used with userfault-wp.
If both of you agree, I can prepare a separate patch for this bug, and I'll better prepare a reproducer/selftest with it.
I am OK with separate patches, and agree that the R/O pinning case is less likely to happen.
Since this patch addresses the issue found by Muhammad,
Reviewed-by: Mike Kravetz mike.kravetz@oracle.com
On 27.03.23 20:34, Mike Kravetz wrote:
On 03/26/23 10:46, Peter Xu wrote:
On Fri, Mar 24, 2023 at 11:36:53PM +0100, David Hildenbrand wrote:
@@ -5487,6 +5487,17 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long haddr = address & huge_page_mask(h); struct mmu_notifier_range range;
- /*
* Never handle CoW for uffd-wp protected pages. It should be only
* handled when the uffd-wp protection is removed.
*
* Note that only the CoW optimization path (in hugetlb_no_page())
* can trigger this, because hugetlb_fault() will always resolve
* uffd-wp bit first.
*/
- if (!unshare && huge_pte_uffd_wp(pte))
return 0;
This looks correct. However, since the previous version looked correct I must ask. Can we have unshare set and huge_pte_uffd_wp true? If so, then it seems we would need to possibly propogate that uffd_wp to the new pte as in v2
Good point, thanks for spotting!
We can. A reproducer would share an anon hugetlb page because parent and child. In the parent, we would uffd-wp that page. We could trigger unsharing by R/O-pinning that page.
Right. This seems to be a separate bug.. It should be triggered in totally different context and much harder due to rare use of RO pins, meanwhile used with userfault-wp.
If both of you agree, I can prepare a separate patch for this bug, and I'll better prepare a reproducer/selftest with it.
I am OK with separate patches, and agree that the R/O pinning case is less likely to happen.
Yes, the combination should be rather rare and we can fix that separately. Ideally, we'd try to mimic the same uffd code flow in hugetlb cow/unshare handling that we use in memory.c
Since this patch addresses the issue found by Muhammad,
Reviewed-by: Mike Kravetz mike.kravetz@oracle.com
Hopefully we didn't forget about yet another case :D
Acked-by: David Hildenbrand david@redhat.com
linux-stable-mirror@lists.linaro.org