This is a note to let you know that I've just added the patch titled
usb: gadget: dummy_hcd: fix gpf in gadget_setup
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 4a5d797a9f9c4f18585544237216d7812686a71f Mon Sep 17 00:00:00 2001
From: Anirudh Rayabharam <mail(a)anirudhrb.com>
Date: Mon, 19 Apr 2021 09:07:08 +0530
Subject: usb: gadget: dummy_hcd: fix gpf in gadget_setup
Fix a general protection fault reported by syzbot due to a race between
gadget_setup() and gadget_unbind() in raw_gadget.
The gadget core is supposed to guarantee that there won't be any more
callbacks to the gadget driver once the driver's unbind routine is
called. That guarantee is enforced in usb_gadget_remove_driver as
follows:
usb_gadget_disconnect(udc->gadget);
if (udc->gadget->irq)
synchronize_irq(udc->gadget->irq);
udc->driver->unbind(udc->gadget);
usb_gadget_udc_stop(udc);
usb_gadget_disconnect turns off the pullup resistor, telling the host
that the gadget is no longer connected and preventing the transmission
of any more USB packets. Any packets that have already been received
are sure to processed by the UDC driver's interrupt handler by the time
synchronize_irq returns.
But this doesn't work with dummy_hcd, because dummy_hcd doesn't use
interrupts; it uses a timer instead. It does have code to emulate the
effect of synchronize_irq, but that code doesn't get invoked at the
right time -- it currently runs in usb_gadget_udc_stop, after the unbind
callback instead of before. Indeed, there's no way for
usb_gadget_remove_driver to invoke this code before the unbind callback.
To fix this, move the synchronize_irq() emulation code to dummy_pullup
so that it runs before unbind. Also, add a comment explaining why it is
necessary to have it there.
Reported-by: syzbot+eb4674092e6cc8d9e0bd(a)syzkaller.appspotmail.com
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: Anirudh Rayabharam <mail(a)anirudhrb.com>
Link: https://lore.kernel.org/r/20210419033713.3021-1-mail@anirudhrb.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/udc/dummy_hcd.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/gadget/udc/dummy_hcd.c b/drivers/usb/gadget/udc/dummy_hcd.c
index ce24d4f28f2a..7db773c87379 100644
--- a/drivers/usb/gadget/udc/dummy_hcd.c
+++ b/drivers/usb/gadget/udc/dummy_hcd.c
@@ -903,6 +903,21 @@ static int dummy_pullup(struct usb_gadget *_gadget, int value)
spin_lock_irqsave(&dum->lock, flags);
dum->pullup = (value != 0);
set_link_state(dum_hcd);
+ if (value == 0) {
+ /*
+ * Emulate synchronize_irq(): wait for callbacks to finish.
+ * This seems to be the best place to emulate the call to
+ * synchronize_irq() that's in usb_gadget_remove_driver().
+ * Doing it in dummy_udc_stop() would be too late since it
+ * is called after the unbind callback and unbind shouldn't
+ * be invoked until all the other callbacks are finished.
+ */
+ while (dum->callback_usage > 0) {
+ spin_unlock_irqrestore(&dum->lock, flags);
+ usleep_range(1000, 2000);
+ spin_lock_irqsave(&dum->lock, flags);
+ }
+ }
spin_unlock_irqrestore(&dum->lock, flags);
usb_hcd_poll_rh_status(dummy_hcd_to_hcd(dum_hcd));
@@ -1004,14 +1019,6 @@ static int dummy_udc_stop(struct usb_gadget *g)
spin_lock_irq(&dum->lock);
dum->ints_enabled = 0;
stop_activity(dum);
-
- /* emulate synchronize_irq(): wait for callbacks to finish */
- while (dum->callback_usage > 0) {
- spin_unlock_irq(&dum->lock);
- usleep_range(1000, 2000);
- spin_lock_irq(&dum->lock);
- }
-
dum->driver = NULL;
spin_unlock_irq(&dum->lock);
--
2.31.1
This is a note to let you know that I've just added the patch titled
usb: dwc3: gadget: Fix START_TRANSFER link state check
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From c560e76319a94a3b9285bc426c609903408e4826 Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Mon, 19 Apr 2021 19:11:12 -0700
Subject: usb: dwc3: gadget: Fix START_TRANSFER link state check
The START_TRANSFER command needs to be executed while in ON/U0 link
state (with an exception during register initialization). Don't use
dwc->link_state to check this since the driver only tracks the link
state when the link state change interrupt is enabled. Check the link
state from DSTS register instead.
Note that often the host already brings the device out of low power
before it sends/requests the next transfer. So, the user won't see any
issue when the device starts transfer then. This issue is more
noticeable in cases when the device delays starting transfer, which can
happen during delayed control status after the host put the device in
low power.
Fixes: 799e9dc82968 ("usb: dwc3: gadget: conditionally disable Link State change events")
Cc: <stable(a)vger.kernel.org>
Acked-by: Felipe Balbi <balbi(a)kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/bcefaa9ecbc3e1936858c0baa14de6612960e909.16188842…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/gadget.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 6227641f2d31..1a632a3faf7f 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -308,13 +308,12 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
}
if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int needs_wakeup;
+ int link_state;
- needs_wakeup = (dwc->link_state == DWC3_LINK_STATE_U1 ||
- dwc->link_state == DWC3_LINK_STATE_U2 ||
- dwc->link_state == DWC3_LINK_STATE_U3);
-
- if (unlikely(needs_wakeup)) {
+ link_state = dwc3_gadget_get_link_state(dwc);
+ if (link_state == DWC3_LINK_STATE_U1 ||
+ link_state == DWC3_LINK_STATE_U2 ||
+ link_state == DWC3_LINK_STATE_U3) {
ret = __dwc3_gadget_wakeup(dwc);
dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
ret);
@@ -1989,6 +1988,8 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc)
case DWC3_LINK_STATE_RESET:
case DWC3_LINK_STATE_RX_DET: /* in HS, means Early Suspend */
case DWC3_LINK_STATE_U3: /* in HS, means SUSPEND */
+ case DWC3_LINK_STATE_U2: /* in HS, means Sleep (L1) */
+ case DWC3_LINK_STATE_U1:
case DWC3_LINK_STATE_RESUME:
break;
default:
--
2.31.1
This reverts commit 32f47179833b63de72427131169809065db6745e.
Commits from @umn.edu addresses have been found to be submitted in "bad
faith" to try to test the kernel community's ability to review "known
malicious" changes. The result of these submissions can be found in a
paper published at the 42nd IEEE Symposium on Security and Privacy
entitled, "Open Source Insecurity: Stealthily Introducing
Vulnerabilities via Hypocrite Commits" written by Qiushi Wu (University
of Minnesota) and Kangjie Lu (University of Minnesota).
Because of this, all submissions from this group must be reverted from
the kernel tree and will need to be re-reviewed again to determine if
they actually are a valid fix. Until that work is complete, remove this
change to ensure that no problems are being introduced into the
codebase.
Cc: Aditya Pakki <pakki001(a)umn.edu>
Cc: stable <stable(a)vger.kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/mvebu-uart.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/tty/serial/mvebu-uart.c b/drivers/tty/serial/mvebu-uart.c
index e0c00a1b0763..51b0ecabf2ec 100644
--- a/drivers/tty/serial/mvebu-uart.c
+++ b/drivers/tty/serial/mvebu-uart.c
@@ -818,9 +818,6 @@ static int mvebu_uart_probe(struct platform_device *pdev)
return -EINVAL;
}
- if (!match)
- return -ENODEV;
-
/* Assume that all UART ports have a DT alias or none has */
id = of_alias_get_id(pdev->dev.of_node, "serial");
if (!pdev->dev.of_node || id < 0)
--
2.31.1
The patch titled
Subject: ovl: fix reference counting in ovl_mmap error path
has been added to the -mm tree. Its filename is
ovl-fix-reference-counting-in-ovl_mmap-error-path.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/ovl-fix-reference-counting-in-ovl…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/ovl-fix-reference-counting-in-ovl…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Christian König <christian.koenig(a)amd.com>
Subject: ovl: fix reference counting in ovl_mmap error path
mmap_region() now calls fput() on the vma->vm_file.
Fix this by using vma_set_file() so it doesn't need to be handled manually
here any more.
Link: https://lkml.kernel.org/r/20210421132012.82354-2-christian.koenig@amd.com
Fixes: 1527f926fd04 ("mm: mmap: fix fput in error path v2")
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Cc: Jan Harkes <jaharkes(a)cs.cmu.edu>
Cc: Miklos Szeredi <miklos(a)szeredi.hu>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: <stable(a)vger.kernel.org> [5.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/overlayfs/file.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
--- a/fs/overlayfs/file.c~ovl-fix-reference-counting-in-ovl_mmap-error-path
+++ a/fs/overlayfs/file.c
@@ -430,20 +430,11 @@ static int ovl_mmap(struct file *file, s
if (WARN_ON(file != vma->vm_file))
return -EIO;
- vma->vm_file = get_file(realfile);
+ vma_set_file(vma, realfile);
old_cred = ovl_override_creds(file_inode(file)->i_sb);
ret = call_mmap(vma->vm_file, vma);
revert_creds(old_cred);
-
- if (ret) {
- /* Drop reference count from new vm_file value */
- fput(realfile);
- } else {
- /* Drop reference count from previous vm_file value */
- fput(file);
- }
-
ovl_file_accessed(file);
return ret;
_
Patches currently in -mm which might be from christian.koenig(a)amd.com are
coda-fix-reference-counting-in-coda_file_mmap-error-path.patch
ovl-fix-reference-counting-in-ovl_mmap-error-path.patch
The patch titled
Subject: coda: fix reference counting in coda_file_mmap error path
has been added to the -mm tree. Its filename is
coda-fix-reference-counting-in-coda_file_mmap-error-path.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/coda-fix-reference-counting-in-co…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/coda-fix-reference-counting-in-co…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Christian König <christian.koenig(a)amd.com>
Subject: coda: fix reference counting in coda_file_mmap error path
mmap_region() now calls fput() on the vma->vm_file.
So we need to drop the extra reference on the coda file instead of the
host file.
Link: https://lkml.kernel.org/r/20210421132012.82354-1-christian.koenig@amd.com
Fixes: 1527f926fd04 ("mm: mmap: fix fput in error path v2")
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Cc: Jan Harkes <jaharkes(a)cs.cmu.edu>
Cc: Miklos Szeredi <miklos(a)szeredi.hu>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: <stable(a)vger.kernel.org> [5.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/coda/file.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/coda/file.c~coda-fix-reference-counting-in-coda_file_mmap-error-path
+++ a/fs/coda/file.c
@@ -175,10 +175,10 @@ coda_file_mmap(struct file *coda_file, s
ret = call_mmap(vma->vm_file, vma);
if (ret) {
- /* if call_mmap fails, our caller will put coda_file so we
- * should drop the reference to the host_file that we got.
+ /* if call_mmap fails, our caller will put host_file so we
+ * should drop the reference to the coda_file that we got.
*/
- fput(host_file);
+ fput(coda_file);
kfree(cvm_ops);
} else {
/* here we add redirects for the open/close vm_operations */
_
Patches currently in -mm which might be from christian.koenig(a)amd.com are
coda-fix-reference-counting-in-coda_file_mmap-error-path.patch
ovl-fix-reference-counting-in-ovl_mmap-error-path.patch
Good Day,
I hope this email finds you well.
We are pleased to invite you or your company to quote below
listed item:
Product/Model No: A702TH FYNE PRESSURE REGULATOR
Model Number: A702TH
Qty. 30 units
Important note: Kindly send your quotation to:
quote(a)pfizersuppliers.com
for immediate approval.
Kind Regards,
Albert Bourla
PFIZER B.V Supply Chain Manager
Tel: +31(0)208080 880
ADDRESS: Rivium Westlaan 142, 2909 LD
Capelle aan den IJssel, Netherlands
We received a report that the copy-on-write issue repored by Jann Horn in
https://bugs.chromium.org/p/project-zero/issues/detail?id=2045 is still
reproducible on 4.14 and 4.19 kernels (the first issue with the reproducer
coded in vmsplice.c). I confirmed this and also that the issue was not
reproducible with 5.10 kernel. I tracked the fix to the following patch
introduced in 5.9 which changes the do_wp_page() logic:
09854ba94c6a 'mm: do_wp_page() simplification'
I backported this patch (#2 in the series) along with 2 prerequisite patches
(#1 and #4) that keep the backports clean and two followup fixes to the main
patch (#3 and #5). I had to skip the following fix:
feb889fb40fa 'mm: don't put pinned pages into the swap cache'
because it uses page_maybe_dma_pinned() which does not exists in earlier
kernels. Because pin_user_pages() does not exist there as well, I *think*
we can safely skip this fix on older kernels, but I would appreciate if
someone could confirm that claim.
The patchset cleanly applies over: stable linux-4.14.y, tag: v4.14.228
Note: 4.14 and 4.19 backports are very similar, so while I backported
only to these two versions I think backports for other versions can be
done easily.
Kirill Tkhai (1):
mm: reuse only-pte-mapped KSM page in do_wp_page()
Linus Torvalds (2):
mm: do_wp_page() simplification
mm: fix misplaced unlock_page in do_wp_page()
Nadav Amit (1):
mm/userfaultfd: fix memory corruption due to writeprotect
Shaohua Li (1):
userfaultfd: wp: add helper for writeprotect check
include/linux/ksm.h | 7 ++++
include/linux/userfaultfd_k.h | 10 ++++++
mm/ksm.c | 30 ++++++++++++++++--
mm/memory.c | 60 ++++++++++++++++-------------------
4 files changed, 73 insertions(+), 34 deletions(-)
--
2.31.0.291.g576ba9dcdaf-goog
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit 17839856fd588f4ab6b789f482ed3ffd7c403e1f upstream.
Doing a "get_user_pages()" on a copy-on-write page for reading can be
ambiguous: the page can be COW'ed at any time afterwards, and the
direction of a COW event isn't defined.
Yes, whoever writes to it will generally do the COW, but if the thread
that did the get_user_pages() unmapped the page before the write (and
that could happen due to memory pressure in addition to any outright
action), the writer could also just take over the old page instead.
End result: the get_user_pages() call might result in a page pointer
that is no longer associated with the original VM, and is associated
with - and controlled by - another VM having taken it over instead.
So when doing a get_user_pages() on a COW mapping, the only really safe
thing to do would be to break the COW when getting the page, even when
only getting it for reading.
At the same time, some users simply don't even care.
For example, the perf code wants to look up the page not because it
cares about the page, but because the code simply wants to look up the
physical address of the access for informational purposes, and doesn't
really care about races when a page might be unmapped and remapped
elsewhere.
This adds logic to force a COW event by setting FOLL_WRITE on any
copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
pointer as a result.
The current semantics end up being:
- __get_user_pages_fast(): no change. If you don't ask for a write,
you won't break COW. You'd better know what you're doing.
- get_user_pages_fast(): the fast-case "look it up in the page tables
without anything getting mmap_sem" now refuses to follow a read-only
page, since it might need COW breaking. Which happens in the slow
path - the fast path doesn't know if the memory might be COW or not.
- get_user_pages() (including the slow-path fallback for gup_fast()):
for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
very similar semantics to FOLL_FORCE.
If it turns out that we want finer granularity (ie "only break COW when
it might actually matter" - things like the zero page are special and
don't need to be broken) we might need to push these semantics deeper
into the lookup fault path. So if people care enough, it's possible
that we might end up adding a new internal FOLL_BREAK_COW flag to go
with the internal FOLL_COW flag we already have for tracking "I had a
COW".
Alternatively, if it turns out that different callers might want to
explicitly control the forced COW break behavior, we might even want to
make such a flag visible to the users of get_user_pages() instead of
using the above default semantics.
But for now, this is mostly commentary on the issue (this commit message
being a lot bigger than the patch, and that patch in turn is almost all
comments), with that minimal "enable COW breaking early" logic using the
existing FOLL_WRITE behavior.
[ It might be worth noting that we've always had this ambiguity, and it
could arguably be seen as a user-space issue.
You only get private COW mappings that could break either way in
situations where user space is doing cooperative things (ie fork()
before an execve() etc), but it _is_ surprising and very subtle, and
fork() is supposed to give you independent address spaces.
So let's treat this as a kernel issue and make the semantics of
get_user_pages() easier to understand. Note that obviously a true
shared mapping will still get a page that can change under us, so this
does _not_ mean that get_user_pages() somehow returns any "stable"
page ]
[surenb: backport notes]
Replaced (gup_flags | FOLL_WRITE) with write=1 in gup_pgd_range.
Removed FOLL_PIN usage in should_force_cow_break since it's missing in
the earlier kernels.
Reported-by: Jann Horn <jannh(a)google.com>
Tested-by: Christoph Hellwig <hch(a)lst.de>
Acked-by: Oleg Nesterov <oleg(a)redhat.com>
Acked-by: Kirill Shutemov <kirill(a)shutemov.name>
Acked-by: Jan Kara <jack(a)suse.cz>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[surenb: backport to 4.14 kernel]
Cc: stable(a)vger.kernel.org # 4.14.x
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
---
mm/gup.c | 44 ++++++++++++++++++++++++++++++++++++++------
mm/huge_memory.c | 7 +++----
2 files changed, 41 insertions(+), 10 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 12b9626b1a9e..cfe0a56f8e27 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -61,13 +61,22 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
}
/*
- * FOLL_FORCE can write to even unwritable pte's, but only
- * after we've gone through a COW cycle and they are dirty.
+ * FOLL_FORCE or a forced COW break can write even to unwritable pte's,
+ * but only after we've gone through a COW cycle and they are dirty.
*/
static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
{
- return pte_write(pte) ||
- ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
+ return pte_write(pte) || ((flags & FOLL_COW) && pte_dirty(pte));
+}
+
+/*
+ * A (separate) COW fault might break the page the other way and
+ * get_user_pages() would return the page from what is now the wrong
+ * VM. So we need to force a COW break at GUP time even for reads.
+ */
+static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned int flags)
+{
+ return is_cow_mapping(vma->vm_flags) && (flags & FOLL_GET);
}
static struct page *follow_page_pte(struct vm_area_struct *vma,
@@ -694,12 +703,18 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
if (!vma || check_vma_flags(vma, gup_flags))
return i ? : -EFAULT;
if (is_vm_hugetlb_page(vma)) {
+ if (should_force_cow_break(vma, foll_flags))
+ foll_flags |= FOLL_WRITE;
i = follow_hugetlb_page(mm, vma, pages, vmas,
&start, &nr_pages, i,
- gup_flags, nonblocking);
+ foll_flags, nonblocking);
continue;
}
}
+
+ if (should_force_cow_break(vma, foll_flags))
+ foll_flags |= FOLL_WRITE;
+
retry:
/*
* If we have a pending SIGKILL, don't keep faulting pages and
@@ -1796,6 +1811,10 @@ bool gup_fast_permitted(unsigned long start, int nr_pages, int write)
/*
* Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back to
* the regular GUP. It will only return non-negative values.
+ *
+ * Careful, careful! COW breaking can go either way, so a non-write
+ * access can get ambiguous page results. If you call this function without
+ * 'write' set, you'd better be sure that you're ok with that ambiguity.
*/
int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages)
@@ -1823,6 +1842,12 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
*
* We do not adopt an rcu_read_lock(.) here as we also want to
* block IPIs that come from THPs splitting.
+ *
+ * NOTE! We allow read-only gup_fast() here, but you'd better be
+ * careful about possible COW pages. You'll get _a_ COW page, but
+ * not necessarily the one you intended to get depending on what
+ * COW event happens after this. COW may break the page copy in a
+ * random direction.
*/
if (gup_fast_permitted(start, nr_pages, write)) {
@@ -1868,9 +1893,16 @@ int get_user_pages_fast(unsigned long start, int nr_pages, int write,
(void __user *)start, len)))
return -EFAULT;
+ /*
+ * The FAST_GUP case requires FOLL_WRITE even for pure reads,
+ * because get_user_pages() may need to cause an early COW in
+ * order to avoid confusing the normal COW routines. So only
+ * targets that are already writable are safe to do by just
+ * looking at the page tables.
+ */
if (gup_fast_permitted(start, nr_pages, write)) {
local_irq_disable();
- gup_pgd_range(addr, end, write, pages, &nr);
+ gup_pgd_range(addr, end, 1, pages, &nr);
local_irq_enable();
ret = nr;
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9dbfa7286c61..513f0cf173ad 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1367,13 +1367,12 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
}
/*
- * FOLL_FORCE can write to even unwritable pmd's, but only
- * after we've gone through a COW cycle and they are dirty.
+ * FOLL_FORCE or a forced COW break can write even to unwritable pmd's,
+ * but only after we've gone through a COW cycle and they are dirty.
*/
static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags)
{
- return pmd_write(pmd) ||
- ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd));
+ return pmd_write(pmd) || ((flags & FOLL_COW) && pmd_dirty(pmd));
}
struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
--
2.31.1.498.g6c1eba8ee3d-goog