This is a note to let you know that I've just added the patch titled
usb-storage: Revert commit 747668dbc061 ("usb-storage: Set
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 9a976949613132977098fc49510b46fa8678d864 Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Mon, 21 Oct 2019 11:48:06 -0400
Subject: usb-storage: Revert commit 747668dbc061 ("usb-storage: Set
virt_boundary_mask to avoid SG overflows")
Commit 747668dbc061 ("usb-storage: Set virt_boundary_mask to avoid SG
overflows") attempted to solve a problem involving scatter-gather I/O
and USB/IP by setting the virt_boundary_mask for mass-storage devices.
However, it now turns out that this interacts badly with commit
09324d32d2a0 ("block: force an unlimited segment size on queues with a
virt boundary"), which was added later. A typical error message is:
ehci-pci 0000:00:13.2: swiotlb buffer is full (sz: 327680 bytes),
total 32768 (slots), used 97 (slots)
There is no longer any reason to keep the virt_boundary_mask setting
for usb-storage. It was needed in the first place only for handling
devices with a block size smaller than the maxpacket size and where
the host controller was not capable of fully general scatter-gather
operation (that is, able to merge two SG segments into a single USB
packet). But:
High-speed or slower connections never use a bulk maxpacket
value larger than 512;
The SCSI layer does not handle block devices with a block size
smaller than 512 bytes;
All the host controllers capable of SuperSpeed operation can
handle fully general SG;
Since commit ea44d190764b ("usbip: Implement SG support to
vhci-hcd and stub driver") was merged, the USB/IP driver can
also handle SG.
Therefore all supported device/controller combinations should be okay
with no need for any special virt_boundary_mask. So in order to fix
the swiotlb problem, this patch reverts commit 747668dbc061.
Reported-and-tested-by: Piergiorgio Sartor <piergiorgio.sartor(a)nexgo.de>
Link: https://marc.info/?l=linux-usb&m=157134199501202&w=2
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
CC: Seth Bollinger <Seth.Bollinger(a)digi.com>
CC: <stable(a)vger.kernel.org>
Fixes: 747668dbc061 ("usb-storage: Set virt_boundary_mask to avoid SG overflows")
Acked-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/Pine.LNX.4.44L0.1910211145520.1673-100000@iolanth…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/storage/scsiglue.c | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 6737fab94959..54a3c8195c96 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -68,7 +68,6 @@ static const char* host_info(struct Scsi_Host *host)
static int slave_alloc (struct scsi_device *sdev)
{
struct us_data *us = host_to_us(sdev->host);
- int maxp;
/*
* Set the INQUIRY transfer length to 36. We don't use any of
@@ -77,15 +76,6 @@ static int slave_alloc (struct scsi_device *sdev)
*/
sdev->inquiry_len = 36;
- /*
- * USB has unusual scatter-gather requirements: the length of each
- * scatterlist element except the last must be divisible by the
- * Bulk maxpacket value. Fortunately this value is always a
- * power of 2. Inform the block layer about this requirement.
- */
- maxp = usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0);
- blk_queue_virt_boundary(sdev->request_queue, maxp - 1);
-
/*
* Some host controllers may have alignment requirements.
* We'll play it safe by requiring 512-byte alignment always.
--
2.23.0
This is a note to let you know that I've just added the patch titled
usb: xhci: fix __le32/__le64 accessors in debugfs code
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d5501d5c29a2e684640507cfee428178d6fd82ca Mon Sep 17 00:00:00 2001
From: "Ben Dooks (Codethink)" <ben.dooks(a)codethink.co.uk>
Date: Fri, 25 Oct 2019 17:30:29 +0300
Subject: usb: xhci: fix __le32/__le64 accessors in debugfs code
It looks like some of the xhci debug code is passing u32 to functions
directly from __le32/__le64 fields.
Fix this by using le{32,64}_to_cpu() on these to fix the following
sparse warnings;
xhci-debugfs.c:205:62: warning: incorrect type in argument 1 (different base types)
xhci-debugfs.c:205:62: expected unsigned int [usertype] field0
xhci-debugfs.c:205:62: got restricted __le32
xhci-debugfs.c:206:62: warning: incorrect type in argument 2 (different base types)
xhci-debugfs.c:206:62: expected unsigned int [usertype] field1
xhci-debugfs.c:206:62: got restricted __le32
...
[Trim down commit message, sparse warnings were similar -Mathias]
Cc: <stable(a)vger.kernel.org> # 4.15+
Signed-off-by: Ben Dooks <ben.dooks(a)codethink.co.uk>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/1572013829-14044-4-git-send-email-mathias.nyman@l…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci-debugfs.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/usb/host/xhci-debugfs.c b/drivers/usb/host/xhci-debugfs.c
index 7ba6afc7ef23..76c3f29562d2 100644
--- a/drivers/usb/host/xhci-debugfs.c
+++ b/drivers/usb/host/xhci-debugfs.c
@@ -202,10 +202,10 @@ static void xhci_ring_dump_segment(struct seq_file *s,
trb = &seg->trbs[i];
dma = seg->dma + i * sizeof(*trb);
seq_printf(s, "%pad: %s\n", &dma,
- xhci_decode_trb(trb->generic.field[0],
- trb->generic.field[1],
- trb->generic.field[2],
- trb->generic.field[3]));
+ xhci_decode_trb(le32_to_cpu(trb->generic.field[0]),
+ le32_to_cpu(trb->generic.field[1]),
+ le32_to_cpu(trb->generic.field[2]),
+ le32_to_cpu(trb->generic.field[3])));
}
}
@@ -263,10 +263,10 @@ static int xhci_slot_context_show(struct seq_file *s, void *unused)
xhci = hcd_to_xhci(bus_to_hcd(dev->udev->bus));
slot_ctx = xhci_get_slot_ctx(xhci, dev->out_ctx);
seq_printf(s, "%pad: %s\n", &dev->out_ctx->dma,
- xhci_decode_slot_context(slot_ctx->dev_info,
- slot_ctx->dev_info2,
- slot_ctx->tt_info,
- slot_ctx->dev_state));
+ xhci_decode_slot_context(le32_to_cpu(slot_ctx->dev_info),
+ le32_to_cpu(slot_ctx->dev_info2),
+ le32_to_cpu(slot_ctx->tt_info),
+ le32_to_cpu(slot_ctx->dev_state)));
return 0;
}
@@ -286,10 +286,10 @@ static int xhci_endpoint_context_show(struct seq_file *s, void *unused)
ep_ctx = xhci_get_ep_ctx(xhci, dev->out_ctx, dci);
dma = dev->out_ctx->dma + dci * CTX_SIZE(xhci->hcc_params);
seq_printf(s, "%pad: %s\n", &dma,
- xhci_decode_ep_context(ep_ctx->ep_info,
- ep_ctx->ep_info2,
- ep_ctx->deq,
- ep_ctx->tx_info));
+ xhci_decode_ep_context(le32_to_cpu(ep_ctx->ep_info),
+ le32_to_cpu(ep_ctx->ep_info2),
+ le64_to_cpu(ep_ctx->deq),
+ le32_to_cpu(ep_ctx->tx_info)));
}
return 0;
--
2.23.0
This is a note to let you know that I've just added the patch titled
xhci: Fix use-after-free regression in xhci clear hub TT
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 18b74067ac78a2dea65783314c13df98a53d071c Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Date: Fri, 25 Oct 2019 17:30:27 +0300
Subject: xhci: Fix use-after-free regression in xhci clear hub TT
implementation
commit ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer") schedules work
to clear TT buffer, but causes a use-after-free regression at the same time
Make sure hub_tt_work finishes before endpoint is disabled, otherwise
the work will dereference already freed endpoint and device related
pointers.
This was triggered when usb core failed to read the configuration
descriptor of a FS/LS device during enumeration.
xhci driver queued clear_tt_work while usb core freed and reallocated
a new device for the next enumeration attempt.
EHCI driver implents ehci_endpoint_disable() that makes sure
clear_tt_work has finished before it returns, but xhci lacks this support.
usb core will call hcd->driver->endpoint_disable() callback before
disabling endpoints, so we want this in xhci as well.
The added xhci_endpoint_disable() is based on ehci_endpoint_disable()
Fixes: ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer")
Cc: <stable(a)vger.kernel.org> # v5.3
Reported-by: Johan Hovold <johan(a)kernel.org>
Suggested-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: Johan Hovold <johan(a)kernel.org>
Tested-by: Johan Hovold <johan(a)kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/1572013829-14044-2-git-send-email-mathias.nyman@l…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci.c | 54 ++++++++++++++++++++++++++++++++++-------
1 file changed, 45 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 517ec3206f6e..6c17e3fe181a 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3071,6 +3071,48 @@ void xhci_cleanup_stalled_ring(struct xhci_hcd *xhci, unsigned int ep_index,
}
}
+static void xhci_endpoint_disable(struct usb_hcd *hcd,
+ struct usb_host_endpoint *host_ep)
+{
+ struct xhci_hcd *xhci;
+ struct xhci_virt_device *vdev;
+ struct xhci_virt_ep *ep;
+ struct usb_device *udev;
+ unsigned long flags;
+ unsigned int ep_index;
+
+ xhci = hcd_to_xhci(hcd);
+rescan:
+ spin_lock_irqsave(&xhci->lock, flags);
+
+ udev = (struct usb_device *)host_ep->hcpriv;
+ if (!udev || !udev->slot_id)
+ goto done;
+
+ vdev = xhci->devs[udev->slot_id];
+ if (!vdev)
+ goto done;
+
+ ep_index = xhci_get_endpoint_index(&host_ep->desc);
+ ep = &vdev->eps[ep_index];
+ if (!ep)
+ goto done;
+
+ /* wait for hub_tt_work to finish clearing hub TT */
+ if (ep->ep_state & EP_CLEARING_TT) {
+ spin_unlock_irqrestore(&xhci->lock, flags);
+ schedule_timeout_uninterruptible(1);
+ goto rescan;
+ }
+
+ if (ep->ep_state)
+ xhci_dbg(xhci, "endpoint disable with ep_state 0x%x\n",
+ ep->ep_state);
+done:
+ host_ep->hcpriv = NULL;
+ spin_unlock_irqrestore(&xhci->lock, flags);
+}
+
/*
* Called after usb core issues a clear halt control message.
* The host side of the halt should already be cleared by a reset endpoint
@@ -5238,20 +5280,13 @@ static void xhci_clear_tt_buffer_complete(struct usb_hcd *hcd,
unsigned int ep_index;
unsigned long flags;
- /*
- * udev might be NULL if tt buffer is cleared during a failed device
- * enumeration due to a halted control endpoint. Usb core might
- * have allocated a new udev for the next enumeration attempt.
- */
-
xhci = hcd_to_xhci(hcd);
+
+ spin_lock_irqsave(&xhci->lock, flags);
udev = (struct usb_device *)ep->hcpriv;
- if (!udev)
- return;
slot_id = udev->slot_id;
ep_index = xhci_get_endpoint_index(&ep->desc);
- spin_lock_irqsave(&xhci->lock, flags);
xhci->devs[slot_id]->eps[ep_index].ep_state &= ~EP_CLEARING_TT;
xhci_ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
spin_unlock_irqrestore(&xhci->lock, flags);
@@ -5288,6 +5323,7 @@ static const struct hc_driver xhci_hc_driver = {
.free_streams = xhci_free_streams,
.add_endpoint = xhci_add_endpoint,
.drop_endpoint = xhci_drop_endpoint,
+ .endpoint_disable = xhci_endpoint_disable,
.endpoint_reset = xhci_endpoint_reset,
.check_bandwidth = xhci_check_bandwidth,
.reset_bandwidth = xhci_reset_bandwidth,
--
2.23.0
This is a note to let you know that I've just added the patch titled
usb: xhci: fix Immediate Data Transfer endianness
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From bfa3dbb343f664573292afb9e44f9abeb81a19de Mon Sep 17 00:00:00 2001
From: Samuel Holland <samuel(a)sholland.org>
Date: Fri, 25 Oct 2019 17:30:28 +0300
Subject: usb: xhci: fix Immediate Data Transfer endianness
The arguments to queue_trb are always byteswapped to LE for placement in
the ring, but this should not happen in the case of immediate data; the
bytes copied out of transfer_buffer are already in the correct order.
Add a complementary byteswap so the bytes end up in the ring correctly.
This was observed on BE ppc64 with a "Texas Instruments TUSB73x0
SuperSpeed USB 3.0 xHCI Host Controller [104c:8241]" as a ch341
usb-serial adapter ("1a86:7523 QinHeng Electronics HL-340 USB-Serial
adapter") always transmitting the same character (generally NUL) over
the serial link regardless of the key pressed.
Cc: <stable(a)vger.kernel.org> # 5.2+
Fixes: 33e39350ebd2 ("usb: xhci: add Immediate Data Transfer support")
Signed-off-by: Samuel Holland <samuel(a)sholland.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/1572013829-14044-3-git-send-email-mathias.nyman@l…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci-ring.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 85ceb43e3405..e7aab31fd9a5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3330,6 +3330,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
if (xhci_urb_suitable_for_idt(urb)) {
memcpy(&send_addr, urb->transfer_buffer,
trb_buff_len);
+ le64_to_cpus(&send_addr);
field |= TRB_IDT;
}
}
@@ -3475,6 +3476,7 @@ int xhci_queue_ctrl_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
if (xhci_urb_suitable_for_idt(urb)) {
memcpy(&addr, urb->transfer_buffer,
urb->transfer_buffer_length);
+ le64_to_cpus(&addr);
field |= TRB_IDT;
} else {
addr = (u64) urb->transfer_dma;
--
2.23.0
This is a note to let you know that I've just added the patch titled
USB: ldusb: fix ring-buffer locking
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d98ee2a19c3334e9343df3ce254b496f1fc428eb Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Tue, 22 Oct 2019 16:32:02 +0200
Subject: USB: ldusb: fix ring-buffer locking
The custom ring-buffer implementation was merged without any locking or
explicit memory barriers, but a spinlock was later added by commit
9d33efd9a791 ("USB: ldusb bugfix").
The lock did not cover the update of the tail index once the entry had
been processed, something which could lead to memory corruption on
weakly ordered architectures or due to compiler optimisations.
Specifically, a completion handler running on another CPU might observe
the incremented tail index and update the entry before ld_usb_read() is
done with it.
Fixes: 2824bd250f0b ("[PATCH] USB: add ldusb driver")
Fixes: 9d33efd9a791 ("USB: ldusb bugfix")
Cc: stable <stable(a)vger.kernel.org> # 2.6.13
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Link: https://lore.kernel.org/r/20191022143203.5260-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/misc/ldusb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/misc/ldusb.c b/drivers/usb/misc/ldusb.c
index 15b5f06fb0b3..c3e764909fd0 100644
--- a/drivers/usb/misc/ldusb.c
+++ b/drivers/usb/misc/ldusb.c
@@ -495,11 +495,11 @@ static ssize_t ld_usb_read(struct file *file, char __user *buffer, size_t count,
retval = -EFAULT;
goto unlock_exit;
}
- dev->ring_tail = (dev->ring_tail+1) % ring_buffer_size;
-
retval = bytes_to_read;
spin_lock_irq(&dev->rbsl);
+ dev->ring_tail = (dev->ring_tail + 1) % ring_buffer_size;
+
if (dev->buffer_overflow) {
dev->buffer_overflow = 0;
spin_unlock_irq(&dev->rbsl);
--
2.23.0
This is a note to let you know that I've just added the patch titled
USB: ldusb: fix control-message timeout
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 52403cfbc635d28195167618690595013776ebde Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Tue, 22 Oct 2019 17:31:27 +0200
Subject: USB: ldusb: fix control-message timeout
USB control-message timeouts are specified in milliseconds, not jiffies.
Waiting 83 minutes for a transfer to complete is a bit excessive.
Fixes: 2824bd250f0b ("[PATCH] USB: add ldusb driver")
Cc: stable <stable(a)vger.kernel.org> # 2.6.13
Reported-by: syzbot+a4fbb3bb76cda0ea4e58(a)syzkaller.appspotmail.com
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Link: https://lore.kernel.org/r/20191022153127.22295-1-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/misc/ldusb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/misc/ldusb.c b/drivers/usb/misc/ldusb.c
index dd1ea25e42b1..8f86b4ebca89 100644
--- a/drivers/usb/misc/ldusb.c
+++ b/drivers/usb/misc/ldusb.c
@@ -581,7 +581,7 @@ static ssize_t ld_usb_write(struct file *file, const char __user *buffer,
1 << 8, 0,
dev->interrupt_out_buffer,
bytes_to_write,
- USB_CTRL_SET_TIMEOUT * HZ);
+ USB_CTRL_SET_TIMEOUT);
if (retval < 0)
dev_err(&dev->intf->dev,
"Couldn't submit HID_REQ_SET_REPORT %d\n",
--
2.23.0
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 96c804a6ae8c59a9092b3d5dd581198472063184 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Fri, 18 Oct 2019 20:19:23 -0700
Subject: [PATCH] mm/memory-failure.c: don't access uninitialized memmaps in
memory_failure()
We should check for pfn_to_online_page() to not access uninitialized
memmaps. Reshuffle the code so we don't have to duplicate the error
message.
Link: http://lkml.kernel.org/r/20191009142435.3975-3-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Acked-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.13+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0ae72b6acee7..3151c87dff73 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1257,17 +1257,19 @@ int memory_failure(unsigned long pfn, int flags)
if (!sysctl_memory_failure_recovery)
panic("Memory failure on page %lx", pfn);
- if (!pfn_valid(pfn)) {
+ p = pfn_to_online_page(pfn);
+ if (!p) {
+ if (pfn_valid(pfn)) {
+ pgmap = get_dev_pagemap(pfn, NULL);
+ if (pgmap)
+ return memory_failure_dev_pagemap(pfn, flags,
+ pgmap);
+ }
pr_err("Memory failure: %#lx: memory outside kernel control\n",
pfn);
return -ENXIO;
}
- pgmap = get_dev_pagemap(pfn, NULL);
- if (pgmap)
- return memory_failure_dev_pagemap(pfn, flags, pgmap);
-
- p = pfn_to_page(pfn);
if (PageHuge(p))
return memory_failure_hugetlb(pfn, flags);
if (TestSetPageHWPoison(p)) {
The patch below does not apply to the 5.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b21555786f18cd77f2311ad89074533109ae3ffa Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Wed, 2 Oct 2019 06:15:53 -0400
Subject: [PATCH] dm snapshot: rework COW throttling to fix deadlock
Commit 721b1d98fb517a ("dm snapshot: Fix excessive memory usage and
workqueue stalls") introduced a semaphore to limit the maximum number of
in-flight kcopyd (COW) jobs.
The implementation of this throttling mechanism is prone to a deadlock:
1. One or more threads write to the origin device causing COW, which is
performed by kcopyd.
2. At some point some of these threads might reach the s->cow_count
semaphore limit and block in down(&s->cow_count), holding a read lock
on _origins_lock.
3. Someone tries to acquire a write lock on _origins_lock, e.g.,
snapshot_ctr(), which blocks because the threads at step (2) already
hold a read lock on it.
4. A COW operation completes and kcopyd runs dm-snapshot's completion
callback, which ends up calling pending_complete().
pending_complete() tries to resubmit any deferred origin bios. This
requires acquiring a read lock on _origins_lock, which blocks.
This happens because the read-write semaphore implementation gives
priority to writers, meaning that as soon as a writer tries to enter
the critical section, no readers will be allowed in, until all
writers have completed their work.
So, pending_complete() waits for the writer at step (3) to acquire
and release the lock. This writer waits for the readers at step (2)
to release the read lock and those readers wait for
pending_complete() (the kcopyd thread) to signal the s->cow_count
semaphore: DEADLOCK.
The above was thoroughly analyzed and documented by Nikos Tsironis as
part of his initial proposal for fixing this deadlock, see:
https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html
Fix this deadlock by reworking COW throttling so that it waits without
holding any locks. Add a variable 'in_progress' that counts how many
kcopyd jobs are running. A function wait_for_in_progress() will sleep if
'in_progress' is over the limit. It drops _origins_lock in order to
avoid the deadlock.
Reported-by: Guruswamy Basavaiah <guru2018(a)gmail.com>
Reported-by: Nikos Tsironis <ntsironis(a)arrikto.com>
Reviewed-by: Nikos Tsironis <ntsironis(a)arrikto.com>
Tested-by: Nikos Tsironis <ntsironis(a)arrikto.com>
Fixes: 721b1d98fb51 ("dm snapshot: Fix excessive memory usage and workqueue stalls")
Cc: stable(a)vger.kernel.org # v5.0+
Depends-on: 4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index da3bd1794ee0..4fb1a40e68a0 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -18,7 +18,6 @@
#include <linux/vmalloc.h>
#include <linux/log2.h>
#include <linux/dm-kcopyd.h>
-#include <linux/semaphore.h>
#include "dm.h"
@@ -107,8 +106,8 @@ struct dm_snapshot {
/* The on disk metadata handler */
struct dm_exception_store *store;
- /* Maximum number of in-flight COW jobs. */
- struct semaphore cow_count;
+ unsigned in_progress;
+ struct wait_queue_head in_progress_wait;
struct dm_kcopyd_client *kcopyd_client;
@@ -162,8 +161,8 @@ struct dm_snapshot {
*/
#define DEFAULT_COW_THRESHOLD 2048
-static int cow_threshold = DEFAULT_COW_THRESHOLD;
-module_param_named(snapshot_cow_threshold, cow_threshold, int, 0644);
+static unsigned cow_threshold = DEFAULT_COW_THRESHOLD;
+module_param_named(snapshot_cow_threshold, cow_threshold, uint, 0644);
MODULE_PARM_DESC(snapshot_cow_threshold, "Maximum number of chunks being copied on write");
DECLARE_DM_KCOPYD_THROTTLE_WITH_MODULE_PARM(snapshot_copy_throttle,
@@ -1327,7 +1326,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
goto bad_hash_tables;
}
- sema_init(&s->cow_count, (cow_threshold > 0) ? cow_threshold : INT_MAX);
+ init_waitqueue_head(&s->in_progress_wait);
s->kcopyd_client = dm_kcopyd_client_create(&dm_kcopyd_throttle);
if (IS_ERR(s->kcopyd_client)) {
@@ -1509,17 +1508,54 @@ static void snapshot_dtr(struct dm_target *ti)
dm_put_device(ti, s->origin);
+ WARN_ON(s->in_progress);
+
kfree(s);
}
static void account_start_copy(struct dm_snapshot *s)
{
- down(&s->cow_count);
+ spin_lock(&s->in_progress_wait.lock);
+ s->in_progress++;
+ spin_unlock(&s->in_progress_wait.lock);
}
static void account_end_copy(struct dm_snapshot *s)
{
- up(&s->cow_count);
+ spin_lock(&s->in_progress_wait.lock);
+ BUG_ON(!s->in_progress);
+ s->in_progress--;
+ if (likely(s->in_progress <= cow_threshold) &&
+ unlikely(waitqueue_active(&s->in_progress_wait)))
+ wake_up_locked(&s->in_progress_wait);
+ spin_unlock(&s->in_progress_wait.lock);
+}
+
+static bool wait_for_in_progress(struct dm_snapshot *s, bool unlock_origins)
+{
+ if (unlikely(s->in_progress > cow_threshold)) {
+ spin_lock(&s->in_progress_wait.lock);
+ if (likely(s->in_progress > cow_threshold)) {
+ /*
+ * NOTE: this throttle doesn't account for whether
+ * the caller is servicing an IO that will trigger a COW
+ * so excess throttling may result for chunks not required
+ * to be COW'd. But if cow_threshold was reached, extra
+ * throttling is unlikely to negatively impact performance.
+ */
+ DECLARE_WAITQUEUE(wait, current);
+ __add_wait_queue(&s->in_progress_wait, &wait);
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ spin_unlock(&s->in_progress_wait.lock);
+ if (unlock_origins)
+ up_read(&_origins_lock);
+ io_schedule();
+ remove_wait_queue(&s->in_progress_wait, &wait);
+ return false;
+ }
+ spin_unlock(&s->in_progress_wait.lock);
+ }
+ return true;
}
/*
@@ -1537,7 +1573,7 @@ static void flush_bios(struct bio *bio)
}
}
-static int do_origin(struct dm_dev *origin, struct bio *bio);
+static int do_origin(struct dm_dev *origin, struct bio *bio, bool limit);
/*
* Flush a list of buffers.
@@ -1550,7 +1586,7 @@ static void retry_origin_bios(struct dm_snapshot *s, struct bio *bio)
while (bio) {
n = bio->bi_next;
bio->bi_next = NULL;
- r = do_origin(s->origin, bio);
+ r = do_origin(s->origin, bio, false);
if (r == DM_MAPIO_REMAPPED)
generic_make_request(bio);
bio = n;
@@ -1926,6 +1962,11 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
if (!s->valid)
return DM_MAPIO_KILL;
+ if (bio_data_dir(bio) == WRITE) {
+ while (unlikely(!wait_for_in_progress(s, false)))
+ ; /* wait_for_in_progress() has slept */
+ }
+
down_read(&s->lock);
dm_exception_table_lock(&lock);
@@ -2122,7 +2163,7 @@ static int snapshot_merge_map(struct dm_target *ti, struct bio *bio)
if (bio_data_dir(bio) == WRITE) {
up_write(&s->lock);
- return do_origin(s->origin, bio);
+ return do_origin(s->origin, bio, false);
}
out_unlock:
@@ -2497,15 +2538,24 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
/*
* Called on a write from the origin driver.
*/
-static int do_origin(struct dm_dev *origin, struct bio *bio)
+static int do_origin(struct dm_dev *origin, struct bio *bio, bool limit)
{
struct origin *o;
int r = DM_MAPIO_REMAPPED;
+again:
down_read(&_origins_lock);
o = __lookup_origin(origin->bdev);
- if (o)
+ if (o) {
+ if (limit) {
+ struct dm_snapshot *s;
+ list_for_each_entry(s, &o->snapshots, list)
+ if (unlikely(!wait_for_in_progress(s, true)))
+ goto again;
+ }
+
r = __origin_write(&o->snapshots, bio->bi_iter.bi_sector, bio);
+ }
up_read(&_origins_lock);
return r;
@@ -2618,7 +2668,7 @@ static int origin_map(struct dm_target *ti, struct bio *bio)
dm_accept_partial_bio(bio, available_sectors);
/* Only tell snapshots if this is a write */
- return do_origin(o->dev, bio);
+ return do_origin(o->dev, bio, true);
}
/*
The patch below does not apply to the 5.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 491381ce07ca57f68c49c79a8a43da5b60749e32 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Thu, 17 Oct 2019 09:20:46 -0600
Subject: [PATCH] io_uring: fix up O_NONBLOCK handling for sockets
We've got two issues with the non-regular file handling for non-blocking
IO:
1) We don't want to re-do a short read in full for a non-regular file,
as we can't just read the data again.
2) For non-regular files that don't support non-blocking IO attempts,
we need to punt to async context even if the file is opened as
non-blocking. Otherwise the caller always gets -EAGAIN.
Add two new request flags to handle these cases. One is just a cache
of the inode S_ISREG() status, the other tells io_uring that we always
need to punt this request to async context, even if REQ_F_NOWAIT is set.
Cc: stable(a)vger.kernel.org
Reported-by: Hrvoje Zeba <zeba.hrvoje(a)gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d2cb277da2f4..b7d4085d6ffd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -322,6 +322,8 @@ struct io_kiocb {
#define REQ_F_FAIL_LINK 256 /* fail rest of links */
#define REQ_F_SHADOW_DRAIN 512 /* link-drain shadow req */
#define REQ_F_TIMEOUT 1024 /* timeout request */
+#define REQ_F_ISREG 2048 /* regular file */
+#define REQ_F_MUST_PUNT 4096 /* must be punted even for NONBLOCK */
u64 user_data;
u32 result;
u32 sequence;
@@ -914,26 +916,26 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned *nr_events,
return ret;
}
-static void kiocb_end_write(struct kiocb *kiocb)
+static void kiocb_end_write(struct io_kiocb *req)
{
- if (kiocb->ki_flags & IOCB_WRITE) {
- struct inode *inode = file_inode(kiocb->ki_filp);
+ /*
+ * Tell lockdep we inherited freeze protection from submission
+ * thread.
+ */
+ if (req->flags & REQ_F_ISREG) {
+ struct inode *inode = file_inode(req->file);
- /*
- * Tell lockdep we inherited freeze protection from submission
- * thread.
- */
- if (S_ISREG(inode->i_mode))
- __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
- file_end_write(kiocb->ki_filp);
+ __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
}
+ file_end_write(req->file);
}
static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
{
struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw);
- kiocb_end_write(kiocb);
+ if (kiocb->ki_flags & IOCB_WRITE)
+ kiocb_end_write(req);
if ((req->flags & REQ_F_LINK) && res != req->result)
req->flags |= REQ_F_FAIL_LINK;
@@ -945,7 +947,8 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
{
struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw);
- kiocb_end_write(kiocb);
+ if (kiocb->ki_flags & IOCB_WRITE)
+ kiocb_end_write(req);
if ((req->flags & REQ_F_LINK) && res != req->result)
req->flags |= REQ_F_FAIL_LINK;
@@ -1059,8 +1062,17 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s,
if (!req->file)
return -EBADF;
- if (force_nonblock && !io_file_supports_async(req->file))
- force_nonblock = false;
+ if (S_ISREG(file_inode(req->file)->i_mode))
+ req->flags |= REQ_F_ISREG;
+
+ /*
+ * If the file doesn't support async, mark it as REQ_F_MUST_PUNT so
+ * we know to async punt it even if it was opened O_NONBLOCK
+ */
+ if (force_nonblock && !io_file_supports_async(req->file)) {
+ req->flags |= REQ_F_MUST_PUNT;
+ return -EAGAIN;
+ }
kiocb->ki_pos = READ_ONCE(sqe->off);
kiocb->ki_flags = iocb_flags(kiocb->ki_filp);
@@ -1081,7 +1093,8 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s,
return ret;
/* don't allow async punt if RWF_NOWAIT was requested */
- if (kiocb->ki_flags & IOCB_NOWAIT)
+ if ((kiocb->ki_flags & IOCB_NOWAIT) ||
+ (req->file->f_flags & O_NONBLOCK))
req->flags |= REQ_F_NOWAIT;
if (force_nonblock)
@@ -1382,7 +1395,9 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s,
* need async punt anyway, so it's more efficient to do it
* here.
*/
- if (force_nonblock && ret2 > 0 && ret2 < read_size)
+ if (force_nonblock && !(req->flags & REQ_F_NOWAIT) &&
+ (req->flags & REQ_F_ISREG) &&
+ ret2 > 0 && ret2 < read_size)
ret2 = -EAGAIN;
/* Catch -EAGAIN return for forced non-blocking submission */
if (!force_nonblock || ret2 != -EAGAIN) {
@@ -1447,7 +1462,7 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s,
* released so that it doesn't complain about the held lock when
* we return to userspace.
*/
- if (S_ISREG(file_inode(file)->i_mode)) {
+ if (req->flags & REQ_F_ISREG) {
__sb_start_write(file_inode(file)->i_sb,
SB_FREEZE_WRITE, true);
__sb_writers_release(file_inode(file)->i_sb,
@@ -2282,7 +2297,13 @@ static int __io_queue_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
int ret;
ret = __io_submit_sqe(ctx, req, s, force_nonblock);
- if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
+
+ /*
+ * We async punt it if the file wasn't marked NOWAIT, or if the file
+ * doesn't support non-blocking read/write attempts
+ */
+ if (ret == -EAGAIN && (!(req->flags & REQ_F_NOWAIT) ||
+ (req->flags & REQ_F_MUST_PUNT))) {
struct io_uring_sqe *sqe_copy;
sqe_copy = kmemdup(s->sqe, sizeof(*sqe_copy), GFP_KERNEL);