This series of patches is required to fix the specified CVE in v4.14.
I've tested these patches with a VM running vhost. Jason Wang, who originally
wrote the patches, helped identify the other patches to backport, tested
this version, and provided feedback on the patches.
Jason Wang (5):
vhost_net: introduce vhost_exceeds_weight()
vhost: introduce vhost_exceeds_weight()
vhost_net: fix possible infinite loop
vhost: vsock: add weight support
vhost: scsi: add weight support
Paolo Abeni (1):
vhost_net: use packet weight for rx handler, too
haibinzhang(张海斌) (1):
vhost-net: set packet weight of tx polling to 2 * vq size
drivers/vhost/net.c | 33 +++++++++++++++++++--------------
drivers/vhost/scsi.c | 14 ++++++++++----
drivers/vhost/vhost.c | 20 +++++++++++++++++++-
drivers/vhost/vhost.h | 6 +++++-
drivers/vhost/vsock.c | 27 ++++++++++++++++++++-------
5 files changed, 73 insertions(+), 27 deletions(-)
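For reviewers without the individual patches at hand, the common theme of the
series is a per-run work budget: a handler processes at most a bounded number
of packets and bytes per invocation and then requeues itself, so a guest can
no longer keep it looping indefinitely. Below is a minimal, self-contained
userspace sketch of that idea only; it is not the kernel code, and the budget
values and helper names are made up for illustration.

	#include <stdbool.h>
	#include <stddef.h>
	#include <stdio.h>

	#define PKT_WEIGHT   256              /* hypothetical per-run packet budget */
	#define BYTE_WEIGHT  (64 * 1024)      /* hypothetical per-run byte budget */

	/* Returns true once this run has done enough work and should yield. */
	static bool exceeds_weight(int pkts, size_t bytes)
	{
		return pkts >= PKT_WEIGHT || bytes >= BYTE_WEIGHT;
	}

	/* Stand-in for fetching one descriptor; returns its length, 0 when empty. */
	static size_t fetch_one(void)
	{
		return 1500;                  /* pretend every packet is MTU sized */
	}

	static void handle_tx(void)
	{
		int pkts = 0;
		size_t bytes = 0;
		size_t len;

		while ((len = fetch_one()) != 0) {
			/* ... transmit the buffer here ... */
			pkts++;
			bytes += len;
			if (exceeds_weight(pkts, bytes)) {
				/* Budget spent: requeue the work instead of spinning. */
				printf("yield after %d pkts / %zu bytes\n", pkts, bytes);
				break;
			}
		}
	}

	int main(void)
	{
		handle_tx();
		return 0;
	}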
--
2.16.5
Recent FITRIM work, namely bbbf7243d62d ("btrfs: combine device update
operations during transaction commit") combined the way certain
operations are recorded in a transaction. As a result an ASSERT was added
in dev_replace_finish to ensure the new code works correctly.
Unfortunately I got reports that it's possible to trigger the assert,
meaning that during a device replace it's possible to have an unfinished
chunk allocation on the source device.
This is supposed to be prevented by the fact that a transaction is
committed before finishing the replace operation and later acquiring the
chunk mutex. This is not sufficient since by the time the transaction is
committed and the chunk mutex acquired it's possible to allocate a chunk
depending on the workload being executed on the replaced device. This
bug has been present ever since device replace was introduced but there
was never code which checks for it.
The correct way to fix this is to ensure that there is no pending device
modification operation when the chunk mutex is acquired and, if there is,
to repeat the transaction commit. Unfortunately it's not possible to just
exclude the source device from btrfs_fs_devices::dev_alloc_list since
this causes ENOSPC to be hit in transaction commit.
Fixing this in another way would require adding special cases to handle the
last writes and forbid new ones. The looped transaction fix is more
obvious, and can be easily backported. The runtime of dev-replace is
long so there's no noticeable delay caused by that.
Signed-off-by: Nikolay Borisov <nborisov(a)suse.com>
---
Hello Greg,
Please merge the following backport of upstream commit debd1c065d2037919a7da67baf55cc683fee09f0
to 4.4.y stable branch.
fs/btrfs/dev-replace.c | 29 +++++++++++++++++++----------
fs/btrfs/volumes.c | 2 ++
fs/btrfs/volumes.h | 5 +++++
3 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 81e5bc62e8e3..1414a99b3ab4 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -495,18 +495,27 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
}
btrfs_wait_ordered_roots(root->fs_info, -1);
- trans = btrfs_start_transaction(root, 0);
- if (IS_ERR(trans)) {
- mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
- return PTR_ERR(trans);
+ while (1) {
+ trans = btrfs_start_transaction(root, 0);
+ if (IS_ERR(trans)) {
+ mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
+ return PTR_ERR(trans);
+ }
+ ret = btrfs_commit_transaction(trans, root);
+ WARN_ON(ret);
+ mutex_lock(&uuid_mutex);
+ /* keep away write_all_supers() during the finishing procedure */
+ mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
+ mutex_lock(&root->fs_info->chunk_mutex);
+ if (src_device->has_pending_chunks) {
+ mutex_unlock(&root->fs_info->chunk_mutex);
+ mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
+ mutex_unlock(&uuid_mutex);
+ } else {
+ break;
+ }
}
- ret = btrfs_commit_transaction(trans, root);
- WARN_ON(ret);
- mutex_lock(&uuid_mutex);
- /* keep away write_all_supers() during the finishing procedure */
- mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
- mutex_lock(&root->fs_info->chunk_mutex);
btrfs_dev_replace_lock(dev_replace);
dev_replace->replace_state =
scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d1cca19b29d3..4eb7a6ba7e47 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4760,6 +4760,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
for (i = 0; i < map->num_stripes; i++) {
num_bytes = map->stripes[i].dev->bytes_used + stripe_size;
btrfs_device_set_bytes_used(map->stripes[i].dev, num_bytes);
+ map->stripes[i].dev->has_pending_chunks = true;
}
spin_lock(&extent_root->fs_info->free_chunk_lock);
@@ -7064,6 +7065,7 @@ void btrfs_update_commit_device_bytes_used(struct btrfs_root *root,
for (i = 0; i < map->num_stripes; i++) {
dev = map->stripes[i].dev;
dev->commit_bytes_used = dev->bytes_used;
+ dev->has_pending_chunks = false;
}
}
unlock_chunks(root);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3c651df420be..7feac2d9da56 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -62,6 +62,11 @@ struct btrfs_device {
spinlock_t io_lock ____cacheline_aligned;
int running_pending;
+ /* When true means this device has pending chunk alloc in
+ * current transaction. Protected by chunk_mutex.
+ */
+ bool has_pending_chunks;
+
/* regular prio bios */
struct btrfs_pending_bios pending_bios;
/* WRITE_SYNC bios */
--
2.17.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1571c029a2ff289683ddb0a32253850363bcb8a7 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Thu, 6 Jun 2019 11:10:28 +0200
Subject: [PATCH] dax: Fix xarray entry association for mixed mappings
When inserting an entry into the xarray, we store the mapping and index in
the corresponding struct pages for memory error handling. When it happened
that one process was mapping a file at PMD granularity while another
process was mapping it at PTE granularity, we could wrongly deassociate the
PMD range and then reassociate the PTE range, leaving the rest of the
struct pages in the PMD range without mapping information, which could
later cause missed notifications about memory errors. Fix the problem by
calling the association / deassociation code if and only if we are really
going to update the xarray (deassociating and associating zero or empty
entries is just a no-op, so there's no reason to complicate the code by
trying to avoid the calls for these cases).
Cc: <stable(a)vger.kernel.org>
Fixes: d2c997c0f145 ("fs, dax: use page->mapping to warn if truncate...")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/fs/dax.c b/fs/dax.c
index f74386293632..9fd908f3df32 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -728,12 +728,11 @@ static void *dax_insert_entry(struct xa_state *xas,
xas_reset(xas);
xas_lock_irq(xas);
- if (dax_entry_size(entry) != dax_entry_size(new_entry)) {
+ if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
+ void *old;
+
dax_disassociate_entry(entry, mapping, false);
dax_associate_entry(new_entry, mapping, vmf->vma, vmf->address);
- }
-
- if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
/*
* Only swap our new entry into the page cache if the current
* entry is a zero page or an empty entry. If a normal PTE or
@@ -742,7 +741,7 @@ static void *dax_insert_entry(struct xa_state *xas,
* existing entry is a PMD, we will just leave the PMD in the
* tree and dirty it if necessary.
*/
- void *old = dax_lock_entry(xas, new_entry);
+ old = dax_lock_entry(xas, new_entry);
WARN_ON_ONCE(old != xa_mk_value(xa_to_value(entry) |
DAX_LOCKED));
entry = new_entry;
One slot is left unused in the circular FIFO to differentiate the
'full' and 'empty' cases. So take that into account when counting
the completed descriptors.
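For background, here is a short self-contained illustration of that
convention (generic C, not the driver code; the ring size and names below
are made up): a ring of N slots holds at most N - 1 entries, so that
head == tail can only mean "empty".

	#include <stdio.h>

	#define RING_SIZE 8	/* example only; must be a power of two for the mask */

	/* Same formula as the kernel's CIRC_CNT(): entries between tail and head. */
	static unsigned int circ_cnt(unsigned int head, unsigned int tail)
	{
		return (head - tail) & (RING_SIZE - 1);
	}

	int main(void)
	{
		printf("%u\n", circ_cnt(3, 3));	/* 0: head == tail means empty    */
		printf("%u\n", circ_cnt(7, 0));	/* 7: "full" is RING_SIZE - 1     */
		return 0;
	}

The hunk below makes the corresponding adjustment for the BAM descriptor
FIFO, dropping the reserved slot from the count when the hardware offset
has wrapped around behind the head (the offset < head case in the diff).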
Fixes the issue reported here,
https://lkml.org/lkml/2019/6/18/669
Cc: stable(a)vger.kernel.org
Reported-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Sricharan R <sricharan(a)codeaurora.org>
---
drivers/dma/qcom/bam_dma.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
index 4b43844..8e90a40 100644
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -799,6 +799,9 @@ static u32 process_channel_irqs(struct bam_device *bdev)
/* Number of bytes available to read */
avail = CIRC_CNT(offset, bchan->head, MAX_DESCRIPTORS + 1);
+ if (offset < bchan->head)
+ avail--;
+
list_for_each_entry_safe(async_desc, tmp,
&bchan->desc_list, desc_node) {
/* Not enough data to read */
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation