The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7703bdd8d23e6ef057af3253958a793ec6066b28 Mon Sep 17 00:00:00 2001
From: Chris Mason <clm(a)fb.com>
Date: Wed, 20 Jun 2018 07:56:11 -0700
Subject: [PATCH] Btrfs: don't clean dirty pages during buffered writes
During buffered writes, we follow this basic series of steps:
again:
lock all the pages
wait for writeback on all the pages
Take the extent range lock
wait for ordered extents on the whole range
clean all the pages
if (copy_from_user_in_atomic() hits a fault) {
drop our locks
goto again;
}
dirty all the pages
release all the locks
The extra waiting, cleaning and locking are there to make sure we don't
modify pages in flight to the drive, after they've been crc'd.
If some of the pages in the range were already dirty when the write
began, and we need to goto again, we create a window where a dirty page
has been cleaned and unlocked. It may be reclaimed before we're able to
lock it again, which means we'll read the old contents off the drive and
lose any modifications that had been pending writeback.
We don't actually need to clean the pages. All of the other locking in
place makes sure we don't start IO on the pages, so we can just leave
them dirty for the duration of the write.
Fixes: 73d59314e6ed (the original btrfs merge)
CC: stable(a)vger.kernel.org # v4.4+
Signed-off-by: Chris Mason <clm(a)fb.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d254cf94545f..15b925142793 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -531,6 +531,14 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
end_of_last_block = start_pos + num_bytes - 1;
+ /*
+ * The pages may have already been dirty, clear out old accounting so
+ * we can set things up properly
+ */
+ clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos, end_of_last_block,
+ EXTENT_DIRTY | EXTENT_DELALLOC |
+ EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0, cached);
+
if (!btrfs_is_free_space_inode(BTRFS_I(inode))) {
if (start_pos >= isize &&
!(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC)) {
@@ -1500,18 +1508,27 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
}
if (ordered)
btrfs_put_ordered_extent(ordered);
- clear_extent_bit(&inode->io_tree, start_pos, last_pos,
- EXTENT_DIRTY | EXTENT_DELALLOC |
- EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
- 0, 0, cached_state);
+
*lockstart = start_pos;
*lockend = last_pos;
ret = 1;
}
+ /*
+ * It's possible the pages are dirty right now, but we don't want
+ * to clean them yet because copy_from_user may catch a page fault
+ * and we might have to fall back to one page at a time. If that
+ * happens, we'll unlock these pages and we'd have a window where
+ * reclaim could sneak in and drop the once-dirty page on the floor
+ * without writing it.
+ *
+ * We have the pages locked and the extent range locked, so there's
+ * no way someone can start IO on any dirty pages in this range.
+ *
+ * We'll call btrfs_dirty_pages() later on, and that will flip around
+ * delalloc bits and dirty the pages as required.
+ */
for (i = 0; i < num_pages; i++) {
- if (clear_page_dirty_for_io(pages[i]))
- account_page_redirty(pages[i]);
set_page_extent_mapped(pages[i]);
WARN_ON(!PageLocked(pages[i]));
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7703bdd8d23e6ef057af3253958a793ec6066b28 Mon Sep 17 00:00:00 2001
From: Chris Mason <clm(a)fb.com>
Date: Wed, 20 Jun 2018 07:56:11 -0700
Subject: [PATCH] Btrfs: don't clean dirty pages during buffered writes
During buffered writes, we follow this basic series of steps:
again:
lock all the pages
wait for writeback on all the pages
Take the extent range lock
wait for ordered extents on the whole range
clean all the pages
if (copy_from_user_in_atomic() hits a fault) {
drop our locks
goto again;
}
dirty all the pages
release all the locks
The extra waiting, cleaning and locking are there to make sure we don't
modify pages in flight to the drive, after they've been crc'd.
If some of the pages in the range were already dirty when the write
began, and we need to goto again, we create a window where a dirty page
has been cleaned and unlocked. It may be reclaimed before we're able to
lock it again, which means we'll read the old contents off the drive and
lose any modifications that had been pending writeback.
We don't actually need to clean the pages. All of the other locking in
place makes sure we don't start IO on the pages, so we can just leave
them dirty for the duration of the write.
Fixes: 73d59314e6ed (the original btrfs merge)
CC: stable(a)vger.kernel.org # v4.4+
Signed-off-by: Chris Mason <clm(a)fb.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d254cf94545f..15b925142793 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -531,6 +531,14 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
end_of_last_block = start_pos + num_bytes - 1;
+ /*
+ * The pages may have already been dirty, clear out old accounting so
+ * we can set things up properly
+ */
+ clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos, end_of_last_block,
+ EXTENT_DIRTY | EXTENT_DELALLOC |
+ EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG, 0, 0, cached);
+
if (!btrfs_is_free_space_inode(BTRFS_I(inode))) {
if (start_pos >= isize &&
!(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC)) {
@@ -1500,18 +1508,27 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
}
if (ordered)
btrfs_put_ordered_extent(ordered);
- clear_extent_bit(&inode->io_tree, start_pos, last_pos,
- EXTENT_DIRTY | EXTENT_DELALLOC |
- EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
- 0, 0, cached_state);
+
*lockstart = start_pos;
*lockend = last_pos;
ret = 1;
}
+ /*
+ * It's possible the pages are dirty right now, but we don't want
+ * to clean them yet because copy_from_user may catch a page fault
+ * and we might have to fall back to one page at a time. If that
+ * happens, we'll unlock these pages and we'd have a window where
+ * reclaim could sneak in and drop the once-dirty page on the floor
+ * without writing it.
+ *
+ * We have the pages locked and the extent range locked, so there's
+ * no way someone can start IO on any dirty pages in this range.
+ *
+ * We'll call btrfs_dirty_pages() later on, and that will flip around
+ * delalloc bits and dirty the pages as required.
+ */
for (i = 0; i < num_pages; i++) {
- if (clear_page_dirty_for_io(pages[i]))
- account_page_redirty(pages[i]);
set_page_extent_mapped(pages[i]);
WARN_ON(!PageLocked(pages[i]));
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fee7acc361314df6561208c2d3c0882d663dd537 Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Thu, 6 Sep 2018 17:18:16 -0400
Subject: [PATCH] btrfs: keep trim from interfering with transaction commits
Commit 499f377f49f08 (btrfs: iterate over unused chunk space in FITRIM)
fixed free space trimming, but introduced latency when it was running.
This is due to it pinning the transaction using both a incremented
refcount and holding the commit root sem for the duration of a single
trim operation.
This was to ensure safety but it's unnecessary. We already hold the the
chunk mutex so we know that the chunk we're using can't be allocated
while we're trimming it.
In order to check against chunks allocated already in this transaction,
we need to check the pending chunks list. To to that safely without
joining the transaction (or attaching than then having to commit it) we
need to ensure that the dev root's commit root doesn't change underneath
us and the pending chunk lists stays around until we're done with it.
We can ensure the former by holding the commit root sem and the latter
by pinning the transaction. We do this now, but the critical section
covers the trim operation itself and we don't need to do that.
This patch moves the pinning and unpinning logic into helpers and unpins
the transaction after performing the search and check for pending
chunks.
Limiting the critical section of the transaction pinning improves the
latency substantially on slower storage (e.g. image files over NFS).
Fixes: 499f377f49f08 ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6e1c2a22874f..f0524bdf49b3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10755,14 +10755,16 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
* We don't want a transaction for this since the discard may take a
* substantial amount of time. We don't require that a transaction be
* running, but we do need to take a running transaction into account
- * to ensure that we're not discarding chunks that were released in
- * the current transaction.
+ * to ensure that we're not discarding chunks that were released or
+ * allocated in the current transaction.
*
* Holding the chunks lock will prevent other threads from allocating
* or releasing chunks, but it won't prevent a running transaction
* from committing and releasing the memory that the pending chunks
* list head uses. For that, we need to take a reference to the
- * transaction.
+ * transaction and hold the commit root sem. We only need to hold
+ * it while performing the free space search since we have already
+ * held back allocations.
*/
static int btrfs_trim_free_extents(struct btrfs_device *device,
u64 minlen, u64 *trimmed)
@@ -10793,9 +10795,13 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
ret = mutex_lock_interruptible(&fs_info->chunk_mutex);
if (ret)
- return ret;
+ break;
- down_read(&fs_info->commit_root_sem);
+ ret = down_read_killable(&fs_info->commit_root_sem);
+ if (ret) {
+ mutex_unlock(&fs_info->chunk_mutex);
+ break;
+ }
spin_lock(&fs_info->trans_lock);
trans = fs_info->running_transaction;
@@ -10803,13 +10809,17 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
+ if (!trans)
+ up_read(&fs_info->commit_root_sem);
+
ret = find_free_dev_extent_start(trans, device, minlen, start,
&start, &len);
- if (trans)
+ if (trans) {
+ up_read(&fs_info->commit_root_sem);
btrfs_put_transaction(trans);
+ }
if (ret) {
- up_read(&fs_info->commit_root_sem);
mutex_unlock(&fs_info->chunk_mutex);
if (ret == -ENOSPC)
ret = 0;
@@ -10817,7 +10827,6 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
}
ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
- up_read(&fs_info->commit_root_sem);
mutex_unlock(&fs_info->chunk_mutex);
if (ret)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fee7acc361314df6561208c2d3c0882d663dd537 Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Thu, 6 Sep 2018 17:18:16 -0400
Subject: [PATCH] btrfs: keep trim from interfering with transaction commits
Commit 499f377f49f08 (btrfs: iterate over unused chunk space in FITRIM)
fixed free space trimming, but introduced latency when it was running.
This is due to it pinning the transaction using both a incremented
refcount and holding the commit root sem for the duration of a single
trim operation.
This was to ensure safety but it's unnecessary. We already hold the the
chunk mutex so we know that the chunk we're using can't be allocated
while we're trimming it.
In order to check against chunks allocated already in this transaction,
we need to check the pending chunks list. To to that safely without
joining the transaction (or attaching than then having to commit it) we
need to ensure that the dev root's commit root doesn't change underneath
us and the pending chunk lists stays around until we're done with it.
We can ensure the former by holding the commit root sem and the latter
by pinning the transaction. We do this now, but the critical section
covers the trim operation itself and we don't need to do that.
This patch moves the pinning and unpinning logic into helpers and unpins
the transaction after performing the search and check for pending
chunks.
Limiting the critical section of the transaction pinning improves the
latency substantially on slower storage (e.g. image files over NFS).
Fixes: 499f377f49f08 ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6e1c2a22874f..f0524bdf49b3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10755,14 +10755,16 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
* We don't want a transaction for this since the discard may take a
* substantial amount of time. We don't require that a transaction be
* running, but we do need to take a running transaction into account
- * to ensure that we're not discarding chunks that were released in
- * the current transaction.
+ * to ensure that we're not discarding chunks that were released or
+ * allocated in the current transaction.
*
* Holding the chunks lock will prevent other threads from allocating
* or releasing chunks, but it won't prevent a running transaction
* from committing and releasing the memory that the pending chunks
* list head uses. For that, we need to take a reference to the
- * transaction.
+ * transaction and hold the commit root sem. We only need to hold
+ * it while performing the free space search since we have already
+ * held back allocations.
*/
static int btrfs_trim_free_extents(struct btrfs_device *device,
u64 minlen, u64 *trimmed)
@@ -10793,9 +10795,13 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
ret = mutex_lock_interruptible(&fs_info->chunk_mutex);
if (ret)
- return ret;
+ break;
- down_read(&fs_info->commit_root_sem);
+ ret = down_read_killable(&fs_info->commit_root_sem);
+ if (ret) {
+ mutex_unlock(&fs_info->chunk_mutex);
+ break;
+ }
spin_lock(&fs_info->trans_lock);
trans = fs_info->running_transaction;
@@ -10803,13 +10809,17 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
+ if (!trans)
+ up_read(&fs_info->commit_root_sem);
+
ret = find_free_dev_extent_start(trans, device, minlen, start,
&start, &len);
- if (trans)
+ if (trans) {
+ up_read(&fs_info->commit_root_sem);
btrfs_put_transaction(trans);
+ }
if (ret) {
- up_read(&fs_info->commit_root_sem);
mutex_unlock(&fs_info->chunk_mutex);
if (ret == -ENOSPC)
ret = 0;
@@ -10817,7 +10827,6 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
}
ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
- up_read(&fs_info->commit_root_sem);
mutex_unlock(&fs_info->chunk_mutex);
if (ret)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fee7acc361314df6561208c2d3c0882d663dd537 Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Thu, 6 Sep 2018 17:18:16 -0400
Subject: [PATCH] btrfs: keep trim from interfering with transaction commits
Commit 499f377f49f08 (btrfs: iterate over unused chunk space in FITRIM)
fixed free space trimming, but introduced latency when it was running.
This is due to it pinning the transaction using both a incremented
refcount and holding the commit root sem for the duration of a single
trim operation.
This was to ensure safety but it's unnecessary. We already hold the the
chunk mutex so we know that the chunk we're using can't be allocated
while we're trimming it.
In order to check against chunks allocated already in this transaction,
we need to check the pending chunks list. To to that safely without
joining the transaction (or attaching than then having to commit it) we
need to ensure that the dev root's commit root doesn't change underneath
us and the pending chunk lists stays around until we're done with it.
We can ensure the former by holding the commit root sem and the latter
by pinning the transaction. We do this now, but the critical section
covers the trim operation itself and we don't need to do that.
This patch moves the pinning and unpinning logic into helpers and unpins
the transaction after performing the search and check for pending
chunks.
Limiting the critical section of the transaction pinning improves the
latency substantially on slower storage (e.g. image files over NFS).
Fixes: 499f377f49f08 ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6e1c2a22874f..f0524bdf49b3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10755,14 +10755,16 @@ int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
* We don't want a transaction for this since the discard may take a
* substantial amount of time. We don't require that a transaction be
* running, but we do need to take a running transaction into account
- * to ensure that we're not discarding chunks that were released in
- * the current transaction.
+ * to ensure that we're not discarding chunks that were released or
+ * allocated in the current transaction.
*
* Holding the chunks lock will prevent other threads from allocating
* or releasing chunks, but it won't prevent a running transaction
* from committing and releasing the memory that the pending chunks
* list head uses. For that, we need to take a reference to the
- * transaction.
+ * transaction and hold the commit root sem. We only need to hold
+ * it while performing the free space search since we have already
+ * held back allocations.
*/
static int btrfs_trim_free_extents(struct btrfs_device *device,
u64 minlen, u64 *trimmed)
@@ -10793,9 +10795,13 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
ret = mutex_lock_interruptible(&fs_info->chunk_mutex);
if (ret)
- return ret;
+ break;
- down_read(&fs_info->commit_root_sem);
+ ret = down_read_killable(&fs_info->commit_root_sem);
+ if (ret) {
+ mutex_unlock(&fs_info->chunk_mutex);
+ break;
+ }
spin_lock(&fs_info->trans_lock);
trans = fs_info->running_transaction;
@@ -10803,13 +10809,17 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
refcount_inc(&trans->use_count);
spin_unlock(&fs_info->trans_lock);
+ if (!trans)
+ up_read(&fs_info->commit_root_sem);
+
ret = find_free_dev_extent_start(trans, device, minlen, start,
&start, &len);
- if (trans)
+ if (trans) {
+ up_read(&fs_info->commit_root_sem);
btrfs_put_transaction(trans);
+ }
if (ret) {
- up_read(&fs_info->commit_root_sem);
mutex_unlock(&fs_info->chunk_mutex);
if (ret == -ENOSPC)
ret = 0;
@@ -10817,7 +10827,6 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
}
ret = btrfs_issue_discard(device->bdev, start, len, &bytes);
- up_read(&fs_info->commit_root_sem);
mutex_unlock(&fs_info->chunk_mutex);
if (ret)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ba9fc8e628becf0e3ec94083450d089b0dec5f5 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Fri, 7 Sep 2018 14:16:24 +0800
Subject: [PATCH] btrfs: Ensure btrfs_trim_fs can trim the whole filesystem
[BUG]
fstrim on some btrfs only trims the unallocated space, not trimming any
space in existing block groups.
[CAUSE]
Before fstrim_range passed to btrfs_trim_fs(), it gets truncated to
range [0, super->total_bytes). So later btrfs_trim_fs() will only be
able to trim block groups in range [0, super->total_bytes).
While for btrfs, any bytenr aligned to sectorsize is valid, since btrfs
uses its logical address space, there is nothing limiting the location
where we put block groups.
For filesystem with frequent balance, it's quite easy to relocate all
block groups and bytenr of block groups will start beyond
super->total_bytes.
In that case, btrfs will not trim existing block groups.
[FIX]
Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs()
can get the unmodified range, which is normally set to [0, U64_MAX].
Reported-by: Chris Murphy <lists(a)colorremedies.com>
Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl")
CC: <stable(a)vger.kernel.org> # v4.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5dbb3f713125..da3257585e29 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10851,21 +10851,13 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
u64 start;
u64 end;
u64 trimmed = 0;
- u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
u64 bg_failed = 0;
u64 dev_failed = 0;
int bg_ret = 0;
int dev_ret = 0;
int ret = 0;
- /*
- * try to trim all FS space, our block group may start from non-zero.
- */
- if (range->len == total_bytes)
- cache = btrfs_lookup_first_block_group(fs_info, range->start);
- else
- cache = btrfs_lookup_block_group(fs_info, range->start);
-
+ cache = btrfs_lookup_first_block_group(fs_info, range->start);
for (; cache; cache = next_block_group(fs_info, cache)) {
if (cache->key.objectid >= (range->start + range->len)) {
btrfs_put_block_group(cache);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 4905d13dee0a..a990a9045139 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -491,7 +491,6 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
struct fstrim_range range;
u64 minlen = ULLONG_MAX;
u64 num_devices = 0;
- u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
int ret;
if (!capable(CAP_SYS_ADMIN))
@@ -515,11 +514,15 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
return -EOPNOTSUPP;
if (copy_from_user(&range, arg, sizeof(range)))
return -EFAULT;
- if (range.start > total_bytes ||
- range.len < fs_info->sb->s_blocksize)
+
+ /*
+ * NOTE: Don't truncate the range using super->total_bytes. Bytenr of
+ * block group is in the logical address space, which can be any
+ * sectorsize aligned bytenr in the range [0, U64_MAX].
+ */
+ if (range.len < fs_info->sb->s_blocksize)
return -EINVAL;
- range.len = min(range.len, total_bytes - range.start);
range.minlen = max(range.minlen, minlen);
ret = btrfs_trim_fs(fs_info, &range);
if (ret < 0)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 93bba24d4b5ad1e5cd8b43f64e66ff9d6355dd20 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Fri, 7 Sep 2018 14:16:23 +0800
Subject: [PATCH] btrfs: Enhance btrfs_trim_fs function to handle error better
Function btrfs_trim_fs() doesn't handle errors in a consistent way. If
error happens when trimming existing block groups, it will skip the
remaining blocks and continue to trim unallocated space for each device.
The return value will only reflect the final error from device trimming.
This patch will fix such behavior by:
1) Recording the last error from block group or device trimming
The return value will also reflect the last error during trimming.
Make developer more aware of the problem.
2) Continuing trimming if possible
If we failed to trim one block group or device, we could still try
the next block group or device.
3) Report number of failures during block group and device trimming
It would be less noisy, but still gives user a brief summary of
what's going wrong.
Such behavior can avoid confusion for cases like failure to trim the
first block group and then only unallocated space is trimmed.
Reported-by: Chris Murphy <lists(a)colorremedies.com>
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add bg_ret and dev_ret to the messages ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 528f3308c32b..5dbb3f713125 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10833,6 +10833,15 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
return ret;
}
+/*
+ * Trim the whole filesystem by:
+ * 1) trimming the free space in each block group
+ * 2) trimming the unallocated space on each device
+ *
+ * This will also continue trimming even if a block group or device encounters
+ * an error. The return value will be the last error, or 0 if nothing bad
+ * happens.
+ */
int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
{
struct btrfs_block_group_cache *cache = NULL;
@@ -10843,6 +10852,10 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
u64 end;
u64 trimmed = 0;
u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
+ u64 bg_failed = 0;
+ u64 dev_failed = 0;
+ int bg_ret = 0;
+ int dev_ret = 0;
int ret = 0;
/*
@@ -10853,7 +10866,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
else
cache = btrfs_lookup_block_group(fs_info, range->start);
- while (cache) {
+ for (; cache; cache = next_block_group(fs_info, cache)) {
if (cache->key.objectid >= (range->start + range->len)) {
btrfs_put_block_group(cache);
break;
@@ -10867,13 +10880,15 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
if (!block_group_cache_done(cache)) {
ret = cache_block_group(cache, 0);
if (ret) {
- btrfs_put_block_group(cache);
- break;
+ bg_failed++;
+ bg_ret = ret;
+ continue;
}
ret = wait_block_group_cache_done(cache);
if (ret) {
- btrfs_put_block_group(cache);
- break;
+ bg_failed++;
+ bg_ret = ret;
+ continue;
}
}
ret = btrfs_trim_block_group(cache,
@@ -10884,28 +10899,40 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
trimmed += group_trimmed;
if (ret) {
- btrfs_put_block_group(cache);
- break;
+ bg_failed++;
+ bg_ret = ret;
+ continue;
}
}
-
- cache = next_block_group(fs_info, cache);
}
+ if (bg_failed)
+ btrfs_warn(fs_info,
+ "failed to trim %llu block group(s), last error %d",
+ bg_failed, bg_ret);
mutex_lock(&fs_info->fs_devices->device_list_mutex);
devices = &fs_info->fs_devices->alloc_list;
list_for_each_entry(device, devices, dev_alloc_list) {
ret = btrfs_trim_free_extents(device, range->minlen,
&group_trimmed);
- if (ret)
+ if (ret) {
+ dev_failed++;
+ dev_ret = ret;
break;
+ }
trimmed += group_trimmed;
}
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+ if (dev_failed)
+ btrfs_warn(fs_info,
+ "failed to trim %llu device(s), last error %d",
+ dev_failed, dev_ret);
range->len = trimmed;
- return ret;
+ if (bg_ret)
+ return bg_ret;
+ return dev_ret;
}
/*
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 93bba24d4b5ad1e5cd8b43f64e66ff9d6355dd20 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Fri, 7 Sep 2018 14:16:23 +0800
Subject: [PATCH] btrfs: Enhance btrfs_trim_fs function to handle error better
Function btrfs_trim_fs() doesn't handle errors in a consistent way. If
error happens when trimming existing block groups, it will skip the
remaining blocks and continue to trim unallocated space for each device.
The return value will only reflect the final error from device trimming.
This patch will fix such behavior by:
1) Recording the last error from block group or device trimming
The return value will also reflect the last error during trimming.
Make developer more aware of the problem.
2) Continuing trimming if possible
If we failed to trim one block group or device, we could still try
the next block group or device.
3) Report number of failures during block group and device trimming
It would be less noisy, but still gives user a brief summary of
what's going wrong.
Such behavior can avoid confusion for cases like failure to trim the
first block group and then only unallocated space is trimmed.
Reported-by: Chris Murphy <lists(a)colorremedies.com>
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ add bg_ret and dev_ret to the messages ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 528f3308c32b..5dbb3f713125 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10833,6 +10833,15 @@ static int btrfs_trim_free_extents(struct btrfs_device *device,
return ret;
}
+/*
+ * Trim the whole filesystem by:
+ * 1) trimming the free space in each block group
+ * 2) trimming the unallocated space on each device
+ *
+ * This will also continue trimming even if a block group or device encounters
+ * an error. The return value will be the last error, or 0 if nothing bad
+ * happens.
+ */
int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
{
struct btrfs_block_group_cache *cache = NULL;
@@ -10843,6 +10852,10 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
u64 end;
u64 trimmed = 0;
u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
+ u64 bg_failed = 0;
+ u64 dev_failed = 0;
+ int bg_ret = 0;
+ int dev_ret = 0;
int ret = 0;
/*
@@ -10853,7 +10866,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
else
cache = btrfs_lookup_block_group(fs_info, range->start);
- while (cache) {
+ for (; cache; cache = next_block_group(fs_info, cache)) {
if (cache->key.objectid >= (range->start + range->len)) {
btrfs_put_block_group(cache);
break;
@@ -10867,13 +10880,15 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
if (!block_group_cache_done(cache)) {
ret = cache_block_group(cache, 0);
if (ret) {
- btrfs_put_block_group(cache);
- break;
+ bg_failed++;
+ bg_ret = ret;
+ continue;
}
ret = wait_block_group_cache_done(cache);
if (ret) {
- btrfs_put_block_group(cache);
- break;
+ bg_failed++;
+ bg_ret = ret;
+ continue;
}
}
ret = btrfs_trim_block_group(cache,
@@ -10884,28 +10899,40 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range)
trimmed += group_trimmed;
if (ret) {
- btrfs_put_block_group(cache);
- break;
+ bg_failed++;
+ bg_ret = ret;
+ continue;
}
}
-
- cache = next_block_group(fs_info, cache);
}
+ if (bg_failed)
+ btrfs_warn(fs_info,
+ "failed to trim %llu block group(s), last error %d",
+ bg_failed, bg_ret);
mutex_lock(&fs_info->fs_devices->device_list_mutex);
devices = &fs_info->fs_devices->alloc_list;
list_for_each_entry(device, devices, dev_alloc_list) {
ret = btrfs_trim_free_extents(device, range->minlen,
&group_trimmed);
- if (ret)
+ if (ret) {
+ dev_failed++;
+ dev_ret = ret;
break;
+ }
trimmed += group_trimmed;
}
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+ if (dev_failed)
+ btrfs_warn(fs_info,
+ "failed to trim %llu device(s), last error %d",
+ dev_failed, dev_ret);
range->len = trimmed;
- return ret;
+ if (bg_ret)
+ return bg_ret;
+ return dev_ret;
}
/*
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5c06147128fbbdf7a84232c5f0d808f53153defe Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Thu, 6 Sep 2018 15:52:17 -0400
Subject: [PATCH] btrfs: fix error handling in btrfs_dev_replace_start
When we fail to start a transaction in btrfs_dev_replace_start, we leave
dev_replace->replace_start set to STARTED but clear ->srcdev and
->tgtdev. Later, that can result in an Oops in
btrfs_dev_replace_progress when having state set to STARTED or SUSPENDED
implies that ->srcdev is valid.
Also fix error handling when the state is already STARTED or SUSPENDED
while starting. That, too, will clear ->srcdev and ->tgtdev even though
it doesn't own them. This should be an impossible case to hit since we
should be protected by the BTRFS_FS_EXCL_OP bit being set. Let's add an
ASSERT there while we're at it.
Fixes: e93c89c1aaaaa (Btrfs: add new sources for device replace code)
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 4e2b67d06305..ff01740158aa 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -440,6 +440,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
break;
case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
+ ASSERT(0);
ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_ALREADY_STARTED;
goto leave;
}
@@ -482,6 +483,10 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
btrfs_dev_replace_write_lock(dev_replace);
+ dev_replace->replace_state =
+ BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED;
+ dev_replace->srcdev = NULL;
+ dev_replace->tgtdev = NULL;
goto leave;
}
@@ -503,8 +508,6 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
return ret;
leave:
- dev_replace->srcdev = NULL;
- dev_replace->tgtdev = NULL;
btrfs_dev_replace_write_unlock(dev_replace);
btrfs_destroy_dev_replace_tgtdev(tgt_device);
return ret;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5c06147128fbbdf7a84232c5f0d808f53153defe Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Thu, 6 Sep 2018 15:52:17 -0400
Subject: [PATCH] btrfs: fix error handling in btrfs_dev_replace_start
When we fail to start a transaction in btrfs_dev_replace_start, we leave
dev_replace->replace_start set to STARTED but clear ->srcdev and
->tgtdev. Later, that can result in an Oops in
btrfs_dev_replace_progress when having state set to STARTED or SUSPENDED
implies that ->srcdev is valid.
Also fix error handling when the state is already STARTED or SUSPENDED
while starting. That, too, will clear ->srcdev and ->tgtdev even though
it doesn't own them. This should be an impossible case to hit since we
should be protected by the BTRFS_FS_EXCL_OP bit being set. Let's add an
ASSERT there while we're at it.
Fixes: e93c89c1aaaaa (Btrfs: add new sources for device replace code)
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 4e2b67d06305..ff01740158aa 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -440,6 +440,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
break;
case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
+ ASSERT(0);
ret = BTRFS_IOCTL_DEV_REPLACE_RESULT_ALREADY_STARTED;
goto leave;
}
@@ -482,6 +483,10 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
btrfs_dev_replace_write_lock(dev_replace);
+ dev_replace->replace_state =
+ BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED;
+ dev_replace->srcdev = NULL;
+ dev_replace->tgtdev = NULL;
goto leave;
}
@@ -503,8 +508,6 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
return ret;
leave:
- dev_replace->srcdev = NULL;
- dev_replace->tgtdev = NULL;
btrfs_dev_replace_write_unlock(dev_replace);
btrfs_destroy_dev_replace_tgtdev(tgt_device);
return ret;
From: Dennis Krein <Dennis.Krein(a)netapp.com>
The srcu_gp_start() function is called with the srcu_struct structure's
->lock held, but not with the srcu_data structure's ->lock. This is
problematic because this function accesses and updates the srcu_data
structure's ->srcu_cblist, which is protected by that lock. Failing to
hold this lock can result in corruption of the SRCU callback lists,
which in turn can result in arbitrarily bad results.
This commit therefore makes srcu_gp_start() acquire the srcu_data
structure's ->lock across the calls to rcu_segcblist_advance() and
rcu_segcblist_accelerate(), thus preventing this corruption.
Reported-by: Bart Van Assche <bvanassche(a)acm.org>
Reported-by: Christoph Hellwig <hch(a)infradead.org>
Reported-by: Sebastian Kuzminsky <seb.kuzminsky(a)gmail.com>
Signed-off-by: Dennis Krein <Dennis.Krein(a)netapp.com>
Signed-off-by: Paul E. McKenney <paulmck(a)linux.ibm.com>
Tested-by: Dennis Krein <Dennis.Krein(a)netapp.com>
Cc: <stable(a)vger.kernel.org> # 4.12.x
---
kernel/rcu/srcutree.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 60f3236beaf7..697a2d7e8e8a 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -451,10 +451,12 @@ static void srcu_gp_start(struct srcu_struct *sp)
lockdep_assert_held(&ACCESS_PRIVATE(sp, lock));
WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed));
+ spin_lock_rcu_node(sdp); /* Interrupts already disabled. */
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&sp->srcu_gp_seq));
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
rcu_seq_snap(&sp->srcu_gp_seq));
+ spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
rcu_seq_start(&sp->srcu_gp_seq);
state = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));
--
2.17.1
On Sat, Nov 10, 2018 at 10:49:46AM -0800, gregkh(a)linuxfoundation.org wrote:
It shouldn't be added to 4.14, 4.18, 4.19, 4.9
>
>This is a note to let you know that I've just added the patch titled
>
> net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
>
>to the 4.18-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
>The filename of the patch is:
> net-ethernet-ti-cpsw-unsync-mcast-entries-while-switch-promisc-mode.patch
>and it can be found in the queue-4.18 subdirectory.
>
>If you, or anyone else, feels it should not be added to the stable tree,
>please let <stable(a)vger.kernel.org> know about it.
>
>
>From foo@baz Sat Nov 10 10:48:43 PST 2018
>From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
>Date: Mon, 22 Oct 2018 21:51:36 +0300
>Subject: net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
>
>From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
>
>[ Upstream commit 9737cc99dd14b5b8b9d267618a6061feade8ea68 ]
>
>After flushing all mcast entries from the table, the ones contained in
>mc list of ndev are not restored when promisc mode is toggled off,
>because they are considered as synched with ALE, thus, in order to
>restore them after promisc mode - reset syncing info. This fix
>touches only switch mode devices, including single port boards
>like Beagle Bone.
>
>Fixes: commit 5da1948969bc
>("net: ethernet: ti: cpsw: fix lost of mcast packets while rx_mode update")
>
>Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
>Reviewed-by: Grygorii Strashko <grygorii.strashko(a)ti.com>
>Signed-off-by: David S. Miller <davem(a)davemloft.net>
>Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>---
> drivers/net/ethernet/ti/cpsw.c | 1 +
> 1 file changed, 1 insertion(+)
>
>--- a/drivers/net/ethernet/ti/cpsw.c
>+++ b/drivers/net/ethernet/ti/cpsw.c
>@@ -641,6 +641,7 @@ static void cpsw_set_promiscious(struct
>
> /* Clear all mcast from ALE */
> cpsw_ale_flush_multicast(ale, ALE_ALL_PORTS, -1);
>+ __dev_mc_unsync(ndev, NULL);
>
> /* Flood All Unicast Packets to Host port */
> cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
>
>
>Patches currently in stable-queue which might be from ivan.khoronzhuk(a)linaro.org are
>
>queue-4.18/net-ethernet-ti-cpsw-unsync-mcast-entries-while-switch-promisc-mode.patch
--
Regards,
Ivan Khoronzhuk
On Sat, Nov 10, 2018 at 11:33:10AM -0800, gregkh(a)linuxfoundation.org wrote:
It shouldn't be added to 4.9
>
>This is a note to let you know that I've just added the patch titled
>
> net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
>
>to the 4.9-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
>The filename of the patch is:
> net-ethernet-ti-cpsw-unsync-mcast-entries-while-switch-promisc-mode.patch
>and it can be found in the queue-4.9 subdirectory.
>
>If you, or anyone else, feels it should not be added to the stable tree,
>please let <stable(a)vger.kernel.org> know about it.
>
>
>From foo@baz Sat Nov 10 11:24:34 PST 2018
>From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
>Date: Mon, 22 Oct 2018 21:51:36 +0300
>Subject: net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
>
>From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
>
>[ Upstream commit 9737cc99dd14b5b8b9d267618a6061feade8ea68 ]
>
>After flushing all mcast entries from the table, the ones contained in
>mc list of ndev are not restored when promisc mode is toggled off,
>because they are considered as synched with ALE, thus, in order to
>restore them after promisc mode - reset syncing info. This fix
>touches only switch mode devices, including single port boards
>like Beagle Bone.
>
>Fixes: commit 5da1948969bc
>("net: ethernet: ti: cpsw: fix lost of mcast packets while rx_mode update")
>
>Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
>Reviewed-by: Grygorii Strashko <grygorii.strashko(a)ti.com>
>Signed-off-by: David S. Miller <davem(a)davemloft.net>
>Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>---
> drivers/net/ethernet/ti/cpsw.c | 1 +
> 1 file changed, 1 insertion(+)
>
>--- a/drivers/net/ethernet/ti/cpsw.c
>+++ b/drivers/net/ethernet/ti/cpsw.c
>@@ -590,6 +590,7 @@ static void cpsw_set_promiscious(struct
>
> /* Clear all mcast from ALE */
> cpsw_ale_flush_multicast(ale, ALE_ALL_PORTS, -1);
>+ __dev_mc_unsync(ndev, NULL);
>
> /* Flood All Unicast Packets to Host port */
> cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
>
>
>Patches currently in stable-queue which might be from ivan.khoronzhuk(a)linaro.org are
>
>queue-4.9/net-ethernet-ti-cpsw-unsync-mcast-entries-while-switch-promisc-mode.patch
--
Regards,
Ivan Khoronzhuk
On Sat, Nov 10, 2018 at 11:21:06AM -0800, gregkh(a)linuxfoundation.org wrote:
It shouldn't be added to 4.14
>
>This is a note to let you know that I've just added the patch titled
>
> net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
>
>to the 4.14-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
>The filename of the patch is:
> net-ethernet-ti-cpsw-unsync-mcast-entries-while-switch-promisc-mode.patch
>and it can be found in the queue-4.14 subdirectory.
>
>If you, or anyone else, feels it should not be added to the stable tree,
>please let <stable(a)vger.kernel.org> know about it.
>
>
>From foo@baz Sat Nov 10 11:17:18 PST 2018
>From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
>Date: Mon, 22 Oct 2018 21:51:36 +0300
>Subject: net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
>
>From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
>
>[ Upstream commit 9737cc99dd14b5b8b9d267618a6061feade8ea68 ]
>
>After flushing all mcast entries from the table, the ones contained in
>mc list of ndev are not restored when promisc mode is toggled off,
>because they are considered as synched with ALE, thus, in order to
>restore them after promisc mode - reset syncing info. This fix
>touches only switch mode devices, including single port boards
>like Beagle Bone.
>
>Fixes: commit 5da1948969bc
>("net: ethernet: ti: cpsw: fix lost of mcast packets while rx_mode update")
>
>Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
>Reviewed-by: Grygorii Strashko <grygorii.strashko(a)ti.com>
>Signed-off-by: David S. Miller <davem(a)davemloft.net>
>Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>---
> drivers/net/ethernet/ti/cpsw.c | 1 +
> 1 file changed, 1 insertion(+)
>
>--- a/drivers/net/ethernet/ti/cpsw.c
>+++ b/drivers/net/ethernet/ti/cpsw.c
>@@ -601,6 +601,7 @@ static void cpsw_set_promiscious(struct
>
> /* Clear all mcast from ALE */
> cpsw_ale_flush_multicast(ale, ALE_ALL_PORTS, -1);
>+ __dev_mc_unsync(ndev, NULL);
>
> /* Flood All Unicast Packets to Host port */
> cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
>
>
>Patches currently in stable-queue which might be from ivan.khoronzhuk(a)linaro.org are
>
>queue-4.14/net-ethernet-ti-cpsw-unsync-mcast-entries-while-switch-promisc-mode.patch
--
Regards,
Ivan Khoronzhuk
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fd5ba6ee3187617287fb9cb187e3d6b3631210a3 Mon Sep 17 00:00:00 2001
From: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Date: Fri, 2 Nov 2018 21:10:48 +0200
Subject: [PATCH] arm64: dts: stratix10: fix multicast filtering
On Stratix 10, the EMAC has 256 hash buckets for multicast filtering. This
needs to be specified in DTS, otherwise the stmmac driver defaults to 64
buckets and initializes the filter incorrectly. As a result, e.g. valid
IPv6 multicast traffic ends up being dropped.
Fixes: 78cd6a9d8e15 ("arm64: dts: Add base stratix 10 dtsi")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Signed-off-by: Dinh Nguyen <dinguyen(a)kernel.org>
diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
index 8253a1a9e985..fef7351e9f67 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -139,6 +139,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -154,6 +155,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -169,6 +171,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fd5ba6ee3187617287fb9cb187e3d6b3631210a3 Mon Sep 17 00:00:00 2001
From: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Date: Fri, 2 Nov 2018 21:10:48 +0200
Subject: [PATCH] arm64: dts: stratix10: fix multicast filtering
On Stratix 10, the EMAC has 256 hash buckets for multicast filtering. This
needs to be specified in DTS, otherwise the stmmac driver defaults to 64
buckets and initializes the filter incorrectly. As a result, e.g. valid
IPv6 multicast traffic ends up being dropped.
Fixes: 78cd6a9d8e15 ("arm64: dts: Add base stratix 10 dtsi")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Signed-off-by: Dinh Nguyen <dinguyen(a)kernel.org>
diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
index 8253a1a9e985..fef7351e9f67 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -139,6 +139,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -154,6 +155,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -169,6 +171,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fd5ba6ee3187617287fb9cb187e3d6b3631210a3 Mon Sep 17 00:00:00 2001
From: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Date: Fri, 2 Nov 2018 21:10:48 +0200
Subject: [PATCH] arm64: dts: stratix10: fix multicast filtering
On Stratix 10, the EMAC has 256 hash buckets for multicast filtering. This
needs to be specified in DTS, otherwise the stmmac driver defaults to 64
buckets and initializes the filter incorrectly. As a result, e.g. valid
IPv6 multicast traffic ends up being dropped.
Fixes: 78cd6a9d8e15 ("arm64: dts: Add base stratix 10 dtsi")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Signed-off-by: Dinh Nguyen <dinguyen(a)kernel.org>
diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
index 8253a1a9e985..fef7351e9f67 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -139,6 +139,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -154,6 +155,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -169,6 +171,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
The patch below does not apply to the 4.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fd5ba6ee3187617287fb9cb187e3d6b3631210a3 Mon Sep 17 00:00:00 2001
From: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Date: Fri, 2 Nov 2018 21:10:48 +0200
Subject: [PATCH] arm64: dts: stratix10: fix multicast filtering
On Stratix 10, the EMAC has 256 hash buckets for multicast filtering. This
needs to be specified in DTS, otherwise the stmmac driver defaults to 64
buckets and initializes the filter incorrectly. As a result, e.g. valid
IPv6 multicast traffic ends up being dropped.
Fixes: 78cd6a9d8e15 ("arm64: dts: Add base stratix 10 dtsi")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Signed-off-by: Dinh Nguyen <dinguyen(a)kernel.org>
diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
index 8253a1a9e985..fef7351e9f67 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -139,6 +139,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -154,6 +155,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -169,6 +171,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fd5ba6ee3187617287fb9cb187e3d6b3631210a3 Mon Sep 17 00:00:00 2001
From: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Date: Fri, 2 Nov 2018 21:10:48 +0200
Subject: [PATCH] arm64: dts: stratix10: fix multicast filtering
On Stratix 10, the EMAC has 256 hash buckets for multicast filtering. This
needs to be specified in DTS, otherwise the stmmac driver defaults to 64
buckets and initializes the filter incorrectly. As a result, e.g. valid
IPv6 multicast traffic ends up being dropped.
Fixes: 78cd6a9d8e15 ("arm64: dts: Add base stratix 10 dtsi")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aaro Koskinen <aaro.koskinen(a)nokia.com>
Signed-off-by: Dinh Nguyen <dinguyen(a)kernel.org>
diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
index 8253a1a9e985..fef7351e9f67 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -139,6 +139,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -154,6 +155,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
@@ -169,6 +171,7 @@
clock-names = "stmmaceth";
tx-fifo-depth = <16384>;
rx-fifo-depth = <16384>;
+ snps,multicast-filter-bins = <256>;
status = "disabled";
};
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ce3bf934f919a7d675c5b7fa4cc233ded9c6256e Mon Sep 17 00:00:00 2001
From: Thor Thayer <thor.thayer(a)linux.intel.com>
Date: Tue, 25 Sep 2018 10:21:10 -0500
Subject: [PATCH] ARM: dts: socfpga: Fix SDRAM node address for Arria10
The address in the SDRAM node was incorrect. Fix this to agree with the
correct address and to match the reg definition block.
Cc: stable(a)vger.kernel.org
Fixes: 54b4a8f57848b("arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support")
Signed-off-by: Thor Thayer <thor.thayer(a)linux.intel.com>
Signed-off-by: Dinh Nguyen <dinguyen(a)kernel.org>
diff --git a/arch/arm/boot/dts/socfpga_arria10.dtsi b/arch/arm/boot/dts/socfpga_arria10.dtsi
index 4e0c26423d84..59ef13e37536 100644
--- a/arch/arm/boot/dts/socfpga_arria10.dtsi
+++ b/arch/arm/boot/dts/socfpga_arria10.dtsi
@@ -628,7 +628,7 @@
status = "disabled";
};
- sdr: sdr@ffc25000 {
+ sdr: sdr@ffcfb100 {
compatible = "altr,sdr-ctl", "syscon";
reg = <0xffcfb100 0x80>;
};
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ce3bf934f919a7d675c5b7fa4cc233ded9c6256e Mon Sep 17 00:00:00 2001
From: Thor Thayer <thor.thayer(a)linux.intel.com>
Date: Tue, 25 Sep 2018 10:21:10 -0500
Subject: [PATCH] ARM: dts: socfpga: Fix SDRAM node address for Arria10
The address in the SDRAM node was incorrect. Fix this to agree with the
correct address and to match the reg definition block.
Cc: stable(a)vger.kernel.org
Fixes: 54b4a8f57848b("arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support")
Signed-off-by: Thor Thayer <thor.thayer(a)linux.intel.com>
Signed-off-by: Dinh Nguyen <dinguyen(a)kernel.org>
diff --git a/arch/arm/boot/dts/socfpga_arria10.dtsi b/arch/arm/boot/dts/socfpga_arria10.dtsi
index 4e0c26423d84..59ef13e37536 100644
--- a/arch/arm/boot/dts/socfpga_arria10.dtsi
+++ b/arch/arm/boot/dts/socfpga_arria10.dtsi
@@ -628,7 +628,7 @@
status = "disabled";
};
- sdr: sdr@ffc25000 {
+ sdr: sdr@ffcfb100 {
compatible = "altr,sdr-ctl", "syscon";
reg = <0xffcfb100 0x80>;
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 928002a5e9dab2ddc1a0fe3e00739e89be30dc6b Mon Sep 17 00:00:00 2001
From: Arun Kumar Neelakantam <aneela(a)codeaurora.org>
Date: Wed, 3 Oct 2018 17:08:20 +0530
Subject: [PATCH] rpmsg: glink: smem: Support rx peak for size less than 4
bytes
The current rx peak function fails to read the data if size is
less than 4bytes.
Use memcpy_fromio to support data reads of size less than 4 bytes.
Cc: stable(a)vger.kernel.org
Fixes: f0beb4ba9b18 ("rpmsg: glink: Remove chunk size word align warning")
Signed-off-by: Arun Kumar Neelakantam <aneela(a)codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
diff --git a/drivers/rpmsg/qcom_glink_smem.c b/drivers/rpmsg/qcom_glink_smem.c
index ab23da3d7131..64a5ce324c7f 100644
--- a/drivers/rpmsg/qcom_glink_smem.c
+++ b/drivers/rpmsg/qcom_glink_smem.c
@@ -89,15 +89,11 @@ static void glink_smem_rx_peak(struct qcom_glink_pipe *np,
tail -= pipe->native.length;
len = min_t(size_t, count, pipe->native.length - tail);
- if (len) {
- __ioread32_copy(data, pipe->fifo + tail,
- len / sizeof(u32));
- }
+ if (len)
+ memcpy_fromio(data, pipe->fifo + tail, len);
- if (len != count) {
- __ioread32_copy(data + len, pipe->fifo,
- (count - len) / sizeof(u32));
- }
+ if (len != count)
+ memcpy_fromio(data + len, pipe->fifo, (count - len));
}
static void glink_smem_rx_advance(struct qcom_glink_pipe *np,
This is a note to let you know that I've just added the patch titled
uio: Fix an Oops on load
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 432798195bbce1f8cd33d1c0284d0538835e25fb Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter(a)oracle.com>
Date: Fri, 26 Oct 2018 10:19:51 +0300
Subject: uio: Fix an Oops on load
I was trying to solve a double free but I introduced a more serious
NULL dereference bug. The problem is that if there is an IRQ which
triggers immediately, then we need "info->uio_dev" but it's not set yet.
This patch puts the original initialization back to how it was and just
sets info->uio_dev to NULL on the error path so it should solve both
the Oops and the double free.
Fixes: f019f07ecf6a ("uio: potential double frees if __uio_register_device() fails")
Reported-by: Mathias Thore <Mathias.Thore(a)infinera.com>
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: stable <stable(a)vger.kernel.org>
Tested-by: Mathias Thore <Mathias.Thore(a)infinera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/uio/uio.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 85644669fbe7..0a357db4b31b 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -961,6 +961,8 @@ int __uio_register_device(struct module *owner,
if (ret)
goto err_uio_dev_add_attributes;
+ info->uio_dev = idev;
+
if (info->irq && (info->irq != UIO_IRQ_CUSTOM)) {
/*
* Note that we deliberately don't use devm_request_irq
@@ -972,11 +974,12 @@ int __uio_register_device(struct module *owner,
*/
ret = request_irq(info->irq, uio_interrupt,
info->irq_flags, info->name, idev);
- if (ret)
+ if (ret) {
+ info->uio_dev = NULL;
goto err_request_irq;
+ }
}
- info->uio_dev = idev;
return 0;
err_request_irq:
--
2.19.1