> > > On Fri, Jul 24, 2020 at 09:15:40AM +0900, Namjae Jeon wrote:
> > > >Hi,
> > > >
> > > >Could you please pick up exfat stable patches ?
> > >
> > Hi Sasha,
> > > I see that the upstream commits already have stable tags and don't need modifications for backporting,
> > > so there is no need to explicitly send them here - they will be picked up automatically.
> > Okay, I see:)
>
Hi Greg,
Sorry for the late reply; I only just noticed this mail.
> As Sasha said, these will normally get picked up automatically. We wait
> until a patch is in a release from Linus (i.e. a -rc) before taking
> them, but we can take them earlier if you ask for them.
Okay, I would like you to apply them after 5.8-rc7 is released, if these
patches can be applied to 5.7. I was just worried that the 5.7 stable
series would be closed without these patches being applied :)
>
> You did include one patch in this series that was not marked for stable,
> so I've taken that, and the other 3 now to make it easy.
Ah, okay, I will check for the stable tag in patches next time.
Thanks for your mail!
>
> thanks,
>
> greg k-h
On 2020/7/26 8:41 PM, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> btrfs: qgroup: fix data leak caused by race between writeback and truncate
>
> to the 4.14-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch
> and it can be found in the queue-4.14 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please don't merge this patch into any of the stable branches.
This patch needs one unmerged patch ("btrfs: change timing for qgroup
reserved space for ordered extents to fix reserved space leak", already
in the maintainer's tree) as a prerequisite.
The behavior without that patch could be problematic.
I should have noticed this earlier.
Thanks,
Qu
>
>
> From fa91e4aa1716004ea8096d5185ec0451e206aea0 Mon Sep 17 00:00:00 2001
> From: Qu Wenruo <wqu(a)suse.com>
> Date: Fri, 17 Jul 2020 15:12:05 +0800
> Subject: btrfs: qgroup: fix data leak caused by race between writeback and truncate
>
> From: Qu Wenruo <wqu(a)suse.com>
>
> commit fa91e4aa1716004ea8096d5185ec0451e206aea0 upstream.
>
> [BUG]
> When running tests like generic/013 on a test device with btrfs quota
> enabled, it usually leads to a data space leak, detected at unmount time:
>
> BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096
> ------------[ cut here ]------------
> WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
> RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
> Call Trace:
> btrfs_put_super+0x15/0x17 [btrfs]
> generic_shutdown_super+0x72/0x110
> kill_anon_super+0x18/0x30
> btrfs_kill_super+0x17/0x30 [btrfs]
> deactivate_locked_super+0x3b/0xa0
> deactivate_super+0x40/0x50
> cleanup_mnt+0x135/0x190
> __cleanup_mnt+0x12/0x20
> task_work_run+0x64/0xb0
> __prepare_exit_to_usermode+0x1bc/0x1c0
> __syscall_return_slowpath+0x47/0x230
> do_syscall_64+0x64/0xb0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> ---[ end trace caf08beafeca2392 ]---
> BTRFS error (device dm-3): qgroup reserved space leaked
>
> [CAUSE]
> In the failing case, the relevant operations are:
> 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
> 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
>
> The following sequence of events could happen after the writev():
> CPU1 (writeback) | CPU2 (truncate)
> -----------------------------------------------------------------
> btrfs_writepages() |
> |- extent_write_cache_pages() |
> |- Got page for 1003520 |
> | 1003520 is Dirty, no writeback |
> | So (!clear_page_dirty_for_io()) |
> | gets called for it |
> |- Now page 1003520 is Clean. |
> | | btrfs_setattr()
> | | |- btrfs_setsize()
> | | |- truncate_setsize()
> | | New i_size is 18388
> |- __extent_writepage() |
> | |- page_offset() > i_size |
> |- btrfs_invalidatepage() |
> |- Page is clean, so no qgroup |
> callback executed
>
> This means the qgroup reserved data space is not released in
> btrfs_invalidatepage(), because the page is clean.
>
> [FIX]
> Instead of checking the dirty bit of a page, call
> btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
>
> Since qgroup reservations are bound entirely to the QGROUP_RESERVED bit
> of the io_tree rather than to the page status, this cannot cause a
> double free.
>
> Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero")
> CC: stable(a)vger.kernel.org # 4.14+
> Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
> Signed-off-by: Qu Wenruo <wqu(a)suse.com>
> Signed-off-by: David Sterba <dsterba(a)suse.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>
> ---
> fs/btrfs/inode.c | 23 ++++++++++-------------
> 1 file changed, 10 insertions(+), 13 deletions(-)
>
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9197,20 +9197,17 @@ again:
> /*
> * Qgroup reserved space handler
> * Page here will be either
> - * 1) Already written to disk
> - * In this case, its reserved space is released from data rsv map
> - * and will be freed by delayed_ref handler finally.
> - * So even we call qgroup_free_data(), it won't decrease reserved
> - * space.
> - * 2) Not written to disk
> - * This means the reserved space should be freed here. However,
> - * if a truncate invalidates the page (by clearing PageDirty)
> - * and the page is accounted for while allocating extent
> - * in btrfs_check_data_free_space() we let delayed_ref to
> - * free the entire extent.
> + * 1) Already written to disk or ordered extent already submitted
> + * Then its QGROUP_RESERVED bit in io_tree is already cleaned.
> + * Qgroup will be handled by its qgroup_record then.
> + * btrfs_qgroup_free_data() call will do nothing here.
> + *
> + * 2) Not written to disk yet
> + * Then btrfs_qgroup_free_data() call will clear the QGROUP_RESERVED
> + * bit of its io_tree, and free the qgroup reserved data space.
> + * Since the IO will never happen for this page.
> */
> - if (PageDirty(page))
> - btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
> + btrfs_qgroup_free_data(inode, NULL, page_start, PAGE_SIZE);
> if (!inode_evicting) {
> clear_extent_bit(tree, page_start, page_end,
> EXTENT_LOCKED | EXTENT_DIRTY |
>
>
> Patches currently in stable-queue which might be from wqu(a)suse.com are
>
> queue-4.14/btrfs-qgroup-fix-data-leak-caused-by-race-between-writeback-and-truncate.patch
>
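The race in the quoted [CAUSE] and [FIX] sections can be illustrated with a small user-space model. This is a sketch under stated assumptions, not btrfs code: every name (struct page_model, model_invalidate_old(), model_invalidate_new(), and so on) is hypothetical, one page stands in for the io_tree range, and a single boolean stands in for the QGROUP_RESERVED bit. It shows why gating the free on PageDirty leaks the reservation once writeback has already cleared the dirty bit, and why an unconditional free that only releases what is still marked reserved is safe to call more than once.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096L

/* Hypothetical model, not btrfs code: one page plus a flag standing in
 * for the QGROUP_RESERVED bit kept in the inode's io_tree. */
struct page_model {
	bool dirty;
	bool qgroup_reserved;
};

static long qgroup_rsv; /* bytes currently reserved for the qgroup */

/* Buffered write: the page becomes dirty and its space is reserved. */
void model_reserve(struct page_model *p)
{
	p->dirty = true;
	if (!p->qgroup_reserved) {
		p->qgroup_reserved = true;
		qgroup_rsv += MODEL_PAGE_SIZE;
	}
}

/* Plays the role of btrfs_qgroup_free_data() in this model: it frees
 * only what is still marked reserved, so repeated calls are harmless. */
void model_qgroup_free_data(struct page_model *p)
{
	if (p->qgroup_reserved) {
		p->qgroup_reserved = false;
		qgroup_rsv -= MODEL_PAGE_SIZE;
	}
	assert(qgroup_rsv >= 0); /* never goes below zero */
}

/* Pre-patch handler: frees only if the page is still dirty. */
void model_invalidate_old(struct page_model *p)
{
	if (p->dirty)
		model_qgroup_free_data(p);
	p->dirty = false;
}

/* Post-patch handler: frees unconditionally. */
void model_invalidate_new(struct page_model *p)
{
	model_qgroup_free_data(p);
	p->dirty = false;
}

/* Replays the race from the diagram; returns the bytes left reserved. */
long model_replay_race(void (*invalidate)(struct page_model *))
{
	struct page_model p = { false, false };

	qgroup_rsv = 0;
	model_reserve(&p);
	p.dirty = false; /* writeback: clear_page_dirty_for_io() */
	/* truncate moved i_size below the page, so it is invalidated
	 * without ever being written back */
	invalidate(&p);
	return qgroup_rsv;
}
```

With the old handler the replay leaves 4096 bytes reserved; with the new one it leaves nothing, and a second call to model_invalidate_new() on the same page changes nothing.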
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 48cfa61b58a1fee0bc49eef04f8ccf31493b7cdd Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Thu, 16 Jul 2020 13:29:46 -0700
Subject: [PATCH] btrfs: fix mount failure caused by race with umount
It is possible to cause a btrfs mount to fail by racing it with a slow
umount. The crux of the sequence is generic_shutdown_super not yet
calling sop->put_super before btrfs_mount_root calls btrfs_open_devices.
If that occurs, btrfs_open_devices will decide the opened counter is
non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to
0. From here, mount will call sget which will result in grab_super
trying to take the super block umount semaphore. That semaphore will be
held by the slow umount, so mount will block. Before releasing the
semaphore, umount will delete the super block, so mount's sget reliably
allocates a new one. The mount path then dutifully fills it out and
increments total_rw_bytes a second time, and the mount fails because it
sees double the expected bytes.
Here is the sequence laid out in greater detail:

CPU0                                    CPU1
down_write sb->s_umount
btrfs_kill_super
  kill_anon_super(sb)
    generic_shutdown_super(sb);
      shrink_dcache_for_umount(sb);
      sync_filesystem(sb);
      evict_inodes(sb); // SLOW

                                        btrfs_mount_root
                                          btrfs_scan_one_device
                                          fs_devices = device->fs_devices
                                          fs_info->fs_devices = fs_devices
                                          // fs_devices->opened makes this a no-op
                                          btrfs_open_devices(fs_devices, mode, fs_type)
                                          s = sget(fs_type, test, set, flags, fs_info);
                                            find sb in s_instances
                                            grab_super(sb);
                                              down_write(&s->s_umount); // blocks

      sop->put_super(sb)
        // sb->fs_devices->opened == 2; no-op
      spin_lock(&sb_lock);
      hlist_del_init(&sb->s_instances);
      spin_unlock(&sb_lock);
      up_write(&sb->s_umount);
                                              return 0;
                                            retry lookup
                                            don't find sb in s_instances (deleted by CPU0)
                                            s = alloc_super
                                            return s;
                                          btrfs_fill_super(s, fs_devices, data)
                                            open_ctree // fs_devices total_rw_bytes improperly set!
                                              btrfs_read_chunk_tree
                                                read_one_dev // increment total_rw_bytes again!!
                                                super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!

To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree
before the calls to read_one_dev, while holding the sb umount semaphore
and the uuid mutex.
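The accounting problem and the fix can be sketched as a small user-space model. All names here (struct fs_devices_model, model_open_devices(), model_read_chunk_tree(), model_mount_ok()) are hypothetical simplifications, not the real btrfs structures or functions; locking is omitted. The model assumes, as the description above says, that the open path resets total_rw_bytes only when the opened counter is zero, and that reading the device items adds each device's size to the accumulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of struct btrfs_fs_devices: only the
 * two fields involved in the race. */
struct fs_devices_model {
	int opened;
	uint64_t total_rw_bytes;
};

/* Open path as described above: the reset is skipped when the devices
 * are still marked open by the umount that has not finished yet. */
void model_open_devices(struct fs_devices_model *fsd)
{
	if (fsd->opened == 0)
		fsd->total_rw_bytes = 0;
	fsd->opened++;
}

/* Reading device items: each device's size is added to the total.
 * With fixed == true the accumulator is cleared first, as the patch
 * does in btrfs_read_chunk_tree(). */
void model_read_chunk_tree(struct fs_devices_model *fsd, const uint64_t *sizes,
			   int nr_devices, bool fixed)
{
	if (fixed)
		fsd->total_rw_bytes = 0;
	for (int i = 0; i < nr_devices; i++)
		fsd->total_rw_bytes += sizes[i];
}

/* Mirrors the failing check: super_total_bytes < total_rw_bytes is an error. */
bool model_mount_ok(const struct fs_devices_model *fsd, uint64_t super_total_bytes)
{
	return !(super_total_bytes < fsd->total_rw_bytes);
}

/* Replays "first mount, then a new mount racing the slow umount" and
 * reports whether the second mount passes the size check. */
bool model_replay_race(bool fixed)
{
	const uint64_t sizes[] = { 1ULL << 30 };
	struct fs_devices_model fsd = { 0, 0 };

	model_open_devices(&fsd); /* original mount */
	model_read_chunk_tree(&fsd, sizes, 1, fixed);
	model_open_devices(&fsd); /* racing mount: reset skipped */
	model_read_chunk_tree(&fsd, sizes, 1, fixed);
	return model_mount_ok(&fsd, sizes[0]);
}
```

Without the reset the second pass counts the single device twice and the check fails; clearing the accumulator at the start of the read makes the result independent of whatever stale value the open path left behind.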
To reproduce, it is sufficient to dirty a decent number of inodes, then
quickly umount and mount.

  for i in $(seq 0 500)
  do
    dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1
  done
  umount /mnt/foo&
  mount /mnt/foo

does the trick for me.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Boris Burkov <boris(a)bur.io>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0d6e785bcb98..f403fb1e6d37 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7051,6 +7051,14 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
mutex_lock(&uuid_mutex);
mutex_lock(&fs_info->chunk_mutex);
+ /*
+ * It is possible for mount and umount to race in such a way that
+ * we execute this code path, but open_fs_devices failed to clear
+ * total_rw_bytes. We certainly want it cleared before reading the
+ * device items, so clear it here.
+ */
+ fs_info->fs_devices->total_rw_bytes = 0;
+
/*
* Read all device items, and then all the chunk items. All
* device items are found before any chunk item (their object id
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 48cfa61b58a1fee0bc49eef04f8ccf31493b7cdd Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Thu, 16 Jul 2020 13:29:46 -0700
Subject: [PATCH] btrfs: fix mount failure caused by race with umount
It is possible to cause a btrfs mount to fail by racing it with a slow
umount. The crux of the sequence is generic_shutdown_super not yet
calling sop->put_super before btrfs_mount_root calls btrfs_open_devices.
If that occurs, btrfs_open_devices will decide the opened counter is
non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to
0. From here, mount will call sget which will result in grab_super
trying to take the super block umount semaphore. That semaphore will be
held by the slow umount, so mount will block. Before releasing the
semaphore, umount will delete the super block, so mount's sget reliably
allocates a new one. The mount path then dutifully fills it out and
increments total_rw_bytes a second time, and the mount fails because it
sees double the expected bytes.
Here is the sequence laid out in greater detail:

CPU0                                    CPU1
down_write sb->s_umount
btrfs_kill_super
  kill_anon_super(sb)
    generic_shutdown_super(sb);
      shrink_dcache_for_umount(sb);
      sync_filesystem(sb);
      evict_inodes(sb); // SLOW

                                        btrfs_mount_root
                                          btrfs_scan_one_device
                                          fs_devices = device->fs_devices
                                          fs_info->fs_devices = fs_devices
                                          // fs_devices->opened makes this a no-op
                                          btrfs_open_devices(fs_devices, mode, fs_type)
                                          s = sget(fs_type, test, set, flags, fs_info);
                                            find sb in s_instances
                                            grab_super(sb);
                                              down_write(&s->s_umount); // blocks

      sop->put_super(sb)
        // sb->fs_devices->opened == 2; no-op
      spin_lock(&sb_lock);
      hlist_del_init(&sb->s_instances);
      spin_unlock(&sb_lock);
      up_write(&sb->s_umount);
                                              return 0;
                                            retry lookup
                                            don't find sb in s_instances (deleted by CPU0)
                                            s = alloc_super
                                            return s;
                                          btrfs_fill_super(s, fs_devices, data)
                                            open_ctree // fs_devices total_rw_bytes improperly set!
                                              btrfs_read_chunk_tree
                                                read_one_dev // increment total_rw_bytes again!!
                                                super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!

To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree
before the calls to read_one_dev, while holding the sb umount semaphore
and the uuid mutex.
To reproduce, it is sufficient to dirty a decent number of inodes, then
quickly umount and mount.

  for i in $(seq 0 500)
  do
    dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1
  done
  umount /mnt/foo&
  mount /mnt/foo

does the trick for me.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Boris Burkov <boris(a)bur.io>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0d6e785bcb98..f403fb1e6d37 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7051,6 +7051,14 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
mutex_lock(&uuid_mutex);
mutex_lock(&fs_info->chunk_mutex);
+ /*
+ * It is possible for mount and umount to race in such a way that
+ * we execute this code path, but open_fs_devices failed to clear
+ * total_rw_bytes. We certainly want it cleared before reading the
+ * device items, so clear it here.
+ */
+ fs_info->fs_devices->total_rw_bytes = 0;
+
/*
* Read all device items, and then all the chunk items. All
* device items are found before any chunk item (their object id
The EFI platform firmware fallback would clobber any pre-allocated
buffers. Instead, correctly refuse to reallocate when too small (as
already done in the sysfs fallback), or perform allocation normally
when needed.
Fixes: e4c2c0ff00ec ("firmware: Add new platform fallback mechanism and firmware_request_platform()")
Cc: stable(a)vger.kernel.org
Acked-by: Scott Branden <scott.branden(a)broadcom.com>
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
To aid in backporting, this change is made before moving
kernel_read_file() to separate header/source files.
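The allocation decision the patch introduces can be modeled in user space. This is a sketch, not the kernel code: struct fw_priv_model and model_fallback_alloc() are hypothetical names, the struct keeps only the two fields the hunk reads (data and allocated_size), and malloc() stands in for vmalloc(). It mirrors the three cases from the description: a large enough preallocated buffer is kept, a too-small one is refused instead of being silently replaced, and a buffer is allocated only when none was supplied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fw_priv, reduced to the two fields
 * the patched hunk reads. */
struct fw_priv_model {
	void *data;            /* caller-preallocated buffer, or NULL */
	size_t allocated_size; /* size of that buffer */
};

/* Same decision logic as the patched hunk, with malloc() in place of
 * vmalloc(). */
int model_fallback_alloc(struct fw_priv_model *fw_priv, size_t size)
{
	/* Preallocated buffer too small: refuse rather than clobber it. */
	if (fw_priv->data && size > fw_priv->allocated_size)
		return -ENOMEM;
	/* No buffer supplied: allocate one, as before the patch. */
	if (!fw_priv->data)
		fw_priv->data = malloc(size);
	if (!fw_priv->data)
		return -ENOMEM;
	return 0;
}
```

Before the patch, the equivalent of the first case would overwrite the caller's pointer with a fresh allocation, which is the clobbering the commit message describes.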
---
drivers/base/firmware_loader/fallback_platform.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/base/firmware_loader/fallback_platform.c b/drivers/base/firmware_loader/fallback_platform.c
index cdd2c9a9f38a..685edb7dd05a 100644
--- a/drivers/base/firmware_loader/fallback_platform.c
+++ b/drivers/base/firmware_loader/fallback_platform.c
@@ -25,7 +25,10 @@ int firmware_fallback_platform(struct fw_priv *fw_priv, u32 opt_flags)
if (rc)
return rc; /* rc == -ENOENT when the fw was not found */
- fw_priv->data = vmalloc(size);
+ if (fw_priv->data && size > fw_priv->allocated_size)
+ return -ENOMEM;
+ if (!fw_priv->data)
+ fw_priv->data = vmalloc(size);
if (!fw_priv->data)
return -ENOMEM;
--
2.25.1