The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 1e611e104b9acb6310b8c684d5acee0e11ca7bd1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091658-glare-spinster-90d6@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
1e611e104b9a ("mtd: spi-nor: spansion: preserve CFR2V[7] when writing MEMLAT")
d534fd9787d5 ("mtd: spi-nor: spansion: use CLPEF as an alternative to CLSR")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1e611e104b9acb6310b8c684d5acee0e11ca7bd1 Mon Sep 17 00:00:00 2001
From: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Date: Wed, 26 Jul 2023 10:52:48 +0300
Subject: [PATCH] mtd: spi-nor: spansion: preserve CFR2V[7] when writing MEMLAT
CFR2V[7] is assigned to Flash's address mode (3- or 4-byte) and must not
be changed when writing MEMLAT (CFR2V[3:0]). CFR2V shall be used in a read,
update, write back fashion.
Fixes: c3266af101f2 ("mtd: spi-nor: spansion: add support for Cypress Semper flash")
Signed-off-by: Takahiro Kuwano <Takahiro.Kuwano(a)infineon.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230726075257.12985-3-tudor.ambarus@linaro.org
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
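
The read, update, write back pattern described in the commit message can be sketched in plain userspace C. This is only an illustration, not the driver code: the names cfr2_set_memlat, CFR2_MEMLAT_MASK, CFR2_ADRBYT and CFR2_MEMLAT_11_24 are made up for this sketch (the bit positions mirror the defines in the diff), and the register value is passed in as an argument instead of being read over SPI.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's defines (same bit layout). */
#define CFR2_MEMLAT_MASK   0x0fu  /* CFR2V[3:0]: memory latency field */
#define CFR2_ADRBYT        0x80u  /* CFR2V[7]: 3- or 4-byte address mode */
#define CFR2_MEMLAT_11_24  0x0bu  /* latency code used by the patch */

/*
 * Return the value to write back: only the MEMLAT field is replaced,
 * every other bit of the current register value is preserved.
 */
static uint8_t cfr2_set_memlat(uint8_t current, uint8_t memlat)
{
	current &= (uint8_t)~CFR2_MEMLAT_MASK;   /* clear bits 3:0 */
	current |= memlat & CFR2_MEMLAT_MASK;    /* insert new latency */
	return current;
}
```

With the old behaviour (a plain assignment of the latency code), a register that had CFR2_ADRBYT set would lose it; with the masked update the bit survives, which is the point of the fix.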
diff --git a/drivers/mtd/spi-nor/spansion.c b/drivers/mtd/spi-nor/spansion.c
index 6b2532ed053c..6460d2247bdf 100644
--- a/drivers/mtd/spi-nor/spansion.c
+++ b/drivers/mtd/spi-nor/spansion.c
@@ -4,6 +4,7 @@
* Copyright (C) 2014, Freescale Semiconductor, Inc.
*/
+#include <linux/bitfield.h>
#include <linux/device.h>
#include <linux/mtd/spi-nor.h>
@@ -28,6 +29,7 @@
#define SPINOR_REG_CYPRESS_CFR2 0x3
#define SPINOR_REG_CYPRESS_CFR2V \
(SPINOR_REG_CYPRESS_VREG + SPINOR_REG_CYPRESS_CFR2)
+#define SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK GENMASK(3, 0)
#define SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24 0xb
#define SPINOR_REG_CYPRESS_CFR2_ADRBYT BIT(7)
#define SPINOR_REG_CYPRESS_CFR3 0x4
@@ -161,8 +163,18 @@ static int cypress_nor_octal_dtr_en(struct spi_nor *nor)
int ret;
u8 addr_mode_nbytes = nor->params->addr_mode_nbytes;
+ op = (struct spi_mem_op)
+ CYPRESS_NOR_RD_ANY_REG_OP(addr_mode_nbytes,
+ SPINOR_REG_CYPRESS_CFR2V, 0, buf);
+
+ ret = spi_nor_read_any_reg(nor, &op, nor->reg_proto);
+ if (ret)
+ return ret;
+
/* Use 24 dummy cycles for memory array reads. */
- *buf = SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24;
+ *buf &= ~SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK;
+ *buf |= FIELD_PREP(SPINOR_REG_CYPRESS_CFR2_MEMLAT_MASK,
+ SPINOR_REG_CYPRESS_CFR2_MEMLAT_11_24);
op = (struct spi_mem_op)
CYPRESS_NOR_WR_ANY_REG_OP(addr_mode_nbytes,
SPINOR_REG_CYPRESS_CFR2V, 1, buf);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 13bb483d32abb6f8ebd40141d87eb68f11cc2dd2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091640-upstroke-gopher-c666@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
13bb483d32ab ("btrfs: zoned: activate metadata block group on write time")
0356ad41e0dd ("btrfs: zoned: defer advancing meta write pointer")
2ad8c0510a96 ("btrfs: zoned: return int from btrfs_check_meta_write_pointer")
7db94301a980 ("btrfs: zoned: introduce block group context to btrfs_eb_write_context")
861093eff4f0 ("btrfs: introduce struct to consolidate extent buffer write context")
71df088c1cc0 ("btrfs: defer splitting of ordered extents until I/O completion")
a6f3e205e491 ("btrfs: move split_extent_map to extent_map.c")
cbfce4c7fbde ("btrfs: optimize the logical to physical mapping for zoned writes")
5cfe76f846d5 ("btrfs: rename the bytenr field in struct btrfs_ordered_sum to logical")
6e4b2479ab38 ("btrfs: mark the len field in struct btrfs_ordered_sum as unsigned")
50b21d7a066f ("btrfs: submit a writeback bio per extent_buffer")
9fdd160160f0 ("btrfs: return bool from lock_extent_buffer_for_io")
adbe7e388e42 ("btrfs: use SECTOR_SHIFT to convert LBA to physical offset")
7edd339c8a41 ("btrfs: pass an ordered_extent to btrfs_extract_ordered_extent")
2e38a84bc6ab ("btrfs: simplify extent map splitting and rename split_zoned_em")
11d33ab6c1f3 ("btrfs: simplify splitting logic in btrfs_extract_ordered_extent")
e44ca71cfe07 ("btrfs: move ordered_extent internal sanity checks into btrfs_split_ordered_extent")
2cef0c79bb81 ("btrfs: make btrfs_split_bio work on struct btrfs_bio")
ae42a154ca89 ("btrfs: pass a btrfs_bio to btrfs_submit_bio")
34f888ce3a35 ("btrfs: cleanup main loop in btrfs_encoded_read_regular_fill_pages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13bb483d32abb6f8ebd40141d87eb68f11cc2dd2 Mon Sep 17 00:00:00 2001
From: Naohiro Aota <naohiro.aota(a)wdc.com>
Date: Tue, 8 Aug 2023 01:12:37 +0900
Subject: [PATCH] btrfs: zoned: activate metadata block group on write time
In the current implementation, block groups are activated at reservation
time to ensure that all reserved bytes can be written to an active metadata
block group. However, this approach has proven to be less efficient, as it
activates block groups more frequently than necessary, putting pressure on
the active zone resource and leading to potential issues such as early
ENOSPC or hung_task.
Another drawback of the current method is that it hampers metadata
over-commit, and necessitates additional flush operations and block group
allocations, resulting in decreased overall performance.
To address these issues, this commit introduces write-time activation of
metadata and system block groups. This involves reserving at least one
active block group specifically for a metadata and system block group.
Since metadata write-out is always allocated sequentially, when we need to
write to a non-active block group, we can wait for the ongoing IOs to
complete, activate a new block group, and then proceed with writing to the
new block group.
Fixes: b09315139136 ("btrfs: zoned: activate metadata block group on flush_space")
CC: stable(a)vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
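
The write-time "pivot" described above (wait for outstanding IO on the old block group, finish it, then activate the new one) can be modelled with a small toy in userspace C. Everything here is hypothetical and heavily simplified: struct toy_bg, struct toy_fs and toy_activate_on_write are invented for this sketch, there is no locking, and waiting for writeback and finishing a zone are reduced to field assignments. It only illustrates the reference-count and "one active metadata block group" bookkeeping, not the btrfs implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy block group: a reference count, an "active zone" flag, pending IO. */
struct toy_bg {
	int refs;
	bool zone_active;
	unsigned int pending_io;
};

/* Toy filesystem state: at most one active metadata block group. */
struct toy_fs {
	struct toy_bg *active_meta_bg;
};

/* Make @bg the active metadata block group at write time. */
static void toy_activate_on_write(struct toy_fs *fs, struct toy_bg *bg)
{
	struct toy_bg *old = fs->active_meta_bg;

	if (bg->zone_active)
		return;                   /* nothing to do */

	if (old && old != bg) {
		old->pending_io = 0;      /* stands in for waiting on writeback */
		old->zone_active = false; /* stands in for finishing the zone */
		old->refs--;              /* drop the reference held by toy_fs */
		fs->active_meta_bg = NULL;
	}

	bg->zone_active = true;           /* stands in for zone activation */
	if (fs->active_meta_bg != bg) {
		fs->active_meta_bg = bg;
		bg->refs++;               /* toy_fs now holds a reference */
	}
}
```

Note that the real patch can also refuse to pivot (returning false) when unsent IO remains and waiting could deadlock; the toy leaves that case out.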
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index a127865f49f9..b0e432c30e1d 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4287,6 +4287,17 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
struct btrfs_caching_control *caching_ctl;
struct rb_node *n;
+ if (btrfs_is_zoned(info)) {
+ if (info->active_meta_bg) {
+ btrfs_put_block_group(info->active_meta_bg);
+ info->active_meta_bg = NULL;
+ }
+ if (info->active_system_bg) {
+ btrfs_put_block_group(info->active_system_bg);
+ info->active_system_bg = NULL;
+ }
+ }
+
write_lock(&info->block_group_cache_lock);
while (!list_empty(&info->caching_block_groups)) {
caching_ctl = list_entry(info->caching_block_groups.next,
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index ef07c6c252d8..a523d64d5491 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -770,6 +770,9 @@ struct btrfs_fs_info {
u64 data_reloc_bg;
struct mutex zoned_data_reloc_io_lock;
+ struct btrfs_block_group *active_meta_bg;
+ struct btrfs_block_group *active_system_bg;
+
u64 nr_global_roots;
spinlock_t zone_active_bgs_lock;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index fc69041bb6b4..099cb6a6d3b3 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -65,6 +65,9 @@
#define SUPER_INFO_SECTORS ((u64)BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT)
+static void wait_eb_writebacks(struct btrfs_block_group *block_group);
+static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_written);
+
static inline bool sb_zone_is_full(const struct blk_zone *zone)
{
return (zone->cond == BLK_ZONE_COND_FULL) ||
@@ -1747,6 +1750,62 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
}
}
+static bool check_bg_is_active(struct btrfs_eb_write_context *ctx,
+ struct btrfs_block_group **active_bg)
+{
+ const struct writeback_control *wbc = ctx->wbc;
+ struct btrfs_block_group *block_group = ctx->zoned_bg;
+ struct btrfs_fs_info *fs_info = block_group->fs_info;
+
+ if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
+ return true;
+
+ if (fs_info->treelog_bg == block_group->start) {
+ if (!btrfs_zone_activate(block_group)) {
+ int ret_fin = btrfs_zone_finish_one_bg(fs_info);
+
+ if (ret_fin != 1 || !btrfs_zone_activate(block_group))
+ return false;
+ }
+ } else if (*active_bg != block_group) {
+ struct btrfs_block_group *tgt = *active_bg;
+
+ /* zoned_meta_io_lock protects fs_info->active_{meta,system}_bg. */
+ lockdep_assert_held(&fs_info->zoned_meta_io_lock);
+
+ if (tgt) {
+ /*
+ * If there is an unsent IO left in the allocated area,
+ * we cannot wait for them as it may cause a deadlock.
+ */
+ if (tgt->meta_write_pointer < tgt->start + tgt->alloc_offset) {
+ if (wbc->sync_mode == WB_SYNC_NONE ||
+ (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync))
+ return false;
+ }
+
+ /* Pivot active metadata/system block group. */
+ btrfs_zoned_meta_io_unlock(fs_info);
+ wait_eb_writebacks(tgt);
+ do_zone_finish(tgt, true);
+ btrfs_zoned_meta_io_lock(fs_info);
+ if (*active_bg == tgt) {
+ btrfs_put_block_group(tgt);
+ *active_bg = NULL;
+ }
+ }
+ if (!btrfs_zone_activate(block_group))
+ return false;
+ if (*active_bg != block_group) {
+ ASSERT(*active_bg == NULL);
+ *active_bg = block_group;
+ btrfs_get_block_group(block_group);
+ }
+ }
+
+ return true;
+}
+
/*
* Check if @ctx->eb is aligned to the write pointer.
*
@@ -1781,8 +1840,26 @@ int btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
ctx->zoned_bg = block_group;
}
- if (block_group->meta_write_pointer == eb->start)
- return 0;
+ if (block_group->meta_write_pointer == eb->start) {
+ struct btrfs_block_group **tgt;
+
+ if (!test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags))
+ return 0;
+
+ if (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM)
+ tgt = &fs_info->active_system_bg;
+ else
+ tgt = &fs_info->active_meta_bg;
+ if (check_bg_is_active(ctx, tgt))
+ return 0;
+ }
+
+ /*
+ * Since we may release fs_info->zoned_meta_io_lock, someone can already
+ * start writing this eb. In that case, we can just bail out.
+ */
+ if (block_group->meta_write_pointer > eb->start)
+ return -EBUSY;
/* If for_sync, this hole will be filled with trasnsaction commit. */
if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 13bb483d32abb6f8ebd40141d87eb68f11cc2dd2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091639-mundane-justice-4caa@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
13bb483d32ab ("btrfs: zoned: activate metadata block group on write time")
0356ad41e0dd ("btrfs: zoned: defer advancing meta write pointer")
2ad8c0510a96 ("btrfs: zoned: return int from btrfs_check_meta_write_pointer")
7db94301a980 ("btrfs: zoned: introduce block group context to btrfs_eb_write_context")
861093eff4f0 ("btrfs: introduce struct to consolidate extent buffer write context")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13bb483d32abb6f8ebd40141d87eb68f11cc2dd2 Mon Sep 17 00:00:00 2001
From: Naohiro Aota <naohiro.aota(a)wdc.com>
Date: Tue, 8 Aug 2023 01:12:37 +0900
Subject: [PATCH] btrfs: zoned: activate metadata block group on write time
In the current implementation, block groups are activated at reservation
time to ensure that all reserved bytes can be written to an active metadata
block group. However, this approach has proven to be less efficient, as it
activates block groups more frequently than necessary, putting pressure on
the active zone resource and leading to potential issues such as early
ENOSPC or hung_task.
Another drawback of the current method is that it hampers metadata
over-commit, and necessitates additional flush operations and block group
allocations, resulting in decreased overall performance.
To address these issues, this commit introduces write-time activation of
metadata and system block groups. This involves reserving at least one
active block group specifically for a metadata and system block group.
Since metadata write-out is always allocated sequentially, when we need to
write to a non-active block group, we can wait for the ongoing IOs to
complete, activate a new block group, and then proceed with writing to the
new block group.
Fixes: b09315139136 ("btrfs: zoned: activate metadata block group on flush_space")
CC: stable(a)vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index a127865f49f9..b0e432c30e1d 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4287,6 +4287,17 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
struct btrfs_caching_control *caching_ctl;
struct rb_node *n;
+ if (btrfs_is_zoned(info)) {
+ if (info->active_meta_bg) {
+ btrfs_put_block_group(info->active_meta_bg);
+ info->active_meta_bg = NULL;
+ }
+ if (info->active_system_bg) {
+ btrfs_put_block_group(info->active_system_bg);
+ info->active_system_bg = NULL;
+ }
+ }
+
write_lock(&info->block_group_cache_lock);
while (!list_empty(&info->caching_block_groups)) {
caching_ctl = list_entry(info->caching_block_groups.next,
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index ef07c6c252d8..a523d64d5491 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -770,6 +770,9 @@ struct btrfs_fs_info {
u64 data_reloc_bg;
struct mutex zoned_data_reloc_io_lock;
+ struct btrfs_block_group *active_meta_bg;
+ struct btrfs_block_group *active_system_bg;
+
u64 nr_global_roots;
spinlock_t zone_active_bgs_lock;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index fc69041bb6b4..099cb6a6d3b3 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -65,6 +65,9 @@
#define SUPER_INFO_SECTORS ((u64)BTRFS_SUPER_INFO_SIZE >> SECTOR_SHIFT)
+static void wait_eb_writebacks(struct btrfs_block_group *block_group);
+static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_written);
+
static inline bool sb_zone_is_full(const struct blk_zone *zone)
{
return (zone->cond == BLK_ZONE_COND_FULL) ||
@@ -1747,6 +1750,62 @@ void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
}
}
+static bool check_bg_is_active(struct btrfs_eb_write_context *ctx,
+ struct btrfs_block_group **active_bg)
+{
+ const struct writeback_control *wbc = ctx->wbc;
+ struct btrfs_block_group *block_group = ctx->zoned_bg;
+ struct btrfs_fs_info *fs_info = block_group->fs_info;
+
+ if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags))
+ return true;
+
+ if (fs_info->treelog_bg == block_group->start) {
+ if (!btrfs_zone_activate(block_group)) {
+ int ret_fin = btrfs_zone_finish_one_bg(fs_info);
+
+ if (ret_fin != 1 || !btrfs_zone_activate(block_group))
+ return false;
+ }
+ } else if (*active_bg != block_group) {
+ struct btrfs_block_group *tgt = *active_bg;
+
+ /* zoned_meta_io_lock protects fs_info->active_{meta,system}_bg. */
+ lockdep_assert_held(&fs_info->zoned_meta_io_lock);
+
+ if (tgt) {
+ /*
+ * If there is an unsent IO left in the allocated area,
+ * we cannot wait for them as it may cause a deadlock.
+ */
+ if (tgt->meta_write_pointer < tgt->start + tgt->alloc_offset) {
+ if (wbc->sync_mode == WB_SYNC_NONE ||
+ (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync))
+ return false;
+ }
+
+ /* Pivot active metadata/system block group. */
+ btrfs_zoned_meta_io_unlock(fs_info);
+ wait_eb_writebacks(tgt);
+ do_zone_finish(tgt, true);
+ btrfs_zoned_meta_io_lock(fs_info);
+ if (*active_bg == tgt) {
+ btrfs_put_block_group(tgt);
+ *active_bg = NULL;
+ }
+ }
+ if (!btrfs_zone_activate(block_group))
+ return false;
+ if (*active_bg != block_group) {
+ ASSERT(*active_bg == NULL);
+ *active_bg = block_group;
+ btrfs_get_block_group(block_group);
+ }
+ }
+
+ return true;
+}
+
/*
* Check if @ctx->eb is aligned to the write pointer.
*
@@ -1781,8 +1840,26 @@ int btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
ctx->zoned_bg = block_group;
}
- if (block_group->meta_write_pointer == eb->start)
- return 0;
+ if (block_group->meta_write_pointer == eb->start) {
+ struct btrfs_block_group **tgt;
+
+ if (!test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &fs_info->flags))
+ return 0;
+
+ if (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM)
+ tgt = &fs_info->active_system_bg;
+ else
+ tgt = &fs_info->active_meta_bg;
+ if (check_bg_is_active(ctx, tgt))
+ return 0;
+ }
+
+ /*
+ * Since we may release fs_info->zoned_meta_io_lock, someone can already
+ * start writing this eb. In that case, we can just bail out.
+ */
+ if (block_group->meta_write_pointer > eb->start)
+ return -EBUSY;
/* If for_sync, this hole will be filled with trasnsaction commit. */
if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x b8bd342d50cbf606666488488f9fea374aceb2d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091602-untapped-scapegoat-dd85@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
b8bd342d50cb ("fuse: nlookup missing decrement in fuse_direntplus_link")
d123d8e1833c ("fuse: split out readdir.c")
63576c13bd17 ("fuse: fix initial parallel dirops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b8bd342d50cbf606666488488f9fea374aceb2d5 Mon Sep 17 00:00:00 2001
From: ruanmeisi <ruan.meisi(a)zte.com.cn>
Date: Tue, 25 Apr 2023 19:13:54 +0800
Subject: [PATCH] fuse: nlookup missing decrement in fuse_direntplus_link
During our debugging of glusterfs, we found an Assertion failed error:
inode_lookup >= nlookup, which was caused by the nlookup value in the
kernel being greater than that in the FUSE file system.
The issue was introduced by fuse_direntplus_link: within that function,
fuse_iget increments nlookup, and if d_splice_alias then returns failure,
fuse_direntplus_link returns failure without decrementing nlookup.
https://github.com/gluster/glusterfs/pull/4081
Signed-off-by: ruanmeisi <ruan.meisi(a)zte.com.cn>
Fixes: 0b05b18381ee ("fuse: implement NFS-like readdirplus support")
Cc: <stable(a)vger.kernel.org> # v3.9
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
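
The counter imbalance described in the commit message can be shown with a minimal userspace sketch. The names struct toy_inode and toy_direntplus_link are hypothetical; the increment stands in for what fuse_iget does, the splice_err argument stands in for a d_splice_alias failure, and the spinlock the real patch takes around the decrement is omitted.

```c
/* Toy inode carrying only the lookup count the kernel shares with userspace. */
struct toy_inode {
	unsigned long nlookup;
};

/*
 * Model of the fixed error path: the lookup count taken at the start
 * is given back when linking the dentry fails.
 * @splice_err: 0 on success, negative errno to simulate a failure.
 */
static int toy_direntplus_link(struct toy_inode *inode, int splice_err)
{
	inode->nlookup++;                 /* stands in for fuse_iget() */

	if (splice_err) {
		inode->nlookup--;         /* the decrement added by the fix */
		return splice_err;
	}
	return 0;
}
```

Without the decrement, every failed call would leave the kernel's count one higher than the count the FUSE server believes it handed out, which matches the assertion failure reported for glusterfs.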
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index dc603479b30e..b3d498163f97 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -243,8 +243,16 @@ static int fuse_direntplus_link(struct file *file,
dput(dentry);
dentry = alias;
}
- if (IS_ERR(dentry))
+ if (IS_ERR(dentry)) {
+ if (!IS_ERR(inode)) {
+ struct fuse_inode *fi = get_fuse_inode(inode);
+
+ spin_lock(&fi->lock);
+ fi->nlookup--;
+ spin_unlock(&fi->lock);
+ }
return PTR_ERR(dentry);
+ }
}
if (fc->readdirplus_auto)
set_bit(FUSE_I_INIT_RDPLUS, &get_fuse_inode(inode)->state);
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 772c9f691dcf3a487f29ddb90a5a15c78d7328e1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091649-ember-remindful-28e0@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
772c9f691dcf ("ext4: don't use CR_BEST_AVAIL_LEN for non-regular files")
b50675a4a6a6 ("ext4: return found group directly in ext4_mb_choose_next_group_goal_fast")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 772c9f691dcf3a487f29ddb90a5a15c78d7328e1 Mon Sep 17 00:00:00 2001
From: Ritesh Harjani <ritesh.list(a)gmail.com>
Date: Sun, 16 Jul 2023 19:33:34 +0530
Subject: [PATCH] ext4: don't use CR_BEST_AVAIL_LEN for non-regular files
Using CR_BEST_AVAIL_LEN only makes sense for regular files, as for
non-regular files we never normalize the allocation request length, i.e.
the goal len is the same as the original length (ac_g_ex.fe_len == ac_o_ex.fe_len).
Hence there is no scope for trimming the goal length to make it
satisfy the original request len. Thus this patch avoids using the
CR_BEST_AVAIL_LEN criterion for non-regular file requests.
Cc: stable(a)kernel.org
Fixes: 33122aa930f1 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Reported-by: Eric Whitney <enwlinux(a)gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Tested-by: Eric Whitney <enwlinux(a)gmail.com>
Link: https://lore.kernel.org/r/2a694c748ff8b8c4b416995a24f06f07b55047a8.16895160…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
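
The selection rule in this patch boils down to one flag test. A minimal sketch, assuming a hypothetical flag TOY_HINT_DATA and a hypothetical two-value enum in place of the allocator's real criteria and EXT4_MB_HINT_DATA:

```c
/* Hypothetical stand-ins for the allocator's criteria and data hint flag. */
enum toy_criteria {
	TOY_CR_BEST_AVAIL_LEN,  /* may trim a normalized goal length */
	TOY_CR_GOAL_LEN_SLOW,   /* falls back to a slower full search */
};

#define TOY_HINT_DATA 0x0020u   /* set only for regular-file data requests */

/*
 * Pick the next criterion: trimming the goal length is only meaningful
 * when the request was normalized, i.e. for regular-file data.
 */
static enum toy_criteria toy_next_criteria(unsigned int ac_flags)
{
	if (ac_flags & TOY_HINT_DATA)
		return TOY_CR_BEST_AVAIL_LEN;
	return TOY_CR_GOAL_LEN_SLOW;
}
```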
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index b89b5f0816e7..3d5b0b71d7f5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -966,7 +966,18 @@ static void ext4_mb_choose_next_group_goal_fast(struct ext4_allocation_context *
}
}
- *new_cr = CR_BEST_AVAIL_LEN;
+ /*
+ * CR_BEST_AVAIL_LEN works based on the concept that we have
+ * a larger normalized goal len request which can be trimmed to
+ * a smaller goal len such that it can still satisfy original
+ * request len. However, allocation request for non-regular
+ * files never gets normalized.
+ * See function ext4_mb_normalize_request() (EXT4_MB_HINT_DATA).
+ */
+ if (ac->ac_flags & EXT4_MB_HINT_DATA)
+ *new_cr = CR_BEST_AVAIL_LEN;
+ else
+ *new_cr = CR_GOAL_LEN_SLOW;
}
/*
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091649-spotlight-ounce-dd0d@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
the first case of adjusting the best extent, that is, new_bex_end > 0, the
following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and use extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
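
The wraparound in the commit message's example can be reproduced in userspace. The sketch below uses two hypothetical helpers, end_32 and end_64, and assumes a cluster-to-block ratio of 1 so that the goal length can be added directly; it shows why computing the end in a 32-bit type yields 0 while a 64-bit type (as returned by extent_logical_end() in the patch, which is declared loff_t there) keeps the true value.

```c
#include <stdint.h>

/* End of an extent computed in 32 bits, as the old code effectively did. */
static uint32_t end_32(uint32_t start, uint32_t len)
{
	return start + len;             /* wraps modulo 2^32 */
}

/* Same computation widened to 64 bits before the addition. */
static uint64_t end_64(uint32_t start, uint32_t len)
{
	return (uint64_t)start + len;   /* cannot wrap for 32-bit inputs */
}
```

With start = 4294965248 and len = 2048 (the values from the commit message), end_32 returns 0 and end_64 returns 0x100000000, so a comparison such as "end > goal end" gives different answers depending on the width used.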
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091648-dinginess-legacy-08ce@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of an ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which wraps to 0. If ac->ac_o_ex.fe_logical
does not fall into the first case of adjusting the best extent, so that
new_bex_end > 0, the following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and using extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091646-safely-preflight-9684@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of an ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which wraps to 0. If ac->ac_o_ex.fe_logical
does not fall into the first case of adjusting the best extent, so that
new_bex_end > 0, the following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and using extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091644-chamomile-presoak-599f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of an ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which wraps to 0. If ac->ac_o_ex.fe_logical
does not fall into the first case of adjusting the best extent, so that
new_bex_end > 0, the following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and using extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x bc056e7163ac7db945366de219745cf94f32a3e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023091643-excluding-lining-7075@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
bc056e7163ac ("ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
359745d78351 ("proc: remove PDE_DATA() completely")
6dfbbae14a7b ("fs: proc: store PDE()->data into inode->i_private")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc056e7163ac7db945366de219745cf94f32a3e6 Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Mon, 24 Jul 2023 20:10:58 +0800
Subject: [PATCH] ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
When we calculate the end position of an ext4_free_extent, this position may
be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
computed end is 0x100000000, which wraps to 0. If ac->ac_o_ex.fe_logical
does not fall into the first case of adjusting the best extent, so that
new_bex_end > 0, the following BUG_ON will be triggered:
=========================================================
kernel BUG at fs/ext4/mballoc.c:5116!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
Call Trace:
<TASK>
ext4_mb_use_best_found+0x203/0x2f0
ext4_mb_try_best_found+0x163/0x240
ext4_mb_regular_allocator+0x158/0x1550
ext4_mb_new_blocks+0x86a/0xe10
ext4_ext_map_blocks+0xb0c/0x13a0
ext4_map_blocks+0x2cd/0x8f0
ext4_iomap_begin+0x27b/0x400
iomap_iter+0x222/0x3d0
__iomap_dio_rw+0x243/0xcb0
iomap_dio_rw+0x16/0x80
=========================================================
A simple reproducer demonstrating the problem:
mkfs.ext4 -F /dev/sda -b 4096 100M
mount /dev/sda /tmp/test
fallocate -l1M /tmp/test/tmp
fallocate -l10M /tmp/test/file
fallocate -i -o 1M -l16777203M /tmp/test/file
fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
sleep 10 && killall -9 fsstress
rm -f /tmp/test/tmp
xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
We simply refactor the logic for adjusting the best extent by adding
a temporary ext4_free_extent ex and using extent_logical_end() to avoid
overflow, which also simplifies the code.
Cc: stable(a)kernel.org # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Link: https://lore.kernel.org/r/20230724121059.11834-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 4cb13b3e41b3..86bce870dc5a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5177,8 +5177,11 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
- int new_bex_start;
- int new_bex_end;
+ struct ext4_free_extent ex = {
+ .fe_logical = ac->ac_g_ex.fe_logical,
+ .fe_len = ac->ac_orig_goal_len,
+ };
+ loff_t orig_goal_end = extent_logical_end(sbi, &ex);
/* we can't allocate as much as normalizer wants.
* so, found space must get proper lstart
@@ -5197,29 +5200,23 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
* still cover original start
* 3. Else, keep the best ex at start of original request.
*/
- new_bex_end = ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len);
- new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical >= new_bex_start)
- goto adjust_bex;
+ ex.fe_len = ac->ac_b_ex.fe_len;
- new_bex_start = ac->ac_g_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
- if (ac->ac_o_ex.fe_logical < new_bex_end)
+ ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
+ if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;
- new_bex_start = ac->ac_o_ex.fe_logical;
- new_bex_end =
- new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
+ ex.fe_logical = ac->ac_g_ex.fe_logical;
+ if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
+ goto adjust_bex;
+ ex.fe_logical = ac->ac_o_ex.fe_logical;
adjust_bex:
- ac->ac_b_ex.fe_logical = new_bex_start;
+ ac->ac_b_ex.fe_logical = ex.fe_logical;
BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
- BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
- EXT4_C2B(sbi, ac->ac_orig_goal_len)));
+ BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
}
pa->pa_lstart = ac->ac_b_ex.fe_logical;