From: Max Kellermann <max.kellermann(a)ionos.com>
commit 550f7ca98ee028a606aa75705a7e77b1bd11720f upstream.
If the full path to be built by ceph_mdsc_build_path() happens to be
longer than PATH_MAX, then this function will enter an endless (retry)
loop, effectively blocking the whole task. Most of the machine
becomes unusable, making this a very simple and effective DoS
vulnerability.
I cannot imagine why this retry was ever implemented, but it seems
rather useless and harmful to me. Let's remove it and fail with
ENAMETOOLONG instead.
Cc: stable(a)vger.kernel.org
Reported-by: Dario Weißer <dario(a)cure53.de>
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
Reviewed-by: Alex Markuze <amarkuze(a)redhat.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
[idryomov(a)gmail.com: backport to 6.6: pr_warn() is still in use]
---
fs/ceph/mds_client.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 11289ce8a8cc..dfa1b3c82b53 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2713,12 +2713,11 @@ char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
if (pos < 0) {
/*
- * A rename didn't occur, but somehow we didn't end up where
- * we thought we would. Throw a warning and try again.
+ * The path is longer than PATH_MAX and this function
+ * cannot ever succeed. Creating paths that long is
+ * possible with Ceph, but Linux cannot use them.
*/
- pr_warn("build_path did not end path lookup where expected (pos = %d)\n",
- pos);
- goto retry;
+ return ERR_PTR(-ENAMETOOLONG);
}
*pbase = base;
--
2.46.1
The MT8173 disp-pwm device should have only one compatible string, based
on the following DT validation error:
arch/arm64/boot/dts/mediatek/mt8173-elm.dtb: pwm@1401e000: compatible: 'oneOf' conditional failed, one must be fixed:
['mediatek,mt8173-disp-pwm', 'mediatek,mt6595-disp-pwm'] is too long
'mediatek,mt8173-disp-pwm' is not one of ['mediatek,mt6795-disp-pwm', 'mediatek,mt8167-disp-pwm']
'mediatek,mt8173-disp-pwm' is not one of ['mediatek,mt8186-disp-pwm', 'mediatek,mt8188-disp-pwm', 'mediatek,mt8192-disp-pwm', 'mediatek,mt8195-disp-pwm', 'mediatek,mt8365-disp-pwm']
'mediatek,mt8173-disp-pwm' was expected
'mediatek,mt8183-disp-pwm' was expected
from schema $id: http://devicetree.org/schemas/pwm/mediatek,pwm-disp.yaml#
arch/arm64/boot/dts/mediatek/mt8173-elm.dtb: pwm@1401f000: compatible: 'oneOf' conditional failed, one must be fixed:
['mediatek,mt8173-disp-pwm', 'mediatek,mt6595-disp-pwm'] is too long
'mediatek,mt8173-disp-pwm' is not one of ['mediatek,mt6795-disp-pwm', 'mediatek,mt8167-disp-pwm']
'mediatek,mt8173-disp-pwm' is not one of ['mediatek,mt8186-disp-pwm', 'mediatek,mt8188-disp-pwm', 'mediatek,mt8192-disp-pwm', 'mediatek,mt8195-disp-pwm', 'mediatek,mt8365-disp-pwm']
'mediatek,mt8173-disp-pwm' was expected
'mediatek,mt8183-disp-pwm' was expected
from schema $id: http://devicetree.org/schemas/pwm/mediatek,pwm-disp.yaml#
Drop the extra "mediatek,mt6595-disp-pwm" compatible string.
Fixes: 61aee9342514 ("arm64: dts: mt8173: add MT8173 display PWM driver support node")
Cc: YH Huang <yh.huang(a)mediatek.com>
Cc: <stable(a)vger.kernel.org> # v4.5+
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
---
arch/arm64/boot/dts/mediatek/mt8173.dtsi | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
index 3458be7f7f61..f49ec7495906 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
@@ -1255,8 +1255,7 @@ dpi0_out: endpoint {
};
pwm0: pwm@1401e000 {
- compatible = "mediatek,mt8173-disp-pwm",
- "mediatek,mt6595-disp-pwm";
+ compatible = "mediatek,mt8173-disp-pwm";
reg = <0 0x1401e000 0 0x1000>;
#pwm-cells = <2>;
clocks = <&mmsys CLK_MM_DISP_PWM026M>,
@@ -1266,8 +1265,7 @@ pwm0: pwm@1401e000 {
};
pwm1: pwm@1401f000 {
- compatible = "mediatek,mt8173-disp-pwm",
- "mediatek,mt6595-disp-pwm";
+ compatible = "mediatek,mt8173-disp-pwm";
reg = <0 0x1401f000 0 0x1000>;
#pwm-cells = <2>;
clocks = <&mmsys CLK_MM_DISP_PWM126M>,
--
2.47.1.613.gc27f4b7a9f-goog
The last two are perhaps not strictly fixes, as they essentially add
support for nested ATRs. The first one is a fix, thus stable is in cc.
Tomi
Signed-off-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
---
Changes in v2:
- Use mutext_init_with_key()
- Rearrange the series so that the fix is the first patch
- Link to v1: https://lore.kernel.org/r/20241122-i2c-atr-fixes-v1-0-62c51ce790be@ideasonb…
---
Cosmin Tanislav (1):
i2c: atr: Allow unmapped addresses from nested ATRs
Tomi Valkeinen (2):
i2c: atr: Fix client detach
i2c: atr: Fix lockdep for nested ATRs
drivers/i2c/i2c-atr.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
---
base-commit: adc218676eef25575469234709c2d87185ca223a
change-id: 20241121-i2c-atr-fixes-6d52b9f5f0c1
Best regards,
--
Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Since the backport commit eea46baf1451 ("ftrace: Fix possible
use-after-free issue in ftrace_location()") on linux-5.4.y branch, the
old ftrace_int3_handler()->ftrace_location() path has included
rcu_read_lock(), which has mcount location inside and leads to potential
double fault.
Replace rcu_read_lock/unlock with preempt_enable/disable notrace macros
so that the mcount location does not appear on the int3 handler path.
This fix is specific to linux-5.4.y branch, the only branch still using
ftrace_int3_handler with commit e60b613df8b6 ("ftrace: Fix possible
use-after-free issue in ftrace_location()") backported. It also avoids
the need to backport the code conversion to text_poke() on this branch.
Reported-by: Koichiro Den <koichiro.den(a)canonical.com>
Closes: https://lore.kernel.org/all/74gjhwxupvozwop7ndhrh7t5qeckomt7yqvkkbm5j2tlx6d…
Fixes: eea46baf1451 ("ftrace: Fix possible use-after-free issue in ftrace_location()") # linux-5.4.y
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Signed-off-by: Koichiro Den <koichiro.den(a)canonical.com>
---
kernel/trace/ftrace.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 380032a27f98..2eb1a8ec5755 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1554,7 +1554,7 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end)
struct dyn_ftrace key;
unsigned long ip = 0;
- rcu_read_lock();
+ preempt_disable_notrace();
key.ip = start;
key.flags = end; /* overload flags, as it is unsigned long */
@@ -1572,7 +1572,7 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end)
break;
}
}
- rcu_read_unlock();
+ preempt_enable_notrace();
return ip;
}
--
2.43.0
From: yangge <yangge1116(a)126.com>
My machine has 4 NUMA nodes, each equipped with 32GB of memory. I
have configured each NUMA node with 16GB of CMA and 16GB of in-use
hugetlb pages. The allocation of contiguous memory via the
cma_alloc() function can fail probabilistically.
The cma_alloc() function may fail if it sees an in-use hugetlb page
within the allocation range, even if that page has already been
migrated. When in-use hugetlb pages are migrated, they may simply
be released back into the free hugepage pool instead of being
returned to the buddy system. This can cause the
test_pages_isolated() function check to fail, ultimately leading
to the failure of the cma_alloc() function:
cma_alloc()
__alloc_contig_migrate_range() // migrate in-use hugepage
test_pages_isolated()
__test_page_isolated_in_pageblock()
PageBuddy(page) // check if the page is in buddy
To address this issue, we will add a function named
replace_free_hugepage_folios(). This function will replace the
hugepage in the free hugepage pool with a new one and release the
old one to the buddy system. After the migration of in-use hugetlb
pages is completed, we will invoke the replace_free_hugepage_folios()
function to ensure that these hugepages are properly released to
the buddy system. Following this step, when the test_pages_isolated()
function is executed for inspection, it will successfully pass.
Signed-off-by: yangge <yangge1116(a)126.com>
---
include/linux/hugetlb.h | 6 ++++++
mm/hugetlb.c | 37 +++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 13 ++++++++++++-
3 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ae4fe86..7d36ac8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -681,6 +681,7 @@ struct huge_bootmem_page {
};
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
+int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1059,6 +1060,11 @@ static inline int isolate_or_dissolve_huge_page(struct page *page,
return -ENOMEM;
}
+int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
+{
+ return 0;
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
int avoid_reserve)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8e1db80..a099c54 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2975,6 +2975,43 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
return ret;
}
+/*
+ * replace_free_hugepage_folios - Replace free hugepage folios in a given pfn
+ * range with new folios.
+ * @stat_pfn: start pfn of the given pfn range
+ * @end_pfn: end pfn of the given pfn range
+ * Returns 0 on success, otherwise negated error.
+ */
+int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
+{
+ struct hstate *h;
+ struct folio *folio;
+ int ret = 0;
+
+ LIST_HEAD(isolate_list);
+
+ while (start_pfn < end_pfn) {
+ folio = pfn_folio(start_pfn);
+ if (folio_test_hugetlb(folio)) {
+ h = folio_hstate(folio);
+ } else {
+ start_pfn++;
+ continue;
+ }
+
+ if (!folio_ref_count(folio)) {
+ ret = alloc_and_dissolve_hugetlb_folio(h, folio, &isolate_list);
+ if (ret)
+ break;
+
+ putback_movable_pages(&isolate_list);
+ }
+ start_pfn++;
+ }
+
+ return ret;
+}
+
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)
{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dde19db..1dcea28 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6504,7 +6504,18 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
ret = __alloc_contig_migrate_range(&cc, start, end, migratetype);
if (ret && ret != -EBUSY)
goto done;
- ret = 0;
+
+ /*
+ * When in-use hugetlb pages are migrated, they may simply be
+ * released back into the free hugepage pool instead of being
+ * returned to the buddy system. After the migration of in-use
+ * huge pages is completed, we will invoke the
+ * replace_free_hugepage_folios() function to ensure that
+ * these hugepages are properly released to the buddy system.
+ */
+ ret = replace_free_hugepage_folios(start, end);
+ if (ret)
+ goto done;
/*
* Pages from [start, end) are within a pageblock_nr_pages
--
2.7.4
According to the UFS Device Specification, the bUFSFeaturesSupport
defines the support for TOO_HIGH_TEMPERATURE as bit[4] and the
TOO_LOW_TEMPERATURE as bit[5]. Correct the code to match with
the UFS device specification definition.
Fixes: e88e2d322 ("scsi: ufs: core: Probe for temperature notification support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bao D. Nguyen <quic_nguyenb(a)quicinc.com>
---
include/ufs/ufs.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/ufs/ufs.h b/include/ufs/ufs.h
index e594abe..f0c6111 100644
--- a/include/ufs/ufs.h
+++ b/include/ufs/ufs.h
@@ -386,8 +386,8 @@ enum {
/* Possible values for dExtendedUFSFeaturesSupport */
enum {
- UFS_DEV_LOW_TEMP_NOTIF = BIT(4),
- UFS_DEV_HIGH_TEMP_NOTIF = BIT(5),
+ UFS_DEV_HIGH_TEMP_NOTIF = BIT(4),
+ UFS_DEV_LOW_TEMP_NOTIF = BIT(5),
UFS_DEV_EXT_TEMP_NOTIF = BIT(6),
UFS_DEV_HPB_SUPPORT = BIT(7),
UFS_DEV_WRITE_BOOSTER_SUP = BIT(8),
--
2.7.4
[BUG]
When running btrfs with block size (4K) smaller than page size (64K,
aarch64), there is a very high chance to crash the kernel at
generic/750, with the following messages:
(before the call traces, there are 3 extra debug messages added)
BTRFS warning (device dm-3): read-write for sector size 4096 with page size 65536 is experimental
BTRFS info (device dm-3): checking UUID tree
hrtimer: interrupt took 5451385 ns
BTRFS error (device dm-3): cow_file_range failed, root=4957 inode=257 start=1605632 len=69632: -28
BTRFS error (device dm-3): run_delalloc_nocow failed, root=4957 inode=257 start=1605632 len=69632: -28
BTRFS error (device dm-3): failed to run delalloc range, root=4957 ino=257 folio=1572864 submit_bitmap=8-15 start=1605632 len=69632: -28
------------[ cut here ]------------
WARNING: CPU: 2 PID: 3020984 at ordered-data.c:360 can_finish_ordered_extent+0x370/0x3b8 [btrfs]
CPU: 2 UID: 0 PID: 3020984 Comm: kworker/u24:1 Tainted: G OE 6.13.0-rc1-custom+ #89
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
pc : can_finish_ordered_extent+0x370/0x3b8 [btrfs]
lr : can_finish_ordered_extent+0x1ec/0x3b8 [btrfs]
Call trace:
can_finish_ordered_extent+0x370/0x3b8 [btrfs] (P)
can_finish_ordered_extent+0x1ec/0x3b8 [btrfs] (L)
btrfs_mark_ordered_io_finished+0x130/0x2b8 [btrfs]
extent_writepage+0x10c/0x3b8 [btrfs]
extent_write_cache_pages+0x21c/0x4e8 [btrfs]
btrfs_writepages+0x94/0x160 [btrfs]
do_writepages+0x74/0x190
filemap_fdatawrite_wbc+0x74/0xa0
start_delalloc_inodes+0x17c/0x3b0 [btrfs]
btrfs_start_delalloc_roots+0x17c/0x288 [btrfs]
shrink_delalloc+0x11c/0x280 [btrfs]
flush_space+0x288/0x328 [btrfs]
btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]
process_one_work+0x228/0x680
worker_thread+0x1bc/0x360
kthread+0x100/0x118
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1605632 OE len=16384 to_dec=16384 left=0
BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1622016 OE len=12288 to_dec=12288 left=0
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1634304 OE len=8192 to_dec=4096 left=0
CPU: 1 UID: 0 PID: 3286940 Comm: kworker/u24:3 Tainted: G W OE 6.13.0-rc1-custom+ #89
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
Workqueue: btrfs_work_helper [btrfs] (btrfs-endio-write)
pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : process_one_work+0x110/0x680
lr : worker_thread+0x1bc/0x360
Call trace:
process_one_work+0x110/0x680 (P)
worker_thread+0x1bc/0x360 (L)
worker_thread+0x1bc/0x360
kthread+0x100/0x118
ret_from_fork+0x10/0x20
Code: f84086a1 f9000fe1 53041c21 b9003361 (f9400661)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 2-3
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: 0x275bb9540000 from 0xffff800080000000
PHYS_OFFSET: 0xffff8fbba0000000
CPU features: 0x100,00000070,00801250,8201720b
[CAUSE]
The above warning is triggered immediately after the delalloc range
failure, this happens in the following sequence:
- Range [1568K, 1636K) is dirty
1536K 1568K 1600K 1636K 1664K
| |/////////|////////| |
Where 1536K, 1600K and 1664K are page boundaries (64K page size)
- Enter extent_writepage() for page 1536K
- Enter run_delalloc_nocow() with locked page 1536K and range
[1568K, 1636K)
This is due to the inode has preallocated extents.
- Enter cow_file_range() with locked page 1536K and range
[1568K, 1636K)
- btrfs_reserve_extent() only reserved two extents
The main loop of cow_file_range() only reserved two data extents,
Now we have:
1536K 1568K 1600K 1636K 1664K
| |<-->|<--->|/|///////| |
1584K 1596K
Range [1568K, 1596K) has ordered extent reserved.
- btrfs_reserve_extent() failed inside cow_file_range() for file offset
1596K
This is already a bug in our space reservation code, but for now let's
focus on the error handling path.
Now cow_file_range() returned -ENOSPC.
- btrfs_run_delalloc_range() do error cleanup <<< ROOT CAUSE
Call btrfs_cleanup_ordered_extents() with locked folio 1536K and range
[1568K, 1636K)
Function btrfs_cleanup_ordered_extents() normally needs to skip the
ranges inside the folio, as it will normally be cleaned up by
extent_writepage().
Such split error handling is already problematic in the first place.
What's worse is the folio range skipping itself, which is not taking
subpage cases into consideration at all, it will only skip the range
if the page start >= the range start.
In our case, the page start < the range start, since for subpage cases
we can have delalloc ranges inside the folio but not covering the
folio.
So it doesn't skip the page range at all.
This means all the ordered extents, both [1568K, 1584K) and
[1584K, 1596K) will be marked as IOERR.
And those two ordered extents have no more pending ios, it is marked
finished, and *QUEUED* to be deleted from the io tree.
- extent_writepage() do error cleanup
Call btrfs_mark_ordered_io_finished() for the range [1536K, 1600K).
Although ranges [1568K, 1584K) and [1584K, 1596K) are finished, the
deletion from io tree is async, it may or may not happen at this
timing.
If the ranges are not yet removed, we will do double cleaning on those
ranges, triggers the above ordered extent warnings.
In theory there are other bugs, like the cleanup in extent_writepage()
can cause double accounting on ranges that are submitted async
(compression for example).
But that's much harder to trigger because normally we do not mix regular
and compression delalloc ranges.
[FIX]
The folio range split is already buggy and not subpage compatible, it's
introduced a long time ago where subpage support is not even considered.
So instead of splitting the ordered extents cleanup into the folio range
and out of folio range, do all the cleanup inside writepage_delalloc().
- Pass @NULL as locked_folio for btrfs_cleanup_ordered_extents() in
btrfs_run_delalloc_range()
- Skip the btrfs_cleanup_ordered_extents() if writepage_delalloc()
failed
So all ordered extents are only cleaned up by
btrfs_run_delalloc_range().
- Handle the ranges that already have ordered extents allocated
If part of the folio already has ordered extent allocated, and
btrfs_run_delalloc_range() failed, we also need to cleanup that range.
Now we have a concentrated error handling for ordered extents during
btrfs_run_delalloc_range().
Cc: stable(a)vger.kernel.org # 5.15+
Fixes: d1051d6ebf8e ("btrfs: Fix error handling in btrfs_cleanup_ordered_extents")
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/extent_io.c | 37 ++++++++++++++++++++++++++++++++-----
fs/btrfs/inode.c | 2 +-
2 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9725ff7f274d..417c710c55ca 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1167,6 +1167,12 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
* last delalloc end.
*/
u64 last_delalloc_end = 0;
+ /*
+ * Save the last successfully ran delalloc range end (exclusive).
+ * This is for error handling to avoid ranges with ordered extent created
+ * but no IO will be submitted due to error.
+ */
+ u64 last_finished = page_start;
u64 delalloc_start = page_start;
u64 delalloc_end = page_end;
u64 delalloc_to_write = 0;
@@ -1235,11 +1241,19 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
found_len = last_delalloc_end + 1 - found_start;
if (ret >= 0) {
+ /*
+ * Some delalloc range may be created by previous folios.
+ * Thus we still need to clean those range up during error
+ * handling.
+ */
+ last_finished = found_start;
/* No errors hit so far, run the current delalloc range. */
ret = btrfs_run_delalloc_range(inode, folio,
found_start,
found_start + found_len - 1,
wbc);
+ if (ret >= 0)
+ last_finished = found_start + found_len;
} else {
/*
* We've hit an error during previous delalloc range,
@@ -1274,8 +1288,21 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
delalloc_start = found_start + found_len;
}
- if (ret < 0)
+ /*
+ * It's possible we have some ordered extents created before we hit
+ * an error, cleanup non-async successfully created delalloc ranges.
+ */
+ if (unlikely(ret < 0)) {
+ unsigned int bitmap_size = min(
+ (last_finished - page_start) >> fs_info->sectorsize_bits,
+ fs_info->sectors_per_page);
+
+ for_each_set_bit(bit, &bio_ctrl->submit_bitmap, bitmap_size)
+ btrfs_mark_ordered_io_finished(inode, folio,
+ page_start + (bit << fs_info->sectorsize_bits),
+ fs_info->sectorsize, false);
return ret;
+ }
out:
if (last_delalloc_end)
delalloc_end = last_delalloc_end;
@@ -1509,13 +1536,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
bio_ctrl->wbc->nr_to_write--;
-done:
- if (ret) {
+ if (ret)
btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
page_start, PAGE_SIZE, !ret);
- mapping_set_error(folio->mapping, ret);
- }
+done:
+ if (ret < 0)
+ mapping_set_error(folio->mapping, ret);
/*
* Only unlock ranges that are submitted. As there can be some async
* submitted ranges inside the folio.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c4997200dbb2..d41bb47d59fb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2305,7 +2305,7 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct folio *locked_fol
out:
if (ret < 0)
- btrfs_cleanup_ordered_extents(inode, locked_folio, start,
+ btrfs_cleanup_ordered_extents(inode, NULL, start,
end - start + 1);
return ret;
}
--
2.47.1