The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5f1ef0dfcb5b7f4a91a9b0e0ba533efd9f7e2cdb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011223-watch-majestic-5e5d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f1ef0dfcb5b7f4a91a9b0e0ba533efd9f7e2cdb Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Mon, 5 Jan 2026 20:31:41 -0500
Subject: [PATCH] tracing: Add recursion protection in kernel stack trace
recording
A bug was reported about an infinite recursion caused by tracing the rcu
events with the kernel stack trace trigger enabled. The stack trace code
called back into RCU which then called the stack trace again.
Expand the ftrace recursion protection to add a set of bits to protect
events from recursion. Each bit represents the context that the event is
in (normal, softirq, interrupt and NMI).
Have the stack trace code use the interrupt context to protect against
recursion.
Note, the bug showed an issue in both the RCU code as well as the tracing
stacktrace code. This only handles the tracing stack trace side of the
bug. The RCU fix will be handled separately.
Link: https://lore.kernel.org/all/20260102122807.7025fc87@gandalf.local.home/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Link: https://patch.msgid.link/20260105203141.515cd49f@gandalf.local.home
Reported-by: Yao Kai <yaokai34(a)huawei.com>
Tested-by: Yao Kai <yaokai34(a)huawei.com>
Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index ae04054a1be3..e6ca052b2a85 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -34,6 +34,13 @@ enum {
TRACE_INTERNAL_SIRQ_BIT,
TRACE_INTERNAL_TRANSITION_BIT,
+ /* Internal event use recursion bits */
+ TRACE_INTERNAL_EVENT_BIT,
+ TRACE_INTERNAL_EVENT_NMI_BIT,
+ TRACE_INTERNAL_EVENT_IRQ_BIT,
+ TRACE_INTERNAL_EVENT_SIRQ_BIT,
+ TRACE_INTERNAL_EVENT_TRANSITION_BIT,
+
TRACE_BRANCH_BIT,
/*
* Abuse of the trace_recursion.
@@ -58,6 +65,8 @@ enum {
#define TRACE_LIST_START TRACE_INTERNAL_BIT
+#define TRACE_EVENT_START TRACE_INTERNAL_EVENT_BIT
+
#define TRACE_CONTEXT_MASK ((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
/*
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6f2148df14d9..aef9058537d5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3012,6 +3012,11 @@ static void __ftrace_trace_stack(struct trace_array *tr,
struct ftrace_stack *fstack;
struct stack_entry *entry;
int stackidx;
+ int bit;
+
+ bit = trace_test_and_set_recursion(_THIS_IP_, _RET_IP_, TRACE_EVENT_START);
+ if (bit < 0)
+ return;
/*
* Add one, for this function and the call to save_stack_trace()
@@ -3080,6 +3085,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
/* Again, don't let gcc optimize things here */
barrier();
__this_cpu_dec(ftrace_stack_reserve);
+ trace_clear_recursion(bit);
}
static inline void ftrace_trace_stack(struct trace_array *tr,
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5f1ef0dfcb5b7f4a91a9b0e0ba533efd9f7e2cdb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011222-murmuring-promoter-9231@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f1ef0dfcb5b7f4a91a9b0e0ba533efd9f7e2cdb Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Mon, 5 Jan 2026 20:31:41 -0500
Subject: [PATCH] tracing: Add recursion protection in kernel stack trace
recording
A bug was reported about an infinite recursion caused by tracing the rcu
events with the kernel stack trace trigger enabled. The stack trace code
called back into RCU which then called the stack trace again.
Expand the ftrace recursion protection to add a set of bits to protect
events from recursion. Each bit represents the context that the event is
in (normal, softirq, interrupt and NMI).
Have the stack trace code use the interrupt context to protect against
recursion.
Note, the bug showed an issue in both the RCU code as well as the tracing
stacktrace code. This only handles the tracing stack trace side of the
bug. The RCU fix will be handled separately.
Link: https://lore.kernel.org/all/20260102122807.7025fc87@gandalf.local.home/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Link: https://patch.msgid.link/20260105203141.515cd49f@gandalf.local.home
Reported-by: Yao Kai <yaokai34(a)huawei.com>
Tested-by: Yao Kai <yaokai34(a)huawei.com>
Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index ae04054a1be3..e6ca052b2a85 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -34,6 +34,13 @@ enum {
TRACE_INTERNAL_SIRQ_BIT,
TRACE_INTERNAL_TRANSITION_BIT,
+ /* Internal event use recursion bits */
+ TRACE_INTERNAL_EVENT_BIT,
+ TRACE_INTERNAL_EVENT_NMI_BIT,
+ TRACE_INTERNAL_EVENT_IRQ_BIT,
+ TRACE_INTERNAL_EVENT_SIRQ_BIT,
+ TRACE_INTERNAL_EVENT_TRANSITION_BIT,
+
TRACE_BRANCH_BIT,
/*
* Abuse of the trace_recursion.
@@ -58,6 +65,8 @@ enum {
#define TRACE_LIST_START TRACE_INTERNAL_BIT
+#define TRACE_EVENT_START TRACE_INTERNAL_EVENT_BIT
+
#define TRACE_CONTEXT_MASK ((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
/*
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6f2148df14d9..aef9058537d5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3012,6 +3012,11 @@ static void __ftrace_trace_stack(struct trace_array *tr,
struct ftrace_stack *fstack;
struct stack_entry *entry;
int stackidx;
+ int bit;
+
+ bit = trace_test_and_set_recursion(_THIS_IP_, _RET_IP_, TRACE_EVENT_START);
+ if (bit < 0)
+ return;
/*
* Add one, for this function and the call to save_stack_trace()
@@ -3080,6 +3085,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
/* Again, don't let gcc optimize things here */
barrier();
__this_cpu_dec(ftrace_stack_reserve);
+ trace_clear_recursion(bit);
}
static inline void ftrace_trace_stack(struct trace_array *tr,
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 5f1ef0dfcb5b7f4a91a9b0e0ba533efd9f7e2cdb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011216-preacher-swampland-24d3@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5f1ef0dfcb5b7f4a91a9b0e0ba533efd9f7e2cdb Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Mon, 5 Jan 2026 20:31:41 -0500
Subject: [PATCH] tracing: Add recursion protection in kernel stack trace
recording
A bug was reported about an infinite recursion caused by tracing the rcu
events with the kernel stack trace trigger enabled. The stack trace code
called back into RCU which then called the stack trace again.
Expand the ftrace recursion protection to add a set of bits to protect
events from recursion. Each bit represents the context that the event is
in (normal, softirq, interrupt and NMI).
Have the stack trace code use the interrupt context to protect against
recursion.
Note, the bug showed an issue in both the RCU code as well as the tracing
stacktrace code. This only handles the tracing stack trace side of the
bug. The RCU fix will be handled separately.
Link: https://lore.kernel.org/all/20260102122807.7025fc87@gandalf.local.home/
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Link: https://patch.msgid.link/20260105203141.515cd49f@gandalf.local.home
Reported-by: Yao Kai <yaokai34(a)huawei.com>
Tested-by: Yao Kai <yaokai34(a)huawei.com>
Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index ae04054a1be3..e6ca052b2a85 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -34,6 +34,13 @@ enum {
TRACE_INTERNAL_SIRQ_BIT,
TRACE_INTERNAL_TRANSITION_BIT,
+ /* Internal event use recursion bits */
+ TRACE_INTERNAL_EVENT_BIT,
+ TRACE_INTERNAL_EVENT_NMI_BIT,
+ TRACE_INTERNAL_EVENT_IRQ_BIT,
+ TRACE_INTERNAL_EVENT_SIRQ_BIT,
+ TRACE_INTERNAL_EVENT_TRANSITION_BIT,
+
TRACE_BRANCH_BIT,
/*
* Abuse of the trace_recursion.
@@ -58,6 +65,8 @@ enum {
#define TRACE_LIST_START TRACE_INTERNAL_BIT
+#define TRACE_EVENT_START TRACE_INTERNAL_EVENT_BIT
+
#define TRACE_CONTEXT_MASK ((1 << (TRACE_LIST_START + TRACE_CONTEXT_BITS)) - 1)
/*
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6f2148df14d9..aef9058537d5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3012,6 +3012,11 @@ static void __ftrace_trace_stack(struct trace_array *tr,
struct ftrace_stack *fstack;
struct stack_entry *entry;
int stackidx;
+ int bit;
+
+ bit = trace_test_and_set_recursion(_THIS_IP_, _RET_IP_, TRACE_EVENT_START);
+ if (bit < 0)
+ return;
/*
* Add one, for this function and the call to save_stack_trace()
@@ -3080,6 +3085,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
/* Again, don't let gcc optimize things here */
barrier();
__this_cpu_dec(ftrace_stack_reserve);
+ trace_clear_recursion(bit);
}
static inline void ftrace_trace_stack(struct trace_array *tr,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e9e3b22ddfa760762b696ac6417c8d6edd182e49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011247-sharper-eel-d578@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e9e3b22ddfa760762b696ac6417c8d6edd182e49 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Thu, 11 Dec 2025 12:45:17 +1030
Subject: [PATCH] btrfs: fix beyond-EOF write handling
[BUG]
For the following write sequence with 64K page size and 4K fs block size,
it will lead to file extent items to be inserted without any data
checksum:
mkfs.btrfs -s 4k -f $dev > /dev/null
mount $dev $mnt
xfs_io -f -c "pwrite 0 16k" -c "pwrite 32k 4k" -c pwrite "60k 64K" \
-c "truncate 16k" $mnt/foobar
umount $mnt
This will result the following 2 file extent items to be inserted (extra
trace point added to insert_ordered_extent_file_extent()):
btrfs_finish_one_ordered: root=5 ino=257 file_off=61440 num_bytes=4096 csum_bytes=0
btrfs_finish_one_ordered: root=5 ino=257 file_off=0 num_bytes=16384 csum_bytes=16384
Note for file offset 60K, we're inserting a file extent without any
data checksum.
Also note that range [32K, 36K) didn't reach
insert_ordered_extent_file_extent(), which is the correct behavior as
that OE is fully truncated, should not result any file extent.
Although file extent at 60K will be later dropped by btrfs_truncate(),
if the transaction got committed after file extent inserted but before
the file extent dropping, we will have a small window where we have a
file extent beyond EOF and without any data checksum.
That will cause "btrfs check" to report error.
[CAUSE]
The sequence happens like this:
- Buffered write dirtied the page cache and updated isize
Now the inode size is 64K, with the following page cache layout:
0 16K 32K 48K 64K
|/////////////| |//| |//|
- Truncate the inode to 16K
Which will trigger writeback through:
btrfs_setsize()
|- truncate_setsize()
| Now the inode size is set to 16K
|
|- btrfs_truncate()
|- btrfs_wait_ordered_range() for [16K, u64(-1)]
|- btrfs_fdatawrite_range() for [16K, u64(-1)}
|- extent_writepage() for folio 0
|- writepage_delalloc()
| Generated OE for [0, 16K), [32K, 36K] and [60K, 64K)
|
|- extent_writepage_io()
Then inside extent_writepage_io(), the dirty fs blocks are handled
differently:
- Submit write for range [0, 16K)
As they are still inside the inode size (16K).
- Mark OE [32K, 36K) as truncated
Since we only call btrfs_lookup_first_ordered_range() once, which
returned the first OE after file offset 16K.
- Mark all OEs inside range [16K, 64K) as finished
Which will mark OE ranges [32K, 36K) and [60K, 64K) as finished.
For OE [32K, 36K) since it's already marked as truncated, and its
truncated length is 0, no file extent will be inserted.
For OE [60K, 64K) it has never been submitted thus has no data
checksum, and we insert the file extent as usual.
This is the root cause of file extent at 60K to be inserted without
any data checksum.
- Clear dirty flags for range [16K, 64K)
It is the function btrfs_folio_clear_dirty() which searches and clears
any dirty blocks inside that range.
[FIX]
The bug itself was introduced a long time ago, way before subpage and
large folio support.
At that time, fs block size must match page size, thus the range
[cur, end) is just one fs block.
But later with subpage and large folios, the same range [cur, end)
can have multiple blocks and ordered extents.
Later commit 18de34daa7c6 ("btrfs: truncate ordered extent when skipping
writeback past i_size") was fixing a bug related to subpage/large
folios, but it's still utilizing the old range [cur, end), meaning only
the first OE will be marked as truncated.
The proper fix here is to make EOF handling block-by-block, not trying
to handle the whole range to @end.
By this we always locate and truncate the OE for every dirty block.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d32dfc34ae3..97748d0d54d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1728,7 +1728,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_ordered_extent *ordered;
ordered = btrfs_lookup_first_ordered_range(inode, cur,
- folio_end - cur);
+ fs_info->sectorsize);
/*
* We have just run delalloc before getting here, so
* there must be an ordered extent.
@@ -1742,7 +1742,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
btrfs_put_ordered_extent(ordered);
btrfs_mark_ordered_io_finished(inode, folio, cur,
- end - cur, true);
+ fs_info->sectorsize, true);
/*
* This range is beyond i_size, thus we don't need to
* bother writing back.
@@ -1751,8 +1751,8 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
* writeback the sectors with subpage dirty bits,
* causing writeback without ordered extent.
*/
- btrfs_folio_clear_dirty(fs_info, folio, cur, end - cur);
- break;
+ btrfs_folio_clear_dirty(fs_info, folio, cur, fs_info->sectorsize);
+ continue;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
if (unlikely(ret < 0)) {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x e9e3b22ddfa760762b696ac6417c8d6edd182e49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011245-mummified-entree-046b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e9e3b22ddfa760762b696ac6417c8d6edd182e49 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Thu, 11 Dec 2025 12:45:17 +1030
Subject: [PATCH] btrfs: fix beyond-EOF write handling
[BUG]
For the following write sequence with 64K page size and 4K fs block size,
it will lead to file extent items to be inserted without any data
checksum:
mkfs.btrfs -s 4k -f $dev > /dev/null
mount $dev $mnt
xfs_io -f -c "pwrite 0 16k" -c "pwrite 32k 4k" -c pwrite "60k 64K" \
-c "truncate 16k" $mnt/foobar
umount $mnt
This will result the following 2 file extent items to be inserted (extra
trace point added to insert_ordered_extent_file_extent()):
btrfs_finish_one_ordered: root=5 ino=257 file_off=61440 num_bytes=4096 csum_bytes=0
btrfs_finish_one_ordered: root=5 ino=257 file_off=0 num_bytes=16384 csum_bytes=16384
Note for file offset 60K, we're inserting a file extent without any
data checksum.
Also note that range [32K, 36K) didn't reach
insert_ordered_extent_file_extent(), which is the correct behavior as
that OE is fully truncated, should not result any file extent.
Although file extent at 60K will be later dropped by btrfs_truncate(),
if the transaction got committed after file extent inserted but before
the file extent dropping, we will have a small window where we have a
file extent beyond EOF and without any data checksum.
That will cause "btrfs check" to report error.
[CAUSE]
The sequence happens like this:
- Buffered write dirtied the page cache and updated isize
Now the inode size is 64K, with the following page cache layout:
0 16K 32K 48K 64K
|/////////////| |//| |//|
- Truncate the inode to 16K
Which will trigger writeback through:
btrfs_setsize()
|- truncate_setsize()
| Now the inode size is set to 16K
|
|- btrfs_truncate()
|- btrfs_wait_ordered_range() for [16K, u64(-1)]
|- btrfs_fdatawrite_range() for [16K, u64(-1)}
|- extent_writepage() for folio 0
|- writepage_delalloc()
| Generated OE for [0, 16K), [32K, 36K] and [60K, 64K)
|
|- extent_writepage_io()
Then inside extent_writepage_io(), the dirty fs blocks are handled
differently:
- Submit write for range [0, 16K)
As they are still inside the inode size (16K).
- Mark OE [32K, 36K) as truncated
Since we only call btrfs_lookup_first_ordered_range() once, which
returned the first OE after file offset 16K.
- Mark all OEs inside range [16K, 64K) as finished
Which will mark OE ranges [32K, 36K) and [60K, 64K) as finished.
For OE [32K, 36K) since it's already marked as truncated, and its
truncated length is 0, no file extent will be inserted.
For OE [60K, 64K) it has never been submitted thus has no data
checksum, and we insert the file extent as usual.
This is the root cause of file extent at 60K to be inserted without
any data checksum.
- Clear dirty flags for range [16K, 64K)
It is the function btrfs_folio_clear_dirty() which searches and clears
any dirty blocks inside that range.
[FIX]
The bug itself was introduced a long time ago, way before subpage and
large folio support.
At that time, fs block size must match page size, thus the range
[cur, end) is just one fs block.
But later with subpage and large folios, the same range [cur, end)
can have multiple blocks and ordered extents.
Later commit 18de34daa7c6 ("btrfs: truncate ordered extent when skipping
writeback past i_size") was fixing a bug related to subpage/large
folios, but it's still utilizing the old range [cur, end), meaning only
the first OE will be marked as truncated.
The proper fix here is to make EOF handling block-by-block, not trying
to handle the whole range to @end.
By this we always locate and truncate the OE for every dirty block.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d32dfc34ae3..97748d0d54d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1728,7 +1728,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_ordered_extent *ordered;
ordered = btrfs_lookup_first_ordered_range(inode, cur,
- folio_end - cur);
+ fs_info->sectorsize);
/*
* We have just run delalloc before getting here, so
* there must be an ordered extent.
@@ -1742,7 +1742,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
btrfs_put_ordered_extent(ordered);
btrfs_mark_ordered_io_finished(inode, folio, cur,
- end - cur, true);
+ fs_info->sectorsize, true);
/*
* This range is beyond i_size, thus we don't need to
* bother writing back.
@@ -1751,8 +1751,8 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
* writeback the sectors with subpage dirty bits,
* causing writeback without ordered extent.
*/
- btrfs_folio_clear_dirty(fs_info, folio, cur, end - cur);
- break;
+ btrfs_folio_clear_dirty(fs_info, folio, cur, fs_info->sectorsize);
+ continue;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
if (unlikely(ret < 0)) {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x e9e3b22ddfa760762b696ac6417c8d6edd182e49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011243-manor-perkiness-7a2a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e9e3b22ddfa760762b696ac6417c8d6edd182e49 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Thu, 11 Dec 2025 12:45:17 +1030
Subject: [PATCH] btrfs: fix beyond-EOF write handling
[BUG]
For the following write sequence with 64K page size and 4K fs block size,
it will lead to file extent items to be inserted without any data
checksum:
mkfs.btrfs -s 4k -f $dev > /dev/null
mount $dev $mnt
xfs_io -f -c "pwrite 0 16k" -c "pwrite 32k 4k" -c pwrite "60k 64K" \
-c "truncate 16k" $mnt/foobar
umount $mnt
This will result the following 2 file extent items to be inserted (extra
trace point added to insert_ordered_extent_file_extent()):
btrfs_finish_one_ordered: root=5 ino=257 file_off=61440 num_bytes=4096 csum_bytes=0
btrfs_finish_one_ordered: root=5 ino=257 file_off=0 num_bytes=16384 csum_bytes=16384
Note for file offset 60K, we're inserting a file extent without any
data checksum.
Also note that range [32K, 36K) didn't reach
insert_ordered_extent_file_extent(), which is the correct behavior as
that OE is fully truncated, should not result any file extent.
Although file extent at 60K will be later dropped by btrfs_truncate(),
if the transaction got committed after file extent inserted but before
the file extent dropping, we will have a small window where we have a
file extent beyond EOF and without any data checksum.
That will cause "btrfs check" to report error.
[CAUSE]
The sequence happens like this:
- Buffered write dirtied the page cache and updated isize
Now the inode size is 64K, with the following page cache layout:
0 16K 32K 48K 64K
|/////////////| |//| |//|
- Truncate the inode to 16K
Which will trigger writeback through:
btrfs_setsize()
|- truncate_setsize()
| Now the inode size is set to 16K
|
|- btrfs_truncate()
|- btrfs_wait_ordered_range() for [16K, u64(-1)]
|- btrfs_fdatawrite_range() for [16K, u64(-1)}
|- extent_writepage() for folio 0
|- writepage_delalloc()
| Generated OE for [0, 16K), [32K, 36K] and [60K, 64K)
|
|- extent_writepage_io()
Then inside extent_writepage_io(), the dirty fs blocks are handled
differently:
- Submit write for range [0, 16K)
As they are still inside the inode size (16K).
- Mark OE [32K, 36K) as truncated
Since we only call btrfs_lookup_first_ordered_range() once, which
returned the first OE after file offset 16K.
- Mark all OEs inside range [16K, 64K) as finished
Which will mark OE ranges [32K, 36K) and [60K, 64K) as finished.
For OE [32K, 36K) since it's already marked as truncated, and its
truncated length is 0, no file extent will be inserted.
For OE [60K, 64K) it has never been submitted thus has no data
checksum, and we insert the file extent as usual.
This is the root cause of file extent at 60K to be inserted without
any data checksum.
- Clear dirty flags for range [16K, 64K)
It is the function btrfs_folio_clear_dirty() which searches and clears
any dirty blocks inside that range.
[FIX]
The bug itself was introduced a long time ago, way before subpage and
large folio support.
At that time, fs block size must match page size, thus the range
[cur, end) is just one fs block.
But later with subpage and large folios, the same range [cur, end)
can have multiple blocks and ordered extents.
Later commit 18de34daa7c6 ("btrfs: truncate ordered extent when skipping
writeback past i_size") was fixing a bug related to subpage/large
folios, but it's still utilizing the old range [cur, end), meaning only
the first OE will be marked as truncated.
The proper fix here is to make EOF handling block-by-block, not trying
to handle the whole range to @end.
By this we always locate and truncate the OE for every dirty block.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d32dfc34ae3..97748d0d54d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1728,7 +1728,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_ordered_extent *ordered;
ordered = btrfs_lookup_first_ordered_range(inode, cur,
- folio_end - cur);
+ fs_info->sectorsize);
/*
* We have just run delalloc before getting here, so
* there must be an ordered extent.
@@ -1742,7 +1742,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
btrfs_put_ordered_extent(ordered);
btrfs_mark_ordered_io_finished(inode, folio, cur,
- end - cur, true);
+ fs_info->sectorsize, true);
/*
* This range is beyond i_size, thus we don't need to
* bother writing back.
@@ -1751,8 +1751,8 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
* writeback the sectors with subpage dirty bits,
* causing writeback without ordered extent.
*/
- btrfs_folio_clear_dirty(fs_info, folio, cur, end - cur);
- break;
+ btrfs_folio_clear_dirty(fs_info, folio, cur, fs_info->sectorsize);
+ continue;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
if (unlikely(ret < 0)) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x c6c209ceb87f64a6ceebe61761951dcbbf4a0baa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011224-freezing-captive-52e7@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6c209ceb87f64a6ceebe61761951dcbbf4a0baa Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Tue, 9 Dec 2025 19:28:49 -0500
Subject: [PATCH] NFSD: Remove NFSERR_EAGAIN
I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881.
None of these RFCs have an NFS status code that match the numeric
value "11".
Based on the meaning of the EAGAIN errno, I presume the use of this
status in NFSD means NFS4ERR_DELAY. So replace the one usage of
nfserr_eagain, and remove it from NFSD's NFS status conversion
tables.
As far as I can tell, NFSERR_EAGAIN has existed since the pre-git
era, but was not actually used by any code until commit f4e44b393389
("NFSD: delay unmount source's export after inter-server copy
completed."), at which time it become possible for NFSD to return
a status code of 11 (which is not valid NFS protocol).
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable(a)vger.kernel.org
Reviewed-by: NeilBrown <neil(a)brown.name>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c
index af09aed09fd2..0778743ae2c2 100644
--- a/fs/nfs_common/common.c
+++ b/fs/nfs_common/common.c
@@ -17,7 +17,6 @@ static const struct {
{ NFSERR_NOENT, -ENOENT },
{ NFSERR_IO, -EIO },
{ NFSERR_NXIO, -ENXIO },
-/* { NFSERR_EAGAIN, -EAGAIN }, */
{ NFSERR_ACCES, -EACCES },
{ NFSERR_EXIST, -EEXIST },
{ NFSERR_XDEV, -EXDEV },
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 7f7e6bb23a90..42a6b914c0fe 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1506,7 +1506,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr,
(schedule_timeout(20*HZ) == 0)) {
finish_wait(&nn->nfsd_ssc_waitq, &wait);
kfree(work);
- return nfserr_eagain;
+ return nfserr_jukebox;
}
finish_wait(&nn->nfsd_ssc_waitq, &wait);
goto try_again;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 50be785f1d2c..b0283213a8f5 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -233,7 +233,6 @@ void nfsd_lockd_shutdown(void);
#define nfserr_noent cpu_to_be32(NFSERR_NOENT)
#define nfserr_io cpu_to_be32(NFSERR_IO)
#define nfserr_nxio cpu_to_be32(NFSERR_NXIO)
-#define nfserr_eagain cpu_to_be32(NFSERR_EAGAIN)
#define nfserr_acces cpu_to_be32(NFSERR_ACCES)
#define nfserr_exist cpu_to_be32(NFSERR_EXIST)
#define nfserr_xdev cpu_to_be32(NFSERR_XDEV)
diff --git a/include/trace/misc/nfs.h b/include/trace/misc/nfs.h
index c82233e950ac..a394b4d38e18 100644
--- a/include/trace/misc/nfs.h
+++ b/include/trace/misc/nfs.h
@@ -16,7 +16,6 @@ TRACE_DEFINE_ENUM(NFSERR_PERM);
TRACE_DEFINE_ENUM(NFSERR_NOENT);
TRACE_DEFINE_ENUM(NFSERR_IO);
TRACE_DEFINE_ENUM(NFSERR_NXIO);
-TRACE_DEFINE_ENUM(NFSERR_EAGAIN);
TRACE_DEFINE_ENUM(NFSERR_ACCES);
TRACE_DEFINE_ENUM(NFSERR_EXIST);
TRACE_DEFINE_ENUM(NFSERR_XDEV);
@@ -52,7 +51,6 @@ TRACE_DEFINE_ENUM(NFSERR_JUKEBOX);
{ NFSERR_NXIO, "NXIO" }, \
{ ECHILD, "CHILD" }, \
{ ETIMEDOUT, "TIMEDOUT" }, \
- { NFSERR_EAGAIN, "AGAIN" }, \
{ NFSERR_ACCES, "ACCES" }, \
{ NFSERR_EXIST, "EXIST" }, \
{ NFSERR_XDEV, "XDEV" }, \
diff --git a/include/uapi/linux/nfs.h b/include/uapi/linux/nfs.h
index f356f2ba3814..71c7196d3281 100644
--- a/include/uapi/linux/nfs.h
+++ b/include/uapi/linux/nfs.h
@@ -49,7 +49,6 @@
NFSERR_NOENT = 2, /* v2 v3 v4 */
NFSERR_IO = 5, /* v2 v3 v4 */
NFSERR_NXIO = 6, /* v2 v3 v4 */
- NFSERR_EAGAIN = 11, /* v2 v3 */
NFSERR_ACCES = 13, /* v2 v3 v4 */
NFSERR_EXIST = 17, /* v2 v3 v4 */
NFSERR_XDEV = 18, /* v3 v4 */
This is the start of the stable review cycle for the 6.12.65 release.
There are 16 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 11 Jan 2026 11:19:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.65-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.65-rc1
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "iommu/amd: Skip enabling command/event buffers for kdump"
Sean Nyekjaer <sean(a)geanix.com>
pwm: stm32: Always program polarity
Maximilian Immanuel Brandtner <maxbr(a)linux.ibm.com>
virtio_console: fix order of fields cols and rows
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Proportional newidle balance
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to update_newidle_cost()
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to sched_balance_newidle()
Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
Richa Bharti <richa.bharti(a)siemens.com>
cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writes
Natalie Vock <natalie.vock(a)gmx.de>
drm/amdgpu: Forward VMID reservation errors
Miaoqian Lin <linmq006(a)gmail.com>
net: phy: mediatek: fix nvmem cell reference leak in mt798x_phy_calibration
Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
wifi: mac80211: Discard Beacon frames to non-broadcast address
Paolo Abeni <pabeni(a)redhat.com>
mptcp: ensure context reset on disconnect()
Bijan Tabatabai <bijan311(a)gmail.com>
mm: consider non-anon swap cache folios in folio_expected_ref_count()
David Hildenbrand <david(a)redhat.com>
mm: simplify folio_expected_ref_count()
Alexander Gordeev <agordeev(a)linux.ibm.com>
mm/page_alloc: change all pageblocks migrate type on coalescing
Paolo Abeni <pabeni(a)redhat.com>
mptcp: fallback earlier on simult connection
-------------
Diffstat:
Makefile | 4 +--
drivers/char/virtio_console.c | 2 +-
drivers/cpufreq/intel_pstate.c | 9 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++--
drivers/iommu/amd/init.c | 28 +++++----------
drivers/net/phy/mediatek-ge-soc.c | 2 +-
drivers/pwm/pwm-stm32.c | 3 +-
include/linux/if_bridge.h | 6 ++--
include/linux/mm.h | 10 +++---
include/linux/sched/topology.h | 3 ++
kernel/sched/core.c | 3 ++
kernel/sched/fair.c | 65 +++++++++++++++++++++++++++-------
kernel/sched/features.h | 5 +++
kernel/sched/sched.h | 7 ++++
kernel/sched/topology.c | 6 ++++
mm/page_alloc.c | 24 ++++++-------
net/bridge/br_ioctl.c | 36 +++++++++++++++++--
net/bridge/br_private.h | 3 +-
net/core/dev_ioctl.c | 16 ---------
net/mac80211/rx.c | 5 +++
net/mptcp/options.c | 10 ++++++
net/mptcp/protocol.c | 8 +++--
net/mptcp/protocol.h | 9 +++--
net/mptcp/subflow.c | 10 +-----
net/socket.c | 19 +++++-----
25 files changed, 186 insertions(+), 113 deletions(-)
This is the start of the stable review cycle for the 6.18.5 release.
There are 5 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 11 Jan 2026 11:19:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.18.5-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.18.5-rc1
Mike Snitzer <snitzer(a)kernel.org>
nfs/localio: fix regression due to out-of-order __put_cred
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Proportional newidle balance
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to update_newidle_cost()
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to sched_balance_newidle()
Paolo Abeni <pabeni(a)redhat.com>
mptcp: ensure context reset on disconnect()
-------------
Diffstat:
Makefile | 4 +--
fs/nfs/localio.c | 12 ++++----
include/linux/sched/topology.h | 3 ++
kernel/sched/core.c | 3 ++
kernel/sched/fair.c | 65 ++++++++++++++++++++++++++++++++++--------
kernel/sched/features.h | 5 ++++
kernel/sched/sched.h | 7 +++++
kernel/sched/topology.c | 6 ++++
net/mptcp/protocol.c | 8 ++++--
net/mptcp/protocol.h | 3 +-
10 files changed, 92 insertions(+), 24 deletions(-)