The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 23f9485510c338476b9735d516c1d4aacb810d46
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011244-unbaked-pajamas-9c74@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23f9485510c338476b9735d516c1d4aacb810d46 Mon Sep 17 00:00:00 2001
From: Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
Date: Tue, 18 Nov 2025 09:35:48 +0100
Subject: [PATCH] counter: interrupt-cnt: Drop IRQF_NO_THREAD flag
An IRQ handler can either be IRQF_NO_THREAD or acquire spinlock_t, as
CONFIG_PROVE_RAW_LOCK_NESTING warns:
=============================
[ BUG: Invalid wait context ]
6.18.0-rc1+git... #1
-----------------------------
some-user-space-process/1251 is trying to lock:
(&counter->events_list_lock){....}-{3:3}, at: counter_push_event [counter]
other info that might help us debug this:
context-{2:2}
no locks held by some-user-space-process/....
stack backtrace:
CPU: 0 UID: 0 PID: 1251 Comm: some-user-space-process 6.18.0-rc1+git... #1 PREEMPT
Call trace:
show_stack (C)
dump_stack_lvl
dump_stack
__lock_acquire
lock_acquire
_raw_spin_lock_irqsave
counter_push_event [counter]
interrupt_cnt_isr [interrupt_cnt]
__handle_irq_event_percpu
handle_irq_event
handle_simple_irq
handle_irq_desc
generic_handle_domain_irq
gpio_irq_handler
handle_irq_desc
generic_handle_domain_irq
gic_handle_irq
call_on_irq_stack
do_interrupt_handler
el0_interrupt
__el0_irq_handler_common
el0t_64_irq_handler
el0t_64_irq
... and Sebastian correctly points out. Remove IRQF_NO_THREAD as an
alternative to switching to raw_spinlock_t, because the latter would limit
all potential nested locks to raw_spinlock_t only.
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20251117151314.xwLAZrWY@linutronix.de/
Fixes: a55ebd47f21f ("counter: add IRQ or GPIO based counter")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Reviewed-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Link: https://lore.kernel.org/r/20251118083603.778626-1-alexander.sverdlin@siemen…
Signed-off-by: William Breathitt Gray <wbg(a)kernel.org>
diff --git a/drivers/counter/interrupt-cnt.c b/drivers/counter/interrupt-cnt.c
index 6c0c1d2d7027..e6100b5fb082 100644
--- a/drivers/counter/interrupt-cnt.c
+++ b/drivers/counter/interrupt-cnt.c
@@ -229,8 +229,7 @@ static int interrupt_cnt_probe(struct platform_device *pdev)
irq_set_status_flags(priv->irq, IRQ_NOAUTOEN);
ret = devm_request_irq(dev, priv->irq, interrupt_cnt_isr,
- IRQF_TRIGGER_RISING | IRQF_NO_THREAD,
- dev_name(dev), counter);
+ IRQF_TRIGGER_RISING, dev_name(dev), counter);
if (ret)
return ret;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e9e3b22ddfa760762b696ac6417c8d6edd182e49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011247-sharper-eel-d578@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e9e3b22ddfa760762b696ac6417c8d6edd182e49 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Thu, 11 Dec 2025 12:45:17 +1030
Subject: [PATCH] btrfs: fix beyond-EOF write handling
[BUG]
For the following write sequence with 64K page size and 4K fs block size,
it will lead to file extent items to be inserted without any data
checksum:
mkfs.btrfs -s 4k -f $dev > /dev/null
mount $dev $mnt
xfs_io -f -c "pwrite 0 16k" -c "pwrite 32k 4k" -c pwrite "60k 64K" \
-c "truncate 16k" $mnt/foobar
umount $mnt
This will result the following 2 file extent items to be inserted (extra
trace point added to insert_ordered_extent_file_extent()):
btrfs_finish_one_ordered: root=5 ino=257 file_off=61440 num_bytes=4096 csum_bytes=0
btrfs_finish_one_ordered: root=5 ino=257 file_off=0 num_bytes=16384 csum_bytes=16384
Note for file offset 60K, we're inserting a file extent without any
data checksum.
Also note that range [32K, 36K) didn't reach
insert_ordered_extent_file_extent(), which is the correct behavior as
that OE is fully truncated, should not result any file extent.
Although file extent at 60K will be later dropped by btrfs_truncate(),
if the transaction got committed after file extent inserted but before
the file extent dropping, we will have a small window where we have a
file extent beyond EOF and without any data checksum.
That will cause "btrfs check" to report error.
[CAUSE]
The sequence happens like this:
- Buffered write dirtied the page cache and updated isize
Now the inode size is 64K, with the following page cache layout:
0 16K 32K 48K 64K
|/////////////| |//| |//|
- Truncate the inode to 16K
Which will trigger writeback through:
btrfs_setsize()
|- truncate_setsize()
| Now the inode size is set to 16K
|
|- btrfs_truncate()
|- btrfs_wait_ordered_range() for [16K, u64(-1)]
|- btrfs_fdatawrite_range() for [16K, u64(-1)}
|- extent_writepage() for folio 0
|- writepage_delalloc()
| Generated OE for [0, 16K), [32K, 36K] and [60K, 64K)
|
|- extent_writepage_io()
Then inside extent_writepage_io(), the dirty fs blocks are handled
differently:
- Submit write for range [0, 16K)
As they are still inside the inode size (16K).
- Mark OE [32K, 36K) as truncated
Since we only call btrfs_lookup_first_ordered_range() once, which
returned the first OE after file offset 16K.
- Mark all OEs inside range [16K, 64K) as finished
Which will mark OE ranges [32K, 36K) and [60K, 64K) as finished.
For OE [32K, 36K) since it's already marked as truncated, and its
truncated length is 0, no file extent will be inserted.
For OE [60K, 64K) it has never been submitted thus has no data
checksum, and we insert the file extent as usual.
This is the root cause of file extent at 60K to be inserted without
any data checksum.
- Clear dirty flags for range [16K, 64K)
It is the function btrfs_folio_clear_dirty() which searches and clears
any dirty blocks inside that range.
[FIX]
The bug itself was introduced a long time ago, way before subpage and
large folio support.
At that time, fs block size must match page size, thus the range
[cur, end) is just one fs block.
But later with subpage and large folios, the same range [cur, end)
can have multiple blocks and ordered extents.
Later commit 18de34daa7c6 ("btrfs: truncate ordered extent when skipping
writeback past i_size") was fixing a bug related to subpage/large
folios, but it's still utilizing the old range [cur, end), meaning only
the first OE will be marked as truncated.
The proper fix here is to make EOF handling block-by-block, not trying
to handle the whole range to @end.
By this we always locate and truncate the OE for every dirty block.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d32dfc34ae3..97748d0d54d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1728,7 +1728,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_ordered_extent *ordered;
ordered = btrfs_lookup_first_ordered_range(inode, cur,
- folio_end - cur);
+ fs_info->sectorsize);
/*
* We have just run delalloc before getting here, so
* there must be an ordered extent.
@@ -1742,7 +1742,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
btrfs_put_ordered_extent(ordered);
btrfs_mark_ordered_io_finished(inode, folio, cur,
- end - cur, true);
+ fs_info->sectorsize, true);
/*
* This range is beyond i_size, thus we don't need to
* bother writing back.
@@ -1751,8 +1751,8 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
* writeback the sectors with subpage dirty bits,
* causing writeback without ordered extent.
*/
- btrfs_folio_clear_dirty(fs_info, folio, cur, end - cur);
- break;
+ btrfs_folio_clear_dirty(fs_info, folio, cur, fs_info->sectorsize);
+ continue;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
if (unlikely(ret < 0)) {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x e9e3b22ddfa760762b696ac6417c8d6edd182e49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011245-mummified-entree-046b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e9e3b22ddfa760762b696ac6417c8d6edd182e49 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Thu, 11 Dec 2025 12:45:17 +1030
Subject: [PATCH] btrfs: fix beyond-EOF write handling
[BUG]
For the following write sequence with 64K page size and 4K fs block size,
it will lead to file extent items to be inserted without any data
checksum:
mkfs.btrfs -s 4k -f $dev > /dev/null
mount $dev $mnt
xfs_io -f -c "pwrite 0 16k" -c "pwrite 32k 4k" -c pwrite "60k 64K" \
-c "truncate 16k" $mnt/foobar
umount $mnt
This will result the following 2 file extent items to be inserted (extra
trace point added to insert_ordered_extent_file_extent()):
btrfs_finish_one_ordered: root=5 ino=257 file_off=61440 num_bytes=4096 csum_bytes=0
btrfs_finish_one_ordered: root=5 ino=257 file_off=0 num_bytes=16384 csum_bytes=16384
Note for file offset 60K, we're inserting a file extent without any
data checksum.
Also note that range [32K, 36K) didn't reach
insert_ordered_extent_file_extent(), which is the correct behavior as
that OE is fully truncated, should not result any file extent.
Although file extent at 60K will be later dropped by btrfs_truncate(),
if the transaction got committed after file extent inserted but before
the file extent dropping, we will have a small window where we have a
file extent beyond EOF and without any data checksum.
That will cause "btrfs check" to report error.
[CAUSE]
The sequence happens like this:
- Buffered write dirtied the page cache and updated isize
Now the inode size is 64K, with the following page cache layout:
0 16K 32K 48K 64K
|/////////////| |//| |//|
- Truncate the inode to 16K
Which will trigger writeback through:
btrfs_setsize()
|- truncate_setsize()
| Now the inode size is set to 16K
|
|- btrfs_truncate()
|- btrfs_wait_ordered_range() for [16K, u64(-1)]
|- btrfs_fdatawrite_range() for [16K, u64(-1)}
|- extent_writepage() for folio 0
|- writepage_delalloc()
| Generated OE for [0, 16K), [32K, 36K] and [60K, 64K)
|
|- extent_writepage_io()
Then inside extent_writepage_io(), the dirty fs blocks are handled
differently:
- Submit write for range [0, 16K)
As they are still inside the inode size (16K).
- Mark OE [32K, 36K) as truncated
Since we only call btrfs_lookup_first_ordered_range() once, which
returned the first OE after file offset 16K.
- Mark all OEs inside range [16K, 64K) as finished
Which will mark OE ranges [32K, 36K) and [60K, 64K) as finished.
For OE [32K, 36K) since it's already marked as truncated, and its
truncated length is 0, no file extent will be inserted.
For OE [60K, 64K) it has never been submitted thus has no data
checksum, and we insert the file extent as usual.
This is the root cause of file extent at 60K to be inserted without
any data checksum.
- Clear dirty flags for range [16K, 64K)
It is the function btrfs_folio_clear_dirty() which searches and clears
any dirty blocks inside that range.
[FIX]
The bug itself was introduced a long time ago, way before subpage and
large folio support.
At that time, fs block size must match page size, thus the range
[cur, end) is just one fs block.
But later with subpage and large folios, the same range [cur, end)
can have multiple blocks and ordered extents.
Later commit 18de34daa7c6 ("btrfs: truncate ordered extent when skipping
writeback past i_size") was fixing a bug related to subpage/large
folios, but it's still utilizing the old range [cur, end), meaning only
the first OE will be marked as truncated.
The proper fix here is to make EOF handling block-by-block, not trying
to handle the whole range to @end.
By this we always locate and truncate the OE for every dirty block.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d32dfc34ae3..97748d0d54d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1728,7 +1728,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_ordered_extent *ordered;
ordered = btrfs_lookup_first_ordered_range(inode, cur,
- folio_end - cur);
+ fs_info->sectorsize);
/*
* We have just run delalloc before getting here, so
* there must be an ordered extent.
@@ -1742,7 +1742,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
btrfs_put_ordered_extent(ordered);
btrfs_mark_ordered_io_finished(inode, folio, cur,
- end - cur, true);
+ fs_info->sectorsize, true);
/*
* This range is beyond i_size, thus we don't need to
* bother writing back.
@@ -1751,8 +1751,8 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
* writeback the sectors with subpage dirty bits,
* causing writeback without ordered extent.
*/
- btrfs_folio_clear_dirty(fs_info, folio, cur, end - cur);
- break;
+ btrfs_folio_clear_dirty(fs_info, folio, cur, fs_info->sectorsize);
+ continue;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
if (unlikely(ret < 0)) {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x e9e3b22ddfa760762b696ac6417c8d6edd182e49
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011243-manor-perkiness-7a2a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e9e3b22ddfa760762b696ac6417c8d6edd182e49 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Thu, 11 Dec 2025 12:45:17 +1030
Subject: [PATCH] btrfs: fix beyond-EOF write handling
[BUG]
For the following write sequence with 64K page size and 4K fs block size,
it will lead to file extent items to be inserted without any data
checksum:
mkfs.btrfs -s 4k -f $dev > /dev/null
mount $dev $mnt
xfs_io -f -c "pwrite 0 16k" -c "pwrite 32k 4k" -c pwrite "60k 64K" \
-c "truncate 16k" $mnt/foobar
umount $mnt
This will result the following 2 file extent items to be inserted (extra
trace point added to insert_ordered_extent_file_extent()):
btrfs_finish_one_ordered: root=5 ino=257 file_off=61440 num_bytes=4096 csum_bytes=0
btrfs_finish_one_ordered: root=5 ino=257 file_off=0 num_bytes=16384 csum_bytes=16384
Note for file offset 60K, we're inserting a file extent without any
data checksum.
Also note that range [32K, 36K) didn't reach
insert_ordered_extent_file_extent(), which is the correct behavior as
that OE is fully truncated, should not result any file extent.
Although file extent at 60K will be later dropped by btrfs_truncate(),
if the transaction got committed after file extent inserted but before
the file extent dropping, we will have a small window where we have a
file extent beyond EOF and without any data checksum.
That will cause "btrfs check" to report error.
[CAUSE]
The sequence happens like this:
- Buffered write dirtied the page cache and updated isize
Now the inode size is 64K, with the following page cache layout:
0 16K 32K 48K 64K
|/////////////| |//| |//|
- Truncate the inode to 16K
Which will trigger writeback through:
btrfs_setsize()
|- truncate_setsize()
| Now the inode size is set to 16K
|
|- btrfs_truncate()
|- btrfs_wait_ordered_range() for [16K, u64(-1)]
|- btrfs_fdatawrite_range() for [16K, u64(-1)}
|- extent_writepage() for folio 0
|- writepage_delalloc()
| Generated OE for [0, 16K), [32K, 36K] and [60K, 64K)
|
|- extent_writepage_io()
Then inside extent_writepage_io(), the dirty fs blocks are handled
differently:
- Submit write for range [0, 16K)
As they are still inside the inode size (16K).
- Mark OE [32K, 36K) as truncated
Since we only call btrfs_lookup_first_ordered_range() once, which
returned the first OE after file offset 16K.
- Mark all OEs inside range [16K, 64K) as finished
Which will mark OE ranges [32K, 36K) and [60K, 64K) as finished.
For OE [32K, 36K) since it's already marked as truncated, and its
truncated length is 0, no file extent will be inserted.
For OE [60K, 64K) it has never been submitted thus has no data
checksum, and we insert the file extent as usual.
This is the root cause of file extent at 60K to be inserted without
any data checksum.
- Clear dirty flags for range [16K, 64K)
It is the function btrfs_folio_clear_dirty() which searches and clears
any dirty blocks inside that range.
[FIX]
The bug itself was introduced a long time ago, way before subpage and
large folio support.
At that time, fs block size must match page size, thus the range
[cur, end) is just one fs block.
But later with subpage and large folios, the same range [cur, end)
can have multiple blocks and ordered extents.
Later commit 18de34daa7c6 ("btrfs: truncate ordered extent when skipping
writeback past i_size") was fixing a bug related to subpage/large
folios, but it's still utilizing the old range [cur, end), meaning only
the first OE will be marked as truncated.
The proper fix here is to make EOF handling block-by-block, not trying
to handle the whole range to @end.
By this we always locate and truncate the OE for every dirty block.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2d32dfc34ae3..97748d0d54d9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1728,7 +1728,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_ordered_extent *ordered;
ordered = btrfs_lookup_first_ordered_range(inode, cur,
- folio_end - cur);
+ fs_info->sectorsize);
/*
* We have just run delalloc before getting here, so
* there must be an ordered extent.
@@ -1742,7 +1742,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
btrfs_put_ordered_extent(ordered);
btrfs_mark_ordered_io_finished(inode, folio, cur,
- end - cur, true);
+ fs_info->sectorsize, true);
/*
* This range is beyond i_size, thus we don't need to
* bother writing back.
@@ -1751,8 +1751,8 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
* writeback the sectors with subpage dirty bits,
* causing writeback without ordered extent.
*/
- btrfs_folio_clear_dirty(fs_info, folio, cur, end - cur);
- break;
+ btrfs_folio_clear_dirty(fs_info, folio, cur, fs_info->sectorsize);
+ continue;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
if (unlikely(ret < 0)) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x c6c209ceb87f64a6ceebe61761951dcbbf4a0baa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011224-freezing-captive-52e7@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6c209ceb87f64a6ceebe61761951dcbbf4a0baa Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Tue, 9 Dec 2025 19:28:49 -0500
Subject: [PATCH] NFSD: Remove NFSERR_EAGAIN
I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881.
None of these RFCs have an NFS status code that match the numeric
value "11".
Based on the meaning of the EAGAIN errno, I presume the use of this
status in NFSD means NFS4ERR_DELAY. So replace the one usage of
nfserr_eagain, and remove it from NFSD's NFS status conversion
tables.
As far as I can tell, NFSERR_EAGAIN has existed since the pre-git
era, but was not actually used by any code until commit f4e44b393389
("NFSD: delay unmount source's export after inter-server copy
completed."), at which time it become possible for NFSD to return
a status code of 11 (which is not valid NFS protocol).
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable(a)vger.kernel.org
Reviewed-by: NeilBrown <neil(a)brown.name>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c
index af09aed09fd2..0778743ae2c2 100644
--- a/fs/nfs_common/common.c
+++ b/fs/nfs_common/common.c
@@ -17,7 +17,6 @@ static const struct {
{ NFSERR_NOENT, -ENOENT },
{ NFSERR_IO, -EIO },
{ NFSERR_NXIO, -ENXIO },
-/* { NFSERR_EAGAIN, -EAGAIN }, */
{ NFSERR_ACCES, -EACCES },
{ NFSERR_EXIST, -EEXIST },
{ NFSERR_XDEV, -EXDEV },
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 7f7e6bb23a90..42a6b914c0fe 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1506,7 +1506,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr,
(schedule_timeout(20*HZ) == 0)) {
finish_wait(&nn->nfsd_ssc_waitq, &wait);
kfree(work);
- return nfserr_eagain;
+ return nfserr_jukebox;
}
finish_wait(&nn->nfsd_ssc_waitq, &wait);
goto try_again;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 50be785f1d2c..b0283213a8f5 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -233,7 +233,6 @@ void nfsd_lockd_shutdown(void);
#define nfserr_noent cpu_to_be32(NFSERR_NOENT)
#define nfserr_io cpu_to_be32(NFSERR_IO)
#define nfserr_nxio cpu_to_be32(NFSERR_NXIO)
-#define nfserr_eagain cpu_to_be32(NFSERR_EAGAIN)
#define nfserr_acces cpu_to_be32(NFSERR_ACCES)
#define nfserr_exist cpu_to_be32(NFSERR_EXIST)
#define nfserr_xdev cpu_to_be32(NFSERR_XDEV)
diff --git a/include/trace/misc/nfs.h b/include/trace/misc/nfs.h
index c82233e950ac..a394b4d38e18 100644
--- a/include/trace/misc/nfs.h
+++ b/include/trace/misc/nfs.h
@@ -16,7 +16,6 @@ TRACE_DEFINE_ENUM(NFSERR_PERM);
TRACE_DEFINE_ENUM(NFSERR_NOENT);
TRACE_DEFINE_ENUM(NFSERR_IO);
TRACE_DEFINE_ENUM(NFSERR_NXIO);
-TRACE_DEFINE_ENUM(NFSERR_EAGAIN);
TRACE_DEFINE_ENUM(NFSERR_ACCES);
TRACE_DEFINE_ENUM(NFSERR_EXIST);
TRACE_DEFINE_ENUM(NFSERR_XDEV);
@@ -52,7 +51,6 @@ TRACE_DEFINE_ENUM(NFSERR_JUKEBOX);
{ NFSERR_NXIO, "NXIO" }, \
{ ECHILD, "CHILD" }, \
{ ETIMEDOUT, "TIMEDOUT" }, \
- { NFSERR_EAGAIN, "AGAIN" }, \
{ NFSERR_ACCES, "ACCES" }, \
{ NFSERR_EXIST, "EXIST" }, \
{ NFSERR_XDEV, "XDEV" }, \
diff --git a/include/uapi/linux/nfs.h b/include/uapi/linux/nfs.h
index f356f2ba3814..71c7196d3281 100644
--- a/include/uapi/linux/nfs.h
+++ b/include/uapi/linux/nfs.h
@@ -49,7 +49,6 @@
NFSERR_NOENT = 2, /* v2 v3 v4 */
NFSERR_IO = 5, /* v2 v3 v4 */
NFSERR_NXIO = 6, /* v2 v3 v4 */
- NFSERR_EAGAIN = 11, /* v2 v3 */
NFSERR_ACCES = 13, /* v2 v3 v4 */
NFSERR_EXIST = 17, /* v2 v3 v4 */
NFSERR_XDEV = 18, /* v3 v4 */
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2857bd59feb63fcf40fe4baf55401baea6b4feb4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011224-thesaurus-unmixed-7797@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2857bd59feb63fcf40fe4baf55401baea6b4feb4 Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 13 Dec 2025 13:41:59 -0500
Subject: [PATCH] nfsd: provide locking for v4_end_grace
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particularly.
We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.
nfsd4_end_grace() is also called by the landromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.
However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it. For this we
add a new flag protected with nn->client_lock. It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.
So this patch adds a nfsd_net field "client_tracking_active" which is
set as described. Another field "grace_end_forced", is set when
v4_end_grace is written. After this is set, and providing
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.
This resolves a race which can result in use-after-free.
Reported-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/…
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable(a)vger.kernel.org
Signed-off-by: NeilBrown <neil(a)brown.name>
Tested-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3e2d0fde80a7..fe8338735e7c 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -66,6 +66,8 @@ struct nfsd_net {
struct lock_manager nfsd4_manager;
bool grace_ended;
+ bool grace_end_forced;
+ bool client_tracking_active;
time64_t boot_time;
struct dentry *nfsd_client_dir;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5b83cb33bf83..a1dccce8b99c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -84,7 +84,7 @@ static u64 current_sessionid = 1;
/* forward declarations */
static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner);
static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
-void nfsd4_end_grace(struct nfsd_net *nn);
+static void nfsd4_end_grace(struct nfsd_net *nn);
static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
static void nfsd4_file_hash_remove(struct nfs4_file *fi);
static void deleg_reaper(struct nfsd_net *nn);
@@ -6570,7 +6570,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfs_ok;
}
-void
+static void
nfsd4_end_grace(struct nfsd_net *nn)
{
/* do nothing if grace period already ended */
@@ -6603,6 +6603,33 @@ nfsd4_end_grace(struct nfsd_net *nn)
*/
}
+/**
+ * nfsd4_force_end_grace - forcibly end the NFSv4 grace period
+ * @nn: network namespace for the server instance to be updated
+ *
+ * Forces bypass of normal grace period completion, then schedules
+ * the laundromat to end the grace period immediately. Does not wait
+ * for the grace period to fully terminate before returning.
+ *
+ * Return values:
+ * %true: Grace termination schedule
+ * %false: No action was taken
+ */
+bool nfsd4_force_end_grace(struct nfsd_net *nn)
+{
+ if (!nn->client_tracking_ops)
+ return false;
+ spin_lock(&nn->client_lock);
+ if (nn->grace_ended || !nn->client_tracking_active) {
+ spin_unlock(&nn->client_lock);
+ return false;
+ }
+ WRITE_ONCE(nn->grace_end_forced, true);
+ mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+ spin_unlock(&nn->client_lock);
+ return true;
+}
+
/*
* If we've waited a lease period but there are still clients trying to
* reclaim, wait a little longer to give them a chance to finish.
@@ -6612,6 +6639,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn)
time64_t double_grace_period_end = nn->boot_time +
2 * nn->nfsd4_lease;
+ if (READ_ONCE(nn->grace_end_forced))
+ return false;
if (nn->track_reclaim_completes &&
atomic_read(&nn->nr_reclaim_complete) ==
nn->reclaim_str_hashtbl_size)
@@ -8931,6 +8960,8 @@ static int nfs4_state_create_net(struct net *net)
nn->unconf_name_tree = RB_ROOT;
nn->boot_time = ktime_get_real_seconds();
nn->grace_ended = false;
+ nn->grace_end_forced = false;
+ nn->client_tracking_active = false;
nn->nfsd4_manager.block_opens = true;
INIT_LIST_HEAD(&nn->nfsd4_manager.list);
INIT_LIST_HEAD(&nn->client_lru);
@@ -9011,6 +9042,10 @@ nfs4_state_start_net(struct net *net)
return ret;
locks_start_grace(net, &nn->nfsd4_manager);
nfsd4_client_tracking_init(net);
+ /* safe for laundromat to run now */
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = true;
+ spin_unlock(&nn->client_lock);
if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
goto skip_grace;
printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
@@ -9059,6 +9094,9 @@ nfs4_state_shutdown_net(struct net *net)
shrinker_free(nn->nfsd_client_shrinker);
cancel_work_sync(&nn->nfsd_shrinker_work);
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = false;
+ spin_unlock(&nn->client_lock);
cancel_delayed_work_sync(&nn->laundromat_work);
locks_end_grace(&nn->nfsd4_manager);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 2b79129703d5..36ce3ca97d97 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1082,10 +1082,9 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
case 'Y':
case 'y':
case '1':
- if (!nn->nfsd_serv)
+ if (!nfsd4_force_end_grace(nn))
return -EBUSY;
trace_nfsd_end_grace(netns(file));
- nfsd4_end_grace(nn);
break;
default:
return -EINVAL;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1e736f402426..50d2b2963390 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -849,7 +849,7 @@ static inline void nfsd4_revoke_states(struct net *net, struct super_block *sb)
#endif
/* grace period management */
-void nfsd4_end_grace(struct nfsd_net *nn);
+bool nfsd4_force_end_grace(struct nfsd_net *nn);
/* nfs4recover operations */
extern int nfsd4_client_tracking_init(struct net *net);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2857bd59feb63fcf40fe4baf55401baea6b4feb4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011222-abdomen-balsamic-f41c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2857bd59feb63fcf40fe4baf55401baea6b4feb4 Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 13 Dec 2025 13:41:59 -0500
Subject: [PATCH] nfsd: provide locking for v4_end_grace
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particularly.
We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.
nfsd4_end_grace() is also called by the landromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.
However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it. For this we
add a new flag protected with nn->client_lock. It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.
So this patch adds a nfsd_net field "client_tracking_active" which is
set as described. Another field "grace_end_forced", is set when
v4_end_grace is written. After this is set, and providing
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.
This resolves a race which can result in use-after-free.
Reported-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/…
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable(a)vger.kernel.org
Signed-off-by: NeilBrown <neil(a)brown.name>
Tested-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3e2d0fde80a7..fe8338735e7c 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -66,6 +66,8 @@ struct nfsd_net {
struct lock_manager nfsd4_manager;
bool grace_ended;
+ bool grace_end_forced;
+ bool client_tracking_active;
time64_t boot_time;
struct dentry *nfsd_client_dir;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5b83cb33bf83..a1dccce8b99c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -84,7 +84,7 @@ static u64 current_sessionid = 1;
/* forward declarations */
static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner);
static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
-void nfsd4_end_grace(struct nfsd_net *nn);
+static void nfsd4_end_grace(struct nfsd_net *nn);
static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
static void nfsd4_file_hash_remove(struct nfs4_file *fi);
static void deleg_reaper(struct nfsd_net *nn);
@@ -6570,7 +6570,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfs_ok;
}
-void
+static void
nfsd4_end_grace(struct nfsd_net *nn)
{
/* do nothing if grace period already ended */
@@ -6603,6 +6603,33 @@ nfsd4_end_grace(struct nfsd_net *nn)
*/
}
+/**
+ * nfsd4_force_end_grace - forcibly end the NFSv4 grace period
+ * @nn: network namespace for the server instance to be updated
+ *
+ * Forces bypass of normal grace period completion, then schedules
+ * the laundromat to end the grace period immediately. Does not wait
+ * for the grace period to fully terminate before returning.
+ *
+ * Return values:
+ * %true: Grace termination schedule
+ * %false: No action was taken
+ */
+bool nfsd4_force_end_grace(struct nfsd_net *nn)
+{
+ if (!nn->client_tracking_ops)
+ return false;
+ spin_lock(&nn->client_lock);
+ if (nn->grace_ended || !nn->client_tracking_active) {
+ spin_unlock(&nn->client_lock);
+ return false;
+ }
+ WRITE_ONCE(nn->grace_end_forced, true);
+ mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+ spin_unlock(&nn->client_lock);
+ return true;
+}
+
/*
* If we've waited a lease period but there are still clients trying to
* reclaim, wait a little longer to give them a chance to finish.
@@ -6612,6 +6639,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn)
time64_t double_grace_period_end = nn->boot_time +
2 * nn->nfsd4_lease;
+ if (READ_ONCE(nn->grace_end_forced))
+ return false;
if (nn->track_reclaim_completes &&
atomic_read(&nn->nr_reclaim_complete) ==
nn->reclaim_str_hashtbl_size)
@@ -8931,6 +8960,8 @@ static int nfs4_state_create_net(struct net *net)
nn->unconf_name_tree = RB_ROOT;
nn->boot_time = ktime_get_real_seconds();
nn->grace_ended = false;
+ nn->grace_end_forced = false;
+ nn->client_tracking_active = false;
nn->nfsd4_manager.block_opens = true;
INIT_LIST_HEAD(&nn->nfsd4_manager.list);
INIT_LIST_HEAD(&nn->client_lru);
@@ -9011,6 +9042,10 @@ nfs4_state_start_net(struct net *net)
return ret;
locks_start_grace(net, &nn->nfsd4_manager);
nfsd4_client_tracking_init(net);
+ /* safe for laundromat to run now */
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = true;
+ spin_unlock(&nn->client_lock);
if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
goto skip_grace;
printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
@@ -9059,6 +9094,9 @@ nfs4_state_shutdown_net(struct net *net)
shrinker_free(nn->nfsd_client_shrinker);
cancel_work_sync(&nn->nfsd_shrinker_work);
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = false;
+ spin_unlock(&nn->client_lock);
cancel_delayed_work_sync(&nn->laundromat_work);
locks_end_grace(&nn->nfsd4_manager);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 2b79129703d5..36ce3ca97d97 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1082,10 +1082,9 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
case 'Y':
case 'y':
case '1':
- if (!nn->nfsd_serv)
+ if (!nfsd4_force_end_grace(nn))
return -EBUSY;
trace_nfsd_end_grace(netns(file));
- nfsd4_end_grace(nn);
break;
default:
return -EINVAL;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1e736f402426..50d2b2963390 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -849,7 +849,7 @@ static inline void nfsd4_revoke_states(struct net *net, struct super_block *sb)
#endif
/* grace period management */
-void nfsd4_end_grace(struct nfsd_net *nn);
+bool nfsd4_force_end_grace(struct nfsd_net *nn);
/* nfs4recover operations */
extern int nfsd4_client_tracking_init(struct net *net);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2857bd59feb63fcf40fe4baf55401baea6b4feb4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2026011223-account-preteen-f696@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2857bd59feb63fcf40fe4baf55401baea6b4feb4 Mon Sep 17 00:00:00 2001
From: NeilBrown <neil(a)brown.name>
Date: Sat, 13 Dec 2025 13:41:59 -0500
Subject: [PATCH] nfsd: provide locking for v4_end_grace
Writing to v4_end_grace can race with server shutdown and result in
memory being accessed after it was freed - reclaim_str_hashtbl in
particularly.
We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is
held while client_tracking_op->init() is called and that can wait for
an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a
deadlock.
nfsd4_end_grace() is also called by the landromat work queue and this
doesn't require locking as server shutdown will stop the work and wait
for it before freeing anything that nfsd4_end_grace() might access.
However, we must be sure that writing to v4_end_grace doesn't restart
the work item after shutdown has already waited for it. For this we
add a new flag protected with nn->client_lock. It is set only while it
is safe to make client tracking calls, and v4_end_grace only schedules
work while the flag is set with the spinlock held.
So this patch adds a nfsd_net field "client_tracking_active" which is
set as described. Another field "grace_end_forced", is set when
v4_end_grace is written. After this is set, and providing
client_tracking_active is set, the laundromat is scheduled.
This "grace_end_forced" field bypasses other checks for whether the
grace period has finished.
This resolves a race which can result in use-after-free.
Reported-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/…
Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd")
Cc: stable(a)vger.kernel.org
Signed-off-by: NeilBrown <neil(a)brown.name>
Tested-by: Li Lingfeng <lilingfeng3(a)huawei.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3e2d0fde80a7..fe8338735e7c 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -66,6 +66,8 @@ struct nfsd_net {
struct lock_manager nfsd4_manager;
bool grace_ended;
+ bool grace_end_forced;
+ bool client_tracking_active;
time64_t boot_time;
struct dentry *nfsd_client_dir;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 5b83cb33bf83..a1dccce8b99c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -84,7 +84,7 @@ static u64 current_sessionid = 1;
/* forward declarations */
static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner);
static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
-void nfsd4_end_grace(struct nfsd_net *nn);
+static void nfsd4_end_grace(struct nfsd_net *nn);
static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
static void nfsd4_file_hash_remove(struct nfs4_file *fi);
static void deleg_reaper(struct nfsd_net *nn);
@@ -6570,7 +6570,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfs_ok;
}
-void
+static void
nfsd4_end_grace(struct nfsd_net *nn)
{
/* do nothing if grace period already ended */
@@ -6603,6 +6603,33 @@ nfsd4_end_grace(struct nfsd_net *nn)
*/
}
+/**
+ * nfsd4_force_end_grace - forcibly end the NFSv4 grace period
+ * @nn: network namespace for the server instance to be updated
+ *
+ * Forces bypass of normal grace period completion, then schedules
+ * the laundromat to end the grace period immediately. Does not wait
+ * for the grace period to fully terminate before returning.
+ *
+ * Return values:
+ * %true: Grace termination schedule
+ * %false: No action was taken
+ */
+bool nfsd4_force_end_grace(struct nfsd_net *nn)
+{
+ if (!nn->client_tracking_ops)
+ return false;
+ spin_lock(&nn->client_lock);
+ if (nn->grace_ended || !nn->client_tracking_active) {
+ spin_unlock(&nn->client_lock);
+ return false;
+ }
+ WRITE_ONCE(nn->grace_end_forced, true);
+ mod_delayed_work(laundry_wq, &nn->laundromat_work, 0);
+ spin_unlock(&nn->client_lock);
+ return true;
+}
+
/*
* If we've waited a lease period but there are still clients trying to
* reclaim, wait a little longer to give them a chance to finish.
@@ -6612,6 +6639,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn)
time64_t double_grace_period_end = nn->boot_time +
2 * nn->nfsd4_lease;
+ if (READ_ONCE(nn->grace_end_forced))
+ return false;
if (nn->track_reclaim_completes &&
atomic_read(&nn->nr_reclaim_complete) ==
nn->reclaim_str_hashtbl_size)
@@ -8931,6 +8960,8 @@ static int nfs4_state_create_net(struct net *net)
nn->unconf_name_tree = RB_ROOT;
nn->boot_time = ktime_get_real_seconds();
nn->grace_ended = false;
+ nn->grace_end_forced = false;
+ nn->client_tracking_active = false;
nn->nfsd4_manager.block_opens = true;
INIT_LIST_HEAD(&nn->nfsd4_manager.list);
INIT_LIST_HEAD(&nn->client_lru);
@@ -9011,6 +9042,10 @@ nfs4_state_start_net(struct net *net)
return ret;
locks_start_grace(net, &nn->nfsd4_manager);
nfsd4_client_tracking_init(net);
+ /* safe for laundromat to run now */
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = true;
+ spin_unlock(&nn->client_lock);
if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0)
goto skip_grace;
printk(KERN_INFO "NFSD: starting %lld-second grace period (net %x)\n",
@@ -9059,6 +9094,9 @@ nfs4_state_shutdown_net(struct net *net)
shrinker_free(nn->nfsd_client_shrinker);
cancel_work_sync(&nn->nfsd_shrinker_work);
+ spin_lock(&nn->client_lock);
+ nn->client_tracking_active = false;
+ spin_unlock(&nn->client_lock);
cancel_delayed_work_sync(&nn->laundromat_work);
locks_end_grace(&nn->nfsd4_manager);
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 2b79129703d5..36ce3ca97d97 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1082,10 +1082,9 @@ static ssize_t write_v4_end_grace(struct file *file, char *buf, size_t size)
case 'Y':
case 'y':
case '1':
- if (!nn->nfsd_serv)
+ if (!nfsd4_force_end_grace(nn))
return -EBUSY;
trace_nfsd_end_grace(netns(file));
- nfsd4_end_grace(nn);
break;
default:
return -EINVAL;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 1e736f402426..50d2b2963390 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -849,7 +849,7 @@ static inline void nfsd4_revoke_states(struct net *net, struct super_block *sb)
#endif
/* grace period management */
-void nfsd4_end_grace(struct nfsd_net *nn);
+bool nfsd4_force_end_grace(struct nfsd_net *nn);
/* nfs4recover operations */
extern int nfsd4_client_tracking_init(struct net *net);
This is the start of the stable review cycle for the 6.12.65 release.
There are 16 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 11 Jan 2026 11:19:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.65-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.65-rc1
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "iommu/amd: Skip enabling command/event buffers for kdump"
Sean Nyekjaer <sean(a)geanix.com>
pwm: stm32: Always program polarity
Maximilian Immanuel Brandtner <maxbr(a)linux.ibm.com>
virtio_console: fix order of fields cols and rows
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Proportional newidle balance
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to update_newidle_cost()
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to sched_balance_newidle()
Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
Richa Bharti <richa.bharti(a)siemens.com>
cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writes
Natalie Vock <natalie.vock(a)gmx.de>
drm/amdgpu: Forward VMID reservation errors
Miaoqian Lin <linmq006(a)gmail.com>
net: phy: mediatek: fix nvmem cell reference leak in mt798x_phy_calibration
Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
wifi: mac80211: Discard Beacon frames to non-broadcast address
Paolo Abeni <pabeni(a)redhat.com>
mptcp: ensure context reset on disconnect()
Bijan Tabatabai <bijan311(a)gmail.com>
mm: consider non-anon swap cache folios in folio_expected_ref_count()
David Hildenbrand <david(a)redhat.com>
mm: simplify folio_expected_ref_count()
Alexander Gordeev <agordeev(a)linux.ibm.com>
mm/page_alloc: change all pageblocks migrate type on coalescing
Paolo Abeni <pabeni(a)redhat.com>
mptcp: fallback earlier on simult connection
-------------
Diffstat:
Makefile | 4 +--
drivers/char/virtio_console.c | 2 +-
drivers/cpufreq/intel_pstate.c | 9 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++--
drivers/iommu/amd/init.c | 28 +++++----------
drivers/net/phy/mediatek-ge-soc.c | 2 +-
drivers/pwm/pwm-stm32.c | 3 +-
include/linux/if_bridge.h | 6 ++--
include/linux/mm.h | 10 +++---
include/linux/sched/topology.h | 3 ++
kernel/sched/core.c | 3 ++
kernel/sched/fair.c | 65 +++++++++++++++++++++++++++-------
kernel/sched/features.h | 5 +++
kernel/sched/sched.h | 7 ++++
kernel/sched/topology.c | 6 ++++
mm/page_alloc.c | 24 ++++++-------
net/bridge/br_ioctl.c | 36 +++++++++++++++++--
net/bridge/br_private.h | 3 +-
net/core/dev_ioctl.c | 16 ---------
net/mac80211/rx.c | 5 +++
net/mptcp/options.c | 10 ++++++
net/mptcp/protocol.c | 8 +++--
net/mptcp/protocol.h | 9 +++--
net/mptcp/subflow.c | 10 +-----
net/socket.c | 19 +++++-----
25 files changed, 186 insertions(+), 113 deletions(-)