As reported by Athul upstream in [1], there is a userspace regression caused by commit 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") where if there is a bug in a fuse server that causes the server to never complete writeback, it will make wait_sb_inodes() wait forever, causing sync paths to hang.
This is a resubmission of this patch [2] that was dropped from the original series due to a buggy/malicious server still being able to hold up sync() / the system in other ways if they wanted to, but the wait_sb_inodes() path is particularly common and easier to hit for malfunctioning servers.
Thanks,
Joanne

[1] https://lore.kernel.org/regressions/CAJnrk1ZjQ8W8NzojsvJPRXiv9TuYPNdj8Ye7=Cg...
[2] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-4-joannelkoong@g...

Joanne Koong (2):
  mm: rename AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM to AS_WRITEBACK_MAY_HANG
  fs/writeback: skip inodes with potential writeback hang in wait_sb_inodes()

 fs/fs-writeback.c       |  3 +++
 fs/fuse/file.c          |  2 +-
 include/linux/pagemap.h | 10 +++++-----
 mm/vmscan.c             |  3 +--
 4 files changed, 10 insertions(+), 8 deletions(-)
AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM was added to avoid waiting on writeback during reclaim for inodes belonging to filesystems where a) waiting on writeback in reclaim may lead to a deadlock or b) a writeback request may never complete due to the nature of the filesystem (unrelated to reclaim)
Rename AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM to the more generic AS_WRITEBACK_MAY_HANG to reflect mappings where writeback may hang where the cause could be unrelated to reclaim.
This allows us to later use AS_WRITEBACK_MAY_HANG to mitigate other scenarios such as possible hangs when sync waits on writeback.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/file.c          |  2 +-
 include/linux/pagemap.h | 10 +++++-----
 mm/vmscan.c             |  3 +--
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f1ef77a0be05..0804c832bcb7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3126,7 +3126,7 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	inode->i_fop = &fuse_file_operations;
 	inode->i_data.a_ops = &fuse_file_aops;
 	if (fc->writeback_cache)
-		mapping_set_writeback_may_deadlock_on_reclaim(&inode->i_data);
+		mapping_set_writeback_may_hang(&inode->i_data);

 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 09b581c1d878..a895d6b6aabb 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,7 +210,7 @@ enum mapping_flags {
 	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying
 				   folio contents */
 	AS_INACCESSIBLE = 8,	/* Do not attempt direct R/W access to the mapping */
-	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
+	AS_WRITEBACK_MAY_HANG = 9,
 	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
 				   account usage to user cgroups */
 	/* Bits 16-25 are used for FOLIO_ORDER */
@@ -338,14 +338,14 @@ static inline bool mapping_inaccessible(const struct address_space *mapping)
 	return test_bit(AS_INACCESSIBLE, &mapping->flags);
 }

-static inline void mapping_set_writeback_may_deadlock_on_reclaim(struct address_space *mapping)
+static inline void mapping_set_writeback_may_hang(struct address_space *mapping)
 {
-	set_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
+	set_bit(AS_WRITEBACK_MAY_HANG, &mapping->flags);
 }

-static inline bool mapping_writeback_may_deadlock_on_reclaim(const struct address_space *mapping)
+static inline bool mapping_writeback_may_hang(const struct address_space *mapping)
 {
-	return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);
+	return test_bit(AS_WRITEBACK_MAY_HANG, &mapping->flags);
 }

 static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 92980b072121..636c18ee2b2c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1216,8 +1216,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		} else if (writeback_throttling_sane(sc) ||
 		    !folio_test_reclaim(folio) ||
 		    !may_enter_fs(folio, sc->gfp_mask) ||
-		    (mapping &&
-		     mapping_writeback_may_deadlock_on_reclaim(mapping))) {
+		    (mapping && mapping_writeback_may_hang(mapping))) {
 			/*
 			 * This is slightly racy -
 			 * folio_end_writeback() might have
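
For readers who want to poke at the renamed helpers outside the kernel, here is a minimal userspace sketch of the set/test pair from the pagemap.h hunk. It is an illustration only: the cut-down struct address_space and the plain (non-atomic) bit operations are assumptions made for this sketch, whereas the kernel helpers use set_bit()/test_bit() on the real mapping->flags. Only the bit numbers are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit numbers copied from the enum in the diff; everything else is a stand-in. */
enum mapping_flags {
	AS_INACCESSIBLE = 8,
	AS_WRITEBACK_MAY_HANG = 9,
	AS_KERNEL_FILE = 10,
};

/* Hypothetical, cut-down userspace version of struct address_space. */
struct address_space {
	unsigned long flags;
};

/* Mark a mapping as one whose writeback may never complete. */
static inline void mapping_set_writeback_may_hang(struct address_space *mapping)
{
	assert(mapping != NULL);
	/* Assumption: plain OR instead of the kernel's atomic set_bit(). */
	mapping->flags |= 1UL << AS_WRITEBACK_MAY_HANG;
}

/* Report whether the mapping carries the flag. */
static inline bool mapping_writeback_may_hang(const struct address_space *mapping)
{
	assert(mapping != NULL);
	return (mapping->flags >> AS_WRITEBACK_MAY_HANG) & 1UL;
}
```

In this model, a caller such as fuse_init_file_inode() would set the flag once at inode setup, and any later path that is about to wait on writeback can test it first.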
On 11/20/25 19:42, Joanne Koong wrote:
> AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM was added to avoid waiting on writeback during reclaim for inodes belonging to filesystems where a) waiting on writeback in reclaim may lead to a deadlock or b) a writeback request may never complete due to the nature of the filesystem (unrelated to reclaim)
> Rename AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM to the more generic AS_WRITEBACK_MAY_HANG to reflect mappings where writeback may hang where the cause could be unrelated to reclaim.
> This allows us to later use AS_WRITEBACK_MAY_HANG to mitigate other scenarios such as possible hangs when sync waits on writeback.
Hmm, there is a difference whether writeback may hang or whether writeback may deadlock.
In particular, isn't it the case that writeback on any filesystem might effectively hang forever on I/O errors etc?
Is this going back to the previous flag semantics before we decided on AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM? (I'd have to look at the previous discussions, but "writeback may take an indefinite amount" in patch #2 pretty much looks like what I remember there)
On Thu, Nov 20, 2025 at 12:08 PM David Hildenbrand (Red Hat) <david@kernel.org> wrote:
> On 11/20/25 19:42, Joanne Koong wrote:
> > AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM was added to avoid waiting on writeback during reclaim for inodes belonging to filesystems where a) waiting on writeback in reclaim may lead to a deadlock or b) a writeback request may never complete due to the nature of the filesystem (unrelated to reclaim)
> > Rename AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM to the more generic AS_WRITEBACK_MAY_HANG to reflect mappings where writeback may hang where the cause could be unrelated to reclaim.
> > This allows us to later use AS_WRITEBACK_MAY_HANG to mitigate other scenarios such as possible hangs when sync waits on writeback.
> Hmm, there is a difference whether writeback may hang or whether writeback may deadlock.
> In particular, isn't it the case that writeback on any filesystem might effectively hang forever on I/O errors etc?
> Is this going back to the previous flag semantics before we decided on AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM? (I'd have to look at the previous discussions, but "writeback may take an indefinite amount" in patch #2 pretty much looks like what I remember there)
Yes, I think if we keep AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, then we would need another flag to denote the inode should be skipped in wait_sb_inodes(), which seems unideal. I was considering renaming this to AS_WRITEBACK_INDETERMINATE but I remember everyone hated that name.
Thanks, Joanne
> --
> Cheers
> David
During superblock writeback waiting, skip inodes where writeback may take an indefinite amount of time or hang, as denoted by the AS_WRITEBACK_MAY_HANG mapping flag.
Currently, fuse is the only filesystem with this flag set. For a properly functioning fuse server, writeback requests are completed and there is no issue. However, if there is a bug in the fuse server and it hangs on writeback, then without this change, wait_sb_inodes() will wait forever.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
---
 fs/fs-writeback.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 2b35e80037fe..eb246e9fbf3d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2733,6 +2733,9 @@ static void wait_sb_inodes(struct super_block *sb)
 		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
 			continue;

+		if (mapping_writeback_may_hang(mapping))
+			continue;
+
 		spin_unlock_irq(&sb->s_inode_wblist_lock);

 		spin_lock(&inode->i_lock);
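
The effect of the added check can be modelled in userspace. The sketch below is not the kernel loop: struct model_inode, its fields, and model_wait_sb_inodes() are hypothetical names invented for illustration, and locking, inode references, and the actual wait are left out. It only shows which inodes the walk would still block on once flagged mappings are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one inode on the superblock's writeback list. */
struct model_inode {
	bool under_writeback;    /* stands in for mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK) */
	bool writeback_may_hang; /* stands in for mapping_writeback_may_hang(mapping) */
	bool waited;             /* set when the walk would block on this inode */
};

/*
 * Walk the list in the same order of checks as the patched loop and
 * return how many inodes the caller would wait on.
 */
static size_t model_wait_sb_inodes(struct model_inode *inodes, size_t n)
{
	size_t waited = 0;

	assert(inodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Nothing in flight for this inode: nothing to wait for. */
		if (!inodes[i].under_writeback)
			continue;
		/* New check: writeback here may never complete, so skip it. */
		if (inodes[i].writeback_may_hang)
			continue;
		/* The kernel would wait for writeback on this mapping here. */
		inodes[i].waited = true;
		waited++;
	}
	return waited;
}
```

With three inodes, one under ordinary writeback, one under writeback with the flag set, and one idle, only the first is waited on. The flagged inode's dirty data is simply not waited for, which is the trade-off questioned in the replies.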
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH v1 2/2] fs/writeback: skip inodes with potential writeback hang in wait_sb_inodes()
Link: https://lore.kernel.org/stable/20251120184211.2379439-3-joannelkoong%40gmail...
On Thu, Nov 20, 2025 at 10:45 AM kernel test robot <lkp@intel.com> wrote:
> Hi,
> Thanks for your patch.
> FYI: kernel test robot notices the stable kernel rule is not satisfied.
> The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
> Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
> Subject: [PATCH v1 2/2] fs/writeback: skip inodes with potential writeback hang in wait_sb_inodes()
> Link: https://lore.kernel.org/stable/20251120184211.2379439-3-joannelkoong%40gmail...
Sorry about that, I thought it was enough to cc it in the email. I'll add the stable cc tag to the patch itself from now on.
Thanks, Joanne
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
On 11/20/25 19:42, Joanne Koong wrote:
> During superblock writeback waiting, skip inodes where writeback may take an indefinite amount of time or hang, as denoted by the AS_WRITEBACK_MAY_HANG mapping flag.
> Currently, fuse is the only filesystem with this flag set. For a properly functioning fuse server, writeback requests are completed and there is no issue. However, if there is a bug in the fuse server and it hangs on writeback, then without this change, wait_sb_inodes() will wait forever.
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> ---
>  fs/fs-writeback.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 2b35e80037fe..eb246e9fbf3d 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2733,6 +2733,9 @@ static void wait_sb_inodes(struct super_block *sb)
>  		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
>  			continue;
>
> +		if (mapping_writeback_may_hang(mapping))
> +			continue;
I think I raised it in the past, but simply because it could happen, why would we unconditionally want to do that for all fuse mounts? That just seems wrong :(
To phrase it in a different way, if any writeback could theoretically hang, why are we even waiting on writeback in the first place?
On Thu, Nov 20, 2025 at 12:23 PM David Hildenbrand (Red Hat) <david@kernel.org> wrote:
> On 11/20/25 19:42, Joanne Koong wrote:
> > During superblock writeback waiting, skip inodes where writeback may take an indefinite amount of time or hang, as denoted by the AS_WRITEBACK_MAY_HANG mapping flag.
> > Currently, fuse is the only filesystem with this flag set. For a properly functioning fuse server, writeback requests are completed and there is no issue. However, if there is a bug in the fuse server and it hangs on writeback, then without this change, wait_sb_inodes() will wait forever.
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree")
> > Reported-by: Athul Krishna <athul.krishna.kr@protonmail.com>
> > ---
> >  fs/fs-writeback.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 2b35e80037fe..eb246e9fbf3d 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -2733,6 +2733,9 @@ static void wait_sb_inodes(struct super_block *sb)
> >  		if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
> >  			continue;
> >
> > +		if (mapping_writeback_may_hang(mapping))
> > +			continue;
> I think I raised it in the past, but simply because it could happen, why would we unconditionally want to do that for all fuse mounts? That just seems wrong :(
I think it's considered a userspace regression if we don't revert the program behavior back to its previous version, even if it is from the program being incorrectly written, as per the conversation in [1].
[1] https://lore.kernel.org/regressions/CAJnrk1Yh4GtF-wxWo_2ffbr90R44u0WDmMAEn9v...
> To phrase it in a different way, if any writeback could theoretically hang, why are we even waiting on writeback in the first place?
I think it's because on other filesystems, something has to go seriously wrong for writeback to hang, but on fuse a server can easily make writeback hang and as it turns out, there are already existing userspace programs that do this accidentally.
Thanks, Joanne
> --
> Cheers
> David
linux-stable-mirror@lists.linaro.org