The quilt patch titled
Subject: nilfs2: fix deadlock in nilfs_count_free_blocks()
has been removed from the -mm tree. Its filename was
nilfs2-fix-deadlock-in-nilfs_count_free_blocks.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix deadlock in nilfs_count_free_blocks()
Date: Sat, 29 Oct 2022 13:49:12 +0900

A semaphore deadlock can occur if nilfs_get_block() detects metadata
corruption while locating data blocks and a superblock writeback occurs at
the same time:

task 1                               task 2
------                               ------
* A file operation *
nilfs_truncate()
  nilfs_get_block()
    down_read(rwsem A) <--
    nilfs_bmap_lookup_contig()
      ...                            generic_shutdown_super()
                                       nilfs_put_super()
                                         * Prepare to write superblock *
                                         down_write(rwsem B) <--
                                         nilfs_cleanup_super()
      * Detect b-tree corruption *         nilfs_set_log_cursor()
      nilfs_bmap_convert_error()             nilfs_count_free_blocks()
        __nilfs_error()                        down_read(rwsem A) <--
          nilfs_set_error()
            down_write(rwsem B) <--

                           *** DEADLOCK ***

Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem)
and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata
corruption, __nilfs_error() is called from nilfs_bmap_convert_error()
inside the lock section.

Since __nilfs_error() calls nilfs_set_error() unless the filesystem is
read-only and nilfs_set_error() attempts to writelock rwsem B (=
nilfs->ns_sem) to write back the superblock exclusively, hierarchical lock
acquisition occurs in the order rwsem A -> rwsem B.

Now, if another task starts updating the superblock, it may writelock
rwsem B during the lock sequence above, and can deadlock trying to
readlock rwsem A in nilfs_count_free_blocks().

However, there is actually no need to take rwsem A in
nilfs_count_free_blocks() because, within the lock section, it only reads
a single integer from a shared struct via nilfs_sufile_get_ncleansegs().
This has been the case since commit aa474a220180 ("nilfs2: add local
variable to cache the number of clean segments"), that is, even before
this bug was introduced.

So, this resolves the deadlock problem by just not taking the semaphore in
nilfs_count_free_blocks().
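
As a rough userspace illustration only (not kernel code), the sketch below
models what remains of nilfs_count_free_blocks() once the semaphore is gone:
one read of a cached counter followed by a multiplication.  The struct and
function names (sufile_model, nilfs_model, model_count_free_blocks) are
hypothetical stand-ins invented for this example and do not exist in nilfs2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the segment usage file's cached state. */
struct sufile_model {
	unsigned long ncleansegs;	/* cached number of clean segments */
};

/* Hypothetical stand-in for the fields of struct the_nilfs used here. */
struct nilfs_model {
	struct sufile_model *sufile;
	unsigned long blocks_per_segment;
};

/* Models nilfs_sufile_get_ncleansegs(): returns one cached integer. */
static unsigned long model_get_ncleansegs(const struct sufile_model *sufile)
{
	return sufile->ncleansegs;
}

/*
 * Models the patched function body: no lock is taken, because the only
 * shared state touched is the single cached counter read above.
 */
static int model_count_free_blocks(const struct nilfs_model *nilfs,
				   uint64_t *nblocks)
{
	unsigned long ncleansegs = model_get_ncleansegs(nilfs->sufile);

	*nblocks = (uint64_t)ncleansegs * nilfs->blocks_per_segment;
	return 0;
}
```

Because the superblock path in this model no longer acquires anything
corresponding to rwsem A, the A -> B versus B -> A inversion shown in the
diagram cannot arise from this function.
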
Link: https://lkml.kernel.org/r/20221029044912.9139-1-konishi.ryusuke@gmail.com
Fixes: e828949e5b42 ("nilfs2: call nilfs_error inside bmap routines")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+45d6ce7b7ad7ef455d03(a)syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>	[2.6.38+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/the_nilfs.c | 2 --
1 file changed, 2 deletions(-)
--- a/fs/nilfs2/the_nilfs.c~nilfs2-fix-deadlock-in-nilfs_count_free_blocks
+++ a/fs/nilfs2/the_nilfs.c
@@ -690,9 +690,7 @@ int nilfs_count_free_blocks(struct the_n
 {
 	unsigned long ncleansegs;
 
-	down_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
 	ncleansegs = nilfs_sufile_get_ncleansegs(nilfs->ns_sufile);
-	up_read(&NILFS_MDT(nilfs->ns_dat)->mi_sem);
 	*nblocks = (sector_t)ncleansegs * nilfs->ns_blocks_per_segment;
 	return 0;
 }
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-shift-out-of-bounds-overflow-in-nilfs_sb2_bad_offset.patch
nilfs2-fix-shift-out-of-bounds-due-to-too-large-exponent-of-block-size.patch
The quilt patch titled
Subject: hugetlbfs: don't delete error page from pagecache
has been removed from the -mm tree. Its filename was
hugetlbfs-dont-delete-error-page-from-pagecache.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: James Houghton <jthoughton(a)google.com>
Subject: hugetlbfs: don't delete error page from pagecache
Date: Tue, 18 Oct 2022 20:01:25 +0000

This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.

Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache.  That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned.  As [1] states,
this is effectively memory corruption.

The fix is to leave the page in the page cache.  If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses.  For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
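
The difference between the two behaviours can be sketched with a small
userspace model.  This is an illustration under stated assumptions, not the
kernel implementation: the model_* names, the 4-slot cache and the 16-byte
"pages" are all hypothetical.  Dropping a poisoned entry lets the next
lookup silently allocate a fresh zeroed page, while keeping the entry with a
poison flag lets the read path report -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE	16
#define MODEL_NPAGES	4

/* Hypothetical page: a poison flag plus a small payload. */
struct model_page {
	bool hwpoison;
	char data[MODEL_PAGE_SIZE];
};

/* Hypothetical page cache: a fixed array of slots, NULL when absent. */
struct model_cache {
	struct model_page *slots[MODEL_NPAGES];
};

/* Look up a page, allocating a zeroed one if the slot is empty. */
static struct model_page *model_lookup(struct model_cache *c, size_t idx)
{
	if (idx >= MODEL_NPAGES)
		return NULL;
	if (!c->slots[idx])
		c->slots[idx] = calloc(1, sizeof(struct model_page));
	return c->slots[idx];
}

/* Read path: refuse to copy out of a page marked as poisoned. */
static int model_read(struct model_cache *c, size_t idx, char *buf)
{
	struct model_page *p = model_lookup(c, idx);

	if (!p)
		return -ENOMEM;
	if (p->hwpoison)
		return -EIO;
	memcpy(buf, p->data, MODEL_PAGE_SIZE);
	return 0;
}

/* Old behaviour in this model: the poisoned page leaves the cache. */
static void model_poison_and_drop(struct model_cache *c, size_t idx)
{
	free(c->slots[idx]);
	c->slots[idx] = NULL;
}

/* New behaviour in this model: the page stays and is flagged. */
static void model_poison_and_keep(struct model_cache *c, size_t idx)
{
	struct model_page *p = model_lookup(c, idx);

	if (p)
		p->hwpoison = true;
}
```

In this model, a read after model_poison_and_drop() succeeds and returns
zeroes in place of the lost data, whereas a read after
model_poison_and_keep() fails with -EIO.
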
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 13 ++++++-------
mm/hugetlb.c | 4 ++++
mm/memory-failure.c | 5 ++++-
3 files changed, 14 insertions(+), 8 deletions(-)
--- a/fs/hugetlbfs/inode.c~hugetlbfs-dont-delete-error-page-from-pagecache
+++ a/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struc
 		} else {
 			unlock_page(page);
 
+			if (PageHWPoison(page)) {
+				put_page(page);
+				retval = -EIO;
+				break;
+			}
+
 			/*
 			 * We have the page, copy it to user space buffer.
 			 */
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struc
 static int hugetlbfs_error_remove_page(struct address_space *mapping,
 				struct page *page)
 {
-	struct inode *inode = mapping->host;
-	pgoff_t index = page->index;
-
-	hugetlb_delete_from_page_cache(page);
-	if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
-		hugetlb_fix_reserve_counts(inode);
-
 	return 0;
 }
 
--- a/mm/hugetlb.c~hugetlbfs-dont-delete-error-page-from-pagecache
+++ a/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
 
 	ptl = huge_pte_lock(h, dst_mm, dst_pte);
 
+	ret = -EIO;
+	if (PageHWPoison(page))
+		goto out_release_unlock;
+
 	/*
 	 * We allow to overwrite a pte marker: consider when both MISSING|WP
 	 * registered, we firstly wr-protect a none pte which has no page cache
--- a/mm/memory-failure.c~hugetlbfs-dont-delete-error-page-from-pagecache
+++ a/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_stat
 	int res;
 	struct page *hpage = compound_head(p);
 	struct address_space *mapping;
+	bool extra_pins = false;
 
 	if (!PageHuge(hpage))
 		return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_stat
 	mapping = page_mapping(hpage);
 	if (mapping) {
 		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+		/* The page is kept in page cache. */
+		extra_pins = true;
 		unlock_page(hpage);
 	} else {
 		unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_stat
 		}
 	}
 
-	if (has_extra_refcount(ps, p, false))
+	if (has_extra_refcount(ps, p, extra_pins))
 		res = MF_FAILED;
 
 	return res;
_
Patches currently in -mm which might be from jthoughton(a)google.com are