The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
1508062ecd55 ("hugetlbfs: Convert remove_inode_hugepages() to use filemap_get_folios()")
d9ef44de5d73 ("hugetlb: Convert huge_add_to_page_cache() to use a folio")
dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
ea6d0630100b ("mm/hwpoison: do not lock page again when me_huge_page() successfully recovers")
171936ddaf97 ("mm/memory-failure: use a mutex to avoid memory_failure() races")
e32905e57358 ("userfaultfd: hugetlbfs: fix new flag usage in error path")
15b836536321 ("mm/hugetlb: remove unused variable pseudo_vma in remove_inode_hugepages()")
d4241a049ac0 ("mm/hugetlb: avoid calculating fault_mutex_hash in truncate_op case")
d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
585fc0d2871c ("mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page")
a8b2c2ce89d4 ("mm,hwpoison: take free pages off the buddy freelists")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
1508062ecd55 ("hugetlbfs: Convert remove_inode_hugepages() to use filemap_get_folios()")
d9ef44de5d73 ("hugetlb: Convert huge_add_to_page_cache() to use a folio")
dd0f230a0a80 ("mm: hwpoison: refactor refcount check handling")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
fb1dec44c675 ("mms: sdhci-esdhc-imx: Fix SDHCI_RESET_ALL for CQHCI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb1dec44c6750bb414f47b929c8c175a1a127c31 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 26 Oct 2022 12:42:06 -0700
Subject: [PATCH] mms: sdhci-esdhc-imx: Fix SDHCI_RESET_ALL for CQHCI
[[ NOTE: this is completely untested by the author, but included solely
because, as noted in commit df57d73276b8 ("mmc: sdhci-pci: Fix
SDHCI_RESET_ALL for CQHCI for Intel GLK-based controllers"), "other
drivers using CQHCI might benefit from a similar change, if they
also have CQHCI reset by SDHCI_RESET_ALL." We've now seen the same
bug on at least MSM, Arasan, and Intel hardware. ]]
SDHCI_RESET_ALL resets will reset the hardware CQE state, but we aren't
tracking that properly in software. When out of sync, we may trigger
various timeouts.
It's not typical to perform resets while CQE is enabled, but this may
occur in some suspend or error recovery scenarios.
Include this fix by way of the new sdhci_and_cqhci_reset() helper.
This patch depends on (and should not compile without) the patch
entitled "mmc: cqhci: Provide helper for resetting both SDHCI and
CQHCI".
Fixes: bb6e358169bf ("mmc: sdhci-esdhc-imx: add CMDQ support")
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Haibo Chen <haibo.chen(a)nxp.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221026124150.v4.4.I7d01f9ad11bacdc9213dee61b791…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c
index 747df79d90ee..89225faa242a 100644
--- a/drivers/mmc/host/sdhci-esdhc-imx.c
+++ b/drivers/mmc/host/sdhci-esdhc-imx.c
@@ -25,6 +25,7 @@
#include <linux/of_device.h>
#include <linux/pinctrl/consumer.h>
#include <linux/pm_runtime.h>
+#include "sdhci-cqhci.h"
#include "sdhci-pltfm.h"
#include "sdhci-esdhc.h"
#include "cqhci.h"
@@ -1288,7 +1289,7 @@ static void esdhc_set_uhs_signaling(struct sdhci_host *host, unsigned timing)
static void esdhc_reset(struct sdhci_host *host, u8 mask)
{
- sdhci_reset(host, mask);
+ sdhci_and_cqhci_reset(host, mask);
sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f90daa975911 ("drm/i915/dmabuf: fix sg_table handling in map_dma_buf")
10be98a77c55 ("drm/i915: Move more GEM objects under gem/")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f90daa975911961b65070ec72bd7dd8d448f9ef7 Mon Sep 17 00:00:00 2001
From: Matthew Auld <matthew.auld(a)intel.com>
Date: Fri, 28 Oct 2022 16:50:26 +0100
Subject: [PATCH] drm/i915/dmabuf: fix sg_table handling in map_dma_buf
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We need to iterate over the original entries here for the sg_table,
pulling out the struct page for each one, to be remapped. However
currently this incorrectly iterates over the final dma mapped entries,
which is likely just one gigantic sg entry if the iommu is enabled,
leading to us only mapping the first struct page (and any physically
contiguous pages following it), even if there is potentially lots more
data to follow.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7306
Fixes: 1286ff739773 ("i915: add dmabuf/prime buffer sharing support.")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v3.5+
Reviewed-by: Michael J. Ruhl <michael.j.ruhl(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028155029.494736-1-matth…
(cherry picked from commit 28d52f99bbca7227008cf580c9194c9b3516968e)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c6333..824971a1ceec 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -40,13 +40,13 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme
goto err;
}
- ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
+ ret = sg_alloc_table(st, obj->mm.pages->orig_nents, GFP_KERNEL);
if (ret)
goto err_free;
src = obj->mm.pages->sgl;
dst = st->sgl;
- for (i = 0; i < obj->mm.pages->nents; i++) {
+ for (i = 0; i < obj->mm.pages->orig_nents; i++) {
sg_set_page(dst, sg_page(src), src->length, 0);
dst = sg_next(dst);
src = sg_next(src);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8cccf05fe857 ("nilfs2: fix use-after-free bug of ns_writer on remount")
1751e8a6cb93 ("Rename superblock flags (MS_xyz -> SB_xyz)")
b04a23421bf6 ("Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cccf05fe857a18ee26e20d11a8455a73ffd4efd Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Fri, 4 Nov 2022 23:29:59 +0900
Subject: [PATCH] nilfs2: fix use-after-free bug of ns_writer on remount
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1 Task2
-------------------------------- ------------------------------
nilfs_construct_segment
nilfs_segctor_sync
init_wait
init_waitqueue_entry
add_wait_queue
schedule
nilfs_remount (R/W remount case)
nilfs_attach_log_writer
nilfs_detach_log_writer
nilfs_segctor_destroy
kfree
finish_wait
_raw_spin_lock_irqsave
__raw_spin_lock_irqsave
do_raw_spin_lock
debug_spin_lock_before <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
waked up, Task1 accesses nilfs->ns_writer which is already freed. This
scenario diagram is based on the Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df…
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb(a)syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida(a)redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index b4cebad21b48..3335ef352915 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -317,7 +317,7 @@ void nilfs_relax_pressure_in_lock(struct super_block *sb)
struct the_nilfs *nilfs = sb->s_fs_info;
struct nilfs_sc_info *sci = nilfs->ns_writer;
- if (!sci || !sci->sc_flush_request)
+ if (sb_rdonly(sb) || unlikely(!sci) || !sci->sc_flush_request)
return;
set_bit(NILFS_SC_PRIOR_FLUSH, &sci->sc_flags);
@@ -2242,7 +2242,7 @@ int nilfs_construct_segment(struct super_block *sb)
struct nilfs_sc_info *sci = nilfs->ns_writer;
struct nilfs_transaction_info *ti;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
/* A call inside transactions causes a deadlock. */
@@ -2280,7 +2280,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
struct nilfs_transaction_info ti;
int err = 0;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
nilfs_transaction_lock(sb, &ti, 0);
@@ -2776,11 +2776,12 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
if (nilfs->ns_writer) {
/*
- * This happens if the filesystem was remounted
- * read/write after nilfs_error degenerated it into a
- * read-only mount.
+ * This happens if the filesystem is made read-only by
+ * __nilfs_error or nilfs_remount and then remounted
+ * read/write. In these cases, reuse the existing
+ * writer.
*/
- nilfs_detach_log_writer(sb);
+ return 0;
}
nilfs->ns_writer = nilfs_segctor_new(sb, root);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index ba108f915391..6edb6e0dd61f 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1133,8 +1133,6 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
if ((bool)(*flags & SB_RDONLY) == sb_rdonly(sb))
goto out;
if (*flags & SB_RDONLY) {
- /* Shutting down log writer */
- nilfs_detach_log_writer(sb);
sb->s_flags |= SB_RDONLY;
/*
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8cccf05fe857 ("nilfs2: fix use-after-free bug of ns_writer on remount")
1751e8a6cb93 ("Rename superblock flags (MS_xyz -> SB_xyz)")
b04a23421bf6 ("Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cccf05fe857a18ee26e20d11a8455a73ffd4efd Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Fri, 4 Nov 2022 23:29:59 +0900
Subject: [PATCH] nilfs2: fix use-after-free bug of ns_writer on remount
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1 Task2
-------------------------------- ------------------------------
nilfs_construct_segment
nilfs_segctor_sync
init_wait
init_waitqueue_entry
add_wait_queue
schedule
nilfs_remount (R/W remount case)
nilfs_attach_log_writer
nilfs_detach_log_writer
nilfs_segctor_destroy
kfree
finish_wait
_raw_spin_lock_irqsave
__raw_spin_lock_irqsave
do_raw_spin_lock
debug_spin_lock_before <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
waked up, Task1 accesses nilfs->ns_writer which is already freed. This
scenario diagram is based on the Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df…
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb(a)syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida(a)redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index b4cebad21b48..3335ef352915 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -317,7 +317,7 @@ void nilfs_relax_pressure_in_lock(struct super_block *sb)
struct the_nilfs *nilfs = sb->s_fs_info;
struct nilfs_sc_info *sci = nilfs->ns_writer;
- if (!sci || !sci->sc_flush_request)
+ if (sb_rdonly(sb) || unlikely(!sci) || !sci->sc_flush_request)
return;
set_bit(NILFS_SC_PRIOR_FLUSH, &sci->sc_flags);
@@ -2242,7 +2242,7 @@ int nilfs_construct_segment(struct super_block *sb)
struct nilfs_sc_info *sci = nilfs->ns_writer;
struct nilfs_transaction_info *ti;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
/* A call inside transactions causes a deadlock. */
@@ -2280,7 +2280,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
struct nilfs_transaction_info ti;
int err = 0;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
nilfs_transaction_lock(sb, &ti, 0);
@@ -2776,11 +2776,12 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
if (nilfs->ns_writer) {
/*
- * This happens if the filesystem was remounted
- * read/write after nilfs_error degenerated it into a
- * read-only mount.
+ * This happens if the filesystem is made read-only by
+ * __nilfs_error or nilfs_remount and then remounted
+ * read/write. In these cases, reuse the existing
+ * writer.
*/
- nilfs_detach_log_writer(sb);
+ return 0;
}
nilfs->ns_writer = nilfs_segctor_new(sb, root);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index ba108f915391..6edb6e0dd61f 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1133,8 +1133,6 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
if ((bool)(*flags & SB_RDONLY) == sb_rdonly(sb))
goto out;
if (*flags & SB_RDONLY) {
- /* Shutting down log writer */
- nilfs_detach_log_writer(sb);
sb->s_flags |= SB_RDONLY;
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ebb5fd38f411 ("mmc: cqhci: Provide helper for resetting both SDHCI and CQHCI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ebb5fd38f41132e6924cb33b647337f4a5d5360c Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 26 Oct 2022 12:42:03 -0700
Subject: [PATCH] mmc: cqhci: Provide helper for resetting both SDHCI and CQHCI
Several SDHCI drivers need to deactivate command queueing in their reset
hook (see sdhci_cqhci_reset() / sdhci-pci-core.c, for example), and
several more are coming.
Those reset implementations have some small subtleties (e.g., ordering
of initialization of SDHCI vs. CQHCI might leave us resetting with a
NULL ->cqe_private), and are often identical across different host
drivers.
We also don't want to force a dependency between SDHCI and CQHCI, or
vice versa; non-SDHCI drivers use CQHCI, and SDHCI drivers might support
command queueing through some other means.
So, implement a small helper, to avoid repeating the same mistakes in
different drivers. Simply stick it in a header, because it's so small it
doesn't deserve its own module right now, and inlining to each driver is
pretty reasonable.
This is marked for -stable, as it is an important prerequisite patch for
several SDHCI controller bugfixes that follow.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Reviewed-by: Florian Fainelli <f.fainelli(a)gmail.com>
Link: https://lore.kernel.org/r/20221026124150.v4.1.Ie85faa09432bfe1b0890d8c24ff9…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-cqhci.h b/drivers/mmc/host/sdhci-cqhci.h
new file mode 100644
index 000000000000..cf8e7ba71bbd
--- /dev/null
+++ b/drivers/mmc/host/sdhci-cqhci.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2022 The Chromium OS Authors
+ *
+ * Support that applies to the combination of SDHCI and CQHCI, while not
+ * expressing a dependency between the two modules.
+ */
+
+#ifndef __MMC_HOST_SDHCI_CQHCI_H__
+#define __MMC_HOST_SDHCI_CQHCI_H__
+
+#include "cqhci.h"
+#include "sdhci.h"
+
+static inline void sdhci_and_cqhci_reset(struct sdhci_host *host, u8 mask)
+{
+ if ((host->mmc->caps2 & MMC_CAP2_CQE) && (mask & SDHCI_RESET_ALL) &&
+ host->mmc->cqe_private)
+ cqhci_deactivate(host->mmc);
+
+ sdhci_reset(host, mask);
+}
+
+#endif /* __MMC_HOST_SDHCI_CQHCI_H__ */
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5d249ac37fc2 ("mmc: sdhci-of-arasan: Fix SDHCI_RESET_ALL for CQHCI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d249ac37fc2396e8acc1adb0650cdacae5a990d Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 26 Oct 2022 12:42:04 -0700
Subject: [PATCH] mmc: sdhci-of-arasan: Fix SDHCI_RESET_ALL for CQHCI
SDHCI_RESET_ALL resets will reset the hardware CQE state, but we aren't
tracking that properly in software. When out of sync, we may trigger
various timeouts.
It's not typical to perform resets while CQE is enabled, but one
particular case I hit commonly enough: mmc_suspend() -> mmc_power_off().
Typically we will eventually deactivate CQE (cqhci_suspend() ->
cqhci_deactivate()), but that's not guaranteed -- in particular, if
we perform a partial (e.g., interrupted) system suspend.
The same bug was already found and fixed for two other drivers, in v5.7
and v5.9:
5cf583f1fb9c ("mmc: sdhci-msm: Deactivate CQE during SDHC reset")
df57d73276b8 ("mmc: sdhci-pci: Fix SDHCI_RESET_ALL for CQHCI for Intel
GLK-based controllers")
The latter is especially prescient, saying "other drivers using CQHCI
might benefit from a similar change, if they also have CQHCI reset by
SDHCI_RESET_ALL."
So like these other patches, deactivate CQHCI when resetting the
controller. Do this via the new sdhci_and_cqhci_reset() helper.
This patch depends on (and should not compile without) the patch
entitled "mmc: cqhci: Provide helper for resetting both SDHCI and
CQHCI".
Fixes: 84362d79f436 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/20221026124150.v4.2.I29f6a2189e84e35ad89c1833793d…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-of-arasan.c b/drivers/mmc/host/sdhci-of-arasan.c
index 3997cad1f793..cfb891430174 100644
--- a/drivers/mmc/host/sdhci-of-arasan.c
+++ b/drivers/mmc/host/sdhci-of-arasan.c
@@ -25,6 +25,7 @@
#include <linux/firmware/xlnx-zynqmp.h>
#include "cqhci.h"
+#include "sdhci-cqhci.h"
#include "sdhci-pltfm.h"
#define SDHCI_ARASAN_VENDOR_REGISTER 0x78
@@ -366,7 +367,7 @@ static void sdhci_arasan_reset(struct sdhci_host *host, u8 mask)
struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
struct sdhci_arasan_data *sdhci_arasan = sdhci_pltfm_priv(pltfm_host);
- sdhci_reset(host, mask);
+ sdhci_and_cqhci_reset(host, mask);
if (sdhci_arasan->quirks & SDHCI_ARASAN_QUIRK_FORCE_CDTEST) {
ctrl = sdhci_readb(host, SDHCI_HOST_CONTROL);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f002f45a00ee ("mmc: sdhci-esdhc-imx: use the correct host caps for MMC_CAP_8_BIT_DATA")
1ed5c3b22fc7 ("mmc: sdhci-esdhc-imx: Propagate ESDHC_FLAG_HS400* only on 8bit bus")
2991ad76d253 ("mmc: sdhci-esdhc-imx: advertise HS400 mode through MMC caps")
854a22997ad5 ("mmc: sdhci-esdhc-imx: Convert the driver to DT-only")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f002f45a00ee14214d96b18b9a555fe2c56afb20 Mon Sep 17 00:00:00 2001
From: Haibo Chen <haibo.chen(a)nxp.com>
Date: Tue, 8 Nov 2022 15:45:03 +0800
Subject: [PATCH] mmc: sdhci-esdhc-imx: use the correct host caps for
MMC_CAP_8_BIT_DATA
MMC_CAP_8_BIT_DATA belongs to struct mmc_host, not struct sdhci_host.
So correct it here.
Fixes: 1ed5c3b22fc7 ("mmc: sdhci-esdhc-imx: Propagate ESDHC_FLAG_HS400* only on 8bit bus")
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
Cc: stable(a)vger.kernel.org
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/1667893503-20583-1-git-send-email-haibo.chen@nxp.…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c
index 89225faa242a..31ea0a2fce35 100644
--- a/drivers/mmc/host/sdhci-esdhc-imx.c
+++ b/drivers/mmc/host/sdhci-esdhc-imx.c
@@ -1672,14 +1672,14 @@ static int sdhci_esdhc_imx_probe(struct platform_device *pdev)
if (imx_data->socdata->flags & ESDHC_FLAG_ERR004536)
host->quirks |= SDHCI_QUIRK_BROKEN_ADMA;
- if (host->caps & MMC_CAP_8_BIT_DATA &&
+ if (host->mmc->caps & MMC_CAP_8_BIT_DATA &&
imx_data->socdata->flags & ESDHC_FLAG_HS400)
host->mmc->caps2 |= MMC_CAP2_HS400;
if (imx_data->socdata->flags & ESDHC_FLAG_BROKEN_AUTO_CMD23)
host->quirks2 |= SDHCI_QUIRK2_ACMD23_BROKEN;
- if (host->caps & MMC_CAP_8_BIT_DATA &&
+ if (host->mmc->caps & MMC_CAP_8_BIT_DATA &&
imx_data->socdata->flags & ESDHC_FLAG_HS400_ES) {
host->mmc->caps2 |= MMC_CAP2_HS400_ES;
host->mmc_host_ops.hs400_enhanced_strobe =
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
56baa208f910 ("mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI")
6bcc55fe648b ("mmc: sdhci-brcmstb: Enable Clock Gating to save power")
f3a70f991dd0 ("mmc: sdhci-brcmstb: Re-organize flags")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56baa208f91061ff27ec2d93fbc483f624d373b4 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 26 Oct 2022 12:42:05 -0700
Subject: [PATCH] mmc: sdhci-brcmstb: Fix SDHCI_RESET_ALL for CQHCI
[[ NOTE: this is completely untested by the author, but included solely
because, as noted in commit df57d73276b8 ("mmc: sdhci-pci: Fix
SDHCI_RESET_ALL for CQHCI for Intel GLK-based controllers"), "other
drivers using CQHCI might benefit from a similar change, if they
also have CQHCI reset by SDHCI_RESET_ALL." We've now seen the same
bug on at least MSM, Arasan, and Intel hardware. ]]
SDHCI_RESET_ALL resets will reset the hardware CQE state, but we aren't
tracking that properly in software. When out of sync, we may trigger
various timeouts.
It's not typical to perform resets while CQE is enabled, but this may
occur in some suspend or error recovery scenarios.
Include this fix by way of the new sdhci_and_cqhci_reset() helper.
I only patch the bcm7216 variant even though others potentially *could*
provide the 'supports-cqe' property (and thus enable CQHCI), because
d46ba2d17f90 ("mmc: sdhci-brcmstb: Add support for Command Queuing
(CQE)") and some Broadcom folks confirm that only the 7216 variant
actually supports it.
This patch depends on (and should not compile without) the patch
entitled "mmc: cqhci: Provide helper for resetting both SDHCI and
CQHCI".
Fixes: d46ba2d17f90 ("mmc: sdhci-brcmstb: Add support for Command Queuing (CQE)")
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Florian Fainelli <f.fainelli(a)gmail.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221026124150.v4.3.I6a715feab6d01f760455865e968e…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-brcmstb.c b/drivers/mmc/host/sdhci-brcmstb.c
index aff36a933ebe..55d8bd232695 100644
--- a/drivers/mmc/host/sdhci-brcmstb.c
+++ b/drivers/mmc/host/sdhci-brcmstb.c
@@ -12,6 +12,7 @@
#include <linux/bitops.h>
#include <linux/delay.h>
+#include "sdhci-cqhci.h"
#include "sdhci-pltfm.h"
#include "cqhci.h"
@@ -55,7 +56,7 @@ static void brcmstb_reset(struct sdhci_host *host, u8 mask)
struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
struct sdhci_brcmstb_priv *priv = sdhci_pltfm_priv(pltfm_host);
- sdhci_reset(host, mask);
+ sdhci_and_cqhci_reset(host, mask);
/* Reset will clear this, so re-enable it */
if (priv->flags & BRCMSTB_PRIV_FLAGS_GATE_CLOCK)