pnv_tce() returns a pointer to a TCE entry and originally a TCE table
would be pre-allocated. For the default case of 2GB window the table
needs only a single level and that is fine. However if more levels are
requested, it is possible to get a race when 2 threads want a pointer
to a TCE entry from the same page of TCEs.
This adds cmpxchg to handle the race. Note that once TCE is non-zero,
it cannot become zero again.
CC: stable(a)vger.kernel.org # v4.19+
Fixes: a68bd1267b72 ("powerpc/powernv/…
[View More]ioda: Allocate indirect TCE levels on demand")
Signed-off-by: Alexey Kardashevskiy <aik(a)ozlabs.ru>
---
The race occurs about 30 times in the first 3 minutes of copying files
via rsync and that's about it.
This fixes EEH's from
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110810
---
Changes:
v2:
* replaced spin_lock with cmpxchg+readonce
---
arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index e28f03e1eb5e..8d6569590161 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -48,6 +48,9 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
return addr;
}
+static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+ unsigned long size, unsigned int levels);
+
static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
{
__be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
@@ -57,9 +60,9 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
while (level) {
int n = (idx & mask) >> (level * shift);
- unsigned long tce;
+ unsigned long oldtce, tce = be64_to_cpu(READ_ONCE(tmp[n]));
- if (tmp[n] == 0) {
+ if (!tce) {
__be64 *tmp2;
if (!alloc)
@@ -70,10 +73,15 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
if (!tmp2)
return NULL;
- tmp[n] = cpu_to_be64(__pa(tmp2) |
- TCE_PCI_READ | TCE_PCI_WRITE);
+ tce = __pa(tmp2) | TCE_PCI_READ | TCE_PCI_WRITE;
+ oldtce = be64_to_cpu(cmpxchg(&tmp[n], 0,
+ cpu_to_be64(tce)));
+ if (oldtce) {
+ pnv_pci_ioda2_table_do_free_pages(tmp2,
+ ilog2(tbl->it_level_size) + 3, 1);
+ tce = oldtce;
+ }
}
- tce = be64_to_cpu(tmp[n]);
tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
idx &= ~mask;
--
2.17.1
[View Less]
This is a note to let you know that I've just added the patch titled
staging: erofs: detect potential multiref due to corrupted images
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
…
[View More]
If you have any questions about this process, please let me know.
>From e12a0ce2fa69798194f3a8628baf6edfbd5c548f Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25(a)huawei.com>
Date: Wed, 21 Aug 2019 22:01:52 +0800
Subject: staging: erofs: detect potential multiref due to corrupted images
As reported by erofs-utils fuzzer, currently, multiref
(ondisk deduplication) hasn't been supported for now,
we should forbid it properly.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable(a)vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25(a)huawei.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Link: https://lore.kernel.org/r/20190821140152.229648-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/erofs/zdata.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/erofs/zdata.c b/drivers/staging/erofs/zdata.c
index 4d6faaab04f5..60d7c20db87d 100644
--- a/drivers/staging/erofs/zdata.c
+++ b/drivers/staging/erofs/zdata.c
@@ -798,6 +798,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
for (i = 0; i < nr_pages; ++i)
pages[i] = NULL;
+ err = 0;
z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
cl->pagevec, 0);
@@ -819,8 +820,17 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
pagenr = z_erofs_onlinepage_index(page);
DBG_BUGON(pagenr >= nr_pages);
- DBG_BUGON(pages[pagenr]);
+ /*
+ * currently EROFS doesn't support multiref(dedup),
+ * so here erroring out one multiref page.
+ */
+ if (unlikely(pages[pagenr])) {
+ DBG_BUGON(1);
+ SetPageError(pages[pagenr]);
+ z_erofs_onlinepage_endio(pages[pagenr]);
+ err = -EFSCORRUPTED;
+ }
pages[pagenr] = page;
}
z_erofs_pagevec_ctor_exit(&ctor, true);
@@ -828,7 +838,6 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
overlapped = false;
compressed_pages = pcl->compressed_pages;
- err = 0;
for (i = 0; i < clusterpages; ++i) {
unsigned int pagenr;
@@ -852,7 +861,12 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
pagenr = z_erofs_onlinepage_index(page);
DBG_BUGON(pagenr >= nr_pages);
- DBG_BUGON(pages[pagenr]);
+ if (unlikely(pages[pagenr])) {
+ DBG_BUGON(1);
+ SetPageError(pages[pagenr]);
+ z_erofs_onlinepage_endio(pages[pagenr]);
+ err = -EFSCORRUPTED;
+ }
pages[pagenr] = page;
overlapped = true;
--
2.23.0
[View Less]