pnv_tce() returns a pointer to a TCE entry; originally the whole TCE table would be pre-allocated. For the default case of a 2GB window the table needs only a single level, and that is fine. However, if more levels are requested, the intermediate levels are allocated on demand, and two threads can race when both want a pointer to a TCE entry from the same page of TCEs.

This adds a cmpxchg() to handle the race. Note that once a TCE is non-zero, it cannot become zero again.
Cc: stable@vger.kernel.org # v4.19+
Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
The race was seen about 30 times in the first 3 minutes of copying files via rsync, and then stopped occurring.

This fixes the EEHs reported at
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110810
---
Changes:
v2:
* replaced spin_lock with cmpxchg() + READ_ONCE()
---
arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index e28f03e1eb5e..8d6569590161 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -48,6 +48,9 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
return addr;
}
+static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+ unsigned long size, unsigned int levels);
+
static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
{
__be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
@@ -57,9 +60,9 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
while (level) {
int n = (idx & mask) >> (level * shift);
- unsigned long tce;
+ unsigned long oldtce, tce = be64_to_cpu(READ_ONCE(tmp[n]));
- if (tmp[n] == 0) {
+ if (!tce) {
__be64 *tmp2;
if (!alloc)
@@ -70,10 +73,15 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
if (!tmp2)
return NULL;
- tmp[n] = cpu_to_be64(__pa(tmp2) |
- TCE_PCI_READ | TCE_PCI_WRITE);
+ tce = __pa(tmp2) | TCE_PCI_READ | TCE_PCI_WRITE;
+ oldtce = be64_to_cpu(cmpxchg(&tmp[n], 0,
+ cpu_to_be64(tce)));
+ if (oldtce) {
+ pnv_pci_ioda2_table_do_free_pages(tmp2,
+ ilog2(tbl->it_level_size) + 3, 1);
+ tce = oldtce;
+ }
}
- tce = be64_to_cpu(tmp[n]);
tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
idx &= ~mask;
--
2.17.1
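For reference, below is a minimal userspace sketch of the allocate-then-publish pattern the patch applies; it is not kernel code, and the names (table_entry, alloc_level, get_or_install_level) are illustrative only. A thread that finds an empty slot speculatively allocates a level, tries to install it with a compare-and-swap, and the loser frees its copy and proceeds with the winner's:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ENTRIES_PER_LEVEL 512

/* Stands in for one slot (tmp[n]) of the indirect table. */
static _Atomic uintptr_t table_entry;

static uintptr_t *alloc_level(void)
{
	/* zeroed allocation, like the kernel's __GFP_ZERO pages */
	return calloc(ENTRIES_PER_LEVEL, sizeof(uintptr_t));
}

static uintptr_t *get_or_install_level(void)
{
	/* single racy read, analogous to READ_ONCE(tmp[n]) */
	uintptr_t old = atomic_load(&table_entry);

	if (!old) {
		uintptr_t *fresh = alloc_level();
		uintptr_t expected = 0;

		if (!fresh)
			return NULL;
		if (atomic_compare_exchange_strong(&table_entry, &expected,
						   (uintptr_t)fresh)) {
			/* we won the race; our level is now published */
			old = (uintptr_t)fresh;
		} else {
			/* another thread won; free our copy, use theirs */
			free(fresh);
			old = expected;
		}
	}
	return (uintptr_t *)old;
}

int main(void)
{
	uintptr_t *level = get_or_install_level();

	printf("level installed at %p\n", (void *)level);
	free(level);
	return 0;
}

The pattern is only safe because an installed entry never transitions back to zero, which is exactly the invariant the commit message calls out; otherwise a pointer read after losing the cmpxchg could reference freed memory.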
This is a note to let you know that I've just added the patch titled
staging: erofs: detect potential multiref due to corrupted images
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week).
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From e12a0ce2fa69798194f3a8628baf6edfbd5c548f Mon Sep 17 00:00:00 2001
From: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed, 21 Aug 2019 22:01:52 +0800
Subject: staging: erofs: detect potential multiref due to corrupted images
As reported by the erofs-utils fuzzer, multiref (on-disk
deduplication) isn't supported for now, so we should forbid
it properly.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Link: https://lore.kernel.org/r/20190821140152.229648-1-gaoxiang25@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/staging/erofs/zdata.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/erofs/zdata.c b/drivers/staging/erofs/zdata.c
index 4d6faaab04f5..60d7c20db87d 100644
--- a/drivers/staging/erofs/zdata.c
+++ b/drivers/staging/erofs/zdata.c
@@ -798,6 +798,7 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
for (i = 0; i < nr_pages; ++i)
pages[i] = NULL;
+ err = 0;
z_erofs_pagevec_ctor_init(&ctor, Z_EROFS_NR_INLINE_PAGEVECS,
cl->pagevec, 0);
@@ -819,8 +820,17 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
pagenr = z_erofs_onlinepage_index(page);
DBG_BUGON(pagenr >= nr_pages);
- DBG_BUGON(pages[pagenr]);
+ /*
+ * currently EROFS doesn't support multiref (dedup),
+ * so error out on the multiref page here.
+ */
+ if (unlikely(pages[pagenr])) {
+ DBG_BUGON(1);
+ SetPageError(pages[pagenr]);
+ z_erofs_onlinepage_endio(pages[pagenr]);
+ err = -EFSCORRUPTED;
+ }
pages[pagenr] = page;
}
z_erofs_pagevec_ctor_exit(&ctor, true);
@@ -828,7 +838,6 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
overlapped = false;
compressed_pages = pcl->compressed_pages;
- err = 0;
for (i = 0; i < clusterpages; ++i) {
unsigned int pagenr;
@@ -852,7 +861,12 @@ static int z_erofs_decompress_pcluster(struct super_block *sb,
pagenr = z_erofs_onlinepage_index(page);
DBG_BUGON(pagenr >= nr_pages);
- DBG_BUGON(pages[pagenr]);
+ if (unlikely(pages[pagenr])) {
+ DBG_BUGON(1);
+ SetPageError(pages[pagenr]);
+ z_erofs_onlinepage_endio(pages[pagenr]);
+ err = -EFSCORRUPTED;
+ }
pages[pagenr] = page;
overlapped = true;
--
2.23.0
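For reference, a minimal standalone sketch of the duplicate-owner check the patch adds; it is not EROFS code, and record_page() plus its constants are illustrative only (117 is the EUCLEAN errno value that EFSCORRUPTED aliases on Linux). A decompressed page index may have only one owner; a second claim means the image encodes multiref and is treated as corrupted rather than silently overwritten:

#include <stdio.h>
#include <string.h>

#define MY_EFSCORRUPTED 117	/* EUCLEAN, which EFSCORRUPTED maps to */
#define NR_PAGES 4

/*
 * Claim output slot 'pagenr' for 'page'. Two owners for the same
 * slot would mean multiref (on-disk dedup), which is unsupported,
 * so report the image as corrupted instead of overwriting.
 */
static int record_page(void **pages, unsigned int pagenr, void *page)
{
	if (pagenr >= NR_PAGES)
		return -1;	/* like DBG_BUGON(pagenr >= nr_pages) */
	if (pages[pagenr])
		return -MY_EFSCORRUPTED;
	pages[pagenr] = page;
	return 0;
}

int main(void)
{
	void *pages[NR_PAGES];
	int a, b;

	memset(pages, 0, sizeof(pages));
	printf("%d\n", record_page(pages, 1, &a));	/* 0: first owner */
	printf("%d\n", record_page(pages, 1, &b));	/* -117: multiref */
	return 0;
}

Note the patch keeps going after marking the page with an error rather than bailing out of the loop, so every affected page gets SetPageError() and its endio before the function returns err.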