The patch titled
Subject: fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
has been removed from the -mm tree. Its filename was
fs-ocfs2-fix-race-in-ocfs2_dentry_attach_lock.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Wengang Wang <wen.gang.wang(a)oracle.com>
Subject: fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
ocfs2_dentry_attach_lock() can be executed in parallel threads against the
same dentry. Make that race safe. The race is like this:
thread A thread B
(A1) enter ocfs2_dentry_attach_lock,
seeing dentry->d_fsdata is NULL,
and no alias found by
ocfs2_find_local_alias, so kmalloc
a new ocfs2_dentry_lock structure
to local variable "dl", dl1
.....
(B1) enter ocfs2_dentry_attach_lock,
seeing dentry->d_fsdata is NULL,
and no alias found by
ocfs2_find_local_alias so kmalloc
a new ocfs2_dentry_lock structure
to local variable "dl", dl2.
......
(A2) set dentry->d_fsdata with dl1,
call ocfs2_dentry_lock() and increase
dl1->dl_lockres.l_ro_holders to 1 on
success.
......
(B2) set dentry->d_fsdata with dl2
call ocfs2_dentry_lock() and increase
dl2->dl_lockres.l_ro_holders to 1 on
success.
......
(A3) call ocfs2_dentry_unlock()
and decrease
dl2->dl_lockres.l_ro_holders to 0
on success.
....
(B3) call ocfs2_dentry_unlock(),
decreasing
dl2->dl_lockres.l_ro_holders, but
see it's zero now, panic
Link: http://lkml.kernel.org/r/20190529174636.22364-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang(a)oracle.com>
Reported-by: Daniel Sobe <daniel.sobe(a)nxp.com>
Tested-by: Daniel Sobe <daniel.sobe(a)nxp.com>
Reviewed-by: Changwei Ge <gechangwei(a)live.cn>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/dcache.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/fs/ocfs2/dcache.c~fs-ocfs2-fix-race-in-ocfs2_dentry_attach_lock
+++ a/fs/ocfs2/dcache.c
@@ -296,6 +296,18 @@ int ocfs2_dentry_attach_lock(struct dent
out_attach:
spin_lock(&dentry_attach_lock);
+ if (unlikely(dentry->d_fsdata && !alias)) {
+ /* d_fsdata is set by a racing thread which is doing
+ * the same thing as this thread is doing. Leave the racing
+ * thread going ahead and we return here.
+ */
+ spin_unlock(&dentry_attach_lock);
+ iput(dl->dl_inode);
+ ocfs2_lock_res_free(&dl->dl_lockres);
+ kfree(dl);
+ return 0;
+ }
+
dentry->d_fsdata = dl;
dl->dl_count++;
spin_unlock(&dentry_attach_lock);
_
Patches currently in -mm which might be from wen.gang.wang(a)oracle.com are
Currently the default clock rates for the HDA and HDA2CODEC_2X clocks
are both 19.2MHz. However, the default rates for these clocks should
actually be 51MHz and 48MHz, respectively. The current clock settings
results in a distorted output during audio playback. Correct the default
clock rates for these clocks by specifying them in the clock init table
for Tegra210.
Cc: stable(a)vger.kernel.org
Signed-off-by: Jon Hunter <jonathanh(a)nvidia.com>
Acked-by: Thierry Reding <treding(a)nvidia.com>
---
Changes since V1:
- Updated the changelog and cc'ed stable.
Please note that I have not added a fixes tag because this has always
been broken for Tegra210.
drivers/clk/tegra/clk-tegra210.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/clk/tegra/clk-tegra210.c b/drivers/clk/tegra/clk-tegra210.c
index e1ba62d2b1a0..ac1d27a8c650 100644
--- a/drivers/clk/tegra/clk-tegra210.c
+++ b/drivers/clk/tegra/clk-tegra210.c
@@ -3366,6 +3366,8 @@ static struct tegra_clk_init_table init_table[] __initdata = {
{ TEGRA210_CLK_I2S3_SYNC, TEGRA210_CLK_CLK_MAX, 24576000, 0 },
{ TEGRA210_CLK_I2S4_SYNC, TEGRA210_CLK_CLK_MAX, 24576000, 0 },
{ TEGRA210_CLK_VIMCLK_SYNC, TEGRA210_CLK_CLK_MAX, 24576000, 0 },
+ { TEGRA210_CLK_HDA, TEGRA210_CLK_PLL_P, 51000000, 0 },
+ { TEGRA210_CLK_HDA2CODEC_2X, TEGRA210_CLK_PLL_P, 48000000, 0 },
/* This MUST be the last entry. */
{ TEGRA210_CLK_CLK_MAX, TEGRA210_CLK_CLK_MAX, 0, 0 },
};
--
2.7.4
From: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Heavy contention of the sde flushlist_lock can cause hard lockups at
extreme scale when the flushing logic is under stress.
Mitigate by replacing the item at a time copy to the local list with
an O(1) list_splice_init() and using the high priority work queue to
do the flushes.
Fixes: 7724105686e7 ("IB/hfi1: add driver files")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
---
drivers/infiniband/hw/hfi1/sdma.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/sdma.c b/drivers/infiniband/hw/hfi1/sdma.c
index b011072..70828de 100644
--- a/drivers/infiniband/hw/hfi1/sdma.c
+++ b/drivers/infiniband/hw/hfi1/sdma.c
@@ -410,10 +410,7 @@ static void sdma_flush(struct sdma_engine *sde)
sdma_flush_descq(sde);
spin_lock_irqsave(&sde->flushlist_lock, flags);
/* copy flush list */
- list_for_each_entry_safe(txp, txp_next, &sde->flushlist, list) {
- list_del_init(&txp->list);
- list_add_tail(&txp->list, &flushlist);
- }
+ list_splice_init(&sde->flushlist, &flushlist);
spin_unlock_irqrestore(&sde->flushlist_lock, flags);
/* flush from flush list */
list_for_each_entry_safe(txp, txp_next, &flushlist, list)
@@ -2413,7 +2410,7 @@ int sdma_send_txreq(struct sdma_engine *sde,
list_add_tail(&tx->list, &sde->flushlist);
spin_unlock(&sde->flushlist_lock);
iowait_inc_wait_count(wait, tx->num_desc);
- schedule_work(&sde->flush_worker);
+ queue_work_on(sde->cpu, system_highpri_wq, &sde->flush_worker);
ret = -ECOMM;
goto unlock;
nodesc:
@@ -2511,7 +2508,7 @@ int sdma_send_txlist(struct sdma_engine *sde, struct iowait_work *wait,
iowait_inc_wait_count(wait, tx->num_desc);
}
spin_unlock(&sde->flushlist_lock);
- schedule_work(&sde->flush_worker);
+ queue_work_on(sde->cpu, system_highpri_wq, &sde->flush_worker);
ret = -ECOMM;
goto update_tail;
nodesc:
Since commit d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on
5-level paging") kernel doesn't boot with KASAN on 5-level paging machines.
The bug is actually in early_p4d_offset() and introduced by commit
12a8cc7fcf54 ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")
early_p4d_offset() tries to convert pgd_val(*pgd) value to physical
address. This doesn't make sense because pgd_val() already contains
physical address.
It did work prior to commit d52888aa2753 because the result of
"__pa_nodebug(pgd_val(*pgd)) & PTE_PFN_MASK" was the same as
"pgd_val(*pgd) & PTE_PFN_MASK". __pa_nodebug() just set some high bit
which were masked out by applying PTE_PFN_MASK.
After the change of the PAGE_OFFSET offset in commit d52888aa2753
__pa_nodebug(pgd_val(*pgd)) started to return value with more high bits
set and PTE_PFN_MASK wasn't enough to mask out all of them. So we've got
wrong not even canonical address and crash on the attempt to dereference it.
Fixes: 12a8cc7fcf54 ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")
Reported-by: Kirill A. Shutemov <kirill(a)shutemov.name>
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
---
arch/x86/mm/kasan_init_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 8dc0fc0b1382..296da58f3013 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -199,7 +199,7 @@ static inline p4d_t *early_p4d_offset(pgd_t *pgd, unsigned long addr)
if (!pgtable_l5_enabled())
return (p4d_t *)pgd;
- p4d = __pa_nodebug(pgd_val(*pgd)) & PTE_PFN_MASK;
+ p4d = pgd_val(*pgd) & PTE_PFN_MASK;
p4d += __START_KERNEL_map - phys_base;
return (p4d_t *)p4d + p4d_index(addr);
}
--
2.21.0
When md raid device (e.g. raid456) is used as backing device, read-ahead
requests on a degrading and recovering md raid device might be failured
immediately by md raid code, but indeed this md raid array can still be
read or write for normal I/O requests. Therefore such failed read-ahead
request are not real hardware failure. Further more, after degrading and
recovering accomplished, read-ahead requests will be handled by md raid
array again.
For such condition, I/O failures of read-ahead requests don't indicate
real health status (because normal I/O still be served), they should not
be counted into I/O error counter dc->io_errors.
Since there is no simple way to detect whether the backing divice is a
md raid device, this patch simply ignores I/O failures for read-ahead
bios on backing device, to avoid bogus backing device failure on a
degrading md raid array.
Suggested-and-tested-by: Thorsten Knabe <linux(a)thorsten-knabe.de>
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
---
drivers/md/bcache/io.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index c25097968319..4d93f07f63e5 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -58,6 +58,18 @@ void bch_count_backing_io_errors(struct cached_dev *dc, struct bio *bio)
WARN_ONCE(!dc, "NULL pointer of struct cached_dev");
+ /*
+ * Read-ahead requests on a degrading and recovering md raid
+ * (e.g. raid6) device might be failured immediately by md
+ * raid code, which is not a real hardware media failure. So
+ * we shouldn't count failed REQ_RAHEAD bio to dc->io_errors.
+ */
+ if (bio->bi_opf & REQ_RAHEAD) {
+ pr_warn_ratelimited("%s: Read-ahead I/O failed on backing device, ignore",
+ dc->backing_dev_name);
+ return;
+ }
+
errors = atomic_add_return(1, &dc->io_errors);
if (errors < dc->error_limit)
pr_err("%s: IO error on backing device, unrecoverable",
--
2.16.4