The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1a932ef4e47984dee227834667b5ff5a334e4805 Mon Sep 17 00:00:00 2001
From: Liu Bo <bo.li.liu(a)oracle.com>
Date: Thu, 25 Jan 2018 11:02:54 -0700
Subject: [PATCH] Btrfs: fix use-after-free on root->orphan_block_rsv
I got these from running generic/475,
WARNING: CPU: 0 PID: 26384 at fs/btrfs/inode.c:3326 btrfs_orphan_commit_root+0x1ac/0x2b0 [btrfs]
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: btrfs_block_rsv_release+0x1c/0x70 [btrfs]
Call Trace:
btrfs_orphan_release_metadata+0x9f/0x200 [btrfs]
btrfs_orphan_del+0x10d/0x170 [btrfs]
btrfs_setattr+0x500/0x640 [btrfs]
notify_change+0x7ae/0x870
do_truncate+0xca/0x130
vfs_truncate+0x2ee/0x3d0
do_sys_truncate+0xaf/0xf0
SyS_truncate+0xe/0x10
entry_SYSCALL_64_fastpath+0x1f/0x96
The race is between btrfs_orphan_commit_root and btrfs_orphan_del,
        t1                                t2
btrfs_orphan_commit_root          btrfs_orphan_del
   spin_lock
   check (&root->orphan_inodes)
   root->orphan_block_rsv = NULL;
   spin_unlock
                                  atomic_dec(&root->orphan_inodes);
                                  access root->orphan_block_rsv
Accessing root->orphan_block_rsv must be done before decreasing
root->orphan_inodes.
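The ordering rule above can be illustrated with a small user-space model. This is only a sketch, not the btrfs code: the model_* names and structures are hypothetical stand-ins, C11 atomics replace the kernel's atomic_t, and the orphan_lock that btrfs_orphan_commit_root takes is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the btrfs structures; not the kernel types. */
struct model_rsv {
	long reserved;
};

struct model_root {
	atomic_int orphan_inodes;            /* models root->orphan_inodes */
	struct model_rsv *orphan_block_rsv;  /* models root->orphan_block_rsv */
};

/*
 * Models the fixed ordering in btrfs_orphan_del(): use the reservation
 * first, and drop the orphan count only after everything else is done.
 * Returns the bytes still reserved after the release.
 */
static long model_orphan_del(struct model_root *root, long bytes)
{
	long remaining;

	assert(root->orphan_block_rsv != NULL);
	remaining = (root->orphan_block_rsv->reserved -= bytes);
	/* Only from here on may the committer see a zero count. */
	atomic_fetch_sub(&root->orphan_inodes, 1);
	return remaining;
}

/*
 * Models btrfs_orphan_commit_root(): the reservation is dropped only
 * when no orphan inodes remain. Returns 1 if it was dropped.
 */
static int model_commit_root(struct model_root *root)
{
	if (atomic_load(&root->orphan_inodes) != 0)
		return 0;
	root->orphan_block_rsv = NULL;
	return 1;
}
```

While the count is non-zero, model_commit_root() leaves the pointer alone, so model_orphan_del() can safely use it. With the old ordering (decrement first), the committer could clear the pointer between the decrement and the access.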
cc: <stable(a)vger.kernel.org> v3.12+
Fixes: 703c88e03524 ("Btrfs: fix tracking of orphan inode count")
Signed-off-by: Liu Bo <bo.li.liu(a)oracle.com>
Reviewed-by: Josef Bacik <jbacik(a)fb.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 44a152d8f32f..29b491328f4e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3387,6 +3387,11 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
ret = btrfs_orphan_reserve_metadata(trans, inode);
ASSERT(!ret);
if (ret) {
+ /*
+ * dec doesn't need spin_lock as ->orphan_block_rsv
+ * would be released only if ->orphan_inodes is
+ * zero.
+ */
atomic_dec(&root->orphan_inodes);
clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
&inode->runtime_flags);
@@ -3401,12 +3406,17 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans,
if (insert >= 1) {
ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
if (ret) {
- atomic_dec(&root->orphan_inodes);
if (reserve) {
clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
&inode->runtime_flags);
btrfs_orphan_release_metadata(inode);
}
+ /*
+ * btrfs_orphan_commit_root may race with us and set
+ * ->orphan_block_rsv to zero, in order to avoid that,
+ * decrease ->orphan_inodes after everything is done.
+ */
+ atomic_dec(&root->orphan_inodes);
if (ret != -EEXIST) {
clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
&inode->runtime_flags);
@@ -3438,28 +3448,26 @@ static int btrfs_orphan_del(struct btrfs_trans_handle *trans,
{
struct btrfs_root *root = inode->root;
int delete_item = 0;
- int release_rsv = 0;
int ret = 0;
- spin_lock(&root->orphan_lock);
if (test_and_clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
&inode->runtime_flags))
delete_item = 1;
+ if (delete_item && trans)
+ ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
+
if (test_and_clear_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
&inode->runtime_flags))
- release_rsv = 1;
- spin_unlock(&root->orphan_lock);
+ btrfs_orphan_release_metadata(inode);
- if (delete_item) {
+ /*
+ * btrfs_orphan_commit_root may race with us and set ->orphan_block_rsv
+ * to zero, in order to avoid that, decrease ->orphan_inodes after
+ * everything is done.
+ */
+ if (delete_item)
atomic_dec(&root->orphan_inodes);
- if (trans)
- ret = btrfs_del_orphan_item(trans, root,
- btrfs_ino(inode));
- }
-
- if (release_rsv)
- btrfs_orphan_release_metadata(inode);
return ret;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 815c6704bf9f1c59f3a6be380a4032b9c57b12f1 Mon Sep 17 00:00:00 2001
From: Keith Busch <keith.busch(a)intel.com>
Date: Tue, 13 Feb 2018 05:44:44 -0700
Subject: [PATCH] nvme-pci: Remap CMB SQ entries on every controller reset
The controller memory buffer is remapped into a kernel address on each
reset, but the driver was setting the submission queue base address
only on the very first queue creation. The remapped address is likely to
change after a reset, so accessing the old address will hit a kernel bug.
This patch fixes that by setting the queue's CMB base address each time
the queue is created.
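As a rough illustration of the per-queue CMB offset that the patch now recomputes on every queue creation, the calculation can be sketched in user space. The model_* names are hypothetical, and the 64-byte submission queue entry size is an assumption standing in for SQ_SIZE(); this is not the driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one submission queue entry is 64 bytes. */
#define MODEL_SQE_SIZE 64u

/* Round x up to the next multiple of step. */
static size_t model_roundup(size_t x, size_t step)
{
	assert(step != 0);
	return ((x + step - 1) / step) * step;
}

/*
 * Byte offset of I/O queue `qid` (1-based) inside the CMB, mirroring
 * (qid - 1) * roundup(SQ_SIZE(depth), page_size) from the patch.
 */
static size_t model_cmb_sq_offset(unsigned int qid, unsigned int depth,
				  size_t page_size)
{
	assert(qid >= 1);
	return (size_t)(qid - 1) *
	       model_roundup((size_t)depth * MODEL_SQE_SIZE, page_size);
}
```

The offset depends only on the queue id, depth and page size, so it stays the same across resets. What changes is the base it is added to (dev->cmb and dev->cmb_bus_addr in the patch), which is why the addresses have to be derived again each time the queue is created.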
Fixes: f63572dff1421 ("nvme: unmap CMB and remove sysfs file in reset path")
Reported-by: Christian Black <christian.d.black(a)intel.com>
Cc: Jon Derrick <jonathan.derrick(a)intel.com>
Cc: <stable(a)vger.kernel.org> # 4.9+
Signed-off-by: Keith Busch <keith.busch(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ab9c19525fa8..b427157af74e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1364,18 +1364,14 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
int qid, int depth)
{
- if (qid && dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
- unsigned offset = (qid - 1) * roundup(SQ_SIZE(depth),
- dev->ctrl.page_size);
- nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
- nvmeq->sq_cmds_io = dev->cmb + offset;
- } else {
- nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
- &nvmeq->sq_dma_addr, GFP_KERNEL);
- if (!nvmeq->sq_cmds)
- return -ENOMEM;
- }
+ /* CMB SQEs will be mapped before creation */
+ if (qid && dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS))
+ return 0;
+ nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
+ &nvmeq->sq_dma_addr, GFP_KERNEL);
+ if (!nvmeq->sq_cmds)
+ return -ENOMEM;
return 0;
}
@@ -1449,6 +1445,13 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
struct nvme_dev *dev = nvmeq->dev;
int result;
+ if (dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
+ unsigned offset = (qid - 1) * roundup(SQ_SIZE(nvmeq->q_depth),
+ dev->ctrl.page_size);
+ nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
+ nvmeq->sq_cmds_io = dev->cmb + offset;
+ }
+
nvmeq->cq_vector = qid - 1;
result = adapter_alloc_cq(dev, qid, nvmeq);
if (result < 0)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 815c6704bf9f1c59f3a6be380a4032b9c57b12f1 Mon Sep 17 00:00:00 2001
From: Keith Busch <keith.busch(a)intel.com>
Date: Tue, 13 Feb 2018 05:44:44 -0700
Subject: [PATCH] nvme-pci: Remap CMB SQ entries on every controller reset
The controller memory buffer is remapped into a kernel address on each
reset, but the driver was setting the submission queue base address
only on the very first queue creation. The remapped address is likely to
change after a reset, so accessing the old address will hit a kernel bug.
This patch fixes that by setting the queue's CMB base address each time
the queue is created.
Fixes: f63572dff1421 ("nvme: unmap CMB and remove sysfs file in reset path")
Reported-by: Christian Black <christian.d.black(a)intel.com>
Cc: Jon Derrick <jonathan.derrick(a)intel.com>
Cc: <stable(a)vger.kernel.org> # 4.9+
Signed-off-by: Keith Busch <keith.busch(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ab9c19525fa8..b427157af74e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1364,18 +1364,14 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
int qid, int depth)
{
- if (qid && dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
- unsigned offset = (qid - 1) * roundup(SQ_SIZE(depth),
- dev->ctrl.page_size);
- nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
- nvmeq->sq_cmds_io = dev->cmb + offset;
- } else {
- nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
- &nvmeq->sq_dma_addr, GFP_KERNEL);
- if (!nvmeq->sq_cmds)
- return -ENOMEM;
- }
+ /* CMB SQEs will be mapped before creation */
+ if (qid && dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS))
+ return 0;
+ nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
+ &nvmeq->sq_dma_addr, GFP_KERNEL);
+ if (!nvmeq->sq_cmds)
+ return -ENOMEM;
return 0;
}
@@ -1449,6 +1445,13 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
struct nvme_dev *dev = nvmeq->dev;
int result;
+ if (dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
+ unsigned offset = (qid - 1) * roundup(SQ_SIZE(nvmeq->q_depth),
+ dev->ctrl.page_size);
+ nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
+ nvmeq->sq_cmds_io = dev->cmb + offset;
+ }
+
nvmeq->cq_vector = qid - 1;
result = adapter_alloc_cq(dev, qid, nvmeq);
if (result < 0)
The patch below does not apply to the 4.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 815c6704bf9f1c59f3a6be380a4032b9c57b12f1 Mon Sep 17 00:00:00 2001
From: Keith Busch <keith.busch(a)intel.com>
Date: Tue, 13 Feb 2018 05:44:44 -0700
Subject: [PATCH] nvme-pci: Remap CMB SQ entries on every controller reset
The controller memory buffer is remapped into a kernel address on each
reset, but the driver was setting the submission queue base address
only on the very first queue creation. The remapped address is likely to
change after a reset, so accessing the old address will hit a kernel bug.
This patch fixes that by setting the queue's CMB base address each time
the queue is created.
Fixes: f63572dff1421 ("nvme: unmap CMB and remove sysfs file in reset path")
Reported-by: Christian Black <christian.d.black(a)intel.com>
Cc: Jon Derrick <jonathan.derrick(a)intel.com>
Cc: <stable(a)vger.kernel.org> # 4.9+
Signed-off-by: Keith Busch <keith.busch(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ab9c19525fa8..b427157af74e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1364,18 +1364,14 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
int qid, int depth)
{
- if (qid && dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
- unsigned offset = (qid - 1) * roundup(SQ_SIZE(depth),
- dev->ctrl.page_size);
- nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
- nvmeq->sq_cmds_io = dev->cmb + offset;
- } else {
- nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
- &nvmeq->sq_dma_addr, GFP_KERNEL);
- if (!nvmeq->sq_cmds)
- return -ENOMEM;
- }
+ /* CMB SQEs will be mapped before creation */
+ if (qid && dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS))
+ return 0;
+ nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
+ &nvmeq->sq_dma_addr, GFP_KERNEL);
+ if (!nvmeq->sq_cmds)
+ return -ENOMEM;
return 0;
}
@@ -1449,6 +1445,13 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
struct nvme_dev *dev = nvmeq->dev;
int result;
+ if (dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
+ unsigned offset = (qid - 1) * roundup(SQ_SIZE(nvmeq->q_depth),
+ dev->ctrl.page_size);
+ nvmeq->sq_dma_addr = dev->cmb_bus_addr + offset;
+ nvmeq->sq_cmds_io = dev->cmb + offset;
+ }
+
nvmeq->cq_vector = qid - 1;
result = adapter_alloc_cq(dev, qid, nvmeq);
if (result < 0)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8dd601fa8317243be887458c49f6c29c2f3d719f Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.com>
Date: Thu, 15 Feb 2018 20:00:15 +1100
Subject: [PATCH] dm: correctly handle chained bios in dec_pending()
dec_pending() is given an error status (possibly 0) to be recorded
against a bio. It can be called several times on the one 'struct
dm_io', and it is careful to only assign a non-zero error to
io->status. However, when it then assigns io->status to bio->bi_status,
it is not careful and could overwrite a genuine error status with 0.
This can happen when chained bios are in use. If a bio is chained
beneath the bio that this dm_io is handling, the child bio might
complete and set bio->bi_status before the dm_io completes.
This has been possible since chained bios were introduced in 3.14, and
has become a lot easier to trigger with commit 18a25da84354 ("dm: ensure
bio submission follows a depth-first tree walk") as that commit caused
dm to start using chained bios itself.
A particular failure mode is that if a bio spans an 'error' target and a
working target, the 'error' fragment will complete instantly and set the
->bi_status, and the other fragment will normally complete a little
later, and will clear ->bi_status.
The fix is simply to only assign io_error to bio->bi_status when
io_error is not zero.
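The effect of the fix can be shown with a minimal user-space model. The model_* types below are hypothetical stand-ins for blk_status_t and struct bio, and the function covers only the final assignment, not the rest of dec_pending().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for blk_status_t: 0 means success, non-zero is an error. */
typedef unsigned char model_status_t;

/* Stand-in for the one struct bio field that matters here. */
struct model_bio {
	model_status_t bi_status;
};

/*
 * Models the fixed completion step: io_error is recorded only when it
 * is non-zero, so an error already stored by a chained child bio is
 * not overwritten with 0.
 */
static void model_complete(struct model_bio *bio, model_status_t io_error)
{
	assert(bio != NULL);
	if (io_error)
		bio->bi_status = io_error;
}
```

If a chained child has already set bi_status to an error and the dm_io then completes with io_error == 0, the error survives. An unconditional assignment would have cleared it.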
Reported-and-tested-by: Milan Broz <gmazyland(a)gmail.com>
Cc: stable(a)vger.kernel.org (v3.14+)
Signed-off-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d6de00f367ef..68136806d365 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -903,7 +903,8 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
queue_io(md, bio);
} else {
/* done with normal IO or empty flush */
- bio->bi_status = io_error;
+ if (io_error)
+ bio->bi_status = io_error;
bio_endio(bio);
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8dd601fa8317243be887458c49f6c29c2f3d719f Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb(a)suse.com>
Date: Thu, 15 Feb 2018 20:00:15 +1100
Subject: [PATCH] dm: correctly handle chained bios in dec_pending()
dec_pending() is given an error status (possibly 0) to be recorded
against a bio. It can be called several times on the one 'struct
dm_io', and it is careful to only assign a non-zero error to
io->status. However, when it then assigns io->status to bio->bi_status,
it is not careful and could overwrite a genuine error status with 0.
This can happen when chained bios are in use. If a bio is chained
beneath the bio that this dm_io is handling, the child bio might
complete and set bio->bi_status before the dm_io completes.
This has been possible since chained bios were introduced in 3.14, and
has become a lot easier to trigger with commit 18a25da84354 ("dm: ensure
bio submission follows a depth-first tree walk") as that commit caused
dm to start using chained bios itself.
A particular failure mode is that if a bio spans an 'error' target and a
working target, the 'error' fragment will complete instantly and set the
->bi_status, and the other fragment will normally complete a little
later, and will clear ->bi_status.
The fix is simply to only assign io_error to bio->bi_status when
io_error is not zero.
Reported-and-tested-by: Milan Broz <gmazyland(a)gmail.com>
Cc: stable(a)vger.kernel.org (v3.14+)
Signed-off-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d6de00f367ef..68136806d365 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -903,7 +903,8 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
queue_io(md, bio);
} else {
/* done with normal IO or empty flush */
- bio->bi_status = io_error;
+ if (io_error)
+ bio->bi_status = io_error;
bio_endio(bio);
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd0e786d9d09024f67bd71ec094b110237dc3840 Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck(a)intel.com>
Date: Thu, 25 Jan 2018 14:23:48 -0800
Subject: [PATCH] x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1
pages
In the following commit:
ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
... we added code to memory_failure() to unmap the page from the
kernel 1:1 virtual address space to avoid speculative access to the
page logging additional errors.
But memory_failure() may not always succeed in taking the page offline,
especially if the page belongs to the kernel. This can happen if
there are too many corrected errors on a page and either mcelog(8)
or drivers/ras/cec.c asks to take a page offline.
Since we remove the 1:1 mapping early in memory_failure(), we can
end up with the page unmapped, but still in use. On the next access
the kernel crashes :-(
There are also various debug paths that call memory_failure() to simulate
occurrence of an error. Since there is no actual error in memory, we
don't need to map out the page for those cases.
Revert most of the previous attempt and keep the solution local to
arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
1) there is a real error
2) memory_failure() succeeds.
All of this only applies to 64-bit systems. 32-bit kernel doesn't map
all of memory into kernel space. It isn't worth adding the code to unmap
the piece that is mapped because nobody would run a 32-bit kernel on a
machine that has recoverable machine checks.
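The two conditions for unmapping listed above can be summarised as a small decision function. This is a sketch only: the model_* names are hypothetical, memory_failure() is modelled as returning 0 on success or a negative value on failure, and no page tables are touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of what the handler decided; not a kernel type. */
struct model_result {
	int  ret;       /* modelled return value of memory_failure() */
	bool unmapped;  /* whether the 1:1 mapping would be removed */
};

/*
 * real_error is false for debug paths that only simulate an error.
 * memory_failure_ret is 0 on success or a negative value on failure.
 */
static struct model_result model_handle_poison(bool real_error,
					       int memory_failure_ret)
{
	struct model_result r = { memory_failure_ret, false };

	assert(memory_failure_ret <= 0);
	/* Unmap only for a real error that was successfully handled. */
	if (real_error && memory_failure_ret == 0)
		r.unmapped = true;
	return r;
}
```

In the patch itself the "real error" condition is not an explicit flag. It follows from placing the mce_unmap_kpfn() calls in the machine check paths in mce.c rather than inside memory_failure().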
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave <dave.hansen(a)intel.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Robert (Persistent Memory) <elliott(a)hpe.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: linux-mm(a)kvack.org
Cc: stable(a)vger.kernel.org #v4.14
Fixes: ce0fa3e56ad2 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 4baa6bceb232..d652a3808065 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -52,10 +52,6 @@ static inline void clear_page(void *page)
void copy_page(void *to, void *from);
-#ifdef CONFIG_X86_MCE
-#define arch_unmap_kpfn arch_unmap_kpfn
-#endif
-
#endif /* !__ASSEMBLY__ */
#ifdef CONFIG_X86_VSYSCALL_EMULATION
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index aa0d5df9dc60..e956eb267061 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -115,4 +115,19 @@ static inline void mce_unregister_injector_chain(struct notifier_block *nb) { }
extern struct mca_config mca_cfg;
+#ifndef CONFIG_X86_64
+/*
+ * On 32-bit systems it would be difficult to safely unmap a poison page
+ * from the kernel 1:1 map because there are no non-canonical addresses that
+ * we can use to refer to the address without risking a speculative access.
+ * However, this isn't much of an issue because:
+ * 1) Few unmappable pages are in the 1:1 map. Most are in HIGHMEM which
+ * are only mapped into the kernel as needed
+ * 2) Few people would run a 32-bit kernel on a machine that supports
+ * recoverable errors because they have too much memory to boot 32-bit.
+ */
+static inline void mce_unmap_kpfn(unsigned long pfn) {}
+#define mce_unmap_kpfn mce_unmap_kpfn
+#endif
+
#endif /* __X86_MCE_INTERNAL_H__ */
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 75f405ac085c..8ff94d1e2dce 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -105,6 +105,10 @@ static struct irq_work mce_irq_work;
static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
+#ifndef mce_unmap_kpfn
+static void mce_unmap_kpfn(unsigned long pfn);
+#endif
+
/*
* CPU/chipset specific EDAC code can register a notifier call here to print
* MCE errors in a human-readable form.
@@ -590,7 +594,8 @@ static int srao_decode_notifier(struct notifier_block *nb, unsigned long val,
if (mce_usable_address(mce) && (mce->severity == MCE_AO_SEVERITY)) {
pfn = mce->addr >> PAGE_SHIFT;
- memory_failure(pfn, 0);
+ if (!memory_failure(pfn, 0))
+ mce_unmap_kpfn(pfn);
}
return NOTIFY_OK;
@@ -1057,12 +1062,13 @@ static int do_memory_failure(struct mce *m)
ret = memory_failure(m->addr >> PAGE_SHIFT, flags);
if (ret)
pr_err("Memory error not recovered");
+ else
+ mce_unmap_kpfn(m->addr >> PAGE_SHIFT);
return ret;
}
-#if defined(arch_unmap_kpfn) && defined(CONFIG_MEMORY_FAILURE)
-
-void arch_unmap_kpfn(unsigned long pfn)
+#ifndef mce_unmap_kpfn
+static void mce_unmap_kpfn(unsigned long pfn)
{
unsigned long decoy_addr;
@@ -1073,7 +1079,7 @@ void arch_unmap_kpfn(unsigned long pfn)
* We would like to just call:
* set_memory_np((unsigned long)pfn_to_kaddr(pfn), 1);
* but doing that would radically increase the odds of a
- * speculative access to the posion page because we'd have
+ * speculative access to the poison page because we'd have
* the virtual address of the kernel 1:1 mapping sitting
* around in registers.
* Instead we get tricky. We create a non-canonical address
@@ -1098,7 +1104,6 @@ void arch_unmap_kpfn(unsigned long pfn)
if (set_memory_np(decoy_addr, 1))
pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
-
}
#endif
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index c30b32e3c862..10191c28fc04 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -127,10 +127,4 @@ static __always_inline enum lru_list page_lru(struct page *page)
#define lru_to_page(head) (list_entry((head)->prev, struct page, lru))
-#ifdef arch_unmap_kpfn
-extern void arch_unmap_kpfn(unsigned long pfn);
-#else
-static __always_inline void arch_unmap_kpfn(unsigned long pfn) { }
-#endif
-
#endif
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 4b80ccee4535..8291b75f42c8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1139,8 +1139,6 @@ int memory_failure(unsigned long pfn, int flags)
return 0;
}
- arch_unmap_kpfn(pfn);
-
orig_head = hpage = compound_head(p);
num_poisoned_pages_inc();
This is a note to let you know that I've just added the patch titled
rtlwifi: rtl8821ae: Fix connection lost problem correctly
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
rtlwifi-rtl8821ae-fix-connection-lost-problem-correctly.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From c713fb071edc0efc01a955f65a006b0e1795d2eb Mon Sep 17 00:00:00 2001
From: Larry Finger <Larry.Finger(a)lwfinger.net>
Date: Mon, 5 Feb 2018 12:38:11 -0600
Subject: rtlwifi: rtl8821ae: Fix connection lost problem correctly
From: Larry Finger <Larry.Finger(a)lwfinger.net>
commit c713fb071edc0efc01a955f65a006b0e1795d2eb upstream.
There has been a coding error in rtl8821ae since it was first introduced,
namely that an 8-bit register was read using a 16-bit read in
_rtl8821ae_dbi_read(). This error was fixed with commit 40b368af4b75
("rtlwifi: Fix alignment issues"); however, this change led to
instability in the connection. To restore stability, this change
was reverted in commit b8b8b16352cd ("rtlwifi: rtl8821ae: Fix connection
lost problem").
Unfortunately, the unaligned access causes machine checks in ARM
architecture, and we were finally forced to find the actual cause of the
problem on x86 platforms. Following a suggestion from Pkshih
<pkshih(a)realtek.com>, it was found that increasing the ASPM L1
latency from 0 to 7 fixed the instability. This parameter was varied to
see if a smaller value would work; however, it appears that 7 is the
safest value. A new symbol is defined for this quantity, thus it can be
easily changed if necessary.
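To make the register arithmetic concrete, the value written to DBI offset 0x70f can be worked out in user space. The MODEL_* macros and model_aspm_reg() are hypothetical names for this sketch; only the value 7 and the shift by 3 come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_BIT(n)          (1u << (n))
#define MODEL_ASPM_L1_LATENCY 7u  /* value of ASPM_L1_LATENCY in the patch */

/*
 * Mirrors tmp | BIT(7) | ASPM_L1_LATENCY << 3 from the patch. The shift
 * binds tighter than the bitwise OR, so the latency lands in bits 3..5.
 */
static uint8_t model_aspm_reg(uint8_t tmp)
{
	uint8_t val = (uint8_t)(tmp | MODEL_BIT(7) |
				(MODEL_ASPM_L1_LATENCY << 3));

	/* With a latency of 7, bits 3..5 and bit 7 are always set. */
	assert((val & 0xB8u) == 0xB8u);
	return val;
}
```

Because the value is ORed in, existing bits are only ever set, never cleared. That is harmless for a latency of 7, which sets all three bits of the field, but a smaller value would need the field to be masked out of tmp first.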
Fixes: b8b8b16352cd ("rtlwifi: rtl8821ae: Fix connection lost problem")
Cc: Stable <stable(a)vger.kernel.org> # 4.14+
Fix-suggested-by: Pkshih <pkshih(a)realtek.com>
Signed-off-by: Larry Finger <Larry.Finger(a)lwfinger.net>
Tested-by: James Cameron <quozl(a)laptop.org> # x86_64 OLPC NL3
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c | 5 +++--
drivers/net/wireless/realtek/rtlwifi/wifi.h | 1 +
2 files changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
@@ -1123,7 +1123,7 @@ static u8 _rtl8821ae_dbi_read(struct rtl
}
if (0 == tmp) {
read_addr = REG_DBI_RDATA + addr % 4;
- ret = rtl_read_word(rtlpriv, read_addr);
+ ret = rtl_read_byte(rtlpriv, read_addr);
}
return ret;
}
@@ -1165,7 +1165,8 @@ static void _rtl8821ae_enable_aspm_back_
}
tmp = _rtl8821ae_dbi_read(rtlpriv, 0x70f);
- _rtl8821ae_dbi_write(rtlpriv, 0x70f, tmp | BIT(7));
+ _rtl8821ae_dbi_write(rtlpriv, 0x70f, tmp | BIT(7) |
+ ASPM_L1_LATENCY << 3);
tmp = _rtl8821ae_dbi_read(rtlpriv, 0x719);
_rtl8821ae_dbi_write(rtlpriv, 0x719, tmp | BIT(3) | BIT(4));
--- a/drivers/net/wireless/realtek/rtlwifi/wifi.h
+++ b/drivers/net/wireless/realtek/rtlwifi/wifi.h
@@ -99,6 +99,7 @@
#define RTL_USB_MAX_RX_COUNT 100
#define QBSS_LOAD_SIZE 5
#define MAX_WMMELE_LENGTH 64
+#define ASPM_L1_LATENCY 7
#define TOTAL_CAM_ENTRY 32
Patches currently in stable-queue which might be from Larry.Finger(a)lwfinger.net are
queue-4.15/rtlwifi-rtl8821ae-fix-connection-lost-problem-correctly.patch