The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001
From: Toshi Kani <toshi.kani(a)hpe.com>
Date: Wed, 27 Jun 2018 08:13:48 -0600
Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been peformed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the liner addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: mhocko(a)suse.com
Cc: akpm(a)linux-foundation.org
Cc: hpa(a)zytor.com
Cc: cpandya(a)codeaurora.org
Cc: linux-mm(a)kvack.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8088d3dd4d7c6933a65aa169393b5d88d8065672 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 10:54:56 -0700
Subject: [PATCH] crypto: skcipher - fix crash flushing dcache in error path
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
skcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing skcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
This bug was found by syzkaller fuzzing.
Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "cbc(aes-generic)",
};
char buffer[4096] __attribute__((aligned(4096))) = { 0 };
int fd;
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
fd = accept(fd, NULL, NULL);
write(fd, buffer, 15);
read(fd, buffer, 15);
}
Reported-by: Liu Chao <liuchao741(a)huawei.com>
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 835e5d36ad59..0bd8c6caa498 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -95,7 +95,7 @@ static inline u8 *skcipher_get_spot(u8 *start, unsigned int len)
return max(start, end_page);
}
-static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
+static void skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
{
u8 *addr;
@@ -103,23 +103,24 @@ static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
addr = skcipher_get_spot(addr, bsize);
scatterwalk_copychunks(addr, &walk->out, bsize,
(walk->flags & SKCIPHER_WALK_PHYS) ? 2 : 1);
- return 0;
}
int skcipher_walk_done(struct skcipher_walk *walk, int err)
{
- unsigned int n = walk->nbytes - err;
- unsigned int nbytes;
-
- nbytes = walk->total - n;
-
- if (unlikely(err < 0)) {
- nbytes = 0;
- n = 0;
- } else if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
- SKCIPHER_WALK_SLOW |
- SKCIPHER_WALK_COPY |
- SKCIPHER_WALK_DIFF)))) {
+ unsigned int n; /* bytes processed */
+ bool more;
+
+ if (unlikely(err < 0))
+ goto finish;
+
+ n = walk->nbytes - err;
+ walk->total -= n;
+ more = (walk->total != 0);
+
+ if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
+ SKCIPHER_WALK_SLOW |
+ SKCIPHER_WALK_COPY |
+ SKCIPHER_WALK_DIFF)))) {
unmap_src:
skcipher_unmap_src(walk);
} else if (walk->flags & SKCIPHER_WALK_DIFF) {
@@ -131,28 +132,28 @@ int skcipher_walk_done(struct skcipher_walk *walk, int err)
skcipher_unmap_dst(walk);
} else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) {
if (WARN_ON(err)) {
+ /* unexpected case; didn't process all bytes */
err = -EINVAL;
- nbytes = 0;
- } else
- n = skcipher_done_slow(walk, n);
+ goto finish;
+ }
+ skcipher_done_slow(walk, n);
+ goto already_advanced;
}
- if (err > 0)
- err = 0;
-
- walk->total = nbytes;
- walk->nbytes = nbytes;
-
scatterwalk_advance(&walk->in, n);
scatterwalk_advance(&walk->out, n);
- scatterwalk_done(&walk->in, 0, nbytes);
- scatterwalk_done(&walk->out, 1, nbytes);
+already_advanced:
+ scatterwalk_done(&walk->in, 0, more);
+ scatterwalk_done(&walk->out, 1, more);
- if (nbytes) {
+ if (more) {
crypto_yield(walk->flags & SKCIPHER_WALK_SLEEP ?
CRYPTO_TFM_REQ_MAY_SLEEP : 0);
return skcipher_walk_next(walk);
}
+ err = 0;
+finish:
+ walk->nbytes = 0;
/* Short-circuit for the common/fast path. */
if (!((unsigned long)walk->buffer | (unsigned long)walk->page))
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8088d3dd4d7c6933a65aa169393b5d88d8065672 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 10:54:56 -0700
Subject: [PATCH] crypto: skcipher - fix crash flushing dcache in error path
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
skcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing skcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
This bug was found by syzkaller fuzzing.
Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "cbc(aes-generic)",
};
char buffer[4096] __attribute__((aligned(4096))) = { 0 };
int fd;
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
fd = accept(fd, NULL, NULL);
write(fd, buffer, 15);
read(fd, buffer, 15);
}
Reported-by: Liu Chao <liuchao741(a)huawei.com>
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 835e5d36ad59..0bd8c6caa498 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -95,7 +95,7 @@ static inline u8 *skcipher_get_spot(u8 *start, unsigned int len)
return max(start, end_page);
}
-static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
+static void skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
{
u8 *addr;
@@ -103,23 +103,24 @@ static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
addr = skcipher_get_spot(addr, bsize);
scatterwalk_copychunks(addr, &walk->out, bsize,
(walk->flags & SKCIPHER_WALK_PHYS) ? 2 : 1);
- return 0;
}
int skcipher_walk_done(struct skcipher_walk *walk, int err)
{
- unsigned int n = walk->nbytes - err;
- unsigned int nbytes;
-
- nbytes = walk->total - n;
-
- if (unlikely(err < 0)) {
- nbytes = 0;
- n = 0;
- } else if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
- SKCIPHER_WALK_SLOW |
- SKCIPHER_WALK_COPY |
- SKCIPHER_WALK_DIFF)))) {
+ unsigned int n; /* bytes processed */
+ bool more;
+
+ if (unlikely(err < 0))
+ goto finish;
+
+ n = walk->nbytes - err;
+ walk->total -= n;
+ more = (walk->total != 0);
+
+ if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
+ SKCIPHER_WALK_SLOW |
+ SKCIPHER_WALK_COPY |
+ SKCIPHER_WALK_DIFF)))) {
unmap_src:
skcipher_unmap_src(walk);
} else if (walk->flags & SKCIPHER_WALK_DIFF) {
@@ -131,28 +132,28 @@ int skcipher_walk_done(struct skcipher_walk *walk, int err)
skcipher_unmap_dst(walk);
} else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) {
if (WARN_ON(err)) {
+ /* unexpected case; didn't process all bytes */
err = -EINVAL;
- nbytes = 0;
- } else
- n = skcipher_done_slow(walk, n);
+ goto finish;
+ }
+ skcipher_done_slow(walk, n);
+ goto already_advanced;
}
- if (err > 0)
- err = 0;
-
- walk->total = nbytes;
- walk->nbytes = nbytes;
-
scatterwalk_advance(&walk->in, n);
scatterwalk_advance(&walk->out, n);
- scatterwalk_done(&walk->in, 0, nbytes);
- scatterwalk_done(&walk->out, 1, nbytes);
+already_advanced:
+ scatterwalk_done(&walk->in, 0, more);
+ scatterwalk_done(&walk->out, 1, more);
- if (nbytes) {
+ if (more) {
crypto_yield(walk->flags & SKCIPHER_WALK_SLEEP ?
CRYPTO_TFM_REQ_MAY_SLEEP : 0);
return skcipher_walk_next(walk);
}
+ err = 0;
+finish:
+ walk->nbytes = 0;
/* Short-circuit for the common/fast path. */
if (!((unsigned long)walk->buffer | (unsigned long)walk->page))
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8088d3dd4d7c6933a65aa169393b5d88d8065672 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 10:54:56 -0700
Subject: [PATCH] crypto: skcipher - fix crash flushing dcache in error path
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
skcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing skcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
This bug was found by syzkaller fuzzing.
Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "cbc(aes-generic)",
};
char buffer[4096] __attribute__((aligned(4096))) = { 0 };
int fd;
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
fd = accept(fd, NULL, NULL);
write(fd, buffer, 15);
read(fd, buffer, 15);
}
Reported-by: Liu Chao <liuchao741(a)huawei.com>
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 835e5d36ad59..0bd8c6caa498 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -95,7 +95,7 @@ static inline u8 *skcipher_get_spot(u8 *start, unsigned int len)
return max(start, end_page);
}
-static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
+static void skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
{
u8 *addr;
@@ -103,23 +103,24 @@ static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
addr = skcipher_get_spot(addr, bsize);
scatterwalk_copychunks(addr, &walk->out, bsize,
(walk->flags & SKCIPHER_WALK_PHYS) ? 2 : 1);
- return 0;
}
int skcipher_walk_done(struct skcipher_walk *walk, int err)
{
- unsigned int n = walk->nbytes - err;
- unsigned int nbytes;
-
- nbytes = walk->total - n;
-
- if (unlikely(err < 0)) {
- nbytes = 0;
- n = 0;
- } else if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
- SKCIPHER_WALK_SLOW |
- SKCIPHER_WALK_COPY |
- SKCIPHER_WALK_DIFF)))) {
+ unsigned int n; /* bytes processed */
+ bool more;
+
+ if (unlikely(err < 0))
+ goto finish;
+
+ n = walk->nbytes - err;
+ walk->total -= n;
+ more = (walk->total != 0);
+
+ if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
+ SKCIPHER_WALK_SLOW |
+ SKCIPHER_WALK_COPY |
+ SKCIPHER_WALK_DIFF)))) {
unmap_src:
skcipher_unmap_src(walk);
} else if (walk->flags & SKCIPHER_WALK_DIFF) {
@@ -131,28 +132,28 @@ int skcipher_walk_done(struct skcipher_walk *walk, int err)
skcipher_unmap_dst(walk);
} else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) {
if (WARN_ON(err)) {
+ /* unexpected case; didn't process all bytes */
err = -EINVAL;
- nbytes = 0;
- } else
- n = skcipher_done_slow(walk, n);
+ goto finish;
+ }
+ skcipher_done_slow(walk, n);
+ goto already_advanced;
}
- if (err > 0)
- err = 0;
-
- walk->total = nbytes;
- walk->nbytes = nbytes;
-
scatterwalk_advance(&walk->in, n);
scatterwalk_advance(&walk->out, n);
- scatterwalk_done(&walk->in, 0, nbytes);
- scatterwalk_done(&walk->out, 1, nbytes);
+already_advanced:
+ scatterwalk_done(&walk->in, 0, more);
+ scatterwalk_done(&walk->out, 1, more);
- if (nbytes) {
+ if (more) {
crypto_yield(walk->flags & SKCIPHER_WALK_SLEEP ?
CRYPTO_TFM_REQ_MAY_SLEEP : 0);
return skcipher_walk_next(walk);
}
+ err = 0;
+finish:
+ walk->nbytes = 0;
/* Short-circuit for the common/fast path. */
if (!((unsigned long)walk->buffer | (unsigned long)walk->page))
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0567fc9e90b9b1c8dbce8a5468758e6206744d4a Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 09:57:50 -0700
Subject: [PATCH] crypto: skcipher - fix aligning block size in
skcipher_copy_iv()
The ALIGN() macro needs to be passed the alignment, not the alignmask
(which is the alignment minus 1).
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 7d6a49fe3047..4f6b8dadaceb 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -398,7 +398,7 @@ static int skcipher_copy_iv(struct skcipher_walk *walk)
unsigned size;
u8 *iv;
- aligned_bs = ALIGN(bs, alignmask);
+ aligned_bs = ALIGN(bs, alignmask + 1);
/* Minimum size to align buffer by alignmask. */
size = alignmask & ~a;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0567fc9e90b9b1c8dbce8a5468758e6206744d4a Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 09:57:50 -0700
Subject: [PATCH] crypto: skcipher - fix aligning block size in
skcipher_copy_iv()
The ALIGN() macro needs to be passed the alignment, not the alignmask
(which is the alignment minus 1).
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 7d6a49fe3047..4f6b8dadaceb 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -398,7 +398,7 @@ static int skcipher_copy_iv(struct skcipher_walk *walk)
unsigned size;
u8 *iv;
- aligned_bs = ALIGN(bs, alignmask);
+ aligned_bs = ALIGN(bs, alignmask + 1);
/* Minimum size to align buffer by alignmask. */
size = alignmask & ~a;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0567fc9e90b9b1c8dbce8a5468758e6206744d4a Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 09:57:50 -0700
Subject: [PATCH] crypto: skcipher - fix aligning block size in
skcipher_copy_iv()
The ALIGN() macro needs to be passed the alignment, not the alignmask
(which is the alignment minus 1).
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 7d6a49fe3047..4f6b8dadaceb 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -398,7 +398,7 @@ static int skcipher_copy_iv(struct skcipher_walk *walk)
unsigned size;
u8 *iv;
- aligned_bs = ALIGN(bs, alignmask);
+ aligned_bs = ALIGN(bs, alignmask + 1);
/* Minimum size to align buffer by alignmask. */
size = alignmask & ~a;
commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream.
This version applies to v4.9.
>From Andy Lutomirski, original author:
error_entry and error_exit communicate the user vs kernel status of
the frame using %ebx. This is unnecessary -- the information is in
regs->cs. Just use regs->cs.
This makes error_entry simpler and makes error_exit more robust.
It also fixes a nasty bug. Before all the Spectre nonsense, The
xen_failsafe_callback entry point returned like this:
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
ENCODE_FRAME_POINTER
jmp error_exit
And it did not go through error_entry. This was bogus: RBX
contained garbage, and error_exit expected a flag in RBX.
Fortunately, it generally contained *nonzero* garbage, so the
correct code path was used. As part of the Spectre fixes, code was
added to clear RBX to mitigate certain speculation attacks. Now,
depending on kernel configuration, RBX got zeroed and, when running
some Wine workloads, the kernel crashes. This was introduced by:
commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
exceptions/interrupts, to reduce speculation attack surface")
With this patch applied, RBX is no longer needed as a flag, and the
problem goes away.
I suspect that malicious userspace could use this bug to crash the
kernel even without the offending patch applied, though.
[Historical note: I wrote this patch as a cleanup before I was aware
of the bug it fixed.]
[Note to stable maintainers: this should probably get applied to all
kernels.]
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: xen-devel(a)lists.xenproject.org
Cc: x86(a)kernel.org
Cc: stable(a)vger.kernel.org
Cc: Andy Lutomirski <luto(a)kernel.org>
Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
Reported-and-tested-by: "M. Vefa Bicakci" <m.v.b(a)runbox.com>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Sarah Newman <srn(a)prgmr.com>
---
arch/x86/entry/entry_64.S | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d58d8dc..0dab47a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -774,7 +774,7 @@ ENTRY(\sym)
call \do_sym
- jmp error_exit /* %ebx: no swapgs flag */
+ jmp error_exit
.endif
END(\sym)
.endm
@@ -1043,7 +1043,6 @@ END(paranoid_exit)
/*
* Save all registers in pt_regs, and switch gs if needed.
- * Return: EBX=0: came from user mode; EBX=1: otherwise
*/
ENTRY(error_entry)
cld
@@ -1087,7 +1086,6 @@ ENTRY(error_entry)
* for these here too.
*/
.Lerror_kernelspace:
- incl %ebx
leaq native_irq_return_iret(%rip), %rcx
cmpq %rcx, RIP+8(%rsp)
je .Lerror_bad_iret
@@ -1119,28 +1117,19 @@ ENTRY(error_entry)
/*
* Pretend that the exception came from user mode: set up pt_regs
- * as if we faulted immediately after IRET and clear EBX so that
- * error_exit knows that we will be returning to user mode.
+ * as if we faulted immediately after IRET.
*/
mov %rsp, %rdi
call fixup_bad_iret
mov %rax, %rsp
- decl %ebx
jmp .Lerror_entry_from_usermode_after_swapgs
END(error_entry)
-
-/*
- * On entry, EBX is a "return to kernel mode" flag:
- * 1: already in kernel mode, don't need SWAPGS
- * 0: user gsbase is loaded, we need SWAPGS and standard preparation for return to usermode
- */
ENTRY(error_exit)
- movl %ebx, %eax
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl %eax, %eax
- jnz retint_kernel
+ testb $3, CS(%rsp)
+ jz retint_kernel
jmp retint_user
END(error_exit)
--
1.9.1
From: Andrey Konovalov <andreyknvl(a)google.com>
commit 0e410e158e5baa1300bdf678cea4f4e0cf9d8b94 upstream.
With KASAN enabled the kernel has two different memset() functions, one
with KASAN checks (memset) and one without (__memset). KASAN uses some
macro tricks to use the proper version where required. For example
memset() calls in mm/slub.c are without KASAN checks, since they operate
on poisoned slab object metadata.
The issue is that clang emits memset() calls even when there is no
memset() in the source code. They get linked with improper memset()
implementation and the kernel fails to boot due to a huge amount of KASAN
reports during early boot stages.
The solution is to add -fno-builtin flag for files with KASAN_SANITIZE :=
n marker.
Link: http://lkml.kernel.org/r/8ffecfffe04088c52c42b92739c2bd8a0bcb3f5e.151638459…
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Cc: Michal Marek <michal.lkml(a)markovi.net>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[ Sami: Backported to 4.9 avoiding c5caf21ab0cf8 and e7c52b84fb ]
Signed-off-by: Sami Tolvanen <samitolvanen(a)google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
Makefile | 3 ++-
scripts/Makefile.kasan | 3 +++
scripts/Makefile.lib | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index 0723bbe1d4a7..1fa448ba243f 100644
--- a/Makefile
+++ b/Makefile
@@ -417,7 +417,8 @@ export MAKE AWK GENKSYMS INSTALLKERNEL PERL PYTHON UTS_MACHINE
export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS LDFLAGS
-export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_KASAN CFLAGS_UBSAN
+export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
+export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
index 37323b0df374..2624d4bf9a45 100644
--- a/scripts/Makefile.kasan
+++ b/scripts/Makefile.kasan
@@ -28,4 +28,7 @@ else
CFLAGS_KASAN := $(CFLAGS_KASAN_MINIMAL)
endif
endif
+
+CFLAGS_KASAN_NOSANITIZE := -fno-builtin
+
endif
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index ae0f9ab1a70d..c954040c3cf2 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -127,7 +127,7 @@ endif
ifeq ($(CONFIG_KASAN),y)
_c_flags += $(if $(patsubst n%,, \
$(KASAN_SANITIZE_$(basetarget).o)$(KASAN_SANITIZE)y), \
- $(CFLAGS_KASAN))
+ $(CFLAGS_KASAN), $(CFLAGS_KASAN_NOSANITIZE))
endif
ifeq ($(CONFIG_UBSAN),y)
--
2.18.0.865.gffc8e1a3cd6-goog
The 4.4.y stable backport dc6ae4dffd65 for the upstream commit
3d4bf93ac120 ("tcp: detect malicious patterns in
tcp_collapse_ofo_queue()") missed a line that enlarges the
range_truesize value, which broke the whole check.
Fixes: dc6ae4dffd65 ("tcp: detect malicious patterns in tcp_collapse_ofo_queue()")
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
---
Greg, this is a fix-up specific to 4.4.y stable backport that had a
slightly different form from upstream fix. I haven't looked at the
older trees, but 4.9.y and later took the upstream fix as is, so this
patch isn't needed for them.
The patch hasn't been tested with the real test case, though; let me
know if the current code is intended. Thanks!
net/ipv4/tcp_input.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4a261e078082..9c4c6cd0316e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4835,6 +4835,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
end = TCP_SKB_CB(skb)->end_seq;
range_truesize = skb->truesize;
} else {
+ range_truesize += skb->truesize;
if (before(TCP_SKB_CB(skb)->seq, start))
start = TCP_SKB_CB(skb)->seq;
if (after(TCP_SKB_CB(skb)->end_seq, end))
--
2.18.0
From: Yannik Sembritzki <yannik(a)sembritzki.me>
The split of .system_keyring into .builtin_trusted_keys and
.secondary_trusted_keys broke kexec, thereby preventing kernels signed by
keys which are now in the secondary keyring from being kexec'd.
Fix this by passing VERIFY_USE_SECONDARY_KEYRING to
verify_pefile_signature().
Fixes: d3bfe84129f6 ("certs: Add a secondary system keyring that can be added to dynamically")
Signed-off-by: Yannik Sembritzki <yannik(a)sembritzki.me>
Signed-off-by: David Howells <dhowells(a)redhat.com>
cc: kexec(a)lists.infradead.org
cc: keyrings(a)vger.kernel.org
cc: linux-security-module(a)vger.kernel.org
cc: stable(a)vger.kernel.org
---
arch/x86/kernel/kexec-bzimage64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 7326078eaa7a..278cd07228dd 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -532,7 +532,7 @@ static int bzImage64_cleanup(void *loader_data)
static int bzImage64_verify_sig(const char *kernel, unsigned long kernel_len)
{
return verify_pefile_signature(kernel, kernel_len,
- NULL,
+ VERIFY_USE_SECONDARY_KEYRING,
VERIFYING_KEXEC_PE_SIGNATURE);
}
#endif
Inside of start_xmit() the call to check if the connection is up and the
queueing of the packets for later transmission is not atomic which
leaves a window where cm_rep_handler can run, set the connection up,
dequeue pending packets and leave the subsequently queued packets by
start_xmit() sitting on neigh->queue until they're dropped when the
connection is torn down. This only applies to connected mode. These
dropped packets can really upset TCP, for example, and cause
multi-minute delays in transmission for open connections.
I've got a reproducer available if it's needed.
Here's the code in start_xmit where we check to see if the connection
is up:
if (ipoib_cm_get(neigh)) {
if (ipoib_cm_up(neigh)) {
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
}
The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet
in start_xmit where packets are queued.
if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
push_pseudo_header(skb, phdr->hwaddr);
spin_lock_irqsave(&priv->lock, flags);
__skb_queue_tail(&neigh->queue, skb);
spin_unlock_irqrestore(&priv->lock, flags);
} else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
The patch re-checks ipoib_cm_up with priv->lock held to avoid this
race condition. Since odds are the conn should be up most of the time
(and thus the connection *not* down most of the time) we don't hold the
lock for the first check attempt to avoid a slowdown from unecessary
locking for the majority of the packets transmitted during the
connection's life.
Cc: stable(a)vger.kernel.org
Tested-by: Ira Weiny <ira.weiny(a)intel.com>
Signed-off-by: Aaron Knister <aaron.s.knister(a)nasa.gov>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 53 +++++++++++++++++++++++++------
1 file changed, 44 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 26cde95b..529dbeab 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1093,6 +1093,34 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
spin_unlock_irqrestore(&priv->lock, flags);
}
+static void defer_neigh_skb(struct sk_buff *skb, struct net_device *dev,
+ struct ipoib_neigh *neigh,
+ struct ipoib_pseudo_header *phdr,
+ unsigned long *flags)
+{
+ struct ipoib_dev_priv *priv = ipoib_priv(dev);
+ unsigned long local_flags;
+ int acquire_priv_lock = 0;
+
+ /* Passing in pointer to spin_lock flags indicates spin lock
+ * already acquired so we don't need to acquire the priv lock */
+ if (flags == NULL) {
+ flags = &local_flags;
+ acquire_priv_lock = 1;
+ }
+
+ if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
+ push_pseudo_header(skb, phdr->hwaddr);
+ if (acquire_priv_lock)
+ spin_lock_irqsave(&priv->lock, *flags);
+ __skb_queue_tail(&neigh->queue, skb);
+ spin_unlock_irqrestore(&priv->lock, *flags);
+ } else {
+ ++dev->stats.tx_dropped;
+ dev_kfree_skb_any(skb);
+ }
+}
+
static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
@@ -1160,6 +1188,21 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
+ /*
+ * Re-check ipoib_cm_up with priv->lock held to avoid
+ * race condition between start_xmit and skb_dequeue in
+ * cm_rep_handler. Since odds are the conn should be up
+ * most of the time, we don't hold the lock for the
+ * first check above
+ */
+ spin_lock_irqsave(&priv->lock, flags);
+ if (ipoib_cm_up(neigh)) {
+ spin_unlock_irqrestore(&priv->lock, flags);
+ ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+ } else {
+ defer_neigh_skb(skb, dev, neigh, phdr, &flags);
+ }
+ goto unref;
} else if (neigh->ah && neigh->ah->valid) {
neigh->ah->last_send = rn->send(dev, skb, neigh->ah->ah,
IPOIB_QPN(phdr->hwaddr));
@@ -1168,15 +1211,7 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
neigh_refresh_path(neigh, phdr->hwaddr, dev);
}
- if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
- push_pseudo_header(skb, phdr->hwaddr);
- spin_lock_irqsave(&priv->lock, flags);
- __skb_queue_tail(&neigh->queue, skb);
- spin_unlock_irqrestore(&priv->lock, flags);
- } else {
- ++dev->stats.tx_dropped;
- dev_kfree_skb_any(skb);
- }
+ defer_neigh_skb(skb, dev, neigh, phdr, NULL);
unref:
ipoib_neigh_put(neigh);
--
2.12.3
Hi!
According to 1), disabling EPT offers the same maximum protection against L1TF as disabling SMT but
has a severe performance impact.
FWIW: With EPT disabled (2)), I can *not* confirm any performance-degradation for the VirtualBox
Windows- or Linux-VMs that I use. Those VMs are for desktop-use, though.
So to me it seems that the performance impact depends on the use case and in a desktop-setting
disabling EPT may offer a simple max-protection-option with the advantage of still enabled
hyperthreading.
I have tried this with 4.18.1 and 4.14.63.
Rainer Fiebig
***
1) https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html#mitigation-sel…
2) kvm-intel.ept=0
> tail /sys/devices/system/cpu/vulnerabilities/*
==> /sys/devices/system/cpu/vulnerabilities/l1tf <==
Mitigation: PTE Inversion; VMX: EPT disabled
==> /sys/devices/system/cpu/vulnerabilities/meltdown <==
Mitigation: PTI
==> /sys/devices/system/cpu/vulnerabilities/spec_store_bypass <==
Mitigation: Speculative Store Bypass disabled via prctl and seccomp
==> /sys/devices/system/cpu/vulnerabilities/spectre_v1 <==
Mitigation: __user pointer sanitization
==> /sys/devices/system/cpu/vulnerabilities/spectre_v2 <==
Mitigation: Full generic retpoline, IBPB, IBRS_FW
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001
From: Toshi Kani <toshi.kani(a)hpe.com>
Date: Wed, 27 Jun 2018 08:13:48 -0600
Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been peformed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the liner addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: mhocko(a)suse.com
Cc: akpm(a)linux-foundation.org
Cc: hpa(a)zytor.com
Cc: cpandya(a)codeaurora.org
Cc: linux-mm(a)kvack.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001
From: Toshi Kani <toshi.kani(a)hpe.com>
Date: Wed, 27 Jun 2018 08:13:48 -0600
Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been peformed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the liner addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: mhocko(a)suse.com
Cc: akpm(a)linux-foundation.org
Cc: hpa(a)zytor.com
Cc: cpandya(a)codeaurora.org
Cc: linux-mm(a)kvack.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001
From: Toshi Kani <toshi.kani(a)hpe.com>
Date: Wed, 27 Jun 2018 08:13:48 -0600
Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been peformed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the liner addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: mhocko(a)suse.com
Cc: akpm(a)linux-foundation.org
Cc: hpa(a)zytor.com
Cc: cpandya(a)codeaurora.org
Cc: linux-mm(a)kvack.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;