August 2018 - Linux-stable-mirror

[PATCH 4.4.y] tcp: Fix missing range_truesize enlargement in the backport

by Takashi Iwai

The 4.4.y stable backport dc6ae4dffd65 for the upstream commit 3d4bf93ac120 ("tcp: detect malicious patterns in tcp_collapse_ofo_queue()") missed a line that enlarges the range_truesize value, which broke the whole check. Fixes: dc6ae4dffd65 ("tcp: detect malicious patterns in tcp_collapse_ofo_queue()") Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- Greg, this is a fix-up specific to 4.4.y stable backport that had a slightly different form from upstream fix. I haven't looked at the older trees, but 4.9.y and later took the upstream fix as is, so this patch isn't needed for them. The patch hasn't been tested with the real test case, though; let me know if the current code is intended. Thanks! net/ipv4/tcp_input.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4a261e078082..9c4c6cd0316e 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4835,6 +4835,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk) end = TCP_SKB_CB(skb)->end_seq; range_truesize = skb->truesize; } else { + range_truesize += skb->truesize; if (before(TCP_SKB_CB(skb)->seq, start)) start = TCP_SKB_CB(skb)->seq; if (after(TCP_SKB_CB(skb)->end_seq, end)) -- 2.18.0

6 years, 10 months

3
4
0 0

[PATCH 2/2] Fix kexec forbidding kernels signed with keys in the secondary keyring to boot

by David Howells

From: Yannik Sembritzki <yannik(a)sembritzki.me> The split of .system_keyring into .builtin_trusted_keys and .secondary_trusted_keys broke kexec, thereby preventing kernels signed by keys which are now in the secondary keyring from being kexec'd. Fix this by passing VERIFY_USE_SECONDARY_KEYRING to verify_pefile_signature(). Fixes: d3bfe84129f6 ("certs: Add a secondary system keyring that can be added to dynamically") Signed-off-by: Yannik Sembritzki <yannik(a)sembritzki.me> Signed-off-by: David Howells <dhowells(a)redhat.com> cc: kexec(a)lists.infradead.org cc: keyrings(a)vger.kernel.org cc: linux-security-module(a)vger.kernel.org cc: stable(a)vger.kernel.org --- arch/x86/kernel/kexec-bzimage64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index 7326078eaa7a..278cd07228dd 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -532,7 +532,7 @@ static int bzImage64_cleanup(void *loader_data) static int bzImage64_verify_sig(const char *kernel, unsigned long kernel_len) { return verify_pefile_signature(kernel, kernel_len, - NULL, + VERIFY_USE_SECONDARY_KEYRING, VERIFYING_KEXEC_PE_SIGNATURE); } #endif

6 years, 10 months

1
0
0 0

[PATCH] avoid race condition between start_xmit and cm_rep_handler

by Aaron Knister

Inside of start_xmit() the call to check if the connection is up and the queueing of the packets for later transmission is not atomic which leaves a window where cm_rep_handler can run, set the connection up, dequeue pending packets and leave the subsequently queued packets by start_xmit() sitting on neigh->queue until they're dropped when the connection is torn down. This only applies to connected mode. These dropped packets can really upset TCP, for example, and cause multi-minute delays in transmission for open connections. I've got a reproducer available if it's needed. Here's the code in start_xmit where we check to see if the connection is up: if (ipoib_cm_get(neigh)) { if (ipoib_cm_up(neigh)) { ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); goto unref; } } The race occurs if cm_rep_handler execution occurs after the above connection check (specifically if it gets to the point where it acquires priv->lock to dequeue pending skb's) but before the below code snippet in start_xmit where packets are queued. if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) { push_pseudo_header(skb, phdr->hwaddr); spin_lock_irqsave(&priv->lock, flags); __skb_queue_tail(&neigh->queue, skb); spin_unlock_irqrestore(&priv->lock, flags); } else { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); } The patch re-checks ipoib_cm_up with priv->lock held to avoid this race condition. Since odds are the conn should be up most of the time (and thus the connection *not* down most of the time) we don't hold the lock for the first check attempt to avoid a slowdown from unecessary locking for the majority of the packets transmitted during the connection's life. Cc: stable(a)vger.kernel.org Tested-by: Ira Weiny <ira.weiny(a)intel.com> Signed-off-by: Aaron Knister <aaron.s.knister(a)nasa.gov> --- drivers/infiniband/ulp/ipoib/ipoib_main.c | 53 +++++++++++++++++++++++++------ 1 file changed, 44 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 26cde95b..529dbeab 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1093,6 +1093,34 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, spin_unlock_irqrestore(&priv->lock, flags); } +static void defer_neigh_skb(struct sk_buff *skb, struct net_device *dev, + struct ipoib_neigh *neigh, + struct ipoib_pseudo_header *phdr, + unsigned long *flags) +{ + struct ipoib_dev_priv *priv = ipoib_priv(dev); + unsigned long local_flags; + int acquire_priv_lock = 0; + + /* Passing in pointer to spin_lock flags indicates spin lock + * already acquired so we don't need to acquire the priv lock */ + if (flags == NULL) { + flags = &local_flags; + acquire_priv_lock = 1; + } + + if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) { + push_pseudo_header(skb, phdr->hwaddr); + if (acquire_priv_lock) + spin_lock_irqsave(&priv->lock, *flags); + __skb_queue_tail(&neigh->queue, skb); + spin_unlock_irqrestore(&priv->lock, *flags); + } else { + ++dev->stats.tx_dropped; + dev_kfree_skb_any(skb); + } +} + static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct ipoib_dev_priv *priv = ipoib_priv(dev); @@ -1160,6 +1188,21 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); goto unref; } + /* + * Re-check ipoib_cm_up with priv->lock held to avoid + * race condition between start_xmit and skb_dequeue in + * cm_rep_handler. Since odds are the conn should be up + * most of the time, we don't hold the lock for the + * first check above + */ + spin_lock_irqsave(&priv->lock, flags); + if (ipoib_cm_up(neigh)) { + spin_unlock_irqrestore(&priv->lock, flags); + ipoib_cm_send(dev, skb, ipoib_cm_get(neigh)); + } else { + defer_neigh_skb(skb, dev, neigh, phdr, &flags); + } + goto unref; } else if (neigh->ah && neigh->ah->valid) { neigh->ah->last_send = rn->send(dev, skb, neigh->ah->ah, IPOIB_QPN(phdr->hwaddr)); @@ -1168,15 +1211,7 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev) neigh_refresh_path(neigh, phdr->hwaddr, dev); } - if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) { - push_pseudo_header(skb, phdr->hwaddr); - spin_lock_irqsave(&priv->lock, flags); - __skb_queue_tail(&neigh->queue, skb); - spin_unlock_irqrestore(&priv->lock, flags); - } else { - ++dev->stats.tx_dropped; - dev_kfree_skb_any(skb); - } + defer_neigh_skb(skb, dev, neigh, phdr, NULL); unref: ipoib_neigh_put(neigh); -- 2.12.3

6 years, 10 months

2
3
0 0

L1TF: Disabling EPT / performance-impact

by Rainer Fiebig

Hi! According to 1), disabling EPT offers the same maximum protection against L1TF as disabling SMT but has a severe performance impact. FWIW: With EPT disabled (2)), I can *not* confirm any performance-degradation for the VirtualBox Windows- or Linux-VMs that I use. Those VMs are for desktop-use, though. So to me it seems that the performance impact depends on the use case and in a desktop-setting disabling EPT may offer a simple max-protection-option with the advantage of still enabled hyperthreading. I have tried this with 4.18.1 and 4.14.63. Rainer Fiebig *** 1) https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html#mitigation-sel… 2) kvm-intel.ept=0 > tail /sys/devices/system/cpu/vulnerabilities/* ==> /sys/devices/system/cpu/vulnerabilities/l1tf <== Mitigation: PTE Inversion; VMX: EPT disabled ==> /sys/devices/system/cpu/vulnerabilities/meltdown <== Mitigation: PTI ==> /sys/devices/system/cpu/vulnerabilities/spec_store_bypass <== Mitigation: Speculative Store Bypass disabled via prctl and seccomp ==> /sys/devices/system/cpu/vulnerabilities/spectre_v1 <== Mitigation: __user pointer sanitization ==> /sys/devices/system/cpu/vulnerabilities/spectre_v2 <== Mitigation: Full generic retpoline, IBPB, IBRS_FW

6 years, 10 months

2
2
0 0

FAILED: patch "[PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001 From: Toshi Kani <toshi.kani(a)hpe.com> Date: Wed, 27 Jun 2018 08:13:48 -0600 Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates a pud / pmd map. The following preconditions are met at their entry. - All pte entries for a target pud/pmd address range have been cleared. - System-wide TLB purges have been peformed for a target pud/pmd address range. The preconditions assure that there is no stale TLB entry for the range. Speculation may not cache TLB entries since it requires all levels of page entries, including ptes, to have P & A-bits set for an associated address. However, speculation may cache pud/pmd entries (paging-structure caches) when they have P-bit set. Add a system-wide TLB purge (INVLPG) to a single page after clearing pud/pmd entry's P-bit. SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches, states that: INVLPG invalidates all paging-structure caches associated with the current PCID regardless of the liner addresses to which they correspond. Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: mhocko(a)suse.com Cc: akpm(a)linux-foundation.org Cc: hpa(a)zytor.com Cc: cpandya(a)codeaurora.org Cc: linux-mm(a)kvack.org Cc: linux-arm-kernel(a)lists.infradead.org Cc: Joerg Roedel <joro(a)8bytes.org> Cc: stable(a)vger.kernel.org Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: <stable(a)vger.kernel.org> Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index fbd14e506758..e3deefb891da 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd) * @pud: Pointer to a PUD. * @addr: Virtual address associated with pud. * - * Context: The pud range has been unmaped and TLB purged. + * Context: The pud range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. + * + * NOTE: Callers must allow a single page allocation. */ int pud_free_pmd_page(pud_t *pud, unsigned long addr) { - pmd_t *pmd; + pmd_t *pmd, *pmd_sv; + pte_t *pte; int i; if (pud_none(*pud)) return 1; pmd = (pmd_t *)pud_page_vaddr(*pud); + pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL); + if (!pmd_sv) + return 0; - for (i = 0; i < PTRS_PER_PMD; i++) - if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE))) - return 0; + for (i = 0; i < PTRS_PER_PMD; i++) { + pmd_sv[i] = pmd[i]; + if (!pmd_none(pmd[i])) + pmd_clear(&pmd[i]); + } pud_clear(pud); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + + for (i = 0; i < PTRS_PER_PMD; i++) { + if (!pmd_none(pmd_sv[i])) { + pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]); + free_page((unsigned long)pte); + } + } + + free_page((unsigned long)pmd_sv); free_page((unsigned long)pmd); return 1; @@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr) * @pmd: Pointer to a PMD. * @addr: Virtual address associated with pmd. * - * Context: The pmd range has been unmaped and TLB purged. + * Context: The pmd range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. */ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) @@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte = (pte_t *)pmd_page_vaddr(*pmd); pmd_clear(pmd); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + free_page((unsigned long)pte); return 1;

6 years, 10 months

1
0
0 0

FAILED: patch "[PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001 From: Toshi Kani <toshi.kani(a)hpe.com> Date: Wed, 27 Jun 2018 08:13:48 -0600 Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates a pud / pmd map. The following preconditions are met at their entry. - All pte entries for a target pud/pmd address range have been cleared. - System-wide TLB purges have been peformed for a target pud/pmd address range. The preconditions assure that there is no stale TLB entry for the range. Speculation may not cache TLB entries since it requires all levels of page entries, including ptes, to have P & A-bits set for an associated address. However, speculation may cache pud/pmd entries (paging-structure caches) when they have P-bit set. Add a system-wide TLB purge (INVLPG) to a single page after clearing pud/pmd entry's P-bit. SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches, states that: INVLPG invalidates all paging-structure caches associated with the current PCID regardless of the liner addresses to which they correspond. Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: mhocko(a)suse.com Cc: akpm(a)linux-foundation.org Cc: hpa(a)zytor.com Cc: cpandya(a)codeaurora.org Cc: linux-mm(a)kvack.org Cc: linux-arm-kernel(a)lists.infradead.org Cc: Joerg Roedel <joro(a)8bytes.org> Cc: stable(a)vger.kernel.org Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: <stable(a)vger.kernel.org> Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index fbd14e506758..e3deefb891da 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd) * @pud: Pointer to a PUD. * @addr: Virtual address associated with pud. * - * Context: The pud range has been unmaped and TLB purged. + * Context: The pud range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. + * + * NOTE: Callers must allow a single page allocation. */ int pud_free_pmd_page(pud_t *pud, unsigned long addr) { - pmd_t *pmd; + pmd_t *pmd, *pmd_sv; + pte_t *pte; int i; if (pud_none(*pud)) return 1; pmd = (pmd_t *)pud_page_vaddr(*pud); + pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL); + if (!pmd_sv) + return 0; - for (i = 0; i < PTRS_PER_PMD; i++) - if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE))) - return 0; + for (i = 0; i < PTRS_PER_PMD; i++) { + pmd_sv[i] = pmd[i]; + if (!pmd_none(pmd[i])) + pmd_clear(&pmd[i]); + } pud_clear(pud); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + + for (i = 0; i < PTRS_PER_PMD; i++) { + if (!pmd_none(pmd_sv[i])) { + pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]); + free_page((unsigned long)pte); + } + } + + free_page((unsigned long)pmd_sv); free_page((unsigned long)pmd); return 1; @@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr) * @pmd: Pointer to a PMD. * @addr: Virtual address associated with pmd. * - * Context: The pmd range has been unmaped and TLB purged. + * Context: The pmd range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. */ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) @@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte = (pte_t *)pmd_page_vaddr(*pmd); pmd_clear(pmd); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + free_page((unsigned long)pte); return 1;

6 years, 10 months

1
0
0 0

FAILED: patch "[PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces" failed to apply to 4.17-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.17-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001 From: Toshi Kani <toshi.kani(a)hpe.com> Date: Wed, 27 Jun 2018 08:13:48 -0600 Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates a pud / pmd map. The following preconditions are met at their entry. - All pte entries for a target pud/pmd address range have been cleared. - System-wide TLB purges have been peformed for a target pud/pmd address range. The preconditions assure that there is no stale TLB entry for the range. Speculation may not cache TLB entries since it requires all levels of page entries, including ptes, to have P & A-bits set for an associated address. However, speculation may cache pud/pmd entries (paging-structure caches) when they have P-bit set. Add a system-wide TLB purge (INVLPG) to a single page after clearing pud/pmd entry's P-bit. SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches, states that: INVLPG invalidates all paging-structure caches associated with the current PCID regardless of the liner addresses to which they correspond. Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: mhocko(a)suse.com Cc: akpm(a)linux-foundation.org Cc: hpa(a)zytor.com Cc: cpandya(a)codeaurora.org Cc: linux-mm(a)kvack.org Cc: linux-arm-kernel(a)lists.infradead.org Cc: Joerg Roedel <joro(a)8bytes.org> Cc: stable(a)vger.kernel.org Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: <stable(a)vger.kernel.org> Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index fbd14e506758..e3deefb891da 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd) * @pud: Pointer to a PUD. * @addr: Virtual address associated with pud. * - * Context: The pud range has been unmaped and TLB purged. + * Context: The pud range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. + * + * NOTE: Callers must allow a single page allocation. */ int pud_free_pmd_page(pud_t *pud, unsigned long addr) { - pmd_t *pmd; + pmd_t *pmd, *pmd_sv; + pte_t *pte; int i; if (pud_none(*pud)) return 1; pmd = (pmd_t *)pud_page_vaddr(*pud); + pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL); + if (!pmd_sv) + return 0; - for (i = 0; i < PTRS_PER_PMD; i++) - if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE))) - return 0; + for (i = 0; i < PTRS_PER_PMD; i++) { + pmd_sv[i] = pmd[i]; + if (!pmd_none(pmd[i])) + pmd_clear(&pmd[i]); + } pud_clear(pud); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + + for (i = 0; i < PTRS_PER_PMD; i++) { + if (!pmd_none(pmd_sv[i])) { + pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]); + free_page((unsigned long)pte); + } + } + + free_page((unsigned long)pmd_sv); free_page((unsigned long)pmd); return 1; @@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr) * @pmd: Pointer to a PMD. * @addr: Virtual address associated with pmd. * - * Context: The pmd range has been unmaped and TLB purged. + * Context: The pmd range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. */ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) @@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte = (pte_t *)pmd_page_vaddr(*pmd); pmd_clear(pmd); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + free_page((unsigned long)pte); return 1;

6 years, 10 months

1
0
0 0

FAILED: patch "[PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces" failed to apply to 4.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001 From: Toshi Kani <toshi.kani(a)hpe.com> Date: Wed, 27 Jun 2018 08:13:48 -0600 Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates a pud / pmd map. The following preconditions are met at their entry. - All pte entries for a target pud/pmd address range have been cleared. - System-wide TLB purges have been peformed for a target pud/pmd address range. The preconditions assure that there is no stale TLB entry for the range. Speculation may not cache TLB entries since it requires all levels of page entries, including ptes, to have P & A-bits set for an associated address. However, speculation may cache pud/pmd entries (paging-structure caches) when they have P-bit set. Add a system-wide TLB purge (INVLPG) to a single page after clearing pud/pmd entry's P-bit. SDM 4.10.4.1, Operation that Invalidate TLBs and Paging-Structure Caches, states that: INVLPG invalidates all paging-structure caches associated with the current PCID regardless of the liner addresses to which they correspond. Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: mhocko(a)suse.com Cc: akpm(a)linux-foundation.org Cc: hpa(a)zytor.com Cc: cpandya(a)codeaurora.org Cc: linux-mm(a)kvack.org Cc: linux-arm-kernel(a)lists.infradead.org Cc: Joerg Roedel <joro(a)8bytes.org> Cc: stable(a)vger.kernel.org Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: <stable(a)vger.kernel.org> Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index fbd14e506758..e3deefb891da 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd) * @pud: Pointer to a PUD. * @addr: Virtual address associated with pud. * - * Context: The pud range has been unmaped and TLB purged. + * Context: The pud range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. + * + * NOTE: Callers must allow a single page allocation. */ int pud_free_pmd_page(pud_t *pud, unsigned long addr) { - pmd_t *pmd; + pmd_t *pmd, *pmd_sv; + pte_t *pte; int i; if (pud_none(*pud)) return 1; pmd = (pmd_t *)pud_page_vaddr(*pud); + pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL); + if (!pmd_sv) + return 0; - for (i = 0; i < PTRS_PER_PMD; i++) - if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE))) - return 0; + for (i = 0; i < PTRS_PER_PMD; i++) { + pmd_sv[i] = pmd[i]; + if (!pmd_none(pmd[i])) + pmd_clear(&pmd[i]); + } pud_clear(pud); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + + for (i = 0; i < PTRS_PER_PMD; i++) { + if (!pmd_none(pmd_sv[i])) { + pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]); + free_page((unsigned long)pte); + } + } + + free_page((unsigned long)pmd_sv); free_page((unsigned long)pmd); return 1; @@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr) * @pmd: Pointer to a PMD. * @addr: Virtual address associated with pmd. * - * Context: The pmd range has been unmaped and TLB purged. + * Context: The pmd range has been unmapped and TLB purged. * Return: 1 if clearing the entry succeeded. 0 otherwise. */ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) @@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte = (pte_t *)pmd_page_vaddr(*pmd); pmd_clear(pmd); + + /* INVLPG to clear all paging-structure caches */ + flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1); + free_page((unsigned long)pte); return 1;

6 years, 10 months

1
0
0 0

FAILED: patch "[PATCH] x86/mm/init: Remove freed kernel image areas from alias" failed to apply to 4.18-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.18-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From c40a56a7818cfe735fc93a69e1875f8bba834483 Mon Sep 17 00:00:00 2001 From: Dave Hansen <dave.hansen(a)linux.intel.com> Date: Thu, 2 Aug 2018 15:58:31 -0700 Subject: [PATCH] x86/mm/init: Remove freed kernel image areas from alias mapping The kernel image is mapped into two places in the virtual address space (addresses without KASLR, of course): 1. The kernel direct map (0xffff880000000000) 2. The "high kernel map" (0xffffffff81000000) We actually execute out of #2. If we get the address of a kernel symbol, it points to #2, but almost all physical-to-virtual translations point to Parts of the "high kernel map" alias are mapped in the userspace page tables with the Global bit for performance reasons. The parts that we map to userspace do not (er, should not) have secrets. When PTI is enabled then the global bit is usually not set in the high mapping and just used to compensate for poor performance on systems which lack PCID. This is fine, except that some areas in the kernel image that are adjacent to the non-secret-containing areas are unused holes. We free these holes back into the normal page allocator and reuse them as normal kernel memory. The memory will, of course, get *used* via the normal map, but the alias mapping is kept. This otherwise unused alias mapping of the holes will, by default keep the Global bit, be mapped out to userspace, and be vulnerable to Meltdown. Remove the alias mapping of these pages entirely. This is likely to fracture the 2M page mapping the kernel image near these areas, but this should affect a minority of the area. The pageattr code changes *all* aliases mapping the physical pages that it operates on (by default). We only want to modify a single alias, so we need to tweak its behavior. This unmapping behavior is currently dependent on PTI being in place. Going forward, we should at least consider doing this for all configurations. Having an extra read-write alias for memory is not exactly ideal for debugging things like random memory corruption and this does undercut features like DEBUG_PAGEALLOC or future work like eXclusive Page Frame Ownership (XPFO). Before this patch: current_kernel:---[ High Kernel Mapping ]--- current_kernel-0xffffffff80000000-0xffffffff81000000 16M pmd current_kernel-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd current_kernel-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K RW NX pte current_kernel-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd current_kernel-0xffffffff82600000-0xffffffff82c00000 6M RW PSE NX pmd current_kernel-0xffffffff82c00000-0xffffffff82e00000 2M RW NX pte current_kernel-0xffffffff82e00000-0xffffffff83200000 4M RW PSE NX pmd current_kernel-0xffffffff83200000-0xffffffffa0000000 462M pmd current_user:---[ High Kernel Mapping ]--- current_user-0xffffffff80000000-0xffffffff81000000 16M pmd current_user-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd current_user-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte current_user-0xffffffff81e11000-0xffffffff82000000 1980K RW NX pte current_user-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd current_user-0xffffffff82600000-0xffffffffa0000000 474M pmd After this patch: current_kernel:---[ High Kernel Mapping ]--- current_kernel-0xffffffff80000000-0xffffffff81000000 16M pmd current_kernel-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd current_kernel-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K pte current_kernel-0xffffffff82000000-0xffffffff82400000 4M ro PSE GLB NX pmd current_kernel-0xffffffff82400000-0xffffffff82488000 544K ro NX pte current_kernel-0xffffffff82488000-0xffffffff82600000 1504K pte current_kernel-0xffffffff82600000-0xffffffff82c00000 6M RW PSE NX pmd current_kernel-0xffffffff82c00000-0xffffffff82c0d000 52K RW NX pte current_kernel-0xffffffff82c0d000-0xffffffff82dc0000 1740K pte current_user:---[ High Kernel Mapping ]--- current_user-0xffffffff80000000-0xffffffff81000000 16M pmd current_user-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd current_user-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte current_user-0xffffffff81e11000-0xffffffff82000000 1980K pte current_user-0xffffffff82000000-0xffffffff82400000 4M ro PSE GLB NX pmd current_user-0xffffffff82400000-0xffffffff82488000 544K ro NX pte current_user-0xffffffff82488000-0xffffffff82600000 1504K pte current_user-0xffffffff82600000-0xffffffffa0000000 474M pmd [ tglx: Do not unmap on 32bit as there is only one mapping ] Fixes: 0f561fce4d69 ("x86/pti: Enable global pages for shared areas") Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: Kees Cook <keescook(a)google.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Juergen Gross <jgross(a)suse.com> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Borislav Petkov <bp(a)alien8.de> Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Andi Kleen <ak(a)linux.intel.com> Cc: Joerg Roedel <jroedel(a)suse.de> Link: https://lkml.kernel.org/r/20180802225831.5F6A2BFC@viggo.jf.intel.com diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h index bd090367236c..34cffcef7375 100644 --- a/arch/x86/include/asm/set_memory.h +++ b/arch/x86/include/asm/set_memory.h @@ -46,6 +46,7 @@ int set_memory_np(unsigned long addr, int numpages); int set_memory_4k(unsigned long addr, int numpages); int set_memory_encrypted(unsigned long addr, int numpages); int set_memory_decrypted(unsigned long addr, int numpages); +int set_memory_np_noalias(unsigned long addr, int numpages); int set_memory_array_uc(unsigned long *addr, int addrinarray); int set_memory_array_wc(unsigned long *addr, int addrinarray); diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index bc11dedffc45..74b157ac078d 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -780,8 +780,30 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end) */ void free_kernel_image_pages(void *begin, void *end) { - free_init_pages("unused kernel image", - (unsigned long)begin, (unsigned long)end); + unsigned long begin_ul = (unsigned long)begin; + unsigned long end_ul = (unsigned long)end; + unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT; + + + free_init_pages("unused kernel image", begin_ul, end_ul); + + /* + * PTI maps some of the kernel into userspace. For performance, + * this includes some kernel areas that do not contain secrets. + * Those areas might be adjacent to the parts of the kernel image + * being freed, which may contain secrets. Remove the "high kernel + * image mapping" for these freed areas, ensuring they are not even + * potentially vulnerable to Meltdown regardless of the specific + * optimizations PTI is currently using. + * + * The "noalias" prevents unmapping the direct map alias which is + * needed to access the freed pages. + * + * This is only valid for 64bit kernels. 32bit has only one mapping + * which can't be treated in this way for obvious reasons. + */ + if (IS_ENABLED(CONFIG_X86_64) && cpu_feature_enabled(X86_FEATURE_PTI)) + set_memory_np_noalias(begin_ul, len_pages); } void __ref free_initmem(void) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index c04153796f61..0a74996a1149 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -53,6 +53,7 @@ static DEFINE_SPINLOCK(cpa_lock); #define CPA_FLUSHTLB 1 #define CPA_ARRAY 2 #define CPA_PAGES_ARRAY 4 +#define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */ #ifdef CONFIG_PROC_FS static unsigned long direct_pages_count[PG_LEVEL_NUM]; @@ -1486,6 +1487,9 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages, /* No alias checking for _NX bit modifications */ checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX; + /* Has caller explicitly disabled alias checking? */ + if (in_flag & CPA_NO_CHECK_ALIAS) + checkalias = 0; ret = __change_page_attr_set_clr(&cpa, checkalias); @@ -1772,6 +1776,15 @@ int set_memory_np(unsigned long addr, int numpages) return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0); } +int set_memory_np_noalias(unsigned long addr, int numpages) +{ + int cpa_flags = CPA_NO_CHECK_ALIAS; + + return change_page_attr_set_clr(&addr, numpages, __pgprot(0), + __pgprot(_PAGE_PRESENT), 0, + cpa_flags, NULL); +} + int set_memory_4k(unsigned long addr, int numpages) { return change_page_attr_set_clr(&addr, numpages, __pgprot(0),

6 years, 10 months

1
0
0 0

Feedback: 4.18.1

by Rainer Fiebig

Hi! With 4.18.1 and l1tf=full no problems so far and no noticeable performance-degradation in normal desktop-usage including a VirtualBox-VM. Compiling a kernel seems to take a bit longer, though. 4.9.120 and 4.14.63 also seem OK in cursory testing. CPU: Core i3. Thanks and so long! Rainer Fiebig -- The truth always turns out to be simpler than you thought. Richard Feynman

6 years, 10 months

2
1
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror August 2018