This is the start of the stable review cycle for the 4.14.284 release.
There are 20 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 16 Jun 2022 18:37:02 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.284-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.284-rc1
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation/mmio: Print SMT warning
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
KVM: x86/speculation: Disable Fill buffer clear within guests
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation/srbds: Update SRBDS mitigation selection
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation: Add a common function for MD_CLEAR mitigation update
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Documentation: Add documentation for Processor MMIO Stale Data
Gayatri Kammela <gayatri.kammela(a)intel.com>
x86/cpu: Add another Alder Lake CPU to the Intel family
Tony Luck <tony.luck(a)intel.com>
x86/cpu: Add Lakefield, Alder Lake and Rocket Lake models to the to Intel CPU family
Kan Liang <kan.liang(a)linux.intel.com>
x86/cpu: Add Comet Lake to the Intel CPU models header
Kan Liang <kan.liang(a)linux.intel.com>
x86/CPU: Add more Icelake model numbers
Rajneesh Bhardwaj <rajneesh.bhardwaj(a)linux.intel.com>
x86/CPU: Add Icelake model number
Rajneesh Bhardwaj <rajneesh.bhardwaj(a)intel.com>
x86/cpu: Add Cannonlake to Intel family
Zhang Rui <rui.zhang(a)intel.com>
x86/cpu: Add Jasper Lake to Intel family
Guenter Roeck <linux(a)roeck-us.net>
cpu/speculation: Add prototype for cpu_show_srbds()
Gayatri Kammela <gayatri.kammela(a)intel.com>
x86/cpu: Add Elkhart Lake to Intel family
-------------
Diffstat:
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../hw-vuln/processor_mmio_stale_data.rst | 246 +++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 36 +++
Makefile | 4 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/intel-family.h | 25 +++
arch/x86/include/asm/msr-index.h | 25 +++
arch/x86/include/asm/nospec-branch.h | 2 +
arch/x86/kernel/cpu/bugs.c | 235 +++++++++++++++++---
arch/x86/kernel/cpu/common.c | 52 ++++-
arch/x86/kvm/vmx.c | 77 ++++++-
arch/x86/kvm/x86.c | 4 +
drivers/base/cpu.c | 8 +
include/linux/cpu.h | 4 +
15 files changed, 679 insertions(+), 42 deletions(-)
Den 2022-06-14 kl. 20:12, skrev Greg Kroah-Hartman:
> On Tue, Jun 14, 2022 at 10:08:27AM -0700, Guenter Roeck wrote:
>> On Tue, Jun 14, 2022 at 08:36:08AM -0700, Guenter Roeck wrote:
>>> On Mon, Jun 13, 2022 at 08:19:49PM +0200, Greg Kroah-Hartman wrote:
>>>> This is the start of the stable review cycle for the 5.15.47 release.
>>>> There are 251 patches in this series, all will be posted as a response
>>>> to this one. If anyone has any issues with these being applied, please
>>>> let me know.
>>>>
>>>> Responses should be made by Wed, 15 Jun 2022 18:18:03 +0000.
>>>> Anything received after that time might be too late.
>>>>
>>>
>>> Build results:
>>> total: 159 pass: 159 fail: 0
>>> Qemu test results:
>>> total: 488 pass: 488 fail: 0
>>>
>>
>> I spoke a bit too early. I see the following backtrace in some qemu arm
>> boot tests.
>>
>> BUG: spinlock bad magic on CPU#0, kdevtmpfs/15
>> lock: noop_backing_dev_info+0x6c/0x3b0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
>> CPU: 0 PID: 15 Comm: kdevtmpfs Not tainted 5.15.47-rc2-00252-g677f0128d0ed #1
>> Hardware name: ARM RealView Machine (Device Tree Support)
>> [<c01101d0>] (unwind_backtrace) from [<c010bc0c>] (show_stack+0x10/0x14)
>> [<c010bc0c>] (show_stack) from [<c0a10ae4>] (dump_stack_lvl+0x68/0x90)
>> [<c0a10ae4>] (dump_stack_lvl) from [<c0191250>] (do_raw_spin_lock+0xbc/0x124)
>> [<c0191250>] (do_raw_spin_lock) from [<c02eb578>] (__mark_inode_dirty+0x1cc/0x704)
>> [<c02eb578>] (__mark_inode_dirty) from [<c02e6a74>] (simple_setattr+0x44/0x5c)
>> [<c02e6a74>] (simple_setattr) from [<c02d7a18>] (notify_change+0x400/0x45c)
>> [<c02d7a18>] (notify_change) from [<c0a19ef8>] (devtmpfsd+0x1f8/0x2b8)
>> [<c0a19ef8>] (devtmpfsd) from [<c014cf3c>] (kthread+0x150/0x17c)
>> [<c014cf3c>] (kthread) from [<c0100120>] (ret_from_fork+0x14/0x34)
>> Exception stack(0xd00dbfb0 to 0xd00dbff8)
>> bfa0: 00000000 00000000 00000000 00000000
>> bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>
>> This bisects to commit bc5d960d4e58 ("writeback: Fix inode->i_io_list not
>> be protected by inode->i_lock error"). The problem is also seen in the
>> mainline kernel. v5.15.y.queue and later are affected. Reverting the patch
>> here and in mainline fixes the problem.
>
> Thanks for letting me know. Hopefully it gets fixed in upstream...
>
I "think" this is the suggested fix:
https://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git/commit/?h…
--
Thomas
The patch titled
Subject: mm/memory-failure: disable unpoison once hw error happens
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memory-failure-disable-unpoison-once-hw-error-happens.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: zhenwei pi <pizhenwei(a)bytedance.com>
Subject: mm/memory-failure: disable unpoison once hw error happens
Date: Wed, 15 Jun 2022 17:32:09 +0800
Currently unpoison_memory(unsigned long pfn) is designed for soft
poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets cleared
on a x86 platform once hardware memory corrupts.
Unpoisoning a hardware corrupted page puts page back buddy only, the
kernel has a chance to access the page with *NOT PRESENT* KPTE. This
leads BUG during accessing on the corrupted KPTE.
Suggested by David&Naoya, disable unpoison mechanism when a real HW error
happens to avoid BUG like this:
Unpoison: Software-unpoisoned page 0x61234
BUG: unable to handle page fault for address: ffff888061234000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
RIP: 0010:clear_page_erms+0x7/0x10
Code: ...
RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
prep_new_page+0x151/0x170
get_page_from_freelist+0xca0/0xe20
? sysvec_apic_timer_interrupt+0xab/0xc0
? asm_sysvec_apic_timer_interrupt+0x1b/0x20
__alloc_pages+0x17e/0x340
__folio_alloc+0x17/0x40
vma_alloc_folio+0x84/0x280
__handle_mm_fault+0x8d4/0xeb0
handle_mm_fault+0xd5/0x2a0
do_user_addr_fault+0x1d0/0x680
? kvm_read_and_reset_apf_flags+0x3b/0x50
exc_page_fault+0x78/0x170
asm_exc_page_fault+0x27/0x30
Link: https://lkml.kernel.org/r/20220615093209.259374-2-pizhenwei@bytedance.com
Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Signed-off-by: zhenwei pi <pizhenwei(a)bytedance.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/vm/hwpoison.rst | 3 ++-
drivers/base/memory.c | 2 +-
include/linux/mm.h | 1 +
mm/hwpoison-inject.c | 2 +-
mm/madvise.c | 2 +-
mm/memory-failure.c | 12 ++++++++++++
6 files changed, 18 insertions(+), 4 deletions(-)
--- a/Documentation/vm/hwpoison.rst~mm-memory-failure-disable-unpoison-once-hw-error-happens
+++ a/Documentation/vm/hwpoison.rst
@@ -120,7 +120,8 @@ Testing
unpoison-pfn
Software-unpoison page at PFN echoed into this file. This way
a page can be reused again. This only works for Linux
- injected failures, not for real memory failures.
+ injected failures, not for real memory failures. Once any hardware
+ memory failure happens, this feature is disabled.
Note these injection interfaces are not stable and might change between
kernel versions
--- a/drivers/base/memory.c~mm-memory-failure-disable-unpoison-once-hw-error-happens
+++ a/drivers/base/memory.c
@@ -558,7 +558,7 @@ static ssize_t hard_offline_page_store(s
if (kstrtoull(buf, 0, &pfn) < 0)
return -EINVAL;
pfn >>= PAGE_SHIFT;
- ret = memory_failure(pfn, 0);
+ ret = memory_failure(pfn, MF_SW_SIMULATED);
if (ret == -EOPNOTSUPP)
ret = 0;
return ret ? ret : count;
--- a/include/linux/mm.h~mm-memory-failure-disable-unpoison-once-hw-error-happens
+++ a/include/linux/mm.h
@@ -3232,6 +3232,7 @@ enum mf_flags {
MF_MUST_KILL = 1 << 2,
MF_SOFT_OFFLINE = 1 << 3,
MF_UNPOISON = 1 << 4,
+ MF_SW_SIMULATED = 1 << 5,
};
extern int memory_failure(unsigned long pfn, int flags);
extern void memory_failure_queue(unsigned long pfn, int flags);
--- a/mm/hwpoison-inject.c~mm-memory-failure-disable-unpoison-once-hw-error-happens
+++ a/mm/hwpoison-inject.c
@@ -48,7 +48,7 @@ static int hwpoison_inject(void *data, u
inject:
pr_info("Injecting memory failure at pfn %#lx\n", pfn);
- err = memory_failure(pfn, 0);
+ err = memory_failure(pfn, MF_SW_SIMULATED);
return (err == -EOPNOTSUPP) ? 0 : err;
}
--- a/mm/madvise.c~mm-memory-failure-disable-unpoison-once-hw-error-happens
+++ a/mm/madvise.c
@@ -1112,7 +1112,7 @@ static int madvise_inject_error(int beha
} else {
pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
pfn, start);
- ret = memory_failure(pfn, MF_COUNT_INCREASED);
+ ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
if (ret == -EOPNOTSUPP)
ret = 0;
}
--- a/mm/memory-failure.c~mm-memory-failure-disable-unpoison-once-hw-error-happens
+++ a/mm/memory-failure.c
@@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __rea
atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
+static bool hw_memory_failure __read_mostly = false;
+
static bool __page_handle_poison(struct page *page)
{
int ret;
@@ -1768,6 +1770,9 @@ int memory_failure(unsigned long pfn, in
mutex_lock(&mf_mutex);
+ if (!(flags & MF_SW_SIMULATED))
+ hw_memory_failure = true;
+
p = pfn_to_online_page(pfn);
if (!p) {
res = arch_memory_failure(pfn, flags);
@@ -2103,6 +2108,13 @@ int unpoison_memory(unsigned long pfn)
mutex_lock(&mf_mutex);
+ if (hw_memory_failure) {
+ unpoison_pr_info("Unpoison: Disabled after HW memory failure %#lx\n",
+ pfn, &unpoison_rs);
+ ret = -EOPNOTSUPP;
+ goto unlock_mutex;
+ }
+
if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
pfn, &unpoison_rs);
_
Patches currently in -mm which might be from pizhenwei(a)bytedance.com are
mm-memory-failure-disable-unpoison-once-hw-error-happens.patch
From: Paolo Bonzini <pbonzini(a)redhat.com>
[ Upstream commit 6cd88243c7e03845a450795e134b488fc2afb736 ]
If a vCPU is outside guest mode and is scheduled out, it might be in the
process of making a memory access. A problem occurs if another vCPU uses
the PV TLB flush feature during the period when the vCPU is scheduled
out, and a virtual address has already been translated but has not yet
been accessed, because this is equivalent to using a stale TLB entry.
To avoid this, only report a vCPU as preempted if sure that the guest
is at an instruction boundary. A rescheduling request will be delivered
to the host physical CPU as an external interrupt, so for simplicity
consider any vmexit *not* instruction boundary except for external
interrupts.
It would in principle be okay to report the vCPU as preempted also
if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the
vmentry/vmexit overhead unnecessarily, and optimistic spinning is
also unlikely to succeed. However, leave it for later because right
now kvm_vcpu_check_block() is doing memory accesses. Even
though the TLB flush issue only applies to virtual memory address,
it's very much preferrable to be conservative.
Reported-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/vmx/vmx.c | 1 +
arch/x86/kvm/x86.c | 22 ++++++++++++++++++++++
4 files changed, 28 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 49d814b2a341..a35f5e23fc2a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -642,6 +642,7 @@ struct kvm_vcpu_arch {
u64 ia32_misc_enable_msr;
u64 smbase;
u64 smi_count;
+ bool at_instruction_boundary;
bool tpr_access_reporting;
bool xsaves_enabled;
u64 ia32_xss;
@@ -1271,6 +1272,8 @@ struct kvm_vcpu_stat {
u64 nested_run;
u64 directed_yield_attempted;
u64 directed_yield_successful;
+ u64 preemption_reported;
+ u64 preemption_other;
u64 guest_mode;
};
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 26f2da1590ed..5b51156712f7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4263,6 +4263,8 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu)
{
+ if (to_svm(vcpu)->vmcb->control.exit_code == SVM_EXIT_INTR)
+ vcpu->arch.at_instruction_boundary = true;
}
static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 16a660a0ed5f..08f9f05a6cb2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6390,6 +6390,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
return;
handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
+ vcpu->arch.at_instruction_boundary = true;
}
static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 23905ba3058a..fbf10fa99507 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -277,6 +277,8 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
STATS_DESC_COUNTER(VCPU, nested_run),
STATS_DESC_COUNTER(VCPU, directed_yield_attempted),
STATS_DESC_COUNTER(VCPU, directed_yield_successful),
+ STATS_DESC_COUNTER(VCPU, preemption_reported),
+ STATS_DESC_COUNTER(VCPU, preemption_other),
STATS_DESC_ICOUNTER(VCPU, guest_mode)
};
@@ -4368,6 +4370,19 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
struct kvm_memslots *slots;
static const u8 preempted = KVM_VCPU_PREEMPTED;
+ /*
+ * The vCPU can be marked preempted if and only if the VM-Exit was on
+ * an instruction boundary and will not trigger guest emulation of any
+ * kind (see vcpu_run). Vendor specific code controls (conservatively)
+ * when this is true, for example allowing the vCPU to be marked
+ * preempted if and only if the VM-Exit was due to a host interrupt.
+ */
+ if (!vcpu->arch.at_instruction_boundary) {
+ vcpu->stat.preemption_other++;
+ return;
+ }
+
+ vcpu->stat.preemption_reported++;
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
@@ -9936,6 +9951,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
vcpu->arch.l1tf_flush_l1d = true;
for (;;) {
+ /*
+ * If another guest vCPU requests a PV TLB flush in the middle
+ * of instruction emulation, the rest of the emulation could
+ * use a stale page translation. Assume that any code after
+ * this point can start executing an instruction.
+ */
+ vcpu->arch.at_instruction_boundary = false;
if (kvm_vcpu_running(vcpu)) {
r = vcpu_enter_guest(vcpu);
} else {
--
2.35.1