Hi all,
This patch series includes backports for the changes that fix CVE-2023-52447.
The changes were applied cleanly from my 5.15 backport, viewable at
https://lore.kernel.org/stable/cover.1710187165.git.rkolchmeyer@google.com/….
The notes in that patch set largely apply here.
The only note I have for 5.10 is that the test cases with sleepable BPF
programs in commit 1624918be84a ("selftests/bpf: Add test cases for inner map")
do not seem to be compatible with the 5.10 kernel. But the situation with the
other test cases matches the observations I shared in the 5.15 patch set.
Thanks,
-Robert
Hou Tao (1):
bpf: Defer the free of inner map when necessary
Paul E. McKenney (1):
rcu-tasks: Provide rcu_trace_implies_rcu_gp()
include/linux/bpf.h | 7 ++++++-
include/linux/rcupdate.h | 12 ++++++++++++
kernel/bpf/map_in_map.c | 11 ++++++++---
kernel/bpf/syscall.c | 26 ++++++++++++++++++++++++--
kernel/rcu/tasks.h | 2 ++
5 files changed, 52 insertions(+), 6 deletions(-)
--
2.44.0.278.ge034bb2e1d-goog
From: Ryan Roberts <ryan.roberts(a)arm.com>
There was previously a theoretical window where swapoff() could run and
teardown a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.
This is a theoretical problem and I haven't been able to provoke it from a
test case. But there has been agreement based on code review that this is
possible (see link below).
Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that
the swap entry was not free. This isn't present in get_swap_device()
because it doesn't make sense in general due to the race between getting
the reference and swapoff. So I've added an equivalent check directly in
free_swap_and_cache().
Details of how to provoke one possible issue:
--8<-----
CPU0 CPU1
---- ----
shmem_undo_range
shmem_free_swap
xa_cmpxchg_irq
free_swap_and_cache
__swap_entry_free
/* swap_count() become 0 */
swapoff
try_to_unuse
shmem_unuse /* cannot find swap entry */
find_next_to_unuse
filemap_get_folio
folio_free_swap
/* remove swap cache */
/* free si->swap_map[] */
swap_page_trans_huge_swapped <-- access freed si->swap_map !!!
--8<-----
Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
Closes: https://lore.kernel.org/linux-mm/8734t27awd.fsf@yhuang6-desk2.ccr.corp.inte…
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com> [patch description]
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Hi, Andrew,
If it's not too late. Please replace v2 of this patch in mm-stable
with this version.
Changes since v2:
- Remove comments for get_swap_device() because it's not correct.
- Revised patch description about the race condition description.
Changes since v1:
- Added comments for get_swap_device() as suggested by David
- Moved check that swap entry is not free from get_swap_device() to
free_swap_and_cache() since there are some paths that legitimately call with
a free offset.
Best Regards,
Huang, Ying
mm/swapfile.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b3a2d85e350..9e0691276f5e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1609,13 +1609,19 @@ int free_swap_and_cache(swp_entry_t entry)
if (non_swap_entry(entry))
return 1;
- p = _swap_info_get(entry);
+ p = get_swap_device(entry);
if (p) {
+ if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
+ put_swap_device(p);
+ return 0;
+ }
+
count = __swap_entry_free(p, entry);
if (count == SWAP_HAS_CACHE &&
!swap_page_trans_huge_swapped(p, entry))
__try_to_reclaim_swap(p, swp_offset(entry),
TTRS_UNMAPPED | TTRS_FULL);
+ put_swap_device(p);
}
return p != NULL;
}
--
2.39.2
From: Vasant Karasulli <vkarasulli(a)suse.de>
Hi,
here are changes to enable kexec/kdump in SEV-ES guests. The biggest
problem for supporting kexec/kdump under SEV-ES is to find a way to
hand the non-boot CPUs (APs) from one kernel to another.
Without SEV-ES the first kernel parks the CPUs in a HLT loop until
they get reset by the kexec'ed kernel via an INIT-SIPI-SIPI sequence.
For virtual machines the CPU reset is emulated by the hypervisor,
which sets the vCPU registers back to reset state.
This does not work under SEV-ES, because the hypervisor has no access
to the vCPU registers and can't make modifications to them. So an
SEV-ES guest needs to reset the vCPU itself and park it using the
AP-reset-hold protocol. Upon wakeup the guest needs to jump to
real-mode and to the reset-vector configured in the AP-Jump-Table.
The code to do this is the main part of this patch-set. It works by
placing code on the AP Jump-Table page itself to park the vCPU and for
jumping to the reset vector upon wakeup. The code on the AP Jump Table
runs in 16-bit protected mode with segment base set to the beginning
of the page. The AP Jump-Table is usually not within the first 1MB of
memory, so the code can't run in real-mode.
The AP Jump-Table is the best place to put the parking code, because
the memory is owned, but read-only by the firmware and writeable by
the OS. Only the first 4 bytes are used for the reset-vector, leaving
the rest of the page for code/data/stack to park a vCPU. The code
can't be in kernel memory because by the time the vCPU wakes up the
memory will be owned by the new kernel, which might have overwritten it
already.
The other patches add initial GHCB Version 2 protocol support, because
kexec/kdump need the MSR-based (without a GHCB) AP-reset-hold VMGEXIT,
which is a GHCB protocol version 2 feature.
The kexec'ed kernel is also entered via the decompressor and needs
MMIO support there, so this patch-set also adds MMIO #VC support to
the decompressor and support for handling CLFLUSH instructions.
Finally there is also code to disable kexec/kdump support at runtime
when the environment does not support it (e.g. no GHCB protocol
version 2 support or AP Jump Table over 4GB).
The diffstat looks big, but most of it is moving code for MMIO #VC
support around to make it available to the decompressor.
The previous version of this patch-set can be found here:
https://lore.kernel.org/lkml/20220127101044.13803-1-joro@8bytes.org/
Please review.
Thanks,
Vasant
Changes v3->v4:
- Rebased to v6.8 kernel
- Applied review comments by Sean Christopherson
- Combined sev_es_setup_ap_jump_table() and sev_setup_ap_jump_table()
into a single function which makes caching jump table address
unnecessary
- annotated struct sev_ap_jump_table_header with __packed attribute
- added code to set up real mode data segment at boot time instead of
hardcoding the value.
Changes v2->v3:
- Rebased to v5.17-rc1
- Applied most review comments by Boris
- Use the name 'AP jump table' consistently
- Make kexec-disabling for unsupported guests x86-specific
- Cleanup and consolidate patches to detect GHCB v2 protocol
support
Joerg Roedel (9):
x86/kexec/64: Disable kexec when SEV-ES is active
x86/sev: Save and print negotiated GHCB protocol version
x86/sev: Set GHCB data structure version
x86/sev: Setup code to park APs in the AP Jump Table
x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
x86/sev: Use AP Jump Table blob to stop CPU
x86/sev: Add MMIO handling support to boot/compressed/ code
x86/sev: Handle CLFLUSH MMIO events
x86/kexec/64: Support kexec under SEV-ES with AP Jump Table Blob
arch/x86/boot/compressed/sev.c | 45 +-
arch/x86/include/asm/insn-eval.h | 1 +
arch/x86/include/asm/realmode.h | 5 +
arch/x86/include/asm/sev-ap-jumptable.h | 30 +
arch/x86/include/asm/sev.h | 7 +
arch/x86/kernel/machine_kexec_64.c | 12 +
arch/x86/kernel/process.c | 8 +
arch/x86/kernel/sev-shared.c | 234 +++++-
arch/x86/kernel/sev.c | 372 +++++-----
arch/x86/lib/insn-eval-shared.c | 912 ++++++++++++++++++++++++
arch/x86/lib/insn-eval.c | 911 +----------------------
arch/x86/realmode/Makefile | 9 +-
arch/x86/realmode/rm/Makefile | 11 +-
arch/x86/realmode/rm/header.S | 3 +
arch/x86/realmode/rm/sev.S | 85 +++
arch/x86/realmode/rmpiggy.S | 6 +
arch/x86/realmode/sev/Makefile | 33 +
arch/x86/realmode/sev/ap_jump_table.S | 131 ++++
arch/x86/realmode/sev/ap_jump_table.lds | 24 +
19 files changed, 1695 insertions(+), 1144 deletions(-)
create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
create mode 100644 arch/x86/lib/insn-eval-shared.c
create mode 100644 arch/x86/realmode/rm/sev.S
create mode 100644 arch/x86/realmode/sev/Makefile
create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
--
2.34.1
This is the backport of recently upstreamed series that moves VERW
execution to a later point in exit-to-user path. This is needed because
in some cases it may be possible for data accessed after VERW executions
may end into MDS affected CPU buffers. Moving VERW closer to ring
transition reduces the attack surface.
- The series includes a dependency commit f87bc8dc7a7c ("x86/asm: Add
_ASM_RIP() macro for x86-64 (%rip) suffix").
- Patch 2 includes a change that adds runtime patching for jmp (instead
of verw in original series) due to lack of rip-relative relocation
support in kernels <v6.5.
- Fixed warning:
arch/x86/entry/entry.o: warning: objtool: mds_verw_sel+0x0: unreachable instruction.
- Resolved merge conflicts in:
swapgs_restore_regs_and_return_to_usermode in entry_64.S.
__vmx_vcpu_run in vmenter.S.
vmx_update_fb_clear_dis in vmx.c.
- Boot tested with KASLR and KPTI enabled.
- Verified VERW being executed with mitigation ON, and not being
executed with mitigation turned OFF.
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
H. Peter Anvin (Intel) (1):
x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
Pawan Gupta (5):
x86/bugs: Add asm helpers for executing VERW
x86/entry_64: Add VERW just before userspace transition
x86/entry_32: Add VERW just before userspace transition
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
Sean Christopherson (1):
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
Documentation/x86/mds.rst | 38 +++++++++++++++++++++++++-----------
arch/x86/entry/entry.S | 23 ++++++++++++++++++++++
arch/x86/entry/entry_32.S | 3 +++
arch/x86/entry/entry_64.S | 11 +++++++++++
arch/x86/entry/entry_64_compat.S | 1 +
arch/x86/include/asm/asm.h | 5 +++++
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/entry-common.h | 1 -
arch/x86/include/asm/nospec-branch.h | 27 +++++++++++++------------
arch/x86/kernel/cpu/bugs.c | 15 ++++++--------
arch/x86/kernel/nmi.c | 3 ---
arch/x86/kvm/vmx/run_flags.h | 7 +++++--
arch/x86/kvm/vmx/vmenter.S | 9 ++++++---
arch/x86/kvm/vmx/vmx.c | 12 ++++++++----
14 files changed, 111 insertions(+), 46 deletions(-)
---
base-commit: 80efc6265290d34b75921bf7294e0d9c5a8749dc
change-id: 20240304-delay-verw-backport-5-15-y-e16f07fbb71e
Best regards,
--
Thanks,
Pawan