May 2025 - Linux-stable-mirror

[for-linus][PATCH 2/4] ftrace: Fix preemption acounting for stacktrace trigger command

by Steven Rostedt

From: pengdonglin <pengdonglin(a)xiaomi.com> When using the stacktrace trigger command to trace syscalls, the preemption count was consistently reported as 1 when the system call event itself had 0 ("."). For example: root@ubuntu22-vm:/sys/kernel/tracing/events/syscalls/sys_enter_read $ echo stacktrace > trigger $ echo 1 > enable sshd-416 [002] ..... 232.864910: sys_read(fd: a, buf: 556b1f3221d0, count: 8000) sshd-416 [002] ...1. 232.864913: <stack trace> => ftrace_syscall_enter => syscall_trace_enter => do_syscall_64 => entry_SYSCALL_64_after_hwframe The root cause is that the trace framework disables preemption in __DO_TRACE before invoking the trigger callback. Use the tracing_gen_ctx_dec() that will accommodate for the increase of the preemption count in __DO_TRACE when calling the callback. The result is the accurate reporting of: sshd-410 [004] ..... 210.117660: sys_read(fd: 4, buf: 559b725ba130, count: 40000) sshd-410 [004] ..... 210.117662: <stack trace> => ftrace_syscall_enter => syscall_trace_enter => do_syscall_64 => entry_SYSCALL_64_after_hwframe Cc: stable(a)vger.kernel.org Fixes: ce33c845b030c ("tracing: Dump stacktrace trigger to the corresponding instance") Link: https://lore.kernel.org/20250512094246.1167956-1-dolinux.peng@gmail.com Signed-off-by: pengdonglin <dolinux.peng(a)gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- kernel/trace/trace_events_trigger.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index b66b6d235d91..6e87ae2a1a66 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -1560,7 +1560,7 @@ stacktrace_trigger(struct event_trigger_data *data, struct trace_event_file *file = data->private_data; if (file) - __trace_stack(file->tr, tracing_gen_ctx(), STACK_SKIP); + __trace_stack(file->tr, tracing_gen_ctx_dec(), STACK_SKIP); else trace_dump_stack(STACK_SKIP); } -- 2.47.2

5 months, 3 weeks

1
0
0 0

[for-linus][PATCH 1/4] tracing: samples: Initialize trace_array_printk() with the correct function

by Steven Rostedt

From: Steven Rostedt <rostedt(a)goodmis.org> When using trace_array_printk() on a created instance, the correct function to use to initialize it is: trace_array_init_printk() Not trace_printk_init_buffer() The former is a proper function to use, the latter is for initializing trace_printk() and causes the NOTICE banner to be displayed. Cc: stable(a)vger.kernel.org Cc: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> Cc: Divya Indi <divya.indi(a)oracle.com> Link: https://lore.kernel.org/20250509152657.0f6744d9@gandalf.local.home Fixes: 89ed42495ef4a ("tracing: Sample module to demonstrate kernel access to Ftrace instances.") Fixes: 38ce2a9e33db6 ("tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers") Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- samples/ftrace/sample-trace-array.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/samples/ftrace/sample-trace-array.c b/samples/ftrace/sample-trace-array.c index dac67c367457..4147616102f9 100644 --- a/samples/ftrace/sample-trace-array.c +++ b/samples/ftrace/sample-trace-array.c @@ -112,7 +112,7 @@ static int __init sample_trace_array_init(void) /* * If context specific per-cpu buffers havent already been allocated. */ - trace_printk_init_buffers(); + trace_array_init_printk(tr); simple_tsk = kthread_run(simple_thread, NULL, "sample-instance"); if (IS_ERR(simple_tsk)) { -- 2.47.2

5 months, 3 weeks

1
0
0 0

[PATCH] usb: Flush altsetting 0 endpoints before reinitializating them after reset.

by Mathias Nyman

usb core avoids sending a Set-Interface altsetting 0 request after device reset, and instead relies on calling usb_disable_interface() and usb_enable_interface() to flush and reset host-side of those endpoints. xHCI hosts allocate and set up endpoint ring buffers and host_ep->hcpriv during usb_hcd_alloc_bandwidth() callback, which in this case is called before flushing the endpoint in usb_disable_interface(). Call usb_disable_interface() before usb_hcd_alloc_bandwidth() to ensure URBs are flushed before new ring buffers for the endpoints are allocated. Otherwise host driver will attempt to find and remove old stale URBs from a freshly allocated new ringbuffer. Cc: stable(a)vger.kernel.org Fixes: 4fe0387afa89 ("USB: don't send Set-Interface after reset") Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/core/hub.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 0e1dd6ef60a7..9f19fc7494e0 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -6133,6 +6133,7 @@ static int usb_reset_and_verify_device(struct usb_device *udev) struct usb_hub *parent_hub; struct usb_hcd *hcd = bus_to_hcd(udev->bus); struct usb_device_descriptor descriptor; + struct usb_interface *intf; struct usb_host_bos *bos; int i, j, ret = 0; int port1 = udev->portnum; @@ -6190,6 +6191,18 @@ static int usb_reset_and_verify_device(struct usb_device *udev) if (!udev->actconfig) goto done; + /* + * Some devices can't handle setting default altsetting 0 with a + * Set-Interface request. Disable host-side endpoints of those + * interfaces here. Enable and reset them back after host has set + * its internal endpoint structures during usb_hcd_alloc_bandwith() + */ + for (i = 0; i < udev->actconfig->desc.bNumInterfaces; i++) { + intf = udev->actconfig->interface[i]; + if (intf->cur_altsetting->desc.bAlternateSetting == 0) + usb_disable_interface(udev, intf, true); + } + mutex_lock(hcd->bandwidth_mutex); ret = usb_hcd_alloc_bandwidth(udev, udev->actconfig, NULL, NULL); if (ret < 0) { @@ -6221,12 +6234,11 @@ static int usb_reset_and_verify_device(struct usb_device *udev) */ for (i = 0; i < udev->actconfig->desc.bNumInterfaces; i++) { struct usb_host_config *config = udev->actconfig; - struct usb_interface *intf = config->interface[i]; struct usb_interface_descriptor *desc; + intf = config->interface[i]; desc = &intf->cur_altsetting->desc; if (desc->bAlternateSetting == 0) { - usb_disable_interface(udev, intf, true); usb_enable_interface(udev, intf, true); ret = 0; } else { -- 2.43.0

5 months, 3 weeks

1
0
0 0

[PATCH] ACPI: PPTT: Fix processor subtable walk

by Jeremy Linton

The original PPTT code had a bug where the processor subtable length was not correctly validated when encountering a truncated acpi_pptt_processor node. Commit 7ab4f0e37a0f4 ("ACPI PPTT: Fix coding mistakes in a couple of sizeof() calls") attempted to fix this by validating the size is as large as the acpi_pptt_processor node structure. This introduced a regression where the last processor node in the PPTT table is ignored if it doesn't contain any private resources. That results errors like: ACPI PPTT: PPTT table found, but unable to locate core XX (XX) ACPI: SPE must be homogeneous Furthermore, it fail in a common case where the node length isn't equal to the acpi_pptt_processor structure size, leaving the original bug in a modified form. Correct the regression by adjusting the loop termination conditions as suggested by the bug reporters. An additional check performed after the subtable node type is detected, validates the acpi_pptt_processor node is fully contained in the PPTT table. Repeating the check in acpi_pptt_leaf_node() is largely redundant as the node is already known to be fully contained in the table. The case where a final truncated node's parent property is accepted, but the node itself is rejected should not be considered a bug. Fixes: 7ab4f0e37a0f4 ("ACPI PPTT: Fix coding mistakes in a couple of sizeof() calls") Reported-by: Maximilian Heyne <mheyne(a)amazon.de> Closes: https://lore.kernel.org/linux-acpi/20250506-draco-taped-15f475cd@mheyne-ama… Reported-by: Yicong Yang <yangyicong(a)hisilicon.com> Closes: https://lore.kernel.org/linux-acpi/20250507035124.28071-1-yangyicong@huawei… Signed-off-by: Jeremy Linton <jeremy.linton(a)arm.com> Cc: Jean-Marc Eurin <jmeurin(a)google.com> Cc: <stable(a)vger.kernel.org> --- drivers/acpi/pptt.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c index f73ce6e13065..54676e3d82dd 100644 --- a/drivers/acpi/pptt.c +++ b/drivers/acpi/pptt.c @@ -231,16 +231,18 @@ static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr, sizeof(struct acpi_table_pptt)); proc_sz = sizeof(struct acpi_pptt_processor); - while ((unsigned long)entry + proc_sz < table_end) { + /* ignore subtable types that are smaller than a processor node */ + while ((unsigned long)entry + proc_sz <= table_end) { cpu_node = (struct acpi_pptt_processor *)entry; + if (entry->type == ACPI_PPTT_TYPE_PROCESSOR && cpu_node->parent == node_entry) return 0; if (entry->length == 0) return 0; + entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry, entry->length); - } return 1; } @@ -273,15 +275,18 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he proc_sz = sizeof(struct acpi_pptt_processor); /* find the processor structure associated with this cpuid */ - while ((unsigned long)entry + proc_sz < table_end) { + while ((unsigned long)entry + proc_sz <= table_end) { cpu_node = (struct acpi_pptt_processor *)entry; if (entry->length == 0) { pr_warn("Invalid zero length subtable\n"); break; } + /* entry->length may not equal proc_sz, revalidate the processor structure length */ if (entry->type == ACPI_PPTT_TYPE_PROCESSOR && acpi_cpu_id == cpu_node->acpi_processor_id && + (unsigned long)entry + entry->length <= table_end && + entry->length == proc_sz + cpu_node->number_of_priv_resources * sizeof(u32) && acpi_pptt_leaf_node(table_hdr, cpu_node)) { return (struct acpi_pptt_processor *)entry; } -- 2.49.0

5 months, 3 weeks

6
5
0 0

6.14.7-rc1: undefined reference to `__x86_indirect_its_thunk_array' when CONFIG_CPU_MITIGATIONS is off

by Holger Hoffstätte

While trying to build 6.14.7-rc1 with CONFIG_CPU_MITIGATIONS unset: LD .tmp_vmlinux1 ld: arch/x86/net/bpf_jit_comp.o: in function `emit_indirect_jump': /tmp/linux-6.14.7/arch/x86/net/bpf_jit_comp.c:660:(.text+0x97e): undefined reference to `__x86_indirect_its_thunk_array' make[2]: *** [scripts/Makefile.vmlinux:77: vmlinux] Error 1 make[1]: *** [/tmp/linux-6.14.7/Makefile:1234: vmlinux] Error 2 make: *** [Makefile:251: __sub-make] Error 2 - applying 9f35e33144ae aka "x86/its: Fix build errors when CONFIG_MODULES=n" did not help - mainline at 9f35e33144ae does not have this problem (same config) Are we missing a commit in stable? I temporarily threw "if (IS_ENABLED(CONFIG_MITIGATION_ITS))" around the problematic feature check and that made it work, but I get the feeling that cpu_feature_enabled(X86_FEATURE_INDIRECT_THUNK_ITS) is implemented differently than the other feature checks and/or is missing something. thanks Holger

5 months, 3 weeks

4
5
0 0

[PATCH] ASoC: amd: acp: Add null pointer check in acp_max98388_hw_paramsi()

by Wentao Liang

The acp_max98388_hw_params() calls the snd_soc_card_get_codec_dai(), but does not check its return value which is a null pointer if the function fails. This can result in a null pointer dereference. Add a null pointer check for snd_soc_card_get_codec_dai() to avoid null pointer dereference when the function fails. Fixes: ac91c8c89782 ("ASoC: amd: acp: Add machine driver support for max98388 codec") Cc: stable(a)vger.kernel.org # v6.6 Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn> --- sound/soc/amd/acp/acp-mach-common.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sound/soc/amd/acp/acp-mach-common.c b/sound/soc/amd/acp/acp-mach-common.c index f7602c1769bf..a795cc1836cc 100644 --- a/sound/soc/amd/acp/acp-mach-common.c +++ b/sound/soc/amd/acp/acp-mach-common.c @@ -918,6 +918,9 @@ static int acp_max98388_hw_params(struct snd_pcm_substream *substream, MAX98388_CODEC_DAI); int ret; + if (codec_dai) + return -EINVAL; + ret = snd_soc_dai_set_fmt(codec_dai, SND_SOC_DAIFMT_CBS_CFS | SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF); -- 2.42.0.windows.2

5 months, 3 weeks

1
0
0 0

[PATCH v2 RESEND] phy: Fix error handling in tegra_xusb_port_init

by Ma Ke

If device_add() fails, do not use device_unregister() for error handling. device_unregister() consists two functions: device_del() and put_device(). device_unregister() should only be called after device_add() succeeded because device_del() undoes what device_add() does if successful. Change device_unregister() to put_device() call before returning from the function. As comment of device_add() says, 'if device_add() succeeds, you should call device_del() when you want to get rid of it. If device_add() has not succeeded, use only put_device() to drop the reference count'. Found by code review. Cc: stable(a)vger.kernel.org Fixes: 53d2a715c240 ("phy: Add Tegra XUSB pad controller support") Signed-off-by: Ma Ke <make24(a)iscas.ac.cn> --- Changes in v2: - modified the bug description as suggestions. --- drivers/phy/tegra/xusb.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/phy/tegra/xusb.c b/drivers/phy/tegra/xusb.c index 79d4814d758d..c89df95aa6ca 100644 --- a/drivers/phy/tegra/xusb.c +++ b/drivers/phy/tegra/xusb.c @@ -548,16 +548,16 @@ static int tegra_xusb_port_init(struct tegra_xusb_port *port, err = dev_set_name(&port->dev, "%s-%u", name, index); if (err < 0) - goto unregister; + goto put_device; err = device_add(&port->dev); if (err < 0) - goto unregister; + goto put_device; return 0; -unregister: - device_unregister(&port->dev); +put_device: + put_device(&port->dev); return err; } -- 2.25.1

5 months, 4 weeks

3
3
0 0

[PATCH] ALSA: es1968: Add error handling for snd_pcm_hw_constraint_pow2()

by Wentao Liang

The function snd_es1968_capture_open() calls the function snd_pcm_hw_constraint_pow2(), but does not check its return value. A proper implementation can be found in snd_cx25821_pcm_open(). Add error handling for snd_pcm_hw_constraint_pow2() and propagate its error code. Fixes: b942cf815b57 ("[ALSA] es1968 - Fix stuttering capture") Cc: stable(a)vger.kernel.org # v2.6.22 Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn> --- sound/pci/es1968.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/sound/pci/es1968.c b/sound/pci/es1968.c index c6c018b40c69..4e0693f0ab0f 100644 --- a/sound/pci/es1968.c +++ b/sound/pci/es1968.c @@ -1561,7 +1561,7 @@ static int snd_es1968_capture_open(struct snd_pcm_substream *substream) struct snd_pcm_runtime *runtime = substream->runtime; struct es1968 *chip = snd_pcm_substream_chip(substream); struct esschan *es; - int apu1, apu2; + int err, apu1, apu2; apu1 = snd_es1968_alloc_apu_pair(chip, ESM_APU_PCM_CAPTURE); if (apu1 < 0) @@ -1605,7 +1605,9 @@ static int snd_es1968_capture_open(struct snd_pcm_substream *substream) runtime->hw = snd_es1968_capture; runtime->hw.buffer_bytes_max = runtime->hw.period_bytes_max = calc_available_memory_size(chip) - 1024; /* keep MIXBUF size */ - snd_pcm_hw_constraint_pow2(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_BYTES); + err = snd_pcm_hw_constraint_pow2(runtime, 0, SNDRV_PCM_HW_PARAM_BUFFER_BYTES); + if (err < 0) + return err; spin_lock_irq(&chip->substream_lock); list_add(&es->list, &chip->substream_list); -- 2.42.0.windows.2

5 months, 4 weeks

2
1
0 0

[PATCH v2 2/2] fs: minix: Fix handling of corrupted directories

by Andrey Kriulin

If the directory is corrupted and the number of nlinks is less than 2 (valid nlinks have at least 2), then when the directory is deleted, the minix_rmdir will try to reduce the nlinks(unsigned int) to a negative value. Make nlinks validity check for directory in minix_lookup. Signed-off-by: Andrey Kriulin <kitotavrik.s(a)gmail.com> --- v2: Move nlinks validaty check to V[12]_minix_iget() per Jan Kara <jack(a)suse.cz> request. Change return error code to EUCLEAN. Don't block directory in r/o mode per Al Viro <viro(a)zeniv.linux.org.uk> request. fs/minix/inode.c | 16 ++++++++++++++++ fs/minix/namei.c | 7 +------ 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/fs/minix/inode.c b/fs/minix/inode.c index f007e389d5d2..d815397b8b0d 100644 --- a/fs/minix/inode.c +++ b/fs/minix/inode.c @@ -517,6 +517,14 @@ static struct inode *V1_minix_iget(struct inode *inode) iget_failed(inode); return ERR_PTR(-ESTALE); } + if (S_ISDIR(raw_inode->i_mode) && raw_inode->i_nlinks < 2) { + printk("MINIX-fs: inode directory with corrupted number of links"); + if (!sb_rdonly(inode->i_sb)) { + brelse(bh); + iget_failed(inode); + return ERR_PTR(-EUCLEAN); + } + } inode->i_mode = raw_inode->i_mode; i_uid_write(inode, raw_inode->i_uid); i_gid_write(inode, raw_inode->i_gid); @@ -555,6 +563,14 @@ static struct inode *V2_minix_iget(struct inode *inode) iget_failed(inode); return ERR_PTR(-ESTALE); } + if (S_ISDIR(raw_inode->i_mode) && raw_inode->i_nlinks < 2) { + printk("MINIX-fs: inode directory with corrupted number of links"); + if (!sb_rdonly(inode->i_sb)) { + brelse(bh); + iget_failed(inode); + return ERR_PTR(-EUCLEAN); + } + } inode->i_mode = raw_inode->i_mode; i_uid_write(inode, raw_inode->i_uid); i_gid_write(inode, raw_inode->i_gid); diff --git a/fs/minix/namei.c b/fs/minix/namei.c index 5717a56fa01a..8938536d8d3c 100644 --- a/fs/minix/namei.c +++ b/fs/minix/namei.c @@ -28,13 +28,8 @@ static struct dentry *minix_lookup(struct inode * dir, struct dentry *dentry, un return ERR_PTR(-ENAMETOOLONG); ino = minix_inode_by_name(dentry); - if (ino) { + if (ino) inode = minix_iget(dir->i_sb, ino); - if (S_ISDIR(inode->i_mode) && inode->i_nlink < 2) { - iput(inode); - return ERR_PTR(-EIO); - } - } return d_splice_alias(inode, dentry); } -- 2.47.2

5 months, 4 weeks

2
1
0 0

[PATCH v3] x86/sev: Do not touch VMSA pages during kdump of SNP guest memory

by Ashish Kalra

From: Ashish Kalra <ashish.kalra(a)amd.com> When kdump is running makedumpfile to generate vmcore and dumping SNP guest memory it touches the VMSA page of the vCPU executing kdump which then results in unrecoverable #NPF/RMP faults as the VMSA page is marked busy/in-use when the vCPU is running and subsequently causes guest softlockup/hang. Additionally other APs may be halted in guest mode and their VMSA pages are marked busy and touching these VMSA pages during guest memory dump will also cause #NPF. Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out of guest mode and then clear the VMSA bit on their VMSA pages. If the vCPU running kdump is an AP, mark it's VMSA page as offline to ensure that makedumpfile excludes that page while dumping guest memory. Cc: stable(a)vger.kernel.org Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec") Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com> --- arch/x86/coco/sev/core.c | 241 +++++++++++++++++++++++++-------------- 1 file changed, 155 insertions(+), 86 deletions(-) diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c index dcfaa698d6cf..f4eb5b645239 100644 --- a/arch/x86/coco/sev/core.c +++ b/arch/x86/coco/sev/core.c @@ -877,6 +877,99 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end) set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE); } +static int vmgexit_ap_control(u64 event, struct sev_es_save_area *vmsa, u32 apic_id) +{ + struct ghcb_state state; + unsigned long flags; + struct ghcb *ghcb; + int ret = 0; + + local_irq_save(flags); + + ghcb = __sev_get_ghcb(&state); + + vc_ghcb_invalidate(ghcb); + if (event == SVM_VMGEXIT_AP_CREATE) + ghcb_set_rax(ghcb, vmsa->sev_features); + ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION); + ghcb_set_sw_exit_info_1(ghcb, + ((u64)apic_id << 32) | + ((u64)snp_vmpl << 16) | + event); + ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa)); + + sev_es_wr_ghcb_msr(__pa(ghcb)); + VMGEXIT(); + + if (!ghcb_sw_exit_info_1_is_valid(ghcb) || + lower_32_bits(ghcb->save.sw_exit_info_1)) { + pr_err("SNP AP %s error\n", (event == SVM_VMGEXIT_AP_CREATE ? "CREATE" : "DESTROY")); + ret = -EINVAL; + } + + __sev_put_ghcb(&state); + + local_irq_restore(flags); + + return ret; +} + +static int snp_set_vmsa(void *va, void *caa, int apic_id, bool make_vmsa) +{ + int ret; + + if (snp_vmpl) { + struct svsm_call call = {}; + unsigned long flags; + + local_irq_save(flags); + + call.caa = this_cpu_read(svsm_caa); + call.rcx = __pa(va); + + if (make_vmsa) { + /* Protocol 0, Call ID 2 */ + call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU); + call.rdx = __pa(caa); + call.r8 = apic_id; + } else { + /* Protocol 0, Call ID 3 */ + call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU); + } + + ret = svsm_perform_call_protocol(&call); + + local_irq_restore(flags); + } else { + /* + * If the kernel runs at VMPL0, it can change the VMSA + * bit for a page using the RMPADJUST instruction. + * However, for the instruction to succeed it must + * target the permissions of a lesser privileged (higher + * numbered) VMPL level, so use VMPL1. + */ + u64 attrs = 1; + + if (make_vmsa) + attrs |= RMPADJUST_VMSA_PAGE_BIT; + + ret = rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs); + } + + return ret; +} + +static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id) +{ + int err; + + err = snp_set_vmsa(vmsa, NULL, apic_id, false); + if (err) + pr_err("clear VMSA page failed (%u), leaking page\n", err); + else + free_page((unsigned long)vmsa); +} + static void set_pte_enc(pte_t *kpte, int level, void *va) { struct pte_enc_desc d = { @@ -973,6 +1066,65 @@ void snp_kexec_begin(void) pr_warn("Failed to stop shared<->private conversions\n"); } +/* + * Shutdown all APs except the one handling kexec/kdump and clearing + * the VMSA tag on AP's VMSA pages as they are not being used as + * VMSA page anymore. + */ +static void shutdown_all_aps(void) +{ + struct sev_es_save_area *vmsa; + int apic_id, this_cpu, cpu; + + this_cpu = get_cpu(); + + /* + * APs are already in HLT loop when enc_kexec_finish() callback + * is invoked. + */ + for_each_present_cpu(cpu) { + vmsa = per_cpu(sev_vmsa, cpu); + + /* + * BSP does not have guest allocated VMSA and there is no need + * to clear the VMSA tag for this page. + */ + if (!vmsa) + continue; + + /* + * Cannot clear the VMSA tag for the currently running vCPU. + */ + if (this_cpu == cpu) { + unsigned long pa; + struct page *p; + + pa = __pa(vmsa); + /* + * Mark the VMSA page of the running vCPU as offline + * so that is excluded and not touched by makedumpfile + * while generating vmcore during kdump. + */ + p = pfn_to_online_page(pa >> PAGE_SHIFT); + if (p) + __SetPageOffline(p); + continue; + } + + apic_id = cpuid_to_apicid[cpu]; + + /* + * Issue AP destroy to ensure AP gets kicked out of guest mode + * to allow using RMPADJUST to remove the VMSA tag on it's + * VMSA page. + */ + vmgexit_ap_control(SVM_VMGEXIT_AP_DESTROY, vmsa, apic_id); + snp_cleanup_vmsa(vmsa, apic_id); + } + + put_cpu(); +} + void snp_kexec_finish(void) { struct sev_es_runtime_data *data; @@ -987,6 +1139,8 @@ void snp_kexec_finish(void) if (!IS_ENABLED(CONFIG_KEXEC_CORE)) return; + shutdown_all_aps(); + unshare_all_memory(); /* @@ -1008,51 +1162,6 @@ void snp_kexec_finish(void) } } -static int snp_set_vmsa(void *va, void *caa, int apic_id, bool make_vmsa) -{ - int ret; - - if (snp_vmpl) { - struct svsm_call call = {}; - unsigned long flags; - - local_irq_save(flags); - - call.caa = this_cpu_read(svsm_caa); - call.rcx = __pa(va); - - if (make_vmsa) { - /* Protocol 0, Call ID 2 */ - call.rax = SVSM_CORE_CALL(SVSM_CORE_CREATE_VCPU); - call.rdx = __pa(caa); - call.r8 = apic_id; - } else { - /* Protocol 0, Call ID 3 */ - call.rax = SVSM_CORE_CALL(SVSM_CORE_DELETE_VCPU); - } - - ret = svsm_perform_call_protocol(&call); - - local_irq_restore(flags); - } else { - /* - * If the kernel runs at VMPL0, it can change the VMSA - * bit for a page using the RMPADJUST instruction. - * However, for the instruction to succeed it must - * target the permissions of a lesser privileged (higher - * numbered) VMPL level, so use VMPL1. - */ - u64 attrs = 1; - - if (make_vmsa) - attrs |= RMPADJUST_VMSA_PAGE_BIT; - - ret = rmpadjust((unsigned long)va, RMP_PG_SIZE_4K, attrs); - } - - return ret; -} - #define __ATTR_BASE (SVM_SELECTOR_P_MASK | SVM_SELECTOR_S_MASK) #define INIT_CS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_READ_MASK | SVM_SELECTOR_CODE_MASK) #define INIT_DS_ATTRIBS (__ATTR_BASE | SVM_SELECTOR_WRITE_MASK) @@ -1084,24 +1193,10 @@ static void *snp_alloc_vmsa_page(int cpu) return page_address(p + 1); } -static void snp_cleanup_vmsa(struct sev_es_save_area *vmsa, int apic_id) -{ - int err; - - err = snp_set_vmsa(vmsa, NULL, apic_id, false); - if (err) - pr_err("clear VMSA page failed (%u), leaking page\n", err); - else - free_page((unsigned long)vmsa); -} - static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip) { struct sev_es_save_area *cur_vmsa, *vmsa; - struct ghcb_state state; struct svsm_ca *caa; - unsigned long flags; - struct ghcb *ghcb; u8 sipi_vector; int cpu, ret; u64 cr4; @@ -1215,33 +1310,7 @@ static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip) } /* Issue VMGEXIT AP Creation NAE event */ - local_irq_save(flags); - - ghcb = __sev_get_ghcb(&state); - - vc_ghcb_invalidate(ghcb); - ghcb_set_rax(ghcb, vmsa->sev_features); - ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_CREATION); - ghcb_set_sw_exit_info_1(ghcb, - ((u64)apic_id << 32) | - ((u64)snp_vmpl << 16) | - SVM_VMGEXIT_AP_CREATE); - ghcb_set_sw_exit_info_2(ghcb, __pa(vmsa)); - - sev_es_wr_ghcb_msr(__pa(ghcb)); - VMGEXIT(); - - if (!ghcb_sw_exit_info_1_is_valid(ghcb) || - lower_32_bits(ghcb->save.sw_exit_info_1)) { - pr_err("SNP AP Creation error\n"); - ret = -EINVAL; - } - - __sev_put_ghcb(&state); - - local_irq_restore(flags); - - /* Perform cleanup if there was an error */ + ret = vmgexit_ap_control(SVM_VMGEXIT_AP_CREATE, vmsa, apic_id); if (ret) { snp_cleanup_vmsa(vmsa, apic_id); vmsa = NULL; -- 2.34.1

5 months, 4 weeks

6
8
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror May 2025