During an S4 resume, the system first performs a cold power-on. The kernel image is initially loaded to a random linear address, and the FRED MSRs are initialized. Subsequently, the S4 image is loaded, and the kernel image is relocated to its original address from before the S4 suspend. Due to changes in the kernel text and data mappings, the FRED MSRs must be reinitialized.
Reported-by: Xi Pardee xi.pardee@intel.com Reported-and-Tested-by: Todd Brandt todd.e.brandt@intel.com Suggested-by: H. Peter Anvin (Intel) hpa@zytor.com Signed-off-by: Xin Li (Intel) xin@zytor.com Cc: stable@kernel.org # 6.9+ --- arch/x86/power/cpu.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 63230ff8cf4f..ef3c152c319c 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -27,6 +27,7 @@ #include <asm/mmu_context.h> #include <asm/cpu_device_id.h> #include <asm/microcode.h> +#include <asm/fred.h>
#ifdef CONFIG_X86_32 __visible unsigned long saved_context_ebx; @@ -231,6 +232,21 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) */ #ifdef CONFIG_X86_64 wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base); + + /* + * Restore FRED configs. + * + * FRED configs are completely derived from current kernel text and + * data mappings, thus nothing needs to be saved and restored. + * + * As such, simply re-initialize FRED to restore FRED configs. + * + * Note, FRED RSPs setup needs to access percpu data structures. + */ + if (ctxt->cr4 & X86_CR4_FRED) { + cpu_init_fred_exceptions(); + cpu_init_fred_rsps(); + } #else loadsegment(fs, __KERNEL_PERCPU); #endif
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree. Subject: [PATCH v1 1/1] x86/fred: Fix system hang during S4 resume with FRED enabled Link: https://lore.kernel.org/stable/20250326062540.820556-1-xin%40zytor.com
On Wed, Mar 26, 2025 at 7:26 AM Xin Li (Intel) xin@zytor.com wrote:
During an S4 resume, the system first performs a cold power-on. The kernel image is initially loaded to a random linear address, and the FRED MSRs are initialized. Subsequently, the S4 image is loaded, and the kernel image is relocated to its original address from before the S4 suspend. Due to changes in the kernel text and data mappings, the FRED MSRs must be reinitialized.
To be precise, the above description of the hibernation control flow doesn't exactly match the code.
Yes, a new kernel is booted upon a wakeup from S4, but this is not "a cold power-on", strictly speaking. This kernel is often referred to as the restore kernel and yes, it initializes the FRED MSRs as appropriate from its perspective.
Yes, it loads a hibernation image, including the kernel that was running before hibernation, often referred to as the image kernel, but it does its best to load image pages directly into the page frames occupied by them before hibernation unless those page frames are currently in use. In that case, the given image pages are loaded into currently free page frames, but they may or may not be part of the image kernel (they may as well belong to user space processes that were running before hibernation). Yes, all of these pages need to be moved to their original locations before the last step of restore, which is a jump into a "trampoline" page in the image kernel, but this is sort of irrelevant to the issue at hand.
At this point, the image kernel has control, but the FRED MSRs still contain values written to them by the restore kernel and there is no guarantee that those values are the same as the ones written into them by the image kernel before hibernation. Thus the image kernel must ensure that the values of the FRED MSRs will be the same as they were before hibernation, and because they only depend on the location of the kernel text and data, they may as well be recomputed from scratch.
Reported-by: Xi Pardee xi.pardee@intel.com Reported-and-Tested-by: Todd Brandt todd.e.brandt@intel.com Suggested-by: H. Peter Anvin (Intel) hpa@zytor.com Signed-off-by: Xin Li (Intel) xin@zytor.com Cc: stable@kernel.org # 6.9+
arch/x86/power/cpu.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 63230ff8cf4f..ef3c152c319c 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -27,6 +27,7 @@ #include <asm/mmu_context.h> #include <asm/cpu_device_id.h> #include <asm/microcode.h> +#include <asm/fred.h>
#ifdef CONFIG_X86_32 __visible unsigned long saved_context_ebx; @@ -231,6 +232,21 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) */ #ifdef CONFIG_X86_64 wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
/*
* Restore FRED configs.
*
* FRED configs are completely derived from current kernel text and
* data mappings, thus nothing needs to be saved and restored.
*
* As such, simply re-initialize FRED to restore FRED configs.
Instead of the above, I would just say "Reinitialize FRED to ensure that the FRED registers contain the same values as before hibernation."
*
* Note, FRED RSPs setup needs to access percpu data structures.
And I'm not sure what you wanted to say here? Does this refer to the ordering of the code below or to something else?
*/
if (ctxt->cr4 & X86_CR4_FRED) {
cpu_init_fred_exceptions();
cpu_init_fred_rsps();
}
#else loadsegment(fs, __KERNEL_PERCPU);
#endif
* Rafael J. Wysocki rafael@kernel.org wrote:
On Wed, Mar 26, 2025 at 7:26 AM Xin Li (Intel) xin@zytor.com wrote:
During an S4 resume, the system first performs a cold power-on. The kernel image is initially loaded to a random linear address, and the FRED MSRs are initialized. Subsequently, the S4 image is loaded, and the kernel image is relocated to its original address from before the S4 suspend. Due to changes in the kernel text and data mappings, the FRED MSRs must be reinitialized.
To be precise, the above description of the hibernation control flow doesn't exactly match the code.
Yes, a new kernel is booted upon a wakeup from S4, but this is not "a cold power-on", strictly speaking. This kernel is often referred to as the restore kernel and yes, it initializes the FRED MSRs as appropriate from its perspective.
Yes, it loads a hibernation image, including the kernel that was running before hibernation, often referred to as the image kernel, but it does its best to load image pages directly into the page frames occupied by them before hibernation unless those page frames are currently in use. In that case, the given image pages are loaded into currently free page frames, but they may or may not be part of the image kernel (they may as well belong to user space processes that were running before hibernation). Yes, all of these pages need to be moved to their original locations before the last step of restore, which is a jump into a "trampoline" page in the image kernel, but this is sort of irrelevant to the issue at hand.
At this point, the image kernel has control, but the FRED MSRs still contain values written to them by the restore kernel and there is no guarantee that those values are the same as the ones written into them by the image kernel before hibernation. Thus the image kernel must ensure that the values of the FRED MSRs will be the same as they were before hibernation, and because they only depend on the location of the kernel text and data, they may as well be recomputed from scratch.
That's a rather critical difference... I zapped the commit from tip:x86/urgent, awaiting -v2 with a better changelog and better in-code comments.
Thanks,
Ingo
On March 31, 2025 8:30:49 AM PDT, "Rafael J. Wysocki" rafael@kernel.org wrote:
On Wed, Mar 26, 2025 at 7:26 AM Xin Li (Intel) xin@zytor.com wrote:
During an S4 resume, the system first performs a cold power-on. The kernel image is initially loaded to a random linear address, and the FRED MSRs are initialized. Subsequently, the S4 image is loaded, and the kernel image is relocated to its original address from before the S4 suspend. Due to changes in the kernel text and data mappings, the FRED MSRs must be reinitialized.
To be precise, the above description of the hibernation control flow doesn't exactly match the code.
Yes, a new kernel is booted upon a wakeup from S4, but this is not "a cold power-on", strictly speaking. This kernel is often referred to as the restore kernel and yes, it initializes the FRED MSRs as appropriate from its perspective.
Yes, it loads a hibernation image, including the kernel that was running before hibernation, often referred to as the image kernel, but it does its best to load image pages directly into the page frames occupied by them before hibernation unless those page frames are currently in use. In that case, the given image pages are loaded into currently free page frames, but they may or may not be part of the image kernel (they may as well belong to user space processes that were running before hibernation). Yes, all of these pages need to be moved to their original locations before the last step of restore, which is a jump into a "trampoline" page in the image kernel, but this is sort of irrelevant to the issue at hand.
At this point, the image kernel has control, but the FRED MSRs still contain values written to them by the restore kernel and there is no guarantee that those values are the same as the ones written into them by the image kernel before hibernation. Thus the image kernel must ensure that the values of the FRED MSRs will be the same as they were before hibernation, and because they only depend on the location of the kernel text and data, they may as well be recomputed from scratch.
Reported-by: Xi Pardee xi.pardee@intel.com Reported-and-Tested-by: Todd Brandt todd.e.brandt@intel.com Suggested-by: H. Peter Anvin (Intel) hpa@zytor.com Signed-off-by: Xin Li (Intel) xin@zytor.com Cc: stable@kernel.org # 6.9+
arch/x86/power/cpu.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 63230ff8cf4f..ef3c152c319c 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -27,6 +27,7 @@ #include <asm/mmu_context.h> #include <asm/cpu_device_id.h> #include <asm/microcode.h> +#include <asm/fred.h>
#ifdef CONFIG_X86_32 __visible unsigned long saved_context_ebx; @@ -231,6 +232,21 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) */ #ifdef CONFIG_X86_64 wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
/*
* Restore FRED configs.
*
* FRED configs are completely derived from current kernel text and
* data mappings, thus nothing needs to be saved and restored.
*
* As such, simply re-initialize FRED to restore FRED configs.
Instead of the above, I would just say "Reinitialize FRED to ensure that the FRED registers contain the same values as before hibernation."
*
* Note, FRED RSPs setup needs to access percpu data structures.
And I'm not sure what you wanted to say here? Does this refer to the ordering of the code below or to something else?
*/
if (ctxt->cr4 & X86_CR4_FRED) {
cpu_init_fred_exceptions();
cpu_init_fred_rsps();
}
#else loadsegment(fs, __KERNEL_PERCPU);
#endif
Just to make it clear: the patch is correct, the shortcoming is in the description.
I would say that Xin's description, although perhaps excessively brief, is correct from the *hardware* point of view, whereas Rafael adds the much needed *software* perspective.
As far as hardware is concerned, Linux S4 is just a power on (we don't use any BIOS support for S4 even if it exists, which it rarely does anymore, and for very good reasons.) From a software point of view, it is more like a kexec into the frozen kernel image, which then has to re-establish its runtime execution environment – (including the FRED state, which is what this patch does.)
For the APs this is done through the normal AP bringup mechanism, it is only the BSP that needs special treatment.
On Mon, Mar 31, 2025 at 10:04 PM H. Peter Anvin hpa@zytor.com wrote:
On March 31, 2025 8:30:49 AM PDT, "Rafael J. Wysocki" rafael@kernel.org wrote:
On Wed, Mar 26, 2025 at 7:26 AM Xin Li (Intel) xin@zytor.com wrote:
During an S4 resume, the system first performs a cold power-on. The kernel image is initially loaded to a random linear address, and the FRED MSRs are initialized. Subsequently, the S4 image is loaded, and the kernel image is relocated to its original address from before the S4 suspend. Due to changes in the kernel text and data mappings, the FRED MSRs must be reinitialized.
To be precise, the above description of the hibernation control flow doesn't exactly match the code.
Yes, a new kernel is booted upon a wakeup from S4, but this is not "a cold power-on", strictly speaking. This kernel is often referred to as the restore kernel and yes, it initializes the FRED MSRs as appropriate from its perspective.
Yes, it loads a hibernation image, including the kernel that was running before hibernation, often referred to as the image kernel, but it does its best to load image pages directly into the page frames occupied by them before hibernation unless those page frames are currently in use. In that case, the given image pages are loaded into currently free page frames, but they may or may not be part of the image kernel (they may as well belong to user space processes that were running before hibernation). Yes, all of these pages need to be moved to their original locations before the last step of restore, which is a jump into a "trampoline" page in the image kernel, but this is sort of irrelevant to the issue at hand.
At this point, the image kernel has control, but the FRED MSRs still contain values written to them by the restore kernel and there is no guarantee that those values are the same as the ones written into them by the image kernel before hibernation. Thus the image kernel must ensure that the values of the FRED MSRs will be the same as they were before hibernation, and because they only depend on the location of the kernel text and data, they may as well be recomputed from scratch.
Reported-by: Xi Pardee xi.pardee@intel.com Reported-and-Tested-by: Todd Brandt todd.e.brandt@intel.com Suggested-by: H. Peter Anvin (Intel) hpa@zytor.com Signed-off-by: Xin Li (Intel) xin@zytor.com Cc: stable@kernel.org # 6.9+
arch/x86/power/cpu.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 63230ff8cf4f..ef3c152c319c 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -27,6 +27,7 @@ #include <asm/mmu_context.h> #include <asm/cpu_device_id.h> #include <asm/microcode.h> +#include <asm/fred.h>
#ifdef CONFIG_X86_32 __visible unsigned long saved_context_ebx; @@ -231,6 +232,21 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) */ #ifdef CONFIG_X86_64 wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
/*
* Restore FRED configs.
*
* FRED configs are completely derived from current kernel text and
* data mappings, thus nothing needs to be saved and restored.
*
* As such, simply re-initialize FRED to restore FRED configs.
Instead of the above, I would just say "Reinitialize FRED to ensure that the FRED registers contain the same values as before hibernation."
*
* Note, FRED RSPs setup needs to access percpu data structures.
And I'm not sure what you wanted to say here? Does this refer to the ordering of the code below or to something else?
*/
if (ctxt->cr4 & X86_CR4_FRED) {
cpu_init_fred_exceptions();
cpu_init_fred_rsps();
}
#else loadsegment(fs, __KERNEL_PERCPU);
#endif
Just to make it clear: the patch is correct, the shortcoming is in the description.
Yes, the code changes in the patch are technically correct.
I would say that Xin's description, although perhaps excessively brief, is correct from the *hardware* point of view, whereas Rafael adds the much needed *software* perspective.
As far as hardware is concerned, Linux S4 is just a power on (we don't use any BIOS support for S4 even if it exists, which it rarely does anymore, and for very good reasons.) From a software point of view, it is more like a kexec into the frozen kernel image, which then has to re-establish its runtime execution environment – (including the FRED state, which is what this patch does.)
For the APs this is done through the normal AP bringup mechanism, it is only the BSP that needs special treatment.
That's correct.
* H. Peter Anvin hpa@zytor.com wrote:
Just to make it clear: the patch is correct, the shortcoming is in the description.
I would say that Xin's description, although perhaps excessively brief, is correct from the *hardware* point of view, whereas Rafael adds the much needed *software* perspective.
This part of the -v1 patch was a bit misleading to me:
Due to changes in the kernel text and data mappings, the FRED MSRs must be reinitialized.
... as it suggests that the FRED MSRs will change from before the suspend - while they don't.
What this sentence meant is that FRED MSRs set up by the intermediate *kexec kernel* are incorrect and must be reinitialized again to reconstruct the pre-hibernation state. Ie. there's 3 FRED setup states: pre-S4, kexec and post-S4, where pre-S4 == post-S4. Right?
I think the description and comments in the -v2 patch are better in this regard.
Thanks,
Ingo
On 3/31/2025 8:30 AM, Rafael J. Wysocki wrote:
*
* Note, FRED RSPs setup needs to access percpu data structures.
And I'm not sure what you wanted to say here? Does this refer to the ordering of the code below or to something else?
Yes, this refers to the ordering of the code: FRED config restoration needs to happen after wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base), or the system still hangs.
I will make it explicit in the next iteration.
linux-stable-mirror@lists.linaro.org