The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x b4885bd5935bb26f0a414ad55679a372e53f9b9b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025041739-rumor-retention-833e@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b4885bd5935bb26f0a414ad55679a372e53f9b9b Mon Sep 17 00:00:00 2001
From: Alexandra Diupina <adiupina(a)astralinux.ru>
Date: Wed, 19 Mar 2025 17:28:58 +0300
Subject: [PATCH] cifs: avoid NULL pointer dereference in dbg call
cifs_server_dbg() implies server to be non-NULL so
move call under condition to avoid NULL pointer dereference.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: e79b0332ae06 ("cifs: ignore cached share root handle closing errors")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexandra Diupina <adiupina(a)astralinux.ru>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/smb2misc.c b/fs/smb/client/smb2misc.c
index f3c4b70b77b9..cddf273c14ae 100644
--- a/fs/smb/client/smb2misc.c
+++ b/fs/smb/client/smb2misc.c
@@ -816,11 +816,12 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid,
WARN_ONCE(tcon->tc_count < 0, "tcon refcount is negative");
spin_unlock(&cifs_tcp_ses_lock);
- if (tcon->ses)
+ if (tcon->ses) {
server = tcon->ses->server;
-
- cifs_server_dbg(FYI, "tid=0x%x: tcon is closing, skipping async close retry of fid %llu %llu\n",
- tcon->tid, persistent_fid, volatile_fid);
+ cifs_server_dbg(FYI,
+ "tid=0x%x: tcon is closing, skipping async close retry of fid %llu %llu\n",
+ tcon->tid, persistent_fid, volatile_fid);
+ }
return 0;
}
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
scripts/cve_create | 2 +-
scripts/cve_create_batch | 2 +-
scripts/cve_search | 2 +-
scripts/cvelistV5_check | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/scripts/cve_create b/scripts/cve_create
index dee842d43..eed7b45b2 100755
--- a/scripts/cve_create
+++ b/scripts/cve_create
@@ -24,7 +24,7 @@
KERNEL_TREE=${CVEKERNELTREE}
if [ ! -d ${KERNEL_TREE} ]; then
- echo "CVEERNELTREE needs setting to the stable repo directory"
+ echo "CVEKERNELTREE needs setting to the stable repo directory"
echo "Either manually export it or add it to your .bashrc/.zshrc et al."
echo "See HOWTO in the root of this repo"
exit 1
diff --git a/scripts/cve_create_batch b/scripts/cve_create_batch
index 00c7e89f8..28e98b836 100755
--- a/scripts/cve_create_batch
+++ b/scripts/cve_create_batch
@@ -27,7 +27,7 @@ fi
KERNEL_TREE=${CVEKERNELTREE}
if [ ! -d ${KERNEL_TREE} ]; then
- echo "CVEERNELTREE needs setting to the stable repo directory"
+ echo "CVEKERNELTREE needs setting to the stable repo directory"
echo "Either manually export it or add it to your .bashrc/.zshrc et al."
echo "See HOWTO in the root of this repo"
exit 1
diff --git a/scripts/cve_search b/scripts/cve_search
index cd90a1599..cb0730c63 100755
--- a/scripts/cve_search
+++ b/scripts/cve_search
@@ -18,7 +18,7 @@
KERNEL_TREE=${CVEKERNELTREE}
if [ ! -d "${KERNEL_TREE}" ]; then
- echo "CVEERNELTREE needs setting to the stable repo directory"
+ echo "CVEKERNELTREE needs setting to the stable repo directory"
echo "Either manually export it or add it to your .bashrc/.zshrc et al."
echo "See HOWTO in the root of this repo"
exit 1
diff --git a/scripts/cvelistV5_check b/scripts/cvelistV5_check
index 5eb41cea1..8a3bd71d3 100755
--- a/scripts/cvelistV5_check
+++ b/scripts/cvelistV5_check
@@ -45,7 +45,7 @@ fi
KERNEL_TREE=${CVEKERNELTREE}
if [ ! -d ${KERNEL_TREE} ]; then
- echo "CVEERNELTREE needs setting to the stable repo directory"
+ echo "CVEKERNELTREE needs setting to the stable repo directory"
echo "Either manually export it or add it to your .bashrc/.zshrc et al."
echo "See HOWTO in the root of this repo"
exit 1
--
2.34.1
The following commit has been merged into the perf/core branch of tip:
Commit-ID: a5f5e1238f4ff919816f69e77d2537a48911767b
Gitweb: https://git.kernel.org/tip/a5f5e1238f4ff919816f69e77d2537a48911767b
Author: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
AuthorDate: Tue, 15 Apr 2025 10:41:34
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitterDate: Thu, 17 Apr 2025 14:19:07 +02:00
perf/x86/intel: Don't clear perf metrics overflow bit unconditionally
The below code would always unconditionally clear other status bits like
perf metrics overflow bit once PEBS buffer overflows:
status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
This is incorrect. Perf metrics overflow bit should be cleared only when
fixed counter 3 in PEBS counter group. Otherwise perf metrics overflow
could be missed to handle.
Closes: https://lore.kernel.org/all/20250225110012.GK31462@noisy.programming.kicks-…
Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics")
Signed-off-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Reviewed-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250415104135.318169-1-dapeng1.mi@linux.intel.com
---
arch/x86/events/intel/core.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 09d2d66..2b70a3a 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3049,7 +3049,6 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
int bit;
int handled = 0;
- u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
inc_irq_stat(apic_perf_irqs);
@@ -3093,7 +3092,6 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
handled++;
x86_pmu_handle_guest_pebs(regs, &data);
static_call(x86_pmu_drain_pebs)(regs, &data);
- status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
/*
* PMI throttle may be triggered, which stops the PEBS event.
@@ -3104,6 +3102,15 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
*/
if (pebs_enabled != cpuc->pebs_enabled)
wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
+
+ /*
+ * Above PEBS handler (PEBS counters snapshotting) has updated fixed
+ * counter 3 and perf metrics counts if they are in counter group,
+ * unnecessary to update again.
+ */
+ if (cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS] &&
+ is_pebs_counter_event_group(cpuc->events[INTEL_PMC_IDX_FIXED_SLOTS]))
+ status &= ~GLOBAL_STATUS_PERF_METRICS_OVF_BIT;
}
/*
@@ -3123,6 +3130,8 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
static_call(intel_pmu_update_topdown_event)(NULL, NULL);
}
+ status &= hybrid(cpuc->pmu, intel_ctrl);
+
/*
* Checkpointed counters can lead to 'spurious' PMIs because the
* rollback caused by the PMI will have cleared the overflow status
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 71dcc11c2cd9e434c34a63154ecadca21c135ddd
Gitweb: https://git.kernel.org/tip/71dcc11c2cd9e434c34a63154ecadca21c135ddd
Author: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
AuthorDate: Tue, 15 Apr 2025 10:41:35
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitterDate: Thu, 17 Apr 2025 14:19:38 +02:00
perf/x86/intel: Allow to update user space GPRs from PEBS records
Currently when a user samples user space GPRs (--user-regs option) with
PEBS, the user space GPRs actually always come from software PMI
instead of from PEBS hardware. This leads to the sampled GPRs to
possibly be inaccurate for single PEBS record case because of the
skid between counter overflow and GPRs sampling on PMI.
For the large PEBS case, it is even worse. If user sets the
exclude_kernel attribute, large PEBS would be used to sample user space
GPRs, but since PEBS GPRs group is not really enabled, it leads to all
samples in the large PEBS record to share the same piece of user space
GPRs, like this reproducer shows:
$ perf record -e branches:pu --user-regs=ip,ax -c 100000 ./foo
$ perf report -D | grep "AX"
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
.... AX 0x000000003a0d4ead
So enable GPRs group for user space GPRs sampling and prioritize reading
GPRs from PEBS. If the PEBS sampled GPRs is not user space GPRs (single
PEBS record case), perf_sample_regs_user() modifies them to user space
GPRs.
[ mingo: Clarified the changelog. ]
Fixes: c22497f5838c ("perf/x86/intel: Support adaptive PEBS v4")
Signed-off-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250415104135.318169-2-dapeng1.mi@linux.intel.com
---
arch/x86/events/intel/ds.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 1f7e1a6..18c3ab5 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1399,8 +1399,10 @@ static u64 pebs_update_adaptive_cfg(struct perf_event *event)
* + precise_ip < 2 for the non event IP
* + For RTM TSX weight we need GPRs for the abort code.
*/
- gprs = (sample_type & PERF_SAMPLE_REGS_INTR) &&
- (attr->sample_regs_intr & PEBS_GP_REGS);
+ gprs = ((sample_type & PERF_SAMPLE_REGS_INTR) &&
+ (attr->sample_regs_intr & PEBS_GP_REGS)) ||
+ ((sample_type & PERF_SAMPLE_REGS_USER) &&
+ (attr->sample_regs_user & PEBS_GP_REGS));
tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT_TYPE) &&
((attr->config & INTEL_ARCH_EVENT_MASK) ==
@@ -2123,7 +2125,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
regs->flags &= ~PERF_EFLAGS_EXACT;
}
- if (sample_type & PERF_SAMPLE_REGS_INTR)
+ if (sample_type & (PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER))
adaptive_pebs_save_regs(regs, gprs);
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e1a453a57bc76be678bd746f84e3d73f378a9511
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025041710-herald-hardwood-63d1@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e1a453a57bc76be678bd746f84e3d73f378a9511 Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Mon, 7 Apr 2025 15:41:39 -0400
Subject: [PATCH] tracing: Do not add length to print format in synthetic
events
The following causes a vsnprintf fault:
# echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
# echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger
Because the synthetic event's "wakee" field is created as a dynamic string
(even though the string copied is not). The print format to print the
dynamic string changed from "%*s" to "%s" because another location
(__set_synth_event_print_fmt()) exported this to user space, and user
space did not need that. But it is still used in print_synth_event(), and
the output looks like:
<idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
<idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91
bash-880 [002] d..5. 193.811371: wake_lat: wakee=(efault)kworker/u35:2delta=21
<idle>-0 [001] d..5. 193.811516: wake_lat: wakee=(efault)sshd-sessiondelta=129
sshd-session-879 [001] d..5. 193.967576: wake_lat: wakee=(efault)kworker/u34:5delta=50
The length isn't needed as the string is always nul terminated. Just print
the string and not add the length (which was hard coded to the max string
length anyway).
Cc: stable(a)vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: Douglas Raillard <douglas.raillard(a)arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Link: https://lore.kernel.org/20250407154139.69955768@gandalf.local.home
Fixes: 4d38328eb442d ("tracing: Fix synth event printk format for str fields");
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 969f48742d72..33cfbd4ed76d 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -370,7 +370,6 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
union trace_synth_field *data = &entry->fields[n_u64];
trace_seq_printf(s, print_fmt, se->fields[i]->name,
- STR_VAR_LEN_MAX,
(char *)entry + data->as_dynamic.offset,
i == se->n_fields - 1 ? "" : " ");
n_u64++;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x e1a453a57bc76be678bd746f84e3d73f378a9511
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025041706-ambiance-zen-5f4e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e1a453a57bc76be678bd746f84e3d73f378a9511 Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt(a)goodmis.org>
Date: Mon, 7 Apr 2025 15:41:39 -0400
Subject: [PATCH] tracing: Do not add length to print format in synthetic
events
The following causes a vsnprintf fault:
# echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
# echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger
Because the synthetic event's "wakee" field is created as a dynamic string
(even though the string copied is not). The print format to print the
dynamic string changed from "%*s" to "%s" because another location
(__set_synth_event_print_fmt()) exported this to user space, and user
space did not need that. But it is still used in print_synth_event(), and
the output looks like:
<idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
<idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91
bash-880 [002] d..5. 193.811371: wake_lat: wakee=(efault)kworker/u35:2delta=21
<idle>-0 [001] d..5. 193.811516: wake_lat: wakee=(efault)sshd-sessiondelta=129
sshd-session-879 [001] d..5. 193.967576: wake_lat: wakee=(efault)kworker/u34:5delta=50
The length isn't needed as the string is always nul terminated. Just print
the string and not add the length (which was hard coded to the max string
length anyway).
Cc: stable(a)vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: Douglas Raillard <douglas.raillard(a)arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Link: https://lore.kernel.org/20250407154139.69955768@gandalf.local.home
Fixes: 4d38328eb442d ("tracing: Fix synth event printk format for str fields");
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 969f48742d72..33cfbd4ed76d 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -370,7 +370,6 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
union trace_synth_field *data = &entry->fields[n_u64];
trace_seq_printf(s, print_fmt, se->fields[i]->name,
- STR_VAR_LEN_MAX,
(char *)entry + data->as_dynamic.offset,
i == se->n_fields - 1 ? "" : " ");
n_u64++;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x f2f29da9f0d4367f6ff35e0d9d021257bb53e273
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025041752-bleak-unmixable-8ca9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f2f29da9f0d4367f6ff35e0d9d021257bb53e273 Mon Sep 17 00:00:00 2001
From: Myrrh Periwinkle <myrrhperiwinkle(a)qtmlabs.xyz>
Date: Sun, 6 Apr 2025 11:45:22 +0700
Subject: [PATCH] x86/e820: Fix handling of subpage regions when calculating
nosave ranges in e820__register_nosave_regions()
While debugging kexec/hibernation hangs and crashes, it turned out that
the current implementation of e820__register_nosave_regions() suffers from
multiple serious issues:
- The end of last region is tracked by PFN, causing it to find holes
that aren't there if two consecutive subpage regions are present
- The nosave PFN ranges derived from holes are rounded out (instead of
rounded in) which makes it inconsistent with how explicitly reserved
regions are handled
Fix this by:
- Treating reserved regions as if they were holes, to ensure consistent
handling (rounding out nosave PFN ranges is more correct as the
kernel does not use partial pages)
- Tracking the end of the last RAM region by address instead of pages
to detect holes more precisely
These bugs appear to have been introduced about ~18 years ago with the very
first version of e820_mark_nosave_regions(), and its flawed assumptions were
carried forward uninterrupted through various waves of rewrites and renames.
[ mingo: Added Git archeology details, for kicks and giggles. ]
Fixes: e8eff5ac294e ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64")
Reported-by: Roberto Ricci <io(a)r-ricci.it>
Tested-by: Roberto Ricci <io(a)r-ricci.it>
Signed-off-by: Myrrh Periwinkle <myrrhperiwinkle(a)qtmlabs.xyz>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Len Brown <len.brown(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250406-fix-e820-nosave-v3-1-f3787bc1ee1d@qtmlab…
Closes: https://lore.kernel.org/all/Z4WFjBVHpndct7br@desktop0a/
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 57120f0749cc..9d8dd8deb2a7 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -753,22 +753,21 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
void __init e820__register_nosave_regions(unsigned long limit_pfn)
{
int i;
- unsigned long pfn = 0;
+ u64 last_addr = 0;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
- if (pfn < PFN_UP(entry->addr))
- register_nosave_region(pfn, PFN_UP(entry->addr));
-
- pfn = PFN_DOWN(entry->addr + entry->size);
-
if (entry->type != E820_TYPE_RAM)
- register_nosave_region(PFN_UP(entry->addr), pfn);
+ continue;
- if (pfn >= limit_pfn)
- break;
+ if (last_addr < entry->addr)
+ register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
+
+ last_addr = entry->addr + entry->size;
}
+
+ register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
}
#ifdef CONFIG_ACPI
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x f2f29da9f0d4367f6ff35e0d9d021257bb53e273
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025041748-cosmetics-maritime-e503@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f2f29da9f0d4367f6ff35e0d9d021257bb53e273 Mon Sep 17 00:00:00 2001
From: Myrrh Periwinkle <myrrhperiwinkle(a)qtmlabs.xyz>
Date: Sun, 6 Apr 2025 11:45:22 +0700
Subject: [PATCH] x86/e820: Fix handling of subpage regions when calculating
nosave ranges in e820__register_nosave_regions()
While debugging kexec/hibernation hangs and crashes, it turned out that
the current implementation of e820__register_nosave_regions() suffers from
multiple serious issues:
- The end of last region is tracked by PFN, causing it to find holes
that aren't there if two consecutive subpage regions are present
- The nosave PFN ranges derived from holes are rounded out (instead of
rounded in) which makes it inconsistent with how explicitly reserved
regions are handled
Fix this by:
- Treating reserved regions as if they were holes, to ensure consistent
handling (rounding out nosave PFN ranges is more correct as the
kernel does not use partial pages)
- Tracking the end of the last RAM region by address instead of pages
to detect holes more precisely
These bugs appear to have been introduced about ~18 years ago with the very
first version of e820_mark_nosave_regions(), and its flawed assumptions were
carried forward uninterrupted through various waves of rewrites and renames.
[ mingo: Added Git archeology details, for kicks and giggles. ]
Fixes: e8eff5ac294e ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64")
Reported-by: Roberto Ricci <io(a)r-ricci.it>
Tested-by: Roberto Ricci <io(a)r-ricci.it>
Signed-off-by: Myrrh Periwinkle <myrrhperiwinkle(a)qtmlabs.xyz>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Len Brown <len.brown(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250406-fix-e820-nosave-v3-1-f3787bc1ee1d@qtmlab…
Closes: https://lore.kernel.org/all/Z4WFjBVHpndct7br@desktop0a/
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 57120f0749cc..9d8dd8deb2a7 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -753,22 +753,21 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
void __init e820__register_nosave_regions(unsigned long limit_pfn)
{
int i;
- unsigned long pfn = 0;
+ u64 last_addr = 0;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
- if (pfn < PFN_UP(entry->addr))
- register_nosave_region(pfn, PFN_UP(entry->addr));
-
- pfn = PFN_DOWN(entry->addr + entry->size);
-
if (entry->type != E820_TYPE_RAM)
- register_nosave_region(PFN_UP(entry->addr), pfn);
+ continue;
- if (pfn >= limit_pfn)
- break;
+ if (last_addr < entry->addr)
+ register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
+
+ last_addr = entry->addr + entry->size;
}
+
+ register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
}
#ifdef CONFIG_ACPI
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f2f29da9f0d4367f6ff35e0d9d021257bb53e273
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025041744-grill-debit-4fe2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f2f29da9f0d4367f6ff35e0d9d021257bb53e273 Mon Sep 17 00:00:00 2001
From: Myrrh Periwinkle <myrrhperiwinkle(a)qtmlabs.xyz>
Date: Sun, 6 Apr 2025 11:45:22 +0700
Subject: [PATCH] x86/e820: Fix handling of subpage regions when calculating
nosave ranges in e820__register_nosave_regions()
While debugging kexec/hibernation hangs and crashes, it turned out that
the current implementation of e820__register_nosave_regions() suffers from
multiple serious issues:
- The end of last region is tracked by PFN, causing it to find holes
that aren't there if two consecutive subpage regions are present
- The nosave PFN ranges derived from holes are rounded out (instead of
rounded in) which makes it inconsistent with how explicitly reserved
regions are handled
Fix this by:
- Treating reserved regions as if they were holes, to ensure consistent
handling (rounding out nosave PFN ranges is more correct as the
kernel does not use partial pages)
- Tracking the end of the last RAM region by address instead of pages
to detect holes more precisely
These bugs appear to have been introduced about ~18 years ago with the very
first version of e820_mark_nosave_regions(), and its flawed assumptions were
carried forward uninterrupted through various waves of rewrites and renames.
[ mingo: Added Git archeology details, for kicks and giggles. ]
Fixes: e8eff5ac294e ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64")
Reported-by: Roberto Ricci <io(a)r-ricci.it>
Tested-by: Roberto Ricci <io(a)r-ricci.it>
Signed-off-by: Myrrh Periwinkle <myrrhperiwinkle(a)qtmlabs.xyz>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: David Woodhouse <dwmw(a)amazon.co.uk>
Cc: Len Brown <len.brown(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250406-fix-e820-nosave-v3-1-f3787bc1ee1d@qtmlab…
Closes: https://lore.kernel.org/all/Z4WFjBVHpndct7br@desktop0a/
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 57120f0749cc..9d8dd8deb2a7 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -753,22 +753,21 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
void __init e820__register_nosave_regions(unsigned long limit_pfn)
{
int i;
- unsigned long pfn = 0;
+ u64 last_addr = 0;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
- if (pfn < PFN_UP(entry->addr))
- register_nosave_region(pfn, PFN_UP(entry->addr));
-
- pfn = PFN_DOWN(entry->addr + entry->size);
-
if (entry->type != E820_TYPE_RAM)
- register_nosave_region(PFN_UP(entry->addr), pfn);
+ continue;
- if (pfn >= limit_pfn)
- break;
+ if (last_addr < entry->addr)
+ register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
+
+ last_addr = entry->addr + entry->size;
}
+
+ register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
}
#ifdef CONFIG_ACPI